Analyze with GREAT

Note: On Aug 19 2019, GREAT released version 4, which supports the hg38 genome and removes some ontologies such as pathways. submitGreatJob() still uses hg19 by default; hg38 can be selected with the species = "hg38" argument. To use an older version such as 3.0.0, call submitGreatJob(..., version = "3.0.0").

GREAT (Genomic Regions Enrichment of Annotations Tool) is a popular web-based tool for associating biological functions with genomic regions. The rGREAT package automates GREAT analysis by constructing an HTTP POST request from the user’s input and then retrieving the results from the GREAT web server.

The input data is either a GRanges object or a BED-format data frame; it does not need to be sorted. In the following example, we use a randomly generated data frame.
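As a minimal sketch of such an input, the following builds a random BED-like data frame in base R. The column names, the number of regions and the chromosome lengths are illustrative assumptions (the lengths are deliberately conservative placeholders, not real hg19 sizes).

```r
# Toy input: a randomly generated BED-like data frame (chr, start, end).
# Chromosome lengths are placeholders chosen to stay below the real hg19 sizes.
set.seed(123)
n = 1000
chr_len = c(chr1 = 2.0e8, chr2 = 1.8e8, chr3 = 1.5e8)

chr = sample(names(chr_len), n, replace = TRUE)
start = floor(runif(n, min = 1, max = chr_len[chr] - 10000))
width = sample(100:5000, n, replace = TRUE)

bed = data.frame(chr = chr, start = start, end = start + width,
                 stringsAsFactors = FALSE)
head(bed, 2)
```

The rows are intentionally left unsorted, since sorting is not required of the input.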

Submit genomic regions with submitGreatJob(). Before submission, the regions are sorted and overlapping regions are merged.
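A submission might look like the sketch below. It assumes network access to the GREAT server and uses circlize::generateRandomBed() as one convenient way to obtain toy regions; the seed and the number of regions are arbitrary choices.

```r
library(rGREAT)

# Toy input: 1000 random regions with chr, start and end columns only
set.seed(123)
bed = circlize::generateRandomBed(nr = 1000, nc = 0)

# Requires network access to the GREAT server; hg19 is the default genome
job = submitGreatJob(bed)
job

# Alternatives described in the note at the top of this document:
# job = submitGreatJob(bed, species = "hg38")
# job = submitGreatJob(bed, version = "3.0.0")
```

The alternatives are left commented out so that running the block submits only one job.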

The returned variable job is an instance of the GreatJob class. It is used to retrieve results from the GREAT server, and it stores the results that have already been downloaded.

With job, we can now retrieve results from GREAT. The primary results are the tables of enrichment statistics for the analysis. By default, results are retrieved from the three GO ontologies and all pathway ontologies. Each table contains statistics for all terms, whether significant or not, so users can filter by their own cutoffs.
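Retrieving the tables and filtering them could be sketched as follows. The column name Binom_Adjp_BH and the 0.05 cutoff are assumptions for illustration; check colnames() on your own tables before relying on them.

```r
library(rGREAT)

set.seed(123)
bed = circlize::generateRandomBed(nr = 1000, nc = 0)
job = submitGreatJob(bed)        # requires network access

tb = getEnrichmentTables(job)    # default set of ontologies
names(tb)                        # one list element per ontology
first_tb = tb[[1]]
colnames(first_tb)

# Assumed column name for BH-adjusted binomial p-values
adjp_col = "Binom_Adjp_BH"
if (adjp_col %in% colnames(first_tb)) {
    sig = first_tb[first_tb[[adjp_col]] < 0.05, , drop = FALSE]
    print(nrow(sig))
}
```

With randomly generated regions, few or no terms are expected to pass the cutoff.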

There is a column of p-values adjusted by the “BH” method. Other adjustment methods can be applied with p.adjust().
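To illustrate how p.adjust() is used, here is a small sketch on a toy vector of raw p-values. The values are made up and stand in for a raw p-value column of an enrichment table.

```r
# Toy raw p-values standing in for one raw p-value column of an enrichment table
raw_p = c(0.0001, 0.004, 0.03, 0.2, 0.9)

adj_bh   = p.adjust(raw_p, method = "BH")          # Benjamini-Hochberg
adj_bonf = p.adjust(raw_p, method = "bonferroni")  # a more conservative choice

data.frame(raw_p, adj_bh, adj_bonf)
```

The same call can be applied to a raw p-value column of a retrieved table, with the result stored as a new column.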

The value returned by getEnrichmentTables() is a list of data frames, each corresponding to a single ontology. The data frames have the same structure as the tables on the GREAT website.

You can get results either by specifying the ontologies or by specifying pre-defined categories (each category is a pre-defined set of ontologies):

As you have seen in the previous messages and results, the enrichment tables contain no associated genes. You can set download_by = 'tsv' in getEnrichmentTables() to download the complete tables, but due to a restriction of the GREAT web server, only the top 500 regions can be retrieved.

All ontology names available for a given species can be obtained with availableOntologies(), and all available ontology categories with availableCategories(). You do not need to provide species information here because job already contains it.
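The listing functions and the two ways of selecting results can be combined as in the sketch below. It assumes network access and that a category named "GO" is among those reported by availableCategories(); ontology names are taken from the listing rather than hard-coded.

```r
library(rGREAT)

set.seed(123)
bed = circlize::generateRandomBed(nr = 1000, nc = 0)
job = submitGreatJob(bed)        # requires network access

cats = availableCategories(job)
onto = availableOntologies(job)
cats
onto

# Ontologies within one category ("GO" is assumed to be present in cats)
go_onto = availableOntologies(job, category = "GO")

# Select by ontology name, or by category
tb_onto = getEnrichmentTables(job, ontology = head(go_onto, 2))
tb_cat  = getEnrichmentTables(job, category = "GO")
names(tb_onto)
names(tb_cat)
```

Taking the names from availableOntologies() avoids requesting an ontology that the chosen GREAT version no longer provides.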

Associations between genomic regions and genes can be obtained with plotRegionGeneAssociationGraphs(). The function draws the same three plots as the GREAT website and returns a GRanges object that contains the gene-region associations.

For regions that are not associated with any gene under the current settings, the corresponding gene and distTSS columns are NA.

By specifying an ontology and a term ID, you can get the associations for a particular term. The term ID comes from the first column of the data frame returned by getEnrichmentTables().
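Both uses of plotRegionGeneAssociationGraphs() might be sketched as follows. This assumes network access and an open graphics device; picking the first term of the first GO table is an arbitrary choice for illustration, and the call may not succeed for a term with no associated regions.

```r
library(rGREAT)

set.seed(123)
bed = circlize::generateRandomBed(nr = 1000, nc = 0)
job = submitGreatJob(bed)        # requires network access

# All region-gene associations: three plots side by side
par(mfrow = c(1, 3))
res = plotRegionGeneAssociationGraphs(job)
res[1:2]
sum(is.na(res$gene))             # regions with no associated gene

# Associations for a single term; the term ID is in the first column
tb = getEnrichmentTables(job, category = "GO")
term_id = as.character(tb[[1]][1, 1])
par(mfrow = c(1, 3))
res_term = plotRegionGeneAssociationGraphs(job, ontology = names(tb)[1],
                                           termID = term_id)
```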

Session info

sessionInfo()

## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rGREAT_1.20.0        GenomicRanges_1.40.0 GenomeInfoDb_1.24.0  IRanges_2.22.0      
## [5] S4Vectors_0.26.0     BiocGenerics_0.34.0  knitr_1.28          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6           XVector_0.28.0         magrittr_1.5           zlibbioc_1.34.0       
##  [5] colorspace_1.4-1       rjson_0.2.20           rlang_0.4.5            stringr_1.4.0         
##  [9] tools_4.0.0            grid_4.0.0             circlize_0.4.8         xfun_0.13             
## [13] htmltools_0.4.0        yaml_2.2.1             digest_0.6.25          GenomeInfoDbData_1.2.3
## [17] GlobalOptions_0.1.1    bitops_1.0-6           RCurl_1.98-1.2         shape_1.4.4           
## [21] evaluate_0.14          rmarkdown_2.1          stringi_1.4.6          compiler_4.0.0        
## [25] GetoptLong_0.1.8

Zuguang Gu ( z.gu@dkfz.de )

2020-04-27
