Loading and re-analysing public data through ReactomeGSA

Johannes Griss

2024-05-01

Introduction

Since October 2023, ReactomeGSA was extended to simplify the reuse of public data. As key features, ReactomeGSA can now directly load data from EBI’s ExpressionAtlas, and NCBI’s GREIN. Both of these resources reprocess available public datasets using consistent pipelines.

Additionally, a search function was integrated into ReactomeGSA that can search for datasets simultaneously in all of these supported resources.

The ReactomeGSA R package now also has all required functions to directly access this web-based service. It is thereby possible to search for public datasets directly and download them as ExpressionSet objects.

Installation

The ReactomeGSA package can be directly installed from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!require(ReactomeGSA))
  BiocManager::install("ReactomeGSA")

For more information, see https://bioconductor.org/install/.

Searching for Public Datasets

The find_public_datasets function uses ReactomeGSA’s web service to search for public datasets in all supported resources.

By default, the datasets are limited to human studies. This can be changed by setting the species parameter. The complete list of available species is returned by the get_public_species function.

library(ReactomeGSA)

# get all available species found in the datasets
all_species <- get_public_species()

head(all_species)
#> [1] "Aegilops tauschii"                "Anas platyrhynchos"              
#> [3] "Anolis carolinensis"              "Anopheles gambiae"               
#> [5] "Arabidopsis lyrata"               "Arabidopsis lyrata subsp. lyrata"

The search_term parameter takes a single string as an argument. Words separated by a space are logically combined using an AND.

# search for datasets on BRAF and melanoma
datasets <- find_public_datasets("melanoma BRAF")

# the function returns the found datasets as a data.frame
datasets[1:4, c("id", "title")]
#>          id
#> 1  GSE83592
#> 2 GSE110054
#> 3 GSE100066
#> 4 GSE107370
#>                                                                           title
#> 1                            JQ1 +/- Vemurafenib in BRAF mutant melanoma (A375)
#> 2                Transcriptional responses of melanoma cells to BRAF inhibition
#> 3         Transcriptome sequencing analysis of BRAF-mutant melanoma metastases.
#> 4 Concomitant BCORL1 and BRAF mutations in vemurafenib-resistant melanoma cells

Load a public dataset

Datasets found through the find_public_datasets function can subsequently loaded using the load_public_dataset function.

# find the correct entry in the search result
# this must be the complete row of the data.frame returned
# by the find_public_datasets function
dataset_search_entry <- datasets[datasets$id == "E-MTAB-7453", ]

str(dataset_search_entry)
#> 'data.frame':    1 obs. of  7 variables:
#>  $ title              : chr "RNA-seq of the human melanoma cell-line A375P treated with the BRAF inhibitor PLX4720 and a DMSO control"
#>  $ id                 : chr "E-MTAB-7453"
#>  $ resource_name      : chr "EBI Expression Atlas"
#>  $ description        : chr ""
#>  $ resource_loading_id: chr "ebi_gxa"
#>  $ loading_parameters :List of 1
#>   ..$ :'data.frame': 1 obs. of  2 variables:
#>   .. ..$ name : chr "dataset_id"
#>   .. ..$ value: chr "E-MTAB-7453"
#>  $ web_link           : chr "https://www.ebi.ac.uk/gxa/experiments/E-MTAB-7453/Results"

The selected dataset can now be loaded through the load_public_dataset function.

# this function only takes one argument, which must be
# a single row from the data.frame returned by the
# find_public_datasets function
mel_cells_braf <- load_public_dataset(dataset_search_entry, verbose = TRUE)
#> Dataset E-MTAB-7453 available.

The returned object is an ExpressionSet object that already contains all available metada.

# use the biobase functions to access the metadata
library(Biobase)
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:SeuratObject':
#> 
#>     intersect
#> The following object is masked from 'package:limma':
#> 
#>     plotMA
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.

# basic metadata
pData(mel_cells_braf)
#>             Sample.Id AtlasAssayGroup     organism cell_line organism_part
#> ERR2950741 ERR2950741              g2 Homo sapiens    A375-P          skin
#> ERR2950742 ERR2950742              g2 Homo sapiens    A375-P          skin
#> ERR2950743 ERR2950743              g2 Homo sapiens    A375-P          skin
#> ERR2950744 ERR2950744              g1 Homo sapiens    A375-P          skin
#> ERR2950745 ERR2950745              g1 Homo sapiens    A375-P          skin
#> ERR2950746 ERR2950746              g1 Homo sapiens    A375-P          skin
#>                  cell_type  disease compound         dose
#> ERR2950741 epithelial cell melanoma     none             
#> ERR2950742 epithelial cell melanoma     none             
#> ERR2950743 epithelial cell melanoma     none             
#> ERR2950744 epithelial cell melanoma  PLX4720 1 micromolar
#> ERR2950745 epithelial cell melanoma  PLX4720 1 micromolar
#> ERR2950746 epithelial cell melanoma  PLX4720 1 micromolar

Detailed descriptions of the loaded study are further stored in the metadata slot.

# access the stored metadata using the experimentData function
experimentData(mel_cells_braf)
#> Experiment data
#>   Experimenter name: E-MTAB-7453 
#>   Laboratory:  
#>   Contact information:  
#>   Title: RNA-seq of the human melanoma cell-line A375P treated with the BRAF inhibitor PLX4720 and a DMSO control 
#>   URL: https://www.ebi.ac.uk/gxa/experiments/E-MTAB-7453/Results 
#>   PMIDs:  
#>   No abstract available.
#>   notes:
#>    notes:     
#>       Public dataset loaded from EBI Expression Atlas through ReactomeGSA.

# for some datasets, longer descriptions are available. These
# can be accessed using the abstract function
abstract(mel_cells_braf)
#> [1] ""

Additionally, you can use the table function to quickly get the number of available samples for a specific metadata field.

table(mel_cells_braf$compound)
#> 
#>    none PLX4720 
#>       3       3

Perform the pathway analysis using ReactomeGSA

This object is now directly compatible with ReactomeGSA’s pathway analysis functions. A detailed explanation of how to perform this analysis, please have a look at the respective vignette.

# create the analysis request
my_request <-ReactomeAnalysisRequest(method = "Camera")

# do not create a visualization for this example
my_request <- set_parameters(request = my_request, create_reactome_visualization = FALSE)

# add the dataset using the loaded object
my_request <- add_dataset(request = my_request, 
                          expression_values = mel_cells_braf, 
                          name = "E-MTAB-7453", 
                          type = "rnaseq_counts",
                          comparison_factor = "compound", 
                          comparison_group_1 = "PLX4720", 
                          comparison_group_2 = "none")
#> Converting expression data to string... (This may take a moment)
#> Conversion complete

my_request
#> ReactomeAnalysisRequestObject
#>   Method = Camera
#>   Parameters:
#>   - create_reactome_visualization: FALSE
#>   Datasets:
#>   - E-MTAB-7453 (rnaseq_counts)
#>     No parameters set.
#> ReactomeAnalysisRequest

The analysis can now started using the standard workflow:

# perform the analysis using ReactomeGSA
res <- perform_reactome_analysis(my_request)
#> Submitting request to Reactome API...
#> Compressing request data...
#> Reactome Analysis submitted succesfully
#> Converting dataset E-MTAB-7453...
#> Mapping identifiers...
#> Performing gene set analysis using Camera
#> Analysing dataset 'E-MTAB-7453' using Camera
#> Retrieving result...

# basic overview of the result
print(res)
#> ReactomeAnalysisResult object
#>   Reactome Release: 88
#>   Results:
#>   - E-MTAB-7453:
#>     2591 pathways
#>     9369 fold changes for genes
#>   No Reactome visualizations available
#> [1] "ReactomeAnalysisResult"
#> attr(,"package")
#> [1] "ReactomeGSA"

# key pathways
res_pathways <- pathways(res)

head(res_pathways)
#>                                                                                              Name
#> R-HSA-69620                                                                Cell Cycle Checkpoints
#> R-HSA-69239                                                                      Synthesis of DNA
#> R-HSA-68962                                             Activation of the pre-replicative complex
#> R-HSA-141424                                        Amplification of signal from the kinetochores
#> R-HSA-141444 Amplification  of signal from unattached  kinetochores via a MAD2  inhibitory signal
#> R-HSA-176187                                  Activation of ATR in response to replication stress
#>              Direction.E-MTAB-7453 FDR.E-MTAB-7453 PValue.E-MTAB-7453
#> R-HSA-69620                     Up    2.089166e-13       8.063165e-17
#> R-HSA-69239                     Up    1.277460e-12       9.881164e-16
#> R-HSA-68962                     Up    1.277460e-12       1.888810e-15
#> R-HSA-141424                    Up    1.277460e-12       2.465187e-15
#> R-HSA-141444                    Up    1.277460e-12       2.465187e-15
#> R-HSA-176187                    Up    1.408521e-12       3.345173e-15
#>              NGenes.E-MTAB-7453 av_foldchange.E-MTAB-7453 sig.E-MTAB-7453
#> R-HSA-69620                 266                  1.468642            TRUE
#> R-HSA-69239                 118                  1.415883            TRUE
#> R-HSA-68962                  33                  2.564247            TRUE
#> R-HSA-141424                 94                  2.001927            TRUE
#> R-HSA-141444                 94                  2.001927            TRUE
#> R-HSA-176187                 37                  2.501863            TRUE

Session Info

sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Biobase_2.64.0          BiocGenerics_0.50.0     ReactomeGSA.data_1.17.1
#> [4] Seurat_5.0.3            SeuratObject_5.0.1      sp_2.1-4               
#> [7] ReactomeGSA_1.18.0      edgeR_4.2.0             limma_3.60.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3     jsonlite_1.8.8         magrittr_2.0.3        
#>   [4] spatstat.utils_3.0-4   farver_2.1.1           rmarkdown_2.26        
#>   [7] vctrs_0.6.5            ROCR_1.0-11            spatstat.explore_3.2-7
#>  [10] htmltools_0.5.8.1      progress_1.2.3         curl_5.2.1            
#>  [13] sass_0.4.9             sctransform_0.4.1      parallelly_1.37.1     
#>  [16] KernSmooth_2.23-22     bslib_0.7.0            htmlwidgets_1.6.4     
#>  [19] ica_1.0-3              plyr_1.8.9             plotly_4.10.4         
#>  [22] zoo_1.8-12             cachem_1.0.8           igraph_2.0.3          
#>  [25] mime_0.12              lifecycle_1.0.4        pkgconfig_2.0.3       
#>  [28] Matrix_1.7-0           R6_2.5.1               fastmap_1.1.1         
#>  [31] fitdistrplus_1.1-11    future_1.33.2          shiny_1.8.1.1         
#>  [34] digest_0.6.35          colorspace_2.1-0       patchwork_1.2.0       
#>  [37] tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1         
#>  [40] labeling_0.4.3         progressr_0.14.0       fansi_1.0.6           
#>  [43] spatstat.sparse_3.0-3  httr_1.4.7             polyclip_1.10-6       
#>  [46] abind_1.4-5            compiler_4.4.0         withr_3.0.0           
#>  [49] fastDummies_1.7.3      highr_0.10             gplots_3.1.3.1        
#>  [52] MASS_7.3-60.2          gtools_3.9.5           caTools_1.18.2        
#>  [55] tools_4.4.0            lmtest_0.9-40          httpuv_1.6.15         
#>  [58] future.apply_1.11.2    goftest_1.2-3          glue_1.7.0            
#>  [61] nlme_3.1-164           promises_1.3.0         grid_4.4.0            
#>  [64] Rtsne_0.17             cluster_2.1.6          reshape2_1.4.4        
#>  [67] generics_0.1.3         gtable_0.3.5           spatstat.data_3.0-4   
#>  [70] tidyr_1.3.1            hms_1.1.3              data.table_1.15.4     
#>  [73] utf8_1.2.4             spatstat.geom_3.2-9    RcppAnnoy_0.0.22      
#>  [76] ggrepel_0.9.5          RANN_2.6.1             pillar_1.9.0          
#>  [79] stringr_1.5.1          spam_2.10-0            RcppHNSW_0.6.0        
#>  [82] later_1.3.2            splines_4.4.0          dplyr_1.1.4           
#>  [85] lattice_0.22-6         survival_3.6-4         deldir_2.0-4          
#>  [88] tidyselect_1.2.1       locfit_1.5-9.9         miniUI_0.1.1.1        
#>  [91] pbapply_1.7-2          knitr_1.46             gridExtra_2.3         
#>  [94] scattermore_1.2        xfun_0.43              statmod_1.5.0         
#>  [97] matrixStats_1.3.0      stringi_1.8.3          lazyeval_0.2.2        
#> [100] yaml_2.3.8             evaluate_0.23          codetools_0.2-20      
#> [103] tibble_3.2.1           cli_3.6.2              uwot_0.2.2            
#> [106] xtable_1.8-4           reticulate_1.36.1      munsell_0.5.1         
#> [109] jquerylib_0.1.4        Rcpp_1.0.12            globals_0.16.3        
#> [112] spatstat.random_3.2-3  png_0.1-8              parallel_4.4.0        
#> [115] ggplot2_3.5.1          prettyunits_1.2.0      dotCall64_1.1-1       
#> [118] bitops_1.0-7           listenv_0.9.1          viridisLite_0.4.2     
#> [121] scales_1.3.0           ggridges_0.5.6         leiden_0.4.3.1        
#> [124] purrr_1.0.2            crayon_1.5.2           rlang_1.1.3           
#> [127] cowplot_1.1.3