The ReactomeGSA package is a client to the web-based Reactome Analysis System. Essentially, it performs a gene set analysis using the latest version of the Reactome pathway database as a backend.
The main advantages of using the Reactome Analysis System are:
To cite this package, use
Griss J. ReactomeGSA, https://github.com/reactome/ReactomeGSA (2019)
The ReactomeGSA package can be directly installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!require(ReactomeGSA))
    BiocManager::install("ReactomeGSA")
For more information, see https://bioconductor.org/install/.
The Reactome Analysis System is continuously updated. It is therefore good practice to check which methods are available before starting your analysis.
This can be done using:
library(ReactomeGSA)

available_methods <- get_reactome_methods(print_methods = FALSE, return_result = TRUE)

# only show the names of the available methods
available_methods$name
#> [1] "PADOG" "Camera" "ssGSEA"
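Because the set of methods can change over time, a script may want to stop early when the method it relies on is no longer offered. The following is a minimal sketch of such a check. It uses a stand-in data.frame instead of the real return value of get_reactome_methods, and the helper name require_method is hypothetical (it is not part of ReactomeGSA).

```r
# Stand-in for the data.frame returned by
# get_reactome_methods(print_methods = FALSE, return_result = TRUE);
# only the 'name' column shown above is reproduced here.
available_methods <- data.frame(
    name = c("PADOG", "Camera", "ssGSEA"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: stop with a readable message if a method is missing
require_method <- function(methods, method) {
    if (!method %in% methods$name) {
        stop("Method '", method, "' is not available. Choose one of: ",
             paste(methods$name, collapse = ", "))
    }
    invisible(TRUE)
}

require_method(available_methods, "Camera")
```

In a real session, the stand-in would be replaced by the object returned by get_reactome_methods.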
To get more information about a specific method, set print_details to TRUE and specify the method:
# Use this command to print the description of the specific method to the console
# get_reactome_methods(print_methods = TRUE, print_details = TRUE, method = "PADOG", return_result = FALSE)
# show the parameter names for the method
padog_params <- available_methods$parameters[available_methods$name == "PADOG"][[1]]

paste0(padog_params$name, " (", padog_params$type, ", ", padog_params$default, ")")
#> [1] "use_interactors (bool, False)"
#> [2] "include_disease_pathways (bool, True)"
#> [3] "max_missing_values (float, 0.5)"
#> [4] "create_reactome_visualization (bool, True)"
#> [5] "create_reports (bool, False)"
#> [6] "email (string, )"
#> [7] "reactome_server (string, production)"
#> [8] "sample_groups (string, )"
#> [9] "discrete_norm_function (string, TMM)"
#> [10] "continuous_norm_function (string, none)"
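The listing above gives each parameter's name, type, and default. As a rough sketch, such a table can be turned into a named list of typed defaults. The data.frame below is a stand-in with three of the rows shown above, it assumes that the defaults are stored as strings, and parse_default is a hypothetical helper rather than a ReactomeGSA function.

```r
# Stand-in for a few rows of the parameter table shown above
# (assumption: defaults are stored as character strings)
padog_params <- data.frame(
    name = c("use_interactors", "max_missing_values", "reactome_server"),
    type = c("bool", "float", "string"),
    default = c("False", "0.5", "production"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: convert one default according to its declared type
parse_default <- function(value, type) {
    switch(type,
           bool = identical(tolower(value), "true"),
           float = as.numeric(value),
           value)  # any other type is kept as a string
}

defaults <- Map(parse_default, padog_params$default, padog_params$type)
names(defaults) <- padog_params$name
str(defaults)
```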
To start a gene set analysis, you first have to create an analysis request. This is a simple S4 class that takes care of submitting multiple datasets simultaneously to the analysis system.
When creating the request object, you already have to specify the analysis method you want to use:
# Create a new request object using 'Camera' for the gene set analysis
my_request <- ReactomeAnalysisRequest(method = "Camera")

my_request
#> ReactomeAnalysisRequestObject
#> Method = Camera
#> No request data stored
#> ReactomeAnalysisRequest
To get a list of supported parameters for each method, use the get_reactome_methods function (see above).
Parameters are simply set using the set_parameters function:
# set the maximum number of allowed missing values to 50%
my_request <- set_parameters(request = my_request, max_missing_values = 0.5)

my_request
#> ReactomeAnalysisRequestObject
#> Method = Camera
#> Parameters:
#> - max_missing_values: 0.5
#> Datasets: none
#> ReactomeAnalysisRequest
Multiple parameters can be set simultaneously by simply adding more name-value pairs to the function call.
One analysis request can contain multiple datasets. This can be used to, for example, visualize the results of an RNA-seq and Proteomics experiment (of the same / similar samples) side by side:
library(ReactomeGSA.data)
data("griss_melanoma_proteomics")
This is a limma EList object with the sample data already added:
class(griss_melanoma_proteomics)
#> [1] "EList"
#> attr(,"package")
#> [1] "limma"
head(griss_melanoma_proteomics$samples)
#> patient.id condition cell.type
#> M-D MOCK PBMCB P3 MOCK PBMCB
#> M-D MCM PBMCB P3 MCM PBMCB
#> M-K MOCK PBMCB P4 MOCK PBMCB
#> M-K MCM PBMCB P4 MCM PBMCB
#> P-A MOCK PBMCB P1 MOCK PBMCB
#> P-A MCM PBMCB P1 MCM PBMCB
The dataset can now simply be added to the request using the add_dataset function:
my_request <- add_dataset(request = my_request,
                          expression_values = griss_melanoma_proteomics,
                          name = "Proteomics",
                          type = "proteomics_int",
                          comparison_factor = "condition",
                          comparison_group_1 = "MOCK",
                          comparison_group_2 = "MCM",
                          additional_factors = c("cell.type", "patient.id"))
my_request
#> ReactomeAnalysisRequestObject
#> Method = Camera
#> Parameters:
#> - max_missing_values: 0.5
#> Datasets:
#> - Proteomics (proteomics_int)
#> No parameters set.
#> ReactomeAnalysisRequest
Several datasets (of the same experiment) can be added to one request. This RNA-seq data is stored as an edgeR DGEList object:
data("griss_melanoma_rnaseq")
# only keep genes with >= 100 reads in total
total_reads <- rowSums(griss_melanoma_rnaseq$counts)
griss_melanoma_rnaseq <- griss_melanoma_rnaseq[total_reads >= 100, ]

# this is an edgeR DGEList object
class(griss_melanoma_rnaseq)
#> [1] "DGEList"
#> attr(,"package")
#> [1] "edgeR"
head(griss_melanoma_rnaseq$samples)
#> group lib.size norm.factors patient cell_type treatment
#> 195-13 MOCK 29907534 1.0629977 P1 TIBC MOCK
#> 195-14 MCM 26397322 0.9927768 P1 TIBC MCM
#> 195-19 MOCK 18194834 1.0077827 P2 PBMCB MOCK
#> 195-20 MCM 24282215 1.0041410 P2 PBMCB MCM
#> 197-11 MOCK 22628117 0.9522869 P1 PBMCB MOCK
#> 197-12 MCM 23319849 1.0115732 P1 PBMCB MCM
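The read-count filter used above is ordinary logical row indexing on the per-gene totals. A small sketch on a plain toy matrix (not the melanoma data) illustrates the same steps; the gene and sample names are made up for the example.

```r
# Toy count matrix: 4 genes (rows) x 3 samples (columns)
counts <- matrix(
    c(50, 30, 40,    # gene_a: 120 reads in total -> kept
      10,  5, 20,    # gene_b:  35 reads in total -> dropped
       0,  0,  0,    # gene_c:   0 reads in total -> dropped
      60, 20, 20),   # gene_d: 100 reads in total -> kept
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("gene_", letters[1:4]), paste0("sample_", 1:3))
)

# total number of reads per gene across all samples
total_reads <- rowSums(counts)

# keep genes with at least 100 reads; drop = FALSE preserves the matrix shape
filtered <- counts[total_reads >= 100, , drop = FALSE]
rownames(filtered)
```

The threshold of 100 reads mirrors the value used above; a suitable cut-off depends on sequencing depth and the number of samples.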
Again, the dataset can simply be added using add_dataset. Here, an additional parameter is passed to the add_dataset call. Such additional parameters are treated as dataset-level parameters.
# add the dataset
my_request <- add_dataset(request = my_request,
                          expression_values = griss_melanoma_rnaseq,
                          name = "RNA-seq",
                          type = "rnaseq_counts",
                          comparison_factor = "treatment",
                          comparison_group_1 = "MOCK",
                          comparison_group_2 = "MCM",
                          additional_factors = c("cell_type", "patient"),
                          # This adds the dataset-level parameter 'discrete_norm_function' to the request
                          discrete_norm_function = "TMM")
#> Converting expression data to string... (This may take a moment)
#> Conversion complete
my_request
#> ReactomeAnalysisRequestObject
#> Method = Camera
#> Parameters:
#> - max_missing_values: 0.5
#> Datasets:
#> - Proteomics (proteomics_int)
#> No parameters set.
#> - RNA-seq (rnaseq_counts)
#> discrete_norm_function: TMM
#> ReactomeAnalysisRequest
Datasets can be passed as limma EList, edgeR DGEList, any implementation of the Bioconductor ExpressionSet, or simply a data.frame.
For the first three, sample annotations are simply read from the respective slot. When supplying the expression values as a data.frame, the sample_data parameter has to be set using a data.frame where each row represents one sample and each column one property. If the sample_data option is set while providing the expression data as an EList, DGEList, or ExpressionSet, the data in sample_data will be used instead of the sample annotations in the expression data object.
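To make the data.frame case concrete, the sketch below builds toy expression values together with a matching sample_data table. All values and sample names are invented, and the example assumes that the row names of sample_data are expected to correspond to the column names of the expression values. The add_dataset call is shown only as a comment, since no request is created here.

```r
# Toy expression values: one row per gene/protein, one column per sample
expression_values <- data.frame(
    sample_1 = c(10.2, 8.1, 5.5),
    sample_2 = c(10.0, 8.4, 5.1),
    sample_3 = c(11.3, 7.9, 6.2),
    sample_4 = c(11.1, 7.5, 6.4),
    row.names = c("P07093", "P10124", "Q14526")
)

# Sample annotations: one row per sample, one column per property
sample_data <- data.frame(
    condition = c("MOCK", "MOCK", "MCM", "MCM"),
    patient.id = c("P1", "P2", "P1", "P2"),
    row.names = colnames(expression_values)
)

# Assumption: every sample column needs a matching annotation row
stopifnot(identical(rownames(sample_data), colnames(expression_values)))

# With an existing request, the call would look roughly like this:
# my_request <- add_dataset(request = my_request,
#                           expression_values = expression_values,
#                           sample_data = sample_data,
#                           name = "Toy data",
#                           type = "proteomics_int",
#                           comparison_factor = "condition",
#                           comparison_group_1 = "MOCK",
#                           comparison_group_2 = "MCM")
```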
Each dataset has to have a name. This can be anything but has to be unique within one analysis request.
The ReactomeAnalysisSystem supports different types of ’omics data.
To get a list of supported types, use the get_reactome_data_types function:
get_reactome_data_types()
#> rnaseq_counts:
#> RNA-seq (raw counts)
#> Raw RNA-seq based read counts per gene (recommended).
#> rnaseq_norm:
#> RNA-seq (normalized)
#> log2 transformed, normalized RNA-seq based read counts per gene (f.e. RPKM, TPM)
#> proteomics_int:
#> Proteomics (intensity)
#> Intensity-based quantitative proteomics data (for example, iTRAQ/TMT or intensity-based label-free quantitation). Values must be log2 transformed.
#> proteomics_sc:
#> Proteomics (spectral counts)
#> Raw spectral-counts of label-free proteomics experiments
#> microarray_norm:
#> Microarray (normalized)
#> Normalized and log2 transformed microarray-based gene expression values.
Defining the experimental design for a ReactomeAnalysisRequest is very simple. Basically, it only takes three parameters:
comparison_factor: Name of the property within the sample data to use
comparison_group_1: The first group to compare
comparison_group_2: The second group to compare
The value set in comparison_factor must match a column name in the sample data (either the slot in an EList, DGEList, or ExpressionSet object or in the sample_data parameter).
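Before submitting a request, it can help to confirm locally that the three design parameters actually refer to something in the sample data. The following is a minimal sketch of such a check on a toy sample table; check_design is a hypothetical helper and not part of ReactomeGSA.

```r
# Hypothetical helper: verify that the design refers to existing sample data
check_design <- function(sample_data, comparison_factor,
                         comparison_group_1, comparison_group_2) {
    if (!comparison_factor %in% colnames(sample_data)) {
        stop("'", comparison_factor, "' is not a column of the sample data")
    }
    groups <- unique(as.character(sample_data[[comparison_factor]]))
    missing_groups <- setdiff(c(comparison_group_1, comparison_group_2), groups)
    if (length(missing_groups) > 0) {
        stop("Group(s) not found in '", comparison_factor, "': ",
             paste(missing_groups, collapse = ", "))
    }
    # number of samples in each of the two compared groups
    table(sample_data[[comparison_factor]])[c(comparison_group_1, comparison_group_2)]
}

# Toy sample data with the same kind of columns as used above
samples <- data.frame(
    condition = c("MOCK", "MCM", "MOCK", "MCM"),
    cell.type = c("PBMCB", "PBMCB", "PBMCB", "PBMCB"),
    row.names = c("s1", "s2", "s3", "s4")
)

check_design(samples, "condition", "MOCK", "MCM")
```

The returned counts also make it easy to spot a group with too few samples for a meaningful comparison.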
Additionally, it is possible to define blocking factors. These are supported by all methods that rely on linear models in the backend. Some methods, though, might simply ignore this parameter. For more information on whether a method supports blocking factors, please use get_reactome_methods.
Blocking factors can be set by passing a vector of names to additional_factors. These should again reference properties (or columns) in the sample data.
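As an illustration of what a blocking factor means in a linear model, the sketch below builds a design matrix in base R for a paired toy experiment. How the Reactome Analysis System constructs its models internally is not described here, so this only shows the general idea: the blocking factor contributes extra columns that absorb patient-to-patient differences.

```r
# Toy sample data: two conditions measured in each of three patients
samples <- data.frame(
    condition = factor(rep(c("MOCK", "MCM"), times = 3), levels = c("MOCK", "MCM")),
    patient.id = factor(rep(c("P1", "P2", "P3"), each = 2))
)

# Design matrix with 'condition' as the comparison factor and
# 'patient.id' as a blocking factor
design <- model.matrix(~ 0 + condition + patient.id, data = samples)
colnames(design)
```

The first two columns represent the compared groups; the remaining columns encode the blocking factor relative to the first patient.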
Once the ReactomeAnalysisRequest is created, the complete analysis can be run using perform_reactome_analysis:
result <- perform_reactome_analysis(request = my_request, compress = FALSE)
#> Submitting request to Reactome API...
#> Reactome Analysis submitted succesfully
#> Converting dataset Proteomics...
#> Converting dataset RNA-seq...
#> Mapping identifiers...
#> Performing gene set analysis using Camera
#> Analysing dataset 'Proteomics' using Camera
#> Analysing dataset 'RNA-seq' using Camera
#> Creating REACTOME visualization
#> Retrieving result...
The result object is a ReactomeAnalysisResult S4 class with several helper functions to access the data.
To retrieve the names of all available results (generally one per dataset), use the names function:
names(result)
#> [1] "Proteomics" "RNA-seq"
For every dataset, different result types may be available. These can be shown using the result_types function:
result_types(result)
#> [1] "pathways" "fold_changes"
The Camera analysis method returns two types of results, pathway-level data and gene- / protein-level fold changes.
A specific result can be retrieved using the get_result method:
# retrieve the fold-change data for the proteomics dataset
proteomics_fc <- get_result(result, type = "fold_changes", name = "Proteomics")
head(proteomics_fc)
#> Identifier logFC AveExpr t P.Value adj.P.Val B
#> 1 Q14526 0.4937650 -3.346909 14.517108 1.518770e-10 8.335009e-07 13.984626
#> 2 Q6VY07 0.2981411 -3.330347 13.555864 4.149887e-10 1.138729e-06 13.124401
#> 3 P07093 1.7950301 -3.648968 12.284758 1.727993e-09 3.161076e-06 11.870572
#> 4 P10124 1.0758634 -3.436961 10.323661 2.015520e-08 2.765294e-05 9.633468
#> 5 P55210 0.5018522 -3.347932 9.510808 6.206174e-08 6.811897e-05 8.581793
#> 6 O43683 -0.4754083 -3.345551 -9.362315 7.679356e-08 7.024051e-05 8.380963
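Since the fold-change result is a regular table, it can be filtered with standard subsetting. The sketch below uses a stand-in data.frame holding four of the rows printed above (only three of the columns), and the thresholds are illustrative choices rather than recommendations of the package.

```r
# Stand-in for part of the fold-change table shown above
proteomics_fc <- data.frame(
    Identifier = c("Q14526", "P07093", "P10124", "O43683"),
    logFC = c(0.4937650, 1.7950301, 1.0758634, -0.4754083),
    adj.P.Val = c(8.335009e-07, 3.161076e-06, 2.765294e-05, 7.024051e-05),
    stringsAsFactors = FALSE
)

# Illustrative thresholds: adjusted p-value below 0.05 and an
# absolute log fold change of at least 1
is_hit <- proteomics_fc$adj.P.Val < 0.05 & abs(proteomics_fc$logFC) >= 1
hits <- proteomics_fc[is_hit, ]

# strongest absolute change first
hits <- hits[order(-abs(hits$logFC)), ]
hits$Identifier
```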
Additionally, it is possible to directly merge the pathway level data for all result sets using the pathways function:
combined_pathways <- pathways(result)

head(combined_pathways)
#> Name
#> R-HSA-163200 Respiratory electron transport, ATP synthesis by chemiosmotic coupling, and heat production by uncoupling proteins.
#> R-HSA-1428517 The citric acid (TCA) cycle and respiratory electron transport
#> R-HSA-611105 Respiratory electron transport
#> R-HSA-6799198 Complex I biogenesis
#> R-HSA-72649 Translation initiation complex formation
#> R-HSA-72662 Activation of the mRNA upon binding of the cap-binding complex and eIFs, and subsequent binding to 43S
#> Direction.Proteomics FDR.Proteomics PValue.Proteomics
#> R-HSA-163200 Up 1.845820e-14 8.589206e-18
#> R-HSA-1428517 Up 2.270776e-14 2.113332e-17
#> R-HSA-611105 Up 5.265429e-14 7.350529e-17
#> R-HSA-6799198 Up 3.653543e-12 6.800452e-15
#> R-HSA-72649 Down 3.217567e-08 7.486197e-11
#> R-HSA-72662 Down 6.434150e-08 1.796412e-10
#> NGenes.Proteomics av_foldchange.Proteomics sig.Proteomics
#> R-HSA-163200 107 0.13980458 TRUE
#> R-HSA-1428517 147 0.12760898 TRUE
#> R-HSA-611105 93 0.14055433 TRUE
#> R-HSA-6799198 55 0.15261118 TRUE
#> R-HSA-72649 57 -0.09502561 TRUE
#> R-HSA-72662 58 -0.09089517 TRUE
#> Direction.RNA-seq FDR.RNA-seq PValue.RNA-seq NGenes.RNA-seq
#> R-HSA-163200 Down 1.429284e-05 4.442449e-07 122
#> R-HSA-1428517 Down 2.223055e-05 7.370260e-07 167
#> R-HSA-611105 Down 2.426042e-04 1.257371e-05 102
#> R-HSA-6799198 Down 2.980324e-03 2.457872e-04 57
#> R-HSA-72649 Down 1.327528e-01 2.976348e-02 58
#> R-HSA-72662 Down 1.729122e-01 4.199194e-02 59
#> av_foldchange.RNA-seq sig.RNA-seq
#> R-HSA-163200 -0.19680674 TRUE
#> R-HSA-1428517 -0.17754501 TRUE
#> R-HSA-611105 -0.18081596 TRUE
#> R-HSA-6799198 -0.17521923 TRUE
#> R-HSA-72649 -0.10393578 FALSE
#> R-HSA-72662 -0.07794782 FALSE
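The merged table makes it straightforward to look for pathways that behave differently between datasets. The sketch below uses a stand-in data.frame with three of the rows printed above and only the direction and significance columns. It assumes that the direction columns are character and the sig columns logical, as the printed output suggests.

```r
# Stand-in for three rows of the merged table; check.names = FALSE keeps
# column names such as "Direction.RNA-seq" unchanged
combined_pathways <- data.frame(
    Name = c("Respiratory electron transport", "Complex I biogenesis",
             "Translation initiation complex formation"),
    Direction.Proteomics = c("Up", "Up", "Down"),
    sig.Proteomics = c(TRUE, TRUE, TRUE),
    `Direction.RNA-seq` = c("Down", "Down", "Down"),
    `sig.RNA-seq` = c(TRUE, TRUE, FALSE),
    row.names = c("R-HSA-611105", "R-HSA-6799198", "R-HSA-72649"),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# pathways significant in both datasets
significant_in_both <- combined_pathways$sig.Proteomics &
    combined_pathways[["sig.RNA-seq"]]

# pathways regulated in opposite directions
opposite_direction <- combined_pathways$Direction.Proteomics !=
    combined_pathways[["Direction.RNA-seq"]]

discordant <- combined_pathways[significant_in_both & opposite_direction, ]
rownames(discordant)
```

Column names that contain a hyphen, such as those of the RNA-seq dataset, need [[ ]] or backticks when accessed.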
The ReactomeGSA package includes several basic plotting functions to visualise the pathway results. For comparative gene set analysis like the one presented here, two functions are available: plot_correlations and plot_volcano.
plot_correlations can be used to quickly assess how similar two datasets are on the pathway level:
plot_correlations(result)
#> Comparing 1 vs 2
#> [[1]]
#> Warning: Removed 270 rows containing missing values (geom_point).
Individual datasets can further be visualised using volcano plots of the pathway data:
plot_volcano(result, 2)
Finally, it is possible to view the result as a heatmap:
plot_heatmap(result) +
    # reduce text size to create a better HTML rendering
    ggplot2::theme(text = ggplot2::element_text(size = 6))
By default, only 30 pathways are shown in the heatmap. Pathways of interest can also be selected manually:
# get the data ready to plot with ggplot2
plot_data <- plot_heatmap(result, return_data = TRUE)

# select the pathways of interest - here all pathways
# with "Interleukin" in their name
interleukin_pathways <- grepl("Interleukin", plot_data$Name)
interesting_data <- plot_data[interleukin_pathways, ]

# create the heatmap
ggplot2::ggplot(interesting_data, ggplot2::aes(x = dataset, y = Name, fill = direction)) +
    ggplot2::geom_tile() +
    ggplot2::scale_fill_brewer(palette = "RdYlBu") +
    ggplot2::labs(x = "Dataset", fill = "Direction") +
    ggplot2::theme(text = ggplot2::element_text(size = 6))
Additionally, it is possible to open the analysis in Reactome’s web interface using the open_reactome command:
# Note: This command is not executed in the vignette, since it tries
# to open the default web browser
# open_reactome(result)
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.0
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ReactomeGSA.data_1.11.0 sp_1.5-0 SeuratObject_4.1.0
#> [4] Seurat_4.1.1 edgeR_3.40.0 limma_3.54.0
#> [7] ReactomeGSA_1.12.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rtsne_0.16 colorspace_2.0-3 deldir_1.0-6
#> [4] ellipsis_0.3.2 ggridges_0.5.3 rgdal_1.5-32
#> [7] spatstat.data_2.2-0 farver_2.1.1 leiden_0.4.2
#> [10] listenv_0.8.0 ggrepel_0.9.1 fansi_1.0.3
#> [13] codetools_0.2-18 splines_4.2.1 knitr_1.39
#> [16] polyclip_1.10-0 jsonlite_1.8.0 ica_1.0-3
#> [19] cluster_2.1.3 png_0.1-7 rgeos_0.5-9
#> [22] uwot_0.1.11 shiny_1.7.1 sctransform_0.3.3
#> [25] spatstat.sparse_2.1-1 BiocManager_1.30.18 compiler_4.2.1
#> [28] httr_1.4.3 assertthat_0.2.1 Matrix_1.4-1
#> [31] fastmap_1.1.0 lazyeval_0.2.2 cli_3.3.0
#> [34] later_1.3.0 prettyunits_1.1.1 htmltools_0.5.2
#> [37] tools_4.2.1 igraph_1.3.5 gtable_0.3.0
#> [40] glue_1.6.2 RANN_2.6.1 reshape2_1.4.4
#> [43] dplyr_1.0.9 Rcpp_1.0.9 scattermore_0.8
#> [46] jquerylib_0.1.4 vctrs_0.4.1 nlme_3.1-158
#> [49] progressr_0.10.1 lmtest_0.9-40 spatstat.random_2.2-0
#> [52] xfun_0.31 stringr_1.4.0 globals_0.15.1
#> [55] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1
#> [58] irlba_2.3.5 gtools_3.9.3 goftest_1.2-3
#> [61] future_1.26.1 MASS_7.3-58 zoo_1.8-10
#> [64] scales_1.2.0 spatstat.core_2.4-4 hms_1.1.1
#> [67] promises_1.2.0.1 spatstat.utils_2.3-1 parallel_4.2.1
#> [70] RColorBrewer_1.1-3 curl_4.3.2 yaml_2.3.5
#> [73] reticulate_1.25 pbapply_1.5-0 gridExtra_2.3
#> [76] ggplot2_3.3.6 sass_0.4.1 rpart_4.1.16
#> [79] stringi_1.7.8 highr_0.9 caTools_1.18.2
#> [82] rlang_1.0.4 pkgconfig_2.0.3 matrixStats_0.62.0
#> [85] bitops_1.0-7 evaluate_0.15 lattice_0.20-45
#> [88] tensor_1.5 ROCR_1.0-11 purrr_0.3.4
#> [91] labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.4
#> [94] cowplot_1.1.1 tidyselect_1.1.2 parallelly_1.32.0
#> [97] RcppAnnoy_0.0.19 plyr_1.8.7 magrittr_2.0.3
#> [100] R6_2.5.1 gplots_3.1.3 generics_0.1.3
#> [103] DBI_1.1.3 mgcv_1.8-40 pillar_1.7.0
#> [106] fitdistrplus_1.1-8 abind_1.4-5 survival_3.3-1
#> [109] tibble_3.1.7 future.apply_1.9.0 crayon_1.5.1
#> [112] KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_2.4-0
#> [115] plotly_4.10.0 rmarkdown_2.14 progress_1.2.2
#> [118] locfit_1.5-9.6 grid_4.2.1 data.table_1.14.2
#> [121] digest_0.6.29 xtable_1.8-4 tidyr_1.2.0
#> [124] httpuv_1.6.5 munsell_0.5.0 viridisLite_0.4.0
#> [127] bslib_0.3.1