ADAM is a GSEA R package created to group a set of genes from comparative samples (control versus experiment) according to their respective functions (Gene Ontology and KEGG pathways as default) and show their significance by calculating p-values referring to gene diversity and activity ((???)). Each group of genes is called GFAG (Group of Functionally Associated Genes). The package has support for many species in regards to the genes and their respective functions.
In the package’s analysis, all genes present in the expression data are grouped by their respective functions according to the domains described by AnalysisDomain argument. The relationship between genes and functions are made based on the species annotation package. If there is no annotation package, a three column file (gene, function and function description) must be provided. For each GFAG, gene diversity and activity in each sample are calculated. As the package always compare two samples (control versus experiment), relative gene diversity and activity for each GFAG are calculated. Using bootstrap method, for each GFAG, according to relative gene diversity and activity, two p-values are calculated. The p-values are then corrected, by using the correction method defined by PCorrectionMethod argument, generating a q-value ((???)). The significative GFAGs will be those whose q-value stay under the cutoff set by PCorrection argument. Optionally, it’s possible to run Wilcoxon test and/or Fisher’s exact test ((???)). These tests also provide a corrected p-value, and siginificative groups can be seen through them.
GFAGAnalysis function provides a complete analysis, using all available arguments. As an example, lets consider a gene expression set of Aedes aegypti:
suppressMessages(library(ADAM))
data("ExpressionAedes")
head(ExpressionAedes)
## gene control1 experiment1 control2 experiment2
## 1 AAEL000372 0.9750520 11.744300 15.432300 0.00000
## 2 AAEL000433 1.5934700 11.876400 0.133743 1.23991
## 3 AAEL000596 0.5680230 3.365710 7.411010 13.06860
## 4 AAEL000599 0.0370953 0.361577 0.154117 0.34019
## 5 AAEL000656 0.0648374 0.278074 1.340890 1.15750
## 6 AAEL000759 2.3512100 48.762500 6.276890 25.02600
The first column refers to the gene names, while the others are the expression obtained by a specific experiment (in this case, RNA-seq). ADAM always need two samples (control versus experiment). This way, we must select two sample columns from the expression data:
Comparison <- c("control1,experiment1","control2,experiment2")
Each GFAG has a number of genes associated to it. This way, the analysis can consider all GFAGs or just those with a certain number of genes:
Minimum <- 3
Maximum <- 20
The p-values for each GFAG is calculated through the bootstrap method, which demands a seed for generating random numbers and a number of bootstraps steps (the number of bootstraps should be a value that ensures the p-value precision):
SeedBootstrap <- 1049
StepsBootstrap <- 1000
The p-values will be corrected by a specific method with a certain cutoff value:
CutoffValue <- 0.05
MethodCorrection <- "fdr"
In order to group the genes according to their functions, it’s necessary an annotation package or a file relating genes and functions. In this case, Aedes aegypti doesn’t have an annotation package. This way, we build our own file:
data("KeggPathwaysAedes")
head(KeggPathwaysAedes)
## gene pathwayID pathwayDescription
## 1 AAEL012172 aag00270 Cysteine and methionine metabolism
## 2 AAEL026440 aag00270 Cysteine and methionine metabolism
## 3 AAEL026231 aag00270 Cysteine and methionine metabolism
## 4 AAEL001176 aag00270 Cysteine and methionine metabolism
## 5 AAEL002399 aag00270 Cysteine and methionine metabolism
## 6 AAEL026357 aag00270 Cysteine and methionine metabolism
It’s necessary to inform which function domain and gene nomenclature are being used. As Aedes agypti doesn’t have an annotation package, the domain will be “own” and the nomenclature “gene”:
Domain <- "own"
Nomenclature <- "geneStableID"
Wilcoxon test and Fisher’s exact test will be run:
Wilcoxon <- TRUE
Fisher <- TRUE
As all arguments were defined, then we can run GFAGAnalysis function:
ResultAnalysis <- suppressMessages(GFAGAnalysis(ComparisonID = Comparison,
ExpressionData = ExpressionAedes,
MinGene = Minimum,
MaxGene = Maximum,
SeedNumber = SeedBootstrap,
BootstrapNumber = StepsBootstrap,
PCorrection = CutoffValue,
DBSpecies = KeggPathwaysAedes,
PCorrectionMethod = MethodCorrection,
WilcoxonTest = Wilcoxon,
FisherTest = Fisher,
AnalysisDomain = Domain,
GeneIdentifier = Nomenclature))
In the example above, we used the function supressMessages just to stop showing messages during the GFAGAnalysis function execution. The output of GFAGAnalysis will be a list with two elements. The first corresponds to a data frame showing genes and their respective functions:
head(ResultAnalysis[[1]])
## gene GroupID
## 1 AAEL000759 aag00270
## 2 AAEL001176 aag00270
## 3 AAEL001378 aag00270
## 4 AAEL002399 aag00270
## 5 AAEL004059 aag00270
## 6 AAEL004728 aag00270
The second element of the output list result corresponds to data frames according to the argument ComparisonID:
DT::datatable(as.data.frame(ResultAnalysis[[2]][1]), width = 800,
options = list(scrollX = TRUE))
DT::datatable(as.data.frame(ResultAnalysis[[2]][2]), width = 800,
options = list(scrollX = TRUE))
The data frames corresponding to the second element of the list have the following columns:
ADAnalysis function provides a partial analysis, where is calculated just gene diversity and activity of each GFAG with no signicance by bootstrap, Wilcoxon or Fisher. As an example, lets consider the same gene expression set of Aedes aegypti previously used in GFAGAnalysis funcion example:
suppressMessages(library(ADAM))
data("ExpressionAedes")
data("KeggPathwaysAedes")
As ADAM always need two samples (control versus experiment), let’s select two sample columns from the expression data and define minimum and maximum number of genes per GFAG:
Comparison <- c("control1,experiment1")
Minimum <- 3
Maximum <- 100
Aedes aegypti doesn’t have an annotation package. This way, we build our own file:
SpeciesID <- "KeggPathwaysAedes"
It’s necessary to inform which function domain and gene nomenclature are being used. Aedes agypti doesn’t have an annotation package. So the domain will be “own” and the nomenclature “geneStableID”:
Domain <- "own"
Nomenclature <- "geneStableID"
As all arguments were defined, then we can run ADAnalysis function:
ResultAnalysis <- suppressMessages(ADAnalysis(ComparisonID = Comparison,
ExpressionData = ExpressionAedes,
MinGene = Minimum,
MaxGene = Maximum,
DBSpecies = KeggPathwaysAedes,
AnalysisDomain = Domain,
GeneIdentifier = Nomenclature))
In the example above, we used the function supressMessages just to stop showing messages during the ADAnalysis function execution. The output of ADAnalysis will be a list with two elements. The first corresponds to a data frame showing genes and their respective functions:
head(ResultAnalysis[[1]])
## gene GroupID
## 1 AAEL000759 aag00270
## 2 AAEL001176 aag00270
## 3 AAEL001378 aag00270
## 4 AAEL002399 aag00270
## 5 AAEL004059 aag00270
## 6 AAEL004728 aag00270
The second element of the output list result corresponds to data frames according to the argument ComparisonID:
DT::datatable(as.data.frame(ResultAnalysis[[2]][1]), width = 800,
options = list(scrollX = TRUE))
The data frames corresponding to the second element of the list have the following columns:
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ADAM_1.8.0 BiocStyle_2.20.0
##
## loaded via a namespace (and not attached):
## [1] MatrixGenerics_1.4.0 Biobase_2.52.0
## [3] httr_1.4.2 sass_0.4.0
## [5] bit64_4.0.5 jsonlite_1.7.2
## [7] bslib_0.2.5.1 assertthat_0.2.1
## [9] BiocManager_1.30.15 stats4_4.1.0
## [11] blob_1.2.1 GenomeInfoDbData_1.2.6
## [13] yaml_2.2.1 pillar_1.6.1
## [15] RSQLite_2.2.7 lattice_0.20-44
## [17] glue_1.4.2 digest_0.6.27
## [19] GenomicRanges_1.44.0 XVector_0.32.0
## [21] htmltools_0.5.1.1 Matrix_1.3-3
## [23] pkgconfig_2.0.3 bookdown_0.22
## [25] zlibbioc_1.38.0 purrr_0.3.4
## [27] GO.db_3.13.0 tibble_3.1.2
## [29] KEGGREST_1.32.0 generics_0.1.0
## [31] IRanges_2.26.0 ellipsis_0.3.2
## [33] DT_0.18 cachem_1.0.5
## [35] SummarizedExperiment_1.22.0 pbapply_1.4-3
## [37] BiocGenerics_0.38.0 magrittr_2.0.1
## [39] crayon_1.4.1 memoise_2.0.0
## [41] evaluate_0.14 fansi_0.4.2
## [43] tools_4.1.0 lifecycle_1.0.0
## [45] matrixStats_0.58.0 stringr_1.4.0
## [47] S4Vectors_0.30.0 DelayedArray_0.18.0
## [49] AnnotationDbi_1.54.0 Biostrings_2.60.0
## [51] compiler_4.1.0 jquerylib_0.1.4
## [53] GenomeInfoDb_1.28.0 rlang_0.4.11
## [55] grid_4.1.0 RCurl_1.98-1.3
## [57] htmlwidgets_1.5.3 crosstalk_1.1.1
## [59] bitops_1.0-7 rmarkdown_2.8
## [61] DBI_1.1.1 R6_2.5.0
## [63] knitr_1.33 dplyr_1.0.6
## [65] fastmap_1.1.0 bit_4.0.4
## [67] utf8_1.2.1 stringi_1.6.2
## [69] parallel_4.1.0 Rcpp_1.0.6
## [71] vctrs_0.3.8 png_0.1-7
## [73] tidyselect_1.1.1 xfun_0.23