library(TDbasedUFE)
library(TDbasedUFEadv)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'AnnotationDbi'
#>
library(DOSE)
#> DOSE v3.26.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
#>
#> If you use DOSE in published research, please cite:
#> Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609
library(enrichplot)
library(RTCGA.rnaseq)
#> Loading required package: RTCGA
#> Welcome to the RTCGA (version: 1.30.0). Read more about the project under https://rtcga.github.io/RTCGA/
library(RTCGA.clinical)
library(enrichR)
#> Welcome to enrichR
#> Checking connection ...
#> Enrichr ... Connection is Live!
#> FlyEnrichr ... Connection is Live!
#> WormEnrichr ... Connection is Live!
#> YeastEnrichr ... Connection is Live!
#> FishEnrichr ... Connection is Live!
#> OxEnrichr ... Connection is Live!
library(STRINGdb)
It can be helpful to demonstrate how to evaluate selected genes by enrichment analysis. Here, we show some useful tools applied to the output from TDbasedUFEadv. To do so, we reproduce one example from “How to use TDbasedUFEadv” as follows.
Multi <- list(
BLCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
BRCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
CESC.rnaseq[seq_len(100), 1 + seq_len(1000)],
COAD.rnaseq[seq_len(100), 1 + seq_len(1000)]
)
Z <- prepareTensorfromList(Multi, 10L)
Z <- aperm(Z, c(2, 1, 3))
Clinical <- list(BLCA.clinical, BRCA.clinical, CESC.clinical, COAD.clinical)
Multi_sample <- list(
BLCA.rnaseq[seq_len(100), 1, drop = FALSE],
BRCA.rnaseq[seq_len(100), 1, drop = FALSE],
CESC.rnaseq[seq_len(100), 1, drop = FALSE],
COAD.rnaseq[seq_len(100), 1, drop = FALSE]
)
# patient.stage_event.tnm_categories.pathologic_categories.pathologic_m
ID_column_of_Multi_sample <- c(770, 1482, 773, 791)
# patient.bcr_patient_barcode
ID_column_of_Clinical <- c(20, 20, 12, 14)
Z <- PrepareSummarizedExperimentTensor(
feature = colnames(ACC.rnaseq)[1 + seq_len(1000)],
sample = array("", 1), value = Z,
sampleData = prepareCondTCGA(
Multi_sample, Clinical,
ID_column_of_Multi_sample, ID_column_of_Clinical
)
)
HOSVD <- computeHosvd(Z)
cond <- attr(Z, "sampleData")
index <- selectFeatureProj(HOSVD, Multi, cond, de = 1e-3, input_all = 3) # Batch mode
head(tableFeatures(Z, index))
#> Feature p value adjusted p value
#> 10 ACTB|60 0.000000e+00 0.000000e+00
#> 11 ACTG1|71 0.000000e+00 0.000000e+00
#> 37 ALDOA|226 0.000000e+00 0.000000e+00
#> 19 ADAM6|8755 5.698305e-299 1.424576e-296
#> 22 AEBP1|165 1.057392e-218 2.114785e-216
#> 9 ACTA2|59 7.862975e-198 1.310496e-195
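The adjusted p-values in the table above are numerically consistent with a Benjamini-Hochberg correction over the 1000 features in the tensor. This is an inference from the printed numbers, not something the vignette states, so the sketch below is only a plausibility check. The names `p_top` and `adj_top` are illustrative.

```r
# First six raw p-values copied from the table above
p_top <- c(0, 0, 0, 5.698305e-299, 1.057392e-218, 7.862975e-198)

# Assumption: 1000 features were tested, so n = 1000 in the correction
adj_top <- p.adjust(p_top, method = "BH", n = 1000)

# Compare with the "adjusted p value" column of the table
signif(adj_top, 7)
```

If the assumption holds, each adjusted value equals the raw p-value multiplied by 1000 divided by its rank.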
genes <- unlist(lapply(strsplit(tableFeatures(Z, index)[, 1], "|",
fixed = TRUE
), "[", 1))
entrez <- unlist(lapply(strsplit(tableFeatures(Z, index)[, 1], "|",
fixed = TRUE
), "[", 2))
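The two `strsplit()` calls above rely on the feature labels having the form `SYMBOL|ENTREZID`. A minimal sketch on three labels copied from the table above shows the split in isolation. The names `labels_demo`, `parts`, `genes_demo` and `entrez_demo` are illustrative and not part of either package.

```r
# Feature labels as printed by tableFeatures(): "SYMBOL|ENTREZID"
labels_demo <- c("ACTB|60", "ACTG1|71", "ALDOA|226")

# fixed = TRUE treats "|" as a literal character, not as regex alternation
parts <- strsplit(labels_demo, "|", fixed = TRUE)

# vapply() returns a character vector directly and checks each piece
genes_demo <- vapply(parts, `[`, character(1), 1)
entrez_demo <- vapply(parts, `[`, character(1), 2)

genes_demo
entrez_demo
```

Using `vapply()` instead of `unlist(lapply(...))` is a design choice: it fails early if a label does not yield a single string.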
Enrichr (Kuleshov et al. 2016) is one of the tools that often gives significant results for genes selected by TDbasedUFE and TDbasedUFEadv.
setEnrichrSite("Enrichr")
#> Connection changed to https://maayanlab.cloud/Enrichr/
#> Connection is Live!
websiteLive <- TRUE
dbs <- c(
"GO_Molecular_Function_2015", "GO_Cellular_Component_2015",
"GO_Biological_Process_2015"
)
enriched <- enrichr(genes, dbs)
#> Uploading data to Enrichr... Done.
#> Querying GO_Molecular_Function_2015... Done.
#> Querying GO_Cellular_Component_2015... Done.
#> Querying GO_Biological_Process_2015... Done.
#> Parsing results... Done.
if (websiteLive) {
plotEnrich(enriched$GO_Biological_Process_2015,
showTerms = 20, numChar = 40, y = "Count",
orderBy = "P.value"
)
}
Enrichr offers a large number of enrichment analyses, and in our experience many of them work well with genes selected by TDbasedUFE and TDbasedUFEadv. Please check Enrichr’s website to see which kinds of enrichment analyses are available.
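Each element of the list returned by `enrichr()` is a data frame with one row per term. As a rough sketch of how one might keep only the significant terms, the example below filters on the `Adjusted.P.value` column. The data frame `toy_result` is made up for illustration (it is not real Enrichr output), and the 0.05 cutoff is an assumption.

```r
# Made-up stand-in for one element of the enrichr() result list
toy_result <- data.frame(
  Term = c("term A", "term B", "term C"),
  Overlap = c("5/40", "3/120", "2/300"),
  P.value = c(1e-6, 2e-3, 0.2),
  Adjusted.P.value = c(3e-6, 3e-3, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical significance cutoff
cutoff <- 0.05

# Keep significant terms and sort them from most to least significant
sig <- toy_result[toy_result$Adjusted.P.value < cutoff, ]
sig <- sig[order(sig$Adjusted.P.value), ]
sig$Term
```

With real results, the same filtering could be applied to, for example, `enriched$GO_Biological_Process_2015`.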
STRING (Szklarczyk et al. 2018) provides enrichment analysis based on protein-protein interactions, and it is known to often give significant results for genes selected by TDbasedUFE and TDbasedUFEadv.
options(timeout = 200)
string_db <- STRINGdb$new(
version = "11.5",
species = 9606, score_threshold = 200,
network_type = "full", input_directory = ""
)
example1_mapped <- string_db$map(data.frame(genes = genes),
"genes",
removeUnmappedRows = TRUE
)
#> Warning: we couldn't map to STRING 1% of your identifiers
hits <- example1_mapped$STRING_id
string_db$plot_network(hits)
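The `score_threshold` argument controls which interactions are kept. STRING combined scores run from 0 to 1000, so 200 is a permissive setting. The sketch below mimics this filtering on a made-up edge table. `toy_edges`, its values and the helper `keep_edges()` are hypothetical, and whether the real cutoff is inclusive is not checked here.

```r
# Made-up interaction table; only the column names follow STRINGdb's layout
toy_edges <- data.frame(
  from = c("P1", "P1", "P2", "P3"),
  to = c("P2", "P3", "P3", "P4"),
  combined_score = c(950, 410, 230, 150)
)

# Hypothetical helper: keep edges whose score reaches the threshold
keep_edges <- function(edges, score_threshold) {
  edges[edges$combined_score >= score_threshold, , drop = FALSE]
}

# Count surviving edges at three thresholds
thresholds <- c(200, 400, 700)
n_kept <- vapply(
  thresholds,
  function(th) nrow(keep_edges(toy_edges, th)),
  integer(1)
)
names(n_kept) <- thresholds
n_kept
```

A lower threshold yields a denser network, which is convenient for plotting but includes less reliable interactions.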
Although the tools above provide enough information to evaluate the genes selected by TDbasedUFE and TDbasedUFEadv, one might want a single all-in-one package for cases where it is unclear which categories should be evaluated in the enrichment analysis.
In this case, we would recommend Metascape (Zhou et al. 2019), which unfortunately cannot be accessed from R. Thus, we recommend RITAN as an alternative. It can list the significant terms among multiple categories.
edo <- enrichDGN(entrez)
dotplot(edo, showCategory = 30) + ggplot2::ggtitle("dotplot for ORA")
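`enrichDGN()` performs an over-representation analysis (ORA), which scores each category with a hypergeometric test. A minimal sketch of that test for a single category follows. All four counts are made up for illustration, and the variable names are not taken from DOSE.

```r
# Hypothetical counts for one category
universe_size <- 20000 # genes in the background
category_size <- 150   # background genes annotated to the category
selected_size <- 200   # genes selected by the analysis
overlap <- 8           # selected genes that fall in the category

# P(X >= overlap) under the hypergeometric null
p_hyper <- phyper(
  overlap - 1, category_size, universe_size - category_size,
  selected_size,
  lower.tail = FALSE
)

# The same test expressed as a one-sided Fisher exact test on a 2 x 2 table
tab <- matrix(
  c(
    overlap, selected_size - overlap,
    category_size - overlap,
    universe_size - category_size - selected_size + overlap
  ),
  nrow = 2
)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

c(hypergeometric = p_hyper, fisher = p_fisher)
```

The two p-values agree, which shows why ORA is often described interchangeably as a hypergeometric or a one-sided Fisher test.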
sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] STRINGdb_2.12.0 enrichR_3.2
#> [3] RTCGA.clinical_20151101.29.0 RTCGA.rnaseq_20151101.29.0
#> [5] RTCGA_1.30.0 enrichplot_1.20.0
#> [7] DOSE_3.26.0 TDbasedUFEadv_1.0.0
#> [9] TDbasedUFE_1.0.0 BiocStyle_2.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] rTensor_1.4.8 splines_4.3.0 later_1.3.0
#> [4] bitops_1.0-7 ggplotify_0.1.0 tibble_3.2.1
#> [7] polyclip_1.10-4 XML_3.99-0.14 lifecycle_1.0.3
#> [10] rstatix_0.7.2 lattice_0.21-8 MASS_7.3-59
#> [13] backports_1.4.1 magrittr_2.0.3 sass_0.4.5
#> [16] rmarkdown_2.21 jquerylib_0.1.4 yaml_2.3.7
#> [19] plotrix_3.8-2 httpuv_1.6.9 cowplot_1.1.1
#> [22] DBI_1.1.3 RColorBrewer_1.1-3 abind_1.4-5
#> [25] MOFAdata_1.15.0 zlibbioc_1.46.0 rvest_1.0.3
#> [28] GenomicRanges_1.52.0 purrr_1.0.1 ggraph_2.1.0
#> [31] BiocGenerics_0.46.0 RCurl_1.98-1.12 hash_2.2.6.2
#> [34] yulab.utils_0.0.6 WriteXLS_6.4.0 tweenr_2.0.2
#> [37] GenomeInfoDbData_1.2.10 IRanges_2.34.0 KMsurv_0.1-5
#> [40] S4Vectors_0.38.0 ggrepel_0.9.3 tidytree_0.4.2
#> [43] proto_1.0.0 codetools_0.2-19 xml2_1.3.3
#> [46] ggforce_0.4.1 tximportData_1.27.0 tidyselect_1.2.0
#> [49] aplot_0.1.10 farver_2.1.1 viridis_0.6.2
#> [52] stats4_4.3.0 jsonlite_1.8.4 ellipsis_0.3.2
#> [55] tidygraph_1.2.3 survival_3.5-5 tools_4.3.0
#> [58] chron_2.3-60 treeio_1.24.0 Rcpp_1.0.10
#> [61] glue_1.6.2 gridExtra_2.3 xfun_0.39
#> [64] qvalue_2.32.0 ggthemes_4.2.4 GenomeInfoDb_1.36.0
#> [67] dplyr_1.1.2 withr_2.5.0 BiocManager_1.30.20
#> [70] fastmap_1.1.1 fansi_1.0.4 caTools_1.18.2
#> [73] digest_0.6.31 R6_2.5.1 mime_0.12
#> [76] gridGraphics_0.5-1 colorspace_2.1-0 GO.db_3.17.0
#> [79] gtools_3.9.4 RSQLite_2.3.1 utf8_1.2.3
#> [82] tidyr_1.3.0 generics_0.1.3 data.table_1.14.8
#> [85] graphlayouts_0.8.4 httr_1.4.5 scatterpie_0.1.9
#> [88] sqldf_0.4-11 pkgconfig_2.0.3 gtable_0.3.3
#> [91] blob_1.2.4 XVector_0.40.0 survMisc_0.5.6
#> [94] shadowtext_0.1.2 htmltools_0.5.5 carData_3.0-5
#> [97] bookdown_0.33 fgsea_1.26.0 scales_1.2.1
#> [100] Biobase_2.60.0 png_0.1-8 ggfun_0.0.9
#> [103] knitr_1.42 km.ci_0.5-6 tzdb_0.3.0
#> [106] reshape2_1.4.4 rjson_0.2.21 nlme_3.1-162
#> [109] curl_5.0.0 cachem_1.0.7 zoo_1.8-12
#> [112] stringr_1.5.0 KernSmooth_2.23-20 parallel_4.3.0
#> [115] HDO.db_0.99.1 AnnotationDbi_1.62.0 pillar_1.9.0
#> [118] grid_4.3.0 vctrs_0.6.2 gplots_3.1.3
#> [121] promises_1.2.0.1 ggpubr_0.6.0 car_3.1-2
#> [124] xtable_1.8-4 tximport_1.28.0 evaluate_0.20
#> [127] readr_2.1.4 gsubfn_0.7 cli_3.6.1
#> [130] compiler_4.3.0 rlang_1.1.0 crayon_1.5.2
#> [133] ggsignif_0.6.4 labeling_0.4.2 survminer_0.4.9
#> [136] plyr_1.8.8 stringi_1.7.12 viridisLite_0.4.1
#> [139] BiocParallel_1.34.0 assertthat_0.2.1 munsell_0.5.0
#> [142] Biostrings_2.68.0 lazyeval_0.2.2 GOSemSim_2.26.0
#> [145] Matrix_1.5-4 hms_1.1.3 patchwork_1.1.2
#> [148] bit64_4.0.5 ggplot2_3.4.2 KEGGREST_1.40.0
#> [151] shiny_1.7.4 highr_0.10 igraph_1.4.2
#> [154] broom_1.0.4 memoise_2.0.1 bslib_0.4.2
#> [157] ggtree_3.8.0 fastmatch_1.1-3 bit_4.0.5
#> [160] ape_5.7-1
Kuleshov, Maxim V., Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, et al. 2016. “Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.” Nucleic Acids Research 44 (W1): W90–W97. https://doi.org/10.1093/nar/gkw377.
Szklarczyk, Damian, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, et al. 2018. “STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.” Nucleic Acids Research 47 (D1): D607–D613. https://doi.org/10.1093/nar/gky1131.
Zhou, Yingyao, Bin Zhou, Lars Pache, Max Chang, Alireza Hadj Khodabakhshi, Olga Tanaseichuk, Christopher Benner, and Sumit K Chanda. 2019. “Metascape provides a biologist-oriented resource for the analysis of systems-level datasets.” Nature Communications 10 (1): 1523. https://doi.org/10.1038/s41467-019-09234-6.