
library(TDbasedUFE)
library(TDbasedUFEadv)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'AnnotationDbi'
#> 
library(DOSE)
#> DOSE v3.26.0  For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
#> 
#> If you use DOSE in published research, please cite:
#> Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609
library(enrichplot)
library(RTCGA.rnaseq)
#> Loading required package: RTCGA
#> Welcome to the RTCGA (version: 1.30.0). Read more about the project under https://rtcga.github.io/RTCGA/
library(RTCGA.clinical)
library(enrichR)
#> Welcome to enrichR
#> Checking connection ...
#> Enrichr ... Connection is Live!
#> FlyEnrichr ... Connection is Live!
#> WormEnrichr ... Connection is Live!
#> YeastEnrichr ... Connection is Live!
#> FishEnrichr ... Connection is Live!
#> OxEnrichr ... Connection is Live!
library(STRINGdb)

1 Introduction

It can be helpful to evaluate the selected genes by enrichment analysis. Here we apply several useful tools to the output of TDbasedUFEadv. To do so, we reproduce one example from “How to use TDbasedUFEadv” as follows.

Multi <- list(
  BLCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
  BRCA.rnaseq[seq_len(100), 1 + seq_len(1000)],
  CESC.rnaseq[seq_len(100), 1 + seq_len(1000)],
  COAD.rnaseq[seq_len(100), 1 + seq_len(1000)]
)
Z <- prepareTensorfromList(Multi, 10L)
Z <- aperm(Z, c(2, 1, 3))
Clinical <- list(BLCA.clinical, BRCA.clinical, CESC.clinical, COAD.clinical)
Multi_sample <- list(
  BLCA.rnaseq[seq_len(100), 1, drop = FALSE],
  BRCA.rnaseq[seq_len(100), 1, drop = FALSE],
  CESC.rnaseq[seq_len(100), 1, drop = FALSE],
  COAD.rnaseq[seq_len(100), 1, drop = FALSE]
)
# patient.stage_event.tnm_categories.pathologic_categories.pathologic_m
ID_column_of_Multi_sample <- c(770, 1482, 773, 791)
# patient.bcr_patient_barcode
ID_column_of_Clinical <- c(20, 20, 12, 14)
Z <- PrepareSummarizedExperimentTensor(
  feature = colnames(ACC.rnaseq)[1 + seq_len(1000)],
  sample = array("", 1), value = Z,
  sampleData = prepareCondTCGA(
    Multi_sample, Clinical,
    ID_column_of_Multi_sample, ID_column_of_Clinical
  )
)
HOSVD <- computeHosvd(Z)
cond <- attr(Z, "sampleData")
index <- selectFeatureProj(HOSVD, Multi, cond, de = 1e-3, input_all = 3) # Batch mode

head(tableFeatures(Z, index))
#>       Feature       p value adjusted p value
#> 10    ACTB|60  0.000000e+00     0.000000e+00
#> 11   ACTG1|71  0.000000e+00     0.000000e+00
#> 37  ALDOA|226  0.000000e+00     0.000000e+00
#> 19 ADAM6|8755 5.698305e-299    1.424576e-296
#> 22  AEBP1|165 1.057392e-218    2.114785e-216
#> 9    ACTA2|59 7.862975e-198    1.310496e-195
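
The relation between the two p value columns above can be checked by hand. The sketch below assumes that the adjusted p values come from a Benjamini-Hochberg correction over the 1000 features in each data set; this is inferred from the printed numbers, not confirmed from the package source. The variable names are illustrative only.

```r
# Rows 4-6 of the table printed above (values copied from the output).
p_value  <- c(5.698305e-299, 1.057392e-218, 7.862975e-198)
adjusted <- c(1.424576e-296, 2.114785e-216, 1.310496e-195)
rank_in_table <- 4:6   # position after sorting by p value
n_features <- 1000     # assumption: number of genes tested

# Benjamini-Hochberg scales the k-th smallest p value by n / k
# (the additional monotonicity step is ignored in this sketch).
bh_by_hand <- p_value * n_features / rank_in_table

# Largest relative deviation from the printed adjusted p values.
max_rel_dev <- max(abs(bh_by_hand / adjusted - 1))
print(signif(bh_by_hand, 7))
print(max_rel_dev < 1e-5)
```

If the check prints TRUE, the printed adjusted p values agree with the rank-based formula up to rounding.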
genes <- unlist(lapply(strsplit(tableFeatures(Z, index)[, 1], "|",
  fixed = TRUE
), "[", 1))
entrez <- unlist(lapply(strsplit(tableFeatures(Z, index)[, 1], "|",
  fixed = TRUE
), "[", 2))
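
As a small illustration of the splitting step, the sketch below applies the same `strsplit()` call to three feature labels copied from the table above. It uses `vapply()` instead of `unlist(lapply(...))` only as a type-checked alternative; the variable names `features`, `symbols` and `ids` are illustrative and not part of the package.

```r
# Feature labels copied from the table above: "<gene symbol>|<Entrez ID>".
features <- c("ACTB|60", "ACTG1|71", "ALDOA|226")

# fixed = TRUE treats "|" as a literal character rather than
# as the regular-expression alternation operator.
parts <- strsplit(features, "|", fixed = TRUE)

# Take the first and second piece of every label.
symbols <- vapply(parts, `[`, character(1), 1)
ids     <- vapply(parts, `[`, character(1), 2)

print(symbols)  # "ACTB"  "ACTG1" "ALDOA"
print(ids)      # "60"    "71"    "226"
```

Note that the Entrez IDs remain character strings, which is the form passed on to the enrichment functions later in this document.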

2 Enrichr

Enrichr (Kuleshov et al. 2016) is one of the tools that often gives significant results for genes selected by TDbasedUFE and TDbasedUFEadv.

setEnrichrSite("Enrichr")
#> Connection changed to https://maayanlab.cloud/Enrichr/
#> Connection is Live!
websiteLive <- TRUE
dbs <- c(
  "GO_Molecular_Function_2015", "GO_Cellular_Component_2015",
  "GO_Biological_Process_2015"
)
enriched <- enrichr(genes, dbs)
#> Uploading data to Enrichr... Done.
#>   Querying GO_Molecular_Function_2015... Done.
#>   Querying GO_Cellular_Component_2015... Done.
#>   Querying GO_Biological_Process_2015... Done.
#> Parsing results... Done.
if (websiteLive) {
  plotEnrich(enriched$GO_Biological_Process_2015,
    showTerms = 20, numChar = 40, y = "Count",
    orderBy = "P.value"
  )
}
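
Besides plotting, the result of `enrichr()` can be inspected as a table: it is a list with one data frame per queried database. The sketch below shows one way to filter and sort such a data frame. To stay runnable without a network connection it uses a small stand-in data frame, `toy`, whose terms and numbers are invented for illustration; only the column names follow the Enrichr output.

```r
# Hypothetical stand-in for enriched$GO_Biological_Process_2015;
# the terms and numbers are invented for illustration only.
toy <- data.frame(
  Term = c("term A", "term B", "term C", "term D"),
  Overlap = c("12/80", "3/150", "9/40", "2/300"),
  P.value = c(1e-8, 0.04, 2e-5, 0.3),
  Adjusted.P.value = c(4e-8, 0.053, 4e-5, 0.3),
  stringsAsFactors = FALSE
)

# Keep terms below a chosen threshold, most significant first.
keep <- toy[toy$Adjusted.P.value < 0.05, ]
keep <- keep[order(keep$Adjusted.P.value), ]

# "Overlap" has the form "hits/term size"; extract the number of
# input genes that overlap each term.
keep$Count <- as.integer(sub("/.*$", "", keep$Overlap))
print(keep[, c("Term", "Count", "Adjusted.P.value")])
```

The threshold of 0.05 is an arbitrary choice for this sketch; replace `toy` with an element of `enriched` to apply the same steps to real results.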

Enrichr offers a large number of enrichment analyses, many of which, in our experience, work well with the genes selected by TDbasedUFE and TDbasedUFEadv. Please check the Enrichr web site to see which kinds of enrichment analysis are available.

3 STRING

STRING (Szklarczyk et al. 2018) provides enrichment analysis based on protein-protein interactions, and it is known to often give significant results for genes selected by TDbasedUFE and TDbasedUFEadv.

options(timeout = 200)
string_db <- STRINGdb$new(
  version = "11.5",
  species = 9606, score_threshold = 200,
  network_type = "full", input_directory = ""
)
example1_mapped <- string_db$map(data.frame(genes = genes),
  "genes",
  removeUnmappedRows = TRUE
)
#> Warning:  we couldn't map to STRING 1% of your identifiers
hits <- example1_mapped$STRING_id
string_db$plot_network(hits)
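
To give a feeling for what `score_threshold = 200` does, the sketch below filters a small edge list by score. STRING reports a combined score for each interaction on a scale from 0 to 1000, and we assume here that interactions scoring below the threshold are dropped. The edge list, the protein names and the scores are all invented for illustration; this is not how STRINGdb stores its network internally.

```r
# Hypothetical edge list; protein names and scores are invented.
edges <- data.frame(
  from = c("P1", "P1", "P2", "P3", "P4"),
  to   = c("P2", "P3", "P3", "P4", "P5"),
  combined_score = c(950, 420, 180, 640, 150),
  stringsAsFactors = FALSE
)

# Assumption: interactions below the threshold are discarded.
score_threshold <- 200
kept <- edges[edges$combined_score >= score_threshold, ]

# Number of remaining interactions per protein (node degree).
degree <- sort(table(c(kept$from, kept$to)), decreasing = TRUE)
print(kept)
print(degree)
```

A lower threshold keeps more, but less reliable, interactions, so the plotted network becomes denser.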

4 enrichplot

Although the tools above provide enough information to evaluate the genes selected by TDbasedUFE and TDbasedUFEadv, one might want an all-in-one package when one does not know how to decide which category should be evaluated in enrichment analysis.

In this case, we would recommend Metascape (Zhou et al. 2019), which unfortunately cannot be accessed from R. Thus, we recommend RITAN as an alternative. It can list significant terms across multiple categories. The code below uses enrichDGN from DOSE together with dotplot from enrichplot.

edo <- enrichDGN(entrez)
# ggplot2 is not attached in this session, so ggtitle() is called with its namespace
dotplot(edo, showCategory = 30) + ggplot2::ggtitle("dotplot for ORA")
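
ORA in the plot title stands for over-representation analysis. As a rough sketch of the idea, assuming a hypergeometric model, the base R code below computes the probability of seeing at least the observed number of annotated genes among the selected ones. All counts are hypothetical and chosen only for illustration; they are not taken from the analysis above.

```r
# All counts below are hypothetical and chosen only for illustration.
N <- 1000  # background genes
M <- 50    # background genes annotated to the category
n <- 100   # selected genes
k <- 15    # selected genes annotated to the category

# P(X >= k) under the hypergeometric model: upper tail, hence k - 1.
p_ora <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same tail probability from a one-sided Fisher's exact test
# on the corresponding 2 x 2 table.
tab <- matrix(c(k, n - k, M - k, N - M - (n - k)), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
print(c(p_ora, p_fisher))
```

With these counts, 5 annotated genes would be expected by chance among the 100 selected ones, so observing 15 gives a small p value. In practice such p values are computed for many categories and then corrected for multiple testing.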

sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] STRINGdb_2.12.0              enrichR_3.2                 
#>  [3] RTCGA.clinical_20151101.29.0 RTCGA.rnaseq_20151101.29.0  
#>  [5] RTCGA_1.30.0                 enrichplot_1.20.0           
#>  [7] DOSE_3.26.0                  TDbasedUFEadv_1.0.0         
#>  [9] TDbasedUFE_1.0.0             BiocStyle_2.28.0            
#> 
#> loaded via a namespace (and not attached):
#>   [1] rTensor_1.4.8           splines_4.3.0           later_1.3.0            
#>   [4] bitops_1.0-7            ggplotify_0.1.0         tibble_3.2.1           
#>   [7] polyclip_1.10-4         XML_3.99-0.14           lifecycle_1.0.3        
#>  [10] rstatix_0.7.2           lattice_0.21-8          MASS_7.3-59            
#>  [13] backports_1.4.1         magrittr_2.0.3          sass_0.4.5             
#>  [16] rmarkdown_2.21          jquerylib_0.1.4         yaml_2.3.7             
#>  [19] plotrix_3.8-2           httpuv_1.6.9            cowplot_1.1.1          
#>  [22] DBI_1.1.3               RColorBrewer_1.1-3      abind_1.4-5            
#>  [25] MOFAdata_1.15.0         zlibbioc_1.46.0         rvest_1.0.3            
#>  [28] GenomicRanges_1.52.0    purrr_1.0.1             ggraph_2.1.0           
#>  [31] BiocGenerics_0.46.0     RCurl_1.98-1.12         hash_2.2.6.2           
#>  [34] yulab.utils_0.0.6       WriteXLS_6.4.0          tweenr_2.0.2           
#>  [37] GenomeInfoDbData_1.2.10 IRanges_2.34.0          KMsurv_0.1-5           
#>  [40] S4Vectors_0.38.0        ggrepel_0.9.3           tidytree_0.4.2         
#>  [43] proto_1.0.0             codetools_0.2-19        xml2_1.3.3             
#>  [46] ggforce_0.4.1           tximportData_1.27.0     tidyselect_1.2.0       
#>  [49] aplot_0.1.10            farver_2.1.1            viridis_0.6.2          
#>  [52] stats4_4.3.0            jsonlite_1.8.4          ellipsis_0.3.2         
#>  [55] tidygraph_1.2.3         survival_3.5-5          tools_4.3.0            
#>  [58] chron_2.3-60            treeio_1.24.0           Rcpp_1.0.10            
#>  [61] glue_1.6.2              gridExtra_2.3           xfun_0.39              
#>  [64] qvalue_2.32.0           ggthemes_4.2.4          GenomeInfoDb_1.36.0    
#>  [67] dplyr_1.1.2             withr_2.5.0             BiocManager_1.30.20    
#>  [70] fastmap_1.1.1           fansi_1.0.4             caTools_1.18.2         
#>  [73] digest_0.6.31           R6_2.5.1                mime_0.12              
#>  [76] gridGraphics_0.5-1      colorspace_2.1-0        GO.db_3.17.0           
#>  [79] gtools_3.9.4            RSQLite_2.3.1           utf8_1.2.3             
#>  [82] tidyr_1.3.0             generics_0.1.3          data.table_1.14.8      
#>  [85] graphlayouts_0.8.4      httr_1.4.5              scatterpie_0.1.9       
#>  [88] sqldf_0.4-11            pkgconfig_2.0.3         gtable_0.3.3           
#>  [91] blob_1.2.4              XVector_0.40.0          survMisc_0.5.6         
#>  [94] shadowtext_0.1.2        htmltools_0.5.5         carData_3.0-5          
#>  [97] bookdown_0.33           fgsea_1.26.0            scales_1.2.1           
#> [100] Biobase_2.60.0          png_0.1-8               ggfun_0.0.9            
#> [103] knitr_1.42              km.ci_0.5-6             tzdb_0.3.0             
#> [106] reshape2_1.4.4          rjson_0.2.21            nlme_3.1-162           
#> [109] curl_5.0.0              cachem_1.0.7            zoo_1.8-12             
#> [112] stringr_1.5.0           KernSmooth_2.23-20      parallel_4.3.0         
#> [115] HDO.db_0.99.1           AnnotationDbi_1.62.0    pillar_1.9.0           
#> [118] grid_4.3.0              vctrs_0.6.2             gplots_3.1.3           
#> [121] promises_1.2.0.1        ggpubr_0.6.0            car_3.1-2              
#> [124] xtable_1.8-4            tximport_1.28.0         evaluate_0.20          
#> [127] readr_2.1.4             gsubfn_0.7              cli_3.6.1              
#> [130] compiler_4.3.0          rlang_1.1.0             crayon_1.5.2           
#> [133] ggsignif_0.6.4          labeling_0.4.2          survminer_0.4.9        
#> [136] plyr_1.8.8              stringi_1.7.12          viridisLite_0.4.1      
#> [139] BiocParallel_1.34.0     assertthat_0.2.1        munsell_0.5.0          
#> [142] Biostrings_2.68.0       lazyeval_0.2.2          GOSemSim_2.26.0        
#> [145] Matrix_1.5-4            hms_1.1.3               patchwork_1.1.2        
#> [148] bit64_4.0.5             ggplot2_3.4.2           KEGGREST_1.40.0        
#> [151] shiny_1.7.4             highr_0.10              igraph_1.4.2           
#> [154] broom_1.0.4             memoise_2.0.1           bslib_0.4.2            
#> [157] ggtree_3.8.0            fastmatch_1.1-3         bit_4.0.5              
#> [160] ape_5.7-1

Kuleshov, Maxim V., Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, et al. 2016. “Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.” Nucleic Acids Research 44 (W1): W90–W97. https://doi.org/10.1093/nar/gkw377.

Szklarczyk, Damian, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, et al. 2018. “STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.” Nucleic Acids Research 47 (D1): D607–D613. https://doi.org/10.1093/nar/gky1131.

Zhou, Yingyao, Bin Zhou, Lars Pache, Max Chang, Alireza Hadj Khodabakhshi, Olga Tanaseichuk, Christopher Benner, and Sumit K Chanda. 2019. “Metascape provides a biologist-oriented resource for the analysis of systems-level datasets.” Nature Communications 10 (1): 1523. https://doi.org/10.1038/s41467-019-09234-6.