Contents

1 Introduction

MicroRNAs (miRNAs) play key roles in many biological processes including cancers [1-5]. Thus, uncovering miRNA functions and regulatory mechanisms is important for gene diagnosis and therapy.

Previous studies [6-9] have shown that a pool of coding and non-coding RNAs that shares common miRNA biding sites competes with each other, thus alter miRNA activity. The corresponding regulatory mechanism is named competing endogenous RNA (ceRNA) hypothesis [10]. These RNAs are called ceRNAs or miRNA sponges or miRNA decoys, and include long non-coding RNAs (lncRNAs), pseudogenes, circular RNAs (circRNAs) and messenger RNAs (mRNAs), etc. To study the module-level properties of miRNA sponges, it is necessary to identify miRNA sponge modules. The miRNA sponge modules will help to reveal the biological mechanism in cancer.

To speed up the research of miRNA sponge modules, we develop an R/Bioconductor package ‘miRSM’ to infer miRNA sponge modules. Unlike the existing R/Bioconductor packages (‘miRsponge’ and ‘SPONGE’), ‘miRSM’ focuses on identifying miRNA sponge modules by integrating expression data and miRNA-target binding information instead of miRNA sponge interaction networks.

2 Identification of gene modules

Given matched ceRNA and mRNA expression data, we infer gene modules by using several methods from 14 packages, including ‘WGCNA’, ‘GFA’, ‘igraph’, ‘ProNet’, ‘NMF’, ‘biclust’, ‘runibic’, ‘iBBiG’, ‘fabia’, ‘BicARE’, ‘isa2’, ‘s4vd’, ‘BiBitR’ and ‘rqubic’. We assemble these methods into 6 functions: module_WGCNA, module_GFA, module_igraph, module_ProNet, module_NMF, and module_biclust.

2.1 Load BRCA sample data

The BRCA sample data includes matched miRNA, lncRNA, mRNA expression data, and miRNA-target binding information.

data(BRCASampleData)

2.2 module_WGCNA

By using WGCNA method [11], we identify co-expressed gene modules from matched ceRNA and mRNA expression data.

modulegenes_WGCNA <- module_WGCNA(ceRExp[, seq_len(80)], 
                                  mRExp[, seq_len(80)])
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.0755  0.331       -0.10700  54.100  65.50000  83.70
## 2      2   0.0476 -0.260        0.09470  28.000  33.30000  56.60
## 3      3   0.1720 -0.315        0.40700  17.100  18.20000  41.80
## 4      4   0.2940 -0.400        0.26500  11.400  10.30000  32.80
## 5      5   0.5200 -0.518        0.41100   7.970   5.99000  26.50
## 6      6   0.7650 -0.599        0.72800   5.770   3.55000  21.90
## 7      7   0.8470 -0.644        0.87100   4.290   2.14000  18.30
## 8      8   0.7110 -0.740        0.70500   3.260   1.31000  15.40
## 9      9   0.1560 -1.720        0.07550   2.520   0.81700  13.10
## 10    10   0.1690 -1.850        0.08770   1.980   0.52000  11.30
## 11    12   0.8620 -0.954        0.89900   1.260   0.22000   8.43
## 12    14   0.8510 -1.030        0.85100   0.844   0.09870   6.44
## 13    16   0.1800 -1.910        0.00065   0.584   0.04630   4.99
## 14    18   0.1850 -1.870        0.01020   0.417   0.02160   3.91
## 15    20   0.9260 -1.030        0.96000   0.305   0.00998   3.11
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
modulegenes_WGCNA
## GeneSetCollection
##   names: Module 1 (1 total)
##   unique identifiers: A2M-AS1, EMX2OS, ..., GRASP (36 total)
##   types in collection:
##     geneIdType: NullIdentifier (1 total)
##     collectionType: NullCollection (1 total)

2.3 module_GFA

The gene modules are identified by using GFA method [12, 13] from matched ceRNA and mRNA expression data.

modulegenes_GFA <- module_GFA(ceRExp[seq_len(20), seq_len(15)],
                              mRExp[seq_len(20), seq_len(15)], 
                              iter.max = 2600)
modulegenes_GFA

2.4 module_igraph

By using ‘igraph’ package [14], we infer gene modules from matched ceRNA and mRNA expression data. In the ‘igraph’ package, we can select “betweenness”, “greedy”, “infomap”, “prop”, “eigen”, “louvain”, “walktrap” methods for gene module identification. The default method is “greedy”.

modulegenes_igraph <- module_igraph(ceRExp[, seq_len(10)],
                                    mRExp[, seq_len(10)])
modulegenes_igraph
## GeneSetCollection
##   names: Module 1, Module 2 (2 total)
##   unique identifiers: A2M-AS1, ABCA11P, ..., E2F8 (19 total)
##   types in collection:
##     geneIdType: NullIdentifier (1 total)
##     collectionType: NullCollection (1 total)

2.5 module_ProNet

In the ‘ProNet’ package, we can select FN [15], MCL [16], LINKCOMM [17] and MCODE [18] for gene module identification. The default method is MCL.

modulegenes_ProNet <- module_ProNet(ceRExp[, seq_len(10)],
                                    mRExp[, seq_len(10)])
modulegenes_ProNet
## GeneSetCollection
##   names: Module 1, Module 2 (2 total)
##   unique identifiers: A2M-AS1, ACVR2B-AS1, ..., E2F7 (12 total)
##   types in collection:
##     geneIdType: NullIdentifier (1 total)
##     collectionType: NullCollection (1 total)

2.6 module_NMF

By using ‘NMF’ package [20], we infer gene modules from matched ceRNA and mRNA expression data. In the ‘NMF’ package, we can select “brunet”, “Frobenius”, “KL”, “lee”, “nsNMF”, “offset”, “siNMF”, “snmf/l”, “snmf/r” methods for gene module identification. The default method is “brunet”.

# Reimport NMF package to avoid conflicts with DelayedArray package
library(NMF)
modulegenes_NMF <- module_NMF(ceRExp[, seq_len(10)],
                              mRExp[, seq_len(10)])
modulegenes_NMF
## GeneSetCollection
##   names: Module 1 (1 total)
##   unique identifiers: A1BG-AS1, ABCA11P, C1orf43, COL10A1 (4 total)
##   types in collection:
##     geneIdType: NullIdentifier (1 total)
##     collectionType: NullCollection (1 total)

2.7 module_biclust

We Identify gene modules from matched ceRNA and mRNA expression data using a series of biclustering packages, including biclust [21], runibic [22], iBBiG [23], fabia [24], BicARE [25], isa2 [26], s4vd [27], BiBitR [28] and rqubic [29]. The biclustering methods include “BCBimax”, “BCCC”, “BCPlaid”, “BCQuest”, “BCSpectral”, “BCXmotifs”, “BCUnibic”, iBBiG“,”fabia“,”fabiap“,”fabias“,”mfsc“,”nmfdiv“,”nmfeu“,”nmfsc“,”FLOC“,”isa“,”BCs4vd“,”BCssvd“,”bibit" and “quBicluster”. The default method is “fabia”.

modulegenes_biclust <- module_biclust(ceRExp[, seq_len(30)],
                                      mRExp[, seq_len(30)])
## Cycle: 0
Cycle: 20
Cycle: 40
Cycle: 60
Cycle: 80
Cycle: 100
Cycle: 120
Cycle: 140
Cycle: 160
Cycle: 180
Cycle: 200
Cycle: 220
Cycle: 240
Cycle: 260
Cycle: 280
Cycle: 300
Cycle: 320
Cycle: 340
Cycle: 360
Cycle: 380
Cycle: 400
Cycle: 420
Cycle: 440
Cycle: 460
Cycle: 480
Cycle: 500
modulegenes_biclust
## GeneSetCollection
##   names: Module 1 (1 total)
##   unique identifiers: FIGF, ABCC13, ..., ECE2 (27 total)
##   types in collection:
##     geneIdType: NullIdentifier (1 total)
##     collectionType: NullCollection (1 total)

3 Identification of miRNA sponge modules

The identified gene modules are regarded as candidate miRNA sponge modules. Based on the candidate miRNA sponge modules, we use the cannonical correlation (CC) [30] and sensitivity cannonical correlation (SCC) methods to identify miRNA sponge modules.

modulegenes_igraph <- module_igraph(ceRExp[, seq_len(10)], 
                                  mRExp[, seq_len(10)])
# Identify miRNA sponge modules using cannonical correlation (CC)
miRSM_igraph_CC <- miRSM(miRExp, ceRExp, mRExp, miRTarget, 
                        modulegenes_igraph, nperms = 5, 
                        num_shared_miRNAs = 3, pvalue.cutoff = 0.05, 
                        method = "CC", CC.cutoff = 0.8)
## 
##  Permutation  1  out of  5  12345678910
##  Permutation  2  out of  5  12345678910
##  Permutation  3  out of  5  12345678910
##  Permutation  4  out of  5  12345678910
##  Permutation  5  out of  5  12345678910
## 1234
miRSM_igraph_CC
## $`Cannonical correlation of miRNA sponge modules`
##              #miRNAs regulating ceRNAs 
##                           1.700000e+01 
##               #miRNAs regulating mRNAs 
##                           3.700000e+01 
##                         #Shared miRNAs 
##                           8.000000e+00 
##                     #Background miRNAs 
##                           2.260000e+02 
##         Sig. p.value of sharing miRNAs 
##                           2.008372e-03 
## Cannonical correlation of ceRNAs:mRNAs 
##                           8.428848e-01 
## 
## $`miRNA sponge modules`
## $`miRNA sponge modules`$`miRSM 1`
##  [1] "A2M-AS1"    "ABCA11P"    "ACVR2B-AS1" "ADCY10P1"   "C10orf10"  
##  [6] "C10orf54"   "C10orf90"   "C14orf180"  "C17orf51"   "EBF1"

4 Functional analysis of miRNA sponge modules

We implement ‘module_FA’ function to conduct functional analysis of miRNA sponge modules. The functional analysis includes two types: functional enrichment analysis (FEA) and disease enrichment analysis (DEA). Functional enrichment analysis includes GO, KEGG and Reactome enrichment analysis. The ontology databases used contain GO: Gene Ontology database (http://www.geneontology.org/), KEGG: Kyoto Encyclopedia of Genes and Genomes Pathway Database (http://www.genome.jp/kegg/), and Reactome: Reactome Pathway Database (http://reactome.org/). Disease enrichment analysis includes DO, DGN and NCG enrichment analysis. The disease databases used include DO: Disease Ontology database (http://disease-ontology.org/), DGN: DisGeNET database (http://www.disgenet.org/) and NCG: Network of Cancer Genes database (http://ncg.kcl.ac.uk/).

modulegenes_WGCNA <- module_WGCNA(ceRExp[, seq_len(150)], 
                                  mRExp[, seq_len(150)])
# Identify miRNA sponge modules using cannonical correlation (CC)
miRSM_WGCNA_CC <- miRSM(miRExp, ceRExp, mRExp, miRTarget, 
                        modulegenes_WGCNA, nperms = 5, 
                        method = "CC")
miRSM_WGCNA_CC_genes <- miRSM_WGCNA_CC[[2]]
miRSM_WGCNA_CC_FEA <- module_FA(miRSM_WGCNA_CC_genes,
                                Analysis.type ="FEA")
miRSM_WGCNA_CC_DEA <- module_FA(miRSM_WGCNA_CC_genes, 
                                Analysis.type = "DEA")

5 Conclusions

miRSM provides several functions to study miRNA sponge modules, including popular methods for inferring gene modules (candidate miRNA sponge modules), and a function to identify miRNA sponge modules, as well as a function to conduct functional analysis of miRNA sponge modules. It could provide a useful tool for the research of miRNA sponge modules.

6 References

[1] Ambros V. microRNAs: tiny regulators with great potential. Cell, 2001, 107:823–6.

[2] Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 2004, 116:281–97.

[3] Du T, Zamore PD. Beginning to understand microRNA function. Cell Research, 2007, 17:661–3.

[4] Esquela-Kerscher A, Slack FJ. Oncomirs—microRNAs with a role in cancer. Nature Reviews Cancer, 2006, 6:259–69.

[5] Lin S, Gregory RI. MicroRNA biogenesis pathways in cancer. Nature Reviews Cancer, 2015, 15:321–33.

[6] Cesana M, Cacchiarelli D, Legnini I, et al. A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell, 2011, 147:358–69.

[7] Poliseno L, Salmena L, Zhang J, et al. A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature, 2010, 465:1033–8.

[8] Hansen TB, Jensen TI, Clausen BH, et al. Natural RNA circles function as efficient microRNA sponges. Nature, 2013, 495:384–8.

[9] Memczak S, Jens M, Elefsinioti A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature, 2013, 495:333–8.

[10] Salmena L, Poliseno L, Tay Y, et al. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell, 2011, 146(3):353-8.

[11] Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 2008, 9:559.

[12] Bunte K, Lepp"{a}aho E, Saarinen I, Kaski S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics, 2016, 32(16):2457-63.

[13] Lepp"{a}aho E, Ammad-ud-din M, Kaski S. GFA: exploratory analysis of multiple data sources with group factor analysis. J Mach Learn Res., 2017, 18(39):1-5.

[14] Csardi G, Nepusz T. The igraph software package for complex network research, InterJournal, Complex Systems, 2006:1695.

[15] Clauset A, Newman ME, Moore C. Finding community structure in very large networks. Phys Rev E Stat Nonlin Soft Matter Phys., 2004, 70(6 Pt 2):066111.

[16] Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res., 2002, 30(7):1575-84.

[17] Kalinka AT, Tomancak P. linkcomm: an R package for the generation, visualization, and analysis of link communities in networks of arbitrary size and type. Bioinformatics, 2011, 27(14):2011-2.

[18] Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003, 4:2.

[19] Zhang Y, Phillips CA, Rogers GL, Baker EJ, Chesler EJ, Langston MA. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC Bioinformatics, 2014, 15:110.

[20] Gaujoux R, Seoighe C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics, 2010, 11:367.

[21] Kaiser S, Santamaria R, Khamiakova T, Sill M, Theron R, Quintales L, Leisch F, De TE. biclust: BiCluster Algorithms. R package version 1.2.0., 2015, https://CRAN.R-project.org/package=biclust.

[22] Wang Z, Li G, Robinson RW, Huang X. UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data. Sci Rep., 2016, 6:23466.

[23] Gusenleitner D, Howe EA, Bentink S, Quackenbush J, Culhane AC. iBBiG: iterative binary bi-clustering of gene sets. Bioinformatics, 2012, 28(19):2484-92.

[24] Hochreiter S, Bodenhofer U, Heusel M, Mayr A, Mitterecker A, Kasim A, Khamiakova T, Van Sanden S, Lin D, Talloen W, Bijnens L, G"{o}hlmann HW, Shkedy Z, Clevert DA. FABIA: factor analysis for bicluster acquisition. Bioinformatics, 2010, 26(12):1520-7.

[25] Yang J, Wang H, Wang W, Yu, PS. An improved biclustering method for analyzing gene expression. Int J Artif Intell Tools, 2005, 14(5): 771-789.

[26] Bergmann S, Ihmels J, Barkai N. Iterative signature algorithm for the analysis of large-scale gene expression data. Phys Rev E Stat Nonlin Soft Matter Phys., 2003, 67(3 Pt 1):031902.

[27] Sill M, Kaiser S, Benner A, Kopp-Schneider A. Robust biclustering by sparse singular value decomposition incorporating stability selection. Bioinformatics, 2011, 27(15):2089-97.

[28] Rodriguez-Baena DS, Perez-Pulido AJ, Aguilar-Ruiz JS. A biclustering algorithm for extracting bit-patterns from binary datasets. Bioinformatics, 2011, 27(19):2738-45.

[29] Li G, Ma Q, Tang H, Paterson AH, Xu Y. QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Res., 2009, 37(15):e101.

[30] Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 2009, 10(3):515-34.

7 Session information

sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] NMF_0.21.0          cluster_2.0.7-1     rngtools_1.3.1     
##  [4] pkgmaker_0.27       registry_0.5        miRSM_1.0.0        
##  [7] bigmemory_4.5.33    Biobase_2.42.0      BiocGenerics_0.28.0
## [10] BiocStyle_2.10.0   
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_0.2.5            robust_0.4-18              
##   [3] RSQLite_2.1.1               AnnotationDbi_1.44.0       
##   [5] htmlwidgets_1.3             grid_3.5.1                 
##   [7] trimcluster_0.1-2.1         BiocParallel_1.16.0        
##   [9] Rtsne_0.13                  miRsponge_1.8.0            
##  [11] additivityTests_1.1-4       biclust_2.0.1              
##  [13] munsell_0.5.0               rqubic_1.28.0              
##  [15] codetools_0.2-15            preprocessCore_1.44.0      
##  [17] units_0.6-1                 withr_2.1.2                
##  [19] colorspace_1.3-2            GOSemSim_2.8.0             
##  [21] knitr_1.20                  rstudioapi_0.8             
##  [23] stats4_3.5.1                robustbase_0.93-3          
##  [25] DOSE_3.8.0                  urltools_1.7.1             
##  [27] GenomeInfoDbData_1.2.0      bit64_0.9-7                
##  [29] farver_1.0                  PMA_1.0.11                 
##  [31] rprojroot_1.3-2             runibic_1.4.0              
##  [33] xfun_0.4                    fastcluster_1.1.25         
##  [35] diptest_0.75-7              R6_2.3.0                   
##  [37] doParallel_1.0.14           GenomeInfoDb_1.18.0        
##  [39] flexmix_2.3-14              gridGraphics_0.3-0         
##  [41] bitops_1.0-6                fgsea_1.8.0                
##  [43] DelayedArray_0.8.0          assertthat_0.2.0           
##  [45] scales_1.0.0                ggraph_1.0.2               
##  [47] nnet_7.3-12                 enrichplot_1.2.0           
##  [49] gtable_0.2.0                WGCNA_1.66                 
##  [51] rlang_0.3.0.1               BiBitR_0.3.1               
##  [53] iBBiG_1.26.0                splines_3.5.1              
##  [55] lazyeval_0.2.1              acepack_1.4.1              
##  [57] impute_1.56.0               europepmc_0.3              
##  [59] checkmate_1.8.5             BiocManager_1.30.3         
##  [61] yaml_2.2.0                  reshape2_1.4.3             
##  [63] fabia_2.28.0                backports_1.1.2            
##  [65] qvalue_2.14.0               Hmisc_4.1-1                
##  [67] clusterProfiler_3.10.0      tools_3.5.1                
##  [69] bookdown_0.7                ggplotify_0.0.3            
##  [71] gridBase_0.4-7              ggplot2_3.1.0              
##  [73] RColorBrewer_1.1-2          dynamicTreeCut_1.63-1      
##  [75] ggridges_0.5.1              Rcpp_0.12.19               
##  [77] plyr_1.8.4                  base64enc_0.1-3            
##  [79] progress_1.2.0              zlibbioc_1.28.0            
##  [81] purrr_0.2.5                 RCurl_1.95-4.11            
##  [83] prettyunits_1.0.2           rpart_4.1-13               
##  [85] viridis_0.5.1               cowplot_0.9.3              
##  [87] S4Vectors_0.20.0            SummarizedExperiment_1.12.0
##  [89] ggrepel_0.8.0               magrittr_1.5               
##  [91] data.table_1.11.8           DO.db_2.9                  
##  [93] reactome.db_1.64.0          triebeard_0.3.0            
##  [95] BicARE_1.40.0               mvtnorm_1.0-8              
##  [97] whisker_0.3-2               matrixStats_0.54.0         
##  [99] flexclust_1.4-0             randomcoloR_1.1.0          
## [101] hms_0.4.2                   evaluate_0.12              
## [103] xtable_1.8-3                XML_3.98-1.16              
## [105] mclust_5.4.1                IRanges_2.16.0             
## [107] gridExtra_2.3               testthat_2.0.1             
## [109] compiler_3.5.1              tibble_1.4.2               
## [111] V8_1.5                      crayon_1.3.4               
## [113] GFA_1.0.3                   htmltools_0.3.6            
## [115] corpcor_1.6.9               pcaPP_1.9-73               
## [117] Formula_1.2-3               tidyr_0.8.2                
## [119] rrcov_1.4-4                 expm_0.999-3               
## [121] MCL_1.0                     ReactomePA_1.26.0          
## [123] DBI_1.0.0                   tweenr_1.0.0               
## [125] rappdirs_0.3.1              MASS_7.3-51                
## [127] fpc_2.1-11.1                Matrix_1.2-14              
## [129] ade4_1.7-13                 bindr_0.1.1                
## [131] igraph_1.2.2                GenomicRanges_1.34.0       
## [133] pkgconfig_2.0.2             bigmemory.sri_0.1.3        
## [135] fit.models_0.5-14           flashClust_1.01-2          
## [137] rvcheck_0.1.1               s4vd_1.1-1                 
## [139] foreign_0.8-71              xml2_1.2.0                 
## [141] foreach_1.4.4               annotate_1.60.0            
## [143] XVector_0.22.0              bibtex_0.4.2               
## [145] stringr_1.3.1               digest_0.6.18              
## [147] varhandle_2.0.3             graph_1.60.0               
## [149] rmarkdown_1.10              isa2_0.3.5                 
## [151] fastmatch_1.1-0             htmlTable_1.12             
## [153] dendextend_1.9.0            linkcomm_1.0-11            
## [155] GSEABase_1.44.0             curl_3.2                   
## [157] kernlab_0.9-27              graphite_1.28.0            
## [159] modeltools_0.2-22           jsonlite_1.5               
## [161] bindrcpp_0.2.2              viridisLite_0.3.0          
## [163] pillar_1.3.0                lattice_0.20-35            
## [165] httr_1.3.1                  DEoptimR_1.0-8             
## [167] survival_2.43-1             GO.db_3.7.0                
## [169] glue_1.3.0                  UpSetR_1.3.3               
## [171] prabclus_2.2-6              iterators_1.0.10           
## [173] bit_1.1-14                  ggforce_0.1.3              
## [175] class_7.3-14                stringi_1.2.4              
## [177] blob_1.1.1                  latticeExtra_0.6-28        
## [179] memoise_1.1.0               dplyr_0.7.7                
## [181] irlba_2.3.2