Contents

ScanMiRApp offers a shiny interface to the scanMiR package, as well as convenience function to simplify its use with common annotations.

1 ScanMiRAnno objects

Both the shiny app and the convenience functions rely on objects of the class ScanMiRAnno, which contain the different pieces of annotation relating to a species and genome build. Annotations for human (GRCh38), mouse (GRCm38) and rat (Rnor_6) can be obtained as follows:

library(scanMiRApp)
# anno <- ScanMiRAnno("Rnor_6")
# for this vignette, we'll work with a lightweight fake annotation:
anno <- ScanMiRAnno("fake")
anno
## Genome: /tmp/RtmpeGmwEy/filee65833c866540 
## Annotation: Fake falsus (fake1)
## Models: KdModelList of length 1

You can also build your own ScanMiRAnno object by providing the function with the different components (minimally, a BSgenome and an ensembldb object - see ?ScanMiRAnno for more information). For minimal functioning with the shiny app, the models slot additionally needs to be populated with a KdModelList (see the corresponding vignette of the scanMiR package for more information).

In addition, ScanMiRAnno objects can contain pre-compiled scans and aggregations, which are especially meant to speed up the shiny application. These should be saved as IndexedFst files and should be respectively indexed by transcript and by miRNA, and stored in the scan and aggregated slot of the object.



2 Convenience functions

2.1 Obtaining the UTR sequence of a transcript

The transcript (or UTR) sequence for any (set of) transcript(s) in the annotation can be obtained with:

seq <- getTranscriptSequence("ENSTFAKE0000056456", anno)
seq
## DNAStringSet object of length 1:
##     width seq                                               names               
## [1]   688 CGTATTAAATTTAGCAAGGTTCC...ACCTTCAGATTTCAGCAGACTAG ENSTFAKE0000056456

2.2 Plotting sites on the UTR sequence of a transcript

Binding sites of a given miRNA on a transcript can be visualized with:

plotSitesOnUTR(tx="ENSTFAKE0000056456", annotation=anno, miRNA="hsa-miR-155-5p")
## Prepare miRNA model
## Get Transcript Sequence
## Scan
## Warning in XStringSet(pattern_seqtype, pattern): metadata columns on input
## DNAStringSet object were dropped

This will fetch the sequence, perform the scan, and plot the results.

2.3 Running a full-transcriptome scan

The runFullScan function can be used to launch a the scan for all miRNAs on all protein-coding transcripts (or their UTRs) of a genome. These scans can then be used to speed up the shiny app (see below). They can simply be launched as:

m <- runFullScan(anno)
## Loading annotation
## Extracting transcripts
## Scanning with 1 thread(s)
## Warning in XStringSet(pattern_seqtype, pattern): metadata columns on input
## DNAStringSet object were dropped
m
## GRanges object with 2 ranges and 4 metadata columns:
##                 seqnames    ranges strand |     type    log_kd  p3.score  note
##                    <Rle> <IRanges>  <Rle> | <factor> <integer> <integer> <Rle>
##   [1] ENSTFAKE0000056456   281-288      * |  8mer        -4868        12 TDMD?
##   [2] ENSTFAKE0000056456   482-489      * |  7mer-m8     -3702         0     -
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Multi-threading can be enabled through the ncores argument. See ?runFullScan for more options.

2.4 Detecting enriched miRNA-target pairs

The enrichedMirTxPairs identifies miRNA-target enrichments (which could indicate sponge- or cargo-like behaviors) by means of a binomial model estimating the probability of the given number of binding sites for a given pair given the total number of bindings sites for the miRNA (across all transcripts) and transcript (across all miRNAs) in question. The output is a data.frame indicating, for each pair passing some lenient filtering, the transcript, miRNA, the number of 7mer/8mer sites, and the binomial log(p-value) of the combination. We strongly recommend further filtering this list by expression of the transcript in the system of interest, otherwise some transcripts with very low expression (and hence biologically irrelevant) might come up as strongly enriched.



3 Shiny app

The features of the shiny app are organized into two main components:

3.1 Setting up the application

A ScanMiRAnno object is the minimal input for the shiny app, and multiple such objects can be provided in the form of a named list:

scanMiRApp( list( nameOfAnnotation=anno ) )

Launched with this object, the app will not have access to any pre-compiled scans or to aggregated data. This means that scans will be performed on the fly, which also means that they will be slower. In addition, it means that the top targets based on aggregated repression estimates (in the miRNA-based tab) will not be available. To provide this additional information, you first need to prepare the objects as IndexedFst files. Assuming you’ve saved (or downloaded) the scans as scan.rds and the aggregated data as aggregated.rds, you can re-save them as IndexedFst (here in the folder out_path) and add them to the anno object as follows:

# not run
anno <- ScanMiRAnno("Rnor_6")
saveIndexedFst(readRDS("scan.rds"), "seqnames", file.prefix="out_path/scan")
saveIndexedFst(readRDS("aggregated.rds"), "miRNA", 
               file.prefix="out_path/aggregated")
anno$scan <- loadIndexedFst("out_path/scan")
anno$aggregated <- loadIndexedFst("out_path/aggregated")
# then launch the app
scanMiRApp(list(Rnor_6=anno))

The same could be done for multiple ScanMiRAnno objects. If scanMiRApp is launched without any annotation argument, it will generate anno objects for the three base species (without any pre-compiled data).

3.2 Multi-threading

Multithreading can be enabled in the shiny app by calling scanMiRApp() (or the underlying scanMiRserver()) with the BP argument, e.g.:

scanMiRApp(..., BP=BiocParallel::MulticoreParam(ncores))

where ncores is the number of threads to use. This will enable multi-threading for the scanning functions, which makes a big difference when scanning for many miRNAs at a time. In addition, multi-threading can be used to read the IndexedFst files, which is enabled by the nthreads of the loadIndexedFst function. However, since reading is quite fast already with a single core, improvements there are typically fairly marginal.

3.3 Caching

By default, the app has a caching system which means that if a user wants to launch the same scan with the same parameters twice, the results will be re-used instead of re-computed. The cache has a maximum size (by default 10MB) per user, beyond which older cache items will be removed. The cache size can be manipulated through the maxCacheSize argument.

Session info

## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] GenomicRanges_1.48.0 GenomeInfoDb_1.32.0  IRanges_2.30.0      
## [4] S4Vectors_0.34.0     BiocGenerics_0.42.0  fstcore_0.9.12      
## [7] scanMiRApp_1.2.0     BiocStyle_2.24.0    
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-3              rjson_0.2.21                 
##   [3] ellipsis_0.3.2                XVector_0.36.0               
##   [5] farver_2.1.0                  DT_0.22                      
##   [7] bit64_4.0.5                   interactiveDisplayBase_1.34.0
##   [9] AnnotationDbi_1.58.0          fansi_1.0.3                  
##  [11] scanMiR_1.2.0                 xml2_1.3.3                   
##  [13] cachem_1.0.6                  knitr_1.38                   
##  [15] jsonlite_1.8.0                Rsamtools_2.12.0             
##  [17] scanMiRData_1.1.0             dbplyr_2.1.1                 
##  [19] png_0.1-7                     shinydashboard_0.7.2         
##  [21] shiny_1.7.1                   BiocManager_1.30.17          
##  [23] compiler_4.2.0                httr_1.4.2                   
##  [25] assertthat_0.2.1              Matrix_1.4-1                 
##  [27] fastmap_1.1.0                 lazyeval_0.2.2               
##  [29] cli_3.3.0                     later_1.3.0                  
##  [31] htmltools_0.5.2               prettyunits_1.1.1            
##  [33] tools_4.2.0                   gtable_0.3.0                 
##  [35] glue_1.6.2                    GenomeInfoDbData_1.2.8       
##  [37] dplyr_1.0.8                   rappdirs_0.3.3               
##  [39] Rcpp_1.0.8.3                  Biobase_2.56.0               
##  [41] jquerylib_0.1.4               vctrs_0.4.1                  
##  [43] Biostrings_2.64.0             shinyjqui_0.4.1              
##  [45] rtracklayer_1.56.0            rintrojs_0.3.0               
##  [47] ggseqlogo_0.1                 xfun_0.30                    
##  [49] stringr_1.4.0                 mime_0.12                    
##  [51] lifecycle_1.0.1               restfulr_0.0.13              
##  [53] ensembldb_2.20.0              shinycssloaders_1.0.0        
##  [55] XML_3.99-0.9                  waiter_0.2.5                 
##  [57] AnnotationHub_3.4.0           zlibbioc_1.42.0              
##  [59] scales_1.2.0                  ProtGenerics_1.28.0          
##  [61] hms_1.1.1                     promises_1.2.0.1             
##  [63] MatrixGenerics_1.8.0          parallel_4.2.0               
##  [65] SummarizedExperiment_1.26.0   AnnotationFilter_1.20.0      
##  [67] yaml_2.3.5                    curl_4.3.2                   
##  [69] memoise_2.0.1                 ggplot2_3.3.5                
##  [71] sass_0.4.1                    biomaRt_2.52.0               
##  [73] stringi_1.7.6                 RSQLite_2.2.12               
##  [75] highr_0.9                     BiocVersion_3.15.2           
##  [77] BiocIO_1.6.0                  GenomicFeatures_1.48.0       
##  [79] filelock_1.0.2                BiocParallel_1.30.0          
##  [81] rlang_1.0.2                   pkgconfig_2.0.3              
##  [83] bitops_1.0-7                  matrixStats_0.62.0           
##  [85] evaluate_0.15                 lattice_0.20-45              
##  [87] purrr_0.3.4                   labeling_0.4.2               
##  [89] GenomicAlignments_1.32.0      htmlwidgets_1.5.4            
##  [91] cowplot_1.1.1                 bit_4.0.4                    
##  [93] tidyselect_1.1.2              magrittr_2.0.3               
##  [95] bookdown_0.26                 R6_2.5.1                     
##  [97] magick_2.7.3                  generics_0.1.2               
##  [99] DelayedArray_0.22.0           DBI_1.1.2                    
## [101] pillar_1.7.0                  KEGGREST_1.36.0              
## [103] RCurl_1.98-1.6                tibble_3.1.6                 
## [105] crayon_1.5.1                  utf8_1.2.2                   
## [107] plotly_4.10.0                 BiocFileCache_2.4.0          
## [109] rmarkdown_2.14                progress_1.2.2               
## [111] grid_4.2.0                    data.table_1.14.2            
## [113] blob_1.2.3                    digest_0.6.29                
## [115] xtable_1.8-4                  tidyr_1.2.0                  
## [117] httpuv_1.6.5                  fst_0.9.8                    
## [119] munsell_0.5.0                 viridisLite_0.4.0            
## [121] bslib_0.3.1