
1 12-mer dissociation constants

McGeary, Lin et al. (2019) used RNA bind-n-seq (RBNS) to empirically determine the affinities (i.e. dissociation constants) of selected miRNAs for random 12-nucleotide sequences (termed 12-mers). As expected, bound sequences typically exhibited complementarity to the miRNA seed region (positions 2-8 from the miRNA’s 5’ end), but the study also revealed non-canonical binding and the importance of flanking dinucleotides. Based on these data, the authors developed a model that predicts 12-mer dissociation constants (KD) from the miRNA sequence. scanMiR encodes a compressed version of these predictions in the form of a KdModel object.

The 12-mer is defined as the 8 nucleotides opposite the miRNA’s extended seed region plus the flanking dinucleotide on either side (2 + 8 + 2 nucleotides).
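As a minimal base-R sketch of this layout, the hypothetical helper split12mer() below cuts a 12-mer target sequence (written 5’ to 3’) into its 5’ flanking dinucleotide, central 8-mer and 3’ flanking dinucleotide. It only assumes the 2 + 8 + 2 layout described above; it is not a scanMiR function.

```r
# Hypothetical helper (not part of scanMiR): split 12-mer target sequences
# into the 5' flanking dinucleotide, the central 8-mer and the 3' flanking
# dinucleotide, assuming the 2 + 8 + 2 layout.
split12mer <- function(x) {
  stopifnot(all(nchar(x) == 12))
  data.frame(
    flank5 = substr(x, 1, 2),   # positions 1-2
    mer8   = substr(x, 3, 10),  # positions 3-10, opposite the extended seed
    flank3 = substr(x, 11, 12), # positions 11-12
    stringsAsFactors = FALSE
  )
}

split12mer(c("CTAGCATTAAGT", "ACGTACGTACGT"))
```

For "CTAGCATTAAGT" the central 8-mer is "AGCATTAA", flanked by "CT" and "GT".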


2 KdModels

The KdModel class contains the information concerning the sequence (12-mer) affinity of a given miRNA. It compresses the dissociation constant (Kd) predictions from McGeary, Lin et al. (2019) and makes them easy to manipulate.

We can take a look at the example KdModel:

library(scanMiR)
data(SampleKdModel)
SampleKdModel
## A `KdModel` for hsa-miR-155-5p (Conserved across mammals)
##   Sequence: UUAAUGCUAAUCGUGAUAGGGGUU
##   Canonical target seed: AGCATTA(A)
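The canonical target seed printed above follows directly from the miRNA sequence: the seed (miRNA positions 2-8) is reverse-complemented into a DNA target sequence, and the optional trailing “A” sits opposite miRNA position 1. The base-R sketch below reproduces this by hand; revcompDNA() is a hypothetical helper, not part of scanMiR.

```r
mirseq <- "UUAAUGCUAAUCGUGAUAGGGGUU"  # hsa-miR-155-5p, as printed above

# Hypothetical helper: reverse complement of an RNA string, written as DNA
revcompDNA <- function(x) {
  comp <- chartr("ACGU", "TGCA", x)  # A->T, C->G, G->C, U->A
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

seed    <- substr(mirseq, 2, 8)      # miRNA seed: positions 2-8
target7 <- revcompDNA(seed)          # 7-mer seed match on the target
target8 <- paste0(target7, "A")      # plus the "A" opposite miRNA position 1
target8

# The 12-mer later passed to assignKdType() contains this 8-mer match
grepl(target8, "CTAGCATTAAGT", fixed = TRUE)
```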

In addition to the information necessary to predict the binding affinity to any given 12-mer sequence, the model contains, minimally, the name and sequence of the miRNA. Since the KdModel class extends the list class, any further information can be stored:

SampleKdModel$myVariable <- "test"

An overview of the binding affinities can be obtained with the following plot:

plotKdModel(SampleKdModel, what="seeds")

The plot shows the -log(Kd) values of the top 7-mers (including both canonical and non-canonical sites), with or without the final “A” opposite the first miRNA nucleotide.

To predict the dissociation constant (and binding type, if any) of a given 12-mer sequence, you can use the assignKdType function:

assignKdType("ACGTACGTACGT", SampleKdModel)
##            type log_kd
## 1 non-canonical      0
# or using multiple sequences:
assignKdType(c("CTAGCATTAAGT","ACGTACGTACGT"), SampleKdModel)
##            type log_kd
## 1          8mer  -5129
## 2 non-canonical      0

The log_kd column contains natural-log Kd values multiplied by 1000 and stored as integers (which is more economical when dealing with millions of sites). In the example above, -5129 means log(Kd) = -5.129, i.e. a dissociation constant of exp(-5.129) ≈ 0.0059225. The smaller the value, the stronger the relative affinity.
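Converting the stored integers back to Kd values only requires undoing the scaling. A minimal base-R sketch, assuming natural logarithms (consistent with -5.129 mapping to 0.0059225):

```r
# log_kd values as returned by assignKdType() above: log(Kd) x 1000, as integers
log_kd <- c(-5129L, 0L)

# Undo the x1000 scaling, then exponentiate to recover Kd
kd <- exp(log_kd / 1000)
kd

# Smaller log_kd means stronger affinity, so ordering by log_kd
# ranks sites from strongest to weakest binding
order(log_kd)
```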

2.1 KdModelLists

A KdModelList object is simply a collection of KdModel objects. We can build one in the following way:

# we create a copy of the KdModel, and give it a different name:
mod2 <- SampleKdModel
mod2$name <- "dummy-miRNA"
kml <- KdModelList(SampleKdModel, mod2)
kml
## An object of class "KdModelList"
## [[1]]
## A `KdModel` for hsa-miR-155-5p (Conserved across mammals)
##   Sequence: UUAAUGCUAAUCGUGAUAGGGGUU
##   Canonical target seed: AGCATTA(A)
## [[2]]
## A `KdModel` for dummy-miRNA (Conserved across mammals)
##   Sequence: UUAAUGCUAAUCGUGAUAGGGGUU
##   Canonical target seed: AGCATTA(A)
summary(kml)
## A `KdModelList` object containing binding affinity models from 2 miRNAs.
## 
##               Low-confidence             Poorly conserved 
##                            0                            0 
##     Conserved across mammals Conserved across vertebrates 
##                            2                            0

Beyond the usual list operations (e.g. subsetting), specific fields of the constituent KdModels can be accessed directly, for example:

conservation(kml)
##           hsa-miR-155-5p              dummy-miRNA 
## Conserved across mammals Conserved across mammals 
## 4 Levels: Low-confidence Poorly conserved ... Conserved across vertebrates

3 Creating a KdModel object

KdModel objects are meant to be created from a table assigning log_kd values to 12-mer target sequences, as produced by the CNN from McGeary, Lin et al. (2019). For the purpose of this example, we create such a dummy table:

kd <- dummyKdData()
head(kd)
##         X12mer log_kd
## 1 AAAGCAAAAAAA -0.428
## 2 CAAGCACAAACA -0.404
## 3 GAAGCAGAAAGA -0.153
## 4 TAAGCATAAATA -1.375
## 5 ACAGCAACAAAC -0.448
## 6 CCAGCACCAACC -0.274

A KdModel object can then be created with:

mod3 <- getKdModel(kd=kd, mirseq="TTAATGCTAATCGTGATAGGGGTT", name = "my-miRNA")

Alternatively, the kd argument can be the path to the output file of the CNN; if mirseq and name are present in that table, they can be omitted.
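The exact input requirements of getKdModel are not spelled out here. The hypothetical helper checkKdTable() below merely checks that a table has the same layout as the dummyKdData() output shown above: an X12mer column of 12-nt DNA sequences and a numeric log_kd column. Note that in this input table log_kd holds plain log(Kd) values (e.g. -0.428), not the x1000 integers returned by assignKdType.

```r
# Hypothetical helper (not part of scanMiR): check that a table mirrors
# the layout of the dummyKdData() output shown above.
checkKdTable <- function(tab) {
  stopifnot(is.data.frame(tab), all(c("X12mer", "log_kd") %in% colnames(tab)))
  mers <- as.character(tab$X12mer)
  ok <- nchar(mers) == 12 & grepl("^[ACGT]+$", mers)
  if (!all(ok)) stop(sum(!ok), " row(s) are not 12-nt ACGT sequences")
  stopifnot(is.numeric(tab$log_kd))
  invisible(TRUE)
}

# A toy table built from the first two rows printed by head(kd)
toy <- data.frame(X12mer = c("AAAGCAAAAAAA", "CAAGCACAAACA"),
                  log_kd = c(-0.428, -0.404))
checkKdTable(toy)
```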

4 Common KdModel collections

The scanMiRData package contains KdModel collections for all human, mouse and rat miRBase miRNAs.

5 Under the hood

When getKdModel is called, the dissociation constants are stored as a lightweight, overfitted linear model. It consists of base Kd coefficients (stored as integers in object$mer8) for each of the 1024 partially matching 8-mers (i.e. with at least 4 consecutive matching nucleotides), plus 8-mer-specific coefficients (stored in object$fl) that are multiplied by a flanking score derived from the flanking dinucleotides. The flanking score is calculated from the dinucleotide effects experimentally measured by McGeary, Lin et al. (2019). To save space, the 8-mer sequences themselves are not stored but are generated deterministically when needed. The 8-mers can be obtained, in the right order, with the getSeed8mers function.
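To make the additive structure concrete, here is a toy sketch of how such a model turns coefficients into a prediction: log_kd = base 8-mer coefficient + 8-mer-specific flanking coefficient × flanking score. All coefficient values, the helper predictLogKd() and the flanking scores are made up for illustration. In a real KdModel the coefficients are integer-scaled, the 8-mers are not stored by name (they come from getSeed8mers), and the flanking score is computed from the dinucleotides, which this sketch does not attempt.

```r
# Hypothetical coefficients, indexed by 8-mer (values are invented)
mer8_coef <- c(AGCATTAA = -5.0, GCATTAAG = -3.2)  # base log(Kd) per 8-mer
fl_coef   <- c(AGCATTAA =  0.4, GCATTAAG =  0.2)  # 8-mer-specific flanking weight

# Hypothetical predictor: base coefficient + flanking weight x flanking score
predictLogKd <- function(mer8, flank_score) {
  unname(mer8_coef[mer8] + fl_coef[mer8] * flank_score)
}

# The same 8-mer in two different flanking contexts
predictLogKd(c("AGCATTAA", "AGCATTAA"), flank_score = c(-0.5, 0.5))
```

Keeping one flanking weight per 8-mer lets the same flanking score shift strong and weak sites by different amounts, while the model stays a small table of numbers.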



Session info

## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scanMiR_1.2.0    BiocStyle_2.24.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.2       xfun_0.30              bslib_0.3.1           
##  [4] purrr_0.3.4            colorspace_2.0-3       vctrs_0.4.1           
##  [7] generics_0.1.2         htmltools_0.5.2        stats4_4.2.0          
## [10] yaml_2.3.5             utf8_1.2.2             rlang_1.0.2           
## [13] jquerylib_0.1.4        pillar_1.7.0           glue_1.6.2            
## [16] DBI_1.1.2              BiocParallel_1.30.0    BiocGenerics_0.42.0   
## [19] GenomeInfoDbData_1.2.8 lifecycle_1.0.1        ggseqlogo_0.1         
## [22] stringr_1.4.0          zlibbioc_1.42.0        Biostrings_2.64.0     
## [25] munsell_0.5.0          gtable_0.3.0           evaluate_0.15         
## [28] labeling_0.4.2         knitr_1.38             IRanges_2.30.0        
## [31] fastmap_1.1.0          GenomeInfoDb_1.32.0    parallel_4.2.0        
## [34] fansi_1.0.3            highr_0.9              Rcpp_1.0.8.3          
## [37] scales_1.2.0           BiocManager_1.30.17    S4Vectors_0.34.0      
## [40] magick_2.7.3           jsonlite_1.8.0         XVector_0.36.0        
## [43] farver_2.1.0           ggplot2_3.3.5          digest_0.6.29         
## [46] stringi_1.7.6          bookdown_0.26          dplyr_1.0.8           
## [49] cowplot_1.1.1          GenomicRanges_1.48.0   grid_4.2.0            
## [52] cli_3.3.0              tools_4.2.0            bitops_1.0-7          
## [55] magrittr_2.0.3         sass_0.4.1             RCurl_1.98-1.6        
## [58] tibble_3.1.6           crayon_1.5.1           pkgconfig_2.0.3       
## [61] ellipsis_0.3.2         data.table_1.14.2      assertthat_0.2.1      
## [64] rmarkdown_2.14         R6_2.5.1               compiler_4.2.0