1 Introduction

The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The package expects a GRanges object with certain columns as input. The following columns are a requirement: tRNA_length, tRNA_type, tRNA_anticodon, tRNA_seq, tRNA_str, tRNA_CCA.end.

2 Loading tRNA information

To work with the tRNA packages, tRNA information can be retrieved in a number of ways:

  1. A GRanges object can be constructed by hand containing the required colums (see above)
  2. a tRNAscan result file can be loaded using import.tRNAscanAsGRanges() from tRNAscanImport package
  3. selected tRNA information can be retrieved using import.tRNAdb() from tRNAdbImport package

For this vignette a predefined GRanges object is loaded.

library(tRNA)
data("gr", package = "tRNA", envir = environment())

3 tRNA sequences and structures

To retrieve the sequences for individual tRNA structure elements gettRNAstructureGRanges or gettRNAstructureSeqs can be used.

# just get the coordinates of the anticodonloop
gettRNAstructureGRanges(gr,
                        structure = "anticodonLoop")
## $anticodonLoop
## IRanges object with 299 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   TGG        31        37         7
##   TGC        32        38         7
##   CAA        31        37         7
##   AGA        31        37         7
##   TAA        31        37         7
##   ...       ...       ...       ...
##   CAT        32        38         7
##   GAA        31        37         7
##   TTA        31        37         7
##   TAC        32        38         7
##   CAT        32        38         7
gettRNAstructureSeqs(gr,
                     joinFeatures = TRUE,
                     structure = "anticodonLoop")
## $anticodonLoop
##   A DNAStringSet instance of length 299
##       width seq                                          names               
##   [1]     7 TTTGGGT                                      TGG
##   [2]     7 CTTGCAA                                      TGC
##   [3]     7 TTCAAGC                                      CAA
##   [4]     7 TTAGAAA                                      AGA
##   [5]     7 CTTAAGA                                      TAA
##   ...   ... ...
## [295]     7 CTCATAA                                      CAT
## [296]     7 TTGAAGA                                      GAA
## [297]     7 TTTTAGT                                      TTA
## [298]     7 TTTACAC                                      TAC
## [299]     7 GTCATGA                                      CAT

In addition the sequences can be returned already joined to get a fully blank padded set of sequences. The boundaries of the individual structures is returned as metadata of the RNAStringSet object.

seqs <- gettRNAstructureSeqs(gr[1:10],
                             joinCompletely = TRUE)
seqs
##   A DNAStringSet instance of length 10
##      width seq
##  [1]    85 GGGCGTGTGGTC-TAGT-GGTAT-GATTCTCG...----GCCTGGGTTCAATTCCCAGCTCGCCCC
##  [2]    85 GGGCACATGGCGCAGTT-GGT-AGCGCGCTTC...----GCATCGGTTCGATTCCGGTTGCGTCCA
##  [3]    85 GGTTGTTTGGCC-GAGC-GGTAA-GGCGCCTG...-GATGCAAGAGTTCGAATCTCTTAGCAACCA
##  [4]    85 GGCAACTTGGCC-GAGT-GGTAA-GGCGAAAG...GCCCGCGCAGGTTCGAGTCCTGCAGTTGTCG
##  [5]    85 GGAGGGTTGGCC-GAGT-GGTAA-GGCGGCAG...GTCCGCGCGAGTTCGAACCTCGCATCCTTCA
##  [6]    85 GCGGATTTAGCTCAGTT-GGG-AGAGCGCCAG...----GCCTGTGTTCGATCCACAGAATTCGCA
##  [7]    85 GGTCTCTTGGCC-CAGTTGGTAA-GGCACCGT...----ACAGCGGTTCGATCCCGCTAGAGACCA
##  [8]    85 GCGCAAGTGGTTTAGT--GGT-AAAATCCAAC...-----CCCCGGTTCGATTCCGGGCTTGCGCA
##  [9]    85 GGCAACTTGGCC-GAGT-GGTAA-GGCGAAAG...GCCCGCGCAGGTTCGAGTCCTGCAGTTGTCG
## [10]    85 GCTTCTATGGCC-AAGTTGGTAA-GGCGCCAC...----ACATCGGTTCAAATCCGATTGGAAGCA
# getting the tRNA structure boundaries
metadata(seqs)[["tRNA_structures"]]
## IRanges object with 15 ranges and 0 metadata columns:
##                           start       end     width
##                       <integer> <integer> <integer>
##   acceptorStem.prime5         1         7         7
##               Dprime5         8         9         2
##          DStem.prime5        10        13         4
##                 Dloop        14        23        10
##          DStem.prime3        24        27         4
##                   ...       ...       ...       ...
##          TStem.prime5        61        65         5
##                 Tloop        66        72         7
##          TStem.prime3        73        77         5
##   acceptorStem.prime3        78        84         7
##         discriminator        85        85         1

Be aware, that gettRNAstructureGRanges and gettRNAstructureSeqs might not be working as expected, if the tRNA sequences in questions are armless or deviate a lot from the canonical tRNA model. It was thouroughly tested using human mitochondrial tRNA and other tRNAs missing certain features. However, for fringe cases this might not be the case. Please report these cases, if you find them, with an example.

4 Subsetting tRNA sequences

The structure information can be queried for subsetting using the function hasAccpeptorStem() as an example. Please have a look at?hasAccpeptorStem for all available subsetting functions.

gr[hasAcceptorStem(gr, unpaired = TRUE)]
# mismatches and bulged are subsets of unpaired
gr[hasAcceptorStem(gr, mismatches = TRUE)]
gr[hasAcceptorStem(gr, bulged = TRUE)]
# combination of different structure parameters
gr[hasAcceptorStem(gr, mismatches = TRUE) & 
     hasDloop(gr, length = 8)]

5 Visualization

To get an overview of tRNA feature and compare different datasets, gettRNAFeaturePlots can be used. It accepts a named GRangesList as input. Internally it will calculate a list of features values based on the functions mentioned above and the data contained in the mcols of the GRanges objects. The results can be retrieved without generating plots using gettRNASummary.

# load tRNA data for E. coli and H. sapiens
data("gr_eco", package = "tRNA", envir = environment())
data("gr_human", package = "tRNA", envir = environment())

# get summary plots
grl <- GRangesList(Sce = gr,
                   Hsa = gr_human,
                   Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
plots$length
tRNA length.

Figure 1: tRNA length

plots$tRNAscan_score
tRNAscan-SE scores.

Figure 2: tRNAscan-SE scores

plots$gc
tRNA GC content.

Figure 3: tRNA GC content

plots$tRNAscan_intron
tRNAs with introns.

Figure 4: tRNAs with introns

plots$variableLoop_length
Length of the variable loop.

Figure 5: Length of the variable loop

To check whether features correlate with additional scores, optional arguments can be added to gettRNAFeaturePlots or used from the score column of the GRanges objects. For the first option provided a list of scores with the same dimensions as the GRangesList object as argument scores and for the latter set plotScore = TRUE.

# score column will be used
plots <- gettRNAFeaturePlots(grl,
                             plotScores = TRUE)
plots <- gettRNAFeaturePlots(grl,
                             scores = list(runif(length(grl[[1]]),0,100),
                                           runif(length(grl[[2]]),0,100),
                                           runif(length(grl[[3]]),0,100)))
plots$length
tRNA length and score correlation.

Figure 6: tRNA length and score correlation

plots$variableLoop_length
variable loop length and score correlation.

Figure 7: variable loop length and score correlation

The plots can be of course modified manually and changed to suit your needs.

plots$length$layers <- plots$length$layers[c(-1,-2)]
plots$length + ggplot2::geom_boxplot()
Customized plot switching out the point and violin plot into a boxplot.

Figure 8: Customized plot switching out the point and violin plot into a boxplot

6 Options

The colours of the plots can be customized directly on creation with the following options.

options("tRNA_colour_palette")
## $tRNA_colour_palette
## [1] "Set1"
options("tRNA_colour_yes")
## $tRNA_colour_yes
## [1] "green"
options("tRNA_colour_no")
## $tRNA_colour_no
## [1] "red"

7 Dot bracket annotation

To retrieve detailed information on the base pairing gettRNABasePairing() or getBasePairing() can be used depending on the input type. The DotBracket annotation is expected with pairs of <>{}[]() (Please note the orientation. For <> the orientation is variable, since the tRNAscan files use the >< annotation by default).

head(gettRNABasePairing(gr)[[1]])
head(getBasePairing(gr[1]$tRNA_str)[[1]])

The loop ids for the structure elements can be retrieved with gettRNALoopIDs() or getLoopIDs(). Please not that pseudoloops are currently not correctly reflected in the output of these functions.

gettRNALoopIDs(gr)[[1]]
##  [1]  1  2  3  4  5  5  6  6  6  7  8  9  9  9  9  9  9  9  9  9  9  9  8  7
## [25]  6 10 11 12 13 14 14 14 14 14 14 14 14 14 13 12 11 10  6  6  6  6 15 16
## [49] 17 18 19 19 19 19 19 19 19 19 19 18 17 16 15  6  5  5  4  3  2  1  0
getLoopIDs(gr[1]$tRNA_str)
## $`1`
##  [1]  1  2  3  4  5  5  6  6  6  7  8  9  9  9  9  9  9  9  9  9  9  9  8  7
## [25]  6 10 11 12 13 14 14 14 14 14 14 14 14 14 13 12 11 10  6  6  6  6 15 16
## [49] 17 18 19 19 19 19 19 19 19 19 19 18 17 16 15  6  5  5  4  3  2  1  0

8 Session info

sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] tRNA_1.0.0           GenomicRanges_1.34.0 GenomeInfoDb_1.18.0 
## [4] IRanges_2.16.0       S4Vectors_0.20.0     BiocGenerics_0.28.0 
## [7] BiocStyle_2.10.0    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.19               assertive.properties_0.0-4
##  [3] assertive.types_0.0-3      assertive.data.us_0.0-2   
##  [5] Biostrings_2.50.0          assertthat_0.2.0          
##  [7] rprojroot_1.3-2            digest_0.6.18             
##  [9] R6_2.3.0                   plyr_1.8.4                
## [11] backports_1.1.2            evaluate_0.12             
## [13] assertive.code_0.0-3       ggplot2_3.1.0             
## [15] highr_0.7                  pillar_1.3.0              
## [17] assertive.strings_0.0-3    zlibbioc_1.28.0           
## [19] rlang_0.3.0.1              lazyeval_0.2.1            
## [21] assertive_0.3-5            assertive.data_0.0-1      
## [23] rmarkdown_1.10             labeling_0.3              
## [25] stringr_1.3.1              RCurl_1.95-4.11           
## [27] munsell_0.5.0              compiler_3.5.1            
## [29] xfun_0.4                   pkgconfig_2.0.2           
## [31] htmltools_0.3.6            tidyselect_0.2.5          
## [33] tibble_1.4.2               GenomeInfoDbData_1.2.0    
## [35] bookdown_0.7               assertive.sets_0.0-3      
## [37] codetools_0.2-15           crayon_1.3.4              
## [39] dplyr_0.7.7                withr_2.1.2               
## [41] bitops_1.0-6               grid_3.5.1                
## [43] jsonlite_1.5               assertive.base_0.0-7      
## [45] gtable_0.2.0               magrittr_1.5              
## [47] assertive.models_0.0-2     scales_1.0.0              
## [49] stringi_1.2.4              XVector_0.22.0            
## [51] assertive.matrices_0.0-1   bindrcpp_0.2.2            
## [53] assertive.reflection_0.0-4 assertive.datetimes_0.0-2 
## [55] RColorBrewer_1.1-2         tools_3.5.1               
## [57] glue_1.3.0                 purrr_0.2.5               
## [59] assertive.numbers_0.0-2    yaml_2.2.0                
## [61] colorspace_1.3-2           BiocManager_1.30.3        
## [63] assertive.files_0.0-2      assertive.data.uk_0.0-2   
## [65] knitr_1.20                 bindr_0.1.1