1 Introduction

tRNAscan-SE (Lowe and Eddy 1997) can be used to predict tRNA genes in whole genomes based on sequence context and calculated structural features. Many tRNA annotations in genomes, for example the current SGD reference genome sacCer3 of Saccharomyces cerevisiae, contain or are based on information generated by tRNAscan-SE. However, not all of the information produced by tRNAscan-SE ends up in the genome annotation. Missing items include, for example, the secondary structure, additional scores, and whether the conserved CCA end is encoded in the genomic DNA. To work with this complete set of information, the tRNAscan-SE output can be parsed into a more accessible GRanges object using tRNAscanImport.

2 Getting started

The default tRNAscan-SE output, obtained either by running tRNAscan-SE (Lowe and Eddy 1997) locally or by downloading it from GtRNAdb (Chan and Lowe 2016), consists of a formatted text document with one text block per tRNA; blocks are delimited by an empty line.
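The block layout described above can be illustrated with a short base-R sketch that splits the lines of such a file into one character vector per tRNA. `split_tRNAscan_blocks()` is a hypothetical helper, not part of tRNAscanImport, and it assumes only that blocks are separated by one or more blank lines. The example input reuses the four header lines of the first sacCer3 block shown below.

```r
# Hypothetical helper (not part of tRNAscanImport): split the lines of a
# tRNAscan-SE secondary-structure file into one block per tRNA, assuming
# blocks are separated by one or more blank lines.
split_tRNAscan_blocks <- function(lines) {
  is_blank <- !nzchar(trimws(lines))
  # every blank line increments the block id; blank lines themselves are dropped
  block_id <- cumsum(is_blank)
  blocks <- split(lines[!is_blank], block_id[!is_blank])
  unname(blocks)
}

# header lines of the first sacCer3 block (taken from the output below)
block <- c("chrI.trna1 (139152-139254)\tLength: 103 bp",
           "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1",
           "Possible intron: 37-67 (139188-139218)",
           "HMM Sc=37.90\tSec struct Sc=24.20")

# two copies of the same block, separated and followed by blank lines
blocks <- split_tRNAscan_blocks(c(block, "", block, ""))
length(blocks)
## [1] 2
```

Using `cumsum()` over the blank-line flags assigns a block id to every line in one vectorized step, so consecutive blank lines or a trailing blank line do not produce empty blocks.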

library(tRNAscanImport)
sacCer3_file <- system.file("extdata", 
                            file = "sacCer3-tRNAs.ss.sort", 
                            package = "tRNAscanImport")

# output for sacCer3
# Before
readLines(con = sacCer3_file, n = 7L)
## [1] "chrI.trna1 (139152-139254)\tLength: 103 bp"                                                                  
## [2] "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1"                                             
## [3] "Possible intron: 37-67 (139188-139218)"                                                                      
## [4] "HMM Sc=37.90\tSec struct Sc=24.20"                                                                           
## [5] "         *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |    *    |  " 
## [6] "Seq: GGGCGTGTGGTCTAGTGGTATGATTCTCGCTTTGGGcgacttcctgattaaacaggaagacaaagcaTGCGAGAGGcCCTGGGTTCAATTCCCAGCTCGCCCC"
## [7] "Str: >>>>>.>..>>>.........<<<.>>>>>......................................<<<<<.....>>>>>.......<<<<<<.<<<<<."
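In the Str line, `>` marks a base on the 5' side of a helix that pairs with a matching `<` further downstream, and `.` marks an unpaired base, similar to dot-bracket notation. The following stack-based sketch turns such a string into a table of pairing partners. `pair_table()` is a hypothetical helper, and the input is the 72-character structure listed for the first sacCer3 tRNA in the tRNA_str column in Section 3.

```r
# Hypothetical helper (not part of tRNAscanImport): convert a tRNAscan-SE
# structure string ('>' = 5' side of a helix, '<' = 3' side, '.' = unpaired)
# into a vector holding the partner position of every base (0 = unpaired).
pair_table <- function(str) {
  chars <- strsplit(str, "")[[1]]
  partner <- integer(length(chars))
  stack <- integer(0)
  for (i in seq_along(chars)) {
    if (chars[i] == ">") {
      stack <- c(stack, i)                   # remember the opening position
    } else if (chars[i] == "<") {
      if (length(stack) == 0L) stop("unbalanced structure at position ", i)
      j <- stack[length(stack)]              # innermost unmatched '>'
      stack <- stack[-length(stack)]
      partner[c(i, j)] <- c(j, i)            # record the pair in both directions
    }
  }
  if (length(stack) > 0L) stop("unbalanced structure: unmatched '>'")
  partner
}

# structure of the first sacCer3 tRNA as listed in the tRNA_str column
str1 <- ">>>>>.>..>>>.........<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<.<<<<<."
pt <- pair_table(str1)
sum(pt > 0) / 2   # number of base pairs
```

Because this notation has a single bracket type, it cannot express crossing pairs, so matching each `<` with the most recent unmatched `>` via a stack is sufficient.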

3 Importing as GRanges

To make this information accessible in a Bioconductor context, importing it as a GRanges object is the natural choice. import.tRNAscanAsGRanges() does this by parsing each text block with regular expressions.
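To illustrate what parsing a block with regular expressions involves, here is a minimal base-R sketch that extracts a subset of the fields from the four header lines of the first sacCer3 block. The helpers `capture()` and `parse_tRNAscan_header()` are hypothetical and are not the package's implementation. The patterns only cover the line layout shown in Section 2, and the minus-strand rule (start > end in the first line) is an assumption about tRNAscan-SE's coordinate convention.

```r
# Hypothetical helpers (not the tRNAscanImport implementation).
# capture(): capture groups of the first line matching `pattern`, or NULL.
capture <- function(lines, pattern) {
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0L) return(NULL)
  regmatches(hit[1], regexec(pattern, hit[1]))[[1]][-1]
}

parse_tRNAscan_header <- function(lines) {
  # "chrI.trna1 (139152-139254)\tLength: 103 bp"
  pos <- capture(lines, "^(.+)\\.trna([0-9]+) \\(([0-9]+)-([0-9]+)\\)\tLength: ([0-9]+) bp")
  # "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1"
  typ <- capture(lines, "^Type: ([^\t]+)\tAnticodon: ([A-Za-z]+) at ([0-9]+)-([0-9]+).*Score: ([0-9.]+)")
  # optional: "Possible intron: 37-67 (139188-139218)"
  intr <- capture(lines, "^Possible intron: ([0-9]+)-([0-9]+) \\(([0-9]+)-([0-9]+)\\)")
  # optional: "HMM Sc=37.90\tSec struct Sc=24.20"
  hmm <- capture(lines, "^HMM Sc=([0-9.]+)\tSec struct Sc=([0-9.]+)")
  if (is.null(pos) || is.null(typ)) stop("unexpected block layout")
  from <- as.integer(pos[3])
  to <- as.integer(pos[4])
  list(seqname = pos[1],
       no = as.integer(pos[2]),
       start = min(from, to),
       end = max(from, to),
       # assumption: minus-strand hits are reported with start > end
       strand = if (from <= to) "+" else "-",
       tRNA_length = as.integer(pos[5]),
       tRNA_type = typ[1],
       tRNA_anticodon = typ[2],
       anticodon_start = as.integer(typ[3]),
       anticodon_end = as.integer(typ[4]),
       score = as.numeric(typ[5]),
       intron_locstart = if (is.null(intr)) NA_integer_ else as.integer(intr[1]),
       intron_locend = if (is.null(intr)) NA_integer_ else as.integer(intr[2]),
       hmm_score = if (is.null(hmm)) NA_real_ else as.numeric(hmm[1]),
       sec_str_score = if (is.null(hmm)) NA_real_ else as.numeric(hmm[2]))
}

block <- c("chrI.trna1 (139152-139254)\tLength: 103 bp",
           "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1",
           "Possible intron: 37-67 (139188-139218)",
           "HMM Sc=37.90\tSec struct Sc=24.20")
b <- parse_tRNAscan_header(block)
b$tRNA_anticodon
## [1] "TGG"
c(b$start, b$end)
## [1] 139152 139254
```

Optional lines (intron, HMM scores) yield `NA` when they are absent. Other lines that some tRNAscan-SE versions print (the GRanges output below has a tRNAscan_infernal column, for instance) are simply ignored by this sketch.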

# output for sacCer3
# After
gr <- import.tRNAscanAsGRanges(sacCer3_file)
head(gr, 2)
## GRanges object with 2 ranges and 18 metadata columns:
##       seqnames        ranges strand |        no tRNA_length   tRNA_type
##          <Rle>     <IRanges>  <Rle> | <integer>   <integer> <character>
##   [1]     chrI 139152-139254      + |         1         103         Pro
##   [2]     chrI 166267-166339      + |         2          73         Ala
##       tRNA_anticodon tRNA_anticodon.start tRNA_anticodon.end tRNAscan_score
##          <character>            <integer>          <integer>      <numeric>
##   [1]            TGG                   33                 35           62.1
##   [2]            TGC                   34                 36             76
##                      tRNA_seq
##                <DNAStringSet>
##   [1] GGGCGTGTGG...AGCTCGCCCC
##   [2] GGGCACATGG...GTTGCGTCCA
##                                                                        tRNA_str
##                                                                     <character>
##   [1]  >>>>>.>..>>>.........<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<.<<<<<.
##   [2] >>>>.>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<.<<<<.
##       tRNA_CCA.end tRNAscan_potential.pseudogene tRNAscan_intron.start
##          <logical>                     <logical>             <integer>
##   [1]        FALSE                          TRUE                139188
##   [2]        FALSE                          TRUE                  <NA>
##       tRNAscan_intron.end tRNAscan_intron.locstart tRNAscan_intron.locend
##                 <integer>                <integer>              <integer>
##   [1]              139218                       37                     67
##   [2]                <NA>                     <NA>                   <NA>
##       tRNAscan_hmm.score tRNAscan_sec.str.score tRNAscan_infernal
##                <numeric>              <numeric>         <numeric>
##   [1]               37.9                   24.2              <NA>
##   [2]               53.4                   22.6              <NA>
##   -------
##   seqinfo: 17 sequences from an unspecified genome; no seqlengths
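For the first tRNA, tRNA_length (103) includes the intron, whereas the tRNA_str value shown above has 72 characters, which is 103 minus the 31 nt of the intron at local positions 37-67 (tRNAscan_intron.locstart and tRNAscan_intron.locend). A base-R sketch of splicing the intron out of the raw Seq line using these local coordinates could look like this. `remove_intron()` is a hypothetical helper, and that the package processes sequences in exactly this way is an assumption.

```r
# Hypothetical helper: remove an intron given by 1-based, inclusive local
# coordinates from a tRNA sequence stored as a single character string.
remove_intron <- function(seq, locstart, locend) {
  if (is.na(locstart) || is.na(locend)) return(seq)   # no intron annotated
  paste0(substr(seq, 1L, locstart - 1L),
         substr(seq, locend + 1L, nchar(seq)))
}

# raw "Seq:" line of the first sacCer3 tRNA (103 nt, intron at 37-67)
raw_seq <- "GGGCGTGTGGTCTAGTGGTATGATTCTCGCTTTGGGcgacttcctgattaaacaggaagacaaagcaTGCGAGAGGcCCTGGGTTCAATTCCCAGCTCGCCCC"
mature <- remove_intron(raw_seq, 37L, 67L)
nchar(mature)
## [1] 72
```

Cutting by coordinates rather than by letter case is deliberate: the raw sequence also contains a lowercase base outside the intron, so dropping all lowercase letters would remove too much.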

The result can be used directly in R or saved as GFF3 or FASTA files for further use, for example to prepare the sequences for HTS read mapping or for a statistical analysis of the tRNA content of the analyzed genome.

library(Biostrings)
library(rtracklayer)
# save the tRNA sequences as FASTA
writeXStringSet(gr$tRNA_seq, filepath = tempfile(fileext = ".fasta"))
# for GFF3-compliant output, convert with tRNAscan2GFF() first
gff <- tRNAscan2GFF(gr)
export.gff3(gff, con = tempfile(fileext = ".gff3"))
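GFF3 requires nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with `.` for empty values and attributes written as `tag=value` pairs separated by `;`. The following base-R sketch formats one such line using values from the first sacCer3 record above. `gff3_line()` is hypothetical, and the source and type labels and the attribute tags are illustrative assumptions, not necessarily what tRNAscan2GFF() writes.

```r
# Hypothetical helper: format one GFF3 feature line (nine tab-separated columns).
# The source/type labels and attribute tags used below are illustrative only.
gff3_line <- function(seqid, source, type, start, end, score = NA,
                      strand = ".", phase = ".", attributes = list()) {
  # percent-encode characters with a reserved meaning in column 9
  esc <- function(x) {
    x <- gsub("%", "%25", x, fixed = TRUE)   # escape '%' first
    x <- gsub(";", "%3B", x, fixed = TRUE)
    x <- gsub("=", "%3D", x, fixed = TRUE)
    x <- gsub("&", "%26", x, fixed = TRUE)
    x <- gsub(",", "%2C", x, fixed = TRUE)
    gsub("\t", "%09", x, fixed = TRUE)
  }
  attr_str <- if (length(attributes) == 0L) "." else
    paste(paste0(names(attributes), "=",
                 vapply(attributes, function(v) esc(as.character(v)), character(1))),
          collapse = ";")
  # integer coordinates avoid scientific notation such as "1e+05"
  paste(seqid, source, type, as.integer(start), as.integer(end),
        if (is.na(score)) "." else score,
        strand, phase, attr_str, sep = "\t")
}

line <- gff3_line("chrI", "tRNAscan-SE", "tRNA", 139152, 139254,
                  score = 62.1, strand = "+",
                  attributes = list(ID = "chrI.trna1", tRNA_type = "Pro",
                                    tRNA_anticodon = "TGG"))
cat(line, "\n")
```

Custom attribute tags start with a lowercase letter here because GFF3 reserves tags beginning with an uppercase letter (such as `ID`) for predefined meanings.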

4 Visualization

The tRNAscan-SE information can be visualized with gettRNAscanPlots(), which returns a named list of ggplot2 plots that can be printed or modified further; this requires ggplot2 to be installed. Alternatively, gettRNAscanSummary() returns the aggregated information for further use, and plottRNAscan() plots the output of gettRNAscanPlots() directly.

library(GenomicRanges)
# tRNAscan-SE output for hg38
hg38_file <- system.file("extdata", 
                         file = "hg38-tRNAs.ss.sort", 
                         package = "tRNAscanImport")
# tRNAscan-SE output for E. coli MG1655
eco_file <- system.file("extdata", 
                        file = "eschColi_K_12_MG1655-tRNAs.ss.sort", 
                        package = "tRNAscanImport")
# import tRNAscan-SE files
gr_hg <- import.tRNAscanAsGRanges(hg38_file)
gr_eco <- import.tRNAscanAsGRanges(eco_file)

# summary plots (ggplot2 must be installed)
grl <- GRangesList(Sce = gr, 
                   Hsa = gr_hg, 
                   Eco = gr_eco)
plots <- gettRNAscanPlots(grl)
## Loading required namespace: ggplot2
plots$length

Figure 1: tRNA length

plots$tRNAscan_score

Figure 2: tRNAscan-SE scores

plots$gc

Figure 3: tRNA GC content
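Per tRNA, the GC content summarised in plots$gc is presumably the fraction of G and C among the bases of tRNA_seq. A base-R sketch of that calculation follows; `gc_content()` is a hypothetical helper, and how gettRNAscanPlots() computes the value (for example, whether introns are included) is not shown here.

```r
# Hypothetical helper: GC content as the fraction of G/C among A/C/G/T,
# case-insensitive; any other characters (e.g. N) are ignored.
gc_content <- function(seqs) {
  vapply(toupper(seqs), function(s) {
    bases <- strsplit(s, "")[[1]]
    acgt <- bases[bases %in% c("A", "C", "G", "T")]
    if (length(acgt) == 0L) return(NA_real_)   # no countable bases
    mean(acgt %in% c("G", "C"))
  }, numeric(1), USE.NAMES = FALSE)
}

gc_content(c("GGCC", "ATGC", "AATT"))
## [1] 1.0 0.5 0.0
```

With Biostrings loaded, `letterFrequency(gr$tRNA_seq, "GC", as.prob = TRUE)` gives a comparable per-sequence value directly on the DNAStringSet.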

plots$introns

Figure 4: tRNAs with introns

5 Session info

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] rtracklayer_1.40.1   Biostrings_2.48.0    XVector_0.20.0      
##  [4] tRNAscanImport_1.0.1 GenomicRanges_1.32.1 GenomeInfoDb_1.16.0 
##  [7] IRanges_2.14.3       S4Vectors_0.18.1     BiocGenerics_0.26.0 
## [10] BiocStyle_2.8.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16                lattice_0.20-35            
##  [3] assertive.properties_0.0-4  Rsamtools_1.32.0           
##  [5] assertive.types_0.0-3       assertive.data.us_0.0-1    
##  [7] rprojroot_1.3-2             digest_0.6.15              
##  [9] plyr_1.8.4                  backports_1.1.2            
## [11] evaluate_0.10.1             assertive.code_0.0-1       
## [13] highr_0.6                   ggplot2_2.2.1              
## [15] pillar_1.2.2                assertive.strings_0.0-3    
## [17] zlibbioc_1.26.0             rlang_0.2.0                
## [19] lazyeval_0.2.1              Matrix_1.2-14              
## [21] assertive_0.3-5             assertive.data_0.0-1       
## [23] rmarkdown_1.9               labeling_0.3               
## [25] BiocParallel_1.14.0         stringr_1.3.0              
## [27] RCurl_1.95-4.10             munsell_0.4.3              
## [29] DelayedArray_0.6.0          compiler_3.5.0             
## [31] xfun_0.1                    htmltools_0.3.6            
## [33] SummarizedExperiment_1.10.0 tibble_1.4.2               
## [35] GenomeInfoDbData_1.1.0      bookdown_0.7               
## [37] assertive.sets_0.0-3        codetools_0.2-15           
## [39] matrixStats_0.53.1          XML_3.98-1.11              
## [41] GenomicAlignments_1.16.0    bitops_1.0-6               
## [43] grid_3.5.0                  assertive.base_0.0-7       
## [45] gtable_0.2.0                magrittr_1.5               
## [47] assertive.models_0.0-1      scales_0.5.0               
## [49] stringi_1.2.2               reshape2_1.4.3             
## [51] assertive.matrices_0.0-1    assertive.reflection_0.0-4 
## [53] assertive.datetimes_0.0-2   RColorBrewer_1.1-2         
## [55] tools_3.5.0                 Biobase_2.40.0             
## [57] assertive.numbers_0.0-2     yaml_2.1.19                
## [59] colorspace_1.3-2            assertive.files_0.0-2      
## [61] assertive.data.uk_0.0-1     knitr_1.20

References

Chan, Patricia P., and Todd M. Lowe. 2016. “GtRNAdb 2.0: An Expanded Database of Transfer RNA Genes Identified in Complete and Draft Genomes.” Nucleic Acids Research 44 (D1):D184–9. https://doi.org/10.1093/nar/gkv1309.

Lowe, Todd M., and Sean R. Eddy. 1997. “tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence.” Nucleic Acids Research 25 (5):955–64.