tRNAscanImport 1.6.0
tRNAscan-SE (Lowe and Eddy 1997) predicts tRNA genes in whole genomes based on
sequence context and calculated structural features. Many tRNA annotations in
genomes contain or are based on information generated by tRNAscan-SE, for
example the current SGD reference genome sacCer3 for Saccharomyces cerevisiae.
However, not all information available from tRNAscan-SE ends up in the genome
annotation. Missing items include structural information, additional scores,
and whether the conserved CCA-end is encoded in the genomic DNA. To work with
this complete set of information, the tRNAscan-SE output can be parsed into a
more accessible GRanges object using tRNAscanImport.
The default tRNAscan-SE output, obtained either by running tRNAscan-SE (Lowe and Eddy 1997) locally or by retrieving the output from GtRNAdb (Chan and Lowe 2016), consists of a formatted text document containing one text block per tRNA, with blocks delimited by an empty line.
library(tRNAscanImport)
yeast_file <- system.file("extdata",
                          file = "yeast.tRNAscan",
                          package = "tRNAscanImport")
# output for sacCer3
# Before
readLines(con = yeast_file, n = 7L)
## [1] "chrI.trna1 (139152-139254)\tLength: 103 bp"
## [2] "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1"
## [3] "Possible intron: 37-67 (139188-139218)"
## [4] "HMM Sc=37.90\tSec struct Sc=24.20"
## [5] " * | * | * | * | * | * | * | * | * | * | "
## [6] "Seq: GGGCGTGTGGTCTAGTGGTATGATTCTCGCTTTGGGcgacttcctgattaaacaggaagacaaagcaTGCGAGAGGcCCTGGGTTCAATTCCCAGCTCGCCCC"
## [7] "Str: >>>>>.>..>>>.........<<<.>>>>>......................................<<<<<.....>>>>>.......<<<<<<.<<<<<."
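The block structure described above (one text block per tRNA, separated by empty lines) can be illustrated with a minimal base R sketch. The helper `split_trnascan_blocks()` is hypothetical and not part of tRNAscanImport. The second record in the example input is made up for illustration only.

```r
# Hypothetical helper (not part of tRNAscanImport): split tRNAscan-SE
# output lines into one character vector per tRNA record.
split_trnascan_blocks <- function(lines) {
  is_blank <- !nzchar(trimws(lines))
  # Every blank line increments the counter, so all lines of one record
  # share the same block id
  block_id <- cumsum(is_blank)
  unname(split(lines[!is_blank], block_id[!is_blank]))
}

example_lines <- c(
  "chrI.trna1 (139152-139254)\tLength: 103 bp",
  "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1",
  "",
  # made-up second record, for illustration only
  "toy.trna2 (1-72)\tLength: 72 bp",
  "Type: Ala\tAnticodon: TGC at 34-36 (34-36)\tScore: 50.0",
  ""
)
blocks <- split_trnascan_blocks(example_lines)
length(blocks)
```

With a real file, the input would be `readLines(yeast_file)`. Each element of the result can then be evaluated on its own.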
To work with this information in a Bioconductor context, importing it as a
GRanges object is a natural choice. import.tRNAscanAsGRanges() performs this
task by evaluating each text block using regular expressions.
# output for sacCer3
# After
gr <- import.tRNAscanAsGRanges(yeast_file)
head(gr, 2)
## GRanges object with 2 ranges and 18 metadata columns:
## seqnames ranges strand | no tRNA_length tRNA_type
## <Rle> <IRanges> <Rle> | <integer> <integer> <character>
## [1] chrI 139152-139254 + | 1 72 Pro
## [2] chrI 166267-166339 + | 2 73 Ala
## tRNA_anticodon tRNA_anticodon.start tRNA_anticodon.end tRNAscan_score
## <character> <integer> <integer> <numeric>
## [1] TGG 33 35 62.1
## [2] TGC 34 36 76
## tRNA_seq tRNA_str tRNA_CCA.end
## <DNAStringSet> <DotBracketStringSet> <logical>
## [1] GGGCGTGTGG...AGCTCGCCCC <<<<<.<..<...>>>.>>>>>. FALSE
## [2] GGGCACATGG...GTTGCGTCCA <<<<.<<..<...>>>>.>>>>. FALSE
## tRNAscan_potential.pseudogene tRNAscan_intron.start
## <logical> <integer>
## [1] FALSE 139188
## [2] FALSE <NA>
## tRNAscan_intron.end tRNAscan_intron.locstart tRNAscan_intron.locend
## <integer> <integer> <integer>
## [1] 139218 37 67
## [2] <NA> <NA> <NA>
## tRNAscan_hmm.score tRNAscan_sec.str.score tRNAscan_infernal
## <numeric> <numeric> <numeric>
## [1] 37.9 24.2 <NA>
## [2] 53.4 22.6 <NA>
## -------
## seqinfo: 17 sequences from an unspecified genome; no seqlengths
# Any GRanges passing this check can be used in the subsequent functions
istRNAscanGRanges(gr)
## [1] TRUE
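To make the regular-expression step more concrete, the following base R sketch parses the header lines of a single text block. It is not the package's implementation: the helper `parse_trnascan_block()`, its patterns, and its column names are assumptions chosen for illustration. It also assumes that minus-strand hits are reported with a start coordinate larger than the end coordinate.

```r
# Hypothetical helper, not part of tRNAscanImport: parse the header lines
# of one tRNAscan-SE text block with base R regular expressions.
parse_trnascan_block <- function(block) {
  # Return the capture groups of `pattern` in `line` (without the full match)
  capture <- function(pattern, line) {
    m <- regmatches(line, regexec(pattern, line))[[1]]
    if (length(m) == 0L) stop("unexpected line format: ", line)
    m[-1]
  }
  loc <- capture(
    "^(.+)\\.trna([0-9]+) \\(([0-9]+)-([0-9]+)\\)[ \t]+Length: ([0-9]+) bp",
    block[1])
  type <- capture(
    paste0("^Type: ([^ \t]+)[ \t]+Anticodon: ([^ \t]+) at ([0-9]+)-([0-9]+) ",
           "\\(([0-9]+)-([0-9]+)\\)[ \t]+Score: ([0-9.]+)"),
    block[2])
  # The intron line is optional
  intron_line <- grep("^Possible intron:", block, value = TRUE)
  intron <- if (length(intron_line) > 0L) {
    as.integer(capture(
      "^Possible intron: ([0-9]+)-([0-9]+) \\(([0-9]+)-([0-9]+)\\)",
      intron_line[1]))
  } else {
    rep(NA_integer_, 4)
  }
  coords <- as.integer(loc[3:4])
  data.frame(
    seqnames = loc[1],
    no = as.integer(loc[2]),
    start = min(coords),
    end = max(coords),
    # Assumption: minus-strand hits are reported with start > end
    strand = if (coords[1] <= coords[2]) "+" else "-",
    length = as.integer(loc[5]),
    type = type[1],
    anticodon = type[2],
    score = as.numeric(type[7]),
    intron_locstart = intron[1],
    intron_locend = intron[2],
    intron_start = intron[3],
    intron_end = intron[4],
    stringsAsFactors = FALSE
  )
}

# First four lines of the sacCer3 record shown earlier
block <- c(
  "chrI.trna1 (139152-139254)\tLength: 103 bp",
  "Type: Pro\tAnticodon: TGG at 33-35 (139184-139186)\tScore: 62.1",
  "Possible intron: 37-67 (139188-139218)",
  "HMM Sc=37.90\tSec struct Sc=24.20"
)
parsed <- parse_trnascan_block(block)
parsed[, c("seqnames", "start", "end", "strand", "type", "anticodon")]
```

The sketch ignores the score, sequence, and structure lines. A complete parser would need additional patterns for those and for pseudogene annotations.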
The result can be used directly in R or saved as a GFF3 or FASTA file for further use, for example to process the sequences for HTS read mapping or to run a statistical analysis of the tRNA content of the analyzed genome.
library(Biostrings)
library(rtracklayer)
# suppressMessages(library(rtracklayer, quietly = TRUE))
# Save tRNA sequences
writeXStringSet(gr$tRNA_seq, filepath = tempfile())
# to be GFF3 compliant use tRNAscan2GFF
gff <- tRNAscan2GFF(gr)
export.gff3(gff, con = tempfile())
The tRNAscan-SE information can be visualized using the gettRNAFeaturePlots()
function of the tRNA package, which returns a named list of ggplot2 plots that
can be plotted or further modified. Alternatively, gettRNASummary() returns
the aggregated information for further use.
# tRNAscan-SE output for hg38
human_file <- system.file("extdata",
                          file = "human.tRNAscan",
                          package = "tRNAscanImport")
# tRNAscan-SE output for E. coli MG1655
eco_file <- system.file("extdata",
                        file = "ecoli.tRNAscan",
                        package = "tRNAscanImport")
# import tRNAscan-SE files
gr_human <- import.tRNAscanAsGRanges(human_file)
gr_eco <- import.tRNAscanAsGRanges(eco_file)
# get summary plots
grl <- GRangesList(Sce = gr,
                   Hsa = gr_human,
                   Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
plots$length
plots$tRNAscan_score
plots$gc
plots$tRNAscan_intron
plots$variableLoop_length
Since tRNAscan-SE reports the genomic location of each tRNA found, approximate
tRNA precursor sequences can be retrieved by passing the imported GRanges
object together with the matching genomic sequences to the function
get.tRNAprecursor().
library(BSgenome.Scerevisiae.UCSC.sacCer3)
genome <- getSeq(BSgenome.Scerevisiae.UCSC.sacCer3)
# rename the last (17th) chromosome to match the tRNAscan-SE output
names(genome) <- c(names(genome)[-17], "chrmt")
tRNAprecursor <- get.tRNAprecursor(gr, genome)
## Warning in lengths(comp[strandM]) - add.5prime: longer object length is not a
## multiple of shorter object length
## Warning in recycleSingleBracketReplacementValue(value, x, nsbs): number of
## values supplied is not a sub-multiple of the number of values to be replaced
head(tRNAprecursor)
## A DNAStringSet instance of length 6
## width seq names
## [1] 203 CAATTTGTATATATATACATCT...TTAAAGTAGCAGTACTTCAAC pre_chrI.tRNA1
## [2] 173 AGCTTCTAAGCACTTACCATTC...TTCGTGAATAGCTGACTGTCA pre_chrI.tRNA2
## [3] 214 GTCAGTGTCCAAATAGTTAAAA...TAATCTACGTAGGAATGAAAG pre_chrI.tRNA3
## [4] 182 GTCATACTGACATATCTCATTT...TCTTCAAAGCATACTCATCTT pre_chrI.tRNA4
## [5] 184 GGGTAAAATAGGGTATTTAACT...TAACTAGAATAATAGGGAAAT pre_chrII.tRNA1
## [6] 191 TTTGCTAATAATAAATCTATTT...CATTTCTAGGCCTGTTTCTCC pre_chrII.tRNA2
The length of the overhangs can be defined with the arguments add.5prime and
add.3prime, respectively. Both support individual lengths for each tRNA and
require integer values. In addition, introns can be removed by setting
trim.introns = TRUE.
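How 5' and 3' overhangs relate to strand can be sketched in base R on a plain character string. This is an illustration under stated assumptions, not the code used by get.tRNAprecursor(): the helper `get_precursor_sketch()` and its argument names are hypothetical, the genome is a toy string, and intron trimming is not covered.

```r
# Hypothetical helper: extract a tRNA range plus overhangs from a genome
# given as a single string, reverse-complementing minus-strand hits.
get_precursor_sketch <- function(genome, start, end, strand = "+",
                                 add_5prime = 50L, add_3prime = 50L) {
  stopifnot(is.integer(add_5prime), is.integer(add_3prime))
  # On the minus strand the 5' overhang lies to the right of the range
  left  <- if (strand == "+") add_5prime else add_3prime
  right <- if (strand == "+") add_3prime else add_5prime
  # Clamp to the sequence boundaries
  from <- max(1L, start - left)
  to   <- min(nchar(genome), end + right)
  s <- substr(genome, from, to)
  if (strand == "-") {
    s <- chartr("ACGT", "TGCA", s)
    s <- paste(rev(strsplit(s, "")[[1]]), collapse = "")
  }
  s
}

toy_genome <- "AAAACCCCGGGGTTTT"
plus <- get_precursor_sketch(toy_genome, start = 5L, end = 8L, strand = "+",
                             add_5prime = 2L, add_3prime = 3L)
minus <- get_precursor_sketch(toy_genome, start = 5L, end = 8L, strand = "-",
                              add_5prime = 2L, add_3prime = 3L)
plus   # "AACCCCGGG"
minus  # "CCGGGGTTT"
```

In both cases the result starts with two overhang nucleotides at the 5' end and finishes with three at the 3' end. Only the genomic side from which each overhang is taken differs.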
Further examples of working with tRNA information can be found in the
vignette of the tRNA package.
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BSgenome.Scerevisiae.UCSC.sacCer3_1.4.0
## [2] BSgenome_1.54.0
## [3] rtracklayer_1.46.0
## [4] tRNAscanImport_1.6.0
## [5] tRNA_1.4.0
## [6] Structstrings_1.2.0
## [7] Biostrings_2.54.0
## [8] XVector_0.26.0
## [9] GenomicRanges_1.38.0
## [10] GenomeInfoDb_1.22.0
## [11] IRanges_2.20.0
## [12] S4Vectors_0.24.0
## [13] BiocGenerics_0.32.0
## [14] BiocStyle_2.14.0
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.46.0 assertive.sets_0.0-3
## [3] assertthat_0.2.1 assertive.data_0.0-3
## [5] BiocManager_1.30.9 highr_0.8
## [7] GenomeInfoDbData_1.2.2 Rsamtools_2.2.0
## [9] yaml_2.2.0 pillar_1.4.2
## [11] lattice_0.20-38 glue_1.3.1
## [13] assertive.data.uk_0.0-2 assertive.matrices_0.0-2
## [15] digest_0.6.22 assertive.types_0.0-3
## [17] RColorBrewer_1.1-2 colorspace_1.4-1
## [19] htmltools_0.4.0 Matrix_1.2-17
## [21] XML_3.98-1.20 pkgconfig_2.0.3
## [23] assertive.data.us_0.0-2 assertive.properties_0.0-4
## [25] assertive.reflection_0.0-4 bookdown_0.14
## [27] zlibbioc_1.32.0 purrr_0.3.3
## [29] scales_1.0.0 assertive_0.3-5
## [31] BiocParallel_1.20.0 tibble_2.1.3
## [33] ggplot2_3.2.1 withr_2.1.2
## [35] SummarizedExperiment_1.16.0 assertive.code_0.0-3
## [37] lazyeval_0.2.2 magrittr_1.5
## [39] crayon_1.3.4 assertive.strings_0.0-3
## [41] evaluate_0.14 Modstrings_1.2.0
## [43] assertive.numbers_0.0-2 tools_3.6.1
## [45] matrixStats_0.55.0 assertive.files_0.0-2
## [47] stringr_1.4.0 munsell_0.5.0
## [49] DelayedArray_0.12.0 compiler_3.6.1
## [51] rlang_0.4.1 grid_3.6.1
## [53] RCurl_1.95-4.12 assertive.models_0.0-2
## [55] assertive.base_0.0-7 bitops_1.0-6
## [57] labeling_0.3 rmarkdown_1.16
## [59] gtable_0.3.0 codetools_0.2-16
## [61] assertive.datetimes_0.0-2 R6_2.4.0
## [63] GenomicAlignments_1.22.0 knitr_1.25
## [65] dplyr_0.8.3 stringi_1.4.3
## [67] Rcpp_1.0.2 tidyselect_0.2.5
## [69] xfun_0.10
Chan, Patricia P., and Todd M. Lowe. 2016. “GtRNAdb 2.0: An Expanded Database of Transfer RNA Genes Identified in Complete and Draft Genomes.” Nucleic Acids Research 44 (D1):D184–9. https://doi.org/10.1093/nar/gkv1309.
Lowe, Todd M., and Sean R. Eddy. 1997. “tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence.” Nucleic Acids Research 25 (5):955–64.