GRanges
tRNAdbImport 1.8.0
The tRNAdb and mttRNAdb (Jühling et al. 2009) are compilations of tRNA sequences and tRNA genes. They are a follow-up to the database of Sprinzl et al. (Sprinzl and Vassilenko 2005).
Using tRNAdbImport, the tRNAdb can be accessed as outlined on the website http://trna.bioinf.uni-leipzig.de/ and the results are returned as a GRanges object.
GRanges
library(tRNAdbImport)
# accessing tRNAdb
# tRNA from yeast for Alanine and Phenylalanine
gr <- import.tRNAdb(organism = "Saccharomyces cerevisiae",
                    aminoacids = c("Phe","Ala"))
# get a Phenylalanine tRNA from yeast
gr <- import.tRNAdb.id(tdbID = gr[gr$tRNA_type == "Phe",][1L]$tRNAdb_ID)
# find the same tRNA via blast
gr <- import.tRNAdb.blast(blastSeq = gr$tRNA_seq)
# accessing mttRNAdb
# get the mitochondrial tRNA for Alanine in Bos taurus
gr <- import.mttRNAdb(organism = "Bos taurus",
                      aminoacids = "Ala")
# get one mitochondrial tRNA in Bos taurus
gr <- import.mttRNAdb.id(mtdbID = gr[1L]$tRNAdb_ID)
# check that the result has the appropriate columns
istRNAdbGRanges(gr)
## [1] TRUE
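Because every import function queries the remote server, a script may want to tolerate a failed request. The following is a minimal sketch under that assumption; `safe_import` is a hypothetical helper written for this illustration and is not part of tRNAdbImport.

```r
library(tRNAdbImport)

# Hypothetical helper (not part of tRNAdbImport): returns NULL instead of
# stopping when the request fails, e.g. because the server is unreachable.
safe_import <- function(...) {
  tryCatch(
    import.tRNAdb(...),
    error = function(e) {
      message("tRNAdb request failed: ", conditionMessage(e))
      NULL
    }
  )
}

gr_phe <- safe_import(organism = "Saccharomyces cerevisiae",
                      aminoacids = "Phe")

# TRUE only if a result came back and it has the expected tRNAdb columns
ok <- !is.null(gr_phe) && istRNAdbGRanges(gr_phe)
```

Downstream steps can then be skipped when `ok` is FALSE rather than failing partway through an analysis.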
GRanges from the RNA database
The tRNAdb offers two different sets of data, one containing DNA sequences and one containing RNA sequences. Depending on the database selected (DNA is the default), the GRanges will contain a DNAStringSet or a ModRNAStringSet as the tRNA_seq column. Because the RNA sequences can contain modified nucleotides, the ModRNAStringSet class is used instead of the RNAStringSet class to store the sequences correctly with all information intact.
gr <- import.tRNAdb(organism = "Saccharomyces cerevisiae",
                    aminoacids = c("Phe","Ala"),
                    database = "RNA")
gr$tRNA_seq
## A ModRNAStringSet instance of length 3
## width seq names
## [1] 76 GGGCGUGUKGCGUAGDCGGDAGC...TPCGAUUCCGGACUCGUCCACCA tdbR00000012
## [2] 76 GCGGAUUUALCUCAGDDGGGAGA...TPCG"UCCACAGAAUUCGCACCA tdbR00000083
## [3] 76 GCGGACUUALCUCAGDDGGGAGA...TPCG"UCCACAGAGUUCGCACCA tdbR00000084
The special characters in the sequence might not exactly match the ones shown on the website, since they are sanitized internally to a unified dictionary defined in the Modstrings package. However, the type of modification encoded will remain the same (see the Modstrings package for more details).
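To illustrate how a modified nucleotide ends up as a single character in the sequence, here is a minimal sketch that does not query the tRNAdb. The five-nucleotide sequence is made up for illustration; `modifyNucleotides` comes from the Modstrings package.

```r
library(Biostrings)
library(Modstrings)

# A made-up five-nucleotide RNA, used only for illustration
rna <- RNAString("ACGUG")

# Replace the G at position 5 with 7-methylguanosine (short name "m7G")
modrna <- modifyNucleotides(rna, 5L, "m7G")

# The modified position is now stored as one special character
modrna
class(modrna)
```

The sequence keeps its length, and the modification is carried by the character at that position instead of by a separate annotation.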
The information on the position and type of the modifications can also be converted into a tabular format using the separate function from the Modstrings package.
separate(gr$tRNA_seq)
## GRanges object with 38 ranges and 1 metadata column:
## seqnames ranges strand | mod
## <Rle> <IRanges> <Rle> | <character>
## [1] tdbR00000012 9 + | m1G
## [2] tdbR00000012 16 + | D
## [3] tdbR00000012 20 + | D
## [4] tdbR00000012 26 + | m2,2G
## [5] tdbR00000012 34 + | I
## ... ... ... ... . ...
## [34] tdbR00000084 46 + | m7G
## [35] tdbR00000084 49 + | f5Cm
## [36] tdbR00000084 54 + | m5U
## [37] tdbR00000084 55 + | Y
## [38] tdbR00000084 58 + | m1A
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
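The GRanges returned by `separate` can be summarised with ordinary R tools. The sketch below uses a made-up sequence with two modifications instead of live tRNAdb data; the names `mod_table` and `mod_counts` are chosen for this illustration only.

```r
library(Biostrings)
library(Modstrings)

# Made-up sequence with two modifications: dihydrouridine ("D") at
# position 4 and 7-methylguanosine ("m7G") at position 5
modrna <- modifyNucleotides(RNAString("ACGUG"), 4L, "D")
modrna <- modifyNucleotides(modrna, 5L, "m7G")

# One range per modified position, with the short name in the mod column
mods <- separate(modrna)

# Convert to a plain data.frame and count each modification type
mod_table <- as.data.frame(mods)
mod_counts <- table(mods$mod)
```

The same two steps should apply to the result of `separate(gr$tRNA_seq)`, where the counts would then span all imported tRNAs.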
The output can be saved or directly used for further analysis.
library(Biostrings)
library(rtracklayer)
# saving the tRNA sequences as a FASTA file
writeXStringSet(gr$tRNA_seq, filepath = tempfile())
# converting tRNAdb information to GFF compatible values
gff <- tRNAdb2GFF(gr)
gff
## GRanges object with 3 ranges and 20 metadata columns:
## seqnames ranges strand | source type score phase
## <Rle> <IRanges> <Rle> | <factor> <factor> <integer> <integer>
## [1] tdbR00000012 1-76 * | tRNAdb tRNA <NA> <NA>
## [2] tdbR00000083 1-76 * | tRNAdb tRNA <NA> <NA>
## [3] tdbR00000084 1-76 * | tRNAdb tRNA <NA> <NA>
## ID no tRNA_length tRNA_type tRNA_anticodon
## <character> <integer> <integer> <character> <character>
## [1] tdbR00000012 1 76 Ala IGC
## [2] tdbR00000083 2 76 Phe #AA
## [3] tdbR00000084 3 76 Phe #AA
## tRNA_seq tRNA_str tRNA_CCA.end tRNAdb
## <character> <character> <logical> <character>
## [1] GGGCGUGUKGCGUAGDCGGD.. <<<<<.<..<<<<......... TRUE RNA
## [2] GCGGAUUUALCUCAGDDGGG.. <<<<<<<..<<<<......... TRUE RNA
## [3] GCGGACUUALCUCAGDDGGG.. <<<<<<<..<<<<......... TRUE RNA
## tRNAdb_ID tRNAdb_organism tRNAdb_strain tRNAdb_taxonomyID
## <character> <character> <character> <character>
## [1] tdbR00000012 Saccharomyces cerevi.. 4932
## [2] tdbR00000083 Saccharomyces cerevi.. 4932
## [3] tdbR00000084 Saccharomyces cerevi.. 4932
## tRNAdb_verified tRNAdb_reference tRNAdb_pmid
## <logical> <CharacterList> <CharacterList>
## [1] TRUE J.R.PENSWICK, R.MART..
## [2] TRUE P.E.NIELSEN, V.LEICK..
## [3] TRUE G.KEITH, G.DIRHEIMER..
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
# saving the information as a GFF3 file
export.gff3(gff, con = tempfile())
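To check that an exported file can be read back, one option is a round trip through rtracklayer. This sketch uses a made-up single-feature GRanges as a stand-in for the `tRNAdb2GFF()` result, so it runs without contacting the tRNAdb; `feat`, `gff_file` and `back` are illustrative names.

```r
library(GenomicRanges)
library(rtracklayer)

# Made-up single feature standing in for the tRNAdb2GFF() output
feat <- GRanges(seqnames = "tdbR00000012",
                ranges = IRanges(start = 1L, end = 76L),
                strand = "*",
                source = "tRNAdb",
                type = "tRNA",
                ID = "tdbR00000012")

# Write to a temporary GFF3 file and read it back
gff_file <- tempfile(fileext = ".gff3")
export.gff3(feat, con = gff_file)
back <- import.gff3(gff_file)
```

Keeping the file path in a variable, rather than calling `tempfile()` inline, makes it possible to re-import or inspect the file afterwards.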
Please have a look at the tRNA package for further analysis of the tRNA sequences.
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] rtracklayer_1.50.0 tRNAdbImport_1.8.0 tRNA_1.8.0
## [4] Structstrings_1.6.0 Modstrings_1.6.0 Biostrings_2.58.0
## [7] XVector_0.30.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
## [10] IRanges_2.24.0 S4Vectors_0.28.0 BiocGenerics_0.36.0
## [13] BiocStyle_2.18.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.20.0 tidyselect_1.1.0
## [3] xfun_0.18 purrr_0.3.4
## [5] lattice_0.20-41 colorspace_1.4-1
## [7] vctrs_0.3.4 generics_0.0.2
## [9] htmltools_0.5.0 yaml_2.2.1
## [11] XML_3.99-0.5 rlang_0.4.8
## [13] pillar_1.4.6 glue_1.4.2
## [15] BiocParallel_1.24.0 matrixStats_0.57.0
## [17] GenomeInfoDbData_1.2.4 lifecycle_0.2.0
## [19] stringr_1.4.0 zlibbioc_1.36.0
## [21] MatrixGenerics_1.2.0 munsell_0.5.0
## [23] gtable_0.3.0 evaluate_0.14
## [25] Biobase_2.50.0 knitr_1.30
## [27] curl_4.3 scales_1.1.1
## [29] BiocManager_1.30.10 DelayedArray_0.16.0
## [31] Rsamtools_2.6.0 ggplot2_3.3.2
## [33] digest_0.6.27 stringi_1.5.3
## [35] bookdown_0.21 dplyr_1.0.2
## [37] grid_4.0.3 tools_4.0.3
## [39] bitops_1.0-6 magrittr_1.5
## [41] RCurl_1.98-1.2 tibble_3.0.4
## [43] crayon_1.3.4 pkgconfig_2.0.3
## [45] Matrix_1.2-18 ellipsis_0.3.1
## [47] xml2_1.3.2 rmarkdown_2.5
## [49] httr_1.4.2 R6_2.4.1
## [51] GenomicAlignments_1.26.0 compiler_4.0.3
Jühling, Frank, Mario Mörl, Roland K. Hartmann, Mathias Sprinzl, Peter F. Stadler, and Joern Pütz. 2009. “tRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes.” Nucleic Acids Research 37:D159–D162. https://doi.org/10.1093/nar/gkn772.
Sprinzl, Mathias, and Konstantin S. Vassilenko. 2005. “Compilation of tRNA Sequences and Sequences of tRNA Genes.” Nucleic Acids Research 33:D139–D140. https://doi.org/10.1093/nar/gki012.