atenapackage provides methods to quantify the expression of transposable elements within R and Bioconductor.
Transposable elements (TEs) are autonomous mobile genetic elements. They are DNA sequences that have, or once had, the ability to mobilize within the genome either directly or through an RNA intermediate (Payer and Burns 2019). TEs can be categorized into two classes based on the intermediate substrate propagating insertions (RNA or DNA). Class I TEs, also called retrotransposons, first transcribe an RNA copy that is then reverse transcribed to cDNA before inserting in the genome. In turn, these can be divided into long terminal repeat (LTR) retrotransposons, which refer to endogenous retroviruses (ERVs), and non-LTR retrotransposons, which include long interspersed element class 1 (LINE-1 or L1) and short interspersed elements (SINEs). Class II TEs, also known as DNA transposons, directly excise themselves from one location before reinsertion. TEs are further split into families and subfamilies depending on various structural features (Goerner-Potvin and Bourque 2018; Guffanti et al. 2018).
Most TEs have lost the capacity for generating new insertions over their evolutionary history and are now fixed in the human population. Their insertions have resulted in a complex distribution of interspersed repeats comprising almost half (50%) of the human genome (Payer and Burns 2019).
TE expression has been observed in association with physiological processes in a wide range of species, including humans where it has been described to be important in early embryonic pluripotency and development. Moreover, aberrant TE expression has been associated with diseases such as cancer, neurodegenerative disorders, and infertility (Payer and Burns 2019).
The study of TE expression faces one main challenge: given their repetitive nature, the majority of TE-derived reads map to multiple regions of the genome and these multi-mapping reads are consequently discarded in standard RNA-seq data processing pipelines. For this reason, specific software packages for the quantification of TE expression have been developed (Goerner-Potvin and Bourque 2018), such as TEtranscripts (Jin et al. 2015), ERVmap (Tokuyama et al. 2018) and Telescope (Bendall et al. 2019). The main differences between these three methods are the following:
TEtranscripts (Jin et al. 2015) reassigns multi-mapping reads to TEs proportionally to their relative abundance, which is estimated using an expectation-maximization (EM) algorithm.
ERVmap (Tokuyama et al. 2018) is based on selective filtering of multi-mapping reads. It applies filters that consist in discarding reads when the ratio of sum of hard and soft clipping to the length of the read (base pair) is greater than or equal to 0.02, the ratio of the edit distance to the sequence read length (base pair) is greater or equal to 0.02 and/or the difference between the alignment score from BWA (field AS) and the suboptimal alignment score from BWA (field XS) is less than 5.
Telescope (Bendall et al. 2019) reassigns multi-mapping reads to TEs using their relative abundance, which like in TEtranscripts, is also estimated using an EM algorithm. The main differences with respect to TEtranscripts are: (1) Telescope works with an additional parameter for each TE that estimates the proportion of multi-mapping reads that need to be reassigned to that TE; (2) that reassignment parameter is optimized during the EM algorithm jointly with the TE relative abundances, using a Bayesian maximum a posteriori (MAP) estimate that allows one to use prior values on these two parameters; and (3) using the final estimates on these two parameters, multi-mapping reads can be flexibly reassigned to TEs using different strategies, where the default one is to assign a multi-mapping read to the TE with largest estimated abundance and discard those multi-mapping reads with ties on those largest abundances.
Because these tools were only available outside R and Bioconductor, the
atena package provides a complete re-implementation in R of these three methods to facilitate the integration of TE expression quantification into Bioconductor workflows for the analysis of RNA-seq data.
Another challenge in TE expression quantification is the lack of complete TE annotations due to the difficulty to correctly place TEs in genome assemblies (Goerner-Potvin and Bourque 2018). The gold standard for TE annotations are RepeatMasker annotations, available through the RepeatMasker track in UCSC Genome Browser.
atena can fetch RepeatMasker annotations with the function
annotaTEs(). Moreover, this function can parse TE annotations by applying
parsefun. Examples of
parsefun included in
For example, let’s retrieve TE annotations for Drosophila melanogaster dm6 genome version and process them using the
rmskbasicparser() function, which assigns a unique identifier for each TE, as well as discards repeats present in the annotations that are not TEs.
library(atena) library(GenomicRanges) rmsk <- annotaTEs(genome = "dm6", parsefun = rmskbasicparser) rmsk GRanges object with 38384 ranges and 4 metadata columns: seqnames ranges strand | swScore <Rle> <IRanges> <Rle> | <integer> IDEFIX_LTR_dup1 chr2L 9726-9859 + | 285 DNAREP1_DM_dup1 chr2L 24236-24484 + | 1198 LINEJ1_DM_dup1 chr2L 47514-52519 + | 43674 DNAREP1_DM_dup2 chr2L 60251-60631 + | 1728 DNAREP1_DM_dup3 chr2L 60656-60786 + | 518 ... ... ... ... . ... ROVER-I_DM_dup184 chrUn_DS486004v1 1-98 - | 897 ROVER-LTR_DM_dup132 chrUn_DS486004v1 99-466 - | 3224 NOMAD_LTR_dup90 chrUn_DS486008v1 1-488 + | 4554 ACCORD_LTR_dup32 chrUn_DS486008v1 489-717 - | 2107 DMRT1A_dup201 chrUn_DS486008v1 717-1001 - | 2651 repName repClass repFamily <character> <character> <character> IDEFIX_LTR_dup1 IDEFIX_LTR LTR Gypsy DNAREP1_DM_dup1 DNAREP1_DM RC Helitron LINEJ1_DM_dup1 LINEJ1_DM LINE Jockey DNAREP1_DM_dup2 DNAREP1_DM RC Helitron DNAREP1_DM_dup3 DNAREP1_DM RC Helitron ... ... ... ... ROVER-I_DM_dup184 ROVER-I_DM LTR Gypsy ROVER-LTR_DM_dup132 ROVER-LTR_DM LTR Gypsy NOMAD_LTR_dup90 NOMAD_LTR LTR Gypsy ACCORD_LTR_dup32 ACCORD_LTR LTR Gypsy DMRT1A_dup201 DMRT1A LINE R1 ------- seqinfo: 1870 sequences (1 circular) from dm6 genome
Quantification of TE expression with
atena consists in the following two
Building of a parameter object for one of the available quantification methods.
Calling the TE expression quantification method
qtex() using the previously
built parameter object.
The dataset that will be used to illustrate how to quantify TE expression with
atena is a published RNA-seq dataset of Drosophila melanogaster available at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (accession no. GSE47006). The two selected samples are: a piwi knockdown and a piwi control (GSM1142845 and GSM1142844). These files have been subsampled. The piwi-associated silencing complex (piRISC) silences TEs in the Drosophila ovary, thus, the knockdown of piwi causes the de-repression of TEs. To minimize the computational cost and memory requirements when building this vignette, the annotations used in following examples consist of 28 and 50 highly expressed TEs and genes, respectively, from Drosophila melanogaster.
To use the ERVmap method in
atena we should first build an object of the class
ERVmapParam using the function
singleEnd parameter is set to
TRUE since the example BAM files are single-end. The
ignoreStrand parameter works analogously to the same parameter in the function
summarizeOverlaps() from package GenomicAlignments and should be set to
TRUE whenever the RNA library preparation protocol was stranded.
One of the filters applied by the ERVmap method compares the alignment score of a given primary alignment, stored in the
AS tag of a SAM record, to the largest alignment score among every other secondary alignment, known as the suboptimal alignment score. The original ERVmap software assumes that input BAM files are generated using the Burrows-Wheeler Aligner (BWA) software (Li and Durbin 2009), which stores suboptimal alignment scores in the
XS tag. Although
AS is an optional tag, most short-read aligners provide this tag with alignment scores in BAM files. However, the suboptimal alignment score, stored in the
XS tag by BWA, is either stored in a different tag or not stored at all by other short-read aligner software, such as STAR (Dobin et al. 2013).
To enable using ERVmap on BAM files produced by short-read aligner software other than BWA,
atena allows the user to set the argument
suboptimalAlignmentTag to one of the following three possible values:
The name of a tag different to
XS that stores the suboptimal alignment score.
The value “none”, which will trigger the calculation of the suboptimal alignment score by searching for the largest value stored in the
AS tag among all available secondary alignments.
The value “auto” (default), by which
atena will first extract the name of the short-read aligner software from the BAM file and if that software is BWA, then suboptimal alignment scores will be obtained from the
XS tag. Otherwise, it will trigger the calculation previously explained for
Finally, this filter is applied by comparing the difference between alignment and suboptimal alignment scores to a cutoff value, which by default is 5 but can be modified using the parameter
suboptimalAlignmentCutoff. The default value 5 is the one employed in the original ERVmap software that assumes the BAM file was generated with BWA and for which lower values are interpreted as “equivalent to second best match has one or more mismatches than the best match” (Tokuyama et al. 2018, pg. 12571). From a different perspective, in BWA the mismatch penalty has a value of 4 and therefore, a
suboptimalAlignmentCutoff value of 5 only retains those reads where the suboptimal alignment has at least 1 mismatch more than the best match. Therefore, the
suboptimalAlignmentCutoff value is specific to the short-read mapper software and we recommend to set this value according to the mismatch penalty of that software. Another option is to set
suboptimalAlignmentCutoff=NA, which prevents the filtering of reads based on this criteria, as set in the following example.
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) TE_annot <- readRDS(file = system.file("extdata", "Top28TEs.rds", package="atena")) empar <- ERVmapParam(bamfiles, teFeatures = TE_annot, singleEnd = TRUE, ignoreStrand = TRUE, suboptimalAlignmentCutoff=NA) empar ERVmapParam object # BAM files (2): control_KD.bam, piwi_KD.bam # features (28): ROO_LTR_174, ..., ROO_LTR_10731 # single-end, unstranded
In the case of paired-end BAM files (
singleEnd=FALSE), two additional arguments can be specified,
strandMode defines the behavior of the strand getter when internally reading the BAM files with the
GAlignmentPairs() function. See the help page of
strandMode in the GenomicAlignments package for further details.
fragments controls how read filtering and counting criteria are applied to the read mates in a paired-end read. To use the original ERVmap algorithm (Tokuyama et al. 2018) one should set
fragments=TRUE (default when
singleEnd=FALSE), which filters and counts each mate of a paired-end read independently (i.e., two read mates overlapping the same feature count twice on that feature, treating paired-end reads as if they were single-end). On the other hand, when
fragments=FALSE, if the two read mates pass the filtering criteria and overlap the same feature, they count once on that feature. If either read mate fails to pass the filtering criteria, then both read mates are discarded.
An additional functionality with respect to the original ERVmap software is the integration of gene and TE expression quantification. The original ERVmap software doesn’t quantify TE and gene expression coordinately and this can potentially lead to counting twice reads that simultaneously overlap a gene and a TE. In
atena, gene expression is quantified based on the approach used in the TEtranscripts software (Jin et al. 2015): unique reads are preferably assigned to genes, whereas multi-mapping reads are preferably assigned to TEs.
In case that a unique read does not overlap a gene or a multi-mapping read does not overlap a TE,
atena searches for overlaps with TEs or genes, respectively. Given the different treatment of unique and multi-mapping reads,
atena requires the information regarding the unique or multi-mapping status of a read. This information is obtained from the presence of secondary alignments in the BAM file or, alternatively, from the
NH tag in the BAM file (number of reported alignments that contain the query in the current SAM record). Therefore, either secondary alignments or the
NH tag need to be present for gene expression quantification.
The original ERVmap approach does not discard any read overlapping gene annotations. However, this can be changed using the parameter
geneCountMode, which by default
geneCountMode="all" and follows the behavior in the original ERVmap method. On the contrary, by setting
atena also applies the filtering criteria employed to quantify TE expression to the reads overlapping gene annotations.
atena also allows one to aggregate TE expression quantifications. By default, the names of the input
GRangesList object given in the
teFeatures parameter are used to aggregate quantifications. However, the
aggregateby parameter can be used to specify other column names in the feature annotations to be used to aggregate TE counts, for example at the sub-family level.
To use the Telescope method for TE expression quantification, the
TelescopeParam() function is used to build a parameter object of the class
As in the case of
aggregateby argument, which should be a character vector of column names in the annotation, determines the columns to be used to aggregate TE expression quantifications. This way,
atena provides not only quantifications at the subfamily level, but also allows to quantify TEs at the desired level (family, class, etc.), including locus based quantifications. For such a use case, the object with the TE annotations should include a column with unique identifiers for each TE locus and the
aggregateby argument should specify the name of that column. When
aggregateby is not specified, the
names() of the object containing TE annotations are used to aggregate quantifications.
Here, the Telescope annotations will be used and TE quantifications will be aggregated according to the
names() of the
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) TE_annot <- readRDS(file = system.file("extdata", "Top28TEs.rds", package="atena")) gene_annot <- readRDS(file = system.file("extdata", "Top50genes.rds", package="atena")) tspar <- TelescopeParam(bfl=bamfiles, teFeatures=TE_annot, geneFeatures = gene_annot, singleEnd = TRUE, ignoreStrand=TRUE) tspar TelescopeParam object # BAM files (2): control_KD.bam, piwi_KD.bam # features (GRanges length 78): ROO_LTR_174, ..., FBgn0259229 # aggregated by: GRanges names # single-end; unstranded
In case of paired-end data (
singleEnd=FALSE), the argument usage is similar to that of
ERVmapParam(). In relation to the BAM file, Telescope follows the same approach as the ERVmap method: when
fragments=FALSE, only mated read pairs from opposite strands are considered, while when
fragments=TRUE, same-strand pairs, singletons, reads with unmapped pairs and other fragments are also considered by the algorithm. However, there is one important difference with respect to the counting approach followed by ERVmap: when
fragments=TRUE mated read pairs mapping to the same element are counted once, whereas in the ERVmap method they are counted twice.
As in the ERVmap method from
atena, the gene expression quantification method in Telescope is based on the approach of the TEtranscripts software (Jin et al. 2015). This way,
atena provides the possibility to integrate TE expression quantification by Telescope with gene expression quantification. As in the case of the ERVmap method from
atena, either secondary alignments or the
NH tag are required for gene expression quantification.
Finally, the third method available is TEtranscripts. First, the
TEtranscriptsParam() function is called to build a parameter object of the class
TEtranscriptsParam. The usage of the
aggregateby argument is the same as in
ERVmapParam(). Locus based quantifications in the TEtranscripts method from
atena is possible because the TEtranscripts algorithm actually computes TE quantifications at the locus level and then sums up all instances of each TE subfamily to provide expression at the subfamily level. By avoiding this last step,
atena can provide TE expression quantification at the locus level using the TEtranscripts method. For such a use case, the object with the TE annotations should include a column with unique identifiers for each TE and the
aggregateby argument should specify the name of that column.
Here, the Telescope annotations will be used and TE quantifications will be aggregated at the repeat name level. This way, the
aggregateby argument will be set to
aggregateby = "repName".
bamfiles <- list.files(system.file("extdata", package="atena"), pattern="*.bam", full.names=TRUE) TE_annot <- readRDS(file = system.file("extdata", "Top28TEs.rds", package="atena")) ttpar <- TEtranscriptsParam(bamfiles, teFeatures = TE_annot, singleEnd = TRUE, ignoreStrand=TRUE, aggregateby = c("repName")) ttpar TEtranscriptsParam object # BAM files (2): control_KD.bam, piwi_KD.bam # features (GRanges length 28): ROO_LTR_174, ..., ROO_LTR_10731 # aggregated by: repName # single-end; unstranded
For paired-end data (
singleEnd=FALSE), the usage of the
fragments argument is the same as in
Regarding gene expression quantification,
atena has implemented the approach of the original TEtranscripts software (Jin et al. 2015). As in the case of the ERVmap and Telescope methods from
atena, either secondary alignments or the
NH tag are required.
Following the gene annotation processing present in the TEtranscripts algorithm, in case that
geneFeatures contains a metadata column named “type”, only the elements with “type” = “exon” are considered for the analysis. Then, exon counts are summarized to the gene level in a
GRangesList object. This also applies to the ERVmap and Telescope methods for
atena when gene feature are present. Let’s see an example of this processing:
# Creating an example of gene annotations annot_gen <- GRanges(seqnames = rep("2L",10), ranges = IRanges(start = c(1,20,45,80,110,130,150,170,200,220), width = c(10,20,35,10,5,15,10,25,5,20)), strand = "*", type = rep("exon",10)) # Setting gene ids names(annot_gen) <- paste0("gene",c(rep(1,3),rep(2,4),rep(3,3))) annot_gen GRanges object with 10 ranges and 1 metadata column: seqnames ranges strand | type <Rle> <IRanges> <Rle> | <character> gene1 2L 1-10 * | exon gene1 2L 20-39 * | exon gene1 2L 45-79 * | exon gene2 2L 80-89 * | exon gene2 2L 110-114 * | exon gene2 2L 130-144 * | exon gene2 2L 150-159 * | exon gene3 2L 170-194 * | exon gene3 2L 200-204 * | exon gene3 2L 220-239 * | exon ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths ttpar_gen <- TEtranscriptsParam(bamfiles, teFeatures = TE_annot, geneFeatures = annot_gen, singleEnd = TRUE, ignoreStrand=TRUE) ttpar_gen TEtranscriptsParam object # BAM files (2): control_KD.bam, piwi_KD.bam # features (CompressedGRangesList length 31): ROO_LTR_174, ..., gene3 # aggregated by: CompressedGRangesList names # single-end; unstranded
Let’s see the result of the gene annotation processing:
features(ttpar_gen)[!attributes(features(ttpar_gen))$isTE$isTE] GRangesList object of length 3: $gene1 GRanges object with 3 ranges and 11 metadata columns: seqnames ranges strand | source feature_type score phase <Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer> gene1 2L 1-10 * | NA NA NA <NA> gene1 2L 20-39 * | NA NA NA <NA> gene1 2L 45-79 * | NA NA NA <NA> repName repClass repFamily name ID type <character> <character> <character> <character> <character> <character> gene1 <NA> <NA> <NA> <NA> <NA> exon gene1 <NA> <NA> <NA> <NA> <NA> exon gene1 <NA> <NA> <NA> <NA> <NA> exon isTE <Rle> gene1 FALSE gene1 FALSE gene1 FALSE ------- seqinfo: 1539 sequences from an unspecified genome; no seqlengths $gene2 GRanges object with 4 ranges and 11 metadata columns: seqnames ranges strand | source feature_type score phase <Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer> gene2 2L 80-89 * | NA NA NA <NA> gene2 2L 110-114 * | NA NA NA <NA> gene2 2L 130-144 * | NA NA NA <NA> gene2 2L 150-159 * | NA NA NA <NA> repName repClass repFamily name ID type <character> <character> <character> <character> <character> <character> gene2 <NA> <NA> <NA> <NA> <NA> exon gene2 <NA> <NA> <NA> <NA> <NA> exon gene2 <NA> <NA> <NA> <NA> <NA> exon gene2 <NA> <NA> <NA> <NA> <NA> exon isTE <Rle> gene2 FALSE gene2 FALSE gene2 FALSE gene2 FALSE ------- seqinfo: 1539 sequences from an unspecified genome; no seqlengths $gene3 GRanges object with 3 ranges and 11 metadata columns: seqnames ranges strand | source feature_type score phase <Rle> <IRanges> <Rle> | <factor> <factor> <numeric> <integer> gene3 2L 170-194 * | NA NA NA <NA> gene3 2L 200-204 * | NA NA NA <NA> gene3 2L 220-239 * | NA NA NA <NA> repName repClass repFamily name ID type <character> <character> <character> <character> <character> <character> gene3 <NA> <NA> <NA> <NA> <NA> exon gene3 <NA> <NA> <NA> <NA> <NA> exon gene3 <NA> <NA> <NA> <NA> <NA> exon isTE <Rle> gene3 FALSE gene3 FALSE gene3 FALSE ------- seqinfo: 1539 sequences from an unspecified genome; no seqlengths
Finally, to quantify TE expression we call the
qtex() method using one of the previously defined parameter objects (
TelescopeParam) according to the quantification method we want to use. The
qtex() method returns a
SummarizedExperiment object containing the resulting quantification of expression in an assay slot. Additionally, when a
DataFrame, object storing phenotypic data is passed to the
qtex() function through the
phenodata parameter, this will be included as column data in the resulting
SummarizedExperiment object and the row names of these phenotypic data will be set as column names in the output
In the current example, the call to quantify TE expression using the ERVmap method would be the following:
emq <- qtex(empar)
emq class: RangedSummarizedExperiment dim: 28 2 metadata(0): assays(1): counts rownames(28): ROO_LTR_174 ROO_LTR_196 ... ROO_LTR_10666 ROO_LTR_10731 rowData names(10): source feature_type ... ID isTE colnames(2): control_KD piwi_KD colData names(0): colSums(assay(emq)) control_KD piwi_KD 29 17
In the case of the Telescope method, the call would be as follows:
tsq <- qtex(tspar)
tsq class: RangedSummarizedExperiment dim: 79 2 metadata(0): assays(1): counts rownames(79): ROO_LTR_174 ROO_LTR_196 ... FBgn0259229 no_feature rowData names(15): source feature_type ... transcript_name isTE colnames(2): control_KD piwi_KD colData names(0): colSums(assay(tsq)) control_KD piwi_KD 150 150
For the TEtranscripts method, TE expression is quantified by using the following call:
ttq <- qtex(ttpar)
ttq class: RangedSummarizedExperiment dim: 1 2 metadata(0): assays(1): counts rownames(1): ROO_LTR rowData names(0): colnames(2): control_KD piwi_KD colData names(0): colSums(assay(ttq)) control_KD piwi_KD 150 123
As mentioned, TE expression quantification is provided at the repeat name level. All 28 TE features share the same repeat name. Thus, the expression levels of the initial 28 TE features have been a aggregated into 1 element with repeat name ROO_LTR.
nrow(ttq)  1
sessionInfo() R version 4.2.1 (2022-06-23) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.5 LTS Matrix products: default BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so locale:  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C  LC_TIME=en_GB LC_COLLATE=C  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8  LC_PAPER=en_US.UTF-8 LC_NAME=C  LC_ADDRESS=C LC_TELEPHONE=C  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages:  stats4 stats graphics grDevices utils datasets methods  base other attached packages:  atena_1.4.0 SummarizedExperiment_1.28.0  Biobase_2.58.0 GenomicRanges_1.50.0  GenomeInfoDb_1.34.0 IRanges_2.32.0  S4Vectors_0.36.0 BiocGenerics_0.44.0  MatrixGenerics_1.10.0 matrixStats_0.62.0  knitr_1.40 BiocStyle_2.26.0 loaded via a namespace (and not attached):  httr_1.4.4 sass_0.4.2  bit64_4.0.5 jsonlite_1.8.3  AnnotationHub_3.6.0 bslib_0.4.0  shiny_1.7.3 assertthat_0.2.1  interactiveDisplayBase_1.36.0 BiocManager_1.30.19  BiocFileCache_2.6.0 blob_1.2.3  Rsamtools_2.14.0 GenomeInfoDbData_1.2.9  yaml_2.3.6 BiocVersion_3.16.0  pillar_1.8.1 RSQLite_2.2.18  lattice_0.20-45 glue_1.6.2  digest_0.6.30 promises_188.8.131.52  XVector_0.38.0 colorspace_2.0-3  htmltools_0.5.3 httpuv_1.6.6  Matrix_1.5-1 pkgconfig_2.0.3  bookdown_0.29 zlibbioc_1.44.0  purrr_0.3.5 xtable_1.8-4  scales_1.2.1 later_1.3.0  BiocParallel_1.32.0 tibble_3.1.8  KEGGREST_1.38.0 generics_0.1.3  ellipsis_0.3.2 withr_2.5.0  cachem_1.0.6 cli_3.4.1  crayon_1.5.2 magrittr_2.0.3  mime_0.12 memoise_2.0.1  evaluate_0.17 fansi_1.0.3  tools_4.2.1 lifecycle_1.0.3  stringr_1.4.1 munsell_0.5.0  DelayedArray_0.24.0 AnnotationDbi_1.60.0  Biostrings_2.66.0 compiler_4.2.1  jquerylib_0.1.4 rlang_1.0.6  grid_4.2.1 RCurl_1.98-1.9  rappdirs_0.3.3 bitops_1.0-7  rmarkdown_2.17 codetools_0.2-18  DBI_1.1.3 curl_4.3.3  R6_2.5.1 GenomicAlignments_1.34.0  dplyr_1.0.10 fastmap_1.1.0  bit_4.0.4 utf8_1.2.2  filelock_1.0.2 stringi_1.7.8  SQUAREM_2021.1 parallel_4.2.1  Rcpp_1.0.9 vctrs_0.5.0  png_0.1-7 sparseMatrixStats_1.10.0  dbplyr_2.2.1 tidyselect_1.2.0  xfun_0.34
Bendall, Matthew L, Miguel De Mulder, Luis Pedro Iñiguez, Aarón Lecanda-Sánchez, Marcos Pérez-Losada, Mario A Ostrowski, R Brad Jones, et al. 2019. “Telescope: Characterization of the Retrotranscriptome by Accurate Estimation of Transposable Element Expression.” PLoS Computational Biology 15 (9): e1006453.
Dobin, Alexander, Carrie A Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R Gingeras. 2013. “STAR: Ultrafast Universal Rna-Seq Aligner.” Bioinformatics 29 (1): 15–21.
Goerner-Potvin, Patricia, and Guillaume Bourque. 2018. “Computational Tools to Unmask Transposable Elements.” Nature Reviews Genetics 19 (11): 688–704.
Guffanti, Guia, Andrew Bartlett, Torsten Klengel, Claudia Klengel, Richard Hunter, Gennadi Glinsky, and Fabio Macciardi. 2018. “Novel Bioinformatics Approach Identifies Transcriptional Profiles of Lineage-Specific Transposable Elements at Distinct Loci in the Human Dorsolateral Prefrontal Cortex.” Molecular Biology and Evolution 35 (10): 2435–53.
Jin, Ying, Oliver H Tam, Eric Paniagua, and Molly Hammell. 2015. “TEtranscripts: A Package for Including Transposable Elements in Differential Expression Analysis of Rna-Seq Datasets.” Bioinformatics 31 (22): 3593–9.
Li, Heng, and Richard Durbin. 2009. “Fast and Accurate Short Read Alignment with Burrows–Wheeler Transform.” Bioinformatics 25 (14): 1754–60.
Payer, Lindsay M, and Kathleen H Burns. 2019. “Transposable Elements in Human Genetic Disease.” Nature Reviews Genetics 20 (12): 760–72.
Tokuyama, Maria, Yong Kong, Eric Song, Teshika Jayewickreme, Insoo Kang, and Akiko Iwasaki. 2018. “ERVmap Analysis Reveals Genome-Wide Transcription of Human Endogenous Retroviruses.” Proceedings of the National Academy of Sciences 115 (50): 12565–72.