qsvaR 1.2.0
qsvaR
R
is an open-source statistical environment which can be easily modified to enhance its functionality via packages. qsvaR is a R
package available via the Bioconductor repository for packages. R
can be installed on any operating system from CRAN after which you can install qsvaR by using the following commands in your R
session:
## To install Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("qsvaR")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
## You can install the development version from GitHub with:
BiocManager::install("LieberInstitute/qsvaR")
qsvaR is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. That is, packages like SummarizedExperiment. Here it might be useful for you to check the qSVA framework manuscript (Jaffe et al, PNAS, 2017).
If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.
As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R
and Bioconductor
have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the qsvaR
tag and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
qsvaR
We hope that qsvaR will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation("qsvaR")
#>
#> To cite package 'qsvaR' in publications use:
#>
#> Stolz JM, Collado-Torres L (2022). _qsvaR_.
#> doi:10.18129/B9.bioc.qsvaR <https://doi.org/10.18129/B9.bioc.qsvaR>,
#> https://github.com/LieberInstitute/qsvaR/qsvaR - R package version
#> 1.2.0, <http://www.bioconductor.org/packages/qsvaR>.
#>
#> Stolz JM, Tao R, Jaffe AE, Collado-Torres L (2022). "qsvaR."
#> _bioRxiv_. doi:10.1101/TODO <https://doi.org/10.1101/TODO>,
#> <https://www.biorxiv.org/content/10.1101/TODO>.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
qsvaR
Overviewlibrary("qsvaR")
library("limma")
library("BiocFileCache")
Differential expressions analysis requires the ability normalize complex datasets. In the case of postmortem brain tissue we are tasked with removing the effects of bench degradation. Our current work expands the scope of qSVA by generating degradation profiles (5 donors across 4 degradation time points: 0, 15, 30, and 60 minutes) from six human brain regions (n = 20 * 6 = 120): dorsolateral prefrontal cortex (DLPFC), hippocampus (HPC), medial prefrontal cortex (mPFC), subgenual anterior cingulate cortex (sACC), caudate, amygdala (AMY). We identified an average of 80,258 transcripts associated (FDR < 5%) with degradation time across the six brain regions. Testing for an interaction between brain region and degradation time identified 45,712 transcripts (FDR < 5%). A comparison of regions showed a unique pattern of expression changes associated with degradation time particularly in the DLPFC, implying that this region may not be representative of the effects of degradation on gene expression in other tissues. Furthermore previous work was done by analyzing expressed regions (Collado-Torres et al, NAR, 2017), and while this is an effective method of analysis, expressed regions are not a common output for many pipelines and are computationally expensive to identify, thus creating a barrier for the use of any qSVA software. In our most recent work expression quantification was performed at the transcript level using Salmon (Patro et al, Nat Methods, 2017). The changes from past work on qSVs to now is illustrated in the below cartoon.
The qsvaR (Stolz and Collado-Torres, 2022) package combines an established method for removing the effects of degradation from RNA-seq data with easy to use functions. The first step in this workflow is to create an RangedSummarizedExperiment
object with the transcripts identified in our qSVA experiment. If you already have a RangedSummarizedExperiment
of transcripts we can do this with the getDegTx()
function as shown below.If not this can be generated with the SPEAQeasy
(a RNA-seq pipeline maintained by our lab) pipeline usinge the --qsva
flag. If you already have a RangedSummarizedExperiment
object with transcripts then you do not need to run SPEAQeasy
. This flag requires a full path to a text file, containing one Ensembl transcript ID per line for each transcript desired in the final transcripts R output object (called rse_tx
). The sig_transcripts
argument in this package should contain the same Ensembl transcript IDs as the text file for the --qsva
flag. The goal of qsvaR
is to provide software that can remove the effects of bench degradation from RNA-seq data.
bfc <- BiocFileCache::BiocFileCache()
## Download brainseq phase 2 ##
rse_file <- BiocFileCache::bfcrpath(
"https://s3.us-east-2.amazonaws.com/libd-brainseq2/rse_tx_unfiltered.Rdata",
x = bfc
)
load(rse_file, verbose = TRUE)
#> Loading objects:
#> rse_tx
In this next step we subset for the transcripts associated with degradation. These were determined by Joshua M. Stolz et al, 2022. We have provided three models to choose from. Here the names "cell_component"
, "top1500"
, and "standard"
refer to models that were determined to be effective in removing degradation effects. The "standard"
model involves taking the union of the top 1000 transcripts associated with degradation from the interaction model and the main effect model. The "top1500"
model is the same as the "standard"
model except the union of the top 1500 genes associated with degradation is selected. The most effective of our models, "cell_component"
, involved deconvolution of the degradation matrix to determine the proportion of cell types within our studied tissue. These proportions were then added to our model.matrix()
and the union of the top 1000 transcripts in the interaction model, the main effect model, and the cell proportions model were used to generate this model of quality surrogate variables (qSVs). In this example we will choose "cell_component"
when using the getDegTx()
and select_transcripts()
functions.
# obtain transcripts with degradation signature
DegTx <- getDegTx(rse_tx, type = "cell_component")
dim(DegTx)
#> [1] 2976 900
The qSVs are derived from taking the principal components (PCs) of the selected transcript expression data. This can be done with the function getPCs
. qSVs are essentially pricipal components from an rna-seq experiment designed to model bench degradation. For more on principal components you can read and introductory article here
. rse_tx
specifies a RangedSummarizedExperiment
object that has the specified degraded transcripts. For us this is DegTx
. Here tpm
is the name of the assay we are using within the RangedSummarizedExperiment
object, where TPM stands for transcripts per million.
pcTx <- getPCs(rse_tx = DegTx, assayname = "tpm")
Next we use the k_qsvs()
function to calculate how many PCs we will need to account for the variation. A model matrix accounting for relevant variables should be used. Common variables such as Age, Sex, Race and Religion are often included in the model. Again we are using our RangedSummarizedExperiment
DegTx
as the rse_tx
option. Next we specify the mod
with our model.matrix()
. model.matrix()
creates a design (or model) matrix, e.g., by expanding factors to a set of dummy variables (depending on the contrasts) and expanding interactions similarly. For more information on creating a design matrix for your experiment see the documentation here. Again we use the assayname
option to specify that we are using the tpm
assay.
# design a basic model matrix to model the number of pcs needed
mod <- model.matrix(~ Dx + Age + Sex + Race + Region,
data = colData(rse_tx)
)
# use k qsvs function to return a integer of pcs needed
k <- k_qsvs(rse_tx = DegTx, mod = mod, assayname = "tpm")
print(k)
#> [1] 34
Finally we subset our data to the calculated number of PCs. The output of this function will be the qsvs for each sample. Here we use the qsvPCs
argument to enter the principal components (pcTx
). Here the argument k is the number of PCs we are going to include as calculated in the previous step.
# get_qsvs use k to subset our pca analysis
qsvs <- get_qsvs(qsvPCs = pcTx, k = k)
dim(qsvs)
#> [1] 900 34
This can be done in one step with our wrapper function qSVA
which just combinds all the previous mentioned functions.
## Example use of the wrapper function qSVA()
qsvs_wrapper <- qSVA(rse_tx = rse_tx, type = "cell_component", mod = mod, assayname = "tpm")
dim(qsvs_wrapper)
#> [1] 900 34
Next we can use a standard limma package approach to do differential expression on the data. The key here is that we add our qsvs to the model.matrix()
. Here we input our RangedSummarizedExperiment
object and our model.matrix()
with qSVs. Note here that the RangedSummarizedExperiment
object is the original object loaded with the full list of transcripts, not the the one we subsetted for qSVs. This is because while PCs can be generated from a subset of genes, differential expression is best done on the full dataset. The expected output is a sigTx
object that shows the results of differential expression.
### should be done by cbinding mod to pcs
## subset to an expression cutoff
rse_tx <- rse_tx[rowData(rse_tx)$passExprsCut, ]
# create a model.matrix with demographic infor and qsvs
mod_qSVA <- cbind(mod, qsvs)
# log tranform transcript expression
txExprs <- log2(assays(rse_tx)$tpm + 1)
# linear model differential expression
fitTx <- lmFit(txExprs, mod_qSVA)
# generate empirical bayes for DE
eBTx <- eBayes(fitTx)
# get DE results table
sigTx <- topTable(eBTx,
coef = 2,
p.value = 1, number = nrow(rse_tx)
)
head(sigTx)
#> logFC AveExpr t P.Value adj.P.Val
#> ENST00000553142.5 -0.06547988 2.0390889 -5.981344 3.242087e-09 0.0003006452
#> ENST00000552074.5 -0.12911383 2.4347985 -5.372043 1.002462e-07 0.0046480161
#> ENST00000530589.1 -0.10297938 2.4271711 -4.918614 1.044003e-06 0.0242568671
#> ENST00000510632.1 0.08994392 0.9073516 4.918168 1.046321e-06 0.0242568671
#> ENST00000450454.6 0.08446871 1.0042440 4.814104 1.746151e-06 0.0256773930
#> ENST00000572236.1 -0.05358333 0.6254025 -4.806373 1.813172e-06 0.0256773930
#> B
#> ENST00000553142.5 10.661953
#> ENST00000552074.5 7.426876
#> ENST00000530589.1 5.230075
#> ENST00000510632.1 5.228001
#> ENST00000450454.6 4.749607
#> ENST00000572236.1 4.714453
If we look at a plot of our most significant transcript we can see that at this level we don’t have that much fold change in our data at any individual transcript. These transcripts are however significant and it might be valuable to repeat the analysis at gene level. At gene level the results can be used to do gene ontology enrichment with packages such as clusterProfiler.
# get expression for most significant gene
yy <- txExprs[rownames(txExprs) == rownames(sigTx)[1], ]
## Visualize the expression of this gene using boxplots
p <- boxplot(yy ~ rse_tx$Dx,
outline = FALSE,
ylim = range(yy), ylab = "log2 Exprs", xlab = "",
main = paste(rownames(sigTx)[1])
)
We can assess the effectiveness of our analysis by first performing DE without qSVs
# log tranform transcript expression
txExprs_noqsv <- log2(assays(rse_tx)$tpm + 1)
# linear model differential expression with generic model
fitTx_noqsv <- lmFit(txExprs_noqsv, mod)
# generate empirical bayes for DE
eBTx_noqsv <- eBayes(fitTx_noqsv)
# get DE results table
sigTx_noqsv <- topTable(eBTx_noqsv,
coef = 2,
p.value = 1, number = nrow(rse_tx)
)
## Explore the top results
head(sigTx_noqsv)
#> logFC AveExpr t P.Value adj.P.Val
#> ENST00000399220.2 -0.4976648 1.8820796 -8.227630 6.687751e-16 6.201686e-11
#> ENST00000302632.3 -0.5046321 2.7424034 -7.767379 2.187319e-14 1.014172e-09
#> ENST00000409584.5 0.2110462 0.6115977 7.686040 3.980683e-14 1.230456e-09
#> ENST00000540372.5 -0.1844651 0.5463078 -7.621955 6.356374e-14 1.473598e-09
#> ENST00000550948.1 -0.3359378 1.3968139 -7.496492 1.573811e-13 2.443954e-09
#> ENST00000412814.1 -0.2589885 0.6288506 -7.495831 1.581302e-13 2.443954e-09
#> B
#> ENST00000399220.2 25.15571
#> ENST00000302632.3 21.85502
#> ENST00000409584.5 21.28881
#> ENST00000540372.5 20.84637
#> ENST00000550948.1 19.98955
#> ENST00000412814.1 19.98506
Next we should subset our differential expression output to just the t-statistic
## Subset the topTable() results to keep just the t-statistic
DE_noqsv <- data.frame(t = sigTx_noqsv$t, row.names = rownames(sigTx_noqsv))
DE <- data.frame(t = sigTx$t, row.names = rownames(sigTx))
## Explore this data.frame()
head(DE)
#> t
#> ENST00000553142.5 -5.981344
#> ENST00000552074.5 -5.372043
#> ENST00000530589.1 -4.918614
#> ENST00000510632.1 4.918168
#> ENST00000450454.6 4.814104
#> ENST00000572236.1 -4.806373
Using our DEqual
function we can make a plot comparing the t-statistics from degradation and our differential expression output. In the first model below there is a 0.5 correlation between degradation t-statistics and our differential expression. This means the data is likely confounded for degradation and will lead to many false positives.
## Generate a DEqual() plot using the model results without qSVs
DEqual(DE_noqsv)
In the plot below when we add qSVs to our model we reduce the association with degradation to -0.014, which is very close to 0.
## Generate a DEqual() plot using the model results with qSVs
DEqual(DE)
We have shown that this method is effective for removing the effects of degradation from RNA-seq data. We found that the qsvaR is simpler to use than the previous version from 2016 that used expressed regions instead of transcripts making this software package preferable for users. I would encourage users to read how each set of degradation transcripts was selected as not all models may be appropriate for every experiment. Thank you for your interest and for using qsvaR (Stolz and Collado-Torres, 2022)!
We would like to thank:
SPEAQeasy
The qsvaR package (Stolz and Collado-Torres, 2022) was made possible thanks to:
This package was developed using biocthis.
Code for creating the vignette
## Create the vignette
library("rmarkdown")
system.time(render("Intro_qsvaR.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("Intro_qsvaR.Rmd", tangle = TRUE)
Date the vignette was generated.
#> [1] "2022-11-01 18:48:07 EDT"
Wallclock time spent generating the vignette.
#> Time difference of 10.699 mins
R
session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os Ubuntu 20.04.5 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2022-11-01
#> pandoc 2.5 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> annotate 1.76.0 2022-11-01 [2] Bioconductor
#> AnnotationDbi 1.60.0 2022-11-01 [2] Bioconductor
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.2.1)
#> backports 1.4.1 2021-12-13 [2] CRAN (R 4.2.1)
#> bibtex 0.5.0 2022-09-25 [2] CRAN (R 4.2.1)
#> Biobase * 2.58.0 2022-11-01 [2] Bioconductor
#> BiocFileCache * 2.6.0 2022-11-01 [2] Bioconductor
#> BiocGenerics * 0.44.0 2022-11-01 [2] Bioconductor
#> BiocManager 1.30.19 2022-10-25 [2] CRAN (R 4.2.1)
#> BiocParallel 1.32.0 2022-11-01 [2] Bioconductor
#> BiocStyle * 2.26.0 2022-11-01 [2] Bioconductor
#> Biostrings 2.66.0 2022-11-01 [2] Bioconductor
#> bit 4.0.4 2020-08-04 [2] CRAN (R 4.2.1)
#> bit64 4.0.5 2020-08-30 [2] CRAN (R 4.2.1)
#> bitops 1.0-7 2021-04-24 [2] CRAN (R 4.2.1)
#> blob 1.2.3 2022-04-10 [2] CRAN (R 4.2.1)
#> bookdown 0.29 2022-09-12 [2] CRAN (R 4.2.1)
#> bslib 0.4.0 2022-07-16 [2] CRAN (R 4.2.1)
#> cachem 1.0.6 2021-08-19 [2] CRAN (R 4.2.1)
#> cli 3.4.1 2022-09-23 [2] CRAN (R 4.2.1)
#> codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.1)
#> colorspace 2.0-3 2022-02-21 [2] CRAN (R 4.2.1)
#> crayon 1.5.2 2022-09-29 [2] CRAN (R 4.2.1)
#> curl 4.3.3 2022-10-06 [2] CRAN (R 4.2.1)
#> DBI 1.1.3 2022-06-18 [2] CRAN (R 4.2.1)
#> dbplyr * 2.2.1 2022-06-27 [2] CRAN (R 4.2.1)
#> DelayedArray 0.24.0 2022-11-01 [2] Bioconductor
#> digest 0.6.30 2022-10-18 [2] CRAN (R 4.2.1)
#> dplyr 1.0.10 2022-09-01 [2] CRAN (R 4.2.1)
#> edgeR 3.40.0 2022-11-01 [2] Bioconductor
#> evaluate 0.17 2022-10-07 [2] CRAN (R 4.2.1)
#> fansi 1.0.3 2022-03-24 [2] CRAN (R 4.2.1)
#> farver 2.1.1 2022-07-06 [2] CRAN (R 4.2.1)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.2.1)
#> filelock 1.0.2 2018-10-05 [2] CRAN (R 4.2.1)
#> genefilter 1.80.0 2022-11-01 [2] Bioconductor
#> generics 0.1.3 2022-07-05 [2] CRAN (R 4.2.1)
#> GenomeInfoDb * 1.34.0 2022-11-01 [2] Bioconductor
#> GenomeInfoDbData 1.2.9 2022-09-28 [2] Bioconductor
#> GenomicRanges * 1.50.0 2022-11-01 [2] Bioconductor
#> ggplot2 3.3.6 2022-05-03 [2] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [2] CRAN (R 4.2.1)
#> gtable 0.3.1 2022-09-01 [2] CRAN (R 4.2.1)
#> highr 0.9 2021-04-16 [2] CRAN (R 4.2.1)
#> htmltools 0.5.3 2022-07-18 [2] CRAN (R 4.2.1)
#> httr 1.4.4 2022-08-17 [2] CRAN (R 4.2.1)
#> IRanges * 2.32.0 2022-11-01 [2] Bioconductor
#> jquerylib 0.1.4 2021-04-26 [2] CRAN (R 4.2.1)
#> jsonlite 1.8.3 2022-10-21 [2] CRAN (R 4.2.1)
#> KEGGREST 1.38.0 2022-11-01 [2] Bioconductor
#> knitr 1.40 2022-08-24 [2] CRAN (R 4.2.1)
#> labeling 0.4.2 2020-10-20 [2] CRAN (R 4.2.1)
#> lattice 0.20-45 2021-09-22 [2] CRAN (R 4.2.1)
#> lifecycle 1.0.3 2022-10-07 [2] CRAN (R 4.2.1)
#> limma * 3.54.0 2022-11-01 [2] Bioconductor
#> locfit 1.5-9.6 2022-07-11 [2] CRAN (R 4.2.1)
#> lubridate 1.8.0 2021-10-07 [2] CRAN (R 4.2.1)
#> magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.1)
#> Matrix 1.5-1 2022-09-13 [2] CRAN (R 4.2.1)
#> MatrixGenerics * 1.10.0 2022-11-01 [2] Bioconductor
#> matrixStats * 0.62.0 2022-04-19 [2] CRAN (R 4.2.1)
#> memoise 2.0.1 2021-11-26 [2] CRAN (R 4.2.1)
#> mgcv 1.8-41 2022-10-21 [2] CRAN (R 4.2.1)
#> munsell 0.5.0 2018-06-12 [2] CRAN (R 4.2.1)
#> nlme 3.1-160 2022-10-10 [2] CRAN (R 4.2.1)
#> pillar 1.8.1 2022-08-19 [2] CRAN (R 4.2.1)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.2.1)
#> plyr 1.8.7 2022-03-24 [2] CRAN (R 4.2.1)
#> png 0.1-7 2013-12-03 [2] CRAN (R 4.2.1)
#> purrr 0.3.5 2022-10-06 [2] CRAN (R 4.2.1)
#> qsvaR * 1.2.0 2022-11-01 [1] Bioconductor
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.1)
#> rappdirs 0.3.3 2021-01-31 [2] CRAN (R 4.2.1)
#> Rcpp 1.0.9 2022-07-08 [2] CRAN (R 4.2.1)
#> RCurl 1.98-1.9 2022-10-03 [2] CRAN (R 4.2.1)
#> RefManageR * 1.4.0 2022-09-30 [2] CRAN (R 4.2.1)
#> rlang 1.0.6 2022-09-24 [2] CRAN (R 4.2.1)
#> rmarkdown 2.17 2022-10-07 [2] CRAN (R 4.2.1)
#> RSQLite 2.2.18 2022-10-04 [2] CRAN (R 4.2.1)
#> S4Vectors * 0.36.0 2022-11-01 [2] Bioconductor
#> sass 0.4.2 2022-07-16 [2] CRAN (R 4.2.1)
#> scales 1.2.1 2022-08-20 [2] CRAN (R 4.2.1)
#> sessioninfo * 1.2.2 2021-12-06 [2] CRAN (R 4.2.1)
#> stringi 1.7.8 2022-07-11 [2] CRAN (R 4.2.1)
#> stringr 1.4.1 2022-08-20 [2] CRAN (R 4.2.1)
#> SummarizedExperiment * 1.28.0 2022-11-01 [2] Bioconductor
#> survival 3.4-0 2022-08-09 [2] CRAN (R 4.2.1)
#> sva 3.46.0 2022-11-01 [2] Bioconductor
#> tibble 3.1.8 2022-07-22 [2] CRAN (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [2] CRAN (R 4.2.1)
#> utf8 1.2.2 2021-07-24 [2] CRAN (R 4.2.1)
#> vctrs 0.5.0 2022-10-22 [2] CRAN (R 4.2.1)
#> viridisLite 0.4.1 2022-08-22 [2] CRAN (R 4.2.1)
#> withr 2.5.0 2022-03-03 [2] CRAN (R 4.2.1)
#> xfun 0.34 2022-10-18 [2] CRAN (R 4.2.1)
#> XML 3.99-0.12 2022-10-28 [2] CRAN (R 4.2.1)
#> xml2 1.3.3 2021-11-30 [2] CRAN (R 4.2.1)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 4.2.1)
#> XVector 0.38.0 2022-11-01 [2] Bioconductor
#> yaml 2.3.6 2022-10-18 [2] CRAN (R 4.2.1)
#> zlibbioc 1.44.0 2022-11-01 [2] Bioconductor
#>
#> [1] /tmp/RtmpxPCrcr/Rinst240aa964885a05
#> [2] /home/biocbuild/bbs-3.16-bioc/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
This vignette was generated using BiocStyle (Oleś, 2022) with knitr (Xie, 2022) and rmarkdown (Allaire, Xie, McPherson et al., 2022) running behind the scenes.
Citations made with RefManageR (McLean, 2017).
[1] J. Allaire, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 2.17. 2022. URL: https://github.com/rstudio/rmarkdown.
[2] J. Hester. covr: Test Coverage for Packages. R package version 3.6.1. 2022. URL: https://CRAN.R-project.org/package=covr.
[3] J. T. Leek, W. E. Johnson, H. S. Parker, et al. sva: Surrogate Variable Analysis. R package version 3.46.0. 2022.
[4] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.
[5] M. Morgan, V. Obenchain, J. Hester, et al. SummarizedExperiment: SummarizedExperiment container. R package version 1.28.0. 2022. URL: https://bioconductor.org/packages/SummarizedExperiment.
[6] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.26.0. 2022. URL: https://github.com/Bioconductor/BiocStyle.
[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2022. URL: https://www.R-project.org/.
[8] M. E. Ritchie, B. Phipson, D. Wu, et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies”. In: Nucleic Acids Research 43.7 (2015), p. e47. DOI: 10.1093/nar/gkv007.
[9] L. Shepherd and M. Morgan. BiocFileCache: Manage Files Across Sessions. R package version 2.6.0. 2022.
[10] J. M. Stolz and L. Collado-Torres. qsvaR. https://github.com/LieberInstitute/qsvaR/qsvaR - R package version 1.2.0. 2022. DOI: 10.18129/B9.bioc.qsvaR. URL: http://www.bioconductor.org/packages/qsvaR.
[11] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: https://ggplot2.tidyverse.org.
[12] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[13] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2. 2021. URL: https://CRAN.R-project.org/package=sessioninfo.
[14] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.40. 2022. URL: https://yihui.org/knitr/.