Project: SRP009615.
This report is meant to help explore DESeq2 (Love, Huber, and Anders, 2014) results and was generated using the regionReport (Collado-Torres, Jaffe, and Leek, 2016) package. While the report is rich, it is only meant as a starting point for exploring the results and to exemplify some of the code used to do so. If you need a more in-depth analysis for your specific data set, you might want to use the customCode argument. This report is based on the vignette of the DESeq2 (Love, Huber, and Anders, 2014) package.
This section contains the code for setting up the rest of the report.
## knitrBootstrap and device chunk options
load_install('knitr')
opts_chunk$set(bootstrap.show.code = FALSE, dev = device)
## Hide code entirely when the output is not HTML
if(!outputIsHTML) opts_chunk$set(echo = FALSE)
#### Libraries needed
## Bioconductor
load_install('DESeq2')
if(isEdgeR) load_install('edgeR')
## CRAN
load_install('ggplot2')
if(!is.null(theme)) theme_set(theme)
load_install('knitr')
if(is.null(colors)) {
load_install('RColorBrewer')
}
load_install('pheatmap')
load_install('DT')
load_install('devtools')
## Working behind the scenes
# load_install('knitcitations')
# load_install('rmarkdown')
## Optionally
# load_install('knitrBootstrap')
#### Code setup
## For ggplot
res.df <- as.data.frame(res)
## Sort results by adjusted p-values
ord <- order(res.df$padj, decreasing = FALSE)
res.df <- res.df[ord, ]
features <- rownames(res.df)
res.df <- cbind(data.frame(Feature = features), res.df)
rownames(res.df) <- NULL
## Transform count data
rld <- tryCatch(rlog(dds), error = function(e) { rlog(dds, fitType = 'mean') })
## Perform PCA analysis and make plot
plotPCA(rld, intgroup = intgroup)
## Get percent of variance explained
data_pca <- plotPCA(rld, intgroup = intgroup, returnData = TRUE)
percentVar <- round(100 * attr(data_pca, "percentVar"))
The above plot shows the first two principal components of the regularized log-transformed count data, which capture the largest share of its variability. If you are unfamiliar with principal component analysis, you might want to check the Wikipedia entry or an interactive explanation of the method. In this case, the first and second principal components explain 65 and 15 percent of the variance, respectively.
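As a rough illustration of where these percentages come from, the sketch below runs a PCA in base R on a simulated matrix standing in for assay(rld). It assumes, as DESeq2's plotPCA() does by default, that only the 500 most variable features are used; the data, the shift applied to samples 4 to 6, and the variable names are hypothetical.

```r
set.seed(1)
# Hypothetical stand-in for assay(rld): 1000 features x 6 samples on a log scale
mat <- matrix(rnorm(6000, mean = 8), nrow = 1000, ncol = 6)
# Shift 50 features in samples 4-6 to mimic a group effect
mat[1:50, 4:6] <- mat[1:50, 4:6] + 3

# Keep the ntop most variable features (assumed value, matching plotPCA's default)
ntop <- 500
rv <- apply(mat, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

# PCA with samples as observations, so transpose the matrix
pca <- prcomp(t(mat[select, ]))

# Share of the total variance captured by each component, as a rounded percentage
percentVar <- round(100 * pca$sdev^2 / sum(pca$sdev^2))
percentVar[1:2]
```

Because prcomp() returns the components in decreasing order of variance, the first percentage is always at least as large as the second.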
## Obtain the sample euclidean distances
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
## Add names based on intgroup
rownames(sampleDistMatrix) <- apply(as.data.frame(colData(rld)[, intgroup]), 1,
paste, collapse = ' : ')
colnames(sampleDistMatrix) <- NULL
## Define colors to use for the heatmap if none were supplied
if(is.null(colors)) {
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
}
## Make the heatmap
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists, color = colors)
This heatmap shows how the samples cluster based on the Euclidean distances between them, computed on the regularized log-transformed count data. Rows and columns are ordered by hierarchical clustering, giving an overview of sample similarity that complements the PCA plot.
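A minimal base-R sketch of the distance and clustering steps, assuming a simulated matrix in place of assay(rld) and complete linkage (the pheatmap default); the sample names and the group shift are hypothetical:

```r
set.seed(2)
# Hypothetical stand-in for assay(rld): 100 features x 6 samples
mat <- matrix(rnorm(600), nrow = 100, ncol = 6,
              dimnames = list(NULL, paste0("s", 1:6)))
# Samples s4-s6 are shifted to mimic a second group
mat[, 4:6] <- mat[, 4:6] + 2

# Euclidean distances between samples: dist() works on rows, so transpose
d <- dist(t(mat))

# Complete-linkage hierarchical clustering, then cut the tree into two clusters
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)
groups
```

With a shift this large relative to the noise, the two simulated groups end up in separate clusters, which is the pattern one hopes to see in the heatmap when the groups of interest drive most of the variation.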
This section contains three MA plots (see Wikipedia) that compare the mean of the normalized counts against the log2 fold change. Each point is one feature, and the statistically significant features (those with an adjusted p-value less than alpha) are shown in red.
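The coloring rule can be sketched in base R with a small hypothetical table that uses the same column names as a DESeq2 results object (baseMean, log2FoldChange, padj). In this sketch, features with a missing adjusted p-value are treated as not significant:

```r
# Hypothetical results table with the same column names as a DESeq2 results object
res_toy <- data.frame(
  baseMean       = c(15, 230, 48, 1200, 5, 87),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9, 3.5, 0.05),
  padj           = c(0.01, 0.60, 0.04, 0.09, NA, 0.95)
)
alpha <- 0.1

# A feature is highlighted only if its adjusted p-value is non-missing and below alpha
is_sig <- !is.na(res_toy$padj) & res_toy$padj < alpha
point_col <- ifelse(is_sig, "red", "gray40")

# MA plot: mean of normalized counts (log-scaled x-axis) versus log2 fold change
plot(res_toy$baseMean, res_toy$log2FoldChange, log = "x", col = point_col, pch = 16,
     xlab = "Mean of normalized counts", ylab = "log2 fold change")
abline(h = 0, col = "gray")
```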
## MA plot with alpha used in DESeq2::results()
plotMA(res, alpha = metadata(res)$alpha, main = paste('MA plot with alpha =',
metadata(res)$alpha))
This first plot uses alpha = 0.1, which is the alpha value used to determine which features were significant when running the function DESeq2::results().
## MA plot with alpha = 1/2 of the alpha used in DESeq2::results()
plotMA(res, alpha = metadata(res)$alpha / 2,
main = paste('MA plot with alpha =', metadata(res)$alpha / 2))
This second MA plot uses alpha = 0.05 and can be compared against the first MA plot to identify which features have adjusted p-values between 0.05 and 0.1.
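Rather than comparing the two plots by eye, the same features can be listed directly. A minimal sketch, assuming a hypothetical named vector of adjusted p-values in place of res.df$padj:

```r
# Hypothetical adjusted p-values; in the report these would come from res.df$padj
padj <- c(0.001, 0.049, 0.05, 0.07, 0.099, 0.1, 0.3, NA)
names(padj) <- paste0("feature", seq_along(padj))

# Red in the alpha = 0.1 plot (padj < 0.1) but not in the alpha = 0.05 plot (padj < 0.05)
in_between <- !is.na(padj) & padj >= 0.05 & padj < 0.1
names(padj)[in_between]
```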
## MA plot with alpha corresponding to the one that gives the nBest features
nBest.actual <- min(nBest, nrow(head(res.df, n = nBest)))
nBest.alpha <- head(res.df, n = nBest)$padj[nBest.actual]
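## plotMA() highlights features with padj < alpha, so raise the cutoff
## very slightly to include the nBest-th feature itself
plotMA(res, alpha = nBest.alpha * 1.00000000000001,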
main = paste('MA plot for top', nBest.actual, 'features'))
The third and final MA plot uses an alpha chosen so that the top 10 features are highlighted in red. These are the features whose details are included in the top features interactive table.
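The MA plot code for the top features nudges the nBest-th adjusted p-value upward because a feature is highlighted only when its adjusted p-value is strictly below alpha. The sketch below, on hypothetical values, shows the effect and one caveat: when several features tie at the cutoff, more than nBest points end up highlighted.

```r
# Hypothetical adjusted p-values, sorted increasingly as in res.df
padj_sorted <- sort(c(0.0004, 0.002, 0.03, 0.03, 0.08, 0.2, 0.5))
nBest <- 3

# Adjusted p-value of the nBest-th feature
alpha_top <- padj_sorted[nBest]

# Using alpha_top directly excludes the nBest-th feature (strict inequality)
sum(padj_sorted < alpha_top)

# Multiplying by 1 + 1e-14 (the same factor as 1.00000000000001) includes it,
# plus any feature tied with it
alpha_plot <- alpha_top * (1 + 1e-14)
sum(padj_sorted < alpha_plot)
```

Here the third and fourth values tie at 0.03, so four features fall below the nudged cutoff even though nBest is 3.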
## P-value histogram plot
ggplot(res.df[!is.na(res.df$pvalue), ], aes(x = pvalue)) +
geom_histogram(alpha=.5, position='identity', bins = 50) +
labs(title='Histogram of unadjusted p-values') +
xlab('Unadjusted p-values') +
xlim(c(0, 1.0005))
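This plot shows a histogram of the unadjusted p-values. It might be skewed right or left, or flat as shown in the Wikipedia examples. Its shape depends on the proportion of features that are differentially expressed: p-values from features with no true difference are uniformly distributed, so differentially expressed features show up as a peak near 0 on top of an otherwise flat histogram. For further information on how to interpret a histogram of p-values, see David Robinson’s post on this topic.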
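To see how the proportion of differentially expressed features shapes the histogram, the sketch below simulates p-values from a hypothetical mixture: 9000 null features with uniform p-values and 1000 differentially expressed features whose p-values follow a Beta(0.2, 5) distribution, an arbitrary choice that concentrates them near 0. It counts the values in 50 equal-width bins, as in the plot above, without drawing anything:

```r
set.seed(3)
# Hypothetical mixture of null (uniform) and differentially expressed p-values
p_null <- runif(9000)
p_de   <- rbeta(1000, shape1 = 0.2, shape2 = 5)
pvals  <- c(p_null, p_de)

# Bin counts for 50 equal-width bins on [0, 1], without plotting
h <- hist(pvals, breaks = seq(0, 1, length.out = 51), plot = FALSE)

# The first bin is inflated by the DE features; bins above 0.5 reflect the null only
h$counts[1]
mean(h$counts[26:50])
```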
## P-value distribution summary
summary(res.df$pvalue)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.207 0.540 0.504 0.798 1.000 7979
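This is the numerical summary of the distribution of the p-values. The NA values correspond to features for which DESeq2 did not report a p-value, for example features with zero counts in all samples or features flagged as containing an extreme count outlier.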
## Split features by different p-value cutoffs
pval_table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1), function(x) {
data.frame('Cut' = x, 'Count' = sum(res.df$pvalue <= x, na.rm = TRUE))
})
pval_table <- do.call(rbind, pval_table)
if(outputIsHTML) {
kable(pval_table, format = 'markdown', align = c('c', 'c'))
} else {
kable(pval_table)
}
Cut | Count |
---|---|
0.0001 | 243 |
0.0010 | 776 |
0.0100 | 2371 |
0.0250 | 3816 |
0.0500 | 5468 |
0.1000 | 8059 |
0.2000 | 12278 |
0.3000 | 16026 |
0.4000 | 19662 |
0.5000 | 23425 |
0.6000 | 27385 |
0.7000 | 32525 |
0.8000 | 38160 |
0.9000 | 44652 |
1.0000 | 50058 |
This table shows the number of features with p-values less than or equal to some commonly used cutoff values.
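One way to read this table is to compare each count with what would be expected if no feature were differentially expressed. In that case the p-values would be roughly uniform, so about cutoff times 50058 features would fall at or below each cutoff, where 50058 is the count at cutoff 1, that is, the number of non-missing p-values. A short sketch using the values from the table:

```r
# Cutoffs and counts copied from the p-value table
cut      <- c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
              0.6, 0.7, 0.8, 0.9, 1)
observed <- c(243, 776, 2371, 3816, 5468, 8059, 12278, 16026, 19662, 23425,
              27385, 32525, 38160, 44652, 50058)

# Number of features with a non-missing p-value (count at cutoff 1)
n_tested <- observed[length(observed)]

# Under a uniform null, the expected count at each cutoff is cutoff * n_tested
expected <- cut * n_tested
round(observed / expected, 2)
```

Ratios well above 1 at the smallest cutoffs (for example, 243 observed versus about 5 expected at 0.0001) point to an excess of small p-values, in line with a peak near 0 in the histogram.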
## Adjusted p-values histogram plot
ggplot(res.df[!is.na(res.df$padj), ], aes(x = padj)) +
geom_histogram(alpha=.5, position='identity', bins = 50) +
labs(title=paste('Histogram of', elementMetadata(res)$description[grep('adjusted', elementMetadata(res)$description)])) +
xlab('Adjusted p-values') +
xlim(c(0, 1.0005))
This plot shows a histogram of the Benjamini-Hochberg (BH) adjusted p-values. It might be skewed right or left, or flat as shown in the Wikipedia examples.
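For reference, the BH adjustment can be written in a few lines of base R. This sketch follows the same steps as R's p.adjust(method = "BH"), assumes there are no missing p-values, and uses hypothetical input values:

```r
# Hypothetical unadjusted p-values
p <- c(0.01, 0.04, 0.03, 0.20, 0.005)

bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)  # largest p-value first
  ranks <- m:1                      # rank of each sorted p-value (1 = smallest)
  # Scale each p-value by m / rank, enforce monotonicity with a running minimum,
  # and cap the result at 1
  adj <- pmin(1, cummin(p[o] * m / ranks))
  adj[order(o)]                     # return values in the original order
}

bh_adjust(p)  # 0.025 0.050 0.050 0.200 0.025
all.equal(bh_adjust(p), p.adjust(p, method = "BH"))
```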
## Adjusted p-values distribution summary
summary(res.df$padj)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 0.31 0.60 0.57 0.84 1.00 33963
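This is the numerical summary of the distribution of the BH adjusted p-values. There are more NA values than for the unadjusted p-values because DESeq2's independent filtering also sets the adjusted p-value to NA for features with a low mean normalized count.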
## Split features by different adjusted p-value cutoffs
padj_table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
0.6, 0.7, 0.8, 0.9, 1), function(x) {
data.frame('Cut' = x, 'Count' = sum(res.df$padj <= x, na.rm = TRUE))
})
padj_table <- do.call(rbind, padj_table)
if(outputIsHTML) {
kable(padj_table, format = 'markdown', align = c('c', 'c'))
} else {
kable(padj_table)
}
Cut | Count |
---|---|
0.0001 | 9 |
0.0010 | 36 |
0.0100 | 234 |
0.0250 | 608 |
0.0500 | 1114 |
0.1000 | 2140 |
0.2000 | 3960 |
0.3000 | 5893 |
0.4000 | 7937 |
0.5000 | 9897 |
0.6000 | 11992 |
0.7000 | 14323 |
0.8000 | 16814 |
0.9000 | 19853 |
1.0000 | 24074 |
This table shows the number of features with BH adjusted p-values less than or equal to some commonly used cutoff values.
This interactive table shows the top 10 features ordered by their BH adjusted p-values. Use the search function to find your feature of interest or sort by one of the columns.
## Add search url if appropriate
if(!is.null(searchURL) && outputIsHTML) {
res.df$Feature <- paste0('<a href="', searchURL, res.df$Feature, '">',
res.df$Feature, '</a>')
}
## Show p-values in scientific notation
for(i in which(colnames(res.df) %in% c('pvalue', 'padj'))) {
  res.df[, i] <- format(res.df[, i], scientific = TRUE)
}
if(outputIsHTML) {
  datatable(head(res.df, n = nBest),
    options = list(pagingType = 'full_numbers', pageLength = 10, scrollX = '100%'),
    escape = FALSE, rownames = FALSE) %>%
    formatRound(which(!colnames(res.df) %in% c('pvalue', 'padj', 'Feature')), digits)
} else {
  res.df_top <- head(res.df, n = 20)
  for(i in which(!colnames(res.df) %in% c('pvalue', 'padj', 'Feature'))) {
    res.df_top[, i] <- round(res.df_top[, i], digits)
  }
  kable(res.df_top)
}
This section contains plots showing the normalized counts per sample for each group of interest. Only the best 2 features are shown, ranked by their BH adjusted p-values. The y-axis is on a log10 scale and the feature name is shown in the title of each plot.
plotCounts_gg <- function(i, dds, intgroup) {
  ## Build the grouping factor from one or more intgroup variables
  group <- if (length(intgroup) == 1) {
    colData(dds)[[intgroup]]
  } else if (length(intgroup) == 2) {
    lvls <- as.vector(t(outer(levels(colData(dds)[[intgroup[1]]]),
      levels(colData(dds)[[intgroup[2]]]),
      function(x, y) paste(x, y, sep = " : "))))
    droplevels(factor(apply(as.data.frame(colData(dds)[, intgroup, drop = FALSE]),
      1, paste, collapse = " : "), levels = lvls))
  } else {
    factor(apply(as.data.frame(colData(dds)[, intgroup, drop = FALSE]),
      1, paste, collapse = " : "))
  }
  data <- plotCounts(dds, gene = i, intgroup = intgroup, returnData = TRUE)
  ## Change in version 1.15.3
  ## It might not be necessary to have any of this if else, but I'm not
  ## sure that plotCounts(returnData) will always return the 'group' variable.
  if('group' %in% colnames(data)) {
    data$group <- group
  } else {
    data <- cbind(data, data.frame('group' = group))
  }
  ggplot(data, aes(x = group, y = count)) +
    geom_point() +
    ylab('Normalized count') +
    ggtitle(i) +
    coord_trans(y = "log10") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
}
for(i in head(features, nBestFeatures)) {
print(plotCounts_gg(i, dds = dds, intgroup = intgroup))
}
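The counts in these plots are normalized by DESeq2's size factors, which by default are estimated with the median-of-ratios method. The sketch below reimplements that idea in base R on a hypothetical count matrix in which every sample is an exact multiple of one expression profile, so the expected size factors (0.5, 1 and 2) are known in advance; it is a simplified illustration, not DESeq2's own code:

```r
# Hypothetical counts: 5 features x 3 samples, where sample j = profile * depth[j]
profile <- c(10, 20, 30, 40, 50)
depth   <- c(1, 2, 4)
counts  <- outer(profile, depth)

# Median-of-ratios size factors
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))  # log geometric mean of each feature
  keep <- is.finite(log_geo_means)        # drop features with any zero count
  apply(counts[keep, , drop = FALSE], 2, function(cnts) {
    # Median ratio of this sample's counts to the per-feature reference
    exp(median(log(cnts) - log_geo_means[keep]))
  })
}

sf <- median_of_ratios(counts)
sf  # 0.5 1.0 2.0

# Normalized counts: divide each sample (column) by its size factor
normalized <- sweep(counts, 2, sf, "/")
```

After normalization all three samples share the same values, which is what makes counts from samples sequenced at different depths comparable in the plots above.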
The input for this report was generated with DESeq2 (Love, Huber, and Anders, 2014) version 1.20.0, and features were called significantly differentially expressed if their BH adjusted p-values were less than alpha = 0.1. This report was generated in path /tmp/RtmpJwmqSJ/Rbuild2402561d060c/recount/vignettes using the following call to DESeq2Report():
## DESeq2Report(dds = dds, project = "SRP009615", intgroup = c("group",
## "gene_target"), res = res, nBest = 10, nBestFeatures = 2,
## outdir = ".", output = "SRP009615-results", device = "png",
## template = "SRP009615-results-template.Rmd")
Date the report was generated.
## [1] "2018-07-29 21:08:50 EDT"
Wallclock time spent generating the report.
## Time difference of 49.966 secs
R session information.
## Session info ----------------------------------------------------------------------------------------------------------
## setting value
## version R version 3.5.1 Patched (2018-07-12 r74967)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate C
## tz America/New_York
## date 2018-07-29
## Packages --------------------------------------------------------------------------------------------------------------
## package * version date source
## acepack 1.4.1 2016-10-29 CRAN (R 3.5.1)
## annotate 1.58.0 2018-07-29 Bioconductor
## AnnotationDbi 1.42.1 2018-07-29 Bioconductor
## assertthat 0.2.0 2017-04-11 CRAN (R 3.5.1)
## backports 1.1.2 2017-12-13 CRAN (R 3.5.1)
## base * 3.5.1 2018-07-25 local
## base64enc 0.1-3 2015-07-28 CRAN (R 3.5.1)
## bibtex 0.4.2 2017-06-30 CRAN (R 3.5.1)
## bindr 0.1.1 2018-03-13 CRAN (R 3.5.1)
## bindrcpp 0.2.2 2018-03-29 CRAN (R 3.5.1)
## Biobase * 2.40.0 2018-07-29 Bioconductor
## BiocGenerics * 0.26.0 2018-07-29 Bioconductor
## BiocParallel * 1.14.2 2018-07-29 Bioconductor
## BiocStyle * 2.8.2 2018-07-29 Bioconductor
## biomaRt 2.36.1 2018-07-29 Bioconductor
## Biostrings 2.48.0 2018-07-29 Bioconductor
## bit 1.1-14 2018-05-29 CRAN (R 3.5.1)
## bit64 0.9-7 2017-05-08 CRAN (R 3.5.1)
## bitops 1.0-6 2013-08-17 CRAN (R 3.5.1)
## blob 1.1.1 2018-03-25 CRAN (R 3.5.1)
## bookdown 0.7 2018-02-18 CRAN (R 3.5.1)
## BSgenome 1.48.0 2018-07-29 Bioconductor
## bumphunter 1.22.0 2018-07-29 Bioconductor
## checkmate 1.8.5 2017-10-24 CRAN (R 3.5.1)
## cluster 2.0.7-1 2018-04-13 CRAN (R 3.5.1)
## codetools 0.2-15 2016-10-05 CRAN (R 3.5.1)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.5.1)
## compiler 3.5.1 2018-07-25 local
## crayon 1.3.4 2017-09-16 CRAN (R 3.5.1)
## crosstalk 1.0.0 2016-12-21 CRAN (R 3.5.1)
## data.table 1.11.4 2018-05-27 CRAN (R 3.5.1)
## datasets * 3.5.1 2018-07-25 local
## DBI 1.0.0 2018-05-02 CRAN (R 3.5.1)
## DEFormats 1.8.0 2018-07-29 Bioconductor
## DelayedArray * 0.6.2 2018-07-29 Bioconductor
## derfinder 1.14.0 2018-07-29 Bioconductor
## derfinderHelper 1.14.0 2018-07-29 Bioconductor
## DESeq2 * 1.20.0 2018-07-29 Bioconductor
## devtools * 1.13.6 2018-06-27 CRAN (R 3.5.1)
## digest 0.6.15 2018-01-28 CRAN (R 3.5.1)
## doRNG 1.7.1 2018-06-22 CRAN (R 3.5.1)
## downloader 0.4 2015-07-09 CRAN (R 3.5.1)
## dplyr 0.7.6 2018-06-29 CRAN (R 3.5.1)
## DT * 0.4 2018-01-30 CRAN (R 3.5.1)
## edgeR 3.22.3 2018-07-29 Bioconductor
## evaluate 0.11 2018-07-17 CRAN (R 3.5.1)
## foreach 1.4.4 2017-12-12 CRAN (R 3.5.1)
## foreign 0.8-71 2018-07-20 CRAN (R 3.5.1)
## Formula 1.2-3 2018-05-03 CRAN (R 3.5.1)
## genefilter 1.62.0 2018-07-29 Bioconductor
## geneplotter 1.58.0 2018-07-29 Bioconductor
## GenomeInfoDb * 1.16.0 2018-07-29 Bioconductor
## GenomeInfoDbData 1.1.0 2018-07-25 Bioconductor
## GenomicAlignments 1.16.0 2018-07-29 Bioconductor
## GenomicFeatures 1.32.0 2018-07-29 Bioconductor
## GenomicFiles 1.16.0 2018-07-29 Bioconductor
## GenomicRanges * 1.32.6 2018-07-29 Bioconductor
## GEOquery 2.48.0 2018-07-29 Bioconductor
## ggplot2 * 3.0.0 2018-07-03 CRAN (R 3.5.1)
## glue 1.3.0 2018-07-17 CRAN (R 3.5.1)
## graphics * 3.5.1 2018-07-25 local
## grDevices * 3.5.1 2018-07-25 local
## grid 3.5.1 2018-07-25 local
## gridExtra 2.3 2017-09-09 CRAN (R 3.5.1)
## gtable 0.2.0 2016-02-26 CRAN (R 3.5.1)
## highr 0.7 2018-06-09 CRAN (R 3.5.1)
## Hmisc 4.1-1 2018-01-03 CRAN (R 3.5.1)
## hms 0.4.2 2018-03-10 CRAN (R 3.5.1)
## htmlTable 1.12 2018-05-26 CRAN (R 3.5.1)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.5.1)
## htmlwidgets 1.2 2018-04-19 CRAN (R 3.5.1)
## httpuv 1.4.5 2018-07-19 CRAN (R 3.5.1)
## httr 1.3.1 2017-08-20 CRAN (R 3.5.1)
## IRanges * 2.14.10 2018-07-29 Bioconductor
## iterators 1.0.10 2018-07-13 CRAN (R 3.5.1)
## jsonlite 1.5 2017-06-01 CRAN (R 3.5.1)
## knitcitations * 1.0.8 2017-07-04 CRAN (R 3.5.1)
## knitr * 1.20 2018-02-20 CRAN (R 3.5.1)
## knitrBootstrap 1.0.2 2018-05-24 CRAN (R 3.5.1)
## labeling 0.3 2014-08-23 CRAN (R 3.5.1)
## later 0.7.3 2018-06-08 CRAN (R 3.5.1)
## lattice 0.20-35 2017-03-25 CRAN (R 3.5.1)
## latticeExtra 0.6-28 2016-02-09 CRAN (R 3.5.1)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.1)
## limma 3.36.2 2018-07-29 Bioconductor
## locfit 1.5-9.1 2013-04-20 CRAN (R 3.5.1)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.5.1)
## magrittr 1.5 2014-11-22 CRAN (R 3.5.1)
## markdown 0.8 2017-04-20 CRAN (R 3.5.1)
## Matrix 1.2-14 2018-04-13 CRAN (R 3.5.1)
## matrixStats * 0.54.0 2018-07-23 CRAN (R 3.5.1)
## memoise 1.1.0 2017-04-21 CRAN (R 3.5.1)
## methods * 3.5.1 2018-07-25 local
## mime 0.5 2016-07-07 CRAN (R 3.5.1)
## munsell 0.5.0 2018-06-12 CRAN (R 3.5.1)
## nnet 7.3-12 2016-02-02 CRAN (R 3.5.1)
## parallel * 3.5.1 2018-07-25 local
## pheatmap * 1.0.10 2018-05-19 CRAN (R 3.5.1)
## pillar 1.3.0 2018-07-14 CRAN (R 3.5.1)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.5.1)
## pkgmaker 0.27 2018-05-25 CRAN (R 3.5.1)
## plyr 1.8.4 2016-06-08 CRAN (R 3.5.1)
## prettyunits 1.0.2 2015-07-13 CRAN (R 3.5.1)
## progress 1.2.0 2018-06-14 CRAN (R 3.5.1)
## promises 1.0.1 2018-04-13 CRAN (R 3.5.1)
## purrr 0.2.5 2018-05-29 CRAN (R 3.5.1)
## qvalue 2.12.0 2018-07-29 Bioconductor
## R6 2.2.2 2017-06-17 CRAN (R 3.5.1)
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.5.1)
## Rcpp 0.12.18 2018-07-23 CRAN (R 3.5.1)
## RCurl 1.95-4.11 2018-07-15 CRAN (R 3.5.1)
## readr 1.1.1 2017-05-16 CRAN (R 3.5.1)
## recount * 1.6.3 2018-07-29 Bioconductor
## RefManageR 1.2.0 2018-04-25 CRAN (R 3.5.1)
## regionReport * 1.14.3 2018-07-29 Bioconductor
## registry 0.5 2017-12-03 CRAN (R 3.5.1)
## rentrez 1.2.1 2018-03-05 CRAN (R 3.5.1)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.5.1)
## rlang 0.2.1 2018-05-30 CRAN (R 3.5.1)
## rmarkdown 1.10 2018-06-11 CRAN (R 3.5.1)
## rngtools 1.3.1 2018-05-15 CRAN (R 3.5.1)
## rpart 4.1-13 2018-02-23 CRAN (R 3.5.1)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.1)
## Rsamtools 1.32.2 2018-07-29 Bioconductor
## RSQLite 2.1.1 2018-05-06 CRAN (R 3.5.1)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.5.1)
## rtracklayer 1.40.3 2018-07-29 Bioconductor
## S4Vectors * 0.18.3 2018-07-29 Bioconductor
## scales 0.5.0 2017-08-24 CRAN (R 3.5.1)
## shiny 1.1.0 2018-05-17 CRAN (R 3.5.1)
## splines 3.5.1 2018-07-25 local
## stats * 3.5.1 2018-07-25 local
## stats4 * 3.5.1 2018-07-25 local
## stringi 1.2.4 2018-07-20 CRAN (R 3.5.1)
## stringr 1.3.1 2018-05-10 CRAN (R 3.5.1)
## SummarizedExperiment * 1.10.1 2018-07-29 Bioconductor
## survival 2.42-6 2018-07-13 CRAN (R 3.5.1)
## tibble 1.4.2 2018-01-22 CRAN (R 3.5.1)
## tidyr 0.8.1 2018-05-18 CRAN (R 3.5.1)
## tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.1)
## tools 3.5.1 2018-07-25 local
## utils * 3.5.1 2018-07-25 local
## VariantAnnotation 1.26.1 2018-07-29 Bioconductor
## withr 2.1.2 2018-03-15 CRAN (R 3.5.1)
## xfun 0.3 2018-07-06 CRAN (R 3.5.1)
## XML 3.98-1.12 2018-07-15 CRAN (R 3.5.1)
## xml2 1.2.0 2018-01-24 CRAN (R 3.5.1)
## xtable 1.8-2 2016-02-05 CRAN (R 3.5.1)
## XVector 0.20.0 2018-07-29 Bioconductor
## yaml 2.2.0 2018-07-25 CRAN (R 3.5.1)
## zlibbioc 1.26.0 2018-07-29 Bioconductor
Pandoc version used: 2.1.
This report was created with regionReport (Collado-Torres, Jaffe, and Leek, 2016) using rmarkdown (Allaire, Xie, McPherson, Luraschi, et al., 2018) while knitr (Xie, 2014) and DT (Xie, 2018) were running behind the scenes. pheatmap (Kolde, 2018) was used to create the sample distances heatmap. Several plots were made with ggplot2 (Wickham, 2016).
Citations made with knitcitations (Boettiger, 2017). The BibTeX file can be found here.
[1] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 1.10. 2018. URL: https://CRAN.R-project.org/package=rmarkdown.
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[1] R. Kolde. pheatmap: Pretty Heatmaps. R package version 1.0.10. 2018. URL: https://CRAN.R-project.org/package=pheatmap.
[1] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016. ISBN: 978-3-319-24277-4. URL: http://ggplot2.org.
[1] Y. Xie. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.4. 2018. URL: https://CRAN.R-project.org/package=DT.
[1] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.