This document shows typical Topconfects usage with limma, edgeR, or DESeq2.
The first step is to load a dataset. Here, we’re looking at RNA-seq data that
investigates the response of Arabodopsis thaliana to a bacterial pathogen.
Besides the experimental and control conditions, there is also a batch effect.
This dataset is also examined in section 4.2 of the edgeR
user manual, and
I’ve followed the initial filtering steps in the edgeR
manual.
library(topconfects)
library(NBPSeq)
library(edgeR)
library(limma)
library(dplyr)
library(ggplot2)
data(arab)
# Retrieve symbol for each gene
info <-
AnnotationDbi::select(
org.At.tair.db::org.At.tair.db,
rownames(arab), "SYMBOL") %>%
group_by(TAIR) %>%
summarize(
SYMBOL=paste(unique(na.omit(SYMBOL)),collapse="/"))
arab_info <-
info[match(rownames(arab),info$TAIR),] %>%
select(-TAIR) %>%
as.data.frame
rownames(arab_info) <- rownames(arab)
# Extract experimental design from sample names
Treat <- factor(substring(colnames(arab),1,4)) %>% relevel(ref="mock")
Time <- factor(substring(colnames(arab),5,5))
y <- DGEList(arab, genes=as.data.frame(arab_info))
# Keep genes with at least 3 samples having an RPM of more than 2
keep <- rowSums(cpm(y)>2) >= 3
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~Time+Treat)
design[,]
## (Intercept) Time2 Time3 Treathrcc
## 1 1 0 0 0
## 2 1 1 0 0
## 3 1 0 1 0
## 4 1 0 0 1
## 5 1 1 0 1
## 6 1 0 1 1
fit <-
voom(y, design) %>%
lmFit(design)
Find largest fold changes that we can be confident of at FDR 0.05.
confects <- limma_confects(fit, coef="Treathrcc", fdr=0.05)
confects
## $table
## confect effect AveExpr name SYMBOL
## 1 3.090 4.849 6.567 AT3G46280
## 2 2.895 4.331 9.155 AT4G12500
## 3 2.895 4.493 6.053 AT2G19190 FRK1/SIRK
## 4 2.608 3.839 9.166 AT4G12490 AZI3
## 5 2.608 3.952 6.615 AT1G51800 IOS1
## 6 2.608 4.299 5.440 AT2G39530 CASPL4D1
## 7 2.606 3.908 7.899 AT2G39200 ATMLO12/MLO12
## 8 2.606 3.710 8.729 AT5G64120 AtPRX71/PRX71
## 9 2.595 5.346 1.807 AT5G31702
## 10 2.563 4.888 2.383 AT2G08986
## ...
## 2184 of 16526 non-zero log2 fold change at FDR 0.05
## Prior df 18.2
Here the usual logFC
values estimated by limma
are shown as dots, with lines
to the confect
value.
confects_plot(confects)
confects_plot_me
overlays the confects (black) on a Mean-Difference Plot
(grey) (as might be produced by limma::plotMD
). As we should expect, the very
noisy differences with low mean expression are removed if we look at the
confects.
confects_plot_me(confects)
Let’s compare this to the ranking we obtain from topTable
.
fit_eb <- eBayes(fit)
top <- topTable(fit_eb, coef="Treathrcc", n=Inf)
rank_rank_plot(confects$table$name, rownames(top), "limma_confects", "topTable")