Here, we describe a few additional analyses that can be performed with single-cell RNA sequencing data. This includes detection of significant correlations between genes and regressing out the effect of cell cycle from the gene expression matrix.
Cell cycle phase is usually uninteresting in studies focusing on other aspects of biology. However, the effects of cell cycle on the expression profile can mask other effects and interfere with the interpretation of the results. This cannot be avoided by simply removing cell cycle marker genes, as the cell cycle can affect a substantial number of other transcripts (Buettner et al. 2015). Rather, more sophisticated strategies are required, one of which is demonstrated below using data from a study of T Helper 2 (TH2) cells (Mahata et al. 2014).
library(BiocFileCache) bfc <- BiocFileCache("raw_data", ask = FALSE) mahata.fname <- bfcrpath(bfc, "http://www.nature.com/nbt/journal/v33/n2/extref/nbt.3102-S7.xlsx")
Buettner et al. (2015) have already applied quality control and normalized the data, so we can use them directly as log-expression values (accessible as Supplementary Data 1 of https://dx.doi.org/10.1038/nbt.3102).
library(readxl) incoming <- as.data.frame(read_excel(mahata.fname, sheet=1)) rownames(incoming) <- incoming[,1] incoming <- incoming[,-1] incoming <- incoming[,!duplicated(colnames(incoming))] # Remove duplicated genes. sce.th2 <- SingleCellExperiment(list(logcounts=t(incoming)))
We empirically identify the cell cycle phase using the pair-based classifier in
The majority of cells in Figure 2 seem to lie in G1 phase, with small numbers of cells in the other phases.
library(org.Mm.eg.db) ensembl <- mapIds(org.Mm.eg.db, keys=rownames(sce.th2), keytype="SYMBOL", column="ENSEMBL") set.seed(100) mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran")) assignments <- cyclone(sce.th2, mm.pairs, gene.names=ensembl, assay.type="logcounts") plot(assignments$score$G1, assignments$score$G2M, xlab="G1 score", ylab="G2/M score", pch=16)
We can block directly on the phase scores in downstream analyses. This is more graduated than using a strict assignment of each cell to a specific phase, as the magnitude of the score considers the uncertainty of the assignment. The phase covariates in the design matrix will absorb any phase-related effects on expression such that they will not affect estimation of the effects of other experimental factors. Users should also ensure that the phase score is not confounded with other factors of interest. For example, model fitting is not possible if all cells in one experimental condition are in one phase, and all cells in another condition are in a different phase.
design <- model.matrix(~ G1 + G2M, assignments$score) dec.block <- modelGeneVar(sce.th2, design=design) library(limma) sce.th2.block <- sce.th2 assay(sce.th2.block, "corrected") <- removeBatchEffect( logcounts(sce.th2), covariates=design[,-1]) sce.th2.block <- denoisePCA(sce.th2.block, technical=dec.block, assay.type="corrected") dim(reducedDim(sce.th2.block, "PCA"))
##  81 5
The result of blocking on
design is visualized with some PCA plots in Figure 3.
Before removal, the distribution of cells along the first two principal components is strongly associated with their G1 and G2/M scores.
This is no longer the case after removal, which suggests that the cell cycle effect has been mitigated.
sce.th2$G1score <- sce.th2.block$G1score <- assignments$score$G1 sce.th2$G2Mscore <- sce.th2.block$G2Mscore <- assignments$score$G2M # Without blocking on phase score. dec.th2 <- modelGeneVar(sce.th2) sce.th2 <- denoisePCA(sce.th2, dec.th2) fontsize <- theme(axis.text=element_text(size=12), axis.title=element_text(size=16)) out <- plotReducedDim(sce.th2, dimred="PCA", ncomponents=2, colour_by="G1score", size_by="G2Mscore") + fontsize + ggtitle("Before removal") # After blocking on the phase score. out2 <- plotReducedDim(sce.th2.block, dimred="PCA", ncomponents=2, colour_by="G1score", size_by="G2Mscore") + fontsize + ggtitle("After removal") multiplot(out, out2, cols=2)