Single-cell ’omics analysis enables high-resolution characterization of heterogeneous cell populations by quantifying measurements in individual cells, and thus provides a fuller, more nuanced picture of the complexity and heterogeneity among cells. However, the data also present new and significant challenges compared with previous approaches, especially because single-cell data are much larger and sparser than data generated by bulk sequencing methods. Dimension reduction is a key step in single-cell analysis: it addresses the high dimensionality and sparsity of these data and enables the application of more complex, computationally expensive downstream pipelines.
Correspondence analysis (CA) is a matrix factorization method, and is similar to
principal components analysis (PCA). Whereas PCA is designed for application to
continuous, approximately normally distributed data, CA is appropriate for
non-negative, count-based data that are on the same additive scale. corral
implements CA for dimensionality reduction of a single matrix of single-cell data.
See the vignette for corralm
for the multi-table adaptation of CA for single-cell batch alignment/integration.
corral can be used with various types of input. When called on a matrix (or other matrix-like object), it returns a list with the SVD output, principal coordinates, and standard coordinates. When called on a SingleCellExperiment, it returns the SingleCellExperiment with the corral embeddings in the reducedDim slot named corral. To retrieve the full list output from a SingleCellExperiment input, the fullout argument can be set to TRUE.
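The three kinds of output named above (SVD, standard coordinates, principal coordinates) can be illustrated with the textbook CA computation in base R. The following is a minimal sketch on a small, made-up count matrix; it is not corral's internal implementation, and `X` and all other variable names are illustrative assumptions.

```r
# Hypothetical toy count matrix: 5 features (rows) x 4 cells (columns)
X <- matrix(c(10, 2, 0, 1,
               3, 8, 1, 0,
               0, 1, 9, 4,
               2, 0, 3, 7,
               5, 5, 5, 5),
            nrow = 5, byrow = TRUE)

# Convert counts to proportions and compute row/column weights (masses)
P  <- X / sum(X)
rw <- rowSums(P)
cw <- colSums(P)

# Standardized (Pearson) residuals: (observed - expected) / sqrt(expected)
E <- outer(rw, cw)
S <- (P - E) / sqrt(E)

# SVD of the residual matrix; at most min(dim) - 1 non-trivial components
sv <- svd(S)
k  <- min(dim(X)) - 1

# Standard coordinates: singular vectors rescaled by the inverse sqrt weights
row_std <- sv$u[, 1:k] / sqrt(rw)
col_std <- sv$v[, 1:k] / sqrt(cw)

# Principal coordinates: standard coordinates scaled by the singular values
row_princ <- sweep(row_std, 2, sv$d[1:k], `*`)
col_princ <- sweep(col_std, 2, sv$d[1:k], `*`)
```

In this formulation the squared singular values sum to the total inertia, which is the chi-squared statistic of the table divided by the total count. That is one way to see why CA suits count data.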
We will use the Zhengmix4eq dataset from the DuoClustering2018 package.
library(corral)
library(SingleCellExperiment)
library(ggplot2)
library(DuoClustering2018)
zm4eq.sce <- sce_full_Zhengmix4eq()
zm8eq <- sce_full_Zhengmix8eq()
This dataset includes approximately 4,000 pre-sorted and annotated cells of 4 types, mixed by Duò et al. in approximately equal proportions (Duò, Robinson, and Soneson, n.d.). The cells were sampled from the data published in “Massively parallel digital transcriptional profiling of single cells” (Zheng et al. 2017).
zm4eq.sce
## class: SingleCellExperiment
## dim: 15568 3994
## metadata(1): log.exprs.offset
## assays(3): counts logcounts normcounts
## rownames(15568): ENSG00000237683 ENSG00000228327 ... ENSG00000215700
## ENSG00000215699
## rowData names(10): id symbol ... total_counts log10_total_counts
## colnames(3994): b.cells1147 b.cells6276 ... regulatory.t1084
## regulatory.t9696
## colData names(14): dataset barcode ... libsize.drop feature.drop
## reducedDimNames(2): PCA TSNE
## mainExpName: NULL
## altExpNames(0):
table(colData(zm4eq.sce)$phenoid)
##
## b.cells cd14.monocytes naive.cytotoxic regulatory.t
## 999 1000 998 997
corral on SingleCellExperiment
We will run corral directly on the raw count data:
zm4eq.sce <- corral(inp = zm4eq.sce,
whichmat = 'counts')
zm4eq.sce
## class: SingleCellExperiment
## dim: 15568 3994
## metadata(1): log.exprs.offset
## assays(3): counts logcounts normcounts
## rownames(15568): ENSG00000237683 ENSG00000228327 ... ENSG00000215700
## ENSG00000215699
## rowData names(10): id symbol ... total_counts log10_total_counts
## colnames(3994): b.cells1147 b.cells6276 ... regulatory.t1084
## regulatory.t9696
## colData names(14): dataset barcode ... libsize.drop feature.drop
## reducedDimNames(3): PCA TSNE corral
## mainExpName: NULL
## altExpNames(0):
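To work with the embedding itself rather than the container, it can be pulled out of the reducedDim slot, and the full list output can be requested with fullout. The following is a minimal sketch that repeats the call shown above. It assumes the same packages are installed and that the dataset can be downloaded; `sce`, `emb`, and `full_out` are illustrative names. The element names of the list are inspected rather than assumed.

```r
library(corral)
library(SingleCellExperiment)
library(DuoClustering2018)

# Same dataset and call as in the text above
sce <- sce_full_Zhengmix4eq()
sce <- corral(inp = sce, whichmat = 'counts')

# Embedding stored under the name 'corral': one row per cell
emb <- reducedDim(sce, 'corral')
dim(emb)

# Request the full list output instead of the updated SingleCellExperiment
full_out <- corral(inp = sce, whichmat = 'counts', fullout = TRUE)
names(full_out)
```

The matrix `emb` can be passed to any downstream step that expects a cells-by-components matrix, such as clustering or a custom plot.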
We can use plot_embedding_sce to visualize the output:
plot_embedding_sce(sce = zm4eq.sce,
which_embedding = 'corral',
plot_title = 'corral on Zhengmix4eq',
color_attr = 'phenoid',
color_title = 'cell type',
saveplot = FALSE)