1 Introduction

This document gives an introduction to and overview of the quality control functionality of the scater package. scater contains tools to help with the analysis of single-cell transcriptomic data, focusing on low-level steps such as quality control, normalization and visualization. It is based on the SingleCellExperiment class (from the SingleCellExperiment package), and thus is interoperable with many other Bioconductor packages such as scran, batchelor and iSEE.

Note: A more comprehensive description of the use of scater (along with other packages) in a scRNA-seq analysis workflow is available at https://osca.bioconductor.org.

2 Setting up the data

2.1 Generating a `SingleCellExperiment` object

We assume that you have a matrix containing expression count data summarised at the level of some features (gene, exon, region, etc.). First, we create a SingleCellExperiment object containing the data, as demonstrated below with the well-known mouse brain dataset from Zeisel et al. (2015), retrieved via the scRNAseq package. Rows of the object correspond to features, while columns correspond to samples, i.e., cells in the context of single-cell ’omics data.

library(scRNAseq)
example_sce <- ZeiselBrainData()
example_sce

## class: SingleCellExperiment 
## dim: 20006 3005 
## metadata(0):
## assays(1): counts
## rownames(20006): Tspan12 Tshz1 ... mt-Rnr1 mt-Nd4l
## rowData names(1): featureType
## colnames(3005): 1772071015_C02 1772071017_G12 ... 1772066098_A12
##   1772058148_F03
## colData names(10): tissue group # ... level1class level2class
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(2): ERCC repeat

We usually expect (raw) count data to be labelled as "counts" in the assays, which can be easily retrieved with the counts accessor. Getters and setters are also provided for exprs, tpm, cpm, fpkm and versions of these with the prefix norm_.

str(counts(example_sce))

Row and column-level metadata are easily accessed (or modified) as shown below. There are also dedicated getters and setters for size factor values (sizeFactors()); reduced dimensionality results (reducedDim()); and alternative experimental features (altExp()).

example_sce$whee <- sample(LETTERS, ncol(example_sce), replace=TRUE)
colData(example_sce)

## DataFrame with 3005 rows and 11 columns
##                        tissue   group # total mRNA mol      well       sex
##                   <character> <numeric>      <numeric> <numeric> <numeric>
## 1772071015_C02       sscortex         1           1221         3         3
## 1772071017_G12       sscortex         1           1231        95         1
## 1772071017_A05       sscortex         1           1652        27         1
## 1772071014_B06       sscortex         1           1696        37         3
## 1772067065_H06       sscortex         1           1219        43         3
## ...                       ...       ...            ...       ...       ...
## 1772067059_B04 ca1hippocampus         9           1997        19         1
## 1772066097_D04 ca1hippocampus         9           1415        21         1
## 1772063068_D01       sscortex         9           1876        34         3
## 1772066098_A12 ca1hippocampus         9           1546        88         1
## 1772058148_F03       sscortex         9           1970        15         3
##                      age  diameter        cell_id       level1class level2class
##                <numeric> <numeric>    <character>       <character> <character>
## 1772071015_C02         2         1 1772071015_C02      interneurons       Int10
## 1772071017_G12         1       353 1772071017_G12      interneurons       Int10
## 1772071017_A05         1        13 1772071017_A05      interneurons        Int6
## 1772071014_B06         2        19 1772071014_B06      interneurons       Int10
## 1772067065_H06         6        12 1772067065_H06      interneurons        Int9
## ...                  ...       ...            ...               ...         ...
## 1772067059_B04         4       382 1772067059_B04 endothelial-mural       Peric
## 1772066097_D04         7        12 1772066097_D04 endothelial-mural        Vsmc
## 1772063068_D01         7       268 1772063068_D01 endothelial-mural        Vsmc
## 1772066098_A12         7       324 1772066098_A12 endothelial-mural        Vsmc
## 1772058148_F03         7         6 1772058148_F03 endothelial-mural        Vsmc
##                       whee
##                <character>
## 1772071015_C02           F
## 1772071017_G12           A
## 1772071017_A05           H
## 1772071014_B06           X
## 1772067065_H06           X
## ...                    ...
## 1772067059_B04           T
## 1772066097_D04           H
## 1772063068_D01           K
## 1772066098_A12           E
## 1772058148_F03           Y

rowData(example_sce)$stuff <- runif(nrow(example_sce))
rowData(example_sce)

## DataFrame with 20006 rows and 2 columns
##          featureType             stuff
##          <character>         <numeric>
## Tspan12   endogenous 0.531340830726549
## Tshz1     endogenous 0.245747287524864
## Fnbp1l    endogenous 0.841682275990024
## Adamts15  endogenous  0.47632492124103
## Cldn12    endogenous 0.631566006690264
## ...              ...               ...
## mt-Co2          mito 0.542126515181735
## mt-Co1          mito 0.915390015114099
## mt-Rnr2         mito 0.665483738295734
## mt-Rnr1         mito 0.612728938227519
## mt-Nd4l         mito 0.610844046343118

Subsetting is very convenient with this class, as both data and metadata are processed in a synchronized manner. More details about the SingleCellExperiment class can be found in the documentation for the SingleCellExperiment package.

2.2 Other methods of data import

Count matrices stored as CSV files or equivalent can be easily read into an R session using read.table() from utils or fread() from the data.table package. It is advisable to coerce the resulting object into a matrix before storing it in a SingleCellExperiment object.
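As a minimal sketch of this import step, the code below writes a small made-up count table to a temporary CSV file, reads it back with read.table() and coerces it to a matrix. The file contents and the gene and cell names are hypothetical. The final step is only attempted if the SingleCellExperiment package is installed.

```r
# Write a tiny, made-up count table to a temporary CSV file
# (genes in rows, cells in columns; all names are hypothetical).
csv_file <- tempfile(fileext = ".csv")
toy <- data.frame(
    cell1 = c(0L, 3L, 10L),
    cell2 = c(5L, 0L, 2L),
    row.names = c("geneA", "geneB", "geneC")
)
write.csv(toy, csv_file)

# read.table() returns a data.frame; the first column holds the gene names.
df <- read.table(csv_file, header = TRUE, sep = ",",
    row.names = 1, check.names = FALSE)

# Coerce to a matrix before storing it in a SingleCellExperiment.
mat <- as.matrix(df)
str(mat)

# Only attempted if the SingleCellExperiment package is installed.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
    sce <- SingleCellExperiment::SingleCellExperiment(
        assays = list(counts = mat))
    print(dim(sce))
}
```

Setting check.names=FALSE keeps cell names as they appear in the file, which matters when they start with digits or contain characters that R would otherwise alter.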

For large data sets, the matrix can be read in chunk-by-chunk with progressive coercion into a sparse matrix from the Matrix package. This is performed using the readSparseCounts() function and reduces memory usage by not explicitly storing zeroes in memory.
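To give a rough sense of the memory saving, the sketch below compares a dense and a sparse representation of the same simulated, mostly-zero count matrix using the Matrix package. It does not reproduce the chunk-by-chunk reading of readSparseCounts(). The matrix dimensions and the Poisson rate are arbitrary choices made for illustration.

```r
library(Matrix)  # recommended package shipped with most R installations

set.seed(42)
# Simulate a mostly-zero count matrix: 2000 features by 500 cells.
dense <- matrix(rpois(2000 * 500, lambda = 0.05), nrow = 2000)

# Coerce to a sparse matrix, which stores only the non-zero entries.
sparse <- Matrix(dense, sparse = TRUE)

mean(dense == 0)      # proportion of zeroes
object.size(dense)    # memory used by the dense representation
object.size(sparse)   # memory used by the sparse representation
```

The saving grows with the proportion of zeroes, so it tends to be largest for shallowly sequenced data.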

Data from 10X Genomics experiments can be read in using the read10xCounts function from the DropletUtils package. This will automatically generate a SingleCellExperiment with a sparse matrix; see the documentation for more details.

Transcript abundances from the kallisto and Salmon pseudo-aligners can be imported using methods from the tximeta package. This produces a SummarizedExperiment object that can be coerced into a SingleCellExperiment simply with as(se, "SingleCellExperiment").

3 Quality control

3.1 Background

scater provides functionality for three levels of quality control (QC):

QC and filtering of cells
QC and filtering of features (genes)
QC of experimental variables

3.2 Cell-level QC

3.2.1 Definition of metrics

Cell-level metrics are computed by the perCellQCMetrics() function and include:

sum: total number of counts for the cell (i.e., the library size).
detected: the number of features for the cell that have counts above the detection limit (default of zero).
subsets_X_percent: percentage of all counts that come from the feature control set named X.

library(scater)
per.cell <- perCellQCMetrics(example_sce, 
    subsets=list(Mito=grep("mt-", rownames(example_sce))))
summary(per.cell$sum)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2574    8130   12913   14954   19284   63505

summary(per.cell$detected)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     785    2484    3656    3777    4929    8167

summary(per.cell$subsets_Mito_percent)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.992   6.653   7.956  10.290  56.955

It is often convenient to store this in the colData() of our SingleCellExperiment object for future reference. (In fact, the addPerCellQC() function will do this automatically.)

colData(example_sce) <- cbind(colData(example_sce), per.cell)
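To make the definitions of these metrics concrete, the following base R sketch computes them by hand on a small made-up count matrix. It illustrates the arithmetic only and is not the implementation used by perCellQCMetrics(). The gene names and counts are hypothetical.

```r
# Toy count matrix: 5 features (rows) by 3 cells (columns); values are made up.
counts <- matrix(
    c(10, 0, 5,
       0, 0, 2,
       3, 1, 0,
       7, 4, 1,
       0, 5, 2),
    nrow = 5, byrow = TRUE,
    dimnames = list(c("GeneA", "GeneB", "GeneC", "mt-Co1", "mt-Nd1"),
                    c("cell1", "cell2", "cell3"))
)
is_mito <- grepl("^mt-", rownames(counts))

qc <- data.frame(
    # Library size: total count per cell.
    sum = colSums(counts),
    # Number of features with a count above zero.
    detected = colSums(counts > 0),
    # Percentage of each cell's counts assigned to the mitochondrial subset.
    subsets_Mito_percent =
        colSums(counts[is_mito, , drop = FALSE]) / colSums(counts) * 100
)
qc
```

Here cell2 has 9 of its 10 counts in mitochondrial genes, which is the kind of pattern the mitochondrial percentage is intended to expose.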

3.2.2 Diagnostic plots

Metadata variables can be plotted against each other using the plotColData() function, as shown below. We expect to see an increasing number of detected genes with increasing total count. Each point represents a cell that is coloured according to its tissue of origin.

plotColData(example_sce, x = "sum", y="detected", colour_by="tissue")

Here, we have plotted the total count for each cell against the mitochondrial content. Well-behaved cells should have a large number of expressed features and a low percentage of expression from feature controls. A high percentage of expression from feature controls and few expressed features are indicative of blank and failed cells. For some variety, we have faceted by the tissue of origin.

plotColData(example_sce, x = "sum", y="subsets_Mito_percent", 
    other_fields="tissue") + facet_wrap(~tissue)

3.2.3 Identifying low-quality cells

Column subsetting of the SingleCellExperiment object will only retain the selected cells, thus removing low-quality or otherwise unwanted cells. We can identify high-quality cells to retain by setting a fixed threshold on particular metrics. For example, we could retain only cells that have more than 100,000 total counts and more than 500 expressed features. In this dataset the largest total count is 63,505, so such a strict threshold retains no cells:

keep.total <- example_sce$sum > 1e5
keep.n <- example_sce$detected > 500
filtered <- example_sce[,keep.total & keep.n]
dim(filtered)

## [1] 20006     0

The isOutlier function provides a more data-adaptive way of choosing these thresholds. This defines the threshold at a certain number of median absolute deviations (MADs) away from the median. Values beyond this threshold are considered outliers and can be filtered out, assuming that they correspond to low-quality cells. Here, we define small outliers (using type="lower") for the log-total counts at 3 MADs from the median.

# isOutlier() returns TRUE for outliers, so negate it to obtain the cells to keep.
keep.total <- !isOutlier(per.cell$sum, type="lower", log=TRUE)
filtered <- example_sce[,keep.total]
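The MAD-based rule can be sketched in a few lines of base R. This is an illustration of the idea under simple assumptions (simulated library sizes, a log2 transformation, and the default scaling constant of mad()), not the code of isOutlier() itself.

```r
set.seed(100)
# Simulated library sizes: 200 typical cells plus 3 with very low totals.
lib_sizes <- c(
    round(rlnorm(200, meanlog = log(10000), sdlog = 0.4)),
    50, 80, 120
)

nmads <- 3
log_lib <- log2(lib_sizes)

# Lower threshold: a fixed number of MADs below the median, on the log scale.
lower_limit <- median(log_lib) - nmads * mad(log_lib)
is_low <- log_lib < lower_limit

2^lower_limit   # threshold expressed on the original scale
sum(is_low)     # number of cells flagged as low outliers

# Cells to retain are those that are NOT outliers.
keep <- !is_low
```

Working on the log scale keeps the threshold positive and makes the distribution of library sizes closer to symmetric, so that a MAD-based cutoff is more meaningful.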

Detection of outliers can be achieved more conveniently for several common metrics using the quickPerCellQC() function. This uses the total count, number of detected features and the percentage of counts in gene sets of diagnostic value (e.g., mitochondrial genes, spike-in transcripts) to identify which cells to discard and for what reason.

qc.stats <- quickPerCellQC(per.cell, percent_subsets="subsets_Mito_percent")
colSums(as.matrix(qc.stats))

##              low_lib_size            low_n_features high_subsets_Mito_percent 
##                         0                         3                       128 
##                   discard 
##                       131

filtered <- example_sce[,!qc.stats$discard]

The isOutlier approach adjusts to experiment-specific aspects of the data, e.g., sequencing depth, amount of spike-in RNA added, cell type. In contrast, a fixed threshold would require manual adjustment to account for changes to the experimental protocol or system. We refer readers to the simpleSingleCell workflow for more details.

3.3 Feature-level QC

3.3.1 Definition of metrics

Feature-level metrics are computed by the perFeatureQCMetrics() function and include:

mean: the mean count of the gene/feature across all cells.
detected: the percentage of cells with non-zero counts for each gene.
subsets_Y_ratio: ratio of mean counts between the cell control set named Y and all cells.

# Pretending that the first 10 cells are empty wells, for demonstration.
per.feat <- perFeatureQCMetrics(example_sce, subsets=list(Empty=1:10))
summary(per.feat$mean)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0007   0.0097   0.1338   0.7475   0.5763 732.1524

summary(per.feat$detected)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.03328  0.76539  9.01830 18.87800 31.24792 99.96672

summary(per.feat$subsets_Empty_ratio)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.601   1.872   2.016 300.500
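The same metrics can be computed by hand to clarify what each one means. The sketch below uses a made-up count matrix and treats the first two cells as the hypothetical "Empty" set. It is not the implementation of perFeatureQCMetrics().

```r
# Toy count matrix: 3 features by 4 cells; values are made up.
counts <- matrix(
    c(0, 2, 4, 6,
      1, 0, 0, 3,
      0, 0, 0, 8),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4))
)
empty <- 1:2   # pretend the first two cells are empty wells

feat <- data.frame(
    # Mean count of each feature across all cells.
    mean = rowMeans(counts),
    # Percentage of cells with a non-zero count for each feature.
    detected = rowMeans(counts > 0) * 100,
    # Mean count in the subset divided by the mean count across all cells.
    subsets_Empty_ratio =
        rowMeans(counts[, empty, drop = FALSE]) / rowMeans(counts)
)
feat
```

A ratio near or above 1 for a gene suggests that its counts are as high in the supposedly empty wells as elsewhere, which can point to ambient contamination.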

A more refined calculation of the average is provided by the calculateAverage() function, which adjusts the counts by the relative library size (or size factor) prior to taking the mean.

ave <- calculateAverage(example_sce)
summary(ave)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0002   0.0109   0.1443   0.7475   0.5674 850.6880

We can also compute the number of cells expressing a gene directly.

summary(nexprs(example_sce, byrow=TRUE))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    23.0   271.0   567.3   939.0  3004.0

3.3.2 Diagnostic plots

We look at a plot that shows the top 50 (by default) most-expressed features. Each row in the plot below corresponds to a gene, and each bar corresponds to the expression of a gene in a single cell. The circle indicates the median expression of each gene, by which the genes are sorted. By default, “expression” is defined using the feature counts (if available), but other expression values can be used instead by changing exprs_values.

plotHighestExprs(example_sce, exprs_values = "counts")

We expect to see the “usual suspects”, i.e., mitochondrial genes, actin, ribosomal proteins, MALAT1. A few spike-in transcripts may also be present here, though if all of the spike-ins are in the top 50, it suggests that too much spike-in RNA was added. A large number of pseudo-genes or predicted genes may indicate problems with alignment.

3.3.3 Subsetting by row

Genes can be removed by row subsetting of the SingleCellExperiment object. For example, we can filter out features (genes) that are not expressed in any cells:

keep_feature <- nexprs(example_sce, byrow=TRUE) > 0
example_sce <- example_sce[keep_feature,]
dim(example_sce)

## [1] 20006  3005

Other filtering can be done using existing annotation. For example, ribosomal protein genes and predicted genes can be identified (and removed) using regular expressions or biotype information. Such genes are often uninteresting when the aim is to characterize population heterogeneity.
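A minimal sketch of regular expression filtering is shown below on a hypothetical vector of mouse gene symbols. The patterns are assumptions made for illustration: "Rps"/"Rpl" followed by a digit for ribosomal protein genes, and "Gm" followed only by digits for predicted genes. They should be checked against the annotation actually in use.

```r
# Hypothetical gene symbols standing in for rownames(example_sce).
genes <- c("Actb", "Rps27", "Rpl13a", "Gm12345", "Snap25", "Gm2a", "Mog")

# Assumed naming patterns; verify these against your own annotation.
is_ribo <- grepl("^Rp[sl][0-9]", genes)
is_predicted <- grepl("^Gm[0-9]+$", genes)

keep <- !(is_ribo | is_predicted)
genes[keep]
```

Anchoring the patterns with ^ and $ avoids discarding genes that merely resemble the pattern, such as Gm2a in this example. The resulting logical vector can be used for row subsetting in the same way as keep_feature above.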

3.4 Variable-level QC

Variable-level metrics are computed by the getVarianceExplained() function (after normalization, see below). This calculates the percentage of variance of each gene’s expression that is explained by each variable in the colData of the SingleCellExperiment object.

example_sce <- logNormCounts(example_sce) # see below.
vars <- getVarianceExplained(example_sce, 
    variables=c("tissue", "total mRNA mol", "sex", "age"))
head(vars)

##              tissue total mRNA mol         sex        age
## Tspan12  0.02207262    0.074086504 0.146344996 0.09472155
## Tshz1    3.36083014    0.003846487 0.001079356 0.31262288
## Fnbp1l   0.43597185    0.421086301 0.003071630 0.64964174
## Adamts15 0.54233888    0.005348505 0.030821621 0.01393787
## Cldn12   0.03506751    0.309128294 0.008341408 0.02363737
## Rxfp1    0.18559637    0.016290703 0.055646799 0.02128006

We can then use this to determine which experimental factors are contributing most to the variance in expression. This is useful for diagnosing batch effects or to quickly verify that a treatment has an effect.

plotExplanatoryVariables(vars)
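To illustrate what a percentage of variance explained means for a single gene and a single variable, the sketch below fits a linear model with lm() and reports the R-squared as a percentage. The data are simulated and the helper pct_var() is a hypothetical name introduced here; this is not the code of getVarianceExplained().

```r
set.seed(1)
n <- 100
tissue <- factor(rep(c("cortex", "hippocampus"), each = n / 2))

# Gene 1 depends strongly on tissue; gene 2 is pure noise.
gene1 <- ifelse(tissue == "cortex", 5, 1) + rnorm(n, sd = 0.5)
gene2 <- rnorm(n)

# Hypothetical helper: R-squared of a one-variable linear model, in percent.
pct_var <- function(y, x) summary(lm(y ~ x))$r.squared * 100

res <- c(gene1 = pct_var(gene1, tissue), gene2 = pct_var(gene2, tissue))
res
```

A gene whose expression tracks the variable closely yields a value near 100, while an unrelated gene yields a value near 0.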

4 Computing expression values

4.1 Normalization for library size differences

The most commonly used function is logNormCounts(), which calculates log₂-transformed normalized expression values. This is done by dividing each count by its size factor, adding a pseudo-count and log-transforming. The resulting values can be interpreted on the same scale as log-transformed counts, and are stored in "logcounts".

example_sce <- logNormCounts(example_sce)
assayNames(example_sce)

## [1] "counts"    "logcounts"

By default, the size factor is automatically computed from the library size of each cell using the librarySizeFactors() function. This calculation simply involves scaling the library sizes so that they have a mean of 1 across all cells. However, if size factors are explicitly provided in the SingleCellExperiment, they will be used by the normalization functions.

summary(librarySizeFactors(example_sce))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1721  0.5437  0.8635  1.0000  1.2895  4.2466
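The normalization described above can be written out in base R as follows. This is a sketch on a made-up count matrix, assuming library size factors and a pseudo-count of 1; it illustrates the arithmetic rather than reproducing logNormCounts().

```r
# Toy count matrix: 3 features by 3 cells; values are made up.
counts <- matrix(
    c(10, 20, 30,
       0,  5, 15,
      10, 15, 45),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("Gene", 1:3), paste0("cell", 1:3))
)

# Library sizes scaled so that the size factors have a mean of 1.
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell's counts by its size factor, add a pseudo-count, take log2.
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)

size_factors
logcounts
```

Because the size factors average to 1, the normalized values stay on roughly the same scale as the original counts.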

Alternatively, we can calculate counts-per-million using the aptly-named calculateCPM() function. The output is most appropriately stored as an assay named "cpm" in the assays of the SingleCellExperiment object. Related functions include calculateTPM() and calculateFPKM(), which do pretty much as advertised.

cpm(example_sce) <- calculateCPM(example_sce)
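For reference, counts-per-million amounts to rescaling each cell so that its counts sum to one million. The sketch below shows this on a made-up matrix, assuming the plain library size is used as the denominator.

```r
# Toy count matrix: 3 features by 2 cells; values are made up.
counts <- matrix(c(1, 3, 6, 0, 5, 5), nrow = 3,
    dimnames = list(paste0("Gene", 1:3), c("cell1", "cell2")))

# Divide each column by its library size and scale to one million.
cpm_mat <- sweep(counts, 2, colSums(counts), "/") * 1e6

cpm_mat
colSums(cpm_mat)
```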

Of course, users can construct any arbitrary matrix of the same dimensions as the count matrix and store it as an assay.

assay(example_sce, "normed") <- normalizeCounts(example_sce, 
    size_factors=runif(ncol(example_sce)), pseudo_count=1.5)

4.2 Visualizing expression values

The plotExpression() function makes it easy to plot expression values for a subset of genes or features. This can be particularly useful for further examination of features identified from differential expression testing, pseudotime analysis or other analyses. By default, it uses expression values in the "logcounts" assay, but this can be changed through the exprs_values argument.

plotExpression(example_sce, rownames(example_sce)[1:6], x = "level1class")

Setting x will determine the covariate to be shown on the x-axis. This can be a field in the column metadata or the name of a feature (to obtain the expression profile across cells). Categorical covariates will yield grouped violins as shown above, with one panel per feature. By comparison, continuous covariates will generate a scatter plot in each panel, as shown below.

plotExpression(example_sce, rownames(example_sce)[1:6],
    x = rownames(example_sce)[10])

The points can also be coloured, shaped or resized by the column metadata or expression values.

plotExpression(example_sce, rownames(example_sce)[1:6],
    x = "level1class", colour_by="tissue")

Directly plotting the gene expression without any x or other visual parameters will generate a set of grouped violin plots, coloured in an aesthetically pleasing manner.

plotExpression(example_sce, rownames(example_sce)[1:6])

5 Dimensionality reduction

5.1 Principal components analysis

Principal components analysis (PCA) is often performed to denoise and compact the data prior to downstream analyses. The runPCA() function provides a simple wrapper around the base machinery in BiocSingular for computing PCs from log-transformed expression values. This stores the output in the reducedDims slot of the SingleCellExperiment, which can be easily retrieved (along with the percentage of variance explained by each PC) as shown below:

example_sce <- runPCA(example_sce)
str(reducedDim(example_sce, "PCA"))

##  num [1:3005, 1:50] 15.4 15 17.2 16.9 18.4 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:3005] "1772071015_C02" "1772071017_G12" "1772071017_A05" "1772071014_B06" ...
##   ..$ : chr [1:50] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "percentVar")= num [1:50] 39.72 9.38 4.25 3.9 2.76 ...

By default, runPCA() uses the top 500 genes with the highest variances to compute the first PCs. This can be tuned by specifying subset_row to pass in an explicit set of genes of interest, and by using ncomponents to determine the number of components to compute. The name argument can also be used to change the name of the result in the reducedDims slot.

example_sce <- runPCA(example_sce, name="PCA2",
    subset_row=rownames(example_sce)[1:1000],
    ncomponents=25)
str(reducedDim(example_sce, "PCA2"))

##  num [1:3005, 1:25] 20 21 23 23.7 21.5 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:3005] "1772071015_C02" "1772071017_G12" "1772071017_A05" "1772071014_B06" ...
##   ..$ : chr [1:25] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "percentVar")= num [1:25] 22.3 5.11 3.42 1.69 1.58 ...
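The general procedure (select the most variable genes, then compute PCs on the cells) can be sketched with base R's prcomp(). The data are simulated, the choice of 50 genes is arbitrary for this toy example, and the code is an illustration rather than what runPCA() does internally.

```r
set.seed(10)
# Simulated log-expression matrix: 200 genes by 60 cells (values are made up).
logexprs <- matrix(rnorm(200 * 60), nrow = 200,
    dimnames = list(paste0("Gene", 1:200), paste0("cell", 1:60)))
# Shift the first 20 genes in half of the cells to create real structure.
logexprs[1:20, 1:30] <- logexprs[1:20, 1:30] + 3

# Select the genes with the highest variances.
ntop <- 50
gene_vars <- apply(logexprs, 1, var)
top <- order(gene_vars, decreasing = TRUE)[seq_len(ntop)]

# prcomp() expects observations (cells) in rows, hence the transpose.
pca <- prcomp(t(logexprs[top, ]), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component.
percent_var <- pca$sdev^2 / sum(pca$sdev^2) * 100

dim(pca$x)                   # cells by components
round(percent_var[1:5], 1)
```

Restricting the PCA to highly variable genes focuses the components on features likely to carry biological signal, at the cost of ignoring any structure present only in the excluded genes.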

5.2 Other dimensionality reduction methods

$t$-distributed stochastic neighbour embedding ($t$-SNE) is widely used for visualizing complex single-cell data sets. The same procedure described for PCA plots can be applied to generate $t$-SNE plots using plotTSNE, with coordinates obtained using runTSNE via the Rtsne package. We strongly recommend generating plots with different random seeds and perplexity values, to ensure that any conclusions are robust to different visualizations.

# Perplexity of 10 just chosen here arbitrarily.
set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=10)
head(reducedDim(example_sce, "TSNE"))

##                     [,1]      [,2]
## 1772071015_C02 -49.70928 -12.34376
## 1772071017_G12 -52.60643 -10.83418
## 1772071017_A05 -49.34541 -12.32723
## 1772071014_B06 -52.17084 -10.59969
## 1772067065_H06 -51.75725 -10.20202
## 1772071017_E02 -52.71614 -10.29159

A more common pattern involves using the pre-existing PCA results as input into the $t$-SNE algorithm. This is useful as it improves speed by using a low-rank approximation of the expression matrix; and reduces random noise, by focusing on the major factors of variation. The code below uses the first 10 dimensions of the previously computed PCA result to perform the $t$-SNE.

set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=50, 
    dimred="PCA", n_dimred=10)
head(reducedDim(example_sce, "TSNE"))

##                     [,1]        [,2]
## 1772071015_C02 -37.91782  0.77874503
## 1772071017_G12 -35.71181  0.02937304
## 1772071017_A05 -38.31355  1.25472517
## 1772071014_B06 -38.34157 -0.43358953
## 1772067065_H06 -40.73331 -1.10637835
## 1772071017_E02 -38.59131  2.41607361

The same can be done for uniform manifold approximation and projection (UMAP) via the runUMAP() function, itself based on the uwot package.

example_sce <- runUMAP(example_sce)
head(reducedDim(example_sce, "UMAP"))

##                     [,1]      [,2]
## 1772071015_C02 -13.13473 -2.747997
## 1772071017_G12 -13.20766 -2.790565
## 1772071017_A05 -13.01717 -2.715320
## 1772071014_B06 -13.17852 -2.794246
## 1772067065_H06 -13.22176 -2.828697
## 1772071017_E02 -13.21017 -2.810026

5.3 Visualizing reduced dimensions

Any dimensionality reduction result can be plotted using the plotReducedDim function. Here, each point represents a cell and is coloured according to its cell type label.

plotReducedDim(example_sce, dimred = "PCA", colour_by = "level1class")

Some result types have dedicated wrappers for convenience, e.g., plotTSNE() for $t$-SNE results:

plotTSNE(example_sce, colour_by = "Snap25")

The dedicated plotPCA() function also adds the percentage of variance explained to the axes:

plotPCA(example_sce, colour_by="Mog")

Multiple components can be plotted in a series of pairwise plots. When more than two components are plotted, the diagonal boxes in the scatter plot matrix show the density for each component.

example_sce <- runPCA(example_sce, ncomponents=20)
plotPCA(example_sce, ncomponents = 4, colour_by = "level1class")

We separate the execution of these functions from the plotting to enable the same coordinates to be re-used across multiple plots. This avoids repeatedly recomputing those coordinates just to change an aesthetic across plots.

6 Transitioning from the `SCESet` class

As of July 2017, scater has switched from the SCESet class previously defined within the package to the more widely applicable SingleCellExperiment class. From Bioconductor 3.6 (October 2017), the release version of scater will use SingleCellExperiment. SingleCellExperiment is a more modern and robust class that provides a common data structure used by many single-cell Bioconductor packages. Advantages include support for sparse data matrices and the capability for on-disk storage of data to minimise memory usage for large single-cell datasets.

It should be straightforward to convert existing scripts based on SCESet objects to SingleCellExperiment objects, with key changes outlined immediately below.

The functions toSingleCellExperiment and updateSCESet (for backwards compatibility) can be used to convert an old SCESet object to a SingleCellExperiment object;
Create a new SingleCellExperiment object with the function SingleCellExperiment (actually less fiddly than creating a new SCESet);
scater functions have been refactored to take SingleCellExperiment objects, so once data is in a SingleCellExperiment object, the user experience is almost identical to that with the SCESet class.

Users may need to be aware of the following when updating their own scripts:

Cell names can now be accessed/assigned with the colnames function (instead of sampleNames or cellNames for an SCESet object);
Feature (gene/transcript) names should now be accessed/assigned with the rownames function (instead of featureNames);
Cell metadata, stored as phenoData in an SCESet, corresponds to colData in a SingleCellExperiment object and is accessed/assigned with the colData function (this replaces the pData function);
Individual cell-level variables can still be accessed with the $ operator (e.g. sce$sum);
Feature metadata, stored as featureData in an SCESet, corresponds to rowData in a SingleCellExperiment object and is accessed/assigned with the rowData function (this replaces the fData function);
plotScater, which produces a cumulative expression overview plot, replaces the generic plot function for SCESet objects.

Session information

sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scater_1.14.6               ggplot2_3.2.1              
##  [3] scRNAseq_2.0.2              SingleCellExperiment_1.8.0 
##  [5] SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
##  [7] BiocParallel_1.20.0         matrixStats_0.55.0         
##  [9] Biobase_2.46.0              GenomicRanges_1.38.0       
## [11] GenomeInfoDb_1.22.0         IRanges_2.20.1             
## [13] S4Vectors_0.24.1            BiocGenerics_0.32.0        
## [15] BiocStyle_2.14.2           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6                  bit64_0.9-7                  
##  [3] httr_1.4.1                    tools_3.6.1                  
##  [5] backports_1.1.5               R6_2.4.1                     
##  [7] irlba_2.3.3                   vipor_0.4.5                  
##  [9] uwot_0.1.5                    DBI_1.1.0                    
## [11] lazyeval_0.2.2                colorspace_1.4-1             
## [13] withr_2.1.2                   tidyselect_0.2.5             
## [15] gridExtra_2.3                 bit_1.1-14                   
## [17] curl_4.3                      compiler_3.6.1               
## [19] BiocNeighbors_1.4.1           labeling_0.3                 
## [21] bookdown_0.16                 scales_1.1.0                 
## [23] rappdirs_0.3.1                stringr_1.4.0                
## [25] digest_0.6.23                 rmarkdown_2.0                
## [27] XVector_0.26.0                pkgconfig_2.0.3              
## [29] htmltools_0.4.0               dbplyr_1.4.2                 
## [31] fastmap_1.0.1                 rlang_0.4.2                  
## [33] RSQLite_2.1.4                 FNN_1.1.3                    
## [35] shiny_1.4.0                   DelayedMatrixStats_1.8.0     
## [37] farver_2.0.1                  dplyr_0.8.3                  
## [39] RCurl_1.95-4.12               magrittr_1.5                 
## [41] BiocSingular_1.2.0            GenomeInfoDbData_1.2.2       
## [43] Matrix_1.2-18                 Rcpp_1.0.3                   
## [45] ggbeeswarm_0.6.0              munsell_0.5.0                
## [47] viridis_0.5.1                 lifecycle_0.1.0              
## [49] stringi_1.4.3                 yaml_2.2.0                   
## [51] zlibbioc_1.32.0               plyr_1.8.5                   
## [53] Rtsne_0.15                    BiocFileCache_1.10.2         
## [55] AnnotationHub_2.18.0          grid_3.6.1                   
## [57] blob_1.2.0                    promises_1.1.0               
## [59] ExperimentHub_1.12.0          crayon_1.3.4                 
## [61] lattice_0.20-38               cowplot_1.0.0                
## [63] zeallot_0.1.0                 knitr_1.26                   
## [65] pillar_1.4.2                  reshape2_1.4.3               
## [67] glue_1.3.1                    BiocVersion_3.10.1           
## [69] evaluate_0.14                 RcppParallel_4.4.4           
## [71] BiocManager_1.30.10           vctrs_0.2.0                  
## [73] httpuv_1.5.2                  gtable_0.3.0                 
## [75] purrr_0.3.3                   assertthat_0.2.1             
## [77] xfun_0.11                     rsvd_1.0.2                   
## [79] mime_0.7                      xtable_1.8-4                 
## [81] RSpectra_0.16-0               later_1.0.0                  
## [83] viridisLite_0.3.0             tibble_2.1.3                 
## [85] AnnotationDbi_1.48.0          beeswarm_0.2.3               
## [87] memoise_1.1.0                 interactiveDisplayBase_1.24.0

Single-cell analysis toolkit for expression in R

Revised: November 2, 2019

