scater: Single-cell analysis toolkit for expression in R
scater 1.6.3
This document gives an introduction to and overview of the functionality of the scater package.
The scater package contains tools to help with the analysis of single-cell transcriptomic data, with a focus on RNA-seq data. The package features:
the SingleCellExperiment class as a data container for interoperability with a wide range of other Bioconductor packages;
kallisto and Salmon for rapid quantification of transcript abundance and tight integration with scater.

To get up and running as quickly as possible, see the Quick Start section below. For more detail, see the in-depth sections that follow, which cover various aspects of the functionality.
NB: as of July 2017, scater has switched from the SCESet class previously defined within the package to the more widely applicable SingleCellExperiment class. The functions toSingleCellExperiment and updateSCESet (for backwards compatibility) can be used to convert an old SCESet object to a SingleCellExperiment object.
Assuming you have a matrix containing expression count data summarised at the
level of some features (gene, exon, region, etc.), then we first need to form a
SingleCellExperiment
object containing the data. A SingleCellExperiment
object is the basic data container used in scater
and many other Bioconductor
packages for single-cell data analysis.
Here we use the example data provided with the package, which gives us two objects: a matrix of counts and a data frame with information about the cells we are studying:
suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")
We use these objects to form a SingleCellExperiment
object containing all of
the necessary information for our analysis:
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info)
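To check what the container now holds, the object can be inspected with standard accessors. The following is a minimal sketch, assuming scater 1.6.x (the version this document describes) and the example data shipped with it; it repeats the construction step so that it runs on its own.

```r
suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")

# Rebuild the object from the text so this block is self-contained.
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info)

dim(example_sce)                  # number of features (rows) and cells (columns)
assayNames(example_sce)           # names of the assays stored so far
head(colData(example_sce))        # per-cell metadata taken from sc_example_cell_info
counts(example_sce)[1:3, 1:3]     # top-left corner of the raw count matrix
```

Features are stored as rows and cells as columns, so the dimensions of the object match those of the count matrix it was built from.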
We always expect to have (raw) count data in a SingleCellExperiment
object. In
almost all cases we will also want to have a log2-scale representation of the
data. We expect this to be stored as the exprs
assay.
Here we use log2-counts-per-million with an offset of 1 as the exprs
values.
exprs(example_sce) <- log2(
    calculateCPM(example_sce, use.size.factors = FALSE) + 1)
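To make the arithmetic behind log2-counts-per-million concrete, here is a small worked example in base R on a made-up count matrix. It is a sketch of the calculation under the assumption that, without size factors, each cell's counts are scaled by that cell's total count; it is not a copy of the calculateCPM implementation, and the object names (toy_counts and so on) are hypothetical.

```r
# Toy count matrix: 3 features (rows) by 2 cells (columns). Values are invented.
toy_counts <- matrix(c(10, 0, 90,
                       5, 5, 40),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))

# Library size: total counts per cell.
lib_sizes <- colSums(toy_counts)

# Counts per million: divide each column by its library size, scale by 1e6.
toy_cpm <- sweep(toy_counts, 2, lib_sizes, "/") * 1e6

# Offset of 1 keeps zero counts at exactly 0 on the log2 scale.
toy_log_cpm <- log2(toy_cpm + 1)
toy_log_cpm
```

Because of the offset, a feature with zero counts in a cell maps to a log2 value of exactly zero, which is why testing the log-scale values for being greater than zero is equivalent to testing the raw counts.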
Subsetting is very convenient with this class. For example, we can filter out features (genes) that are not expressed in any cells:
keep_feature <- rowSums(exprs(example_sce) > 0) > 0
example_sce <- example_sce[keep_feature,]
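The filtering rule above is ordinary logical indexing, so it can be tried out on a plain matrix first. The sketch below uses an invented matrix and hypothetical names; it shows the same "expressed in at least one cell" rule alongside a stricter variant requiring expression in at least two cells.

```r
# Toy expression matrix: 4 features (rows) by 3 cells (columns). Values are invented.
toy <- matrix(c(0, 0, 0,
                3, 0, 1,
                0, 2, 4,
                0, 0, 7),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Same rule as in the text: keep features with non-zero expression in any cell.
expressed_anywhere <- rowSums(toy > 0) > 0

# Stricter variant: require non-zero expression in at least two cells.
expressed_in_two <- rowSums(toy > 0) >= 2

toy[expressed_anywhere, ]
toy[expressed_in_two, ]
```

A logical vector built this way can be used in the row position of the square brackets on a SingleCellExperiment object in the same manner, and a vector over cells can be used in the column position.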
Now we have the expression data neatly stored in a structure that can be used for lots of exciting analyses.
It is straightforward to compute many quality control metrics. We typically provide one or more sets of “feature controls”, that is, sets of genes or features that represent technical features of the expression data or are not of primary biological interest. QC metrics computed specifically for these feature sets are used to assess the quality of cells. Spike-in genes (such as the commonly used ERCC set) and mitochondrial genes are typically useful as “feature controls”. Here, for demonstration, we just use the first 40 features.
example_sce <- calculateQCMetrics(example_sce,
    feature_controls = list(eg = 1:40))
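In a real dataset, feature control sets are usually built from feature names rather than positions. The following base R sketch shows one way to do that, assuming spike-ins are named with an "ERCC-" prefix and mitochondrial genes with an "MT-" prefix; both prefixes and the gene names are assumptions for illustration and will differ between annotations.

```r
# Hypothetical feature names; in practice these would come from rownames() of the object.
gene_names <- c("ERCC-00002", "ERCC-00003", "MT-CO1", "MT-ND1", "ACTB", "GAPDH")

# Flag each control set by name pattern (the prefixes are assumptions).
is_spike <- grepl("^ERCC-", gene_names)
is_mito <- grepl("^MT-", gene_names)

# Named list of row indices, in the same shape as list(eg = 1:40) in the text.
feature_controls <- list(ERCC = which(is_spike), Mt = which(is_mito))
feature_controls

# This list could then be supplied as the feature_controls argument, e.g.
# calculateQCMetrics(sce, feature_controls = feature_controls)
```

Naming the list elements is worthwhile because, as with "eg" above, the names identify each control set.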
Now you can play around with your data using the graphical user interface (GUI), which opens an interactive dashboard in your browser!
scater_gui(example_sce)
Many plotting functions are available for visualising the data:
plotScater: a plot method exists for SingleCellExperiment objects, which gives an overview of expression across cells.
plotQC: various methods are available for producing QC diagnostic plots.
plotPCA: produce a principal components plot for the cells.
plotTSNE: produce a t-distributed stochastic neighbour embedding (reduced dimension) plot for the cells.
plotDiffusionMap: produce a diffusion map (reduced dimension) plot for the cells.
plotMDS: produce a multi-dimensional scaling plot for the cells.
plotReducedDim: plot a reduced-dimension representation of the cells.
plotExpression: plot expression levels for a defined set of features.
plotPlatePosition: plot cells in their position on a plate, coloured by cell metadata and QC metrics or feature expression level.
plotColData: plot cell metadata and QC metrics.
plotRowData: plot feature metadata and QC metrics.

More detail on all plotting methods is provided in the data visualisation vignette.
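As a brief illustration of two of the functions listed above, the sketch below draws a PCA plot and an expression plot for the example data. It assumes scater 1.6.x, and it assumes that the example cell metadata contains columns named "Treatment" and "Mutation_Status"; argument names can differ in other versions of the package, so treat this as a starting point rather than a reference.

```r
suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")

# Rebuild the example object, including log2-scale expression values.
example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts),
    colData = sc_example_cell_info)
exprs(example_sce) <- log2(
    calculateCPM(example_sce, use.size.factors = FALSE) + 1)

# Principal components plot, cells coloured by a metadata column (assumed name).
pca_plot <- plotPCA(example_sce, colour_by = "Treatment")

# Expression of the first six features, grouped by a metadata column (assumed name).
expr_plot <- plotExpression(example_sce,
                            features = rownames(example_sce)[1:6],
                            x = "Mutation_Status")

print(pca_plot)
print(expr_plot)
```

Storing each plot in a variable before printing makes it possible to customise or save it afterwards.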
Visualisations can highlight features and cells to be filtered out, which can be done easily with the subsetting capabilities of scater.
The QC plotting functions also enable the identification of important experimental variables, which can be conditioned out in the data normalisation step.
After QC and data normalisation (methods are available in scater), the dataset is ready for downstream statistical modeling.
For more details on the SingleCellExperiment class, including transitioning from SCESet objects in previous versions of scater, see the "Transitioning from SCESet to SingleCellExperiment" vignette;
For details on using scater for quality control, see the Quality control with scater vignette;
For details on data visualisation with scater, see the Data visualisation methods in scater vignette;
For details on expression quantification and data import with scater, see the Expression quantification and import vignette.