This document gives an introduction to and overview of the functionality of the scater package.

The scater package is contains tools to help with the analysis of single-cell transcriptomic data, with the focus on RNA-seq data. The package features:

Use of the SingleCellExperiment class as a data container for interoperability with a wide range of other Bioconductor packages;
Wrappers to kallisto and ‘Salmon’ for rapid quantification of transcript abundance and tight integration with scater;
Simple calculation of many quality control metrics from the expression data;
Many tools for visualising scRNA-seq data, especially diagnostic plots for quality control;
Subsetting and many other methods for filtering out problematic cells and features;
Methods for identifying important experimental variables and normalising data ahead of downstream statistical analysis and modeling.

To get up and running as quickly as possible, see the Quick Start section below. For see the various in-depth sections on various aspects of the functionality that follow.

NB: as of July 2017, scater has switched from the SCESet class previously defined within the package to the more widely applicable SingleCellExperiment class. The functions toSingleCellExperiment and updateSCESet (for backwards compatibility) can be used to convert an old SCESet object to a SingleCellExperiment object.

1 Quick Start

Assuming you have a matrix containing expression count data summarised at the level of some features (gene, exon, region, etc.), then we first need to form a SingleCellExperiment object containing the data. A SingleCellExperiment object is the basic data container used in scater and many other Bioconductor packages for single-cell data analysis.

Here we use the example data provided with the package, which gives us two objects, a matrix of counts and a dataframe with information about the cells we are studying:

suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")

We use these objects to form a SingleCellExperiment object containing all of the necessary information for our analysis:

example_sce <- SingleCellExperiment(
    assays = list(counts = sc_example_counts), colData = sc_example_cell_info)

We always expect to have (raw) count data in a SingleCellExperiment object. In almost all cases we will also want to have a log2-scale representation of the data. We expect this to be stored as the exprs assay.

Here we use log2-counts-per-million with an offset of 1 as the exprs values.

exprs(example_sce) <- log2(
    calculateCPM(example_sce, use.size.factors = FALSE) + 1)

Subsetting is very convenient with this class. For example, we can filter out features (genes) that are not expressed in any cells:

keep_feature <- rowSums(exprs(example_sce) > 0) > 0
example_sce <- example_sce[keep_feature,]

Now we have the expression data neatly stored in a structure that can be used for lots of exciting analyses.

It is straight-forward to compute many quality control metrics. We typically provide one or more sets of “feature controls”, that is sets of genes or features that represent technical features of the expression data or are not of primary biological interest. QC metrics are computed especially for these feature sets are used to assess the quality of cells. Spike-in genes (such as the commonly-used ERCC set) and mitochondrial genes are typically useful as “feature controls”. Here, for demonstration, we just use the first 40 features.

example_sce <- calculateQCMetrics(example_sce, 
                                  feature_controls = list(eg = 1:40))

Now you can play around with your data using the graphical user interface (GUI), which opens an interactive dashboard in your browser!

scater_gui(example_sce)

Many plotting functions are available for visualising the data:

plotScater: a plot method exists for SingleCellExperiment objects, which gives an overview of expression across cells.
plotQC: various methods are available for producing QC diagnostic plots.
plotPCA: produce a principal components plot for the cells.
plotTSNE: produce a t-distributed stochastic neighbour embedding (reduced dimension) plot for the cells.
plotDiffusionMap: produce a diffusion map (reduced dimension) plot for the cells.
plotMDS: produce a multi-dimensional scaling plot for the cells.
plotReducedDim: plot a reduced-dimension representation of the cells.
plotExpression: plot expression levels for a defined set of features.
plotPlatePosition: plot cells in their position on a plate, coloured by cell metadata and QC metrics or feature expression level.
plotColData: plot cell metadata and QC metrics.
plotRowData: plot feature metadata and QC metrics.

More detail on all plotting methods is provided in the data visualisation vignette.

Visualisations can highlight features and cells to be filtered out, which can be done easily with the subsetting capabilities of scater.

The QC plotting functions also enable the identification of important experimental variables, which can be conditioned out in the data normalisation step.

After QC and data normalisation (methods are available in scater), the dataset is ready for downstream statistical modeling.

2 Where to find out more

For more information about using the SingleCellExperiment class, including transitioning from SCESet objects in previous versions of scater, see the "Transitioning from SCESet to SingleCellExperiment" vignette;
For guidance on using scater for quality control, see the Quality control with scater vignette;
For demonstrations of the data visualisation capabilities of scater, see the Data visualisation methods in scater vignette;
For more details about importing expression data into scater, see the Expression quantification and import vignette.

Introduction to `scater`: Single-cell analysis toolkit for expression in R

13 February 2018

Package

Contents

1 Quick Start

2 Where to find out more

Introduction to scater: Single-cell analysis toolkit for expression in R

13 February 2018

Package

Contents

1 Quick Start

2 Where to find out more

Introduction to `scater`: Single-cell analysis toolkit for expression in R