scater 1.6.3
This document provides some examples of the many data visualisation functions
available in scater
.
Key scater
data visualisation functions:
* plotScater
: cumulative expression, overview plot (this replaces the generic
plot
function for SCESet
objects)
* plotExpression
: plot cell expression levels for one or more genes
* plotExprsVsTxLength
: plot transcript expression against transcript length
* plotPlatePosition()
: plot cell gene expression and metadata with cells in
their position on a plate
* plotColData
/plotCellData
/plotPhenoData
: plot cell-level metadata
variables
* plotRowData
/plotFeatureData
: plot row/gene/feature-level metadata
variables
* plotReducedDim
It is possible to get an overall view of the dataset by using the plotScater
function. This method plots the cumulative proportion of
each cell’s library (total counts) that is accounted for by the top
highest-expressed features (by default showing the cumulative proportion across
the top 500 features).
This type of plot gives an overall idea of differences in expression distributions for different cells. It is used in the same way as per-sample boxplots are for microarray or bulk RNA-seq data. Due to the large numbers of zeroes in expression values for single-cell RNA-seq data, boxplots are not as useful, so instead we focus on the contributions from the most expressed features for each cell.
With this function, we can split up the cells based on colData
variables to
get a finer-grained look at differences between cells. By default, the plot
method will try to use count values for the plot, but other data available in
the assays
slot of the object can be used by specifying the exprs_vales
argument.
suppressPackageStartupMessages(library(scater))
data("sc_example_counts")
data("sc_example_cell_info")
example_sce <- SingleCellExperiment(
assays = list(counts = sc_example_counts), colData = sc_example_cell_info)
exprs(example_sce) <- log2(calculateCPM(example_sce,
use.size.factors = FALSE) + 1)
plotScater(example_sce, block1 = "Mutation_Status", block2 = "Treatment",
colour_by = "Cell_Cycle", nfeatures = 300, exprs_values = "counts")
This sort of approach can help to pick up large differences in expression distributions across different experimental blocks (e.g. processing batches or similar).
In scater
, the plotExpression
function makes it easy to plot expression
values for a subset of genes or features. This can be particularly useful when
investigating the some features identified as being of interest from differential
expression testing, pseudotime analysis or other means.
plotExpression(example_sce, rownames(example_sce)[1:6],
x = "Mutation_Status", exprs_values = "exprs",
colour = "Treatment")
This function uses ggplot2
, making it easy to change the theme to whatever you
prefer. We can also show the median expression level per group on the plot and
show a violin plot to summarise the distribution of expression values:
plotExpression(example_sce, rownames(example_sce)[7:12],
x = "Mutation_Status", exprs_values = "counts",
colour = "Cell_Cycle",
show_median = TRUE, show_violin = FALSE, xlab = "Mutation Status",
log = TRUE)
The package also contains the function plotExprsVsTxLength()
for plotting
expression values against transcript length, and the function
plotPlatePosition()
to plot expression values and cell metadata information
for cells in their position on a plate.
The functions plotColData
, plotCellData
and plotPhenoData
are synonymous
and provide convenient means for plotting cell-level metadata values against
each other (cells can also be coloured, sized or shaped by gene expression
values). The functions will use the data type for each variable
(continous/categorical) to automatically determine an appropriate type of plot
without requiring input from the user.
example_sce <- calculateQCMetrics(example_sce,
feature_controls = list(dummy = 1:40))
plotColData(example_sce, aes(x = total_counts, y = total_features,
colour = Mutation_Status))
Note that ggplot aesthetics will work correctly (in general) for everything
except colour
(color
) and fill
, which must be either columns of colData
or feature names (i.e. gene/transcript names).
These sorts of plots can be very useful for finding potentially problematic cells.
plotColData(example_sce, aes(x = pct_counts_feature_control,
y = total_features, colour = Gene_0500))
The output of these functions is a ggplot
object, which can be added to,
amended and altered. For example, if we don’t like the legend position we can change it, and we could also add a trend line for each group (see below).
Tapping into the powerful capabilities of ggplot2
, the possibilities are many.
The functions plotRowData
and plotFeatureData
are synonymous
and provide convenient means for plotting feature-level (row-level) metadata
values against each other. The functions will use the data type for each variable
(continous/categorical) to automatically determine an appropriate type of plot
without requiring input from the user.
plotRowData(example_sce, aes(x = log10_total_counts, y = n_cells_counts,
colour = log10_mean_counts))
reducedDimension
slotThe SingleCellExperiment
object has a reducedDimension
slot, where
coordinates for reduced dimension representations of the cells can be stored.
If we so wish, the top principal components can be added to the PCA
element of
the reducedDimension
slot as follows:
example_sce <- plotPCA(example_sce, ncomponents = 4,
colour_by = "Treatment", shape_by = "Mutation_Status",
return_SCE = TRUE, theme_size = 12)
reducedDims(example_sce)
## List of length 1
## names(1): PCA
head(reducedDim(example_sce))
## PC1 PC2 PC3 PC4
## Cell_001 -11.3549969 1.987128 -4.3684884 3.3630354
## Cell_002 -1.4129121 2.494169 -0.2695877 -0.7532469
## Cell_003 2.0117100 1.128543 -2.0464581 3.6275892
## Cell_004 -9.8008989 2.562876 5.6841074 -4.6117327
## Cell_005 6.3489281 2.393014 5.8535812 0.4598489
## Cell_006 -0.9004187 -3.396119 -6.6698769 6.7846540
As shown above, the functions reducedDim
and reducedDims
can be used to
access the reduced dimension coordinates.
This means that any other dimension reduction method can be applied and the
coordinates stored. For example, we might wish to use t-distributed stochastic
nearest neighbour embedding (t-SNE) or Gaussian process latent variable models
or any other dimensionality reduction method. We can store these in the
SingleCellExperiment
object with the reducedDim
function and produce plots
just as we did with PCA (plot not shown):
plotReducedDim(example_sce, use_dimred = "PCA", ncomponents = 4,
colour_by = "Treatment", shape_by = "Mutation_Status")
As for the PCA plots we can also colour and size points by feature expression.
plotReducedDim(example_sce, use_dimred = "PCA", ncomponents = 4,
colour_by = "Gene_1000", size_by = "Gene_0500")
(Here, the dimensionality reduction is just PCA, so we have the same plot as the PCA plot above.)
The plotPCA
function makes it easy to produce a PCA plot directly from an
SCESet
object, which is useful for visualising cells.
The default plot shows the first two principal components and if any cell controls have been defined, plots these cells in a different colour.
plotPCA(example_sce)
By default, the PCA plot
is produced using the 500 features with the most variable expression across all
cells. The number of most-variable features used can be changed with the ntop
argument. Alternatively, a specific set of features to use for the PCA can be
defined with the feature_set
argument. This allows, for example, using only
housekeeping features or control features to produce a PCA plot.
By default the PCA plot uses the expression values in the logcounts
slot of the
SingleCellExperiment
object, but other expression values can be used by
specifying the exprs_values
argument (plot not shown):
plotPCA(example_sce, exprs_values = "cpm")
A subset of features can be used to produce a PCA plot. In the code below only the features defined as “feature controls” are used for the PCA (plot not shown).
plotPCA(example_sce, feature_set = fData(example_sce)$is_feature_control)
The function allows more than just the first two components to be plotted, and also allows phenotype variables to be used to define the colour, shape and size of points in the scatter plot.
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
shape_by = "Mutation_Status")
When more than two components are plotted, the diagonal boxes in the scatter plot matrix show the density for each component.
We can also use the colour and size of point in the plot to reflect feature expression values.
plotPCA(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
Thus, expression levels of two marker genes or transcripts can be shown overlaid onto reduced-dimension representations of cells.
This also works for plotTSNE
plots.
plotTSNE(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
And for diffusion map plots.
plotDiffusionMap(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")
See the QC vignette for examples of using the plotQC
function to create visualisations relevant for cell and gene filtering for quality control.