1 Introduction

This document gives an introduction to and overview of the quality control functionality of the scater package. scater contains tools to help with the analysis of single-cell transcriptomic data, focusing on low-level steps such as quality control, normalization and visualization. It is based on the SingleCellExperiment class (from the SingleCellExperiment package), and thus is interoperable with many other Bioconductor packages such as scran, batchelor and iSEE.

2 Setting up the data

2.1 Creating a SingleCellExperiment object

We assume that you have a matrix containing expression count data summarised at the level of some features (gene, exon, region, etc.). First, we create a SingleCellExperiment object containing the data, as demonstrated below with some mocked-up example data. Rows of the object correspond to features, while columns correspond to samples, i.e., cells in the context of single-cell ’omics data.

library(scater)
example_sce <- mockSCE()
example_sce
## class: SingleCellExperiment 
## dim: 2000 200 
## metadata(0):
## assays(1): counts
## rownames(2000): Gene_0001 Gene_0002 ... Gene_1999 Gene_2000
## rowData names(0):
## colnames(200): Cell_001 Cell_002 ... Cell_199 Cell_200
## colData names(3): Mutation_Status Cell_Cycle Treatment
## reducedDimNames(0):
## spikeNames(0):
## altExpNames(1): Spikes

We usually expect (raw) count data to be labelled as "counts" in the assays, which can be easily retrieved with the counts accessor. Getters and setters are also provided for exprs, tpm, cpm, fpkm and versions of these with the prefix norm_.

str(counts(example_sce))

Row and column-level metadata are easily accessed (or modified) as shown below. There are also dedicated getters and setters for spike-in specifiers (isSpike); size factor values (sizeFactors); and reduced dimensionality results (reducedDim).

example_sce$whee <- sample(LETTERS, ncol(example_sce), replace=TRUE)
colData(example_sce)
## DataFrame with 200 rows and 4 columns
##          Mutation_Status  Cell_Cycle   Treatment        whee
##              <character> <character> <character> <character>
## Cell_001        negative          G1      treat1           Y
## Cell_002        negative          G1      treat2           R
## Cell_003        positive         G2M      treat2           A
## Cell_004        negative          G1      treat1           N
## Cell_005        positive          G0      treat2           Q
## ...                  ...         ...         ...         ...
## Cell_196        negative         G2M      treat1           S
## Cell_197        negative           S      treat1           E
## Cell_198        negative          G1      treat2           P
## Cell_199        positive          G0      treat2           N
## Cell_200        negative          G0      treat2           G
rowData(example_sce)$stuff <- runif(nrow(example_sce))
rowData(example_sce)
## DataFrame with 2000 rows and 1 column
##                        stuff
##                    <numeric>
## Gene_0001  0.250185570446774
## Gene_0002  0.507190225413069
## Gene_0003 0.0743179991841316
## Gene_0004  0.288846218725666
## Gene_0005  0.323703394504264
## ...                      ...
## Gene_1996  0.888529150281101
## Gene_1997  0.722290675388649
## Gene_1998  0.544348508119583
## Gene_1999  0.969874983653426
## Gene_2000  0.975468719843775

Subsetting is very convenient with this class, as both data and metadata are processed in a synchronized manner. More details about the SingleCellExperiment class can be found in the documentation for the SingleCellExperiment package.

2.2 Other methods of data import

Count matrices stored as CSV files or equivalent can easily be read into an R session using read.table from utils or fread from the data.table package. It is advisable to coerce the resulting object into a matrix before storing it in a SingleCellExperiment object.

For large data sets, the matrix can be read in chunk-by-chunk with progressive coercion into a sparse matrix from the Matrix package. This is performed by readSparseCounts, which reduces memory usage by not explicitly storing zeroes.

Data from 10X Genomics experiments can be read in using the read10xCounts function from the DropletUtils package. This automatically generates a SingleCellExperiment with a sparse matrix; see the documentation for more details.

Transcript abundances from the kallisto and Salmon pseudo-aligners can be imported using methods from the tximeta package. This produces a SummarizedExperiment object that can be coerced into a SingleCellExperiment simply with as(se, "SingleCellExperiment").

3 Quality control

3.1 Background

scater provides functionality for three levels of quality control (QC):

  1. QC and filtering of cells
  2. QC and filtering of features (genes)
  3. QC of experimental variables

3.2 Cell-level QC

3.2.1 Definition of metrics

Cell-level metrics are computed by the perCellQCMetrics() function and include:

  • sum: total number of counts for the cell (i.e., the library size).
  • detected: the number of features for the cell that have counts above the detection limit (default of zero).
  • subsets_X_percent: percentage of all counts that come from the feature control set named X.

per.cell <- perCellQCMetrics(example_sce, subsets=list(Mito=1:10))
summary(per.cell$sum)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  328260  358606  365911  367248  376288  402262
summary(per.cell$detected)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1470    1497    1507    1507    1517    1546
summary(per.cell$subsets_Mito_percent)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1456  0.4655  0.6533  0.6673  0.8321  1.7307

It is often convenient to store this in the colData() of our SingleCellExperiment object for future reference. (This would automatically be done if we had used the addPerCellQC() function instead.)

colData(example_sce) <- cbind(colData(example_sce), per.cell)
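For intuition, the three metrics listed above are simple column-wise summaries of the count matrix. The following base-R sketch (not scater's implementation) computes them for a made-up matrix `toy`, treating its first two rows as a hypothetical "Mito" control set; all object names here are illustrative only.

```r
# Made-up count matrix: 4 genes (rows) x 3 cells (columns).
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:3)))

# Hypothetical feature control set: the first two genes stand in for "Mito".
mito <- 1:2

per_cell_sum <- colSums(toy)            # library size of each cell
per_cell_detected <- colSums(toy > 0)   # features with counts above a detection limit of zero
per_cell_mito_percent <- 100 * colSums(toy[mito, , drop = FALSE]) / per_cell_sum
```

perCellQCMetrics() reports further columns as well (e.g., percent_top_50), which are not reproduced here.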

3.2.2 Diagnostic plots

A particularly useful plot for cell-level QC shows the percentage of counts assigned to feature controls (here, the mock "Mito" set) against the total count of each cell. These two metadata variables can be plotted against each other as shown below, where we take advantage of ggplot2 to fine-tune the plot aesthetics and to add a fitted linear trend. Well-behaved cells should have large library sizes, a large number of expressed features and a low percentage of counts from feature controls. A high percentage of counts from feature controls combined with small libraries or few expressed features is indicative of blank wells or failed cells.

plotColData(example_sce, x = "sum", y="subsets_Mito_percent",
    colour_by = "Mutation_Status") + theme(legend.position = "top") +
    stat_smooth(method = "lm", se = FALSE, size = 2, fullrange = TRUE)

The plotScater() method plots the cumulative proportion of each cell’s library assigned to the top highest-expressed features (default 500). This type of plot visualizes differences in expression distributions for different cells, in the same manner as per-sample boxplots for microarray or bulk RNA-seq data. It allows users to identify large differences in expression distributions across different experimental blocks (e.g., processing batches).

plotScater(example_sce, block1 = "Mutation_Status", block2 = "Treatment",
     colour_by = "Cell_Cycle", nfeatures = 300, exprs_values = "counts")
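The quantity behind plotScater() can be sketched in base R. Assuming that each cell's features are ranked by that cell's own counts, the cumulative proportion is the running sum of the sorted counts divided by the cell's library size. The helper `cumulative_top_prop()` and the matrix `toy` are hypothetical names used only for this illustration.

```r
# Made-up count matrix: genes in rows, cells in columns.
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:3)))

# For each cell, sort its counts in decreasing order and return the proportion
# of its library accounted for by its top 1, 2, ..., nfeatures features.
cumulative_top_prop <- function(counts, nfeatures) {
  nfeatures <- min(nfeatures, nrow(counts))
  apply(counts, 2, function(cell) {
    sorted <- unname(sort(cell, decreasing = TRUE))[seq_len(nfeatures)]
    cumsum(sorted) / sum(cell)
  })
}

# Result: one row per rank, one column per cell.
cum_prop <- cumulative_top_prop(toy, nfeatures = 3)
```

Cells whose curves rise much faster than others have libraries dominated by a few features, which is what the per-block panels of plotScater() help to spot.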

For plate-based experiments, it is useful to see how expression or experimental factors vary with the position of each cell on the plate. This can be visualized using the plotPlatePosition() function. Systematic trends in expression with plate position may indicate issues with processing. The same approach can be used with experimental factors to determine whether cells are appropriately randomized across the plate.

example_sce2 <- example_sce
# Assign each of the 200 cells a unique mock well on a 10 x 20 plate (A01 to J20).
example_sce2$plate_position <- paste0(
     rep(LETTERS[1:10], each = 20),
     rep(formatC(1:20, width = 2, flag = "0"), 10)
)
plotPlatePosition(example_sce2, colour_by = "Gene_0001",
    by_exprs_values = "counts") 

3.2.3 Identifying low-quality cells

Column subsetting of the SingleCellExperiment object retains only the selected cells, thus removing low-quality or otherwise unwanted cells. We can identify high-quality cells to retain by setting a fixed threshold on particular metrics. For example, we could retain only cells that have more than 100,000 total counts and more than 500 expressed features:

keep.total <- example_sce$sum > 1e5
keep.n <- example_sce$detected > 500
filtered <- example_sce[,keep.total & keep.n]
dim(filtered)
## [1] 2000  200

The isOutlier function provides a more data-adaptive way of choosing these thresholds. It defines the threshold at a certain number of median absolute deviations (MADs) away from the median. Values beyond this threshold are considered outliers and can be filtered out, assuming that they correspond to low-quality cells. Here, we flag cells whose log-total counts lie more than 3 MADs below the median (using type="lower") and keep the cells that are not flagged.

# isOutlier() returns TRUE for outliers, so negate it to select the cells to keep.
low.total <- isOutlier(per.cell$sum, nmads=3, type="lower", log=TRUE)
filtered <- example_sce[,!low.total]
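The MAD-based rule can be written out in a few lines of base R. This is a simplified sketch of the idea behind isOutlier(), not its actual code: `flag_low_outliers()` is a hypothetical helper and the library sizes are invented. It relies on mad(), whose default constant (1.4826) makes the MAD comparable to a standard deviation for normally distributed data.

```r
# Flag values lying more than `nmads` MADs below the median (cf. type = "lower").
flag_low_outliers <- function(metric, nmads = 3, log = FALSE) {
  if (log) metric <- log(metric)
  centre <- median(metric)
  spread <- mad(metric, center = centre)
  metric < centre - nmads * spread
}

# Made-up library sizes: nine typical cells and one with very few counts.
lib_sizes <- c(9500, 10000, 10200, 9800, 10100, 9900, 10050, 9700, 10300, 800)
low_lib <- flag_low_outliers(lib_sizes, nmads = 3, log = TRUE)
kept <- lib_sizes[!low_lib]   # retain the cells that are NOT outliers
```

Working on the log scale makes the threshold multiplicative, so it scales naturally with sequencing depth; only the last cell is flagged here.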

Detection of outliers can be done more conveniently for several common metrics using the quickPerCellQC() function. This uses the total count, number of detected features and the percentage of counts in gene sets of diagnostic value (e.g., mitochondrial genes, spike-in transcripts) to identify which cells to discard and for what reason.

qc.stats <- quickPerCellQC(per.cell, percent_subsets="subsets_Mito_percent")
colSums(as.matrix(qc.stats))
##              low_lib_size            low_n_features 
##                         1                         0 
## high_subsets_Mito_percent                   discard 
##                         3                         4
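Conceptually, quickPerCellQC() applies this kind of outlier test to several metrics and discards a cell if it fails any of them. The base-R sketch below assumes the combination we understand to be the default (low log-library size, low log-number of detected features and high subset percentage, each at 3 MADs); `is_mad_outlier()` and the metric values are hypothetical.

```r
# Simplified stand-in for isOutlier(): flag values beyond nmads MADs from the median.
is_mad_outlier <- function(metric, nmads = 3, type = c("lower", "higher"), log = FALSE) {
  type <- match.arg(type)
  if (log) metric <- log(metric)
  centre <- median(metric)
  spread <- mad(metric, center = centre)
  if (type == "lower") {
    metric < centre - nmads * spread
  } else {
    metric > centre + nmads * spread
  }
}

# Made-up per-cell metrics for six cells.
qc <- data.frame(
  sum = c(10000, 9800, 10200, 9900, 500, 10100),
  detected = c(1500, 1480, 1520, 1510, 200, 1490),
  subsets_Mito_percent = c(1.0, 1.2, 0.9, 1.1, 1.0, 15.0)
)

reasons <- data.frame(
  low_lib_size = is_mad_outlier(qc$sum, type = "lower", log = TRUE),
  low_n_features = is_mad_outlier(qc$detected, type = "lower", log = TRUE),
  high_subsets_Mito_percent = is_mad_outlier(qc$subsets_Mito_percent, type = "higher")
)
# A cell is discarded if it fails at least one of the individual criteria.
reasons$discard <- Reduce(`|`, reasons)
colSums(reasons)   # one cell per criterion here; two cells discarded in total
```

Keeping one logical column per reason, as quickPerCellQC() does, makes it easy to report why each cell was removed.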

The isOutlier approach adjusts to experiment-specific aspects of the data, e.g., sequencing depth, amount of spike-in RNA added, cell type. In contrast, a fixed threshold would require manual adjustment to account for changes to the experimental protocol or system. We refer readers to the simpleSingleCell workflow for more details.

3.3 Feature-level QC

3.3.1 Definition of metrics

Feature-level metrics are computed by the perFeatureQCMetrics() function and include:

  • mean: the mean count of the gene/feature across all cells.
  • detected: the percentage of cells with non-zero counts for each gene.
  • subsets_Y_ratio: ratio of mean counts between the cell control set named Y and all cells.

per.feat <- perFeatureQCMetrics(example_sce, subsets=list(Empty=1:10))
summary(per.feat$mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.80   16.19   67.30  183.62  249.72 1082.20
summary(per.feat$detected)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.50   49.50   91.50   75.33  100.00  100.00
summary(per.feat$subsets_Empty_ratio)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.6372  0.9279  0.9979  1.2307  5.6264
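These per-feature metrics are row-wise summaries. Below is a base-R sketch for a made-up 4-by-4 matrix, assuming that the first two cells form a hypothetical "Empty" control set (e.g., empty wells). Note that the ratio is undefined (NaN) for a gene with zero counts in every cell.

```r
# Made-up count matrix: 4 genes (rows) x 4 cells (columns).
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2,
                 6, 0, 3, 1),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:4)))

# Hypothetical cell control set "Empty": the first two cells.
empty <- 1:2

feat_mean <- rowMeans(toy)                  # mean count across all cells
feat_detected <- 100 * rowMeans(toy > 0)    # percentage of cells with non-zero counts
feat_empty_ratio <- rowMeans(toy[, empty, drop = FALSE]) / feat_mean
```

A ratio well above 1 means the gene is relatively enriched in the control cells, which can point to ambient or contaminating transcripts.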

A more refined calculation of the average is provided by the calculateAverage() function, which adjusts the counts by the relative library size (or size factor) prior to taking the mean.

ave <- calculateAverage(example_sce)
summary(ave)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    1.794   16.203   67.257  183.624  249.938 1081.812
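The adjustment can be sketched in base R, assuming library-size-derived size factors scaled to a mean of 1 (the librarySizeFactors() behaviour described in the section on calculating expression values). Each cell's counts are divided by its size factor before averaging, so deeply sequenced cells no longer dominate the mean. This illustrates the idea and is not calculateAverage()'s code.

```r
# Made-up count matrix: genes in rows, cells in columns.
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:3)))

lib_sizes <- colSums(toy)                    # 20, 10, 10
size_factors <- lib_sizes / mean(lib_sizes)  # scaled to a mean of 1
# Divide each column (cell) by its size factor, then average across cells.
normalized <- sweep(toy, 2, size_factors, "/")
ave <- rowMeans(normalized)
```

Because the size factors have a mean of 1, the adjusted averages still sum to the mean library size (here 40/3).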

We can also compute the number of cells expressing a gene directly.

summary(nexprs(example_sce, byrow=TRUE))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    21.0    99.0   183.0   150.7   200.0   200.0

3.3.2 Diagnostic plots

The plot below shows the 50 (by default) most highly expressed features. Each row in the plot corresponds to a gene, and each bar corresponds to the expression of that gene in a single cell. The circle indicates the median expression of each gene, and genes are sorted by this median. By default, “expression” is defined using the feature counts (if available), but other expression values can be used instead by changing exprs_values.

plotHighestExprs(example_sce, exprs_values = "counts")

We expect to see the “usual suspects”, i.e., mitochondrial genes, actin, ribosomal protein genes and MALAT1. A few spike-in transcripts may also be present here, though if all of the spike-ins are in the top 50, it suggests that too much spike-in RNA was added. A large number of pseudo-genes or predicted genes may indicate problems with alignment.

3.3.3 Subsetting by row

Genes can be removed by row subsetting of the SingleCellExperiment object. For example, we can filter out features (genes) that are not expressed in any cells:

keep_feature <- rowSums(counts(example_sce) > 0) > 0
example_sce <- example_sce[keep_feature,]
dim(example_sce)
## [1] 2000  200

Other filtering can be done using existing annotation. For example, ribosomal protein genes and predicted genes can be identified (and removed) using regular expressions or biotype information. Such genes are often uninteresting when the aim is to characterize population heterogeneity.
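As a minimal sketch of the regular-expression route, the base-R code below uses hypothetical human gene symbols. The patterns are illustrative assumptions: gene naming conventions differ between species and annotation sources, so biotype annotation is usually the more reliable option.

```r
# Hypothetical gene symbols; in practice these would be rownames(example_sce).
genes <- c("ACTB", "RPL3", "RPS6", "MALAT1", "GAPDH", "RPL13A", "AC012345.1")

# Illustrative pattern for ribosomal protein genes (RPL*/RPS* followed by a digit).
is_ribo <- grepl("^RP[SL][0-9]", genes)
# Illustrative pattern for clone-based names often used for predicted genes.
is_predicted <- grepl("^A[CL][0-9]+\\.[0-9]+$", genes)

keep_genes <- genes[!(is_ribo | is_predicted)]
```

With a SingleCellExperiment, the same logical vector could be used for row subsetting, e.g. example_sce[!(is_ribo | is_predicted), ] after computing the patterns on rownames(example_sce).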

3.4 Variable-level QC

Variable-level metrics are computed by the getVarianceExplained() function (after normalization, see below). This calculates the percentage of variance of each gene’s expression that is explained by each variable in the colData of the SingleCellExperiment object.

example_sce <- logNormCounts(example_sce)
vars <- getVarianceExplained(example_sce)
head(vars)
##           Mutation_Status Cell_Cycle   Treatment      whee         sum
## Gene_0001      0.42472406   0.971581 0.002159253 12.653753 0.983487651
## Gene_0002      0.17398779   1.233842 0.677628921 14.788242 0.824629066
## Gene_0003      0.01742089   1.420996 0.397547057 16.625215 0.440747274
## Gene_0004      0.17623728   1.469209 0.006177647  8.674082 1.448741209
## Gene_0005      0.31823266   0.338658 0.076921562 14.125242 0.003018246
## Gene_0006      0.43934114   1.893785 0.214474799  9.924663 0.167668356
##               detected percent_top_50 percent_top_100 percent_top_200
## Gene_0001 0.0485051703     0.89351203      0.70564662       0.9132624
## Gene_0002 1.8022869388     0.23482067      0.70552962       0.2742710
## Gene_0003 0.5295303129     2.25357984      1.08538795       1.3628744
## Gene_0004 0.0008980969     0.62889371      1.24176421       0.5747002
## Gene_0005 0.0602920269     0.05432575      0.09049331       0.1582609
## Gene_0006 0.2118049510     0.42908841      0.78070458       0.2575307
##           percent_top_500 subsets_Mito_sum subsets_Mito_detected
## Gene_0001     0.918210494      0.103441153           9.081402631
## Gene_0002     0.058802928     19.866448755           0.059697181
## Gene_0003     0.837592656      0.003223234           8.740100551
## Gene_0004     0.094291247      2.319397918           0.009166008
## Gene_0005     0.004353474      3.256201017           2.541073054
## Gene_0006     0.012652106      0.108529353           3.546799459
##           subsets_Mito_percent altexps_Spikes_sum altexps_Spikes_detected
## Gene_0001          0.062036374        0.652579197              0.67664669
## Gene_0002         20.895982359        0.002693347              0.01263754
## Gene_0003          0.007062205        0.127107155              1.01051834
## Gene_0004          2.629365326        0.045468625              1.30876404
## Gene_0005          3.266701164        0.012337415              0.13535978
## Gene_0006          0.151848259        0.641488487              1.30671411
##           altexps_Spikes_percent       total
## Gene_0001           1.0197919848 0.843193793
## Gene_0002           0.0344934889 0.814114003
## Gene_0003           0.0481768776 0.482864551
## Gene_0004           0.0002812829 1.491170818
## Gene_0005           0.0063481804 0.004197936
## Gene_0006           0.5239912389 0.230883447

We can then use this to determine which experimental factors contribute most to the variance in expression. This is useful for diagnosing batch effects or for quickly verifying that a treatment has an effect.

plotExplanatoryVariables(vars)
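Our reading is that the "percentage of variance explained" for one gene and one variable corresponds to 100 times the R-squared of a linear model of that gene's expression on that variable alone. The base-R sketch below illustrates this for a single simulated gene; the covariates and expression values are invented, and this is not getVarianceExplained()'s code.

```r
set.seed(42)
n_cells <- 50
# Simulated covariates: a treatment that affects expression and a random label that does not.
treatment <- factor(rep(c("treat1", "treat2"), each = n_cells / 2))
random_label <- factor(sample(LETTERS[1:5], n_cells, replace = TRUE))
# Simulated log-expression for one gene: higher under treat2, plus noise.
expr <- ifelse(treatment == "treat2", 3, 1) + rnorm(n_cells, sd = 0.5)

# Percentage of variance explained: 100 * R^2 of a model with one variable.
pct_var_explained <- function(y, variable) {
  100 * summary(lm(y ~ variable))$r.squared
}

pct_explained <- c(
  Treatment = pct_var_explained(expr, treatment),
  whee = pct_var_explained(expr, random_label)
)
```

Repeating this over all genes gives a distribution per variable, which is what plotExplanatoryVariables() displays as density curves.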

4 Calculating expression values

We calculate counts-per-million using the aptly-named calculateCPM function. The output is most appropriately stored as an assay named "cpm" in the assays of the SingleCellExperiment object.

cpm(example_sce) <- calculateCPM(example_sce)
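The textbook definition of counts-per-million is sketched below in base R, using each cell's total count as the denominator. scater's own function may differ in detail (for example, when size factors have already been set), so treat this as an illustration only; the matrix `toy` is made up.

```r
# Made-up count matrix: genes in rows, cells in columns.
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:3)))

# Scale each cell's counts so that its column sums to one million.
cpm_values <- sweep(toy, 2, colSums(toy), "/") * 1e6
```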

Another option is to use the logNormCounts function, which calculates log2-transformed normalized expression values. This is done by dividing each count by its size factor, adding a pseudo-count and log-transforming. The resulting values can be interpreted on the same scale as log-transformed counts, and are stored in "logcounts".

example_sce <- logNormCounts(example_sce)
assayNames(example_sce)
## [1] "counts"    "logcounts" "cpm"

The size factor is automatically computed from the library size of each cell using the librarySizeFactors() function. This calculation simply involves scaling the library sizes so that they have a mean of 1 across all cells.

summary(librarySizeFactors(example_sce))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.8938  0.9765  0.9964  1.0000  1.0246  1.0953
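The two steps described above can be combined in a short base-R sketch: library sizes are scaled to size factors with a mean of 1, each count is divided by its cell's size factor, and a pseudo-count is added before taking log2. The pseudo-count of 1 is our assumption about the default, and the matrix `toy` is made up.

```r
# Made-up count matrix: genes in rows, cells in columns.
toy <- matrix(c(10, 0, 5, 5,
                 0, 2, 8, 0,
                 4, 4, 0, 2),
              nrow = 4,
              dimnames = list(paste0("Gene_", 1:4), paste0("Cell_", 1:3)))

lib_sizes <- colSums(toy)
size_factors <- lib_sizes / mean(lib_sizes)   # library sizes scaled to a mean of 1

# Divide by size factors, add a pseudo-count of 1 (assumed), then log2-transform.
log_norm <- log2(sweep(toy, 2, size_factors, "/") + 1)
```

The pseudo-count keeps zero counts finite (they map to exactly 0), and centring the size factors at 1 keeps the normalized values on roughly the same scale as the original counts.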

Users can also construct any matrix with the same dimensions as the count matrix and store it as an assay, e.g., a logical matrix indicating which counts are non-zero:

assay(example_sce, "is_expr") <- counts(example_sce)>0

5 Data visualization

5.1 Plots of expression values

The plotExpression() function makes it easy to plot expression values for a subset of genes or features. This can be particularly useful for further examination of features identified from differential expression testing, pseudotime analysis or other analyses. By default, it uses expression values in the "logcounts" assay, but this can be changed through the exprs_values argument.

plotExpression(example_sce, rownames(example_sce)[1:6],
    x = "Mutation_Status", exprs_values = "logcounts")

Setting x will determine the covariate to be shown on the x-axis. This can be a field in the column metadata or the name of a feature (to obtain the expression profile across cells). Categorical covariates will yield grouped violins as shown above, with one panel per feature. By comparison, continuous covariates will generate a scatter plot in each panel, as shown below.

plotExpression(example_sce, rownames(example_sce)[1:6],
    x = "Gene_0001")

The points can also be coloured, shaped or resized by the column metadata or expression values.

plotExpression(example_sce, rownames(example_sce)[1:6],
    colour_by = "Cell_Cycle", shape_by = "Mutation_Status",
    size_by = "Gene_0002")

For categorical x, we can also show the median expression level per group on the plot to summarise the distribution of expression values:

plotExpression(example_sce, rownames(example_sce)[7:12],
    x = "Mutation_Status", exprs_values = "counts",
    colour_by = "Cell_Cycle", show_median = TRUE,
    xlab = "Mutation Status", log2_values = TRUE)

Directly plotting the gene expression without any x or other visual parameters will generate a set of grouped violin plots, coloured in an aesthetically pleasing manner.

plotExpression(example_sce, rownames(example_sce)[1:6])

5.2 Dimensionality reduction plots

5.2.1 Using the reducedDims slot

The SingleCellExperiment object has a reducedDims slot, where coordinates for reduced dimension representations of the cells can be stored. These can be accessed using the reducedDim() and reducedDims() functions, which are described in more detail in the SingleCellExperiment documentation. In the code below, we perform a principal components analysis (PCA) and store the results in reducedDims under the name "PCA".

example_sce <- runPCA(example_sce)
reducedDimNames(example_sce)
## [1] "PCA"

Any reduced dimension results can be plotted using the plotReducedDim function:

plotReducedDim(example_sce, dimred = "PCA",
    colour_by = "Treatment", shape_by = "Mutation_Status")

We can also colour and size points by the expression of particular features:

plotReducedDim(example_sce, dimred = "PCA",
    colour_by = "Gene_1000", size_by = "Gene_0500")

5.2.2 Generating PCA plots

The plotPCA function makes it easy to produce a PCA plot directly from a SingleCellExperiment object, which is useful for visualising the relationships between cells. By default, the plot shows the first two principal components from the "PCA" entry of the reducedDims slot, if one is already present.

plotPCA(example_sce)

By default, runPCA performs PCA on the log-counts using the 500 features with the most variable expression across all cells. The number of most-variable features used can be changed with the ntop argument. Alternatively, a specific set of features to use for PCA can be supplied with the subset_row argument. This is demonstrated below with the first 100 genes, an arbitrary choice for this mock data set; in practice, a set of feature controls could be used in this way to identify technical factors of variation.

chosen.genes <- 1:100
example_sce2 <- runPCA(example_sce, subset_row=chosen.genes)
plotPCA(example_sce2)
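The default feature selection followed by PCA can be sketched in base R with prcomp(). This is an illustration of the general approach under stated assumptions (genes ranked by variance of the log-expression values, cells centred but not scaled), not runPCA()'s code; the simulated matrix `logexpr` is hypothetical.

```r
set.seed(100)
# Hypothetical log-expression matrix: 200 genes (rows) x 30 cells (columns).
logexpr <- matrix(rnorm(200 * 30), nrow = 200,
                  dimnames = list(sprintf("Gene_%04d", 1:200), sprintf("Cell_%03d", 1:30)))

# Keep the `ntop` genes with the largest variance across cells.
ntop <- 50
gene_vars <- apply(logexpr, 1, var)
top_genes <- order(gene_vars, decreasing = TRUE)[seq_len(ntop)]

# prcomp() expects observations in rows, so transpose to cells x genes.
pca <- prcomp(t(logexpr[top_genes, ]), center = TRUE, scale. = FALSE)
pc_coords <- pca$x[, 1:2]   # first two PCs, one row per cell
```

Supplying an explicit gene set instead, as with subset_row above, would amount to replacing `top_genes` with a fixed index such as 1:100.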

Multiple components can be plotted in a series of pairwise plots. When more than two components are plotted, the diagonal boxes in the scatter plot matrix show the density for each component.

example_sce <- runPCA(example_sce, ncomponents=20)
plotPCA(example_sce, ncomponents = 4, colour_by = "Treatment",
        shape_by = "Mutation_Status")

As shown above, various metadata variables can be used to define the colour, shape and size of points in the scatter plot. We can also use the colour and size of points to reflect feature expression values.

plotPCA(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")

5.2.3 Generating t-SNE plots

t-distributed stochastic neighbour embedding (t-SNE) is widely used for visualizing complex single-cell data sets. The same procedure described for PCA plots can be applied to generate t-SNE plots using plotTSNE, with coordinates obtained using runTSNE via the Rtsne package. We strongly recommend generating plots with different random seeds and perplexity values, to ensure that any conclusions are robust to different visualizations.

# Perplexity of 10 just chosen here arbitrarily.
set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=10)
plotTSNE(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")

It is also possible to use the pre-existing PCA results as input into the t-SNE algorithm. This improves speed, because the algorithm works on a low-rank approximation of the expression matrix, and it reduces random noise by focusing on the major factors of variation. The code below uses the first 10 dimensions of the previously computed PCA result to perform the t-SNE.

set.seed(1000)
example_sce <- runTSNE(example_sce, perplexity=10, use_dimred="PCA", n_dimred=10)
plotTSNE(example_sce, colour_by="Treatment")

5.2.4 Other dimensionality reduction methods

The same can be done for diffusion maps using plotDiffusionMap, with coordinates obtained using runDiffusionMap via the destiny package.

example_sce <- runDiffusionMap(example_sce)
plotDiffusionMap(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")

The same applies to uniform manifold approximation and projection (UMAP), computed with the runUMAP() function (itself based on the uwot package) and plotted with plotUMAP().

example_sce <- runUMAP(example_sce)
plotUMAP(example_sce, colour_by = "Gene_0001", size_by = "Gene_1000")

6 Transitioning from the SCESet class

As of July 2017, scater has switched from the SCESet class previously defined within the package to the more widely applicable SingleCellExperiment class. From Bioconductor 3.6 (October 2017), the release version of scater will use SingleCellExperiment. SingleCellExperiment is a more modern and robust class that provides a common data structure used by many single-cell Bioconductor packages. Advantages include support for sparse data matrices and the capability for on-disk storage of data to minimise memory usage for large single-cell datasets.

It should be straightforward to convert existing scripts based on SCESet objects to SingleCellExperiment objects, with the key changes outlined below.

  • The functions toSingleCellExperiment and updateSCESet (for backwards compatibility) can be used to convert an old SCESet object to a SingleCellExperiment object;
  • Create a new SingleCellExperiment object with the function SingleCellExperiment (actually less fiddly than creating a new SCESet);
  • scater functions have been refactored to take SingleCellExperiment objects, so once data is in a SingleCellExperiment object, the user experience is almost identical to that with the SCESet class.

Users may need to be aware of the following when updating their own scripts:

  • Cell names can now be accessed/assigned with the colnames function (instead of sampleNames or cellNames for an SCESet object);
  • Feature (gene/transcript) names should now be accessed/assigned with the rownames function (instead of featureNames);
  • Cell metadata, stored as phenoData in an SCESet, corresponds to colData in a SingleCellExperiment object and is accessed/assigned with the colData function (this replaces the pData function);
  • Individual cell-level variables can still be accessed with the $ operator (e.g. sce$sum);
  • Feature metadata, stored as featureData in an SCESet, corresponds to rowData in a SingleCellExperiment object and is accessed/assigned with the rowData function (this replaces the fData function);
  • plotScater, which produces a cumulative expression overview plot, replaces the generic plot function for SCESet objects.

Session information

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] scater_1.14.0               ggplot2_3.2.1              
##  [3] SingleCellExperiment_1.8.0  SummarizedExperiment_1.16.0
##  [5] DelayedArray_0.12.0         BiocParallel_1.20.0        
##  [7] matrixStats_0.55.0          Biobase_2.46.0             
##  [9] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
## [11] IRanges_2.20.0              S4Vectors_0.24.0           
## [13] BiocGenerics_0.32.0         BiocStyle_2.14.0           
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15               ggbeeswarm_0.6.0        
##   [3] colorspace_1.4-1         RcppEigen_0.3.3.5.0     
##   [5] class_7.3-15             rio_0.5.16              
##   [7] XVector_0.26.0           RcppHNSW_0.2.0          
##   [9] BiocNeighbors_1.4.0      proxy_0.4-23            
##  [11] hexbin_1.27.3            RSpectra_0.15-0         
##  [13] ranger_0.11.2            codetools_0.2-16        
##  [15] robustbase_0.93-5        knitr_1.25              
##  [17] zeallot_0.1.0            uwot_0.1.4              
##  [19] BiocManager_1.30.9       compiler_3.6.1          
##  [21] ggplot.multistats_1.0.0  backports_1.1.5         
##  [23] assertthat_0.2.1         Matrix_1.2-17           
##  [25] lazyeval_0.2.2           BiocSingular_1.2.0      
##  [27] htmltools_0.4.0          tools_3.6.1             
##  [29] rsvd_1.0.2               gtable_0.3.0            
##  [31] glue_1.3.1               GenomeInfoDbData_1.2.2  
##  [33] reshape2_1.4.3           dplyr_0.8.3             
##  [35] ggthemes_4.2.0           Rcpp_1.0.2              
##  [37] carData_3.0-2            cellranger_1.1.0        
##  [39] vctrs_0.2.0              DelayedMatrixStats_1.8.0
##  [41] lmtest_0.9-37            xfun_0.10               
##  [43] laeken_0.5.0             stringr_1.4.0           
##  [45] openxlsx_4.1.2           lifecycle_0.1.0         
##  [47] irlba_2.3.3              DEoptimR_1.0-8          
##  [49] zlibbioc_1.32.0          MASS_7.3-51.4           
##  [51] zoo_1.8-6                scales_1.0.0            
##  [53] VIM_4.8.0                pcaMethods_1.78.0       
##  [55] hms_0.5.1                yaml_2.2.0              
##  [57] curl_4.2                 gridExtra_2.3           
##  [59] stringi_1.4.3            knn.covertree_1.0       
##  [61] e1071_1.7-2              destiny_3.0.0           
##  [63] TTR_0.23-5               boot_1.3-23             
##  [65] zip_2.0.4                rlang_0.4.1             
##  [67] pkgconfig_2.0.3          bitops_1.0-6            
##  [69] evaluate_0.14            lattice_0.20-38         
##  [71] purrr_0.3.3              labeling_0.3            
##  [73] cowplot_1.0.0            tidyselect_0.2.5        
##  [75] plyr_1.8.4               magrittr_1.5            
##  [77] bookdown_0.14            R6_2.4.0                
##  [79] pillar_1.4.2             haven_2.1.1             
##  [81] foreign_0.8-72           withr_2.1.2             
##  [83] xts_0.11-2               scatterplot3d_0.3-41    
##  [85] abind_1.4-5              RCurl_1.95-4.12         
##  [87] sp_1.3-1                 nnet_7.3-12             
##  [89] tibble_2.1.3             crayon_1.3.4            
##  [91] car_3.0-4                rmarkdown_1.16          
##  [93] viridis_0.5.1            grid_3.6.1              
##  [95] readxl_1.3.1             data.table_1.12.6       
##  [97] FNN_1.1.3                forcats_0.4.0           
##  [99] vcd_1.4-4                digest_0.6.22           
## [101] tidyr_1.0.0              RcppParallel_4.4.4      
## [103] munsell_0.5.0            beeswarm_0.2.3          
## [105] viridisLite_0.3.0        smoother_1.1            
## [107] vipor_0.4.5