1 Introduction

Simulated data sets with known ground truths are often used for developing and comparing computational tools for genomic studies. However, the methods and approaches for simulating complex genomic data are rarely unified across studies. Recognizing this problem in the area of single-cell RNA-sequencing (scRNA-seq), the splatter package provides a uniform API for several “simulators” of scRNA-seq data, including the authors’ own “Splat” simulator (Zappia, Phipson, and Oshlack 2017). In the splatter package, given a set of simulation parameters, each method returns a SingleCellExperiment object of simulated scRNA-seq counts.

Using comparisons presented in Zappia, Phipson, and Oshlack (2017), we illustrate how the SummarizedBenchmark framework can be used to perform comparisons when the output of each method is more complex than a vector of numbers (e.g. a SingleCellExperiment).

2 Building the BenchDesign

Parameters for the simulators implemented in splatter can either be specified manually or estimated from existing data. Here, we use RSEM counts for a subset of high-coverage samples in the fluidigm data set included in the scRNAseq package. The data set is made available as a SummarizedExperiment object.
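The code for this step is not shown in the text above. A minimal sketch follows, assuming a recent scRNAseq release in which the data set is retrieved with ReprocessedFluidigmData() (older releases loaded it with data("fluidigm")), and assuming that the RSEM counts are stored in an assay named rsem_counts and that coverage is recorded in a sample annotation column named Coverage_Type.

```r
library(scRNAseq)
library(SummarizedExperiment)

# Assumption: recent scRNAseq releases download the data set on first use.
fluidigm <- ReprocessedFluidigmData()

# Keep only the high-coverage samples (column name assumed: Coverage_Type).
se <- fluidigm[, colData(fluidigm)[, "Coverage_Type"] == "High"]

# Keep only the RSEM counts (assay name assumed: rsem_counts) and expose
# them under the conventional assay name "counts".
assays(se) <- assays(se)["rsem_counts"]
assayNames(se) <- "counts"
```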

For the purposes of this vignette, we only use a subset of the samples and genes.

To make comparisons with the simulated data sets easier, we convert the SummarizedExperiment object to the SingleCellExperiment class.
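The subsetting and conversion steps can be sketched on a small synthetic object. The names toy_counts, toy_se and toy_sce are placeholders introduced for illustration, and the subset sizes are arbitrary rather than those used in the vignette.

```r
library(SummarizedExperiment)
library(SingleCellExperiment)

# Toy stand-in for the real data: 50 genes x 8 samples of count data.
set.seed(1)
toy_counts <- matrix(rnbinom(50 * 8, mu = 10, size = 2), nrow = 50,
                     dimnames = list(paste0("gene", 1:50),
                                     paste0("cell", 1:8)))
toy_se <- SummarizedExperiment(assays = list(counts = toy_counts))

# Randomly keep a subset of the genes (rows) and samples (columns).
toy_se <- toy_se[sample(nrow(toy_se), 20), sample(ncol(toy_se), 5)]

# Coerce to the SingleCellExperiment class; the assays are carried over.
toy_sce <- as(toy_se, "SingleCellExperiment")
toy_sce
```

Setting the seed before sampling keeps the chosen subset reproducible across runs.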

Each of the simulators in the splatter package follows the [prefix]Simulate naming convention, with a corresponding parameter estimation function named [prefix]Estimate. Here, we use three methods included in the comparisons of Zappia, Phipson, and Oshlack (2017).
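To make the naming convention concrete, the following sketch pairs simpleEstimate with simpleSimulate on a synthetic count matrix. The matrix toy_counts is a placeholder introduced for illustration; with real data, the "counts" assay would take its place.

```r
library(splatter)

# Toy count matrix standing in for real data (200 genes x 20 cells).
set.seed(1)
toy_counts <- matrix(rnbinom(200 * 20, mu = 10, size = 2), nrow = 200)

# [prefix]Estimate: estimate simulation parameters from the counts.
simple_params <- simpleEstimate(toy_counts)

# [prefix]Simulate: generate a new data set from those parameters.
simple_sim <- simpleSimulate(params = simple_params, verbose = FALSE, seed = 1)
class(simple_sim)
```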

Each simulator returns a single SingleCellExperiment object containing the simulated scRNA-seq counts. However, to fit the SummarizedBenchmark structure, each method in the BenchDesign must return a vector or a list. To handle the non-standard output of the methods, we add post = list in each addMethod call to wrap each SingleCellExperiment object in a list.
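A sketch of such a design is given below, assuming the SummarizedBenchmark 2.x interface in which addMethod takes label, func, params and post arguments. The method labels follow the column names shown in the output later in this document. The placeholder names in_data, in_verbose and in_seed are assumptions; they are only resolved once a data list is supplied to buildBench.

```r
library(SummarizedBenchmark)
library(splatter)

# Each method estimates its parameters from in_data and then simulates.
# post = list wraps the returned SingleCellExperiment in a list.
bd <- BenchDesign()
bd <- addMethod(bd, label = "splat", func = splatSimulate,
                params = rlang::quos(params = splatEstimate(in_data),
                                     verbose = in_verbose, seed = in_seed),
                post = list)
bd <- addMethod(bd, label = "simple", func = simpleSimulate,
                params = rlang::quos(params = simpleEstimate(in_data),
                                     verbose = in_verbose, seed = in_seed),
                post = list)
bd <- addMethod(bd, label = "lun", func = lunSimulate,
                params = rlang::quos(params = lunEstimate(in_data),
                                     verbose = in_verbose, seed = in_seed),
                post = list)
bd
```

Because the parameters are stored as quosures, nothing is estimated or simulated at this point; the design only records what should be run.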

3 Running the Benchmark Experiment

Using the "counts" assay of the fluidigm data set as input, we generate simulated data with the three methods.

## Warning in cov2cor(varcovar): diag(.) had 0 or NA entries; non-finite result
## is doubtful
## class: SummarizedBenchmark 
## dim: 1 3 
## metadata(1): sessionInfo
## assays(1): bench
## rownames: NULL
## rowData names(1): bench
## colnames(3): splat simple lun
## colData names(10): func post ... param.seed label
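A reduced, self-contained variant of this step is sketched below. It is not the run that produced the output above: synthetic counts stand in for the fluidigm data, only the simple and lun simulators are included to keep it quick, and the names toy_counts, bd_toy, toy_dat and sb_toy are placeholders.

```r
library(SummarizedBenchmark)
library(splatter)

# Toy count matrix standing in for the "counts" assay of the real data.
set.seed(1)
toy_counts <- matrix(rnbinom(200 * 20, mu = 10, size = 2), nrow = 200)

bd_toy <- BenchDesign()
bd_toy <- addMethod(bd_toy, label = "simple", func = simpleSimulate,
                    params = rlang::quos(params = simpleEstimate(in_data),
                                         verbose = in_verbose, seed = in_seed),
                    post = list)
bd_toy <- addMethod(bd_toy, label = "lun", func = lunSimulate,
                    params = rlang::quos(params = lunEstimate(in_data),
                                         verbose = in_verbose, seed = in_seed),
                    post = list)

# The names in this list must match the placeholders used in the design.
toy_dat <- list(in_data = toy_counts, in_verbose = FALSE, in_seed = 1L)
sb_toy <- buildBench(bd_toy, data = toy_dat)
sb_toy
```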

The simulated data sets are returned as a single row in the assay of the SummarizedBenchmark object, with each column containing a list with a single SingleCellExperiment object.

##      splat simple lun
## [1,] ?     ?      ?
## [1] "SingleCellExperiment" "SingleCellExperiment" "SingleCellExperiment"
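The one-row layout can be mimicked in base R with a matrix whose cells hold objects rather than numbers. In the sketch below, MockSim is a hypothetical stand-in class for SingleCellExperiment, and bench plays the role of the assay; we assume the real assay can be indexed in the same way.

```r
library(methods)

# MockSim is a hypothetical stand-in for the SingleCellExperiment class.
setClass("MockSim", representation(label = "character"))

sims <- list(splat = new("MockSim", label = "splat"),
             simple = new("MockSim", label = "simple"),
             lun = new("MockSim", label = "lun"))

# One row, one column per method; each cell holds a single object.
bench <- matrix(sims, nrow = 1, dimnames = list(NULL, names(sims)))

print(bench)                     # cells hold objects, not atomic values
sapply(bench, class)             # class of the object stored in each cell
one_sim <- bench[[1, "splat"]]   # extract the object for one method
all_sims <- bench[1, ]           # named list with one object per method
```

Extracting the row as a named list is convenient when the objects need to be passed on to functions that expect a list of data sets.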

4 Comparing the Results

Now that we have our set of simulated data sets, we can compare the behavior of each simulator. Fortunately, the splatter package includes two useful functions for comparing SingleCellExperiment objects (compareSCEs and diffSCEs). The assay of the SummarizedBenchmark can be passed directly to these functions. We also concatenate the original fluidigm data set, sce, with the simulated data sets for comparison.

While these functions produce several metrics and plots, we only include two for illustration. More details on the output of these functions can be found in the documentation of the splatter package.
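The two comparison functions can be tried on a pair of toy simulations, as sketched below. The objects sim_ref and sim_alt are placeholders standing in for the reference data set and a simulated data set, and MeanVar is used as one example of the plots returned.

```r
library(splatter)

# Two toy simulations of identical dimensions but different parameters.
sim_ref <- simpleSimulate(nGenes = 1000, nCells = 20, seed = 1,
                          verbose = FALSE)
sim_alt <- simpleSimulate(nGenes = 1000, nCells = 20, mean.shape = 0.8,
                          seed = 2, verbose = FALSE)
sces <- list(ref = sim_ref, alt = sim_alt)

# Side-by-side summaries of the data sets.
res_compare <- compareSCEs(sces)
# Differences of each data set relative to the named reference.
res_diff <- diffSCEs(sces, ref = "ref")

# Each result carries a list of plots; MeanVar is one of them.
res_compare$Plots$MeanVar
res_diff$Plots$MeanVar
```

In the setting of this vignette, the input list would presumably be formed by concatenating the reference with the row of simulated data sets, e.g. c(ref = sce, assay(sb)[1, ]); this call is an assumption rather than something shown in the text.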