SummarizedBenchmark 1.0.4
Simulated data sets with known ground truths are often used for developing and comparing computational tools for genomic studies. However, the methods and approaches for simulating complex genomic data are rarely unified across studies. Recognizing this problem in the area of single-cell RNA-sequencing (scRNA-seq), the splatter package provides a uniform API for several “simulators” of scRNA-seq data, including the authors’ own “Splat” simulator (Zappia, Phipson, and Oshlack 2017). In the splatter package, given a set of simulation parameters, each method returns a SingleCellExperiment object of simulated scRNA-seq counts.
Using comparisons presented in (Zappia, Phipson, and Oshlack 2017), we illustrate how the SummarizedBenchmark framework can be used to perform comparisons when the output of each method is more complex than a vector of numbers (e.g. a SingleCellExperiment).
Parameters for the simulators implemented in splatter can either be manually specified or estimated using existing data. Here, we use RSEM counts for a subset of high coverage samples in the fluidigm
data set included in the scRNAseq package. The data is made available as a SummarizedExperiment object.
library("splatter")
library("scRNAseq")
data("fluidigm")
se <- fluidigm[, colData(fluidigm)[, "Coverage_Type"] == "High"]
assays(se) <- assays(se)["rsem_counts"]
assayNames(se) <- "counts"
For the purposes of this vignette, we only use a subset of the samples and genes.
To make comparisons with the simulated data sets easier, we convert the SummarizedExperiment object to the SingleCellExperiment class.
Each of the simulators in the splatter package follow the [prefix]Simulate
naming convention, with the corresponding parameter estimation function, [prefix]Estimate
. Here, we use three methods included in the comparisons of (Zappia, Phipson, and Oshlack 2017).
bd <- BenchDesign() %>%
addMethod(label = "splat",
func = splatSimulate,
params = rlang::quos(params = splatEstimate(in_data),
verbose = in_verbose,
seed = in_seed),
post = list) %>%
addMethod(label = "simple",
func = simpleSimulate,
params = rlang::quos(params = simpleEstimate(in_data),
verbose = in_verbose,
seed = in_seed),
post = list) %>%
addMethod(label = "lun",
func = lunSimulate,
params = rlang::quos(params = lunEstimate(in_data),
verbose = in_verbose,
seed = in_seed),
post = list)
Each simulator returns a single SingleCellExperiment object containing the simulated scRNA-seq counts. However, to fit the SummarizedBenchmark structure, each method in the BenchDesign must return a vector or a list. To handle the non-standard output of the methods, we add post = list
in each addMethod
call to wrap each SingleCellExperiment object in a list.
Using the "counts"
assay of the fluidigm
data set as input, we generate simulated data with the three methods.
fluidigm_dat <- list(in_data = assay(sce, "counts"),
in_verbose = FALSE,
in_seed = 19120128)
sb <- buildBench(bd, data = fluidigm_dat)
## Warning in cov2cor(varcovar): diag(.) had 0 or NA entries; non-finite result
## is doubtful
## class: SummarizedBenchmark
## dim: 1 3
## metadata(1): sessionInfo
## assays(1): bench
## rownames: NULL
## rowData names(1): bench
## colnames(3): splat simple lun
## colData names(10): func post ... param.seed label
The simulated data sets are returned as a single row in the assay of the SummarizedBenchmark object, with each column containing a list with a single SingleCellExperiment object.
## splat simple lun
## [1,] ? ? ?
## [1] "SingleCellExperiment" "SingleCellExperiment" "SingleCellExperiment"
Now that we have our set of simulated data sets, we can compare the behavior of each simulator. Fortunately, the splatter package includes two useful functions for comparing SingleCellExperiment objects (compareSCEs
and diffSCEs
). The assay of the SummarizedBenchmark can be passed directly to these functions. We also concatenate the original fluidigm
data set, sce
, with the simulated data sets for comparison.
res_compare <- compareSCEs(c(ref = sce, assay(sb)[1, ]))
res_diff <- diffSCEs(c(ref = sce, assay(sb)[1, ]), ref = "ref")
While these functions produce several metrics and plots, we only include two for illustration. More details on the output of these functions can be found in the documentation of the splatter package.