1 Introduction

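The CellMixS package is a toolbox to explore and compare group effects in single-cell RNA-seq data. It has two major applications: the detection of batch effects and biases in single-cell RNA-seq data, and the evaluation and comparison of data integration (e.g. after batch effect correction).

For this purpose it introduces two new metrics: the cellspecific mixing score (cms), which tests for batch effects within the k nearest neighbouring cells, and the local density difference (ldfDiff), which describes the change in relative local cell densities caused by data integration or projection.

In addition, several exploratory plotting functions enable the evaluation of key integration and mixing features.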

2 Installation

CellMixS can be installed from Bioconductor as follows.

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("CellMixS")

After installation the package can be loaded into R.

library(CellMixS)

3 Getting started

3.1 Load example data

CellMixS uses the SingleCellExperiment class from the SingleCellExperiment Bioconductor package as the format for input data.

The package contains example data named sim50, a list of simulated single-cell RNA-seq datasets with varying batch effect strengths and unbalanced batch sizes.

Batch effects were introduced by sampling 0%, 20% or 50% of gene expression values from a distribution with a shifted mean (i.e. 0%, 20% or 50% of genes were affected by a batch effect).

All datasets consist of 3 batches: one with 300 cells and two with half that size (150 cells each). The simulation is adapted from (Büttner et al. 2019) and described in sim50.
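To make the simulation design more concrete, here is a toy sketch in base R. It is not the simulation used to build sim50. For a chosen fraction of genes, counts are drawn from a Poisson distribution whose mean is scaled by a batch-specific factor, while all other genes share one mean across batches. The helper `simulate_counts()` and its parameters `base_mean` and `batch_factor` are hypothetical and were chosen only for illustration.

```r
# Hypothetical toy simulation, NOT the code used to generate sim50:
# a fraction of genes gets a batch-dependent Poisson mean.
set.seed(42)

# Three unbalanced batches of 300, 150 and 150 cells
batch <- factor(rep(c("1", "2", "3"), times = c(300, 150, 150)))

# Hypothetical helper: a random subset of round(frac * n_genes) genes has its
# mean multiplied by a batch-specific factor; all other genes keep
# `base_mean` in every batch.
simulate_counts <- function(frac, batch, n_genes = 200, base_mean = 5,
                            batch_factor = c("1" = 1, "2" = 2, "3" = 0.5)) {
  is_affected <- seq_len(n_genes) %in% sample(n_genes, round(frac * n_genes))
  # Multiplier for each cell, looked up from the cell's batch label
  cell_scale <- unname(batch_factor[as.character(batch)])
  # Gene-by-cell matrix of Poisson means; only affected genes vary by batch
  mu <- base_mean * outer(is_affected, cell_scale,
                          function(a, s) ifelse(a, s, 1))
  counts <- matrix(rpois(length(mu), lambda = mu), nrow = n_genes)
  list(counts = counts, affected = which(is_affected))
}

# 0%, 20% and 50% of genes affected, mirroring batch0, batch20 and batch50
sims <- lapply(c(batch0 = 0, batch20 = 0.2, batch50 = 0.5),
               simulate_counts, batch = batch)
table(batch)
# number of affected genes per dataset: 0, 40 and 100
sapply(sims, function(s) length(s$affected))

# For one affected gene in batch50, the expected batch means are 5, 10 and 2.5
g <- sims$batch50$affected[1]
tapply(sims$batch50$counts[g, ], batch, mean)
```

Setting a gene's mean separately for each batch is the simplest way to mimic a batch effect that touches only part of the transcriptome. The real simulation may shift means differently.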

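# load required packages
library(CellMixS)
library(SingleCellExperiment)
library(ggplot2)
library(cowplot)

# load sim_list example data
sim_list <- readRDS(system.file(file.path("extdata", "sim50.rds"), 
                                package = "CellMixS"))
names(sim_list)
#> [1] "batch0"  "batch20" "batch50"

# extract the dataset with the strongest batch effect
sce50 <- sim_list[["batch50"]]
class(sce50)
#> [1] "SingleCellExperiment"
#> attr(,"package")
#> [1] "SingleCellExperiment"

# number of cells per batch
table(sce50$batch)
#>   1   2   3 
#> 300 150 150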

3.2 Visualize batch effect

Depending on their strength, batch effects can often be detected by simple visual inspection (e.g. in a standard tSNE or UMAP plot). CellMixS provides several plotting functions that visualize group labels and mixing scores side by side, so no other packages are needed. The results are ggplot objects and can be further customized using ggplot2. Other packages, such as scater, provide similar plotting functions and can be used as well.
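The visual check described above does not depend on CellMixS itself. The following is a minimal sketch that assumes base R only and uses PCA in place of tSNE or UMAP. It shows how a batch effect appears as a separate cluster when cells are coloured by their batch label. The toy data are hypothetical: 100 genes, with a doubled mean for half of them in batch 2 only.

```r
set.seed(1)

# Hypothetical toy data: 100 genes x 600 cells in batches of 300/150/150;
# the first 50 genes have a doubled mean in batch 2 only.
batch <- factor(rep(c("1", "2", "3"), times = c(300, 150, 150)))
n_genes <- 100
mu <- matrix(5, nrow = n_genes, ncol = length(batch))
mu[1:50, batch == "2"] <- 10
counts <- matrix(rpois(length(mu), lambda = mu), nrow = n_genes)

# Log-transform and run PCA with cells as rows (prcomp centers by default)
logcounts <- log1p(counts)
pca <- prcomp(t(logcounts))
pcs <- pca$x[, 1:2]

# Colour cells by batch: a well-separated group points to a batch effect
plot(pcs, col = as.integer(batch), pch = 16, cex = 0.5,
     xlab = "PC1", ylab = "PC2", main = "Cells coloured by batch")
legend("topright", legend = levels(batch), col = seq_along(levels(batch)),
       pch = 16, title = "batch")

# Mean PC1 coordinate per batch; batch 2 stands apart from batches 1 and 3
pc1_means <- tapply(pcs[, 1], batch, mean)
pc1_means
```

With real data, visGroup() serves the same purpose directly on a SingleCellExperiment object and returns a ggplot that can be styled further.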

#visualize batch distribution in sce50
visGroup(sce50, group = "batch")

#visualize batch distribution in other elements of sim_list 
batch_names <- c("batch0", "batch20")
vis_batch <- lapply(batch_names, function(name){
    sce <- sim_list[[name]]
    visGroup(sce, "batch") + ggplot2::ggtitle(paste0("sim_", name))
})

# arrange both plots side by side
cowplot::plot_grid(plotlist = vis_batch, ncol = 2)