1 Introduction

dittoSeq is a tool built to enable analysis and visualization of single-cell and bulk RNA-sequencing data by novice, experienced, and color blind coders. It provides many useful visualizations, all of which use red-green color blindness-optimized colors by default and allow enough customization, via discrete inputs, for out-of-the-box creation of publication-ready figures.

For single-cell data, dittoSeq works directly with data pre-processed in other popular packages (Seurat, scater, scran, …). For bulk RNAseq data, dittoSeq's import functions convert bulk RNAseq data stored in various structures into a single set structure that dittoSeq's helper and visualization functions can work with. So ultimately, dittoSeq includes universal plotting and helper functions for working with (sc)RNAseq data processed and stored in these formats:

Single-Cell:

  • Seurat (versions 2 & 3)
  • SingleCellExperiment

Bulk:

  • SummarizedExperiment (the general Bioconductor Seq-data storage system)
  • DESeqDataSet (DESeq2 package output)
  • DGEList (edgeR package output)

For bulk data, or if your data is not yet analyzed or simply not in one of these structures, you can still pull it into the SingleCellExperiment structure that dittoSeq works with by using the importDittoBulk() function.

1.1 Color blindness friendliness

The default colors of this package are red-green color blindness friendly. To achieve this, I used the suggested colors from (Wong 2011) and adapted them slightly by appending darker and lighter versions to create a 24-color vector. All plotting functions use these colors, stored in dittoColors(), by default.
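To look at the palette itself, here is a minimal sketch. It assumes only that dittoSeq is installed; the preview uses base R's barplot(), not a dittoSeq function, and it checks for "at least 24" colors because newer dittoSeq versions may ship a longer vector than the one described here.

```r
library(dittoSeq)

# dittoColors() returns the default palette as a character vector of hex codes
cols <- dittoColors()
head(cols, 8)

# Base-R preview (not part of dittoSeq): one bar per color, in the order
# dittoSeq assigns them to groups by default
barplot(rep(1, length(cols)), col = cols, border = NA, axes = FALSE,
        names.arg = seq_along(cols), las = 2, cex.names = 0.6)
```

The plotting functions also take a color.panel input, so a reordered or custom vector (for example dittoColors()[c(2, 1, 3)]) can be supplied instead of the default.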

Additionally:

  • Shapes displayed in the legends are generally enlarged as this can be almost as helpful as the actual color choice for colorblind individuals.
  • When sensible, dittoSeq functions have a shape.by input for displaying groups through shapes rather than color. (But note: even as a red-green color impaired individual myself writing this vignette, I recommend using color, and I generally only use shapes for showing additional groupings.)
  • dittoDimPlots can be generated with letters overlaid (set do.letter = TRUE)
  • The Simulate function allows a cone-typical individual to see what their dittoSeq plots might look like to a colorblind individual.
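The options above can be sketched on a small mock dataset so the block runs on its own. Everything here is hypothetical: random counts, made-up "alpha"/"beta"/"gamma" cell types, and random 2D coordinates standing in for a real reduction. Simulate() is assumed to take the deficiency type, then the plotting function, then that function's inputs; it relies on the suggested colorspace package being installed.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Hypothetical mock data: 100 genes x 60 cells, three cell types, two samples
set.seed(42)
n_cells <- 60
counts_mat <- matrix(rpois(100 * n_cells, 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("cell", seq_len(n_cells))))
mock <- SingleCellExperiment(
    assays = list(counts = counts_mat, logcounts = log2(counts_mat + 1)),
    colData = S4Vectors::DataFrame(
        celltype = rep(c("alpha", "beta", "gamma"), length.out = n_cells),
        Sample = rep(c("s1", "s2"), each = n_cells / 2)))
# Random coordinates stand in for a real dimensionality reduction
reducedDim(mock, "PCA") <- matrix(
    rnorm(n_cells * 2), ncol = 2,
    dimnames = list(colnames(mock), c("PC_1", "PC_2")))

# Letters overlaid on the points, in addition to color
p_letters <- dittoDimPlot(mock, "celltype", reduction.use = "PCA",
                          do.letter = TRUE)

# Color for cell type, shapes for an additional grouping
p_shapes <- dittoDimPlot(mock, "celltype", reduction.use = "PCA",
                         shape.by = "Sample")

# Approximate how the plot may look with deuteranopia (needs colorspace)
p_deutan <- Simulate("deutan", dittoDimPlot, mock, "celltype",
                     reduction.use = "PCA")
```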

1.2 Disclaimer

Code used here for dataset processing and normalization should not be seen as a suggestion of the proper methods for performing such steps. dittoSeq is a visualization tool, and my focus while developing this vignette has simply been on creating the values required for the visualization examples.

2 Installation

dittoSeq is available through Bioconductor.

# Install BiocManager if needed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install dittoSeq
BiocManager::install("dittoSeq")

2.1 Some setup for this vignette that can be ignored

Here, we need to do some prep, as the dataset we will use from Baron et al. (2016) is neither normalized nor dimensionality-reduced.

library(dittoSeq)
library(scRNAseq)
library(SingleCellExperiment)
library(Seurat)
# Download data
sce <- BaronPancreasData()
# Trim to only 5 of the celltypes for simplicity of vignette
sce <- sce[,meta("label",sce) %in% c(
    "acinar", "endothelial", "gamma", "delta", "ductal")]

Now that we have a single-cell dataset loaded, we are ready to go. All functions work for either Seurat or SCE encapsulated single-cell data.

But to make full use of dittoSeq, we should really have this data log-normalized, with dimensionality reductions and clustering already run.

# Make Seurat and grab metadata
seurat <- CreateSeuratObject(counts(sce))
seurat <- AddMetaData(seurat, sce$label, col.name = "celltype")
seurat <- AddMetaData(seurat, sce$donor, col.name = "Sample")
seurat <- AddMetaData(seurat,
                      PercentageFeatureSet(seurat, pattern = "^MT"),
                      col.name = "percent.mt")
# Basic Seurat workflow (possibly outdated, but fine for this vignette)
seurat <- NormalizeData(seurat, verbose = FALSE)
seurat <- FindVariableFeatures(object = seurat, verbose = FALSE)
seurat <- ScaleData(object = seurat, verbose = FALSE)
seurat <- RunPCA(object = seurat, verbose = FALSE)
seurat <- RunTSNE(object = seurat)
seurat <- FindNeighbors(object = seurat, verbose = FALSE)
seurat <- FindClusters(object = seurat, verbose = FALSE)
# Copy PCA, tSNE, clustering, log-normalized data, and metadata into sce

# sce <- as.SingleCellExperiment(seurat)
# At the time this vignette was made, the above function gave warnings

# So... manual method
sce <- addDimReduction(
  sce, embeddings = Embeddings(seurat, reduction = "pca"), name = "PCA")
sce <- addDimReduction(
  sce, embeddings = Embeddings(seurat, reduction = "tsne"), name = "TSNE")
sce$idents <- seurat$seurat_clusters
assay(sce, "logcounts") <- GetAssayData(seurat)
sce$percent.mt <- seurat$percent.mt
sce$celltype <- seurat$celltype
sce$Sample <- seurat$Sample

Now the single-cell dataset has been analyzed in Seurat and the results copied into the SCE, so we have both versions available for the examples.

All functions will work the same for either the Seurat or the SCE version.

3 Getting started

3.1 Single-cell RNAseq data

dittoSeq works directly with Seurat and SingleCellExperiment objects. Nothing special is needed. Just load in your data if it isn’t already loaded, then go!

dittoDimPlot(seurat, "Sample")

dittoPlot(seurat, "ENO1", group.by = "celltype")

dittoBarPlot(sce, "celltype", group.by = "Sample")
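The same functions accept further inputs for common adjustments. This sketch rebuilds a small hypothetical mock SCE so it runs on its own; with this vignette's objects, swap mock for seurat or sce and "PCA" for "TSNE" as desired. It assumes dittoPlot()'s plots input and dittoBarPlot()'s scale input, both part of dittoSeq's plotting functions.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Hypothetical mock data: 100 genes x 60 cells, three cell types, two samples
set.seed(42)
n_cells <- 60
counts_mat <- matrix(rpois(100 * n_cells, 2), nrow = 100,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("cell", seq_len(n_cells))))
mock <- SingleCellExperiment(
    assays = list(counts = counts_mat, logcounts = log2(counts_mat + 1)),
    colData = S4Vectors::DataFrame(
        celltype = rep(c("alpha", "beta", "gamma"), length.out = n_cells),
        Sample = rep(c("s1", "s2"), each = n_cells / 2)))
reducedDim(mock, "PCA") <- matrix(
    rnorm(n_cells * 2), ncol = 2,
    dimnames = list(colnames(mock), c("PC_1", "PC_2")))

# Larger points, on an explicitly chosen reduction
p_dim <- dittoDimPlot(mock, "celltype", reduction.use = "PCA", size = 2)

# Violin plus jittered points for one gene, per cell type
p_gene <- dittoPlot(mock, "gene1", group.by = "celltype",
                    plots = c("vlnplot", "jitter"))

# Raw cell counts per sample instead of the default percentages
p_bar <- dittoBarPlot(mock, "celltype", group.by = "Sample", scale = "count")
```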

3.2 Bulk RNAseq data

Bulk RNAseq data is handled by dittoSeq using the SingleCellExperiment structure (as of version 0.99). This structure is very similar to the Bioconductor-standard SummarizedExperiment, just with added room for storing calculated dimensionality reductions.

# First, let's make some mock expression and conditions data
# (random counts for 1000 genes x 20 samples)
exp <- matrix(rpois(20000, 5), ncol=20)
colnames(exp) <- paste0("sample", seq_len(ncol(exp)))
rownames(exp) <- paste0("gene", seq_len(nrow(exp)))
logexp <- log2(exp + 1)

conditions <- factor(rep(1:4, 5))
sex <- c(rep("M", 9), rep("F", 11))

3.2.1 Standard bulk data import workflow

Importing bulk data can be accomplished with just the importDittoBulk() function, but metadata and dimensionality reductions can also be added after.

# Import
myRNA <- importDittoBulk(
    # x can be a DGEList, a DESeqDataSet, a SummarizedExperiment,
    #   or a list of data matrices
    x = list(counts = exp,
             logcounts = logexp),
    # Optional inputs:
    #   For adding metadata
    metadata = data.frame(conditions = conditions,
                          sex = sex),
    #   For adding dimensionality reductions
    reductions = list(pca = matrix(rnorm(20000), nrow=20)))

# Alternatively, metadata can be added after import, like this
myRNA$conditions <- conditions
myRNA$sex <- sex

# Likewise, dimensionality reductions can be added after import
# (We aren't actually calculating PCA here; these are random values.)
myRNA <- addDimReduction(
    object = myRNA,
    embeddings = matrix(rnorm(20000), nrow=20),
    name = "pca",
    key = "PC")
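To confirm what the import produced, here is a small self-contained sketch that rebuilds a similar object from hypothetical random data and inspects it with standard SingleCellExperiment accessors. The 5-column embedding is an arbitrary stand-in, not a real PCA.

```r
library(dittoSeq)
library(SingleCellExperiment)

# Hypothetical mock bulk data: 1000 genes x 20 samples
set.seed(1)
exp_mat <- matrix(rpois(20000, 5), ncol = 20,
                  dimnames = list(paste0("gene", 1:1000),
                                  paste0("sample", 1:20)))
myRNA <- importDittoBulk(
    x = list(counts = exp_mat, logcounts = log2(exp_mat + 1)),
    metadata = data.frame(conditions = factor(rep(1:4, 5)),
                          sex = c(rep("M", 9), rep("F", 11))))
# Random embedding standing in for a real PCA
myRNA <- addDimReduction(myRNA, embeddings = matrix(rnorm(20 * 5), nrow = 20),
                         name = "pca", key = "PC")

class(myRNA)              # a SingleCellExperiment
assayNames(myRNA)         # the named matrices from the list input
colnames(colData(myRNA))  # includes the added metadata columns
reducedDimNames(myRNA)    # includes "pca"
```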

Additional details:

By default, the provided metadata is added to (or replaces) any metadata already inside x; setting the combine_metadata input to FALSE turns off retention of the previous metadata slots.
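A minimal sketch of the difference, assuming edgeR is installed and using a DGEList because its $samples slot always carries pre-existing per-sample metadata (group, lib.size, norm.factors). The data and the extra sex column are hypothetical. Exactly which columns survive is worth checking with colData() on real data.

```r
library(dittoSeq)
library(edgeR)
library(SingleCellExperiment)

# Hypothetical mock bulk data: 100 genes x 20 samples
set.seed(1)
counts_mat <- matrix(rpois(2000, 5), ncol = 20,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("sample", 1:20)))
# A DGEList already stores per-sample metadata in its $samples slot
dge <- DGEList(counts = counts_mat, group = factor(rep(1:4, 5)))
extra <- data.frame(sex = c(rep("M", 9), rep("F", 11)))

# Default (combine_metadata = TRUE): $samples columns kept, sex added
kept <- importDittoBulk(dge, metadata = extra)
# combine_metadata = FALSE: previous $samples columns are not carried over
dropped <- importDittoBulk(dge, metadata = extra, combine_metadata = FALSE)

colnames(colData(kept))
colnames(colData(dropped))
```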

When providing expression data as a list of a single or multiple matrices, it is recommended that matrices containing raw feature counts data be named counts, and log-normalized counts data be named logcounts.
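Here is a sketch of why the names matter, assuming dittoPlot()'s assay input (which selects the stored matrix that expression values are drawn from) and that its default prefers a logcounts assay when one exists. The data are hypothetical random counts.

```r
library(dittoSeq)

# Hypothetical mock bulk data: 100 genes x 20 samples
set.seed(1)
exp_mat <- matrix(rpois(2000, 5), ncol = 20,
                  dimnames = list(paste0("gene", 1:100),
                                  paste0("sample", 1:20)))
myRNA <- importDittoBulk(
    x = list(counts = exp_mat, logcounts = log2(exp_mat + 1)),
    metadata = data.frame(conditions = factor(rep(1:4, 5))))

# With the recommended names, the default picks up the log-normalized data
p_log <- dittoPlot(myRNA, "gene1", group.by = "conditions")
# Any other named assay can be requested explicitly
p_raw <- dittoPlot(myRNA, "gene1", group.by = "conditions", assay = "counts")
```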

DGEList note: The import function attempts to pull in all information stored in common DGEList slots ($counts, $samples, $genes, $AveLogCPM, $common.dispersion, $trended.dispersion, $tagwise.dispersion, and $offset), but any other slots are ignored.
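A minimal sketch of importing a DGEList with several of those slots filled, assuming edgeR is installed. The gene annotation is hypothetical; calcNormFactors() fills $samples$norm.factors and estimateDisp() fills the common, trended, and tagwise dispersions. Where each slot ends up in the resulting object can be checked with colData(), rowData(), and metadata().

```r
library(dittoSeq)
library(edgeR)
library(SingleCellExperiment)

# Hypothetical mock bulk data: 100 genes x 20 samples
set.seed(1)
counts_mat <- matrix(rpois(2000, 5), ncol = 20,
                     dimnames = list(paste0("gene", 1:100),
                                     paste0("sample", 1:20)))
dge <- DGEList(counts = counts_mat, group = factor(rep(1:4, 5)),
               genes = data.frame(symbol = paste0("SYM", 1:100)))
dge <- calcNormFactors(dge)  # fills $samples$norm.factors
dge <- estimateDisp(dge)     # fills common/trended/tagwise dispersions

bulk <- importDittoBulk(dge)
bulk
colnames(colData(bulk))
```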

# Now making plots operates the exact same way as for single-cell data
dittoDimPlot(myRNA, "sex", size = 3, do.ellipse = TRUE)

dittoBarPlot(myRNA, "sex", group.by = "conditions")
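The single-cell options carry over to bulk data as well. This self-contained sketch uses hypothetical random data and a random stand-in for PCA, and it assumes the shape.by, scale, and plots inputs shown in the single-cell examples.

```r
library(dittoSeq)

# Hypothetical mock bulk data: 1000 genes x 20 samples
set.seed(1)
exp_mat <- matrix(rpois(20000, 5), ncol = 20,
                  dimnames = list(paste0("gene", 1:1000),
                                  paste0("sample", 1:20)))
myRNA <- importDittoBulk(
    x = list(counts = exp_mat, logcounts = log2(exp_mat + 1)),
    metadata = data.frame(conditions = factor(rep(1:4, 5)),
                          sex = c(rep("M", 9), rep("F", 11))),
    reductions = list(pca = matrix(rnorm(20 * 5), nrow = 20)))

# Color for one grouping, shape for another
p_dim <- dittoDimPlot(myRNA, "conditions", reduction.use = "pca",
                      shape.by = "sex", size = 3)
# Numbers of samples rather than percentages
p_bar <- dittoBarPlot(myRNA, "sex", group.by = "conditions", scale = "count")
# One gene's expression per condition, as boxplots plus points
p_gene <- dittoPlot(myRNA, "gene1", group.by = "conditions",
                    plots = c("boxplot", "jitter"))
```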