1 Loading Processed Single-Cell Data

For the demonstration of escape, we will use the example “pbmc_small” data from Seurat and also generate a SingleCellExperiment object from it.

pbmc_small <- get("pbmc_small")
pbmc_small <- suppressMessages(UpdateSeuratObject(pbmc_small))
sce <- as.SingleCellExperiment(pbmc_small)

2 Getting Gene Sets

The first step in the process of performing gene set enrichment analysis is identifying the gene sets we would like to use. The function getGeneSets() allows users to isolate a whole or multiple libraries from a list of GSEABase GeneSet objects. We can do this for gene set collections from the built-in Molecular Signature Database by setting the parameter library equal to the collection(s) of interest. For multiple libraries, just set library = c("C2", "C5"). To check which collections are available, use msigdbr::msigdbr_collections(). The gs_cat entry indicates the individal gene set collection names (e.g. “C2”). Individual pathways/gene sets can be isolated from the selected libraries selected, by setting gene.sets = the name(s) of the gene sets of interest. For an overview of the gene sets and collections, see https://www.gsea-msigdb.org/gsea/msigdb/genesets.jsp. If you are using a species other than Homo sapiens, set the species parameter to the appropriate scientific name to ensure the gene nomenclature matches your expression data. To check which species are available, use msigdbr::msigdbr_species().

## We'll use the HALLMARK gene set collection ("H")
GS.hallmark <- getGeneSets(library = "H")

If a user would like to expand beyond the GSEA Molecular Signature Database for pathway analysis, the enrichment function enrichIt() can use any list of GeneSet objects from the GSEABase R package. An excellent vignette exists on how to create GeneSet objects to be used in the enrichment analysis. Otherwise, you may also provide a simple named list of gene sets of interest. See ?enrichIt() for details.

3 Enrichment

The next step is performing the enrichment on the RNA count data. The function enrichIt() can handle either a matrix of raw count data or will pull that data directly from a SingleCellExperiment or Seurat object. The gene.sets parameter in the function is the GeneSets, either generated from getGeneSets() or from the user. The enrichment scores will be calculated across all individual cells and groups is the n size to break the enrichment by while the cores is the number of cores to perform in parallel during the enrichment calculation too.

Important Note: This is computationally intensive and is highly dependent on the number of cells and the number of gene sets included.

ES.seurat <- enrichIt(obj = pbmc_small, gene.sets = GS.hallmark, groups = 1000, cores = 2)
## [1] "Using sets of 1000 cells. Running 1 times."
## Setting parallel calculations through a SnowParam back-end
## with workers=2 and tasks=100.
## Estimating ssGSEA scores for 43 gene sets.
ES.sce <- enrichIt(obj = sce, gene.sets = GS.hallmark, groups = 1000, cores = 2)
## [1] "Using sets of 1000 cells. Running 1 times."
## Setting parallel calculations through a SnowParam back-end
## with workers=2 and tasks=100.
## Estimating ssGSEA scores for 43 gene sets.

We can then easily add these results back to our Seurat or SCE object.

## if working with a Seurat object
pbmc_small <- Seurat::AddMetaData(pbmc_small, ES.seurat)

## if working with an SCE object
met.data <- merge(colData(sce), ES.sce, by = "row.names", all=TRUE)
row.names(met.data) <- met.data$Row.names
met.data$Row.names <- NULL
colData(sce) <- met.data

4 Visualizations

The easiest way to generate almost any visualization for single cell data is via dittoSeq, which is an extremely flexible visualization package for both bulk and single-cell RNA-seq data that works very well for both expression data and metadata. Better yet, it can handle both SingleCellExperiment and Seurat objects.

To keep things consistent, we’ll define a pleasing color scheme.

colors <- colorRampPalette(c("#0348A6", "#7AC5FF", "#C6FDEC", "#FFB433", "#FF4B20"))

4.1 Heatmaps

A simple way to approach visualizations for enrichment results is the heatmap, especially if you are using a number of gene sets or libraries.

dittoHeatmap(pbmc_small, genes = NULL, metas = names(ES.seurat), 
             annot.by = "groups", 
             fontsize = 7, 
             cluster_cols = TRUE,
             heatmap.colors = colors(50))