1 Introduction

1.1 Overview

The primary utility of the spatialHeatmap package is the generation of spatial heatmaps (SHM) for visualizing cell-, tissue- and organ-specific abundance patterns of biological molecules (e.g. RNAs) in spatial anatomical images (Zhang et al. 2022). This is useful for identifying biomolecules with spatially enriched/depleted abundance patterns as well as clusters and/or network modules composed of biomolecules sharing similar abundance patterns such as similar gene expression patterns. These functionalities are introduced in the main vignette of this package. The following describes extended functionalities for integrating tissue with single cell data by co-visualizing them in composite plots that combine spatial heatmaps with embedding plots of high-dimensional data. The resulting spatial context information is important for gaining insights into the tissue-level organization of single cell data or vice versa.

The required quantitative bulk and single-cell assay data, such as gene expression values, can be provided in the widely used tabular data structures SummarizedExperiment (SE) and SingleCellExperiment (SCE), respectively (Figure 1A, C), while the corresponding anatomical images need to be supplied as annotated SVG (aSVG) images and can be stored in the dedicated S4 class SVG (Figure 1B). More details on aSVGs are described in the main vignette of this package. In addition, multiple methods are supported for associating single cells with source tissues and coloring the associated cells and tissues (Figure 2).

1.2 Data Structures

For the implementation of the co-visualization functionality, spatialHeatmap takes advantage of efficient and reusable S4 classes for assay data and aSVGs. The former include the Bioconductor core data structures SummarizedExperiment (SE, Morgan et al. (2018)) and SingleCellExperiment (SCE, Amezquita et al. (2020)) for bulk and single-cell data, respectively (Figure 1A, C). The slots assays, colData, and rowData contain expression values, tissue/cell metadata, and biomolecule metadata, respectively. For the embedding plots of single cell data, several dimension reduction algorithms (e.g. PCA, UMAP or tSNE) are supported, and the resulting embeddings are stored in the reducedDims slot of SCE.

The S4 class SVG (Figure 1B) is developed specifically in spatialHeatmap for storing aSVG instances. The two most important slots, coordinate and attribute, store the aSVG feature coordinates and their attributes (colors, line widths, etc.), respectively, while the slots dimension, svg, and raster store image dimensions, aSVG file paths, and raster image paths, respectively. Moreover, the meta class SPHM (Figure 1D) is developed to harmonize these data objects.

When creating co-visualization plots (Figure 1a-b), SHMs are created by mapping expression values from the SE to the corresponding spatial features in the SVG through identifiers shared between the two (here TissueA and TissueB), and single cells in the SCE are associated with spatial features through their group labels (here TissueA and TissueB) stored in the colData slot.
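The slot layout described above can be illustrated with toy objects. The following is a minimal sketch with made-up values: the gene, tissue, and cell names are hypothetical, and only the standard SummarizedExperiment and SingleCellExperiment constructors are used, not any spatialHeatmap function.

```r
library(SummarizedExperiment)
library(SingleCellExperiment)

# Toy bulk data: 2 genes x 2 tissues (values are made up).
blk.mat <- matrix(c(10, 5, 3, 8), nrow=2,
  dimnames=list(c('GeneX', 'GeneY'), c('TissueA', 'TissueB')))
se.toy <- SummarizedExperiment(assays=list(counts=blk.mat),
  colData=DataFrame(tissue=colnames(blk.mat), row.names=colnames(blk.mat)))

# Toy single-cell data: 2 genes x 4 cells, group labels stored in colData.
cell.mat <- matrix(c(1, 0, 2, 1, 0, 3, 0, 4), nrow=2,
  dimnames=list(c('GeneX', 'GeneY'), paste0('cell', 1:4)))
sce.toy <- SingleCellExperiment(assays=list(counts=cell.mat),
  colData=DataFrame(label=c('TissueA', 'TissueA', 'TissueB', 'TissueB'),
    row.names=colnames(cell.mat)))

# Embedding coordinates (arbitrary numbers here) go into the reducedDims slot.
reducedDim(sce.toy, 'PCA') <- matrix(c(-1, -0.8, 0.9, 1.1, 0.2, -0.1, 0.3, -0.2),
  ncol=2, dimnames=list(colnames(cell.mat), c('PC1', 'PC2')))

# Cell group labels that also occur as bulk tissue identifiers.
shared <- intersect(unique(sce.toy$label), colnames(se.toy))
shared
```

The shared identifiers are what allows cells, bulk tissues, and aSVG features to be linked in a co-visualization plot.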

Figure 1: Schematic view of data structures and creation of co-visualization plots
File imports, classes, and plotting functionalities are illustrated in boxes with color-coded title bars in grey, blue and green, respectively. Quantitative and experimental design data (I) are imported into matching slots of an SE container (A). aSVG image files are stored in SVG containers (B). Expression profiles of a chosen gene (GeneX) in (A) are mapped to the corresponding spatial features in (B) via common identifiers (here TissueA and TissueB). The quantitative data are represented in the matching features by colors according to a number-to-color key and the output is an SHM (a). For co-visualization plots, single-cell data are stored in the SCE object class (C). Reduced dimension data for embedding plots can be generated in R or imported from files. The single-cell embedding results are co-visualized with SHMs where the cell-to-tissue mappings are indicated by common colors in the co-visualization plot (b). The SPHM meta class organizes the individual objects (A)-(C) along with internally generated data.

1.3 Cell-Tissue Mapping and Coloring

To co-visualize bulk and single-cell data (Figure 1b), the individual cells of the single-cell data are mapped via their group labels to the corresponding tissue features in an aSVG image. If the feature labels in an aSVG differ from the corresponding cell group labels, e.g. due to variable terminologies, a translation map can be used to avoid manual relabeling. Throughout this vignette the term feature is a generalization referring in most cases to tissues or organs. For handling cell grouping information, five major methods are supported: (a) annotation labels, (b) manual assignments, (c) marker genes, (d) clustering labels, and (e) automated co-clustering (Figure 2a). The first three are similar in that they use known cell group labels; the main difference is how the cell labels are provided. In the annotation-based method, existing group labels are available and can be uploaded and/or stored in the SCE object, as is the case in some of the SCE instances provided by the scRNAseq package (Risso and Cole 2022). The manual method allows users to create the cell-to-tissue associations one by one or import them from a tabular file. The marker-gene method utilizes known marker genes to group cells. In the clustering method, cells are clustered and grouped by clustering labels. In contrast, the automated co-clustering aims to assign source tissues to corresponding single cells computationally by a co-clustering method (Figure 8). This method is experimental and requires bulk expression data that are obtained from the tissues represented in the single-cell data.

The matching between cell groups in the embedding plots and tissue features in SHMs is indicated with four coloring schemes (Figure 2b). The first three, ‘fixed-group’, ‘cell-by-group’, and ‘feature-by-group’, assign the same color to a cell group and its matching tissue. The main difference is that ‘fixed-group’ uses constant colors while the latter two use heat colors that are proportional to the numeric expression information obtained from the single cell or bulk expression data of a chosen gene. When expression values among groups are very similar, toggling between the constant and heat colors is important to track the tissue origin in the single cell data. In ‘cell-by-group’ coloring, the expression of a given gene is first summarized across the cells within each group via a meaningful summary statistic, such as mean or median, then heat colors are created from the summary values and assigned to the corresponding cells and tissues (Figure 2.1-2), so the mapping direction is cell-to-tissue. The ‘feature-by-group’ coloring is very similar except that heat colors are based on summary values of each tissue, so the mapping direction is tissue-to-cell. The most meaningful coloring is ‘cell-by-value’ (Figure 2.2-3). In this option, each cell and tissue is colored according to its own expression value of a chosen gene, so the cellular heterogeneity is reflected.
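The difference between group-level and cell-level heat colors can be sketched in base R. This is an illustration with made-up values and a hypothetical helper (to_col), not the color mapping used internally by spatialHeatmap.

```r
# Toy expression values of one gene in six cells from two groups (made up).
expr <- c(c1=1, c2=3, c3=2, c4=8, c5=6, c6=7)
group <- c('TissueA', 'TissueA', 'TissueA', 'TissueB', 'TissueB', 'TissueB')

# Number-to-color key: 100 heat colors spanning the observed value range.
pal <- colorRampPalette(c('yellow', 'orange', 'red'))(100)
# Hypothetical helper that maps values to palette entries.
to_col <- function(x, rng) pal[pmax(1, ceiling((x - rng[1]) / diff(rng) * 100))]

# 'cell-by-group': one summary value (mean) and thus one color per group.
grp.mean <- tapply(expr, group, mean)
col.group <- to_col(grp.mean, range(expr))
names(col.group) <- names(grp.mean)
col.cell.by.group <- col.group[group]  # Cells inherit the color of their group.

# 'cell-by-value': each cell is colored by its own expression value.
col.cell.by.value <- to_col(expr, range(expr))
```

In the group-level scheme the matching tissue would receive the color stored in col.group, whereas in the cell-level scheme the within-group variation remains visible in the embedding plot.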

Similar to other functionalities in spatialHeatmap, these functionalities are available within R as well as the corresponding Shiny App (Chang et al. 2021).

Figure 2: Cell grouping and coloring
(a) For co-visualizing with SHMs, single cells need to have group labels. Five methods are supported to obtain group labels. (b) In the co-visualization plot, matching between cells and aSVG features is indicated by colors between the two. Four coloring options are summarized in a table. The cell grouping and coloring are schematically illustrated in 1-3.

2 Getting Started

2.1 Installation

The spatialHeatmap package can be installed with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(ggplot2); library(SingleCellExperiment);
library(kableExtra)

The following lists the vignette(s) of this package in an HTML browser. Clicking the name of the corresponding vignette will open it.

browseVignettes('spatialHeatmap')

To reduce runtime, intermediate results can be cached under ~/.cache/shm.

cache.pa <- '~/.cache/shm' # Set path of the cache directory
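Caching of intermediate results typically follows a read-or-compute pattern. The following base R sketch illustrates that pattern with a hypothetical helper (cached) built on saveRDS and readRDS; it is not the implementation of the caching functions provided by spatialHeatmap.

```r
# Hypothetical helper (not part of spatialHeatmap): return a cached result if
# present, otherwise compute it, store it as an RDS file, and return it.
cached <- function(dir, name, compute) {
  file <- file.path(dir, paste0(name, '.rds'))
  if (file.exists(file)) return(readRDS(file))
  if (!dir.exists(dir)) dir.create(dir, recursive=TRUE)
  res <- compute()
  saveRDS(res, file)
  res
}

# Usage with a temporary directory and a stand-in for a slow computation.
tmp.dir <- file.path(tempdir(), 'shm_cache_demo')
res1 <- cached(tmp.dir, 'demo', function() sum(1:10))  # Computed and stored.
res2 <- cached(tmp.dir, 'demo', function() stop('not evaluated'))  # Read from cache.
```

Because the second call finds the stored file, its compute function is never evaluated.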

3 Quick Start

To obtain reproducible results for examples that use randomized data or parameters, a fixed seed is set.

set.seed(10)

This quick start example demonstrates ‘cell-by-group’ coloring using a single-cell data set from oligodendrocytes of mouse brain (Marques et al. 2016). This data set is obtained from the scRNAseq package (Risso and Cole 2022) with minor modifications and is included in spatialHeatmap.

The single-cell data is first pre-processed by the process_cell_meta function that applies common QC, normalization and dimension reduction routines. The details of these pre-processing methods are described in the corresponding help file. Additional background information on these topics can be found in the OSCA tutorial.

sce.pa <- system.file("extdata/shinyApp/data", "cell_mouse_brain.rds", package="spatialHeatmap")
sce <- readRDS(sce.pa)
sce.dimred.quick <- process_cell_meta(sce, qc.metric=list(subsets=list(Mt=rowData(sce)$featureType=='mito'), threshold=1))
colData(sce.dimred.quick)[1:3, 1:2]
## DataFrame with 3 rows and 2 columns
##                              label         age
##                        <character> <character>
## C1.1772096.085.B10          SN.VTA         p19
## C1.1772125.088.G02 corpus.callosum         p22
## C1.1772099.084.C06    zona.incerta         p19

The gene expression values in the single-cell data are averaged within the group labels stored in the label column of the colData slot, which correspond to the source tissues of the cells.

sce.aggr.quick <- aggr_rep(sce.dimred.quick, assay.na='logcounts', sam.factor='label', aggr='mean')
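What averaging within group labels means can be shown on a small matrix in base R. This is a sketch with made-up values and hypothetical gene and cell names; it only illustrates the arithmetic and is not how aggr_rep is implemented.

```r
# Toy log-expression matrix: 2 genes x 5 cells (values are made up).
mat <- matrix(c(1, 4, 3, 2, 5, 0, 2, 6, 7, 1), nrow=2,
  dimnames=list(c('geneA', 'geneB'), paste0('cell', 1:5)))
label <- c('hypothalamus', 'cortex.S1', 'hypothalamus', 'cortex.S1', 'cortex.S1')

# Average each gene across the cells sharing a label: genes x labels matrix.
aggr.mean <- sapply(split(seq_len(ncol(mat)), label),
  function(i) rowMeans(mat[, i, drop=FALSE]))
aggr.mean
```

Each column of the result holds one value per gene for a cell group, which is the form needed to color a matching aSVG feature.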

The aSVG of mouse brain is imported with the function read_svg and stored in an SVG object svg.mus.brain.

svg.mus.brain.pa <- system.file("extdata/shinyApp/data", "mus_musculus.brain.svg", package="spatialHeatmap")
svg.mus.brain <- read_svg(svg.mus.brain.pa)

A subset of features and related attributes are returned from svg.mus.brain, where fill and stroke refer to color and line width respectively.

tail(attribute(svg.mus.brain)[[1]])[, 1:4]
## # A tibble: 6 × 4
##   feature                      id             fill  stroke
##   <chr>                        <chr>          <chr>  <dbl>
## 1 brainstem                    UBERON_0002298 none    0.05
## 2 midbrain                     UBERON_0001891 none    0.05
## 3 dorsal.plus.ventral.thalamus UBERON_0001897 none    0.05
## 4 hypothalamus                 UBERON_0001898 none    0.05
## 5 nose                         UBERON_0000004 none    0.05
## 6 corpora.quadrigemina         UBERON_0002259 none    0.05

To map cell group labels to aSVG features, a list with named components is used, where cell labels are the names and the matching tissue features are the corresponding list elements. Note that in the cell-to-tissue mapping each cell label can be matched to multiple aSVG features but not vice versa.

lis.match.quick <- list(hypothalamus=c('hypothalamus'), cortex.S1=c('cerebral.cortex', 'nose'))
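The structure of such a matching list can be inspected by converting it to a long-format lookup table. The following base R sketch uses a copy of the list above under a hypothetical name (lis.match) and checks the constraint that no aSVG feature is claimed by more than one cell label.

```r
lis.match <- list(hypothalamus=c('hypothalamus'), cortex.S1=c('cerebral.cortex', 'nose'))

# Long-format lookup table: one row per (aSVG feature, cell label) pair.
df.match <- stack(lis.match)
colnames(df.match) <- c('feature', 'cell.label')
df.match$cell.label <- as.character(df.match$cell.label)

# Each aSVG feature should be claimed by at most one cell label.
stopifnot(!anyDuplicated(df.match$feature))

# Feature-to-label lookup, e.g. to find the cell group that colors 'nose'.
feat2cell <- setNames(df.match$cell.label, df.match$feature)
feat2cell[['nose']]
```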

For efficient data management and reusability, the data objects for co-visualization are stored in an SPHM container.

dat.quick <- SPHM(svg=svg.mus.brain, bulk=sce.aggr.quick, cell=sce.dimred.quick, match=lis.match.quick)

The co-visualization plot is created for gene Apod using the function covis. In the embedding plot, the hypothalamus and cortex.S1 cells are colored according to their respective aggregated expression values of Apod. In the SHM plot, aSVG features are assigned the same color as the matching cells defined in lis.match.quick. The cell.group argument indicates cell group labels in the colData slot of sce.aggr.quick, tar.cell specifies the target cell groups to show, and dimred specifies the embeddings.

shm.res.quick <- covis(data=dat.quick, ID=c('Apod'), dimred='UMAP', cell.group='label', tar.cell=names(lis.match.quick), assay.na='logcounts', bar.width=0.09, dim.lgd.nrow=2, legend.r=1.5, legend.key.size=0.02, legend.text.size=10, legend.nrow=4, h=0.6) 

Figure 3: Co-visualization of “cell-by-group” coloring
The co-visualization is created with gene Apod. Single cells in the embedding plot and their matching aSVG features in the SHM are assigned the same colors that are created according to mean expression values of Apod within cell groups.

4 Co-visualization Plots

This section showcases different cell grouping methods (Figure 2a) and coloring options (Figure 2b) for co-visualizing SHMs with single-cell embedding plots. As the cell grouping methods of annotation labels, clustering, manual assignments, and marker genes are very similar, this section only demonstrates the methods of annotation labels and automated co-clustering, while the clustering/manual assignments are shown in the Supplementary Section. The ‘cell-by-group’ coloring is already showcased in the Quick Start, thus this section focuses on the other three coloring options. In addition, the co-visualization of spatially resolved single-cell (SRSC) data with bulk data is demonstrated.

4.1 Annotation Labels

To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(10)

This section demonstrates the co-visualization plots created with annotation labels and ‘feature-by-group’ coloring. The single-cell data are stored in an SCE object obtained from the scRNAseq package (Risso and Cole 2022), which is the same object as in the Quick Start (sce). The annotation labels are stored in the label column of the colData slot and are partially shown below.

colData(sce)[1:3, 1:2]
## DataFrame with 3 rows and 2 columns
##                              label         age
##                        <character> <character>
## C1.1772096.085.B10          SN.VTA         p19
## C1.1772125.088.G02 corpus.callosum         p22
## C1.1772099.084.C06    zona.incerta         p19

The bulk RNA-seq data are modified from a study of mouse cerebellar development (Vacher et al. 2021) and are imported into an SE object, which is partially shown below. Note that replicates are indicated by the same tissue names (e.g. cerebral.cortex).

blk.mus.pa <- system.file("extdata/shinyApp/data", "bulk_mouse_cocluster.rds", package="spatialHeatmap") 
blk.mus <- readRDS(blk.mus.pa)
assay(blk.mus)[1:3, 1:5]
##          cerebral.cortex hippocampus hypothalamus cerebellum cerebral.cortex
## AI593442             177         256           50         24             285
## Actr3b               513        1465          228        244             666
## Adcy1                701        1243           57       1910             836
colData(blk.mus)[1:3, , drop=FALSE]
## DataFrame with 3 rows and 1 column
##                          tissue
##                     <character>
## cerebral.cortex cerebral.cortex
## hippocampus         hippocampus
## hypothalamus       hypothalamus

Bulk and single cell data are jointly normalized and subsequently separated.

mus.ann.nor <- read_cache(cache.pa, 'mus.ann.nor') 
if (is.null(mus.ann.nor)) {
  # Joint normalization.
  mus.ann.nor <- norm_cell(sce=sce, bulk=blk.mus, quick.clus=list(min.size = 100, d=15), com=FALSE)
  save_cache(dir=cache.pa, overwrite=TRUE, mus.ann.nor)
}
## Cache directory: ~/.cache/shm
## [1] "~/.cache/shm"
# Separate bulk and cell data.
blk.mus.nor <- mus.ann.nor$bulk
cell.mus.nor <- mus.ann.nor$cell
colData(cell.mus.nor) <- colData(sce)

In the normalized single-cell data, dimension reduction is performed with the PCA, UMAP, and TSNE methods. The single cells are then plotted in the UMAP dimensions, where cells are represented by dots and are colored by the annotation labels (color.by="label").

cell.dim <- reduce_dim(cell.mus.nor, min.dim=5)
## "prop" is set 1 in "getTopHVGs" due to too less genes.
plot_dim(cell.dim, color.by="label", dim='UMAP')
Figure 4: Embedding plot of single-cell data
The cells (dots) are colored by the grouping information stored in the colData slot of the corresponding SCE object

In normalized bulk data, expression values for each gene are summarized by mean across tissue replicates (here aggr='mean').

# Aggregation.
blk.mus.aggr <- aggr_rep(blk.mus.nor, sam.factor='sample', aggr='mean')
assay(blk.mus.aggr)[1:2, ]

The aSVG instance of mouse brain from the Quick Start is used. A subset of the aSVG features is shown.

tail(attribute(svg.mus.brain)[[1]])[1:3, 1:4]
## # A tibble: 3 × 4
##   feature                      id             fill  stroke
##   <chr>                        <chr>          <chr>  <dbl>
## 1 brainstem                    UBERON_0002298 none    0.05
## 2 midbrain                     UBERON_0001891 none    0.05
## 3 dorsal.plus.ventral.thalamus UBERON_0001897 none    0.05

Following the same conventions as in the main vignette, at least one tissue in the bulk data must have the same identifier as an aSVG feature in order to create an SHM.

colnames(blk.mus) %in% attribute(svg.mus.brain)[[1]]$feature
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

In most cases, the feature labels in the aSVG and the cell group labels of the single-cell data are not the same. To resolve this without manual relabeling, a translation list is used to make them match. In ‘feature-by-group’ coloring, the feature labels should be the names of the list and the cell labels the corresponding elements.

lis.match.blk <- list(cerebral.cortex=c('cortex.S1'), hypothalamus=c('corpus.callosum', 'hypothalamus'))

The following plots the corresponding co-visualization for sample gene ‘Cacnb4’. The legend under the embedding plot shows the cell labels in the matching list (lis.match.blk). The source tissue information is indicated by using the same colors in the embedding and SHM plots on the left and right, respectively. In contrast to the Quick Start, the tar.bulk argument is used here to indicate the target tissues to show.

# Store data objects in an SPHM container. 
dat.ann.tocell <- SPHM(svg=svg.mus.brain, bulk=blk.mus.aggr, cell=cell.dim, match=lis.match.blk)
covis(data=dat.ann.tocell, ID=c('Cacnb4'), dimred='UMAP', cell.group='label', tar.bulk=names(lis.match.blk), bar.width=0.09, dim.lgd.nrow=2, dim.lgd.text.size=12, h=0.6, legend.r=1.5, legend.key.size=0.02, legend.text.size=10, legend.nrow=3)
Figure 5: Co-visualization plot with ‘feature-by-group’ coloring
This plot is created with gene ‘Cacnb4’. Tissues in SHM are colored according to respective expression values of ‘Cacnb4’, and cells of each group in the embedding plot are assigned the same colors as the matching tissues in SHM.

In scenarios where expression values are similar across tissues, the mapping between cells and tissues can be indicated by constant colors by setting profile=FALSE.

covis(data=dat.ann.tocell, ID=c('Cacnb4'), profile=FALSE, dimred='UMAP', cell.group='label', tar.bulk=names(lis.match.blk), bar.width=0.09, dim.lgd.nrow=2, dim.lgd.text.size=12, h=0.8, legend.r=1.5, legend.key.size=0.02, legend.text.size=10, legend.nrow=3)
Figure 6: Co-visualization plot of constant colors
In this plot, the mapping between cell groups and tissues is indicated by fixed colors instead of expression values.

In the above examples, cells of the same group are assigned the same color in the embedding plots. This is useful for revealing the matching between cell groups and tissues, but the cellular heterogeneity within groups is lost. The ‘cell-by-value’ coloring scheme is developed to overcome this limitation. In the following, this option is activated by col.idp=TRUE.

covis(data=dat.ann.tocell, ID=c('Cacnb4'), col.idp=TRUE, dimred='UMAP', cell.group='label', tar.bulk=names(lis.match.blk), bar.width=0.08, dim.lgd.nrow=2, dim.lgd.text.size=10, h=0.6, legend.r=0.1, legend.key.size=0.01, legend.text.size=10, legend.nrow=2, dim.lgd.plot.margin=margin(t=0.01, r=0.15, b=0.01, l=0.15, unit="npc"))
Figure 7: Co-visualization plot with ‘cell-by-value’ coloring
This plot is created with gene ‘Cacnb4’. Tissues in SHM and cells in embedding plot are colored independently according to respective expression values of ‘Cacnb4’.

4.2 Automated Method

If both single cell and bulk gene expression data are available for the same or overlapping tissues, then co-clustering can be used to assign cells to tissues automatically (Figure 8). Subsequently, the predicted tissue-cell assignments can be used for creating co-visualization plots. This approach is useful for predicting the source tissues of unassigned cells without the prior knowledge required by the annotation and manual approaches introduced above. While attractive, there are various challenges to overcome to reliably co-cluster single cell data with the corresponding tissue-level bulk data. This is due to the different properties of single cell and bulk gene expression data, such as lower sensitivity and higher sparsity in single cell compared to bulk data. This section introduces a co-clustering method that is largely based on parameter optimization and comprises three major steps. First, both data sets are preprocessed to retain the most reliable expression values (Figure 8.1a-b). Second, the genes in the bulk data are reduced to those robustly expressed in the single cell data (Figure 8.1c). Third, bulk and cell data are co-clustered by using optimal default settings (Table 1) that are obtained through optimization on real data with known tissue-cell assignments. The following introduces the three steps of this method in more detail using the example of RNA-Seq data.

  1. The raw count matrices of bulk and single cells are column-wise combined for joint normalization (Figure 8.1a). After being separated from the bulk data, the single cell data are reduced to genes with robust expression across X% cells and to cells with robust expression across Y% genes (Figure 8.1b). In the bulk data, genes are filtered according to expression values \(\ge\) A at a proportion of \(\ge\) p across bulk samples and a coefficient of variation (CV) between CV1 and CV2 (Figure 8.1b).

  2. The bulk data are subsetted to the same genes as the single cell data (Figure 8.1c). This and the previous filtering steps reduce the sparsity in the single cell data, and the bulk data are made more comparable to the single cell data by subsetting them to the same genes.

  3. Bulk and single cell data are column-wise combined for joint embedding using PCA (UMAP, or other). Co-clustering is performed on the embedding data and three types of clusters are produced. First, only one bulk tissue is co-clustered with cells (Figure 8.2a). This bulk tissue is assigned to all cells in the same co-cluster. Second, multiple bulk tissues are co-clustered with multiple cells (Figure 8.2b). Each cell is assigned its nearest-neighbor bulk tissue, where nearness is measured by Spearman’s correlation coefficient. Third, no bulk tissue is co-clustered with cells (Figure 8.2c). These cells remain un-labeled and are candidates for discovering novel cell types.

After co-clustering, cells are labeled by bulk tissues or un-labeled (Figure 8.3) and these labels are used for co-visualization (Figure 8.4), where cells in embedding plot and matching tissues in SHM can be indicated with one of the four coloring schemes (Figure 2b).
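The nearest-neighbor assignment within a co-cluster that contains several bulk tissues can be sketched in base R. The profiles below are made up, the object names are hypothetical, and the code only illustrates the idea of picking the bulk tissue with the highest Spearman correlation; it is not the implementation of the co-clustering function in spatialHeatmap.

```r
# Toy profiles within one co-cluster: rows are dimensions, columns are samples.
bulk <- cbind(cerebellum=c(5, 1, 4, 2, 3, 6), hippocampus=c(1, 6, 2, 5, 4, 3))
# Three cells: two follow the cerebellum profile, one follows hippocampus.
cell <- cbind(
  c1=bulk[, 'cerebellum'] + c(0.1, -0.1, 0.2, 0, -0.2, 0.1),
  c2=bulk[, 'cerebellum'] * 2,
  c3=bulk[, 'hippocampus'] + 0.3
)

# Spearman correlation between every cell and every bulk tissue.
sim <- cor(cell, bulk, method='spearman')  # cells x bulk tissues

# Assign to each cell the bulk tissue with the highest correlation.
assigned <- colnames(sim)[max.col(sim, ties.method='first')]
res <- data.frame(cell=rownames(sim), assignedBulk=assigned,
  similarity=apply(sim, 1, max))
res
```

The similarity column can serve as a stringency measure, for example to keep only assignments above a chosen threshold.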

Figure 8: Overview of co-clustering
(1) The input raw count data of bulk tissues and single cells such as RNA-seq count data are preprocessed. (2) Bulk and single-cell data are combined in a column-wise manner for joint dimension reduction. Then co-clustering is performed on top joint dimensions. If a cluster comprises bulk tissue(s) and cells, the nearest-neighbor bulk is assigned to individual cells, which are measured by Spearman’s correlation coefficients (similarities). By contrast, if a cluster only contains cells, these cells are un-labeled. (3) Tissue-cell assignments and corresponding similarities are stored in a table. (4) Single cells and tissues are co-visualized by using tissue-cell assignments in (3).

To obtain reasonably robust default settings for co-clustering, the four main parameters shown in Table 1 are optimized, where bold text indicates optimal settings that are treated as robust default settings. These parameters are chosen because they are most relevant to the co-clustering step. The details of this optimization are given here. The following demonstration applies the default settings (bold in Table 1) using single cell and bulk data from mouse brain (Vacher et al. 2021; Ortiz et al. 2020). Both data sets have been simplified for demonstration purposes.

Table 1: Settings for optimization. Optimal settings are indicated by bold text.
| Parameter | Settings | Description |
|---|---|---|
| dimensionReduction | denoisePCA (PCA, scran), runUMAP (UMAP, scater) | Dimension reduction methods |
| topDimensions | 5 to 80 (5, 8-20, 22-35) | Number of top dimensions selected for co-clustering |
| graphBuilding | buildKNNGraph (knn), buildSNNGraph (snn) (scran) | Methods for building a graph where nodes are cells and edges are connections between nearest neighbors |
| clusterDetection | cluster_walktrap (wt), cluster_fast_greedy (fg), cluster_leading_eigen (le) (igraph) | Methods for partitioning the graph to generate clusters |

4.2.1 Pre-processing

To obtain reproducible results, a fixed seed is set for generating random numbers.

set.seed(10)

The bulk data (blk.mus) are the same as in the Annotation Labels section. The following imports example single-cell data from the spatialHeatmap package and shows part of the metadata in its colData slot.

sc.mus.pa <- system.file("extdata/shinyApp/data", "cell_mouse_cocluster.rds", package="spatialHeatmap") 
sc.mus <- readRDS(sc.mus.pa)
colData(sc.mus)[1:3, , drop=FALSE]
## DataFrame with 3 rows and 1 column
##                cell
##         <character>
## isocort     isocort
## isocort     isocort
## isocort     isocort

Bulk and single cell raw count data are jointly normalized by the function norm_cell, which is a wrapper of computeSumFactors from scran (Lun, McCarthy, and Marioni 2016). com=FALSE means bulk and single cells are separated after normalization for subsequent separate filtering.

mus.lis.nor <- read_cache(cache.pa, 'mus.lis.nor') 
if (is.null(mus.lis.nor)) {
  mus.lis.nor <- norm_cell(sce=sc.mus, bulk=blk.mus, com=FALSE)
  save_cache(dir=cache.pa, overwrite=TRUE, mus.lis.nor)
}

The normalized single cell and bulk data (log2-scale) are filtered to reduce sparsity and low expression values. In the bulk data, replicates are first aggregated by taking means using the function aggr_rep. Then genes in the bulk data are retained if they have expression values \(\ge 1\) in a proportion of \(\ge 10\%\) of the bulk samples (pOA) and a coefficient of variation (CV) between 0.1 and 50 (Gentleman et al. 2018).

In the single cell data, genes with expression values \(\ge 1\) (cutoff=1) in \(\ge 1\%\) (p.in.gen=0.01) of cells are retained, and cells having expression values \(\ge 1\) (cutoff=1) in \(\ge 10\%\) (p.in.cell=0.1) of all genes are retained.

# Aggregate bulk replicates
blk.mus.aggr <- aggr_rep(data=mus.lis.nor$bulk, assay.na='logcounts', sam.factor='sample', aggr='mean')
# Filter bulk
blk.mus.fil <- filter_data(data=blk.mus.aggr, pOA=c(0.1, 1), CV=c(0.1, 50), verbose=FALSE) 
# Filter cell and subset bulk to genes in cell
blk.sc.mus.fil <- filter_cell(sce=mus.lis.nor$cell, bulk=blk.mus.fil, cutoff=1, p.in.cell=0.1, p.in.gen=0.01, verbose=FALSE) 

Compared to bulk RNA-Seq data, single cell data has a much higher level of sparsity. This difference is reduced by the above filtering and then subsetting the bulk data to the genes remaining in the filtered single cell data. This entire process is accomplished by the filter_cell function.
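The filtering rules described above can be written out in base R on toy matrices. This sketch uses made-up values and hypothetical gene and cell names, applies the gene and cell filters to the same input matrix for simplicity, and is meant to illustrate the thresholds rather than to reproduce filter_data or filter_cell.

```r
# Toy log2-scale single-cell matrix: 4 genes x 10 cells (values are made up).
sc <- matrix(0, nrow=4, ncol=10,
  dimnames=list(paste0('g', 1:4), paste0('c', 1:10)))
sc['g1', ] <- 2     # Expressed in all cells.
sc['g2', 1:5] <- 3  # Expressed in half of the cells.
sc['g3', 1] <- 1.5  # Expressed in a single cell; g4 is never expressed.

cutoff <- 1; p.in.gen <- 0.01; p.in.cell <- 0.1
# Genes: proportion of cells with expression >= cutoff must reach p.in.gen.
keep.gen <- rowMeans(sc >= cutoff) >= p.in.gen
# Cells: proportion of genes with expression >= cutoff must reach p.in.cell.
keep.cell <- colMeans(sc >= cutoff) >= p.in.cell
sc.fil <- sc[keep.gen, keep.cell, drop=FALSE]

# Toy bulk matrix (genes x tissues) filtered by proportion-over-A and CV.
blk <- rbind(g1=c(5, 6, 7), g2=c(4, 4, 4), g3=c(0.2, 3, 0.1), g4=c(0, 0.5, 0.2))
pOA <- rowMeans(blk >= 1) >= 0.1              # Values >= 1 in >= 10% of samples.
cv <- apply(blk, 1, sd) / rowMeans(blk)       # Coefficient of variation.
keep.blk <- pOA & cv >= 0.1 & cv <= 50

# Subset the bulk data to genes retained in both data sets.
blk.fil <- blk[intersect(rownames(blk)[keep.blk], rownames(sc.fil)), , drop=FALSE]
```

Here g4 is dropped from the single-cell data because it is never expressed, g2 is dropped from the bulk data because it does not vary across tissues, and only g1 and g3 remain in the bulk data after subsetting.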

The same aSVG instance of mouse brain as in the Quick Start section above is used here and the aSVG importing is omitted for brevity.

tail(attribute(svg.mus.brain)[[1]])[1:3, 1:4] # Partial features are shown.
## # A tibble: 3 × 4
##   feature                      id             fill  stroke
##   <chr>                        <chr>          <chr>  <dbl>
## 1 brainstem                    UBERON_0002298 none    0.05
## 2 midbrain                     UBERON_0001891 none    0.05
## 3 dorsal.plus.ventral.thalamus UBERON_0001897 none    0.05

4.2.2 Co-clustering

The co-clustering process is implemented in the function cocluster. In the following, default settings obtained from optimization are used, where min.dim, dimred, graph.meth, and cluster refer to topDimensions, dimensionReduction, graphBuilding, and clusterDetection in Table 1, respectively. The results are saved in coclus.mus.

coclus.mus <- read_cache(cache.pa, 'coclus.mus')
if (is.null(coclus.mus)) {
  coclus.mus <- cocluster(bulk=blk.sc.mus.fil$bulk, cell=blk.sc.mus.fil$cell, min.dim=27, dimred='PCA', graph.meth='knn', cluster='fg')
  save_cache(dir=cache.pa, overwrite=TRUE, coclus.mus)
}

The tissue-cell assignments from the co-clustering above are stored in the colData slot of coclus.mus. The cluster column contains cluster labels, bulkCell indicates whether a sample is a bulk tissue or a single cell, sample contains the original labels of bulk tissues and cells, assignedBulk contains the bulk tissue assigned to each cell (none indicates no assignment), and similarity contains the Spearman’s correlation coefficient of each tissue-cell assignment, which is a measure of assignment stringency.

colData(coclus.mus)
## DataFrame with 4458 rows and 7 columns
##                     cluster    bulkCell          sample    assignedBulk
##                 <character> <character>     <character>     <character>
## cerebral.cortex       clus4        bulk cerebral.cortex            none
## hippocampus           clus3        bulk     hippocampus            none
## hypothalamus          clus7        bulk    hypothalamus            none
## cerebellum            clus7        bulk      cerebellum            none
## isocort               clus6        cell         isocort            none
## ...                     ...         ...             ...             ...
## retrohipp             clus4        cell       retrohipp cerebral.cortex
## retrohipp             clus6        cell       retrohipp            none
## retrohipp             clus6        cell       retrohipp            none
## retrohipp             clus6        cell       retrohipp            none
## retrohipp             clus6        cell       retrohipp            none
##                  similarity     index sizeFactor
##                 <character> <integer>  <numeric>
## cerebral.cortex        none         1  45.811788
## hippocampus            none         2 100.736625
## hypothalamus           none         3  28.249392
## cerebellum             none         4  44.415396
## isocort                none         5   0.711862
## ...                     ...       ...        ...
## retrohipp             0.286      4454   1.006589
## retrohipp              none      4455   0.674636
## retrohipp              none      4456   0.646310
## retrohipp              none      4457   0.564467
## retrohipp              none      4458   0.636357
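
As a reminder of what the similarity values measure, the sketch below computes a Spearman's correlation coefficient between one bulk profile and one cell profile with the base R function cor. The two vectors are hypothetical, and which profiles cocluster actually correlates internally is not specified here, so this only illustrates the statistic itself.

```r
# Hypothetical expression profiles of one bulk tissue and one cell over six genes.
bulk.prof <- c(5.1, 0.2, 3.3, 7.8, 1.0, 2.4)
cell.prof <- c(2.0, 0.0, 1.1, 3.5, 0.8, 0.4)
# Spearman's correlation is the Pearson correlation of the ranks.
sim <- cor(bulk.prof, cell.prof, method='spearman')
round(sim, 3)
```

Because the coefficient depends only on ranks, it is insensitive to the different scales of bulk and single-cell measurements.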

The tissue-cell assignments can be controlled by filtering the values in the similarity column. This utility is implemented in the function filter_asg, where only assignments with similarities above the cutoff min.sim are retained. Utilities are also provided to tailor the assignments, such as assigning specific tissues to cells without assignments. Details of the tailoring are explained in the Supplementary Section.

coclus.mus <- filter_asg(coclus.mus, min.sim=0.1)
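
The effect of such a similarity cutoff can be illustrated with a small data.frame standing in for the assignment columns of colData. This is a sketch under assumptions, not the code of filter_asg: the values are hypothetical, and resetting discarded assignments to none as well as the handling of values exactly at the cutoff are assumptions.

```r
# Toy stand-in for the assignment columns of colData (hypothetical values).
asg <- data.frame(
  assignedBulk=c('none', 'hippocampus', 'cerebral.cortex', 'hypothalamus'),
  similarity=c('none', '0.155', '0.314', '0.05'),
  stringsAsFactors=FALSE
)
min.sim <- 0.1
# The similarity column is character; 'none' becomes NA when converted.
sim.num <- suppressWarnings(as.numeric(asg$similarity))
# Assignments below the cutoff are reset to 'none' (assumed behavior).
low <- !is.na(sim.num) & sim.num < min.sim
asg$assignedBulk[low] <- 'none'
asg$similarity[low] <- 'none'
asg
```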

The co-clusters that consist of tissues and cells can be visualized in an embedding plot with the function plot_dim. The dim argument specifies the embedding method. To see all co-clusters, assign TRUE to cocluster.only. In this case, other clusters containing only cells are shown in grey. To show only a specific cluster, assign the cluster label to group.sel, for example, group.sel='clus3'. In the embedding plot, tissues and cells are indicated by large and small circles respectively.

plot_dim(coclus.mus, dim='PCA', color.by='cluster', cocluster.only=TRUE, group.sel=NULL)

Figure 9: Embedding plot of co-clusters
Large and small circles refer to tissues and single cells respectively.

4.2.3 Co-visualization

In co-clustering based co-visualization, tissue assignments are treated as group labels for cells. This section focuses on the ‘cell-by-value’ coloring. Other coloring options (Figure 2b) are described in the Supplementary Section.

Single-cell and bulk data are separated from each other.

# Separate bulk data.
coclus.blk <- subset(coclus.mus, , bulkCell=='bulk')
# Separate single cell data.
coclus.sc <- subset(coclus.mus, , bulkCell=='cell')

The co-visualization with ‘cell-by-value’ coloring is built on gene ‘Atp2b1’. Each cell in the embedding plot and each tissue in the SHM are colored independently according to their respective expression values of ‘Atp2b1’. The matching between cells and tissues is indicated in the legend plot with constant colors.

# Store data objects in an SPHM container. 
dat.auto.idp <- SPHM(svg=svg.mus.brain, bulk=coclus.blk, cell=coclus.sc)
covis(data=dat.auto.idp, ID=c('Atp2b1'), dimred='TSNE', tar.cell=c('hippocampus', 'hypothalamus', 'cerebellum', 'cerebral.cortex'), col.idp=TRUE, dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.08, legend.nrow=3, h=0.6, legend.key.size=0.01, legend.text.size=10, legend.r=0.27, dim.lgd.plot.margin=margin(t=0.01, r=0.15, b=0.01, l=0.15, unit="npc"))

Figure 10: Co-visualization of “cell-by-value” coloring in automated method
This plot is created on gene “Atp2b1”. Each cell and each tissue are colored independently according to expression values of “Atp2b1”.

4.3 Spatial Single Cell Data

In addition to single-cell data, the co-visualization module is able to co-visualize spatially resolved single-cell (SRSC) data with bulk data. In the following example, the bulk data are the same as in the Annotation Labels section, while the SRSC data are from the anterior region of the sagittal mouse brain and were generated with 10X Genomics Visium. For simplicity, the pre-processing steps of the bulk and SRSC data are not described here and the pre-processed data are imported directly. These steps include (1) jointly normalizing bulk and SRSC data and subsequently separating them, which is performed with the function norm_srsc in spatialHeatmap, (2) reducing dimensions (PCA, UMAP, TSNE) of the separated SRSC data, and (3) clustering the SRSC data. More details are described in the Seurat vignette (Hao et al. 2021).

The pre-processed bulk and SRSC data are imported and partially shown.

# Importing bulk data.
blk.sp <- readRDS(system.file("extdata/shinyApp/data", "bulk_sp.rds", package="spatialHeatmap"))
# Bulk assay data are partially shown.
assay(blk.sp)[1:3, ]
## 3 x 4 sparse Matrix of class "dgCMatrix"
##        cerebral.cortex hippocampus hypothalamus cerebellum
## Resp18               .           .            7          .
## Epha4                5           9            2          .
## Scg2                 4           6           52          4
# Importing SRSC data.
srt.sc <- read_cache(cache.pa, 'srt.sc') 
if (is.null(srt.sc)) {
  srt.sc <- readRDS(gzcon(url("https://zenodo.org/record/7843362/files/srt_sc.rds?download=1")))
  save_cache(dir=cache.pa, overwrite=TRUE, srt.sc)
}
# SRSC assay data are partially shown.
srt.sc@assays$SCT@data[1:3, 1:2]
## 3 x 2 sparse Matrix of class "dgCMatrix"
##        AAACAAGTATCTCCCA.1 AAACACCAATAACTGC.1
## Resp18          1.7917595           1.386294
## Epha4           0.6931472           .       
## Scg2            2.5649494           3.091042

The SRSC data are stored in a Seurat object. The cluster labels of cells are stored in the seurat_clusters column of the meta.data slot and partially shown.

# SRSC metadata of cells are partially shown.
srt.sc@meta.data[1:2, c('seurat_clusters', 'nFeature_SCT')]
##                    seurat_clusters nFeature_SCT
## AAACAAGTATCTCCCA.1              11          258
## AAACACCAATAACTGC.1               9          237

The coordinates of spatial spots are stored in the images slot of the Seurat object and partially shown.

# Coordinates of spatial spots are partially shown.
srt.sc@images$anterior1@coordinates[1:2, c('imagerow', 'imagecol')]
##                    imagerow imagecol
## AAACAAGTATCTCCCA.1     7475     8501
## AAACACCAATAACTGC.1     8553     2788

The aSVG of mouse brain is included in spatialHeatmap and imported into an SVG object.

svg.mus.sp <- read_svg(system.file("extdata/shinyApp/data", "mus_musculus.brain_sp.svg", package="spatialHeatmap"), srsc=TRUE)

In order to position spatial spots in the SRSC data correctly in the aSVG, a shape named overlay that defines the region of the spatial spots is required in the aSVG. By using this shape as a reference, the spatial coordinates in the SRSC data are transformed so that all spatial spots are positioned within this shape. The overlay shape can be created by using the spatial plot created by the SpatialFeaturePlot function in Seurat as a template.

attribute(svg.mus.sp)[[1]][7:8, c('feature', 'id', 'fill', 'stroke')]
## # A tibble: 2 × 4
##   feature           id             fill  stroke
##   <chr>             <chr>          <chr>  <dbl>
## 1 overlay           overlay        none   0.258
## 2 medulla.oblongata UBERON_0001896 none   0.1
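
The idea of using the overlay shape as a reference can be sketched as a linear rescaling of spot coordinates into a bounding box. This is a minimal base R illustration only: the bounding box ovl, the spot values, and the helper rescale are hypothetical, and rescaling each axis independently is an assumption rather than the transformation used by spatialHeatmap.

```r
# Hypothetical spot coordinates in image pixels (as in 'imagerow'/'imagecol').
spots <- data.frame(imagerow=c(7475, 8553, 2000), imagecol=c(8501, 2788, 5000))
# Hypothetical bounding box of the 'overlay' shape in aSVG units.
ovl <- list(x=10, y=20, width=100, height=80)
# Linearly rescale a vector into the interval [lo, hi] (hypothetical helper).
rescale <- function(v, lo, hi) lo + (v - min(v)) / (max(v) - min(v)) * (hi - lo)
# Map image columns to x and image rows to y inside the overlay box.
spots$x <- rescale(spots$imagecol, ovl$x, ovl$x + ovl$width)
spots$y <- rescale(spots$imagerow, ovl$y, ovl$y + ovl$height)
spots
```

After the rescaling, every spot lies within the box spanned by the overlay shape, which is the property the overlay is meant to guarantee.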

In the SVG object, the angle slot is designed for optionally rotating the spatial spots. For the SRSC data generated by the Visium technology, a 90 degree rotation is required to position the spatial spots correctly, as shown below.

angle(svg.mus.sp)[[1]] <- angle(svg.mus.sp)[[1]] + 90 
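
What a rotation of spot coordinates amounts to can be sketched with a standard 2D rotation matrix. The helper rotate_xy below is hypothetical and not part of spatialHeatmap. The rotation direction is counter-clockwise in a conventional coordinate system with the y axis pointing up, whereas the convention applied by the package to the angle slot is not specified here.

```r
# Rotate points (matrix with columns x, y) by 'angle' degrees around 'center'.
# Hypothetical helper for illustration only.
rotate_xy <- function(xy, angle, center=colMeans(xy)) {
  theta <- angle * pi / 180
  # Standard rotation matrix [[cos, -sin], [sin, cos]], filled column-wise.
  rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow=2)
  # Shift to the center, rotate row vectors, and shift back.
  centered <- sweep(xy, 2, center)
  sweep(centered %*% t(rot), 2, center, '+')
}
xy <- cbind(x=c(1, 0), y=c(0, 1))
res <- rotate_xy(xy, 90, center=c(0, 0))
round(res, 10)
```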

The co-visualization plot is created with the gene ‘Epha4’ using ‘cell-by-value’ coloring. Assigning FALSE to profile will turn on the ‘fixed-group’ coloring. The cluster labels are treated as the cell group labels (cell.group='seurat_clusters'). The cell clusters of ‘1’, ‘2’, ‘3’, and ‘5’ roughly correspond to the ‘cerebral.cortex’ tissue in SHM, and they are chosen as the target cells to show (tar.cell=c(1:3, 5)). The co-visualization plot consists of an embedding plot on the left, a single-cell SHM (scSHM) in the middle, and a SHM on the right. In the scSHM, spatial spots are positioned in the overlay region and overlaid by the anatomical structures in the aSVG.

dat.srsc <- SPHM(svg=svg.mus.sp, bulk=blk.sp, cell=srt.sc)
covis(data=dat.srsc, ID='Epha4', assay.na='logcounts', dimred='TSNE', cell.group='seurat_clusters', tar.cell=c(1:3, 5), bar.width=0.08, dim.lgd.nrow=1, dim.lgd.text.size=10, legend.r=1.5, legend.key.size=0.013, legend.text.size=12, legend.nrow=5, h=0.6, profile=TRUE, ncol=3, vjust=5, dim.lgd.key.size=3, size.r=0.97, dim.axis.font.size=8, size.pt=1.5)

Figure 11: Co-visualization of SRSC data with SHM
This plot is created on gene “Epha4”. Each spatial spot and each tissue are colored independently according to expression values of “Epha4”.

5 Shiny App

The co-visualization module is included in the Shiny App, which is a GUI implementation of spatialHeatmap. To start the app, simply call shiny_shm() in R. Below is a screenshot of the co-visualization output.

Figure 12: Screenshot of the co-visualization output in Shiny App

When using the Shiny App with bulk and single-cell count data, combine them column-wise in an SCE object, and format the metadata in the colData slot according to the following rules:

  1. In the bulkCell column, use bulk and cell to indicate bulk and cell samples respectively. If no bulk is included in this column, all samples are considered cells.

  2. If multiple cell group labels (annotation labels, manual assignments, marker genes) are provided, include them in columns of label, label1, label2, and so on respectively. In each of these label columns include corresponding aSVG features as tissue labels.

After formatting the metadata, save the SCE object as an .rds file using saveRDS, then upload the .rds and aSVG files to the App. An example of bulk and single-cell data for use in the Shiny App is included in spatialHeatmap and shown below.

shiny.dat.pa <- system.file("extdata/shinyApp/data", "shiny_covis_bulk_cell_mouse_brain.rds", package="spatialHeatmap")
shiny.dat <- readRDS(shiny.dat.pa)
colData(shiny.dat)
## DataFrame with 1061 rows and 4 columns
##                           label          label1    bulkCell    variable
##                     <character>     <character> <character> <character>
## cerebral.cortex cerebral.cortex cerebral.cortex        bulk     control
## hippocampus         hippocampus     hippocampus        bulk     control
## hypothalamus       hypothalamus    hypothalamus        bulk     control
## cerebellum           cerebellum      cerebellum        bulk     control
## cerebral.cortex cerebral.cortex cerebral.cortex        bulk     control
## ...                         ...             ...         ...         ...
## retrohipp             retrohipp           clus4        cell     control
## retrohipp             retrohipp           clus4        cell     control
## cere                       cere           clus2        cell     control
## cere                       cere           clus2        cell     control
## midbrain               midbrain           clus2        cell     control
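
The formatting rules above can be sketched with a base R toy example that assembles the combined count matrix and the matching metadata table. All values, the cell names, and the output file name are hypothetical. The final wrapping step is shown only as comments, since it requires the SingleCellExperiment and S4Vectors packages.

```r
genes <- c('Resp18', 'Epha4', 'Scg2')
# Hypothetical raw counts: 3 genes in 2 bulk tissues and in 3 cells.
blk.cnt <- matrix(c(0, 5, 4, 0, 9, 6), nrow=3,
  dimnames=list(genes, c('cerebral.cortex', 'hippocampus')))
sc.cnt <- matrix(c(1, 0, 3, 0, 2, 5, 4, 1, 0), nrow=3,
  dimnames=list(genes, c('cell1', 'cell2', 'cell3')))
# Column-wise combination requires identical genes in identical order.
stopifnot(identical(rownames(blk.cnt), rownames(sc.cnt)))
cnt <- cbind(blk.cnt, sc.cnt)
# Metadata: 'bulkCell' marks bulk vs. cell, 'label' holds aSVG features.
cdat <- data.frame(
  label=c(colnames(blk.cnt), 'hippocampus', 'hippocampus', 'cerebral.cortex'),
  bulkCell=rep(c('bulk', 'cell'), c(ncol(blk.cnt), ncol(sc.cnt))),
  row.names=colnames(cnt), stringsAsFactors=FALSE
)
cdat
# With SingleCellExperiment loaded, the pieces would then be wrapped and saved:
# sce <- SingleCellExperiment(assays=list(counts=cnt), colData=S4Vectors::DataFrame(cdat))
# saveRDS(sce, file='covis_bulk_cell.rds')  # hypothetical file name
```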

6 Supplementary Section

6.1 Manual or Clustering Method

To provide additional flexibility for defining cell groupings, several manual options are provided. Users can assign cell groups manually or by applying clustering methods to single-cell embedding data, as is common in the analysis of single-cell data. The resulting cell grouping or cluster information needs to be stored in a tabular file, which is then imported into an SCE object (here with the cell_group function). The following demonstration uses the same single-cell data and aSVG instance as the annotation example above. The only difference is an additional clustering step. For demonstration purposes, a small example of a cluster file is included in the spatialHeatmap package. In this case the group labels were created by the cluster_cell function. The details of this function are available in its help file. The cluster file contains at least two columns: a column (here cell) with the single-cell identifiers used in colData and a column (here cluster) with the cell group labels. For practical reasons related to building this vignette, a purely manual example could not be used here. However, the chosen clustering example can be easily adapted to manual or hybrid grouping approaches, since the underlying tabular data structure is the same in both cases and can be generated in most text or spreadsheet programs.

manual.clus.mus.sc.pa <- system.file("extdata/shinyApp/data", "manual_cluster_mouse_brain.txt", package="spatialHeatmap")
manual.clus.mus.sc <- read.table(manual.clus.mus.sc.pa, header=TRUE, sep='\t')
manual.clus.mus.sc[1:3, ]
##                 cell cluster
## 1 C1.1772078.029.F11   clus7
## 2 C1.1772089.202.E04   clus7
## 3 C1.1772099.091.D10   clus1

The cell_group function can be used to append the imported group labels to the colData slot of an SCE object without interfering with other functions and methods operating on SCE objects.

sce.clus <- cell_group(sce=sce.dimred.quick, df.group=manual.clus.mus.sc, cell='cell', cell.group='cluster')
colData(sce.clus)[1:3, c('cluster', 'label', 'variable')]
## DataFrame with 3 rows and 3 columns
##                        cluster        label    variable
##                    <character>  <character> <character>
## C1.1772078.029.F11       clus7 hypothalamus     control
## C1.1772089.202.E04       clus7       SN.VTA     control
## C1.1772099.091.D10       clus1  dorsal.horn     control
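
Conceptually, appending group labels to cell metadata is a lookup by cell identifier. The base R sketch below illustrates this with hypothetical identifiers. It is not the code of cell_group, and how that function treats cells missing from the table is not stated here, so the NA in the result is merely what match produces.

```r
# Hypothetical cell identifiers, in the order they appear in colData.
cell.ids <- c('C1.a', 'C1.b', 'C1.c', 'C1.d')
# Hypothetical cluster table, in a different order and missing one cell.
df.group <- data.frame(
  cell=c('C1.c', 'C1.a', 'C1.b'),
  cluster=c('clus1', 'clus7', 'clus7'),
  stringsAsFactors=FALSE
)
# Look up each cell's label by identifier; unmatched cells get NA.
cluster <- df.group$cluster[match(cell.ids, df.group$cell)]
data.frame(cluster=cluster, row.names=cell.ids)
```

Matching by identifier rather than by position means the rows of the cluster file do not need to be in the same order as the cells in the SCE object.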

An embedding plot of single cell data is created. The cells represented as dots are colored by the grouping information stored in the cluster column of the colData slot of SCE.

plot_dim(sce.clus, color.by="cluster", dim='TSNE')

Figure 13: Embedding plot of single cells
The cells (dots) are colored by the grouping information stored in the colData slot of the corresponding SCE object.

The same mouse brain aSVG as above is used here.

tail(attribute(svg.mus.brain)[[1]])[1:3, 1:4]
## # A tibble: 3 × 4
##   feature                      id             fill  stroke
##   <chr>                        <chr>          <chr>  <dbl>
## 1 brainstem                    UBERON_0002298 none    0.05
## 2 midbrain                     UBERON_0001891 none    0.05
## 3 dorsal.plus.ventral.thalamus UBERON_0001897 none    0.05

As above, a mapping list is used to match the cell clusters with aSVG features.

lis.match.clus <- list(clus1=c('cerebral.cortex'), clus3=c('brainstem', 'medulla.oblongata'))

This example is demonstrated with ‘cell-by-group’ coloring, so gene expression values need to be summarized for the cells within each group label. Any grouping column in colData can be used as labels for summarizing. In this manual method, the cluster labels are chosen.

If additional experimental variables are provided, the summarizing will consider them as well (here variable). The following example uses the cluster and variable columns as group labels and experimental variables, respectively.

sce.clus.aggr <- aggr_rep(sce.clus, assay.na='logcounts', sam.factor='cluster', con.factor='variable', aggr='mean')
colData(sce.clus.aggr)[1:3, c('cluster', 'label', 'variable')]
## DataFrame with 3 rows and 3 columns
##                    cluster           label    variable
##                <character>     <character> <character>
## clus7__control       clus7    hypothalamus     control
## clus1__control       clus1     dorsal.horn     control
## clus5__control       clus5 corpus.callosum     control
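
The summarizing step can be sketched in base R as group-wise means over the columns of an expression matrix. The matrix and labels below are hypothetical, and the sketch is not the implementation of aggr_rep. It only mirrors the naming pattern seen in the output above, where group and variable are joined by a double underscore.

```r
# Toy log2-scale matrix: 2 genes x 5 cells (hypothetical values).
expr <- matrix(c(1, 3, 2, 4, 6,
                 0, 2, 4, 4, 8), nrow=2, byrow=TRUE,
  dimnames=list(c('Tcea1', 'Atp2b1'), paste0('cell', 1:5)))
cluster <- c('clus7', 'clus7', 'clus1', 'clus1', 'clus1')
variable <- rep('control', 5)
# Combine group label and experimental variable into one key per cell.
key <- paste(cluster, variable, sep='__')
# Average the columns belonging to each key; one output column per key.
aggr <- sapply(unique(key), function(k) rowMeans(expr[, key == k, drop=FALSE]))
aggr
```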

The co-visualization is plotted for gene Tcea1. In this example the coloring is based on the gene expression summary for each cell cluster. Completely manual groupings can be provided the same way.

# Store data objects in an SPHM container. 
dat.man.tobulk <- SPHM(svg=svg.mus.brain, bulk=sce.clus.aggr, cell=sce.clus, match=lis.match.clus)
covis(data=dat.man.tobulk, ID=c('Tcea1'), dimred='TSNE', cell.group='cluster', assay.na='logcounts', tar.cell=names(lis.match.clus), bar.width=0.09, dim.lgd.nrow=1, h=0.6, legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=4)

Figure 14: Co-visualization with cluster groupings
Gene Tcea1 is used as an example and the cell groupings were obtained by clustering.

6.2 Automated Method: Other Coloring

6.2.1 Cell-by-Group

In ‘cell-by-group’ coloring, after the single-cell data are separated from the bulk data, their gene expression values are summarized by taking means within each cell group, i.e. tissue assignment.

# Separate single cell data.
coclus.sc <- subset(coclus.mus, , bulkCell=='cell')
# Summarize expression values in each cell group.
sc.aggr.coclus <- aggr_rep(data=coclus.sc, assay.na='logcounts', sam.factor='assignedBulk', aggr='mean')
colData(sc.aggr.coclus)
## DataFrame with 5 rows and 7 columns
##                     cluster    bulkCell      sample    assignedBulk  similarity
##                 <character> <character> <character>     <character> <character>
## none                  clus6        cell     isocort            none        none
## hippocampus           clus3        cell        olfa     hippocampus       0.155
## cerebral.cortex       clus4        cell     isocort cerebral.cortex       0.314
## hypothalamus          clus7        cell    striatum    hypothalamus       0.178
## cerebellum            clus7        cell    striatum      cerebellum       0.166
##                     index sizeFactor
##                 <integer>  <numeric>
## none                    5   0.711862
## hippocampus            16   0.917178
## cerebral.cortex        30   0.685422
## hypothalamus          300   1.537074
## cerebellum            400   0.671634

The co-visualization of ‘cell-by-group’ is built on gene ‘Atp2b1’. Cells in the embedding plot and respective assigned tissues in SHM are colored by mean expression values of ‘Atp2b1’ in each cell group.

# Store data objects in an SPHM container. 
dat.auto.tobulk <- SPHM(svg=svg.mus.brain, bulk=sc.aggr.coclus, cell=coclus.sc)
covis(data=dat.auto.tobulk, ID=c('Atp2b1'), dimred='TSNE', tar.cell=c('hippocampus', 'hypothalamus', 'cerebellum', 'cerebral.cortex'), dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.09, legend.nrow=5, h=0.6, legend.key.size=0.02, legend.text.size=12, legend.r=1.5)

Figure 15: Co-visualization of “cell-by-group” in automated method
This plot is created on gene “Atp2b1”. Colors between the embedding plot and SHM indicate matching of cells with source tissues.

6.2.2 Feature-by-Group

In ‘feature-by-group’ coloring, bulk data are separated from the single-cell data. Since replicates were already aggregated during preprocessing, the aggregation step is skipped.

coclus.blk <- subset(coclus.mus, , bulkCell=='bulk')

Following the conventions in the main vignette, at least one tissue in the assay data must match an aSVG feature in order to plot SHMs successfully.

colnames(coclus.blk) %in% attribute(svg.mus.brain)[[1]]$feature
## [1] TRUE TRUE TRUE TRUE

The co-visualization of ‘feature-by-group’ is built on gene ‘Atp2b1’. Cells in the embedding plot and respective assigned tissues in SHM are colored by mean expression values of ‘Atp2b1’ in tissue replicates.

# Store data objects in an SPHM container. 
dat.auto.tocell <- SPHM(svg=svg.mus.brain, bulk=coclus.blk, cell=coclus.sc)
covis(data=dat.auto.tocell, ID=c('Atp2b1'), dimred='TSNE', tar.bulk=colnames(coclus.blk), assay.na='logcounts', legend.nrow=5, dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.08, h=0.6, legend.key.size=0.02, legend.text.size=12, legend.r=1.5)

Figure 16: Co-visualization of “feature-by-group” in automated method
This plot is created on gene “Atp2b1”. Colors between the embedding plot and SHM indicate matching of cells with tissues.

6.2.3 Fixed-Group

By setting profile=FALSE, the co-visualization is created with ‘fixed-group’ coloring, where constant colors in the embedding plot and SHM indicate matching between cells and tissues.

covis(data=dat.auto.tocell, ID=c('Atp2b1'), dimred='TSNE', profile=FALSE, assay.na='logcounts', legend.nrow=4, dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.09)

Figure 17: Co-visualization of “fixed-group” in automated method
Matching of cells in the embedding plot with tissues in SHM is indicated by constant colors.

6.3 Tailoring Co-clustering Results

6.3.1 Assigning Desired Tissue in R

The tissue-cell assignments from the automated co-clustering can be optionally tailored. The tailoring can be performed on the command line or in a specialized Shiny App. This section illustrates the command-line based tailoring. First, visualize single cells in an embedding plot as shown below. In order to define more accurate coordinates in the next step, tune the x-y axis breaks (x.break, y.break) and set panel.grid=TRUE.

plot_dim(coclus.mus, dim='PCA', color.by='sample', x.break=seq(-10, 10, 1), y.break=seq(-10, 10, 1), panel.grid=TRUE, lgd.ncol=2)

Figure 18: Embedding plot of tissues and single cells of mouse brain
Tissues and single cells are indicated by large and small circles respectively.

Second, define the desired tissue (desiredBulk) for cells selected by x-y coordinate ranges (x.min, x.max, y.min, y.max) in the embedding plot in the form of a data.frame (df.desired.bulk). The dimred column indicates which embedding the coordinates come from and is required. For demonstration, some cells near the tissue hippocampus (the large green dot) are selected and hippocampus is chosen as the desired tissue.

df.desired.bulk <- data.frame(x.min=c(-8), x.max=c(5), y.min=c(1), y.max=c(5), desiredBulk=c('hippocampus'), dimred='PCA') 
df.desired.bulk
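
How coordinate ranges select cells can be sketched with a toy embedding. The coordinates and cell names below are hypothetical, the ranges are the same as in df.desired.bulk above, and treating the range boundaries as inclusive is an assumption rather than documented behavior of refine_asg.

```r
# Hypothetical PCA coordinates of five cells.
emb <- data.frame(PC1=c(-9, -3, 0, 4, 6), PC2=c(2, 3, 0, 4.5, 2),
  row.names=paste0('cell', 1:5))
# Same coordinate ranges as in df.desired.bulk above.
rng <- data.frame(x.min=-8, x.max=5, y.min=1, y.max=5,
  desiredBulk='hippocampus', dimred='PCA', stringsAsFactors=FALSE)
# Cells inside the rectangle (boundaries treated as inclusive, an assumption).
inside <- emb$PC1 >= rng$x.min & emb$PC1 <= rng$x.max &
  emb$PC2 >= rng$y.min & emb$PC2 <= rng$y.max
rownames(emb)[inside]
```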

The tissue-cell assignments are updated with the desired tissue assignments with the function refine_asg. The similarities corresponding to desired tissue are internally set at the maximum of 1. After that, single-cell data are separated from bulk data for co-visualization.

# Incorporate desired bulk
coclus.mus.tailor <- refine_asg(sce.all=coclus.mus, df.desired.bulk=df.desired.bulk)
# Separate cells from bulk
coclus.sc.tailor <- subset(coclus.mus.tailor, , bulkCell=='cell')

After tailoring, the co-visualization plot with ‘feature-by-group’ coloring is created for gene ‘Atp2b1’ (Figure 19). To reveal the tailoring in the plot, only the tissue hippocampus is selected through the argument tar.bulk. In this plot, cells defined in df.desired.bulk have the same color as the desired tissue hippocampus in the SHM. Cells of hippocampus in the embedding plot include the tailored cells in df.desired.bulk and those labeled hippocampus in co-clustering. For comparison, the hippocampus cells before tailoring are shown in Figure 20.

# Store data objects in an SPHM container. 
dat.auto.tocell.tailor <- SPHM(svg=svg.mus.brain, bulk=coclus.blk, cell=coclus.sc.tailor)
covis(data=dat.auto.tocell.tailor, ID=c('Atp2b1'), dimred='PCA', tar.bulk=c('hippocampus'), assay.na='logcounts', legend.nrow=4, dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.08, legend.r=1.5)

Figure 19: Co-visualization of “feature-by-group” after tailoring
This plot is created on gene “Atp2b1”. Only the tissue and cells of hippocampus are shown to display the tailoring.

covis(data=dat.auto.tocell, ID=c('Atp2b1'), dimred='PCA', tar.bulk=c('hippocampus'), assay.na='logcounts', legend.nrow=4, dim.lgd.text.size=10, dim.lgd.nrow=2, bar.width=0.08, legend.r=1.5)

Figure 20: Co-visualization of “feature-by-group” before tailoring
This plot is created on gene “Atp2b1”. Only the tissue and cells of hippocampus are shown.

6.3.2 Assigning Desired Tissue on Shiny App

This section describes tailoring co-clustering results in the convenience Shiny App, which can be launched by calling desired_bulk_shiny.

Figure 21 is a screenshot of the Shiny App. The file to upload is the co-clustering result returned by cocluster, here coclus.mus. It should be saved in an .rds file with saveRDS before being uploaded to the App. In the embedding plot on the left, cells are selected with the “Lasso Select” tool. On the right, selected cells and their coordinates are listed in a table. The desired tissues (aSVG features) can be selected from the dropdown list, here hippocampus. To download the table, click the “Download” button. The “Help” button gives more instructions.

Figure 21: Screenshot of the Shiny App for selecting desired tissues
On the left is the embedding plot of single cells, where target cells are selected with the “Lasso Select” tool. On the right, desired tissues are assigned to the selected cells.

An example of desired tissues downloaded from the convenience Shiny App is shown below. The x-y coordinates refer to single cells in embedding plots (dimred). The imported table (here df.desired.blk) is ready to use as df.desired.bulk in the tailoring section.

desired.blk.pa <- system.file("extdata/shinyApp/data", "selected_cells_with_desired_bulk.txt", package="spatialHeatmap")
df.desired.blk <- read.table(desired.blk.pa, header=TRUE, row.names=1, sep='\t')
df.desired.blk[1:3, ]


7 Version Information

sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84266)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SeuratObject_4.1.3          sp_1.6-0                   
##  [3] kableExtra_1.3.4            SingleCellExperiment_1.22.0
##  [5] ggplot2_3.4.2               SummarizedExperiment_1.30.2
##  [7] Biobase_2.60.0              GenomicRanges_1.52.0       
##  [9] GenomeInfoDb_1.36.0         IRanges_2.34.0             
## [11] S4Vectors_0.38.1            BiocGenerics_0.46.0        
## [13] MatrixGenerics_1.12.0       matrixStats_0.63.0         
## [15] spatialHeatmap_2.6.0        knitr_1.42                 
## [17] BiocStyle_2.28.0           
## 
## loaded via a namespace (and not attached):
##   [1] spatstat.sparse_3.0-1     bitops_1.0-7             
##   [3] httr_1.4.5                webshot_0.5.4            
##   [5] RColorBrewer_1.1-3        doParallel_1.0.17        
##   [7] dynamicTreeCut_1.63-1     sctransform_0.3.5        
##   [9] tools_4.3.0               backports_1.4.1          
##  [11] utf8_1.2.3                R6_2.5.1                 
##  [13] lazyeval_0.2.2            uwot_0.1.14              
##  [15] withr_2.5.0               gridExtra_2.3            
##  [17] preprocessCore_1.62.1     progressr_0.13.0         
##  [19] WGCNA_1.72-1              cli_3.6.1                
##  [21] spatstat.explore_3.1-0    flashClust_1.01-2        
##  [23] grImport_0.9-7            labeling_0.4.2           
##  [25] sass_0.4.5                Seurat_4.3.0             
##  [27] spatstat.data_3.0-1       genefilter_1.82.1        
##  [29] pbapply_1.7-0             ggridges_0.5.4           
##  [31] systemfonts_1.0.4         yulab.utils_0.0.6        
##  [33] foreign_0.8-84            svglite_2.1.1            
##  [35] scater_1.28.0             parallelly_1.35.0        
##  [37] limma_3.56.2              rstudioapi_0.14          
##  [39] impute_1.74.1             RSQLite_2.3.1            
##  [41] FNN_1.1.3.2               visNetwork_2.1.2         
##  [43] generics_0.1.3            gridGraphics_0.5-1       
##  [45] spatstat.random_3.1-4     ica_1.0-3                
##  [47] gtools_3.9.4              dplyr_1.1.1              
##  [49] GO.db_3.17.0              Matrix_1.5-4             
##  [51] ggbeeswarm_0.7.1          fansi_1.0.4              
##  [53] abind_1.4-5               lifecycle_1.0.3          
##  [55] yaml_2.3.7                edgeR_3.42.4             
##  [57] gplots_3.1.3              BiocFileCache_2.8.0      
##  [59] Rtsne_0.16                grid_4.3.0               
##  [61] blob_1.2.4                promises_1.2.0.1         
##  [63] dqrng_0.3.0               crayon_1.5.2             
##  [65] shinydashboard_0.7.2      miniUI_0.1.1.1           
##  [67] lattice_0.21-8            beachmat_2.16.0          
##  [69] cowplot_1.1.1             annotate_1.78.0          
##  [71] KEGGREST_1.40.0           magick_2.7.4             
##  [73] pillar_1.9.0              metapod_1.8.0            
##  [75] future.apply_1.10.0       codetools_0.2-19         
##  [77] leiden_0.4.3              glue_1.6.2               
##  [79] data.table_1.14.8         vctrs_0.6.1              
##  [81] png_0.1-8                 gtable_0.3.3             
##  [83] cachem_1.0.7              xfun_0.38                
##  [85] S4Arrays_1.0.4            mime_0.12                
##  [87] survival_3.5-5            iterators_1.0.14         
##  [89] statmod_1.5.0             bluster_1.10.0           
##  [91] ellipsis_0.3.2            fitdistrplus_1.1-8       
##  [93] ROCR_1.0-11               nlme_3.1-162             
##  [95] bit64_4.0.5               filelock_1.0.2           
##  [97] RcppAnnoy_0.0.20          UpSetR_1.4.0             
##  [99] bslib_0.4.2               irlba_2.3.5.1            
## [101] vipor_0.4.5               KernSmooth_2.23-20       
## [103] rpart_4.1.19              colorspace_2.1-0         
## [105] DBI_1.1.3                 Hmisc_5.0-1              
## [107] nnet_7.3-18               tidyselect_1.2.0         
## [109] bit_4.0.5                 compiler_4.3.0           
## [111] curl_5.0.0                rvest_1.0.3              
## [113] htmlTable_2.4.1           BiocNeighbors_1.18.0     
## [115] xml2_1.3.3                ggdendro_0.1.23          
## [117] DelayedArray_0.26.3       plotly_4.10.1            
## [119] bookdown_0.33             checkmate_2.1.0          
## [121] scales_1.2.1              caTools_1.18.2           
## [123] lmtest_0.9-40             rappdirs_0.3.3           
## [125] goftest_1.2-3             stringr_1.5.0            
## [127] digest_0.6.31             spatstat.utils_3.0-2     
## [129] rmarkdown_2.21            XVector_0.40.0           
## [131] htmltools_0.5.5           pkgconfig_2.0.3          
## [133] base64enc_0.1-3           sparseMatrixStats_1.12.0 
## [135] highr_0.10                dbplyr_2.3.2             
## [137] fastmap_1.1.1             rlang_1.1.0              
## [139] htmlwidgets_1.6.2         shiny_1.7.4              
## [141] DelayedMatrixStats_1.22.0 farver_2.1.1             
## [143] jquerylib_0.1.4           zoo_1.8-12               
## [145] jsonlite_1.8.4            BiocParallel_1.34.2      
## [147] BiocSingular_1.16.0       RCurl_1.98-1.12          
## [149] magrittr_2.0.3            Formula_1.2-5            
## [151] scuttle_1.10.1            GenomeInfoDbData_1.2.10  
## [153] ggplotify_0.1.0           patchwork_1.1.2          
## [155] munsell_0.5.0             Rcpp_1.0.10              
## [157] reticulate_1.28           viridis_0.6.2            
## [159] stringi_1.7.12            zlibbioc_1.46.0          
## [161] MASS_7.3-58.4             plyr_1.8.8               
## [163] parallel_4.3.0            listenv_0.9.0            
## [165] ggrepel_0.9.3             deldir_1.0-6             
## [167] Biostrings_2.68.1         splines_4.3.0            
## [169] tensor_1.5                locfit_1.5-9.7           
## [171] igraph_1.4.2              fastcluster_1.2.3        
## [173] spatstat.geom_3.1-0       reshape2_1.4.4           
## [175] ScaledMatrix_1.8.1        XML_3.99-0.14            
## [177] evaluate_0.20             scran_1.28.1             
## [179] BiocManager_1.30.20       foreach_1.5.2            
## [181] httpuv_1.6.9              polyclip_1.10-4          
## [183] RANN_2.6.1                tidyr_1.3.0              
## [185] purrr_1.0.1               scattermore_0.8          
## [187] future_1.32.0             rsvd_1.0.5               
## [189] xtable_1.8-4              rsvg_2.4.0               
## [191] later_1.3.0               viridisLite_0.4.1        
## [193] tibble_3.2.1              memoise_2.0.1            
## [195] beeswarm_0.4.0            AnnotationDbi_1.62.1     
## [197] cluster_2.1.4             globals_0.16.2

8 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

9 References

Amezquita, Robert, Aaron Lun, Etienne Becht, Vince Carey, Lindsay Carpp, Ludwig Geistlinger, Federico Marini, et al. 2020. “Orchestrating Single-Cell Analysis with Bioconductor.” Nature Methods 17: 137–45. https://www.nature.com/articles/s41592-019-0654-x.
Butler, Andrew, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija. 2018. “Integrating Single-Cell Transcriptomic Data Across Different Conditions, Technologies, and Species.” Nature Biotechnology 36: 411–20. https://doi.org/10.1038/nbt.4096.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2021. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Gentleman, R, V Carey, W Huber, and F Hahne. 2018. “Genefilter: Methods for Filtering Genes from High-Throughput Experiments.” http://bioconductor.uib.no/2.7/bioc/html/genefilter.html.
Hao, Yuhan, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, et al. 2021. “Integrated Analysis of Multimodal Single-Cell Data.” Cell. https://doi.org/10.1016/j.cell.2021.04.048.
Lun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor.” F1000Research 5: 2122. https://doi.org/10.12688/f1000research.9501.2.
Marques, Sueli, Amit Zeisel, Simone Codeluppi, David van Bruggen, Ana Mendanha Falcão, Lin Xiao, Huiliang Li, et al. 2016. “Oligodendrocyte Heterogeneity in the Mouse Juvenile and Adult Central Nervous System.” Science 352 (6291): 1326–29.
Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.
Ortiz, Cantin, Jose Fernandez Navarro, Aleksandra Jurek, Antje Märtin, Joakim Lundeberg, and Konstantinos Meletis. 2020. “Molecular Atlas of the Adult Mouse Brain.” Science Advances 6 (26): eabb3446.
Risso, Davide, and Michael Cole. 2022. scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets.
Satija, Rahul, Jeffrey A Farrell, David Gennert, Alexander F Schier, and Aviv Regev. 2015. “Spatial Reconstruction of Single-Cell Gene Expression Data.” Nature Biotechnology 33: 495–502. https://doi.org/10.1038/nbt.3192.
Stuart, Tim, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija. 2019. “Comprehensive Integration of Single-Cell Data.” Cell 177: 1888–1902. https://doi.org/10.1016/j.cell.2019.05.031.
Vacher, Claire-Marie, Helene Lacaille, Jiaqi J O’Reilly, Jacquelyn Salzbank, Dana Bakalar, Sonia Sebaoui, Philippe Liere, et al. 2021. “Placental Endocrine Function Shapes Cerebellar Development and Social Behavior.” Nature Neuroscience 24 (10): 1392–1401.
Zhang, Jianhai, Jordan Hayes, Le Zhang, Bing Yang, Wolf Frommer, Julia Bailey-Serres, and Thomas Girke. 2022. spatialHeatmap: spatialHeatmap. https://github.com/jianhaizhang/spatialHeatmap.

Appendix