1 Introduction

1.1 Motivation

The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to quantitative abundance levels of measured molecules, such as transcripts, proteins or metabolites. The color key used to represent the quantitative assay values can be customized by the user. This core functionality of the package is called a spatial heatmap (SHM) plot. Additional important functionalities include Spatial Enrichment (SE), Spatial Co-Expression (SCE), and Single Cell to SHM Co-Visualization (SC2SHM-CoViz). These extra utilities are useful for identifying molecules with spatially selective abundance patterns (SE), groups of molecules with related abundance profiles using matrix heatmaps combined with hierarchical clustering and network representations (SCE), and to co-visualize single cell embedding results with SHMs (SCSHM-CoViz). The latter co-visualization functionality (Figure 1E) is described in a separate vignette called SCSHM-CoViz.

The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is part of this project. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to experimentalists and users less familiar with R. The Shiny App can be used on both local computers as well as centralized server-based deployments (e.g. cloud-based or custom servers). This way it can be used for both hosting community data on a public server or running on a private system. The core functionalities of the spatialHeatmap package are illustrated in Figure 1.

Overview of spatialHeatmap. (A) The _saptialHeatmap_ package plots numeric assay data onto spatially annotated images. The assay data can be provided as numeric vectors, tabular data, _SummarizedExperiment_, or _SingleCellExperiment_ objects. The latter two are widely used data containers for organizing both assays as well as associated annotation data and experimental designs. (B) Anatomical and other spatial images need to be provided as annotated SVG (aSVG) files where the spatial features and the corresponding components of the assay data have matching labels (_e.g._ tissue labels). To work with SVG data efficiently, the _SVG_ S4 class container has been developed. The assay data are used to color the matching spatial features in aSVG images according to a color key. (C)-(D) The result is called a spatial heatmap (SHM). Moreover, (E) Single cell embedding results can be co-visualized with SHMs. (F) Data mining graphics such as matrix heatmaps and network graphs can be integrated to facilitate the identification of transcripts or other molecules with similar abundance profiles.

Figure 1: Overview of spatialHeatmap
(A) The saptialHeatmap package plots numeric assay data onto spatially annotated images. The assay data can be provided as numeric vectors, tabular data, SummarizedExperiment, or SingleCellExperiment objects. The latter two are widely used data containers for organizing both assays as well as associated annotation data and experimental designs. (B) Anatomical and other spatial images need to be provided as annotated SVG (aSVG) files where the spatial features and the corresponding components of the assay data have matching labels (e.g. tissue labels). To work with SVG data efficiently, the SVG S4 class container has been developed. The assay data are used to color the matching spatial features in aSVG images according to a color key. (C)-(D) The result is called a spatial heatmap (SHM). Moreover, (E) Single cell embedding results can be co-visualized with SHMs. (F) Data mining graphics such as matrix heatmaps and network graphs can be integrated to facilitate the identification of transcripts or other molecules with similar abundance profiles.

As anatomical images the package supports both tissue maps from public repositories and custom images provided by the user. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format, where the corresponding spatial features, such as organs, tissues, cellular compartments, are annotated (see aSVG below). The numeric values plotted onto an SHM are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological, agricultural and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such a population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications (Maag 2018; Lekschas et al. 2015; Papatheodorou et al. 2018; Winter et al. 2007; Waese et al. 2017) or local tools (Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.

1.2 Design

The core feature of spatialHeatmap is to map assay values (e.g. gene expression data) of one or many molecules (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their SHMs next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure 4). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images with overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure 5). Moreover, one can plot multiple distinct aSVGs in a single SHM plot as shown in Figure 11. This is particularly useful for displaying abundance trends across multiple development stages, where each is represented by its own aSVG image. In addition to static SHM representations, one can visualize them in form of dynamic animations embedded in interactive HTML files or generate videos for them.

To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object (Figure 1A; Morgan et al. 2018). The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data. In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively. The colData slot contains experimental design definitions including replicate information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. For simplicity the numeric data can also be provided as numeric vectors or data.frames. This can be useful for testing purposes and/or the usage of simple data sets that may not require the more advanced features of the SummarizedExperiment class, such as measurements with only one or a few data points. The details about how to access the SVG images and properly format the associated expression data are provided in the Supplementary Section of this vignette.

1.3 Image Format: SVG

SHMs are images where colors encode numeric values in features of any shape. For plotting SHMs, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for SHMs, the shapes often represent anatomical or cell structures. To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. One or multiple aSVG files can be parsed and stored in the SVG S4 container with utilities provided by the spatialHeatmap package (Figure 1B). The data slots of SVG include coordinate, attribute, dimension, svg, and raster. They correspond to feature coordinates, styling attributes (color, line width, etc.), width and heigth, aSVG file paths, and raster image paths, respectively. Raster images are required only when including photographic image components in SHMs (Figure 8), which is optional.

SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software such as Inkscape. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. Correct assignment of image features and assay results is assured by using for both the same feature identifiers. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the SHM plots. Additional details for properly formatting and annotating both aSVG images and assay data are provided in the Supplementary Section section of this vignette.

1.4 Data Repositories

If not generated by the user, SHMs can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO, BAR and Expression Atlas from EMBL-EBI (Papatheodorou et al. 2018). A particularly useful resource, when working with spatialHeatmap, is the EBI Expression Atlas. This online service contains both assay and anatomical images. Its assay data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by the spatialHeatmap project are available in its own repository called spatialHeatmap aSVG Repository, where users can contribute their aSVG images that are formatted according to our guidlines.

1.5 Tutorial Overview

The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as initial example a simple to understand testing data set, and then more complex mRNA profiling data from the Expression Atlas and GEO databases. The co-visualization of tissue and single cell embedding plot is explained in a separate vignette (see SCSHM-CoViz). First, SHM plots are generated for both the testing and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas, while the last one is a microarray data set from GEO. Second, gene context analysis tools are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well co-expression network plots (Network). Third, the Spatial Enrichment functionality is illustrated with mouse RNA-seq data. Lastly, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive GUI environment (Chang et al. 2021; Chang and Borges Ribeiro 2018).

In addition, a use case is presented in an external vignette, which focuses on integrated workflow of discovery and visualization of potential new genes responsible for a particular biological process.

2 Getting Started

2.1 Installation

The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery); library(igraph); library(BiocParallel); library(kableExtra)

The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.

browseVignettes('spatialHeatmap')

To reduce runtime, intermediate results can be cached under ~/.cache/shm.

cache.pa <- '~/.cache/shm' # Set path of the cache directory 

3 Spatial Heatmaps

3.1 Quick Start

Spatial Heatmaps (SHMs) are plotted with the spatial_hm function. To provide a quick and intuitive overview how these plots are generated, the following uses a generalized tesing example where a small vector of random numeric values is generated that are used to color features in an aSVG image. The image chosen for this example is an aSVG depicting the human brain. The corresponding image file homo_sapiens.brain.svg is included in this package for testing purposes.

3.1.1 aSVG Image

After the full path to the chosen target aSVG image on a user’s system is obtained with the system.file function, the function read_svg is used to import the aSVG information relevant for creating SHMs, which is stored in a SVG object svg.hum.

svg.hum.pa <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")
svg.hum <- read_svg(svg.hum.pa)

All features and their attributes can be accessed with the attribute function, where fill and stroke are the two most important ones providing color and line width information, respectively. The feature column includes group labels for sub-features in sub.feature. In the downstream, SHM plots are created by mapping assay data to labels in feature.

feature.hum <- attribute(svg.hum)[[1]]
tail(feature.hum[, 1:6], 3) # Partial features and respective attributes
## # A tibble: 3 × 6
##   feature                 id             fill  stroke sub.feature          index
##   <chr>                   <chr>          <chr>  <dbl> <chr>                <int>
## 1 cerebellar.hemisphere   UBERON_0002245 none   0.016 cerebellar.hemisphe…   247
## 2 nucleus.accumbens       UBERON_0001882 none   0.016 nucleus.accumbens      248
## 3 telencephalic.ventricle UBERON_0002285 none   0.016 telencephalic.ventr…   249

Feature coordinates can be accessed with the coordinate function.

coord.df <- coordinate(svg.hum)[[1]]
tail(coord.df, 3) # Partial features and respective coordinates
## # A tibble: 3 × 4
##       x     y feature                 index
##   <dbl> <dbl> <chr>                   <int>
## 1  194.  329. telencephalic.ventricle   249
## 2  194.  329. telencephalic.ventricle   249
## 3  194.  329. telencephalic.ventricle   249

3.1.2 Numeric Data

The following generates a small named vector for testing, where the data slot contains four numbers, and the name slot is populated with three feature names and one missing one (here ’notMapped"). This example is the same as the aSVG example above named feature.hum. Note, the numbers are mapped to features via matching names among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. In addition, a summary of the numeric assay to feature mappings is stored in a data.frame returned by the spatial_hm function (see below).

set.seed(20) # To obtain reproducible results, a fixed seed is set.
unique(feature.hum$feature) # All aSVG features
##  [1] "g4320"                   "locus.ceruleus"         
##  [3] "diencephalon"            "medulla.oblongata"      
##  [5] "middle.temporal.gyrus"   "caudate.nucleus"        
##  [7] "middle.frontal.gyrus"    "occipital.lobe"         
##  [9] "parietal.lobe"           "pineal.gland"           
## [11] "substantia.nigra"        "thalamus"               
## [13] "putamen"                 "globus.pallidus"        
## [15] "meninx"                  "dura.mater"             
## [17] "hypothalamus"            "hippocampus1"           
## [19] "hippocampus2"            "cingulate.cortex"       
## [21] "amygdala"                "prefrontal.cortex"      
## [23] "frontal.cortex"          "cerebral.cortex"        
## [25] "temporal.lobe"           "cerebellum"             
## [27] "cerebellar.hemisphere"   "nucleus.accumbens"      
## [29] "telencephalic.ventricle"
my_vec <- setNames(sample(1:100, 4), c('substantia.nigra', 'putamen', 'prefrontal.cortex', 'notMapped'))
my_vec
##  substantia.nigra           putamen prefrontal.cortex         notMapped 
##                38                63                 2                29

3.1.3 Plot SHM

Next, the SHM is plotted with the spatial_hm function (Figure 2). Internally, the numbers in my_vec are translated into colors based on the color key assigned to the col.com argument, and then painted onto the corresponding features in the aSVG, where the aSVG instance is defined by the svg argument. In the given example (Figure 2) only three features in my_vec (‘substantia.nigra’, ‘putamen’, and ‘prefrontal.cortex’) have matching entries in the corresponding aSVG.

shm.lis <- spatial_hm(svg=svg.hum, data=my_vec, ID='testing', ncol=1, sub.title.size=20, legend.nrow=3)
SHM of human brain with testing data. The plots from the left to the right represent the color key for the numeric data, followed by four SHM plots and the legend of the spatial features. The numeric values provided in `my_vec` are used to color the corresponding features in the SHM plots according to the color key on the left. The legend plot identifies the spatial regions in the SHM plots.

Figure 2: SHM of human brain with testing data
The plots from the left to the right represent the color key for the numeric data, followed by four SHM plots and the legend of the spatial features. The numeric values provided in my_vec are used to color the corresponding features in the SHM plots according to the color key on the left. The legend plot identifies the spatial regions in the SHM plots.

The named numeric values in my_vec, that have name matches with the features in the chosen aSVG, are stored in the mapped_feature slot.

# Mapped features
shm.lis$mapped_feature
##        ID           feature value    fill                    SVG
## 1 testing  substantia.nigra    38 #FF8700 homo_sapiens.brain.svg
## 2 testing           putamen    63 #FF0000 homo_sapiens.brain.svg
## 3 testing prefrontal.cortex     2 #FFFF00 homo_sapiens.brain.svg

3.2 Human Brain

This subsection introduces how to query and download cell- and tissue-specific assay data in the Expression Atlas database using the ExpressionAtlas package (Keays 2019). After the choosen data is downloaded directly into a user's R session, the expression values for selected genes can be plotted onto a chosen aSVG image with or without prior preprocessing steps (e.g. normalization).

3.2.1 Gene Expression Data

The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here ‘cerebellum’ and ‘Homo sapiens’, respectively.

all.hum <- read_cache(cache.pa, 'all.hum') # Retrieve data from cache.
if (is.null(all.hum)) { # Save downloaded data to cache if it is not cached.
  all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")
  save_cache(dir=cache.pa, overwrite=TRUE, all.hum)
}

The search result contains 15 accessions. In the following code, the accession ‘E-GEOD-67196’ from Prudencio et al. (2015) has been chosen, which corresponds to an RNA-Seq profiling experiment of ‘cerebellum’ and ‘frontal cortex’ brain tissue from patients with amyotrophic lateral sclerosis (ALS).

all.hum[2, ]
## DataFrame with 1 row and 4 columns
##      Accession      Species                  Type                  Title
##    <character>  <character>           <character>            <character>
## 1 E-GEOD-75852 Homo sapiens RNA-seq of coding RNA Human iPSC-Derived C..

The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment object of a user's R session.

rse.hum <- read_cache(cache.pa, 'rse.hum') # Read data from cache.
if (is.null(rse.hum)) { # Save downloaded data to cache if it is not cached.
  rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, rse.hum)
}

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first four rows and columns.

colData(rse.hum)[1:2, c(2, 4)]
## DataFrame with 2 rows and 2 columns
##                organism  organism_part
##             <character>    <character>
## SRR1927019 Homo sapiens     cerebellum
## SRR1927020 Homo sapiens frontal cortex

3.2.2 aSVG Image

The following example shows how to download from the above described SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded above. The return_feature function queries the repository for feature- and species-related keywords, here c('frontal cortex', 'cerebellum') and c('homo sapiens', 'brain'), respectively. To avoid overwriting of existing SVG files, an empty target directory is recommended, here tmp.dir.shm.

The remote data are downloaded before calling return_feature.

# Remote aSVG repos.
data(aSVG.remote.repo)
tmp.dir <- normalizePath(tempdir(check=TRUE), winslash="/", mustWork=FALSE)
tmp.dir.ebi <- paste0(tmp.dir, '/ebi.zip')
tmp.dir.shm <- paste0(tmp.dir, '/shm.zip')
# Download the remote aSVG repos as zip files.
download.file(aSVG.remote.repo$ebi, tmp.dir.ebi)
download.file(aSVG.remote.repo$shm, tmp.dir.shm)
remote <- list(tmp.dir.ebi, tmp.dir.shm)

The downloaded aSVG repos are queried.

tmp.dir.shm <- file.path(tempdir(check=TRUE), 'shm') # Create empty directory
feature.df.hum <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), dir=tmp.dir.shm, remote=remote) # Query aSVGs
feature.df.hum[1:8, ] # Return first 8 rows for checking
unique(feature.df.hum$SVG) # Return all matching aSVGs

To build this vignettes according to Bioconductor’s package requirements, the following code section uses aSVG file instances included in the spatialHeatmap package rather than the downloaded instances above.

svg.dir <- system.file("extdata/shinyApp/example", package="spatialHeatmap") # Directory of the aSVG collection in spatialHeatmap
feature.df.hum <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=NULL)

Note, the target tissues frontal cortex and cerebellum are included in both the experimental design slot of the downloaded expression data as well as the annotations of the aSVG. This way these features can be colored in the downstream SHM plots. If necessary users can also change from within R the feature identifiers in an aSVG (see Supplementary Section).

tail(feature.df.hum[, c('feature', 'stroke', 'SVG')], 3)
## # A tibble: 3 × 3
##   feature         stroke SVG                   
##   <chr>            <dbl> <chr>                 
## 1 cerebellum       0.016 homo_sapiens.brain.svg
## 2 cerebral.cortex  0.05  mus_musculus.brain.svg
## 3 cerebellum       0.05  mus_musculus.brain.svg

Among the returned aSVG files, homo_sapiens.brain.svg is chosen for creating SHMs. Since it is the same as the one in Quick Start, the one stored in svg.hum is used in the downstream steps for simplicity.

3.2.3 Experimental Design

To display ‘pretty’ sample names in columns and legends of downstream tables and plots respectively, the following example imports a ‘targets’ file that can be customized by users in a text program. The targets file content is used to replace the text in the colData slot of the RangedSummarizedExperiment object with a version containing shorter sample names for plotting purposes.

The custom targets file is imported and then loaded into colData slot of rse.hum. A slice of the simplified colData object is shown below.

hum.tar <- system.file('extdata/shinyApp/example/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.hum) <- DataFrame(target.hum) # Loading to "colData"
colData(rse.hum)[c(1:2, 41:42), 4:5]
## DataFrame with 4 rows and 2 columns
##             organism_part     disease
##               <character> <character>
## SRR1927019     cerebellum         ALS
## SRR1927020 frontal cortex         ALS
## SRR1927059     cerebellum      normal
## SRR1927060 frontal cortex      normal

3.2.4 Preprocess Assay Data

The raw count gene expression data is stored in the assay slot of rse.hum. The following shows how to apply basic preprocessing routines on the count data, such as normalizing, aggregating replicates, and removing genes with unreliable expression responses, which are optional for plotting SHMs.

The norm_data function is developed to normalize RNA-seq raw count data. The following example uses the ESF normalization option due to its good time performance, which is estimateSizeFactors from DESeq2 (Love, Huber, and Anders 2014).

se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', log2.trans=TRUE)

Replicates are aggregated with the summary statistics chosen under the aggr argument (e.g. aggr='mean'). The columns specifying replicates can be assigned to the sam.factor and con.factor arguments corresponding to samples and conditions, respectively. For tracking, the corresponding sample/condition labels are used as column titles in the aggregated assay instance, where they are concatenated with a double underscore as separator (Table 1).

se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
assay(se.aggr.hum)[c(120, 49939, 49977), ]
Table 1: Slice of aggregated expression matrix.
cerebellum__ALS frontal.cortex__ALS cerebellum__normal frontal.cortex__normal
ENSG00000006047 1.134172 5.2629629 0.5377534 5.3588310
ENSG00000268433 5.324064 0.3419665 3.4780744 0.1340332
ENSG00000268555 5.954572 2.6148548 4.9349736 2.0351776

The filtering example below retains genes with expression values larger than 5 (log2 space) in at least 1% of all samples (pOA=c(0.01, 5)), and a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).

se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100))

3.2.5 SHM: Single Gene

The aSVG features in svg.hum are subsetted with the function sub_ft so that only colored sub-images are shown in Figure 3, 4, and 17. Features of interest are retained by assigning their indexes (see below) to the argument show, which corresponding to ‘brain outline’, ‘prefrontal.cortex’, ‘frontal.cortex’, and ‘cerebellum’ respectively.

svg.hum.sub <- sub_ft(svg.hum, show=c(64:132, 162:164, 190:218))
tail(attribute(svg.hum.sub)[[1]], 3)
## # A tibble: 3 × 10
##   feature    id        fill  stroke sub.f…¹ index element parent index…² index…³
##   <chr>      <chr>     <chr>  <dbl> <chr>   <int> <chr>   <chr>    <int>   <int>
## 1 cerebellum UBERON_0… none   0.016 cerebe…   216 g       LAYER…      26      25
## 2 cerebellum UBERON_0… none   0.016 cerebe…   217 g       LAYER…      26      25
## 3 cerebellum UBERON_0… none   0.016 cerebe…   218 g       LAYER…      26      25
## # … with abbreviated variable names ¹​sub.feature, ²​index.all, ³​index.sub

The preprocessed expression values for any gene in the assay slot of se.fil.hum can be plotted as an SHM. The following uses gene ENSG00000268433 as an example. The chosen aSVG is a depiction of the human brain where the assayed featured are colored by the corresponding expression values in se.fil.hum.

spatial_hm(svg=svg.hum.sub, data=se.fil.hum, ID=c('ENSG00000268433'), legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=2, lay='con', ncol=1)
SHM of human brain. Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data. The legend plot on the right maps the feature labels to the corresponding spatial regions in the image.

Figure 3: SHM of human brain
Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data. The legend plot on the right maps the feature labels to the corresponding spatial regions in the image.

In the above example, the normalized expression values of gene ENSG00000268434 are colored in the frontal cortex and cerebellum, where the different conditions, here normal and ALS, are given in separate SHMs plotted next to each other. The color and feature mappings are defined by the corresponding color key and legend plot on the left and right, respectively.

3.2.6 SHM: Multiple Genes

SHMs for multiple genes can be plotted by providing the corresponding gene IDs under the ID argument as a character vector. The spatial_hm function will then sequentially arrange the SHMs for each gene in a single composite plot. To facilitate comparisons among expression values across genes and/or conditions, the lay.shm parameter can be assigned 'gene' or 'con', respectively. For instance, in Figure 4 the SHMs of the genes ENSG00000268433 and ENSG00000006047 are organized by condition in a horizontal view. This functionality is particularly useful when comparing gene families.

spatial_hm(svg=svg.hum.sub, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', legend.r=1.5, legend.nrow=2)
SHMs of two genes. The subplots are organized by "condition" with the `lay.shm='con'` setting.

Figure 4: SHMs of two genes
The subplots are organized by “condition” with the lay.shm='con' setting.

The matching between data and aSVG features can be customized, such as matching a spatial feature in the data to a different or multiple counterparts in the aSVG. This advanced function is demonstrated in the Supplementary Section.

3.2.7 SHM: HTML and Video

SHMs can be saved to interactive HTML files as well as video files. Each HTML file contains an interactive SHM with zoom in and out functionality. Hovering over graphics features will display data, gene, condition and other information. The video will play the SHM subplots in the order specified under the lay.shm argument.

The following example saves the interactive HTML and video files under
a directory named tmp.dir.shm.

tmp.dir.shm <- file.path(tempdir(check=TRUE), 'shm')
spatial_hm(svg=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', legend.r=1.5, legend.nrow=2, out.dir=tmp.dir.shm)

3.2.8 SHM: Customization

To provide a high level of flexibility, the spatial_hm contains many arguments. An overview of important arguments and their utility is provided in Table 2.

Table 2: List of important argumnets of ‘spatial_hm’.
argument description
1 svg An SVG object containing aSVG instances
2 data Input data of SummarizedExperiment (SE), data frame, or vector
3 sam.factor Applies to SE. Column name of sample replicates in colData slot. Default is NULL
4 con.factor Applies to SE. Column name of condition replicates in colData slot. Default is NULL
5 ID A character vector of row items for plotting spatial heatmaps
6 col.com A character vector of color components for building colour scale. Default is c(‘yellow’, ‘orange’,‘red’)
7 col.bar ‘selected’ or ‘all’, the former means use values of ID to build the colour scale while the latter use all values in data. Default is ‘selected’.
8 bar.width A numeric of colour bar width. Default is 0.7
9 trans.scale One of ‘log2’, ‘exp2’, ‘row’, ‘column’, or NULL, which means transform the data by ‘log2’ or ‘2-base expoent’, scale by ‘row’ or ‘column’, or no manipuation respectively.
10 ft.trans A vector of aSVG features to be transparent. Default is NULL.
11 legend.r A numeric to adjust the dimension of the legend plot. Default is 1. The larger, the higher ratio of width to height.
12 sub.title.size The title size of each spatial heatmap subplot. Default is 11.
13 lay.shm ‘gen’ or ‘con’, applies to multiple genes or conditions respectively. ‘gen’ means spatial heatmaps are organised by genes while ‘con’ organised by conditions. Default is ‘gen’
14 ncol The total column number of spatial heatmaps, not including legend plot. Default is 2.
15 ft.legend ‘identical’, ‘all’, or a vector of samples/features in aSVG to show in legend plot. ‘identical’ only shows matching features while ‘all’ shows all features.
16 legend.ncol, legend.nrow Two numbers of columns and rows of legend keys respectively. Default is NULL, NULL, since they are automatically set.
17 legend.position the position of legend keys (‘none’, ‘left’, ‘right’,‘bottom’, ‘top’), or two-element numeric vector. Default is ‘bottom’.
18 legend.key.size, legend.text.size The size of legend keys and labels respectively. Default is 0.5 and 8 respectively.
19 line.size, line.color The size and colour of all plogyon outlines respectively. Default is 0.2 and ‘grey70’ respectively.
20 verbose TRUE or FALSE. Default is TRUE and the aSVG features not mapped are printed to R console.
21 out.dir The directory to save HTML and video files of spatial heatmaps. Default is NULL.

3.3 Mouse Organs

This section generates an SHM plot for mouse data from the Expression Atlas. The code components are very similar to the previous Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

3.3.1 Gene Expression Data

The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species (Merkin et al. 2012). The following searches the Expression Atlas for expression data from ‘kidney’ and ‘Mus musculus’.

all.mus <- read_cache(cache.pa, 'all.mus') # Retrieve data from cache.
if (is.null(all.mus)) { # Save downloaded data to cache if it is not cached.
  all.mus <- searchAtlasExperiments(properties="kidney", species="Mus musculus")
  save_cache(dir=cache.pa, overwrite=TRUE, all.mus)
}

Among the many matching entries, accession ‘E-MTAB-2801’ will be downloaded.

all.mus[7, ]
rse.mus <- read_cache(cache.pa, 'rse.mus') # Read data from cache.
if (is.null(rse.mus)) { # Save downloaded data to cache if it is not cached.
  rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, rse.mus)
}

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.mus. The following returns only its first three rows.

colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
##           AtlasAssayGroup     organism organism_part      strain
##               <character>  <character>   <character> <character>
## SRR594393              g7 Mus musculus         brain      DBA/2J
## SRR594394             g21 Mus musculus         colon      DBA/2J
## SRR594395             g13 Mus musculus         heart      DBA/2J

3.3.2 aSVG Image

The following example shows how to retrieve from the remote SVG repositories an aSVG image that matches the tissues and species assayed in the downloaded data above. The sample data from Human Brain are used such as remote.

tmp.dir.shm <- file.path(tempdir(check=TRUE), 'shm')
feature.df.mus <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), dir=tmp.dir.shm, remote=remote)

To meet the R/Bioconductor package requirements, the following uses aSVG file instances included in the spatialHeatmap package rather than the downloaded instances.

feature.df.mus <- return_feature(feature=c('heart', 'kidney'), species=NULL, dir=svg.dir, remote=NULL) 

Return the names of the matching aSVG files.

unique(feature.df.mus$SVG)
## [1] "gallus_gallus.svg"     "mus_musculus.male.svg"

The mus_musculus.male.svg instance is selected as target aSVG and imported.

svg.mus.pa <- system.file("extdata/shinyApp/example", "mus_musculus.male.svg", package="spatialHeatmap")
svg.mus <- read_svg(svg.mus.pa)

3.3.3 Experimental Design

A sample target file that is included in this package is imported and then loaded to the colData slot of rse.mus. To inspect its content, the first three rows are shown.

mus.tar <- system.file('extdata/shinyApp/example/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.mus) <- DataFrame(target.mus) # Loading
target.mus[1:3, ]
##           AtlasAssayGroup     organism organism_part strain
## SRR594393              g7 Mus musculus         brain DBA.2J
## SRR594394             g21 Mus musculus         colon DBA.2J
## SRR594395             g13 Mus musculus         heart DBA.2J

3.3.4 Preprocess Assay Data

The raw RNA-Seq count are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of un-reliable expression data. The details of these steps are explained in the sub-section of the Human Brain example.

se.nor.mus <- norm_data(data=rse.mus, norm.fun='ESF', log2.trans=TRUE) # Normalization
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean') # Aggregation of replicates
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100)) # Filtering of genes with low counts and variance 

3.3.5 SHM: Transparency

The pre-processed expression data for gene ‘ENSMUSG00000000037’ is plotted in form of an SHM. In this case the plot includes expression data for 8 tissues across 3 mouse strains.

spatial_hm(svg=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000037'), legend.width=0.7, legend.text.size=10, sub.title.size=9, ncol=3, ft.trans=c('skeletal muscle'), legend.nrow=4, line.size=0.2, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the shapes of the 'skeletal muscle' is set transparent to expose 'lung' and 'heart'.

Figure 5: SHM of mouse organs
This is a multiple-layer image where the shapes of the ‘skeletal muscle’ is set transparent to expose ‘lung’ and ‘heart’.

The SHM plots in Figures 5 and below demonstrate the usage of the transparency feature via the ft.trans parameter. The corresponding mouse organ aSVG image includes overlapping tissue layers. In this case the skelectal muscle layer partially overlaps with lung and heart tissues. To view lung and heart in Figure 5, the skelectal muscle tissue is set transparent with ft.trans=c('skeletal muscle').

To fine control the visual effects in feature rich aSVGs, the line.size and line.color parameters are useful. This way one can adjust the thickness and color of complex structures.

spatial_hm(svg=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000037'), legend.text.size=10, sub.title.size=9, ncol=3, legend.ncol=2, line.size=0.1, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the view onto 'lung' and 'heart' is obstructed by displaying the 'skeletal muscle' tissue.

Figure 6: SHM of mouse organs
This is a multiple-layer image where the view onto ‘lung’ and ‘heart’ is obstructed by displaying the ‘skeletal muscle’ tissue.

A third example on real data from Expression Atlas is SHMs of time series across chicken organs. Since the procedures are the same with the examples above, this example is illustrated in the Supplementary Section.

3.4 Arabidopsis Shoot

This section generates an SHM for Arabidopsis thaliana tissues with gene expression data from the Affymetrix microarray technology. The chosen experiment used ribosome-associated mRNAs from several cell populations of shoots and roots that were exposed to hypoxia stress (Mustroph et al. 2009). In this case the expression data will be downloaded from GEO with utilites from the GEOquery package (Davis and Meltzer 2007). The data preprocessing routines are specific to the Affymetrix technology. The remaining code components for generating SHMs are very similar to the previous examples. For brevity, the text in this section explains mainly the steps that are specific to this data set.

3.4.1 Gene Expression Data

The GSE14502 data set is downloaded with the getGEO function from the GEOquery package. Intermediately, the expression data is stored in an ExpressionSet container (Huber et al. 2015), and then converted to a SummarizedExperiment object.

gset <- read_cache(cache.pa, 'gset') # Retrieve data from cache.
if (is.null(gset)) { # Save downloaded data to cache if it is not cached.
  gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, gset)
}
se.sh <- as(gset, "SummarizedExperiment")

The gene symbol identifiers are extracted from the rowData component to be used as row names. Similarly, one can work with AGI identifiers by providing below AGI under Gene.Symbol.

rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])

A slice of the experimental design stored in the colData slot is returned. Both the samples and treatments are contained in the title column. The samples are indicated by corresponding promoters (pGL2, pCO2, pSCR, pWOL, p35S) and treatments include control and hypoxia.

colData(se.sh)[60:63, 1:2]
## DataFrame with 4 rows and 2 columns
##                            title geo_accession
##                      <character>   <character>
## GSM362227 shoot_hypoxia_pGL2_r..     GSM362227
## GSM362228 shoot_hypoxia_pGL2_r..     GSM362228
## GSM362229 shoot_control_pRBCS_..     GSM362229
## GSM362230 shoot_control_pRBCS_..     GSM362230

3.4.2 aSVG Image

In this example, the aSVG image has been generated in Inkscape from the corresponding figure in Mustroph et al. (2009). Detailed instructions for generating custom aSVG images are provided in the SVG tutorial. The resulting custom aSVG file ‘arabidopsis.thaliana_shoot_shm.svg’ is included in the spatialHeatmap package and imported as below.

svg.sh.pa <- system.file("extdata/shinyApp/example", "arabidopsis.thaliana_shoot_shm.svg", package="spatialHeatmap")
svg.sh <- read_svg(svg.sh.pa)

3.4.3 Experimental Design

A sample target file that is included in this package is imported and then loaded to the colData slot of se.sh. To inspect its content, four selected rows are returned.

sh.tar <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(se.sh) <- DataFrame(target.sh) # Loading
target.sh[60:63, ]
##                           col.name     samples conditions
## shoot_hypoxia_pGL2_rep1  GSM362227  shoot_pGL2    hypoxia
## shoot_hypoxia_pGL2_rep2  GSM362228  shoot_pGL2    hypoxia
## shoot_control_pRBCS_rep1 GSM362229 shoot_pRBCS    control
## shoot_control_pRBCS_rep2 GSM362230 shoot_pRBCS    control

3.4.4 Preprocess Assay Data

The downloaded GSE14502 data set has already been normalized with the RMA algorithm (Gautier et al. 2004). Thus, the pre-processing steps can be restricted to replicate aggregation and filtering.

se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='samples', con.factor='conditions', aggr='mean') # Replicate agggregation using mean
se.fil.arab <- filter_data(data=se.aggr.sh, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100)) # Filtering of genes with low intensities and variance

3.4.5 SHM: Microarray

The expression profile for the HRE2 gene is plotted for the control and the hypoxia treatment across six cell types (Figure 7).

spatial_hm(svg=svg.sh, data=se.fil.arab, ID=c("HRE2"), legend.nrow=3, legend.text.size=11)
SHM of Arabidopsis shoots. The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

Figure 7: SHM of Arabidopsis shoots
The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

3.5 Superimposing raster and vector graphics

spatialHeatmap allows to superimpose raster images with vector-based SHMs. This way one can generate SHMs that resemble photographic representations of tissues, organs or entire organisms. For this to work the shapes represented in the vector-graphics need to be an aligned carbon copy of the raster image. Supported file formats for the raster image are JPG/JPEG and PNG, and for the vector image it is SVG. Matching raster and vector graphics are indicated by identical base names in their file names (e.g. imageA.png and imageA.svg). The layout order in SHMs composed of multiple independent images can be controlled by numbering the corresponding file pairs accordingly such as imageA_1.png and imageA_1.svg, imageA_2.png and imageA_2.svg, etc.

In the following example, the required image pairs have been pre-generated from a study on abaxial bundle sheath (ABS) cells in maize leaves (Bezrutczyk et al. 2021). Their file names are labeled 1 and 2 to indicate two developmental stages.

Import paths of first png/svg image pair:

raster.pa1 <- system.file('extdata/shinyApp/example/maize_leaf_shm1.png', package='spatialHeatmap')
svg.pa1 <- system.file('extdata/shinyApp/example/maize_leaf_shm1.svg', package='spatialHeatmap')

Import paths of second png/svg image pair:

raster.pa2 <- system.file('extdata/shinyApp/example/maize_leaf_shm2.png', package='spatialHeatmap')
svg.pa2 <- system.file('extdata/shinyApp/example/maize_leaf_shm2.svg', package='spatialHeatmap')

The two pairs of png/svg images are imported in the SVG container svg.overlay.

svg.overlay <- read_svg(svg.path=c(svg.pa1, svg.pa2), raster.path=c(raster.pa1, raster.pa2))

A slice of attributes in the first aSVG instance is shown.

attribute(svg.overlay)[[1]][1:3, ]
## # A tibble: 3 × 10
##   feature id      fill    stroke sub.feat…¹ index element parent index…² index…³
##   <chr>   <chr>   <chr>    <dbl> <chr>      <int> <chr>   <chr>    <int>   <int>
## 1 rect817 rect817 none       0.1 rect817        1 rect    conta…       1       1
## 2 cell1   cell1   #98f0aa    0   path819        2 g       conta…       2       2
## 3 cell1   cell1   #98f0aa    0   path821        3 g       conta…       2       2
## # … with abbreviated variable names ¹​sub.feature, ²​index.all, ³​index.sub

Import of quantitative assay data:

dat.overlay <- read_fr(system.file('extdata/shinyApp/example/dat_overlay.txt', package='spatialHeatmap'))
dat.overlay[1:2, ]
##       cell1 cell2
## gene1     7    65
## gene2    70    27

To minimize masking of the features in the SHMs by dense regions in the raster images, the alpha.overlay argument allows to adjust the transparency level. In Figure 8, the spatial features of interest are superimposed onto the raster image.

spatial_hm(svg=svg.overlay, data=dat.overlay, charcoal=FALSE, ID=c('gene1'), alpha.overlay=0.5, bar.width=0.09, sub.title.vjust=4, legend.r=0.2)
Superimposing raster images with SHMs (colorful backaground). The expression profiles of gene1 are plotted on ABS cells.

Figure 8: Superimposing raster images with SHMs (colorful backaground)
The expression profiles of gene1 are plotted on ABS cells.

Another option for reducing masking effects is to display the raster image in black and white by setting charcoal=TRUE (Figure 9).

spatial_hm(svg=svg.overlay, data=dat.overlay, charcoal=TRUE, ID=c('gene1'), alpha.overlay=0.5, bar.width=0.09, sub.title.vjust=4, legend.r=0.2)
Superimposing raster images with SHMs (black and white background). The expression profiles of gene1 are plotted on ABS cells.

Figure 9: Superimposing raster images with SHMs (black and white background)
The expression profiles of gene1 are plotted on ABS cells.

3.6 SHMs of Multiple Dimensions

The SHM plots shown so far are restricted to two dimensions, here spatial features (e.g. organs, tissues, cellular compartments) and treatments. In theory, the complexity of experimental designs scales to any number of dimensions in spatialHeatmap. This section extends to experiments with three or more dimensions, such as experiments with spatiotemporal resolution and geographical locations, genotypes, treatments, etc.

To visualize multi-dimensional spatial data, these dimensions are reduced to two by keeping the spatial feature dimension and combining all other dimensions into a composite one. For instance, the following example contains four dimensions including spatial features, time points, drug treatments and injury conditions. The latter three are combined to produce a composite dimension.

3.6.1 Gene Expression Data

The following uses transcriptomic data of mouse brain after traumatic brain injury (TBI), which include RAN-seq data from hippocampus, thalamus and hypothalamus at 3 or 29 days post injury (DPI) treated with either candesartan or vehicle (Attilio et al. 2021). The original data are modified for demonstration purpose and included in spatialHeatmap as a SummarizedExperiment object, which is imported below.

se.mus.vars <- readRDS(system.file('extdata/shinyApp/example/mus_brain_vars_se.rds', package='spatialHeatmap'))

The experiment design is stored in colData slot, where ‘Veh’, ‘Drug’, ‘TBI’, and ‘NoTBI’ refer to ‘vehicle’, ‘candesartan’, ‘traumatic brain injury’, and ‘sham injury’ respectively. The time, treatment and injury dimensions are combined into a composite dimension comVar, which will be used for creating SHMs of multiple dimensions.

colData(se.mus.vars)[1:3, ]
## DataFrame with 3 rows and 5 columns
##                  tissue        time   treatment      injury         comVar
##             <character> <character> <character> <character>    <character>
## hippocampus hippocampus        3DPI         Veh       NoTBI 3DPI.Veh.NoTBI
## hippocampus hippocampus        3DPI         Veh       NoTBI 3DPI.Veh.NoTBI
## hippocampus hippocampus        3DPI         Veh         TBI   3DPI.Veh.TBI
unique(colData(se.mus.vars)$comVar)
## [1] "3DPI.Veh.NoTBI"   "3DPI.Veh.TBI"     "3DPI.Drug.NoTBI"  "3DPI.Drug.TBI"   
## [5] "29DPI.Veh.NoTBI"  "29DPI.Veh.TBI"    "29DPI.Drug.NoTBI" "29DPI.Drug.TBI"

Since this example data are small, the pre-processing only involves normalization without the filtering step. The expression values are aggregated across replicates in tissues (tissue) and replicates in the composite dimension (comVar) with the summary statistic of mean, which is similar with the Human Brain exmple.

se.mus.vars.nor <- norm_data(data=se.mus.vars, norm.fun='ESF', log2.trans=TRUE) # Normalization.
## Normalising: ESF 
##    type 
## "ratio"
se.mus.vars.aggr <- aggr_rep(data=se.mus.vars.nor, sam.factor='tissue', con.factor='comVar', aggr='mean') # Aggregate replicates.
assay(se.mus.vars.aggr)[1:3, 1:3]
##               hippocampus__3DPI.Veh.NoTBI hippocampus__3DPI.Veh.TBI
## 1190002F15Rik                    0.557003                  1.724881
## 1500015O10Rik                    8.722591                  3.109537
## 2810417H13Rik                    2.705410                  3.683482
##               hippocampus__3DPI.Drug.NoTBI
## 1190002F15Rik                    0.7333133
## 1500015O10Rik                    5.8373584
## 2810417H13Rik                    2.9321865

3.6.2 aSVG Image

The aSVG image of mouse brain is downloaded from the repository maintained by EBI Gene Expression Group and included in spatialHeatmap, which is imported as below.

svg.mus.brain.pa <- system.file("extdata/shinyApp/example", "mus_musculus.brain.svg", package="spatialHeatmap")
svg.mus.brain <- read_svg(svg.mus.brain.pa)

The aSVG features and attributes are partially shown.

tail(attribute(svg.mus.brain)[[1]], 3)
## # A tibble: 3 × 10
##   feature        id    fill  stroke sub.f…¹ index element parent index…² index…³
##   <chr>          <chr> <chr>  <dbl> <chr>   <int> <chr>   <chr>    <int>   <int>
## 1 hypothalamus   UBER… none    0.05 hypoth…    22 path    LAYER…      15      11
## 2 nose           UBER… none    0.05 nose       23 path    LAYER…      16      12
## 3 corpora.quadr… UBER… none    0.05 corpor…    24 path    LAYER…      17      13
## # … with abbreviated variable names ¹​sub.feature, ²​index.all, ³​index.sub

3.6.3 SHM: Multiple Dimensions

The expression values of gene Acnat1 in hippocampus and hypothalamus across the composite dimension are mapped to matching aSVG features. The output SHM plots of each composite dimension are organized in a composite plot in Figure 10.

spatial_hm(svg=svg.mus.brain, data=se.mus.vars.aggr, ID=c('Acnat1'), legend.r=1.5, legend.key.size=0.02, legend.text.size=12, legend.nrow=2)
SHM plots of multiple dimension. Gene expression values of Acnat1 in hippocampus and hypothalamus under composite dimensions are mapped to corresponding aSVG features.

Figure 10: SHM plots of multiple dimension
Gene expression values of Acnat1 in hippocampus and hypothalamus under composite dimensions are mapped to corresponding aSVG features.

3.6.4 Multiple aSVGs

In a spatiotemporal application, different development stages may need to be represented in separate aSVG images. In such a case, the spatial_hm function is able to arrange multiple aSVGs in a single SHM plot. To organize the subplots, the names of the separate aSVG files are expected to include the following suffixes: *_shm1.svg, *_shm2.svg, etc.
As a simple testing example, the following stores random numbers as expression values in a data.frame.

df.random <- data.frame(matrix(sample(x=1:100, size=50, replace=TRUE), nrow=10))
colnames(df.random) <- c('shoot_totalA__condition1', 'shoot_totalA__condition2', 'shoot_totalB__condition1', 'shoot_totalB__condition2', 'notMapped') # Assign column names
rownames(df.random) <- paste0('gene', 1:10) # Assign row names 
df.random[1:2, ]
##       shoot_totalA__condition1 shoot_totalA__condition2
## gene1                       94                       73
## gene2                       62                       85
##       shoot_totalB__condition1 shoot_totalB__condition2 notMapped
## gene1                       98                        5        21
## gene2                       25                       13        25

The paths to the aSVG files are obtained, here for younger and older plants using *_shm1 and *_shm1, respectively, which are generated from Mustroph et al. (2009). Subsequently, the two aSVG files are loaded with the read_svg function.

svg.sh1 <- system.file("extdata/shinyApp/example", "arabidopsis.thaliana_organ_shm1.svg", package="spatialHeatmap")
svg.sh2 <- system.file("extdata/shinyApp/example", "arabidopsis.thaliana_organ_shm2.svg", package="spatialHeatmap")
svg.sh.mul <- read_svg(c(svg.sh1, svg.sh2))

The following generates the corresponding SHMs plot for gene1. The orginal image dimensions can be preserved by assigning TRUE to the preserve.scale argument.

spatial_hm(svg=svg.sh.mul, data=df.random, ID=c('gene1'), width=0.7, legend.r=0.2, legend.width=1, preserve.scale=TRUE, bar.width=0.09, line.color='grey50') 
Spatial heatmap of Arabidopsis at two growth stages. The expression profile of gene1 under condition1 and condition2 is plotted for two growth stages (top and bottom row).

Figure 11: Spatial heatmap of Arabidopsis at two growth stages
The expression profile of gene1 under condition1 and condition2 is plotted for two growth stages (top and bottom row).

Note in Figure 11 shoots are drawn with thicker outlines than roots. This is another useful feature of spatial_hm, i.e. preserving the outline thicknesses defined in aSVG files. This feature is particularly useful in cellular SHMs where different cell types may have different cell-wall thicknesses. The outline widths can be updated with update_feature programatically, or within Inkscape manually. The former is illustrated in the Supplementary Section.

4 Extended functionalities

4.1 Spatial Enrichment

The Spatial Enrichment (SE) functionality is an extension of the SHMs to identify groups of transcripts, proteins or other molecules that are particularly abundant or enriched in certain spatial regions, such as tissue-specific transcripts. For this, it compares the abundance profiles of molecules between the target spatial feature and each other reference feature in a pairwise manner. The molecules significantly up- or down-regulated in the target feature across all pairwise comparisons are denoted target feature-specifically expressed. The methods for detecting differentially-expressed genes (DEGs) in SE include edgeR (McCarthy et al. 2012), limma (Ritchie et al. 2015), DESeq2 (Love, Huber, and Anders 2014), distinct (Tiberi and Robinson. 2020).

In addition to spatial feature-specific molecules, the SE is also able to identify molecules specifically expressed in a certain treatment in a similar pairwise comparison, where the treatment of interest is compared with every other reference treatment.

The application of SE is illustrated here using the above mouse brain data as example. In the following, only three features ‘brain’, ‘liver’, and ‘kidney’ are selected so as to reduce runtime. Among the three features, ‘brain’ is chosen as the target under the argument target. The experimental variable of three strains ‘DBA.2J’, ‘C57BL.6’, ‘CD1’ will be treated as replicates for each feature.

data.sub.mus <- tar_ref(data=rse.mus, feature='organism_part', ft.sel=c('brain', 'liver', 'kidney'), variable='strain', var.sel=c('DBA.2J', 'C57BL.6', 'CD1'), com.by='feature', target='brain')
colData(data.sub.mus)[1:3, c('organism_part', 'target', 'strain')]
## DataFrame with 3 rows and 3 columns
##                organism_part      target      strain
##                  <character> <character> <character>
## brain__DBA.2J          brain         yes      DBA.2J
## kidney__DBA.2J        kidney          no      DBA.2J
## liver__DBA.2J          liver          no      DBA.2J

The SE method requires replicates that is stored here under count data. As it has build-in normalizing utilities, the pre-processing only requires a filtering step. The details about this filtering step are given under the human brain section.

data.sub.mus.fil <- filter_data(data=data.sub.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.2, 15), CV=c(0.8, 100), verbose=FALSE)

The SE is implemented in function spatial_enrich. In the following, the count data are internally normalized by TMM method from edgeR. The default method ‘edgeR’ is used for detecting spatial feature-specific genes. The brain-specific genes are selected here by log2 fold change (log2.fc) and FDR (fdr) thresholds.

If multiple methods are selected, the feature-specific molecules are first detected with each method independently then summarized in a table. The overlaps of feature-specific molecules across methods can be plotted in an UpSet (Gehlenborg 2019) or matrix plot. An example is provided in the Supplementary Section.

deg.lis.mus <- spatial_enrich(data.sub.mus.fil, methods=c('edgeR'), norm='TMM', log2.fc=1, fdr=0.05, aggr='mean', log2.trans.aggr=TRUE)

The up- and down- regulated genes detected by edgeR in brain can be accessed as follows.

deg.lis.mus$lis.up.down$up.lis$edgeR.up[1:2] # Up-regulated.
## [1] "ENSMUSG00000026764" "ENSMUSG00000027350"
deg.lis.mus$lis.up.down$down.lis$edgeR.down[1:2] # Down-regulated. 
## [1] "ENSMUSG00000025479" "ENSMUSG00000017950"

The identified brain-specific genes are stored in a tabular format. The type column indicates up- or down-regulated; the total column shows how many methods identify a gene as up or down as specified in type, here only ‘edgeR’; and 1 under the edgeR column indicates this DEG method identifies an up or down gene as specified in type. The columns to the right are the data used by the spatial_enrich function.

deg.table.mus <- deg.lis.mus$deg.table; deg.table.mus[1:2, 1:9] 
## DataFrame with 2 rows and 9 columns
##                 gene        type     total     edgeR brain__DBA.2J
##          <character> <character> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000026764          up         1         1      10.03888
## 2 ENSMUSG00000027350          up         1         1       8.69843
##   kidney__DBA.2J liver__DBA.2J brain__C57BL.6 kidney__C57BL.6
##        <numeric>     <numeric>      <numeric>       <numeric>
## 1        1.70799      -2.29798        9.97080         1.77276
## 2       -2.30229      -2.29798        8.92193        -2.32841

The up- or down-regulated genes in brain can be subsetted with the type and total columns.

df.up.mus <- subset(deg.table.mus, type=='up' & total==1)
df.up.mus[1:2, 1:8]
## DataFrame with 2 rows and 8 columns
##                 gene        type     total     edgeR brain__DBA.2J
##          <character> <character> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000026764          up         1         1      10.03888
## 2 ENSMUSG00000027350          up         1         1       8.69843
##   kidney__DBA.2J liver__DBA.2J brain__C57BL.6
##        <numeric>     <numeric>      <numeric>
## 1        1.70799      -2.29798        9.97080
## 2       -2.30229      -2.29798        8.92193
df.down.mus <- subset(deg.table.mus, type=='down' & total==1)
df.down.mus[1:2, 1:8]
## DataFrame with 2 rows and 8 columns
##                 gene        type     total     edgeR brain__DBA.2J
##          <character> <character> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000025479        down         1         1      -1.91059
## 2 ENSMUSG00000017950        down         1         1      -3.46383
##   kidney__DBA.2J liver__DBA.2J brain__C57BL.6
##        <numeric>     <numeric>      <numeric>
## 1        12.0379       14.8190       -2.27321
## 2        10.5755       11.1949       -3.46383

Plot SHMs on one up-regulated (ENSMUSG00000026764) and one down-regulated (ENSMUSG00000025479) gene in mouse brain. The resulting SHMs are termed here as SHMs of spatially-enriched molecules (transcripts).

spatial_hm(svg=svg.mus, data=deg.lis.mus$deg.table, ID=c('ENSMUSG00000026764', 'ENSMUSG00000025479'), legend.r=1, legend.nrow=3, sub.title.size=6.1, ncol=3, bar.width=0.09)
Spatially-enriched heatmaps. The two genes in the image are enriched in the mouse brain relative to kidney and liver. Top: down-regulated in brain. Bottom: up-regulated in brain.

Figure 12: Spatially-enriched heatmaps
The two genes in the image are enriched in the mouse brain relative to kidney and liver. Top: down-regulated in brain. Bottom: up-regulated in brain.

The expression profiles of the two spatially enriched genes (Figure 12) are also presented in line graphs.

graph_line(rbind(df.up.mus[1, ], df.down.mus[1, ]))
Line graph of gene expression profiles. The count data is normalized and replicates are aggregated by taking mean.

Figure 13: Line graph of gene expression profiles
The count data is normalized and replicates are aggregated by taking mean.

4.2 Matrix Heatmaps

SHMs are suitable for comparing assay profiles among small number of molecules (e.g. few genes or proteins) across cell types and conditions. To also support analysis routines of larger number of molecules, spatialHeatmap integrates functionalities for identifying groups of molecules with similar and/or dissimilar assay profiles, and subsequently analyzing the results with data mining methods that scale to larger numbers of molecules than SHMs, such as hierarchical clustering and network analysis methods. The following introduces both approaches using as sample data the gene expression data from Arabidopsis shoots from the previous section.

To identify similar molecules, the submatrix function can be used. It identifies molecules sharing similar profiles with one or more query molecules of interest. The given example uses correlation coefficients as similarity metric. The p argument allows to restrict the number of similar molecules to return based on a percentage cutoff relative to the number of molecules in the assay data set (e.g. 1% of the top most similar molecules). If several query molecules are provided then the function returns the similar genes for each query, while assuring uniqueness among molecules in the result.

The following example uses as query the gene ‘HRE2’. The ann argument corresponds to the column name in the rowData slot containing gene annotation information. The latter is only relevant if users want to see these annotations when mousing over a node in the interactive network plot generated in the next section.

sub.mat <- submatrix(data=se.fil.arab, ann='Target.Description', ID=c('HRE2'), p=0.1)

The result from the previous step is the assay matrix subsetted to the genes sharing similar assay profiles with the query genes ‘HRE2’.

sub.mat[c('HRE2'), c(1:3, 37)] # Subsetted assay matrix
##      root_total__control root_total__hypoxia root_p35S__control
## HRE2             5.48692            11.37016           5.578123
##                            Target.Description
## HRE2 putative AP2 domain transcription factor

Subsequently, hierarchical clustering is applied to the subsetted assay matrix containing only the genes that share profile similarities with the query gene ‘HRE2’. The clustering result is displayed as a matrix heatmap where the rows and columns are sorted by the corresponding hierarchical clustering dendrograms (Figure 14). The position of the query gene (‘HRE2’) is indicated in the heatmap by black lines. Setting static=FALSE will launch the interactive mode, where users can zoom into the heatmap by selecting subsections in the image or zoom out by double clicking.

matrix_hm(ID=c('HRE2'), data=sub.mat, angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
Matrix Heatmap. Rows are genes and columns are samples. The input genes are tagged by black lines.

Figure 14: Matrix Heatmap
Rows are genes and columns are samples. The input genes are tagged by black lines.

4.3 Network Graphs

4.3.1 Module Identification

Network analysis is performed with the WGCNA algorithm (Langfelder and Horvath 2008; Ravasz et al. 2002) using as input the subsetted assay matrix generated in the previous section. The objective is to identify network modules that can be visualized in the following step in form of network graphs. Applied to the gene expression sample data used here, these network modules represent groups of genes sharing highly similar expression profiles. Internally, the network module identification includes five major steps. First, a correlation matrix (Pearson or Spearman) is computed for each pair of molecules. Second, the obtained correlation matrix is transformed into an adjacency matrix that approximates the underlying global network to scale-free topology (Ravasz et al. 2002). Third, the adjacency matrix is used to calculate a topological overlap matrix (TOM) where shared neighborhood information among molecules is used to preserve robust connections, while removing spurious connections. Fourth, the distance transformed TOM is used for hierarchical clustering. To maximize time performance, the hierarchical clustering is performed with the flashClust package (Langfelder and Horvath 2012). Fifth, network modules are identified with the dynamicTreeCut package (Langfelder, Zhang, and Steve Horvath 2016). Its ds (deepSplit) argument can be assigned integer values from 0 to 3, where higher values increase the stringency of the module identification process. To simplify the network module identification process with WGCNA, the individual steps can be executed with a single function called adj_mod. The result is a list containing the adjacency matrix and the final module assignments stored in a data.frame. Since the interactive network feature used in the visualization step below performs best on smaller modules, only modules are returned that were obtained with stringent ds settings (here ds=2 and ds=3).

adj.mod <- adj_mod(data=sub.mat)

A slice of the adjacency matrix is shown below.

adj.mod[['adj']][1:3, 1:3]
##                 ANNAT7     ATEXPA10    AT1G65490
## ANNAT7    1.000000e+00 2.523764e-07 6.242176e-07
## ATEXPA10  2.523764e-07 1.000000e+00 6.284012e-01
## AT1G65490 6.242176e-07 6.284012e-01 1.000000e+00

The module assignments are stored in a data frame. Its columns contain the results for the ds=2 and ds=3 settings. Integer values \(>0\) are the module labels, while \(0\) indicates unassigned molecules. The following returns the first three rows of the module assignment result.

adj.mod[['mod']][1:3, ] 
##           2 3
## ANNAT7    0 0
## ATEXPA10  0 2
## AT1G65490 0 2

4.3.2 Module Visualization

Network modules can be visualized with the network function. To plot a module containing an molecule (gene) of interest, its ID (e.g. ‘HRE2’) needs to be provided under the corresponding argument. Typically, this could be one of the molecules chosen for the above SHM plots. There are two modes to visualize the selected module: static or interactive. Figure 15 was generated with static=TRUE. Setting static=FALSE will generate the interactive version. In the network plot shown below the nodes and edges represent genes and their interactions, respectively. The thickness of the edges denotes the adjacency levels, while the size of the nodes indicates the degree of connectivity of each molecule in the network. The adjacency and connectivity levels are also indicated by colors. For example, in Figure 15 connectivity increases from “turquoise” to “violet”, while the adjacency increases from “yellow” to “blue”. If a gene of interest has been assigned under ID, then its ID is labeled in the plot with the suffix tag: *_target.

network(ID="HRE2", data=sub.mat, adj.mod=adj.mod, adj.min=0.90, vertex.label.cex=1.2, vertex.cex=2, static=TRUE)
Static network. Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Figure 15: Static network
Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Setting static=FALSE launches the interactive network. In this mode there is an interactive color bar that denotes the gene connectivity. To modify it, the color labels need to be provided in a comma separated format, e.g.: yellow, orange, red. The latter would indicate that the gene connectivity increases from yellow to red.

If the subsetted expression matrix contains gene/protein annotation information in the last column, then it will be shown when moving the cursor over a network node.

network(ID="HRE2", data=sub.mat, adj.mod=adj.mod, static=FALSE)

5 Shiny App

In additon to running spatialHeatmap from R, the package includes a Shiny App that provides access to the same functionalities from an intuitive-to-use web browser interface. Apart from being very user-friendly, this App conveniently organizes the results of the entire visualization workflow in a single browser window with options to adjust the parameters of the individual components interactively. For instance, genes can be selected and replotted in the SHM simply by clicking the corresponding rows in the expression table included in the same window. This representation is very efficient in guiding the interpretation of the results in a visual and user-friendly manner. For testing purposes, the spatialHeatmap Shiny App also includes ready-to-use sample expression data and aSVG images along with embedded user instructions.

5.1 Local System

The Shiny App of spatialHeatmap can be launched from an R session with the following function call.

shiny_shm()

The dashboard panels of the Shiny App are organized as follows:

  1. Landing Page: gallery of pre-upload examples, menu for uploading custom files, selecting default files, or downloading example files.
  2. Spatial Heatmap: spatially colored images based on aSVG images selected and/or uploaded by user, matrix heatmap, network graph, data table (replicates aggregated).
  3. Spatial Enrichment: data table (with replicates), menu for enrichment, enrichment results.
  4. Co-visualization: co-visualization of bulk tissues in SHMs and single cells in embedding plots, methods options (annotation/manual, automatic), mapping direction options (cell-to-bulk, bulk-to-cell).
  5. Parameter Control Menu: postioned on top of each panel.

A screenshot is shown below depicting an SHM plot generated with the Shiny App of spatialHeatmap (Figure 16).

Screenshot of spatialHeatmap's Shiny App.

Figure 16: Screenshot of spatialHeatmap’s Shiny App

After launching, the Shiny App displays by default one of the included data sets.

The assay data and aSVG images are uploaded to the Shiny App as tabular files (e.g. in CSV or TSV format) and SVG files, respectively. To also allow users to upload gene expression data stored in SummarizedExperiment objects, one can export it from R to a tabular file with the filter_data function, where the user specifies the directory path under the dir argument. This will create in the target directory a tablular file named customData.txt in TSV format. The column names in this file preserve the experimental design information from the colData slot by concatenating the corresponding sample and condition information separated by double underscores. An example of this format is shown in Table 1.

se.fil.arab <- filter_data(data=se.aggr.sh, ann="Target.Description", sam.factor='sample', con.factor='condition', pOA=c(0.03, 6), CV=c(0.30, 100), dir='./')

To interactively access gene-, transcript- or protein-level annotations in the plots and tables of the Shiny App, such as viewing functional descriptions by moving the cursor over network nodes, the corresponding annotation column needs to be present in the rowData slot and its column name assigned to the metadata argument. In the exported tabular file, the extra annotation column is appended to the expression matrix.

Alternatively, once can export the three slots (assay, colData, rowData) of SummarizedExperiment in three tabular files and upload them separately.

5.2 Server Deployment

As most Shiny Apps, spatialHeatmap can be deployed as a centralized web service. A major advantage of a web server deployment is that the functionalities can be accessed remotely by anyone on the internet without the need to use R on the user system. For deployment one can use custom web servers or cloud services, such as AWS, GCP or shinysapps.io. An example web instance for testing spatialHeatmap online is available here.

5.3 Custom Shiny App

The spatialHeatmap package also allows users to create customized Shiny App instances using the custom_shiny function. This function provides options to include custom example data and aSVGs, and define default values within most visualization panels (e.g. color schemes, image dimensions). For details users want to consult the help file of the custom_shiny function. To maximize flexibility, the default parameters are stored in a yaml file on the Shiny App. This makes it easy to refine and optimize default parameters simply by changing this yaml file.

5.4 Database Backend

To maintain scalability, the customized Shiny app is designed to work with HDF5-based database (Fischer, Smith, and Pau 2020; Pagès 2020), which enables users to incorporate data and aSVGs in a batch. The database is constructed with the function write_hdf5. Basically, the accepted data formats are data.frame or SummarizedExperiment. Each data set is saved in an independent HDF5 database. Meanwhile, a pairing table describing matchings between data and aSVGs is required. All individual databases and the pairing table are then compressed in the file “data_shm.tar”. Accordingly, all aSVG files should be compressed in another tar file such as “aSVGs.tar”. Finally, these two tar files are included in the Shiny app by feeding their paths in a list to custom_shiny.

Except for user-provided data, the database is able to store data sets downloaded from GEO and Expression Atlas. The detailed examples of the database utility are documented in the help file of write_hdf5.

6 Supplementary Section

6.1 Numeric Data

The numceric data used to color the features in aSVG images can be provided as three different object types including vector, data.frame and SummerizedExperiment. When working with complex omics-based assay data then the latter provides the most flexibility, and thus should be the preferred container class for managing numeric data in spatialHeatmap. Both data.frame and SummarizedExperiment can hold data from many measured molecules, such as many genes or proteins. In contrast to this, the vector class is only suitable for data from single molecules. Due to its simplicity this less complex container is often useful for testing or when dealing with simple data sets.

In data assayed only at spatial dimension, there are two factors samples and conditions, while data assayed at spatial and temporal dimension contains an additional factor time points or development stages. The spatialHeatmap is able to work with both data types. In this section, the application of SHMs on spatial data is explained first in terms of the three object types, which is more popular. Later, the spatiotemporal usage of SHMs is presented in a specific subsection.

6.1.1 Object Types

This subsection refers to data assayed only at spatial dimension, where two factors samples and conditions are involved.

6.1.1.1 vector

When using numeric vectors as input to spatial_hm, then their name slot needs to be populated with strings matching the feature names in the corresponding aSVG. To also specify conditions, their labels need to be appended to the feature names with double underscores as separator, i.e. ’feature__condition’.

The following example replots the testing example for two spatial features (‘putamen’ and ‘prefrontal.cortex’) and two conditions (‘1’ and ‘2’).

vec <- sample(x=1:100, size=5) # Random numeric values
names(vec) <- c('putamen__condition1', 'putamen__condition2', 'prefrontal.cortex__condition1', 'prefrontal.cortex__condition2', 'notMapped') # Assign unique names to random values
vec
##           putamen__condition1           putamen__condition2 
##                             6                            52 
## prefrontal.cortex__condition1 prefrontal.cortex__condition2 
##                            90                            63 
##                     notMapped 
##                             2

With this configuration the resulting plot contains two spatial heatmap plots for the human brain, one for ‘condition 1’ and another one for ‘contition 2’. To keep the build time and storage size of this package to a minimum, the spatial_hm function call in the code block below is not evaluated. Thus, the corresponding SHM is not shown in this vignette.

spatial_hm(svg=svg.hum, data=vec, ID='testing', ncol=1, legend.r=1.2, sub.title.size=14, ft.trans='g4320')

6.1.1.2 data.frame

Compared to the above vector input, data.frames are structured here like row-wise appended vectors, where the name slot information in the vectors is stored in the column names. Each row also contains a name that corresponds to the corresponding molecule name, such as a gene ID. The naming of spatial features and conditions in the column names follows the same conventions as the naming used for the name slots in the above vector example.

The following illustrates this with an example where a numeric data.frame with random numbers is generated containing 20 rows and 5 columns. To avoid name clashes, the values in the rows and columns should be unique.

df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20)) # Create numeric data.frame
colnames(df.test) <- names(vec) # Assign column names
rownames(df.test) <- paste0('gene', 1:20) # Assign row names
df.test[1:3, ]
##       putamen__condition1 putamen__condition2 prefrontal.cortex__condition1
## gene1                 247                 254                           766
## gene2                 577                 986                           511
## gene3                 510                 508                           530
##       prefrontal.cortex__condition2 notMapped
## gene1                           625       259
## gene2                           189       848
## gene3                           296       488

With the resulting data.frame one can generate the same SHM as in the previous example, that used a named vector as input to the spatial_hm function. Additionally, one can now select each row by its name (here gene ID) under the ID argument.

spatial_hm(svg=svg.hum, data=df.test, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)

Additional information can be appended to the data.frame column-wise, such as annotation data including gene descriptions. This information can then be displayed interactively in the network plots of the Shiny App by placing the curser over network nodes.

df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]
##       putamen__condition1 putamen__condition2 prefrontal.cortex__condition1
## gene1                 247                 254                           766
## gene2                 577                 986                           511
## gene3                 510                 508                           530
##       prefrontal.cortex__condition2 notMapped  ann
## gene1                           625       259 ann1
## gene2                           189       848 ann2
## gene3                           296       488 ann3

6.1.1.3 SummarizedExperiment

The SummarizedExperiment class is a much more extensible and flexible container for providing metadata for both rows and columns of numeric data stored in tabular format.

To import experimental design information from tabular files, users can provide a target file that will be stored in the colData slot of the SummarizedExperiment (SE) object. In other words, the target file provides the metadata for the columns of the numeric assay data. Usually, the target file contains at least two columns: one for the features/samples and one for the conditions. Replicates are indicated by identical entries in these columns. The actual numeric matrix representing the assay data is stored in the assay slot, where the rows correspond to molecules, such as gene IDs. Optionally, additional annotation information for the rows (e.g. gene descriptions) can be stored in the rowData slot.

For constructing a valid SummarizedExperiment object, that can be used by the spatial_hm function, the target file should meet the following requirements.

  1. It should be imported with read.table or read.delim into a data.frame or the data.frame can be constructed in R on the fly (as shown below).

  2. It should contain at least two columns. One column represents the features/samples and the other one the conditions. The rows in the target file correspond to the columns of the numeric data stored in the assay slot. If the condition column is empty, then the same condition is assumed under the corresponding features/samples entry.

  3. The feature/sample names must have matching entries in corresponding aSVG to be considered in the final SHM. Note, the double underscore is a special string that is reserved for specific purposes in spatialHeatmap, and thus should be avoided for naming feature/samples and conditions.

The following example illustrates the design of a valid SummarizedExperiment object for generating SHMs. In this example, the ‘putamen’ tissue has 2 conditions and each condition has 2 replicates. Thus, there are 4 assays for putamen, and the same design applies to the prefrontal.cortex tissue.

sample <- c(rep('putamen', 4), rep('prefrontal.cortex', 4))
condition <- rep(c('condition1', 'condition1', 'condition2', 'condition2'), 2)
target.test <- data.frame(sample=sample, condition=condition, row.names=paste0('assay', 1:8))
target.test
##                   sample  condition
## assay1           putamen condition1
## assay2           putamen condition1
## assay3           putamen condition2
## assay4           putamen condition2
## assay5 prefrontal.cortex condition1
## assay6 prefrontal.cortex condition1
## assay7 prefrontal.cortex condition2
## assay8 prefrontal.cortex condition2

The assay slot is populated with a 8 x 20 data.frame containing random numbers. Each column corresponds to an assay in the target file (here imported into colData), while each row corresponds to a gene.

df.se <- data.frame(matrix(sample(x=1:1000, size=160), nrow=20))
rownames(df.se) <- paste0('gene', 1:20)
colnames(df.se) <- row.names(target.test)
df.se[1:3, ]
##       assay1 assay2 assay3 assay4 assay5 assay6 assay7 assay8
## gene1    738    185    895    987    632    929    302    379
## gene2    528    546    966    732    857    479    139    140
## gene3    298     14    718    152     17      6    760     54

Next, the final SummarizedExperiment object is constructed by providing the numeric and target data under the assays and colData arguments, respectively.

se <- SummarizedExperiment(assays=df.se, colData=target.test)
se
## class: SummarizedExperiment 
## dim: 20 8 
## metadata(0):
## assays(1): ''
## rownames(20): gene1 gene2 ... gene19 gene20
## rowData names(0):
## colnames(8): assay1 assay2 ... assay7 assay8
## colData names(2): sample condition

If needed row-wise annotation information (e.g. for genes) can be included in the SummarizedExperiment object as well. This can be done during the construction of the initial object, or as below by injecting the information into an existing SummarizedExperiment object.

rowData(se) <- df.test['ann']

In this simple example, possible normalization and filtering steps are skipped. Yet, the aggregation of replicates is performed as shown below.

se.aggr <- aggr_rep(data=se, sam.factor='sample', con.factor='condition', aggr='mean')
assay(se.aggr)[1:3, ]
##       putamen__condition1 putamen__condition2 prefrontal.cortex__condition1
## gene1               461.5                 941                         780.5
## gene2               537.0                 849                         668.0
## gene3               156.0                 435                          11.5
##       prefrontal.cortex__condition2
## gene1                         340.5
## gene2                         139.5
## gene3                         407.0

With the fully configured SummarizedExperiment object, a similar SHM is plotted as in the previous examples.

spatial_hm(svg=svg.hum, data=se.aggr, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14, ft.trans=c('g4320'))

6.2 aSVG Files

6.2.1 aSVG repository

A public aSVG repository, that can be used by spatialHeatmap directly, is the one maintained by the EBI Gene Expression Group. It contains annatomical aSVG images from different species. The same aSVG images are also used by the web service of the Expression Atlas project. In addition, the spatialHeatmap has its own repository called spatialHeatmap aSVG Repository, that stores custom aSVG files developed for this project (e.g. Figure 7).

If users cannot find a suitable aSVG image in these two repositories, we have developed a step-by-step SVG tutorial for creating custom aSVG images. For example, the BAR eFP browser at University of Toronto contains many anatomical images. These images can be used as templates in the SVG tutorial to construct custom aSVGs.

Moreover, in the future we will add more aSVGs to our repository, and users are welcome to deposit their own aSVGs to this repository to share them with the community.

6.2.2 Update aSVG features

To create and edit existing feature identifiers in aSVGs, the update_feature function can be used. The demonstration below, creates an empty folder tmp.dir1 and copies into it the homo_sapiens.brain.svg image provided by the spatialHeatmap package. Subsequently, selected feature annotations are updated in this file.

tmp.dir1 <- file.path(tempdir(check=TRUE), 'shm1') 
if (!dir.exists(tmp.dir1)) dir.create(tmp.dir1)
svg.hum.pa <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap") 
file.copy(from=svg.hum.pa, to=tmp.dir1, overwrite=TRUE) # Copy "homo_sapiens.brain.svg" file into 'tmp.dir1'

Query the above aSVG with feature and species keywords, and return the resulting matches in a data.frame.

feature.df <- return_feature(feature=c('frontal cortex', 'prefrontal cortex'), species=c('homo sapiens', 'brain'), dir=tmp.dir1, remote=NULL, keywords.any=FALSE)
feature.df

Subsequently, create a character vector of new feature identifiers corresponding to each of the returned features.

Sample code that creates new feature names and stores them in a character vector.

f.new <- c('prefrontalCortex', 'frontalCortex')

Similarly, if the stroke (outline thickness) or color need to update, make vectors respectively and make sure each entry corresponds to each returned feature.

s.new <- c('0.05', '0.1') # New strokes.
c.new <- c('red', 'green') # New colors.

Next, new features, strokes, and colors are added to the feature data.frame as three columns with the names featureNew, strokeNew, and colorNew respectively. The three column names are used by the update_feature function to look up new entries.

feature.df.new <- cbind(featureNew=f.new, strokeNew=s.new, colorNew=c.new, feature.df)
feature.df.new

Finally, the features, strokes, and colors are updated in the aSVG file(s) located in the tmp.dir1 directory.

update_feature(df.new=feature.df.new, dir=tmp.dir1)

6.3 Advanced Functionalities

6.3.1 SHM: Re-matching

SHMs supports re-matching between data and aSVG features, i.e. re-match one spatial feature in the data to a different or multiple counterparts in the aSVG. The re-matching is defined in a named list, where a name slot refers to a spatial feature in data and the corresponding element in the list are one or multiple aSVG features to re-match.

The example below takes data from the Human Brain section, such as svg.hum.sub and se.fil.hum. The re-matching is demonstrated on Figure 3. The data feature frontal.cortex is re-matched to aSVG features frontal.cortex and prefrontal.cortex in a list.

remat.hum <- list(frontal.cortex=c('frontal.cortex', 'prefrontal.cortex'))

SHM plots of re-matching. Examples of re-matching application include mapping gene expression profiles of one species to a closely related species in the same anatomical region, or mapping gene assay profiles of a sub-tissue to the whole tissue, where the gene abundance profiles of the whole tissue are hard to assay and only a sub-tissue is assayed.

spatial_hm(svg=svg.hum.sub, data=se.fil.hum, ID=c('ENSG00000268433'), lay.shm='con', ncol=1, legend.r=0.8, legend.nrow=2, lis.rematch=remat.hum)
SHMs of re-matching. The data feature `frontal.cortex` is re-matched to aSVG features `frontal.cortex` and `prefrontal.cortex`.

Figure 17: SHMs of re-matching
The data feature frontal.cortex is re-matched to aSVG features frontal.cortex and prefrontal.cortex.

6.3.2 Spatial Enrichment: Multiple Methods

This section showcases using multiple DEG methods for SE and the data are taken from Spatial Enrichment such as data.sub.mus.fil. In the following, the selected DEG methods are ‘edgeR’ and ‘limma’. In each of the two methods, the target feature brain is compared with liver and kidney in a pairwise manner separately. The brain-specific genes are selected here by log2 fold change (log2.fc) and FDR (fdr) thresholds.

deg.lis.mus.mul <- spatial_enrich(data.sub.mus.fil, methods=c('edgeR', 'limma'), norm='TMM', log2.fc=1, fdr=0.05, aggr='mean', log2.trans.aggr=TRUE)

The up- and down- regulated genes detected by edgeR in brain can be accessed as follows.

deg.lis.mus.mul$lis.up.down$up.lis$edgeR.up[1:2] # Up-regulated.
## [1] "ENSMUSG00000026764" "ENSMUSG00000027350"
deg.lis.mus.mul$lis.up.down$down.lis$edgeR.down[1:2] # Down-regulated. 
## [1] "ENSMUSG00000025479" "ENSMUSG00000017950"

The overlap of up-regulated genes in brain tissue for two DEG methods is displayed in form of an UpSet plot.

deg_ovl(deg.lis.mus.mul$lis.up.down, type='up', plot='upset')
UpSet plot of up-regulated genes in mouse brain. The overlap of up-regulated genes detected by edgeR and limma is indicated by bars.

Figure 18: UpSet plot of up-regulated genes in mouse brain
The overlap of up-regulated genes detected by edgeR and limma is indicated by bars.

Alternatively, the overlap can be displayed in form of a matrix plot.

deg_ovl(deg.lis.mus.mul$lis.up.down, type='up', plot='matrix')
Matrix plot of up-regulated genes in mouse brain. The matrix plot displays any overlap between up-regulated genes detected by edgeR and limma.

Figure 19: Matrix plot of up-regulated genes in mouse brain
The matrix plot displays any overlap between up-regulated genes detected by edgeR and limma.

The identified brain-specific genes are stored in a tabular format. The type column indicates up- or down-regulated; the total column shows how many methods identify a gene as up or down; and 1 under the edgeR and limma columns indicates the corresponding DEG method identifies a gene as up- or down-regulated as specfied in type. The columns to the right are the input data used by the spatial_enrich function.

deg.table.mus.mul <- deg.lis.mus.mul$deg.table; deg.table.mus.mul[1:2, 1:8] 
## DataFrame with 2 rows and 8 columns
##                 gene        type     total     edgeR     limma brain__DBA.2J
##          <character> <character> <numeric> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000026764          up         2         1         1      10.03888
## 2 ENSMUSG00000027350          up         2         1         1       8.69843
##   kidney__DBA.2J liver__DBA.2J
##        <numeric>     <numeric>
## 1        1.70799      -2.29798
## 2       -2.30229      -2.29798

The numbers in the total column are stringency measures of gene specificity, where the larger the more strigent. For example, the up- or down-regulated genes in brain can be subsetted with the highest stringency, here total==2.

df.up.mus <- subset(deg.table.mus.mul, type=='up' & total==2)
df.up.mus[1:2, 1:8]
## DataFrame with 2 rows and 8 columns
##                 gene        type     total     edgeR     limma brain__DBA.2J
##          <character> <character> <numeric> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000026764          up         2         1         1      10.03888
## 2 ENSMUSG00000027350          up         2         1         1       8.69843
##   kidney__DBA.2J liver__DBA.2J
##        <numeric>     <numeric>
## 1        1.70799      -2.29798
## 2       -2.30229      -2.29798
df.down.mus <- subset(deg.table.mus.mul, type=='down' & total==2)
df.down.mus[1:2, 1:9]
## DataFrame with 2 rows and 9 columns
##                 gene        type     total     edgeR     limma brain__DBA.2J
##          <character> <character> <numeric> <numeric> <numeric>     <numeric>
## 1 ENSMUSG00000025479        down         2         1         1      -1.91059
## 2 ENSMUSG00000017950        down         2         1         1      -3.46383
##   kidney__DBA.2J liver__DBA.2J brain__C57BL.6
##        <numeric>     <numeric>      <numeric>
## 1        12.0379       14.8190       -2.27321
## 2        10.5755       11.1949       -3.46383

To plot SHMs using spatially-enriched molecules refer to the Spatial Enrichment section.

6.4 SHMs of Time Series

This section generates a SHM plot for chicken data from the Expression Atlas. The code components are very similar to the Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

6.4.1 Gene Expression Data

The following searches the Expression Atlas for expression data from ‘heart’ and ‘gallus’.

all.chk <- read_cache(cache.pa, 'all.chk') # Retrieve data from cache.
if (is.null(all.chk)) { # Save downloaded data to cache if it is not cached.
  all.chk <- searchAtlasExperiments(properties="heart", species="gallus")
  save_cache(dir=cache.pa, overwrite=TRUE, all.chk)
}

Among the matching entries, accession ‘E-MTAB-6769’ will be downloaded, which is an RNA-Seq experiment comparing the developmental changes across nine time points of seven organs (Cardoso-Moreira et al. 2019).

all.chk[3, ]
rse.chk <- read_cache(cache.pa, 'rse.chk') # Read data from cache.
if (is.null(rse.chk)) { # Save downloaded data to cache if it is not cached.
  rse.chk <- getAtlasData('E-MTAB-6769')[[1]][[1]]
  save_cache(dir=cache.pa, overwrite=TRUE, rse.chk)
}

The first three rows of the design in the downloaded experiment are shown.

colData(rse.chk)[1:3, ]
## DataFrame with 3 rows and 8 columns
##            AtlasAssayGroup      organism         strain           genotype
##                <character>   <character>    <character>        <character>
## ERR2576379              g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576380              g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576381              g2 Gallus gallus Red Junglefowl wild type genotype
##            developmental_stage         age         sex organism_part
##                    <character> <character> <character>   <character>
## ERR2576379              embryo      10 day      female         brain
## ERR2576380              embryo      10 day      female         brain
## ERR2576381              embryo      10 day      female    cerebellum

6.4.2 aSVG Image

The following example shows how to download from the above SVG repositories an aSVG image that matches the tissues and species assayed in the downloaded data. As before downloaded images are saved to the directory tmp.dir.shm.

tmp.dir.shm <- file.path(tempdir(check=TRUE), 'shm')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), dir=tmp.dir.shm, remote=remote)

To meet the R/Bioconductor package requirements, the following uses aSVG file instances included in the spatialHeatmap package rather than the downloaded instances.

feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), dir=svg.dir, remote=NULL)
feature.df[1:2, c('feature', 'stroke', 'SVG')] # A slice of the search result.

The target aSVG instance gallus_gallus.svg is imported.

svg.chk.pa <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap")
svg.chk <- read_svg(svg.chk.pa)

6.4.3 Experimental Design

A sample target file that is included in this package is imported and then loaded to the colData slot of rse.chk, the first three rows of which are displayed.

chk.tar <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(chk.tar, header=TRUE, row.names=1, sep='\t') # Importing
colData(rse.chk) <- DataFrame(target.chk) # Loading
target.chk[1:3, ]
##            AtlasAssayGroup      organism         strain           genotype
## ERR2576379              g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576380              g1 Gallus gallus Red Junglefowl wild type genotype
## ERR2576381              g2 Gallus gallus Red Junglefowl wild type genotype
##            developmental_stage   age    sex organism_part
## ERR2576379              embryo day10 female         brain
## ERR2576380              embryo day10 female         brain
## ERR2576381              embryo day10 female    cerebellum

6.4.4 Preprocess Assay Data

The raw RNA-Seq count are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of reliable expression data, details of which are seen in the Human Brain example.

se.nor.chk <- norm_data(data=rse.chk, norm.fun='ESF', log2.trans=TRUE) # Normalization
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age', aggr='mean') # Replicate agggregation using mean 
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age', pOA=c(0.01, 5), CV=c(0.6, 100)) # Filtering of genes with low counts and varince

6.4.5 SHM: Time Series

The expression profile for gene ENSGALG00000006346 is plotted across nine time points in four organs in form of a composite SHM with 9 panels. Their layout in three columns is controlled with the argument setting ncol=3.

spatial_hm(svg=svg.chk, data=se.fil.chk, ID='ENSGALG00000006346', width=0.9, legend.width=0.9, legend.r=1.5, sub.title.size=9, ncol=3, legend.nrow=2, label=TRUE)
Time course of chicken organs. The SHM shows the expression profile of a single gene across nine time points and four organs.

Figure 20: Time course of chicken organs
The SHM shows the expression profile of a single gene across nine time points and four organs.


7 Version Informaion

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] GEOquery_2.66.0             ExpressionAtlas_1.26.0     
##  [3] jsonlite_1.8.3              RCurl_1.98-1.9             
##  [5] xml2_1.3.3                  limma_3.54.0               
##  [7] kableExtra_1.3.4            BiocParallel_1.32.0        
##  [9] igraph_1.3.5                scater_1.26.0              
## [11] ggplot2_3.3.6               scran_1.26.0               
## [13] scuttle_1.8.0               SingleCellExperiment_1.20.0
## [15] SummarizedExperiment_1.28.0 Biobase_2.58.0             
## [17] GenomicRanges_1.50.0        GenomeInfoDb_1.34.0        
## [19] IRanges_2.32.0              S4Vectors_0.36.0           
## [21] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
## [23] matrixStats_0.62.0          spatialHeatmap_2.4.0       
## [25] knitr_1.40                  BiocStyle_2.26.0           
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                shinydashboard_0.7.2     
##   [3] tidyselect_1.2.0          RSQLite_2.2.18           
##   [5] AnnotationDbi_1.60.0      htmlwidgets_1.5.4        
##   [7] grid_4.2.1                Rtsne_0.16               
##   [9] munsell_0.5.0             ScaledMatrix_1.6.0       
##  [11] codetools_0.2-18          preprocessCore_1.60.0    
##  [13] interp_1.1-3              statmod_1.4.37           
##  [15] withr_2.5.0               colorspace_2.0-3         
##  [17] filelock_1.0.2            highr_0.9                
##  [19] rstudioapi_0.14           labeling_0.4.2           
##  [21] GenomeInfoDbData_1.2.9    farver_2.1.1             
##  [23] bit64_4.0.5               vctrs_0.5.0              
##  [25] generics_0.1.3            xfun_0.34                
##  [27] BiocFileCache_2.6.0       fastcluster_1.2.3        
##  [29] R6_2.5.1                  doParallel_1.0.17        
##  [31] ggbeeswarm_0.6.0          rsvd_1.0.5               
##  [33] locfit_1.5-9.6            rsvg_2.3.2               
##  [35] bitops_1.0-7              cachem_1.0.6             
##  [37] gridGraphics_0.5-1        DelayedArray_0.24.0      
##  [39] assertthat_0.2.1          promises_1.2.0.1         
##  [41] scales_1.2.1              nnet_7.3-18              
##  [43] beeswarm_0.4.0            gtable_0.3.1             
##  [45] beachmat_2.14.0           WGCNA_1.71               
##  [47] rlang_1.0.6               genefilter_1.80.0        
##  [49] systemfonts_1.0.4         splines_4.2.1            
##  [51] lazyeval_0.2.2            impute_1.72.0            
##  [53] checkmate_2.1.0           BiocManager_1.30.19      
##  [55] yaml_2.3.6                reshape2_1.4.4           
##  [57] backports_1.4.1           httpuv_1.6.6             
##  [59] Hmisc_4.7-1               tools_4.2.1              
##  [61] bookdown_0.29             ggplotify_0.1.0          
##  [63] ellipsis_0.3.2            gplots_3.1.3             
##  [65] jquerylib_0.1.4           RColorBrewer_1.1-3       
##  [67] ggdendro_0.1.23           dynamicTreeCut_1.63-1    
##  [69] Rcpp_1.0.9                plyr_1.8.7               
##  [71] base64enc_0.1-3           sparseMatrixStats_1.10.0 
##  [73] visNetwork_2.1.2          zlibbioc_1.44.0          
##  [75] purrr_0.3.5               rpart_4.1.19             
##  [77] deldir_1.0-6              viridis_0.6.2            
##  [79] ggrepel_0.9.1             cluster_2.1.4            
##  [81] magrittr_2.0.3            magick_2.7.3             
##  [83] data.table_1.14.4         grImport_0.9-5           
##  [85] hms_1.1.2                 mime_0.12                
##  [87] evaluate_0.17             xtable_1.8-4             
##  [89] XML_3.99-0.12             jpeg_0.1-9               
##  [91] gridExtra_2.3             compiler_4.2.1           
##  [93] tibble_3.1.8              KernSmooth_2.23-20       
##  [95] crayon_1.5.2              htmltools_0.5.3          
##  [97] tzdb_0.3.0                later_1.3.0              
##  [99] Formula_1.2-4             geneplotter_1.76.0       
## [101] tidyr_1.2.1               DBI_1.1.3                
## [103] dbplyr_2.2.1              MASS_7.3-58.1            
## [105] rappdirs_0.3.3            readr_2.1.3              
## [107] Matrix_1.5-1              cli_3.4.1                
## [109] parallel_4.2.1            metapod_1.6.0            
## [111] pkgconfig_2.0.3           flashClust_1.01-2        
## [113] foreign_0.8-83            plotly_4.10.0            
## [115] foreach_1.5.2             svglite_2.1.0            
## [117] annotate_1.76.0           vipor_0.4.5              
## [119] bslib_0.4.0               dqrng_0.3.0              
## [121] webshot_0.5.4             XVector_0.38.0           
## [123] rvest_1.0.3               yulab.utils_0.0.5        
## [125] stringr_1.4.1             digest_0.6.30            
## [127] RcppAnnoy_0.0.20          Biostrings_2.66.0        
## [129] rmarkdown_2.17            htmlTable_2.4.1          
## [131] uwot_0.1.14               edgeR_3.40.0             
## [133] DelayedMatrixStats_1.20.0 curl_4.3.3               
## [135] shiny_1.7.3               gtools_3.9.3             
## [137] lifecycle_1.0.3           BiocNeighbors_1.16.0     
## [139] viridisLite_0.4.1         fansi_1.0.3              
## [141] pillar_1.8.1              lattice_0.20-45          
## [143] KEGGREST_1.38.0           fastmap_1.1.0            
## [145] httr_1.4.4                survival_3.4-0           
## [147] GO.db_3.16.0              glue_1.6.2               
## [149] FNN_1.1.3.1               UpSetR_1.4.0             
## [151] png_0.1-7                 iterators_1.0.14         
## [153] bluster_1.8.0             bit_4.0.4                
## [155] stringi_1.7.8             sass_0.4.2               
## [157] blob_1.2.3                DESeq2_1.38.0            
## [159] BiocSingular_1.14.0       latticeExtra_0.6-30      
## [161] caTools_1.18.2            memoise_2.0.1            
## [163] dplyr_1.0.10              irlba_2.3.5.1

8 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

9 References

Attilio, Peter J, Dustin M Snapper, Milan Rusnak, Akira Isaac, Anthony R Soltis, Matthew D Wilkerson, Clifton L Dalgard, and Aviva J Symes. 2021. “Transcriptomic Analysis of Mouse Brain After Traumatic Brain Injury Reveals That the Angiotensin Receptor Blocker Candesartan Acts Through Novel Pathways.” Front. Neurosci. 15 (March): 636259.

Bezrutczyk, Margaret, Nora R Zöllner, Colin P S Kruse, Thomas Hartwig, Tobias Lautwein, Karl Köhrer, Wolf B Frommer, and Ji-Yun Kim. 2021. “Evidence for Phloem Loading via the Abaxial Bundle Sheath Cells in Maize Leaves.” Plant Cell 33 (3): 531–47.

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.

Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.

Chang, Winston, Joe Cheng, JJ Allaire, Cars on Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2021. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (Geo) and Bioconductor.” Bioinformatics 14: 1846–7.

Fischer, Bernd, Mike Smith, and Gregoire Pau. 2020. Rhdf5: R Interface to Hdf5. https://github.com/grimbough/rhdf5.

Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—Analysis of Affymetrix Genechip Data at the Probe Level.” Bioinformatics 20 (3): 307–15. https://doi.org/10.1093/bioinformatics/btg405.

Gehlenborg, Nils. 2019. UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets. https://CRAN.R-project.org/package=UpSetR.

Huber, W., V. J. Carey, R. Gentleman, S. An ders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis Wit H Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from Embl-Ebi Expression Atlas.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.

Langfelder, Peter, Bin Zhang, and with contributions from Steve Horvath. 2016. DynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms. https://CRAN.R-project.org/package=dynamicTreeCut.

Langfelder, P., and S. Horvath. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” J. Stat. Softw. 46 (11). https://www.ncbi.nlm.nih.gov/pubmed/23050260.

Lekschas, Fritz, Harald Stachelscheid, Stefanie Seltmann, and Andreas Kurtz. 2015. “Semantic Body Browser: Graphical Exploration of an Organism and Spatially Resolved Expression Data Visualization.” Bioinformatics 31 (5): 794–96.

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for Rna-Seq Data with Deseq2.” Genome Biology 15 (12): 550. https://doi.org/10.1186/s13059-014-0550-8.

Maag, Jesper L V. 2018. “Gganatogram: An R Package for Modular Visualisation of Anatograms and Tissues Based on Ggplot2.” F1000Res. 7 (September): 1576.

McCarthy, Davis J., Chen, Yunshun, Smyth, and Gordon K. 2012. “Differential Expression Analysis of Multifactor Rna-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97.

Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.

Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. 2014. “BrainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” R J. 6 (1): 41–48.

Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.

Pagès, Hervé. 2020. HDF5Array: HDF5 Backend for Delayedarray Objects.

Papatheodorou, Irene, Nuno A Fonseca, Maria Keays, Y Amy Tang, Elisabet Barrera, Wojciech Bazant, Melissa Burke, et al. 2018. “Expression Atlas: gene and protein expression across multiple studies and organisms.” Nucleic Acids Res. 46 (D1): D246–D251. https://doi.org/10.1093/nar/gkx1158.

Prudencio, Mercedes, Veronique V Belzil, Ranjan Batra, Christian A Ross, Tania F Gendron, Luc J Pregent, Melissa E Murray, et al. 2015. “Distinct Brain Transcriptome Profiles in C9orf72-Associated and Sporadic ALS.” Nat. Neurosci. 18 (8): 1175–82.

Ravasz, E, A L Somera, D A Mongru, Z N Oltvai, and A L Barabási. 2002. “Hierarchical Organization of Modularity in Metabolic Networks.” Science 297 (5586): 1551–5.

Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. “limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.

Tiberi, Simone, and Mark D. Robinson. 2020. Distinct: Distinct: A Method for Differential Analyses via Hierarchical Permutation Tests. https://github.com/SimoneTiberi/distinct.

Waese, Jamie, Jim Fan, Asher Pasha, Hans Yu, Geoffrey Fucile, Ruian Shi, Matthew Cumming, et al. 2017. “EPlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.” Plant Cell 29 (8): 1806–21.

Winter, Debbie, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, and Nicholas J Provart. 2007. “An ‘Electronic Fluorescent Pictograph’ Browser for Exploring and Analyzing Large-Scale Biological Data Sets.” PLoS One 2 (8): e718.