escheR
escheR 1.4.0
The goal of escheR
is to create an unified multi-dimensional spatial visualizations for spatially-resolved transcriptomics data following Gestalt principles.
Our manuscript describing the innovative visualization is published in Bioinformatics Advances.
You can install the latest release version of escheR
from Bioconductor via the following code. Additional details are shown on the Bioconductor page.
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("escheR")
The latest development version can also be installed from the devel
version of Bioconductor or from GitHub following
if (!require("devtools")) install.packages("devtools")
devtools::install_github("boyiguo1/escheR")
# `devel` version from Bioconductor
BiocManager::install(version='devel')
Starting from Version 1.2.0, escheR
package supports three data structures, including SpatialExperiment
, SingleCellExperiment
, and data.frame
from base
R.
In the following example, we demonstrate the how to use escheR
with a SpatialExperiment
object. Please visit our other tutorials for [TODO: add items and list].
To run the demonstration, there are two necessary packages to load, escheR
and STexampleData
. STexampleData
contains a pre-processed 10x Visium dataset.
To note, escheR
will automatically load ggplot2
package. Hence, explicitly loading ggplot2
is not required.
library(escheR)
library(STexampleData)
In this step, we will find one 10xVisium sample from STexampleData
package, indexed by brain number of “151673”. For more information, please see the vignettes of STexampleData
.
spe <- Visium_humanDLPFC()
# Subset in-tissue spots
spe <- spe[, spe$in_tissue == 1]
spe <- spe[, !is.na(spe$ground_truth)]
Here is a summary of the SpatialExperiment
object called spe
.
spe
#> class: SpatialExperiment
#> dim: 33538 3611
#> metadata(0):
#> assays(1): counts
#> rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
#> ENSG00000268674
#> rowData names(3): gene_id gene_name feature_type
#> colnames(3611): AAACAAGTATCTCCCA-1 AAACAATCTACTAGCA-1 ...
#> TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(8): barcode_id sample_id ... reference cell_count
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
escheR
plotSimilar to ggplot2::ggplot()
, we first use the function make_escheR()
to create an empty plot. The input of make_escheR()
is a SpatialExperiment
object. The output of the function is a ggplot
object with no layer in it.
p <- make_escheR(spe)
Unlike ggplot2
, we use piping |>
instead of +
to apply layers the figure. Mainly, we have three functions add_fill
, add_ground
, add_symbol
. The inputs of these add_*
functions include the plots created using make_scheR()
and the variable name for the layer. Currently, the variable name should be in the the column data of the spe
object, i.e. colData(spe)
.
Here we first apply the add_fill
to add the spots color-coded by the total number of cells all spots(sum_umi
).
(p1 <- p |>
add_fill(var = "cell_count"))
It is okay to use any combination of the add_*
functions. For example, we want to show the spatial domains of the samples as the ground of the figure and use symbols to denote if each spot is within the outline of the tissue slice. In this example, all plotted spots are in the outlines of the tissue slice and hence marked with dots.
(p2 <- p |>
add_ground(var = "ground_truth")) # round layer
p2 |>
add_symbol(var = "ground_truth", size = 0.2) # Symbol layer
#> Warning: The shape palette can deal with a maximum of 6 discrete values because more
#> than 6 becomes difficult to discriminate
#> ℹ you have requested 7 values. Consider specifying shapes manually if you need
#> that many have them.
#> Warning: Removed 513 rows containing missing values or values outside the scale range
#> (`geom_point()`).
It is okay to change the ordering of these add_*
functions. However, we advise to always have the add_fill
as the first step to achieve the best visual effect due to the laying mechanism.
add_fill
and add_ground
To maximize the utility of the multi-dimensional plot by applying both color-coded layers using add_fill()
and add_ground()
, it is important to choose minimally interfering color-palette for the fill
and ground
to avoid visualization confusion. The following demonstration provide some examples for simultaneously visualization of two variables regardless of their types (continuous vs categorical, or categorical vs categorical.)
The following example visualizes the differential gene expression of MOBP, a marker gene for white matter, across different spatial domains. The default color palette, viridis
, are not easily visible with color-coded spatial domains as there are overlapping in the color space, which could lead to possible confusion.
# Prep data
# Adding gene counts for MBP to the colData
spe$counts_MOBP <- counts(spe)[which(rowData(spe)$gene_name=="MOBP"),]
(p <- make_escheR(spe) |>
add_fill(var = "counts_MOBP") |>
add_ground(var = "ground_truth", stroke = 0.5))
To improve the visualization, we choose to use a color that is not included in the color palette for ground_truth
, which is the color black. Specifically, we use a color gradient from white (no expression) to black (maximum of gene counts) to represent the expression of MOBP in each spot. By using the white-black color gradient for the gene expression, we minimize the overlapping of the choice of color for spatial domains.
(p2 <- p +
scale_fill_gradient(low = "white", high = "black"))
After customizing the color palettes to be minimally overlapping, it is easier to observe that MOBP has higher raw gene counts in the white matter (WM
) region than other regions.
In this example, we demonstrate how to optimize color palettes for visualizing two categorical variables. We first create an arbitrary 5-level categorical variable called tmp_group
, representing different horizontal regions of the tissue section.
spe$tmp_group <- cut(
spe$array_row,
breaks = c(min(spe$array_row)-1 ,
fivenum(spe$array_row))
)
table(spe$tmp_group)
#>
#> (-1,0] (0,18] (18,36] (36,52] (52,73]
#> 40 864 933 893 881
Following the principle to avoid overlapping of two color palettes, we use gradients of blue for different levels of tmp_group
.
make_escheR(spe) |>
add_fill(var = "tmp_group") |>
add_ground(var = "ground_truth", stroke = 0.5) +
scale_fill_brewer() +
theme_void()
Here is another example where we try another manually-curated color-palette. We follow the same principle, minimize overlapping of two color-palettes for ground (scale_color_manual
) and fill (scale_fill_brewer
) respectively. Specifically, we use gradients of blue to show tmp_group
and other colors for spatial domains ground_truth
.
make_escheR(spe) |>
add_fill(var = "tmp_group") |>
add_ground(var = "ground_truth", stroke = 0.5) +
scale_fill_brewer() +
scale_color_manual(
name = "", # turn off legend name for ground_truth
values = c(
"Layer1" = "#F0027F",
"Layer2" = "transparent",
"Layer3" = "#4DAF4A",
"Layer4" = "#984EA3",
"Layer5" = "#FFD700",
"Layer6" = "#FF7F00",
"WM" = "#1A1A1A")
)
NOTE: It would be intrinsically difficult to construct a clear, intuitive and precise visualization when the number of categories/levels is large.
In this vignettes, we don’t provide or recommend specific color palettes, because the selection of color palettes is highly relevant to the underlying message and heterogeneous across analysis and studies, e.g. sequential palettes, qualitative palette, and divergent palette. Instead, we direct interested user to explore the topic on bivariate color palette. The blog post by Jakub Nowosad and R package biscale
could be helpful to optimize your color palette for bivariate visualization.
In addition, if color palette is extremely to curate, e.g. large number of levels, it is possible to use symbols (add_symbol()
) to annotate specific levels to avoid clutter in the color space.
Given that the escheR
package is developed based on ggplot2
, aesthetics can be easily adjusted following the ggplot2
syntax. For example, given a escheR
plot object, one can use +
with theme_*
, scale_*
functions.
For example, to change the aesthetics of each layer, one can simply use the scale_*
from ggplot2
to optimize the final visual presentation. For example, to optimize add_fill
, one can use scale_fill_*
; to optimize add_ground
, one can use scale_color_*
; to optimize add_sumbol
, one use scale_shape_*
. Here, we demonstrate how to change the color for the ground layer ( add_ground
) using scale_color_manual
.
(p_final <- p2 +
scale_color_manual(
name = "", # No legend name
values = c(
"Layer1" = "#F0027F",
"Layer2" = "#377EB8",
"Layer3" = "#4DAF4A",
"Layer4" = "#984EA3",
"Layer5" = "#FFD700",
"Layer6" = "#FF7F00",
"WM" = "#1A1A1A")
) +
labs(title = "Example Title"))
The easiest way to show only a subset of levels of a categorical variable is to create a new variable where all the unwanted levels will be set to NA values. Please see the example below
table(spe$ground_truth, useNA = "ifany")
#>
#> Layer1 Layer2 Layer3 Layer4 Layer5 Layer6 WM
#> 273 253 989 218 673 692 513
spe$tmp_fac <- factor(spe$ground_truth,
levels = c("Layer1", "Layer2"))
table(spe$tmp_fac, useNA = "ifany")
#>
#> Layer1 Layer2 <NA>
#> 273 253 3085
make_escheR(spe) |>
add_ground(var = "ground_truth") |>
add_symbol(var = "tmp_fac", size = 0.4) +
scale_shape_manual(
values=c(3, 16), #> Set different symbols for the 2 levels
breaks = c("Layer1", "Layer2") #> Remove NA from legend
)
#> Warning: Removed 3085 rows containing missing values or values outside the scale range
#> (`geom_point()`).
By design, make_escheR
operates on only one sample. In order to create a figure compiling the spatial plots for multiple samples, individual plots are required via a series of calls to make_escheR
, possibly via a for
loop or an iterator function (e.g. lapply
).
# Create a list of `escheR` plots
plot_list <- unique(spe$sample_id) |> # Create a list of sample names
lapply(FUN = function(.sample_id){ # Iterate over all samples
spe_single <- spe[, spe$sample_id == .sample_id]
make_escheR(spe_single) |>
add_fill(var = "counts_MOBP") |>
add_ground(var = "ground_truth", stroke = 0.5))
# Customize theme
})
Given all plots made for individual samples are stored in a preferred data structure
(e.g. a list
), one can use many functions, e.g. cowplot::plot_grid
, patchwork
, to compile and arrange
individual plots to a paneled figure. The following example uses
ggpubr::ggarrange
to create a figure from a list of escheR
plots.
library(ggpubr)
plot_list <- list(p2, p2)
ggarrange(
plotlist = plot_list,
ncol = 2, nrow = 1,
common.legend = TRUE)
The procedure to save escheR
plots is exactly the same as saving a ggplot
object.
In the example below, we use the function ggplot2::ggsave()
to save escheR
plots in the pdf
format.
ggsave(
filename = "path/file_name.pdf",
plot = p_final
)
utils::sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggpubr_0.6.0 STexampleData_1.12.0
#> [3] SpatialExperiment_1.14.0 SingleCellExperiment_1.26.0
#> [5] SummarizedExperiment_1.34.0 Biobase_2.64.0
#> [7] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
#> [9] IRanges_2.38.0 S4Vectors_0.42.0
#> [11] MatrixGenerics_1.16.0 matrixStats_1.3.0
#> [13] ExperimentHub_2.12.0 AnnotationHub_3.12.0
#> [15] BiocFileCache_2.12.0 dbplyr_2.5.0
#> [17] BiocGenerics_0.50.0 escheR_1.4.0
#> [19] ggplot2_3.5.1 BiocStyle_2.32.0
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.2 gridExtra_2.3 rlang_1.1.3
#> [4] magrittr_2.0.3 compiler_4.4.0 RSQLite_2.3.6
#> [7] png_0.1-8 vctrs_0.6.5 pkgconfig_2.0.3
#> [10] crayon_1.5.2 fastmap_1.2.0 backports_1.4.1
#> [13] magick_2.8.3 XVector_0.44.0 labeling_0.4.3
#> [16] utf8_1.2.4 rmarkdown_2.26 UCSC.utils_1.0.0
#> [19] tinytex_0.51 purrr_1.0.2 bit_4.0.5
#> [22] xfun_0.44 zlibbioc_1.50.0 cachem_1.1.0
#> [25] jsonlite_1.8.8 blob_1.2.4 highr_0.10
#> [28] DelayedArray_0.30.1 broom_1.0.5 R6_2.5.1
#> [31] bslib_0.7.0 RColorBrewer_1.1-3 car_3.1-2
#> [34] jquerylib_0.1.4 Rcpp_1.0.12 bookdown_0.39
#> [37] knitr_1.46 Matrix_1.7-0 tidyselect_1.2.1
#> [40] abind_1.4-5 yaml_2.3.8 curl_5.2.1
#> [43] lattice_0.22-6 tibble_3.2.1 withr_3.0.0
#> [46] KEGGREST_1.44.0 evaluate_0.23 Biostrings_2.72.0
#> [49] pillar_1.9.0 BiocManager_1.30.23 filelock_1.0.3
#> [52] carData_3.0-5 generics_0.1.3 BiocVersion_3.19.1
#> [55] munsell_0.5.1 scales_1.3.0 glue_1.7.0
#> [58] tools_4.4.0 ggsignif_0.6.4 cowplot_1.1.3
#> [61] grid_4.4.0 tidyr_1.3.1 AnnotationDbi_1.66.0
#> [64] colorspace_2.1-0 GenomeInfoDbData_1.2.12 cli_3.6.2
#> [67] rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.4.0
#> [70] viridisLite_0.4.2 dplyr_1.1.4 gtable_0.3.5
#> [73] rstatix_0.7.2 sass_0.4.9 digest_0.6.35
#> [76] SparseArray_1.4.4 rjson_0.2.21 farver_2.1.2
#> [79] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
#> [82] httr_1.4.7 mime_0.12 bit64_4.0.5