1 Overview

The muSpaData package includes datasets for use in the DESpace package’s examples and vignettes.

It provides access to a publicly available Stereo-seq spatial dataset with a complex experimental design.

This dataset, containing multiple samples (e.g., serial sections) measured under various experimental conditions (e.g., time points), is formatted as SpatialExperiment (SPE) Bioconductor objects.

2 Available datasets

The table below provides details about the available dataset, including its unique identifier (ID), description, source, and reference.

View details directly in R using ?ID (e.g., ?Wei22_full).

ID	Description	Availability	Reference
`Wei22_full`	Single-cell Stereo-seq spatial transcriptomics data from axolotl brain tissue, collected from multiple sections across several regeneration stages (16 samples in total)	Spatial Transcript Omics DataBase (STOmics DB), STDS0000056	Wei et al. (2022); Singhal et al. (2024)
`Wei22_example`	A subset of the Wei22_full dataset, restricted to fewer genes and regeneration stages (6 samples in total)	Spatial Transcript Omics DataBase (STOmics DB), STDS0000056	Wei et al. (2022); Singhal et al. (2024)

After downloading the raw data from the original source, we merge samples across the different time phases, perform quality control to filter out low-quality genes and cells, and apply Banksy (Singhal et al. 2024) for multi-sample clustering and smoothing.

The finalized SPE objects are made available via Bioconductor’s ExperimentHub for easy access and reproducibility.
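The quality-control step described above can be pictured with a minimal base-R sketch. The toy count matrix and both thresholds (`min_cells_per_gene`, `min_counts_per_cell`) are assumptions made purely for illustration; the criteria actually used to build the muSpaData objects are not stated in this vignette.

```r
# Toy count matrix: 6 genes x 5 cells (values are made up for illustration)
counts <- matrix(
    c(0, 0, 0, 0, 0,
      5, 3, 0, 2, 4,
      1, 0, 0, 0, 0,
      7, 8, 0, 6, 9,
      2, 2, 0, 3, 1,
      0, 4, 0, 5, 2),
    nrow = 6, byrow = TRUE,
    dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5))
)

# Hypothetical thresholds, NOT the ones used to build muSpaData
min_cells_per_gene <- 2   # keep genes detected in at least 2 cells
min_counts_per_cell <- 5  # keep cells with at least 5 total counts

# Logical filters computed on the unfiltered matrix
keep_genes <- rowSums(counts > 0) >= min_cells_per_gene
keep_cells <- colSums(counts) >= min_counts_per_cell

# Apply both filters at once
filtered <- counts[keep_genes, keep_cells, drop = FALSE]
dim(filtered)
```

The same kind of logical vectors could in principle be used to subset an SPE by rows (genes) and columns (cells).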

3 Installation

muSpaData is an R package available from the Bioconductor package repository; its source code is also hosted on GitHub.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("muSpaData")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

Then load packages:

suppressMessages({
    library(muSpaData)
    library(ExperimentHub)
    library(ggplot2)
})

4 Data loading

All datasets in muSpaData can be loaded either through named functions corresponding to the object names or via the ExperimentHub interface.

Each SPE contains filtered counts in the assays slot (counts and logcounts), with Banksy cluster assignments stored in the Banksy_smooth column of the colData slot.

4.1 Via functions

# Load the small example spe data
(spe <- Wei22_example())

## class: SpatialExperiment 
## dim: 5000 55660 
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
##   AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
##   CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id

# If you want to download the full data (about 5.2 GB in RAM) use:
if (benchmarkme::get_ram() > 5e9) {
    Wei22_full()
}
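The column names printed above appear to follow a `CELL.<number>.<sample>` pattern, which hints at how cells map to samples and time points. The base-R sketch below parses a few of those names. It assumes that the suffix encodes the sample and that the part before the underscore encodes the regeneration stage; the printed names suggest this, but the vignette does not state it, so `sample_id` and `condition` in `colData` remain the authoritative source.

```r
# A few cell identifiers copied from the printed SPE summary
cell_ids <- c(
    "CELL.17879.10DPI_1", "CELL.17922.10DPI_1",
    "CELL.9123.2DPI_2", "CELL.9124.2DPI_2"
)

# Assumption: everything after "CELL.<number>." is the sample label
sample_label <- sub("^CELL\\.[0-9]+\\.", "", cell_ids)

# Assumption: the part before the trailing "_<number>" is the stage
stage_label <- sub("_[0-9]+$", "", sample_label)

# Number of cells per inferred sample
table(sample_label)
```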

4.2 Using `query` via `ExperimentHub`

First, initialize a Hub instance with ExperimentHub to load all records into the variable eh.

Use query to identify muSpaData records and their accession IDs (e.g., EH123), then load the data into R with eh[[id]].

# Connect to ExperimentHub and create Hub instance
eh <- ExperimentHub()
(q <- query(eh, "muSpaData"))

## ExperimentHub with 2 records
## # snapshotDate(): 2025-04-11
## # $dataprovider: Spatial Transcript Omics DataBase (STOmics DB)
## # $species: Ambystoma mexicanum
## # $rdataclass: SpatialExperiment
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH9612"]]' 
## 
##            title        
##   EH9612 | Wei22_full   
##   EH9613 | Wei22_example

# Load the first resource in the list (Wei22_full, the large dataset)
q[[1]]

## class: SpatialExperiment 
## dim: 13890 147432 
## metadata(0):
## assays(2): counts logcounts
## rownames(13890): AMEX60DD000003 AMEX60DD000004 ... AMEX60DDU001041989
##   AMEX60DDU001042129
## rowData names(2): gene_name gene_id
## colnames(147432): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
##   CELL.9330.5DPI_3 CELL.9331.5DPI_3
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id

# load by accession id
eh[["EH9613"]]

## class: SpatialExperiment 
## dim: 5000 55660 
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
##   AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
##   CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id
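Positional access such as `q[[1]]` depends on record order, and here the first record is the large `Wei22_full` object. Looking up the accession ID by title is a more explicit alternative. The sketch below uses a small hand-written data frame (`records`, a hypothetical stand-in) that mirrors the two records printed above, so it runs without a hub connection; with a live hub, a comparable table should be available through `mcols(q)`.

```r
# Hand-written stand-in for the record table printed by query();
# row names play the role of the ExperimentHub accession IDs.
records <- data.frame(
    title = c("Wei22_full", "Wei22_example"),
    row.names = c("EH9612", "EH9613")
)

# Find the accession ID whose title matches the dataset we want
wanted <- "Wei22_example"
id <- rownames(records)[records$title == wanted]
id
```

The resulting ID can then be passed to `eh[[id]]`, as in the accession-based example above.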

4.3 Using `list/loadResources`

To facilitate data discovery within muSpaData rather than across all of ExperimentHub, available records can be viewed using listResources.

To load a specific dataset or subset, use loadResources.

listResources(eh, "muSpaData")

## [1] "Wei22_full"    "Wei22_example"

# load data using a character vector of metadata search terms 
loadResources(eh, "muSpaData", c("example"))

## [[1]]
## class: SpatialExperiment 
## dim: 5000 55660 
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
##   AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
##   CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id

5 Explore the data

Since manual annotations are unavailable in the original dataset, we used Banksy (Singhal et al. 2024) to define spatial domains by jointly modeling multiple samples.

The Banksy spatial cluster assignments are available in the colData().

# Plot the Banksy spatial clusters, one panel per sample
CD <- colData(spe) |> as.data.frame()
ggplot(CD, aes(x = sdimx, y = sdimy, color = factor(Banksy_smooth))) +
    geom_point(size = 0.25) +
    theme_void() +
    theme(legend.position = "bottom") +
    facet_wrap(~ sample_id, scales = "free") +
    labs(color = "", title = "Banksy spatial clusters")
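A numeric summary can complement the plot, for instance the share of cells assigned to each cluster within each sample. The sketch below uses a toy data frame (`CD_toy`) with the same `sample_id` and `Banksy_smooth` column names as the `colData`; its sample labels and cluster values are placeholders, not real data.

```r
# Toy stand-in for as.data.frame(colData(spe)); values are made up
CD_toy <- data.frame(
    sample_id = rep(c("2DPI_1", "2DPI_2"), each = 4),
    Banksy_smooth = c(1, 1, 2, 3, 2, 2, 2, 3)
)

# Cross-tabulate cells by sample (rows) and cluster (columns)
tab <- table(CD_toy$sample_id, CD_toy$Banksy_smooth)

# Convert counts to within-sample proportions (each row sums to 1)
prop <- prop.table(tab, margin = 1)
round(prop, 2)
```

With the real data, the same two calls could be applied to the `CD` data frame built in the plotting code.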

6 Session info

sessionInfo()

## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SpatialExperiment_1.18.0    SingleCellExperiment_1.30.0
##  [3] SummarizedExperiment_1.38.0 Biobase_2.68.0             
##  [5] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
##  [7] IRanges_2.42.0              S4Vectors_0.46.0           
##  [9] MatrixGenerics_1.20.0       matrixStats_1.5.0          
## [11] ggplot2_3.5.2               muSpaData_1.0.0            
## [13] ExperimentHub_2.16.0        AnnotationHub_3.16.0       
## [15] BiocFileCache_2.16.0        dbplyr_2.5.0               
## [17] BiocGenerics_0.54.0         generics_0.1.3             
## [19] BiocStyle_2.36.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        dplyr_1.1.4             farver_2.1.2           
##  [4] blob_1.2.4              filelock_1.0.3          Biostrings_2.76.0      
##  [7] fastmap_1.2.0           digest_0.6.37           mime_0.13              
## [10] lifecycle_1.0.4         KEGGREST_1.48.0         RSQLite_2.3.9          
## [13] magrittr_2.0.3          compiler_4.5.0          rlang_1.1.6            
## [16] sass_0.4.10             tools_4.5.0             yaml_2.3.10            
## [19] knitr_1.50              S4Arrays_1.8.0          labeling_0.4.3         
## [22] bit_4.6.0               curl_6.2.2              DelayedArray_0.34.0    
## [25] abind_1.4-8             withr_3.0.2             purrr_1.0.4            
## [28] grid_4.5.0              colorspace_2.1-1        scales_1.3.0           
## [31] tinytex_0.57            cli_3.6.4               rmarkdown_2.29         
## [34] crayon_1.5.3            httr_1.4.7              rjson_0.2.23           
## [37] DBI_1.2.3               cachem_1.1.0            AnnotationDbi_1.70.0   
## [40] BiocManager_1.30.25     XVector_0.48.0          vctrs_0.6.5            
## [43] Matrix_1.7-3            jsonlite_2.0.0          bookdown_0.43          
## [46] bit64_4.6.0-1           magick_2.8.6            jquerylib_0.1.4        
## [49] glue_1.8.0              gtable_0.3.6            BiocVersion_3.21.1     
## [52] UCSC.utils_1.4.0        munsell_0.5.1           tibble_3.2.1           
## [55] pillar_1.10.2           rappdirs_0.3.3          htmltools_0.5.8.1      
## [58] GenomeInfoDbData_1.2.14 R6_2.6.1                evaluate_1.0.3         
## [61] lattice_0.22-7          png_0.1-8               memoise_2.0.1          
## [64] bslib_0.9.0             Rcpp_1.0.14             SparseArray_1.8.0      
## [67] xfun_0.52               pkgconfig_2.0.3

References

Singhal, V, N Chou, J Lee, Y Yue, J Liu, W. K Chock, L Lin, et al. 2024. “BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.

Wei, X, S Fu, H Li, and Y Gu. 2022. “Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration.” Science 377 (6610). https://doi.org/10.1126/science.abp9444.

Multi-sample multi-group spatial transcriptomics data

17 April 2025

Contents

1 Overview

2 Available datasets

3 Installation

4 Data loading

4.1 Via functions

4.2 Using `query` via `ExperimentHub`

4.3 Using `list/loadResources`

5 Explore the data

6 Session info

References
