The muSpaData package provides datasets used in the examples and vignettes of the DESpace package. It gives access to a publicly available Stereo-seq spatial transcriptomics dataset with a complex experimental design. The dataset contains multiple samples (e.g., serial sections) measured under different experimental conditions (e.g., time points) and is distributed as SpatialExperiment (SPE) Bioconductor objects.
The table below lists the available datasets with their unique identifier (ID), description, source, and reference. Details can also be viewed directly in R with ?ID (e.g., ?Wei22_full).
ID | Description | Availability | Reference |
---|---|---|---|
Wei22_full | Single-cell Stereo-seq spatial transcriptomics data from axolotl brain tissue sections collected across multiple regeneration stages (16 samples in total) | Spatial Transcript Omics DataBase (STOmics DB), STDS0000056 | Wei et al. (2022); Singhal et al. (2024) |
Wei22_example | A subset of Wei22_full with fewer genes and regeneration stages (6 samples in total) | Spatial Transcript Omics DataBase (STOmics DB), STDS0000056 | Wei et al. (2022); Singhal et al. (2024) |
After downloading the raw data from the original source, we merge samples across time points, perform quality control to filter out low-quality genes and cells, and apply Banksy (Singhal et al. 2024) for multi-sample clustering and smoothing of the cluster labels.
The finalized SPE objects are made available via Bioconductor’s ExperimentHub for easy access and reproducibility.
muSpaData is an R package available from the Bioconductor repository; its source code is also hosted on GitHub. To install it:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("muSpaData")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
Then load the packages:
suppressMessages({
library(muSpaData)
library(ExperimentHub)
library(ggplot2)
})
All datasets in muSpaData can be loaded either through named functions corresponding to the object names (e.g., Wei22_example()) or via the ExperimentHub interface. Each SPE stores the filtered counts in the counts assay (alongside a logcounts assay), and the Banksy spatial cluster labels in the Banksy_smooth column of colData.
# Load the small example spe data
(spe <- Wei22_example())
## class: SpatialExperiment
## dim: 5000 55660
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
## AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
## CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id
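The assay and colData layout described above can be checked directly on the loaded object. The sketch below only uses standard SummarizedExperiment/SpatialExperiment accessors (assayNames, rowData, colData, spatialCoords); the particular checks are illustrative and not part of the muSpaData API.

```r
suppressMessages({
    library(muSpaData)
    library(SpatialExperiment)
})

# Load the example dataset (downloaded and cached via ExperimentHub on first use)
spe <- Wei22_example()

# Assays stored in the object: filtered counts and log-counts
assayNames(spe)

# Gene symbols are kept in rowData, next to the gene IDs used as rownames
head(rowData(spe)$gene_name)

# Experimental design: number of cells per sample and condition
table(spe$sample_id, spe$condition)

# Banksy spatial cluster labels and the spatial coordinates of each cell
table(spe$Banksy_smooth)
head(spatialCoords(spe))
```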
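# The full dataset needs about 5.2 GB of RAM; load it only if enough memory is
# available (get_ram() comes from the benchmarkme package, which must be installed)
if (benchmarkme::get_ram() > 5e9) {
    spe_full <- Wei22_full()
}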
query via ExperimentHub
First, initialize a Hub instance with ExperimentHub() to load all records into the variable eh. Use query() to identify the muSpaData records and their accession IDs (e.g., EH123), then load the data into R with eh[[id]].
# Connect to ExperimentHub and create Hub instance
eh <- ExperimentHub()
(q <- query(eh, "muSpaData"))
## ExperimentHub with 2 records
## # snapshotDate(): 2025-03-24
## # $dataprovider: Spatial Transcript Omics DataBase (STOmics DB)
## # $species: Ambystoma mexicanum
## # $rdataclass: SpatialExperiment
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH9612"]]'
##
## title
## EH9612 | Wei22_full
## EH9613 | Wei22_example
# load the first resource in the list
q[[1]]
## class: SpatialExperiment
## dim: 13890 147432
## metadata(0):
## assays(2): counts logcounts
## rownames(13890): AMEX60DD000003 AMEX60DD000004 ... AMEX60DDU001041989
## AMEX60DDU001042129
## rowData names(2): gene_name gene_id
## colnames(147432): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
## CELL.9330.5DPI_3 CELL.9331.5DPI_3
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id
# load by accession id
eh[["EH9613"]]
## class: SpatialExperiment
## dim: 5000 55660
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
## AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
## CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id
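Instead of copying an accession ID by hand, the ID can also be looked up from the record title. A minimal sketch, assuming the titles shown in the query output above ("Wei22_full", "Wei22_example"); it relies on names() returning the accession IDs of a Hub object and $title returning the record titles.

```r
suppressMessages(library(ExperimentHub))

eh <- ExperimentHub()
q <- query(eh, "muSpaData")

# Map record titles to accession IDs
ids <- setNames(names(q), q$title)
ids

# Look up the accession ID of the example dataset by its title, then load it
id <- ids[["Wei22_example"]]
spe <- eh[[id]]
```

Looking the ID up by title keeps the code readable and avoids typos in the accession ID.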
listResources and loadResources
To search only within muSpaData rather than across all of ExperimentHub, list the available records with listResources. To load a specific dataset or subset, use loadResources.
listResources(eh, "muSpaData")
## [1] "Wei22_full" "Wei22_example"
# load data using a character vector of metadata search terms
loadResources(eh, "muSpaData", c("example"))
## [[1]]
## class: SpatialExperiment
## dim: 5000 55660
## metadata(0):
## assays(2): counts logcounts
## rownames(5000): AMEX60DD009830 AMEX60DD009962 ... AMEX60DD004094
## AMEX60DD054542
## rowData names(2): gene_name gene_id
## colnames(55660): CELL.17879.10DPI_1 CELL.17922.10DPI_1 ...
## CELL.9123.2DPI_2 CELL.9124.2DPI_2
## colData names(5): sample_id condition Banksy_smooth sdimx sdimy
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : sdimx sdimy
## imgData names(1): sample_id
Since manual annotations are not available for the original dataset, we used Banksy (Singhal et al. 2024) to define spatial domains by jointly modeling multiple samples. The resulting spatial cluster assignments are stored in the Banksy_smooth column of colData(spe).
# Plot the Banksy spatial clusters of each sample
CD <- colData(spe) |> as.data.frame()
ggplot(CD,
       aes(x = sdimx, y = sdimy,
           color = factor(Banksy_smooth))) +
    geom_point(size = 0.25) +
    theme_void() +
    theme(legend.position = "bottom") +
    facet_wrap(~ sample_id, scales = "free") +
    labs(color = "", title = "Banksy spatial clusters")
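Besides the spatial plot, a cross-tabulation gives a quick numeric view of how the Banksy clusters are distributed across samples. This is a small sketch using base R's table() and prop.table() on the example dataset; the choice of row-wise proportions is just one convenient summary.

```r
suppressMessages({
    library(muSpaData)
    library(SpatialExperiment)
})
spe <- Wei22_example()

# Number of cells in each Banksy spatial cluster, per sample
tab <- table(sample = spe$sample_id, cluster = spe$Banksy_smooth)
tab

# Row-wise proportions: fraction of each sample's cells assigned to each cluster
props <- prop.table(tab, margin = 1)
round(props, 3)
```

Comparing rows of the proportion table shows whether a spatial domain is present in every sample, which matters when testing for spatially variable genes across conditions.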
sessionInfo()
## R Under development (unstable) (2025-03-13 r87965)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpatialExperiment_1.17.0 SingleCellExperiment_1.29.2
## [3] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [5] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
## [7] IRanges_2.41.3 S4Vectors_0.45.4
## [9] MatrixGenerics_1.19.1 matrixStats_1.5.0
## [11] ggplot2_3.5.1 muSpaData_0.99.5
## [13] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [15] BiocFileCache_2.15.1 dbplyr_2.5.0
## [17] BiocGenerics_0.53.6 generics_0.1.3
## [19] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
## [4] blob_1.2.4 filelock_1.0.3 Biostrings_2.75.4
## [7] fastmap_1.2.0 digest_0.6.37 mime_0.13
## [10] lifecycle_1.0.4 KEGGREST_1.47.0 RSQLite_2.3.9
## [13] magrittr_2.0.3 compiler_4.6.0 rlang_1.1.5
## [16] sass_0.4.9 tools_4.6.0 yaml_2.3.10
## [19] knitr_1.50 S4Arrays_1.7.3 labeling_0.4.3
## [22] bit_4.6.0 curl_6.2.2 DelayedArray_0.33.6
## [25] abind_1.4-8 withr_3.0.2 purrr_1.0.4
## [28] grid_4.6.0 colorspace_2.1-1 scales_1.3.0
## [31] tinytex_0.56 cli_3.6.4 rmarkdown_2.29
## [34] crayon_1.5.3 httr_1.4.7 rjson_0.2.23
## [37] DBI_1.2.3 cachem_1.1.0 AnnotationDbi_1.69.0
## [40] BiocManager_1.30.25 XVector_0.47.2 vctrs_0.6.5
## [43] Matrix_1.7-3 jsonlite_1.9.1 bookdown_0.42
## [46] bit64_4.6.0-1 magick_2.8.6 jquerylib_0.1.4
## [49] glue_1.8.0 gtable_0.3.6 BiocVersion_3.21.1
## [52] UCSC.utils_1.3.1 munsell_0.5.1 tibble_3.2.1
## [55] pillar_1.10.1 rappdirs_0.3.3 htmltools_0.5.8.1
## [58] GenomeInfoDbData_1.2.14 R6_2.6.1 evaluate_1.0.3
## [61] lattice_0.22-6 png_0.1-8 memoise_2.0.1
## [64] bslib_0.9.0 Rcpp_1.0.14 SparseArray_1.7.7
## [67] xfun_0.51 pkgconfig_2.0.3
Singhal, V, N Chou, J Lee, Y Yue, J Liu, WK Chock, L Lin, et al. 2024. “BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis.” Nature Genetics 56: 431–41. https://doi.org/10.1038/s41588-024-01664-3.
Wei, X, S Fu, H Li, and Y Gu. 2022. “Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration.” Science 377 (6610). https://doi.org/10.1126/science.abp9444.