STexampleData 1.12.3
The STexampleData package contains a collection of spatial transcriptomics datasets, formatted as objects of the SpatialExperiment Bioconductor class, for use in examples, demonstrations, and tutorials. (One non-spatial single-cell reference dataset is provided as a SingleCellExperiment instead.) The datasets come from several different technological platforms and were obtained from various publicly available sources. Some of the datasets include images and/or reference annotation labels.
The STexampleData package can be installed from Bioconductor:
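# install BiocManager only if it is not already available
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("STexampleData")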
The package contains the following datasets:
Visium_humanDLPFC
(10x Genomics Visium): A single sample (sample 151673) of human dorsolateral prefrontal cortex (DLPFC), measured using the 10x Genomics Visium platform. This is a subset of the full dataset of 12 samples from 3 neurotypical donors published by Maynard and Collado-Torres et al. (2021). The full dataset is available from the spatialLIBD Bioconductor package.
Visium_mouseCoronal
(10x Genomics Visium): A single coronal section from the mouse brain, spanning one hemisphere. This dataset was previously released by 10x Genomics on their website.
seqFISH_mouseEmbryo
(seqFISH): A subset of cells (embryo 1, z-slice 2) from a previously published dataset investigating mouse embryogenesis by Lohoff and Ghazanfar et al. (2022), generated using the seqFISH platform. The full dataset is available online.
ST_mouseOB
(Spatial Transcriptomics): A single sample from the mouse olfactory bulb (OB), measured with the Spatial Transcriptomics platform (Stahl et al. 2016). This dataset contains annotations for five cell layers from the original authors.
SlideSeqV2_mouseHPC
(Slide-seqV2): A single sample of mouse brain from the hippocampus (HPC) and surrounding regions, measured with the Slide-seqV2 platform (Stickels et al. 2021). This dataset contains cell type annotations generated by Cable et al. (2022).
Janesick_breastCancer_Chromium
(10x Genomics Chromium): Single-cell RNA sequencing data from the human breast cancer study by Janesick et al. (2023), which mapped the breast cancer tumor microenvironment at high resolution using integrated single-cell, spatial, and in situ analysis of FFPE tissue. Includes cell type annotations from the original authors. This dataset is provided as a SingleCellExperiment rather than a SpatialExperiment.
Janesick_breastCancer_Visium
(10x Genomics Visium): Visium spatial transcriptomics data from the human breast cancer study by Janesick et al. (2023), which mapped the breast cancer tumor microenvironment at high resolution using integrated single-cell, spatial, and in situ analysis of FFPE tissue.
Janesick_breastCancer_Xenium_rep1
(10x Genomics Xenium): Xenium in situ spatial data (sample 1, replicate 1) from the human breast cancer study by Janesick et al. (2023), which mapped the breast cancer tumor microenvironment at high resolution using integrated single-cell, spatial, and in situ analysis of FFPE tissue.
Janesick_breastCancer_Xenium_rep2
(10x Genomics Xenium): Xenium in situ spatial data (sample 1, replicate 2) from the human breast cancer study by Janesick et al. (2023), which mapped the breast cancer tumor microenvironment at high resolution using integrated single-cell, spatial, and in situ analysis of FFPE tissue.
CosMx_lungCancer
(NanoString CosMx): NanoString CosMx human non-small cell lung cancer (NSCLC) dataset. Contains data from one sample (patient 9, slice 1). This dataset was previously released by NanoString on their website.
MERSCOPE_ovarianCancer
(Vizgen MERSCOPE): Vizgen MERSCOPE human ovarian cancer dataset. Contains data from one sample (patient 2, sample 1). This dataset was previously released by Vizgen on their website.
STARmapPLUS_mouseBrain
(STARmap PLUS): STARmap PLUS mouse brain data by Shi et al. (2023). Contains data from one sample (well 05), including annotations for cell type and tissue regions from the original authors.
The following examples show how to load the example datasets as SpatialExperiment objects in an R session.
There are two options for loading the datasets: either using named accessor functions or by querying the ExperimentHub database.
library(SpatialExperiment)
library(STexampleData)
# load object
spe <- Visium_humanDLPFC()
# check object
spe
## class: SpatialExperiment
## dim: 33538 4992
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(8): barcode_id sample_id ... reference cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
dim(spe)
## [1] 33538 4992
assayNames(spe)
## [1] "counts"
rowData(spe)
## DataFrame with 33538 rows and 3 columns
## gene_id gene_name feature_type
## <character> <character> <character>
## ENSG00000243485 ENSG00000243485 MIR1302-2HG Gene Expression
## ENSG00000237613 ENSG00000237613 FAM138A Gene Expression
## ENSG00000186092 ENSG00000186092 OR4F5 Gene Expression
## ENSG00000238009 ENSG00000238009 AL627309.1 Gene Expression
## ENSG00000239945 ENSG00000239945 AL627309.3 Gene Expression
## ... ... ... ...
## ENSG00000277856 ENSG00000277856 AC233755.2 Gene Expression
## ENSG00000275063 ENSG00000275063 AC233755.1 Gene Expression
## ENSG00000271254 ENSG00000271254 AC240274.1 Gene Expression
## ENSG00000277475 ENSG00000277475 AC213203.1 Gene Expression
## ENSG00000268674 ENSG00000268674 FAM231C Gene Expression
colData(spe)
## DataFrame with 4992 rows and 8 columns
## barcode_id sample_id in_tissue array_row
## <character> <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673 0 0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673 1 50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673 1 3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673 1 59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673 1 14
## ... ... ... ... ...
## TTGTTTCACATCCAGG-1 TTGTTTCACATCCAGG-1 sample_151673 1 58
## TTGTTTCATTAGTCTA-1 TTGTTTCATTAGTCTA-1 sample_151673 1 60
## TTGTTTCCATACAACT-1 TTGTTTCCATACAACT-1 sample_151673 1 45
## TTGTTTGTATTACACG-1 TTGTTTGTATTACACG-1 sample_151673 1 73
## TTGTTTGTGTAAATTC-1 TTGTTTGTGTAAATTC-1 sample_151673 1 7
## array_col ground_truth reference cell_count
## <integer> <character> <character> <integer>
## AAACAACGAATAGTTC-1 16 NA NA NA
## AAACAAGTATCTCCCA-1 102 Layer3 Layer3 6
## AAACAATCTACTAGCA-1 43 Layer1 Layer1 16
## AAACACCAATAACTGC-1 19 WM WM 5
## AAACAGAGCGACTCCT-1 94 Layer3 Layer3 2
## ... ... ... ... ...
## TTGTTTCACATCCAGG-1 42 WM WM 3
## TTGTTTCATTAGTCTA-1 30 WM WM 4
## TTGTTTCCATACAACT-1 27 Layer6 Layer6 3
## TTGTTTGTATTACACG-1 41 WM WM 16
## TTGTTTGTGTAAATTC-1 51 Layer2 Layer2 5
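A common first step with Visium data is to keep only the spots that overlap tissue (`in_tissue == 1`) and then look at the manual layer labels in `ground_truth`. The sketch below shows this under the assumption that these two `colData` columns behave as printed above. So that it runs without a download, it builds a small toy SpatialExperiment with the same column names; all values in the toy object are made up.

```r
library(SpatialExperiment)

# Toy stand-in for Visium_humanDLPFC: same colData column names, made-up values
counts_mat <- matrix(c(0L, 3L, 1L, 5L,
                       2L, 0L, 4L, 1L,
                       0L, 1L, 0L, 2L),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4)))
cd <- S4Vectors::DataFrame(in_tissue = c(0L, 1L, 1L, 1L),
                           ground_truth = c(NA, "Layer1", "WM", "WM"),
                           row.names = colnames(counts_mat))
xy <- cbind(pxl_col_in_fullres = c(100, 200, 300, 400),
            pxl_row_in_fullres = c(150, 250, 350, 450))
toy <- SpatialExperiment(assays = list(counts = counts_mat),
                         colData = cd, spatialCoords = xy)

# keep only spots overlapping tissue (in_tissue == 1)
toy_in <- toy[, toy$in_tissue == 1]

# count spots per manual layer label, listing missing labels explicitly
layer_counts <- table(toy_in$ground_truth, useNA = "ifany")

# total counts per spot (library size)
lib_size <- colSums(counts(toy_in))

layer_counts
lib_size
```

On the real object, the last three steps can be applied to `spe` directly. If `counts(spe)` is stored as a sparse matrix, calling `Matrix::colSums()` explicitly may be needed.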
head(spatialCoords(spe))
## pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1 3913 2435
## AAACAAGTATCTCCCA-1 9791 8468
## AAACAATCTACTAGCA-1 5769 2807
## AAACACCAATAACTGC-1 4068 9505
## AAACAGAGCGACTCCT-1 9271 4151
## AAACAGCTTTCAGAAG-1 3393 7583
imgData(spe)
## DataFrame with 2 rows and 4 columns
## sample_id image_id data scaleFactor
## <character> <character> <list> <numeric>
## 1 sample_151673 lowres #### 0.0450045
## 2 sample_151673 hires #### 0.1500150
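In SpatialExperiment, each `scaleFactor` in `imgData` converts full-resolution pixel coordinates into pixel coordinates of the corresponding image. The sketch below uses this to place spot centers on the `lowres` image, taking the sample and image IDs from the `imgData` output above. It downloads the dataset on first use.

```r
library(SpatialExperiment)
library(STexampleData)

spe <- Visium_humanDLPFC()

# scale factor from full-resolution pixels to the "lowres" image of this sample
sf <- scaleFactors(spe, sample_id = "sample_151673", image_id = "lowres")

# spot centers in lowres-image pixel units (x = pxl_col, y = pxl_row)
xy_lowres <- spatialCoords(spe) * sf
head(xy_lowres)

# the lowres image itself as a raster, e.g. for drawing spots on top of it
img <- imgRaster(spe, sample_id = "sample_151673", image_id = "lowres")
dim(img)
```

Image rows increase downwards, so when plotting spots over the image the y axis usually has to be reversed.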
# load object
spe <- Visium_mouseCoronal()
# check object
spe
## class: SpatialExperiment
## dim: 32285 4992
## metadata(0):
## assays(1): counts
## rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
## ENSMUSG00000095019 ENSMUSG00000095041
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(5): barcode_id sample_id in_tissue array_row array_col
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
# load object
spe <- seqFISH_mouseEmbryo()
# check object
spe
## class: SpatialExperiment
## dim: 351 11026
## metadata(0):
## assays(2): counts molecules
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(1): gene_name
## colnames(11026): embryo1_Pos0_cell10_z2 embryo1_Pos0_cell100_z2 ...
## embryo1_Pos28_cell97_z2 embryo1_Pos28_cell98_z2
## colData names(14): cell_id embryo ... segmentation_vertices sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):
# load object
spe <- ST_mouseOB()
# check object
spe
## class: SpatialExperiment
## dim: 15928 262
## metadata(0):
## assays(1): counts
## rownames(15928): 0610007N19Rik 0610007P14Rik ... Zzef1 Zzz3
## rowData names(1): gene_name
## colnames(262): ACAACTATGGGTTGGCGG ACACAGATCCTGTTCTGA ...
## TTTCAACCCGAGGAAGTC TTTCTAACTCATAAGGAT
## colData names(3): barcode_id sample_id layer
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):
# load object
spe <- SlideSeqV2_mouseHPC()
# check object
spe
## class: SpatialExperiment
## dim: 23264 53208
## metadata(0):
## assays(1): counts
## rownames(23264): 0610005C13Rik 0610007P14Rik ... n-R5s40 n-R5s95
## rowData names(1): gene_name
## colnames(53208): AACGTCATAATCGT TACTTTAGCGCAGT ... GACTTTTCTTAAAG
## GTCAATAAAGGGCG
## colData names(3): barcode_id sample_id celltype
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : xcoord ycoord
## imgData names(0):
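For large bead-based datasets such as this one, it is often convenient to work on a smaller spatial window first. The helper `crop_window()` below is hypothetical, not part of STexampleData or SpatialExperiment. It assumes the first two columns of `spatialCoords()` are the x and y coordinates, as with `xcoord` and `ycoord` here. A toy object with made-up values keeps the example offline.

```r
library(SpatialExperiment)

# Hypothetical helper: keep observations whose first two spatial coordinates
# fall inside a rectangular window (bounds inclusive)
crop_window <- function(spe, xlim, ylim) {
  xy <- spatialCoords(spe)
  keep <- xy[, 1] >= xlim[1] & xy[, 1] <= xlim[2] &
          xy[, 2] >= ylim[1] & xy[, 2] <= ylim[2]
  spe[, keep]
}

# toy object with coordinate names matching SlideSeqV2_mouseHPC (made-up values)
m <- matrix(1L, nrow = 2, ncol = 4,
            dimnames = list(c("geneA", "geneB"), paste0("bead", 1:4)))
coords <- cbind(xcoord = c(100, 2500, 3000, 5000),
                ycoord = c(100, 2000, 3500, 5000))
toy <- SpatialExperiment(
  assays = list(counts = m),
  colData = S4Vectors::DataFrame(celltype = c("A", "B", "B", "C"),
                                 row.names = colnames(m)),
  spatialCoords = coords)

toy_crop <- crop_window(toy, xlim = c(2000, 4000), ylim = c(1500, 4000))
table(toy_crop$celltype)
```

With the real data, `crop_window(SlideSeqV2_mouseHPC(), xlim, ylim)` would need window bounds chosen from the range of `spatialCoords()`.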
# load object
# note: this dataset is in SingleCellExperiment format
sce <- Janesick_breastCancer_Chromium()
# check object
sce
## class: SingleCellExperiment
## dim: 37143 27472
## metadata(1): Samples
## assays(1): counts
## rownames(37143): MIR1302-2HG FAM138A ... DEPRECATED_ENSG00000284873
## DEPRECATED_ENSG00000285687
## rowData names(3): ID Symbol Type
## colnames(27472): AAACAAGCAAACGGGA-1 AAACAAGCAAATAGGA-1 ...
## TTTGTGAGTTGTCATA-1 TTTGTGAGTTTGGCCA-1
## colData names(3): Barcode Sample Annotation
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
# load object
spe <- Janesick_breastCancer_Visium()
# check object
spe
## class: SpatialExperiment
## dim: 18085 4992
## metadata(0):
## assays(1): counts
## rownames(18085): SAMD11 NOC2L ... MT-ND6 MT-CYB
## rowData names(2): symbol geneid
## colnames(4992): AACACCTACTATCGAA-1 AACACGTGCATCGCAC-1 ...
## TGTTGGCCAGACCTAC-1 TGTTGGCCTACACGTG-1
## colData names(4): in_tissue array_row array_col sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
# load object
spe <- Janesick_breastCancer_Xenium_rep1()
# check object
spe
## class: SpatialExperiment
## dim: 313 167780
## metadata(0):
## assays(1): counts
## rownames(313): ABCC11 ACTA2 ... ZEB2 ZNF562
## rowData names(3): ID Symbol Type
## colnames(167780): 1 2 ... 167779 167780
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):
# load object
spe <- Janesick_breastCancer_Xenium_rep2()
# check object
spe
## class: SpatialExperiment
## dim: 313 118752
## metadata(0):
## assays(1): counts
## rownames(313): ABCC11 ACTA2 ... ZEB2 ZNF562
## rowData names(3): ID Symbol Type
## colnames(118752): 1 2 ... 118751 118752
## colData names(8): cell_id transcript_counts ... nucleus_area sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x_centroid y_centroid
## imgData names(0):
# load object
spe <- CosMx_lungCancer()
# check object
spe
## class: SpatialExperiment
## dim: 980 91972
## metadata(0):
## assays(1): counts
## rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
## rowData names(0):
## colnames(91972): 1 2 ... 91971 91972
## colData names(19): fov cell_ID ... Max.DAPI sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(0):
# load object
spe <- MERSCOPE_ovarianCancer()
# check object
spe
## class: SpatialExperiment
## dim: 550 254347
## metadata(0):
## assays(1): counts
## rownames(550): PDK4 CCL26 ... Blank.42 Blank.46
## rowData names(0):
## colnames(254347): 1 2 ... 254346 254347
## colData names(7): fov volume ... max_y sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : center_x center_y
## imgData names(0):
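The CosMx and MERSCOPE panels include control features alongside genes (for example `NegPrb22` and `Blank.42` in the row names above). These probes are not expected to carry biological signal and are often separated from the genes before analysis. The hypothetical helper below flags them by name prefix. The prefixes `^NegPrb` and `^Blank` are an assumption inferred from the printed row names only; the full panels may contain other control types.

```r
# Hypothetical helper: flag control probes by row-name prefix.
# The default prefixes are assumptions based on the printed row names.
is_control_probe <- function(features, prefixes = c("^NegPrb", "^Blank")) {
  pattern <- paste(prefixes, collapse = "|")
  grepl(pattern, features)
}

# example on feature names of the kind shown above
feats <- c("AATK", "ABL1", "NegPrb22", "PDK4", "Blank.42")
is_control_probe(feats)

# with the real data, e.g.:
# spe <- CosMx_lungCancer()
# spe_genes <- spe[!is_control_probe(rownames(spe)), ]
```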
# load object
spe <- STARmapPLUS_mouseBrain()
# check object
spe
## class: SpatialExperiment
## dim: 1022 46184
## metadata(0):
## assays(1): counts
## rownames(1022): A2M ABCC9 ... ZIC1 ZMYM1
## rowData names(0):
## colnames(46184): well05_0 well05_1 ... well05_46182 well05_46183
## colData names(7): NAME Main_molecular_cell_type ...
## Molecular_spatial_cell_type sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(3) : X Y Z
## imgData names(0):
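To compare datasets side by side, the accessor functions can also be called by name in a loop. The sketch below is one way to do this. `summarize_dataset()` is a hypothetical helper, the three datasets are an arbitrary choice, and each one is downloaded from ExperimentHub on first use (later calls use the local cache). The Chromium dataset is included to show that non-spatial SingleCellExperiment objects are handled too.

```r
library(SpatialExperiment)
library(STexampleData)

# Hypothetical helper: one-row summary of a SpatialExperiment or SingleCellExperiment
summarize_dataset <- function(name, obj) {
  coords <- if (is(obj, "SpatialExperiment")) spatialCoordsNames(obj) else character(0)
  data.frame(dataset = name,
             class = class(obj)[1],
             n_features = nrow(obj),
             n_observations = ncol(obj),
             coords = paste(coords, collapse = ", "),
             stringsAsFactors = FALSE)
}

datasets <- c("ST_mouseOB", "seqFISH_mouseEmbryo", "Janesick_breastCancer_Chromium")

# call each accessor function by name and stack the summaries
summary_table <- do.call(rbind, lapply(datasets, function(nm) {
  summarize_dataset(nm, match.fun(nm)())
}))
summary_table
```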
library(ExperimentHub)
# create ExperimentHub instance
eh <- ExperimentHub()
# query STexampleData datasets
myfiles <- query(eh, "STexampleData")
myfiles
## ExperimentHub with 12 records
## # snapshotDate(): 2024-04-29
## # $dataprovider: NA
## # $species: Homo sapiens, Mus musculus
## # $rdataclass: SpatialExperiment, SingleCellExperiment
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["EH9516"]]'
##
## title
## EH9516 | Visium_humanDLPFC
## EH9517 | Visium_mouseCoronal
## EH9518 | seqFISH_mouseEmbryo
## EH9519 | ST_mouseOB
## EH9520 | SlideSeqV2_mouseHPC
## ... ...
## EH9523 | Janesick_breastCancer_Xenium_rep1
## EH9524 | Janesick_breastCancer_Xenium_rep2
## EH9525 | CosMx_lungCancer
## EH9526 | MERSCOPE_ovarianCancer
## EH9527 | STARmapPLUS_mouseBrain
# extract record metadata (titles, descriptions, source URLs, etc.) as a data frame
md <- as.data.frame(mcols(myfiles))
# load 'Visium_humanDLPFC' dataset by its position (first record) in the query results
spe <- myfiles[[1]]
spe
## class: SpatialExperiment
## dim: 33538 4992
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(8): barcode_id sample_id ... reference cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
# load 'Visium_humanDLPFC' dataset using ExperimentHub ID
spe <- myfiles[["EH9516"]]
spe
## class: SpatialExperiment
## dim: 33538 4992
## metadata(0):
## assays(1): counts
## rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
## ENSG00000268674
## rowData names(3): gene_id gene_name feature_type
## colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
## TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
## colData names(8): barcode_id sample_id ... reference cell_count
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
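Loading by position (`myfiles[[1]]`) depends on the order of the query results, and hard-coding an ID ties the code to one snapshot. A small lookup by dataset title avoids both. The helper below is hypothetical. It assumes that `md` (from `as.data.frame(mcols(myfiles))`) has a `title` column and uses the ExperimentHub IDs as row names. A toy data frame stands in for `md` so the example runs offline.

```r
# Hypothetical helper: return the single ExperimentHub ID whose title matches.
# Assumes 'md' is as.data.frame(mcols(myfiles)), with EH IDs as row names.
eh_id_for_title <- function(md, title) {
  hit <- rownames(md)[which(md$title == title)]
  if (length(hit) != 1L) {
    stop("expected exactly one record titled '", title, "', found ", length(hit))
  }
  hit
}

# small stand-in for 'md' so the helper can be tried offline
toy_md <- data.frame(title = c("Visium_humanDLPFC", "ST_mouseOB"),
                     row.names = c("EH9516", "EH9519"))
eh_id_for_title(toy_md, "ST_mouseOB")  # returns "EH9519"

# with the real query results:
# spe <- myfiles[[eh_id_for_title(md, "Visium_humanDLPFC")]]
```

Raising an error when zero or several records match makes a renamed or duplicated title fail loudly rather than silently loading the wrong dataset.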
For reference, we include code scripts to generate the SpatialExperiment objects from the raw data files. These scripts are saved in /inst/scripts/ in the source code of the STexampleData package. The scripts include references and links to the data files from the original sources for each dataset.
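When R installs a package, the contents of its `inst/` directory are copied to the top level of the installed package, so `inst/scripts/` should appear as `scripts/`. The sketch below lists those files from an installed copy, assuming the scripts are shipped with the installed package; the file names are not reproduced here.

```r
# inst/scripts/ in the source package is installed as scripts/
scripts_dir <- system.file("scripts", package = "STexampleData")

# system.file() returns "" if the package or directory cannot be found
if (nzchar(scripts_dir)) {
  script_files <- list.files(scripts_dir)
  print(script_files)
} else {
  script_files <- character(0)
  message("STexampleData scripts directory not found; is the package installed?")
}
```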
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] HDF5Array_1.32.0 rhdf5_2.48.0
## [3] DelayedArray_0.30.1 SparseArray_1.4.5
## [5] S4Arrays_1.4.1 abind_1.4-5
## [7] Matrix_1.7-0 BumpyMatrix_1.12.0
## [9] STexampleData_1.12.3 ExperimentHub_2.12.0
## [11] AnnotationHub_3.12.0 BiocFileCache_2.12.0
## [13] dbplyr_2.5.0 SpatialExperiment_1.14.0
## [15] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [17] Biobase_2.64.0 GenomicRanges_1.56.0
## [19] GenomeInfoDb_1.40.0 IRanges_2.38.0
## [21] S4Vectors_0.42.0 BiocGenerics_0.50.0
## [23] MatrixGenerics_1.16.0 matrixStats_1.3.0
## [25] BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4 blob_1.2.4
## [4] filelock_1.0.3 Biostrings_2.72.0 fastmap_1.2.0
## [7] digest_0.6.35 mime_0.12 lifecycle_1.0.4
## [10] KEGGREST_1.44.0 RSQLite_2.3.6 magrittr_2.0.3
## [13] compiler_4.4.0 rlang_1.1.3 sass_0.4.9
## [16] tools_4.4.0 utf8_1.2.4 yaml_2.3.8
## [19] knitr_1.46 bit_4.0.5 curl_5.2.1
## [22] withr_3.0.0 purrr_1.0.2 grid_4.4.0
## [25] fansi_1.0.6 Rhdf5lib_1.26.0 cli_3.6.2
## [28] rmarkdown_2.27 crayon_1.5.2 generics_0.1.3
## [31] httr_1.4.7 rjson_0.2.21 DBI_1.2.2
## [34] cachem_1.1.0 zlibbioc_1.50.0 AnnotationDbi_1.66.0
## [37] BiocManager_1.30.23 XVector_0.44.0 vctrs_0.6.5
## [40] jsonlite_1.8.8 bookdown_0.39 bit64_4.0.5
## [43] magick_2.8.3 jquerylib_0.1.4 glue_1.7.0
## [46] BiocVersion_3.19.1 UCSC.utils_1.0.0 tibble_3.2.1
## [49] pillar_1.9.0 rhdf5filters_1.16.0 rappdirs_0.3.3
## [52] htmltools_0.5.8.1 GenomeInfoDbData_1.2.12 R6_2.5.1
## [55] evaluate_0.23 lattice_0.22-6 png_0.1-8
## [58] memoise_2.0.1 bslib_0.7.0 Rcpp_1.0.12
## [61] xfun_0.44 pkgconfig_2.0.3