1 Installation

The SpatialExperiment package is available via Bioconductor.

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SpatialExperiment")

Load the package as follows:

library(SpatialExperiment)

2 The SpatialExperiment class

The SpatialExperiment class is designed to represent spatially resolved transcriptomics (ST) data. It inherits from the SingleCellExperiment class and is used in the same manner. In addition, the class supports storage of spatial information via spatialData and spatialCoords, and storage of images via imgData.
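As a brief illustration of this inheritance, the following sketch builds a toy object and checks that both the inherited SingleCellExperiment accessors and the spatial accessors apply to it. The objects counts_mat, xy and spe_toy are made up for this example and are not part of the package.

```r
library(SpatialExperiment)

# toy data (hypothetical): 5 features x 4 observations, all-zero counts
counts_mat <- matrix(0, nrow = 5, ncol = 4)
xy <- matrix(as.numeric(1:8), ncol = 2, dimnames = list(NULL, c("x", "y")))

spe_toy <- SpatialExperiment(
    assays = list(counts = counts_mat),
    spatialCoords = xy)

# the object is also a SingleCellExperiment ...
is(spe_toy, "SingleCellExperiment")
# ... so inherited accessors such as counts() and colData() apply
dim(counts(spe_toy))
colnames(colData(spe_toy))
# while spatialCoords() is specific to SpatialExperiment
dim(spatialCoords(spe_toy))
```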

3 spatialData and spatialCoords

The SpatialExperiment class constructor is defined with several arguments to provide maximum flexibility to the user.

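In particular, we distinguish between spatialData and spatialCoords: spatialData is a DFrame holding the spatially related metadata of each observation, while spatialCoords is a numeric matrix containing only the spatial coordinates (e.g. x and y).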

When building a SpatialExperiment object, the columns of spatial coordinates from the spatialData DFrame are identified via the spatialCoordsNames argument and stored separately as a numeric matrix within spatialCoords.

The following example builds spatialCoords from spatialData via spatialCoordsNames.

cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)

spe <- SpatialExperiment(assays=mat,
    spatialData=cd, 
    spatialCoordsNames=c("x", "y"))

head(spatialCoords(spe))
##      x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
spatialCoordsNames(spe)
## [1] "x" "y"
head(spatialData(spe))
## DataFrame with 6 rows and 1 column
##             z
##   <character>
## 1           a
## 2           b
## 3           c
## 4           d
## 5           e
## 6           f
spatialDataNames(spe)
## [1] "z"
head(colData(spe))
## DataFrame with 6 rows and 2 columns
##     sample_id           z
##   <character> <character>
## 1    sample01           a
## 2    sample01           b
## 3    sample01           c
## 4    sample01           d
## 5    sample01           e
## 6    sample01           f

It is also possible to display the combined spatial information with spatialData() using the spatialCoords=TRUE argument:

spatialData(spe, spatialCoords=TRUE)
## DataFrame with 26 rows and 3 columns
##               z         x         y
##     <character> <integer> <integer>
## 1             a         1         1
## 2             b         2         2
## 3             c         3         3
## 4             d         4         4
## 5             e         5         5
## ...         ...       ...       ...
## 22            v        22        22
## 23            w        23        23
## 24            x        24        24
## 25            y        25        25
## 26            z        26        26

Alternatively, the spatialDataNames argument can be used to select the columns of colData that make up the spatialData DFrame.

cd <- DataFrame(x=1:26, y=1:26, z=letters)
mat <- matrix(nrow=26, ncol=26)

spe <- SpatialExperiment(
    assays=mat, colData=cd,
    spatialCoordsNames=c("x", "y"), spatialDataNames="z")

head(spatialData(spe))
## DataFrame with 6 rows and 1 column
##             z
##   <character>
## 1           a
## 2           b
## 3           c
## 4           d
## 5           e
## 6           f
head(spatialCoords(spe))
##      x y
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
## [4,] 4 4
## [5,] 5 5
## [6,] 6 6
head(colData(spe))
## DataFrame with 6 rows and 2 columns
##             z   sample_id
##   <character> <character>
## 1           a    sample01
## 2           b    sample01
## 3           c    sample01
## 4           d    sample01
## 5           e    sample01
## 6           f    sample01

A numeric matrix of coordinates can also be supplied directly via the spatialCoords argument.

n <- 10
y <- diag(n)            # n x n assay matrix
mat <- matrix(0, n, 2)  # n x 2 matrix of coordinates

spe <- SpatialExperiment(assays = y, spatialCoords = mat)

Finally, it is possible to set spatialData, spatialCoords, and colData separately.

mat <- as.matrix(cd[,1:2])
colnames(mat) <- c("ecs","uai")
spad <- DataFrame(a=1:26, b=1:26, z=letters)
asy <- matrix(nrow=26, ncol=26)
spe <- SpatialExperiment(assays = asy, spatialCoords = mat, 
    spatialData=spad, colData=cd)

head(spatialData(spe))
## DataFrame with 6 rows and 3 columns
##           a         b           z
##   <integer> <integer> <character>
## 1         1         1           a
## 2         2         2           b
## 3         3         3           c
## 4         4         4           d
## 5         5         5           e
## 6         6         6           f
head(spatialCoords(spe))
##      ecs uai
## [1,]   1   1
## [2,]   2   2
## [3,]   3   3
## [4,]   4   4
## [5,]   5   5
## [6,]   6   6
head(colData(spe))
## DataFrame with 6 rows and 7 columns
##           x         y           z   sample_id         a         b           z
##   <integer> <integer> <character> <character> <integer> <integer> <character>
## 1         1         1           a    sample01         1         1           a
## 2         2         2           b    sample01         2         2           b
## 3         3         3           c    sample01         3         3           c
## 4         4         4           d    sample01         4         4           d
## 5         5         5           e    sample01         5         5           e
## 6         6         6           f    sample01         6         6           f

4 Working with multiple samples

To work with multiple samples, the SpatialExperiment class provides the cbind method, which assumes unique sample_id(s) are provided for each sample.

In case the sample_id(s) are duplicated across multiple samples, the cbind method takes care of this by appending indices to create unique sample identifiers.

spe1 <- spe2 <- spe
spe3 <- cbind(spe1, spe2)
## 'sample_id's are duplicated across 'SpatialExperiment' objects to cbind; appending sample indices.
unique(spe3$sample_id)
## [1] "sample01.1" "sample01.2"

Otherwise, it is possible to create unique sample_id(s) as follows.

# make sample identifiers unique
spe1 <- spe2 <- spe
spe1$sample_id <- paste(spe1$sample_id, "sample1", sep = ".")
spe2$sample_id <- paste(spe2$sample_id, "sample2", sep = ".")

# combine into single object
spe3 <- cbind(spe1, spe2)

spe3
## class: SpatialExperiment 
## dim: 26 52 
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
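To check the result of combining samples, it can help to tabulate the observations per sample_id. The following is a minimal sketch on toy data; the helper make_toy() and the identifiers "sampleA" and "sampleB" are hypothetical and only serve the example.

```r
library(SpatialExperiment)

# hypothetical helper: toy object with 'n_obs' observations and a given sample_id
make_toy <- function(n_obs, id) {
    SpatialExperiment(
        assays = list(counts = matrix(0, nrow = 3, ncol = n_obs)),
        spatialCoords = cbind(x = runif(n_obs), y = runif(n_obs)),
        sample_id = id)
}

spe_a <- make_toy(4, "sampleA")
spe_b <- make_toy(6, "sampleB")

# identifiers are already unique, so cbind() leaves them unchanged
spe_ab <- cbind(spe_a, spe_b)

# number of observations contributed by each sample
table(spe_ab$sample_id)
```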

5 Subsetting a SpatialExperiment object

Subsetting is defined so that all attributes of the object stay synchronized, as for any other Bioconductor Experiment class.

For example, it is possible to subset by sample_id as follows:

spe3[, colData(spe3)$sample_id=="sample01.sample1"]
## class: SpatialExperiment 
## dim: 26 26 
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y sample_id z.1
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : a b z
## spatialCoords names(2) : ecs uai
## imgData names(1): sample_id
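The same synchronization applies when subsetting on the spatial coordinates themselves. The sketch below uses made-up coordinates (xy_toy) and shows that spatialCoords is subset together with the columns of the object; the threshold of 3 is arbitrary.

```r
library(SpatialExperiment)

# toy object (hypothetical data): 6 observations with known coordinates
xy_toy <- cbind(x = as.numeric(1:6), y = as.numeric(6:1))
spe_full <- SpatialExperiment(
    assays = list(counts = matrix(0, nrow = 3, ncol = 6)),
    spatialCoords = xy_toy)

# keep observations whose x-coordinate exceeds 3
keep <- spatialCoords(spe_full)[, "x"] > 3
spe_part <- spe_full[, keep]

# assay columns and coordinates are subset together
dim(spe_part)
spatialCoords(spe_part)
```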

6 sample_id requires one-to-one mapping replacement

When replacing the sample_id(s) of a SpatialExperiment object, the new values must map one-to-one onto the existing ones; otherwise an error is returned.

new <- spe3$sample_id
new[1] <- "sample1.sample2"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
new[1] <- "third.one.of.two"
spe3$sample_id <- new
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 3 were provided.
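By contrast, a replacement that renames every existing identifier consistently is accepted. The following sketch, on toy data, uses a named lookup vector so that each old sample_id maps to exactly one new value; the names toy1, toy2, lookup and the labels "control" and "treated" are hypothetical.

```r
library(SpatialExperiment)

# two toy samples (hypothetical data) combined into one object
toy1 <- SpatialExperiment(
    assays = list(counts = matrix(0, nrow = 3, ncol = 4)),
    spatialCoords = cbind(x = runif(4), y = runif(4)),
    sample_id = "sampleA")
toy2 <- SpatialExperiment(
    assays = list(counts = matrix(0, nrow = 3, ncol = 6)),
    spatialCoords = cbind(x = runif(6), y = runif(6)),
    sample_id = "sampleB")
toy <- cbind(toy1, toy2)

# lookup table: each old identifier maps to exactly one new identifier
lookup <- c(sampleA = "control", sampleB = "treated")
toy$sample_id <- unname(lookup[toy$sample_id])

unique(toy$sample_id)
```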

7 Spot-based ST data (e.g. 10x Genomics Visium)

When working with spot-based ST data, such as 10x Genomics Visium or other platforms providing images, it is possible to store the image information in the dedicated imgData structure.

Also, the SpatialExperiment class stores a sample_id value in colData, which can be set with the sample_id argument (default is "sample01").

Here we show how to load the default Space Ranger data files from a 10x Genomics Visium experiment, and build a SpatialExperiment object.

In particular, the readImgData() function is used to build an imgData DataFrame to be passed to the SpatialExperiment constructor. The sample_id used to build the imgData object must be the same one used to build the SpatialExperiment object, otherwise an error is returned.

dir <- system.file(
   file.path("extdata", "10xVisium", "section1"),
   package = "SpatialExperiment")

# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)

# read in image data
img <- readImgData(
    path = file.path(dir, "spatial"),
    sample_id="foo")

# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
    col.names = c(
        "barcode", "in_tissue", "array_row", "array_col",
        "pxl_row_in_fullres", "pxl_col_in_fullres"))

# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
    symbol = rowData(sce)$Symbol)

# construct 'SpatialExperiment'
(spe <- SpatialExperiment(
    assays = list(counts = assay(sce)),
    colData = colData(sce), rowData = rd, imgData = img,
    spatialData=DataFrame(xyz),
    spatialCoordsNames=c("pxl_col_in_fullres", "pxl_row_in_fullres"),
    sample_id="foo"))
## class: SpatialExperiment 
## dim: 50 50 
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames: NULL
## colData names(3): Sample Barcode sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(4) : barcode in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor

Alternatively, the read10xVisium() function imports 10x Genomics Visium data for one or more samples organized in folders that follow the default Space Ranger output structure:

sample
|-- outs
    |-- raw/filtered_feature_bc_matrix.h5
    |-- raw/filtered_feature_bc_matrix
        |-- barcodes.tsv
        |-- features.tsv
        |-- matrix.mtx
    |-- spatial
        |-- scalefactors_json.json
        |-- tissue_lowres_image.png
        |-- tissue_positions_list.csv

dir <- system.file(
    file.path("extdata", "10xVisium"),
    package = "SpatialExperiment")

sample_ids <- c("section1", "section2")
samples <- file.path(dir, sample_ids)

(spe <- read10xVisium(samples, sample_ids,
    type = "sparse", data = "raw",
    images = "lowres", load = FALSE))
## class: SpatialExperiment 
## dim: 50 99 
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000005886 ENSMUSG00000101476
## rowData names(1): symbol
## colnames(99): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
##   AAAGTCGACCCTCAGT-1 AAAGTGCCATCAATTA-1
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(3) : in_tissue array_row array_col
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
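Once imported, the image metadata and coordinates can be inspected with the corresponding accessors. The sketch below repeats the import on the example data shipped with the package (under the names vis_dir, vis_ids and spe_vis, chosen here to keep the example self-contained) and assumes that DropletUtils is installed, as required for type = "sparse".

```r
library(SpatialExperiment)

# example data shipped with the package, as in the call above
vis_dir <- system.file(
    file.path("extdata", "10xVisium"),
    package = "SpatialExperiment")
vis_ids <- c("section1", "section2")

spe_vis <- read10xVisium(
    file.path(vis_dir, vis_ids), vis_ids,
    type = "sparse", data = "raw",
    images = "lowres", load = FALSE)

# image metadata: one row per sample and image
imgData(spe_vis)
# number of observations per sample
table(spe_vis$sample_id)
# pixel coordinates of the first few spots
head(spatialCoords(spe_vis))
```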

8 Molecule-based ST data

To demonstrate how to accommodate molecule-based ST data (e.g. from the seqFISH platform) inside a SpatialExperiment object, we generate mock molecule coordinates across 50 genes and 20 cells. These should be formatted into a data.frame where each row corresponds to a molecule, and columns specify the xy-position as well as which gene/cell the molecule has been assigned to:

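# numbers of genes and cells, matching the dimensions printed below
ng <- 50
nc <- 20
# number of molecules (assumed value; not stated in the text)
n <- 1000
# sample xy-coordinates in [0,1]
x <- runif(n)
y <- runif(n)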
# assign each molecule to some gene-cell pair
gs <- paste0("gene", seq(ng))
cs <- paste0("cell", seq(nc))
gene <- sample(gs, n, TRUE)
cell <- sample(cs, n, TRUE)
# construct data.frame of molecule coordinates
df <- data.frame(gene, cell, x, y)
head(df)
##     gene   cell         x           y
## 1  gene2  cell1 0.8808741 0.320652463
## 2 gene13  cell3 0.6192341 0.543930311
## 3 gene35 cell16 0.8563139 0.249875277
## 4 gene12 cell16 0.1921838 0.740945357
## 5 gene49  cell1 0.8762683 0.001373172
## 6 gene40  cell4 0.2871124 0.811706067

Next, it is possible to re-shape the above table into a BumpyMatrix using splitAsBumpyMatrix(), which takes as input the xy-coordinates, as well as arguments specifying the row and column index of each observation:

# (ensure gene & cell are factors so that
# missing observations aren't dropped)
df$gene <- factor(df$gene, gs)
df$cell <- factor(df$cell, cs)
# construct BumpyMatrix
library(BumpyMatrix)

# 'row' and 'col' give the gene and cell of each molecule,
# i.e. one entry per row of df
mol <- splitAsBumpyMatrix(
    df[, c("x", "y")],
    row = df$gene, col = df$cell)
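The role of the row and column arguments is easier to see on a very small table. In the following sketch, the three molecules, two genes and two cells in toy_df are made up; the point is only that both arguments carry one entry per molecule, and that factor levels determine the dimensions of the result.

```r
library(BumpyMatrix)

# toy molecule table (hypothetical values): 3 molecules, 2 genes, 2 cells
toy_df <- data.frame(
    gene = factor(c("g1", "g1", "g2"), levels = c("g1", "g2")),
    cell = factor(c("c1", "c1", "c2"), levels = c("c1", "c2")),
    x = c(0.1, 0.2, 0.3),
    y = c(0.4, 0.5, 0.6))

# one 'row' and one 'col' entry per molecule (i.e. per row of toy_df)
toy_mol <- splitAsBumpyMatrix(
    toy_df[, c("x", "y")],
    row = toy_df$gene, col = toy_df$cell)

# genes in rows, cells in columns
dim(toy_mol)
dimnames(toy_mol)
```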

Finally, it is possible to construct a SpatialExperiment object with two data slots:

- The counts assay stores the number of molecules per gene and cell (equivalent to transcript counts in spot-based data).
- The molecules assay holds the spatial molecule positions (xy-coordinates). Here, each entry is a DFrame that contains the positions of all molecules from a given gene that have been assigned to a given cell.

# get count matrix
y <- with(df, table(gene, cell))
y <- as.matrix(unclass(y))
y[1:5, 1:5]
##        cell
## gene    cell1 cell2 cell3 cell4 cell5
##   gene1     2     0     0     1     0
##   gene2     3     2     4     1     0
##   gene3     1     1     0     1     3
##   gene4     1     0     1     1     1
##   gene5     2     1     1     0     0
# construct SpatialExperiment
spe <- SpatialExperiment(
    assays = list(
        counts = y, 
        molecules = mol))
spe
## class: SpatialExperiment 
## dim: 50 20 
## metadata(0):
## assays(2): counts molecules
## rownames(50): gene1 gene2 ... gene49 gene50
## rowData names(0):
## colnames(20): cell1 cell2 ... cell19 cell20
## colData names(1): sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialData names(0) :
## spatialCoords names(0) :
## imgData names(0):

The BumpyMatrix of molecule locations can be accessed using the dedicated molecules() accessor:

molecules(spe)
## 50 x 20 BumpyDataFrameMatrix
## rownames: gene1 gene2 ... gene49 gene50 
## colnames: cell1 cell2 ... cell19 cell20 
## preview [1,1]:
##   DataFrame with 20 rows and 2 columns
##                x         y
##        <numeric> <numeric>
##   1    0.8808741  0.320652
##   2    0.0260193  0.469372
##   3    0.8526963  0.294299
##   4    0.0341913  0.918395
##   5    0.3358630  0.748199
##   ...        ...       ...
##   16  0.38597319  0.799529
##   17  0.50525161  0.116668
##   18  0.62342685  0.925884
##   19  0.00431502  0.625905
##   20  0.34600773  0.944289

9 Session Info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] BumpyMatrix_1.0.0           SpatialExperiment_1.2.1    
##  [3] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0
##  [5] Biobase_2.52.0              GenomicRanges_1.44.0       
##  [7] GenomeInfoDb_1.28.0         IRanges_2.26.0             
##  [9] S4Vectors_0.30.0            BiocGenerics_0.38.0        
## [11] MatrixGenerics_1.4.0        matrixStats_0.59.0         
## [13] BiocStyle_2.20.0           
## 
## loaded via a namespace (and not attached):
##  [1] locfit_1.5-9.4            xfun_0.23                
##  [3] bslib_0.2.5.1             beachmat_2.8.0           
##  [5] HDF5Array_1.20.0          lattice_0.20-44          
##  [7] rhdf5_2.36.0              htmltools_0.5.1.1        
##  [9] yaml_2.2.1                rlang_0.4.11             
## [11] R.oo_1.24.0               jquerylib_0.1.4          
## [13] scuttle_1.2.0             R.utils_2.10.1           
## [15] BiocParallel_1.26.0       dqrng_0.3.0              
## [17] GenomeInfoDbData_1.2.6    stringr_1.4.0            
## [19] zlibbioc_1.38.0           R.methodsS3_1.8.1        
## [21] codetools_0.2-18          evaluate_0.14            
## [23] knitr_1.33                Rcpp_1.0.6               
## [25] edgeR_3.34.0              formatR_1.11             
## [27] BiocManager_1.30.15       limma_3.48.0             
## [29] DelayedArray_0.18.0       magick_2.7.2             
## [31] jsonlite_1.7.2            XVector_0.32.0           
## [33] rjson_0.2.20              digest_0.6.27            
## [35] stringi_1.6.2             bookdown_0.22            
## [37] grid_4.1.0                tools_4.1.0              
## [39] bitops_1.0-7              rhdf5filters_1.4.0       
## [41] magrittr_2.0.1            sass_0.4.0               
## [43] RCurl_1.98-1.3            Matrix_1.3-4             
## [45] DelayedMatrixStats_1.14.0 sparseMatrixStats_1.4.0  
## [47] rmarkdown_2.8             Rhdf5lib_1.14.1          
## [49] R6_2.5.0                  DropletUtils_1.12.1      
## [51] compiler_4.1.0