A brief overview of the tidySpatialExperiment package, demonstrating the SpatialExperiment-tibble abstraction, compatibility with the tidyverse and tidyomics ecosystems, and a few helpful utility functions.
tidySpatialExperiment 1.0.0
Resources to help you get started with tidySpatialExperiment and tidyomics:
The tidyomics ecosystem includes packages for:
Working with genomic features:
Working with transcriptomic features:
Working with cytometry features:
tidySpatialExperiment provides a bridge between the SpatialExperiment [@righelli2022spatialexperiment] package and the tidyverse [@wickham2019welcome] ecosystem. It creates an invisible layer that allows you to interact with a SpatialExperiment object as if it were a tibble, enabling the use of functions from dplyr, tidyr, ggplot2 and plotly. Underneath, however, your data remains a SpatialExperiment object.
tidySpatialExperiment also provides five additional utility functions.
Package | Functions available |
---|---|
SpatialExperiment | All |
dplyr | arrange, bind_rows, bind_cols, distinct, filter, group_by, summarise, select, mutate, rename, left_join, right_join, inner_join, slice, sample_n, sample_frac, count, add_count |
tidyr | nest, unnest, unite, separate, extract, pivot_longer |
ggplot2 | ggplot |
plotly | plot_ly |
Utility | Description |
---|---|
as_tibble | Convert cell data to a tbl_df |
join_features | Append feature data to cell data |
aggregate_cells | Aggregate cell-feature abundance into a pseudobulk SummarizedExperiment object |
rectangle | Select rectangular region of space |
ellipse | Select elliptical region of space |
You can install the stable version of tidySpatialExperiment from Bioconductor with:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("tidySpatialExperiment")
You can install the development version of tidySpatialExperiment from GitHub with:
if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")
devtools::install_github("william-hutchison/tidySpatialExperiment")
Here, we attach tidySpatialExperiment and an example SpatialExperiment object.
# Load example SpatialExperiment object
library(tidySpatialExperiment)
example(read10xVisium)
A SpatialExperiment object represents observations (cells) as columns and variables (features) as rows, as is the Bioconductor convention. Additional information about the cells is accessed through the reducedDims, colData and spatialCoords functions.
tidySpatialExperiment provides a SpatialExperiment-tibble abstraction, representing cells as rows and features as columns, as is the tidyverse convention. colData and spatialCoords are appended as columns to the same abstraction, allowing easy interaction with this additional data.
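The change of orientation can be illustrated without the package. The following is a minimal base R sketch, assuming toy stand-ins (`counts_toy`, `col_data`, `spatial_coords`, all hypothetical) for the pieces of a SpatialExperiment object. It is not how tidySpatialExperiment builds its view internally; it only shows what a cell-wise table with metadata and coordinates as columns looks like.

```r
# Toy count matrix in the Bioconductor orientation:
# features as rows, cells as columns (hypothetical data)
counts_toy <- matrix(
  c(0, 3, 1,
    2, 0, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)

# Toy per-cell metadata, standing in for colData
col_data <- data.frame(
  in_tissue = c(TRUE, FALSE, TRUE),
  sample_id = "section1",
  row.names = colnames(counts_toy)
)

# Toy per-cell coordinates, standing in for spatialCoords
spatial_coords <- cbind(
  pxl_col_in_fullres = c(2312, 8230, 4170),
  pxl_row_in_fullres = c(1252, 7237, 1611)
)
rownames(spatial_coords) <- colnames(counts_toy)

# Cell-wise view in the tidyverse orientation:
# one row per cell, metadata and coordinates appended as columns
cell_view <- data.frame(
  .cell = colnames(counts_toy),
  col_data,
  spatial_coords,
  row.names = NULL
)
cell_view
```

Each cell now occupies one row, which is the shape that dplyr and tidyr verbs expect.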
The default view is now of the SpatialExperiment-tibble abstraction.
spe
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
However, our data maintains its status as a SpatialExperiment object. Therefore, we have access to all SpatialExperiment functions.
spe |>
    colData() |>
    head()
## DataFrame with 6 rows and 4 columns
## in_tissue array_row array_col sample_id
## <logical> <integer> <integer> <character>
## AAACAACGAATAGTTC-1 FALSE 0 16 section1
## AAACAAGTATCTCCCA-1 TRUE 50 102 section1
## AAACAATCTACTAGCA-1 TRUE 3 43 section1
## AAACACCAATAACTGC-1 TRUE 59 19 section1
## AAACAGAGCGACTCCT-1 TRUE 14 94 section1
## AAACAGCTTTCAGAAG-1 FALSE 43 9 section1
spe |>
    spatialCoords() |>
    head()
## pxl_col_in_fullres pxl_row_in_fullres
## AAACAACGAATAGTTC-1 2312 1252
## AAACAAGTATCTCCCA-1 8230 7237
## AAACAATCTACTAGCA-1 4170 1611
## AAACACCAATAACTGC-1 2519 8315
## AAACAGAGCGACTCCT-1 7679 2927
## AAACAGCTTTCAGAAG-1 1831 6400
spe |>
    imgData()
## DataFrame with 2 rows and 4 columns
## sample_id image_id data scaleFactor
## <character> <character> <list> <numeric>
## 1 section1 lowres #### 0.0510334
## 2 section2 lowres #### 0.0510334
Most functions from dplyr are available for use with the SpatialExperiment-tibble abstraction. For example, filter can be used to select cells by a variable of interest.
spe |>
    filter(array_col < 5)
## # A SpatialExperiment-tibble abstraction: 6 × 7
## # Features=50 | Cells=6 | Assays=counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 2 AAACGAAGATGGAGTA-1 FALSE 58 4 section1 1487
## 3 AAAGAATGACCTTAGA-1 FALSE 64 2 section1 1349
## 4 AAACATGGTGAGAGGA-1 FALSE 62 0 section2 1212
## 5 AAACGAAGATGGAGTA-1 FALSE 58 4 section2 1487
## 6 AAAGAATGACCTTAGA-1 FALSE 64 2 section2 1349
## # ℹ 1 more variable: pxl_row_in_fullres <int>
And mutate can be used to add new variables, or modify the value of an existing variable.
spe |>
    mutate(in_region = c(in_tissue & array_row < 10))
## # A SpatialExperiment-tibble abstraction: 99 × 8
## # Features=50 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id in_region pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <lgl> <int>
## 1 AAACAAC… FALSE 0 16 section1 FALSE 2312
## 2 AAACAAG… TRUE 50 102 section1 FALSE 8230
## 3 AAACAAT… TRUE 3 43 section1 TRUE 4170
## 4 AAACACC… TRUE 59 19 section1 FALSE 2519
## 5 AAACAGA… TRUE 14 94 section1 FALSE 7679
## 6 AAACAGC… FALSE 43 9 section1 FALSE 1831
## 7 AAACAGG… FALSE 47 13 section1 FALSE 2106
## 8 AAACAGT… FALSE 73 43 section1 FALSE 4170
## 9 AAACATG… FALSE 62 0 section1 FALSE 1212
## 10 AAACATT… FALSE 61 97 section1 FALSE 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
Most functions from tidyr are also available. Here, nest is used to group the data by sample_id, and unnest is used to ungroup the data.
# Nest the SpatialExperiment object by sample_id
spe_nested <-
    spe |>
    nest(data = -sample_id)
# View the nested SpatialExperiment object
spe_nested
## # A tibble: 2 × 2
## sample_id data
## <chr> <list>
## 1 section1 <SptlExpr[,50]>
## 2 section2 <SptlExpr[,49]>
# Unnest the nested SpatialExperiment objects
spe_nested |>
    unnest(data)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
The ggplot function can be used to create a plot from a SpatialExperiment object. This example also demonstrates how tidy operations can be combined to build up a more complex analysis. Note that helper functions such as aes are not included and should be imported from ggplot2.
spe |>
    filter(sample_id == "section1" & in_tissue) |>

    # Add a column with the sum of feature counts per cell
    mutate(count_sum = purrr::map_int(.cell, ~
        spe[, .x] |>
            counts() |>
            sum()
    )) |>

    # Plot with tidySpatialExperiment and ggplot2
    ggplot(ggplot2::aes(x = reorder(.cell, count_sum), y = count_sum)) +
        ggplot2::geom_point() +
        ggplot2::coord_flip()
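The per-cell total computed in the mutate step above is simply a column sum of the count matrix. The following base R sketch shows that equivalence on a toy matrix (`counts_toy` is hypothetical data). On a real object, `colSums(counts(spe))` would be expected to give the same totals in a single vectorised call, but that substitution is an assumption and is not part of the original example.

```r
# Toy count matrix: features as rows, cells as columns (hypothetical data)
counts_toy <- matrix(
  c(0, 3, 1,
    2, 0, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)

# Per-cell totals, one cell at a time (mirrors the map_int approach)
count_sum_loop <- vapply(
  colnames(counts_toy),
  function(cell) sum(counts_toy[, cell]),
  numeric(1)
)

# Per-cell totals in one vectorised call
count_sum <- colSums(counts_toy)

# Both approaches agree
all.equal(count_sum_loop, count_sum)

# Cells ordered by their total, as reorder(.cell, count_sum) orders the axis
names(sort(count_sum))
```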
The plot_ly function can also be used to create a plot from a SpatialExperiment object.
spe |>
    filter(sample_id == "section1") |>
    plot_ly(
        x = ~ array_col,
        y = ~ array_row,
        color = ~ in_tissue,
        type = "scatter"
    )
Different packages from the tidyomics ecosystem are easy to use together. Here, tidygate is used to interactively gate cells based on their array location.
spe_regions <-
    spe |>
    filter(sample_id == "section1") |>
    mutate(region = tidygate::gate_chr(array_col, array_row))
The gated cells can then be divided into pseudobulks within a SummarizedExperiment object using tidySpatialExperiment’s aggregate_cells utility function.
spe_regions_aggregated <-
    spe_regions |>
    aggregate_cells(region)
The tidyomics ecosystem places the emphasis on interacting with cell data. To interact with feature data, the join_features function can be used to append feature values to cell data.
# Join feature data in wide format, preserving the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "wide") |>
    head()
## # A SpatialExperiment-tibble abstraction: 99 × 9
## # Features=6 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id ENSMUSG00000025915
## <chr> <lgl> <int> <int> <chr> <dbl>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 0
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 0
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 0
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 0
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 0
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 0
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 0
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 0
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 0
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 0
## # ℹ 89 more rows
## # ℹ 3 more variables: ENSMUSG00000042501 <dbl>, pxl_col_in_fullres <int>,
## # pxl_row_in_fullres <int>
# Join feature data in long format, discarding the SpatialExperiment object
spe |>
    join_features(features = c("ENSMUSG00000025915", "ENSMUSG00000042501"), shape = "long") |>
    head()
## tidySpatialExperiment says: A data frame is returned for independent data analysis.
## # A tibble: 6 × 7
## .cell in_tissue array_row array_col sample_id .feature .abundance_counts
## <chr> <lgl> <int> <int> <chr> <chr> <dbl>
## 1 AAACAACGAA… FALSE 0 16 section1 ENSMUSG… 0
## 2 AAACAACGAA… FALSE 0 16 section1 ENSMUSG… 0
## 3 AAACAAGTAT… TRUE 50 102 section1 ENSMUSG… 0
## 4 AAACAAGTAT… TRUE 50 102 section1 ENSMUSG… 1
## 5 AAACAATCTA… TRUE 3 43 section1 ENSMUSG… 0
## 6 AAACAATCTA… TRUE 3 43 section1 ENSMUSG… 0
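The difference between the two shapes can be seen on a toy example. This is a base R sketch under stated assumptions: `counts_toy` and `cells` are hypothetical stand-ins, and the code only mimics the resulting layouts (including the `.feature` and `.abundance_counts` column names seen in the output above). It does not reproduce how join_features works internally.

```r
# Toy count matrix: features as rows, cells as columns (hypothetical data)
counts_toy <- matrix(
  c(0, 3, 1,
    2, 0, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), c("cell_1", "cell_2", "cell_3"))
)

# Toy cell data: one row per cell
cells <- data.frame(
  .cell = colnames(counts_toy),
  in_tissue = c(TRUE, FALSE, TRUE)
)

# Wide shape: one row per cell, one extra column per feature
wide <- cbind(cells, as.data.frame(t(counts_toy)))
rownames(wide) <- NULL

# Long shape: one row per cell-feature pair
# as.vector() walks the matrix column by column, so each cell's
# features appear consecutively
abundance <- data.frame(
  .cell = rep(colnames(counts_toy), each = nrow(counts_toy)),
  .feature = rep(rownames(counts_toy), times = ncol(counts_toy)),
  .abundance_counts = as.vector(counts_toy)
)
long <- merge(cells, abundance, by = ".cell")

wide
long
```

The wide shape keeps one row per cell, which is why it can stay attached to the object, whereas the long shape repeats each cell once per feature and is returned as a plain table.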
Sometimes, it is necessary to aggregate the gene-transcript abundance from a group of cells into a single value, for example when comparing groups of cells across different samples with fixed-effect models.
Cell aggregation can be achieved using the aggregate_cells function.
spe |>
    aggregate_cells(in_tissue, assays = "counts")
## class: SummarizedExperiment
## dim: 50 2
## metadata(0):
## assays(1): counts
## rownames(50): ENSMUSG00000002459 ENSMUSG00000005886 ...
## ENSMUSG00000104217 ENSMUSG00000104328
## rowData names(1): feature
## colnames(2): FALSE TRUE
## colData names(2): in_tissue .aggregated_cells
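The idea behind pseudobulk aggregation can be sketched in base R. The example below assumes that aggregation means summing counts across the cells of each group, and it uses hypothetical toy data (`counts_toy`, `in_tissue`). It illustrates the concept only and is not the package's implementation.

```r
# Toy count matrix: features as rows, cells as columns (hypothetical data)
counts_toy <- matrix(
  c(0, 3, 1, 4,
    2, 0, 5, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b"), paste0("cell_", 1:4))
)

# Grouping variable: one value per cell
in_tissue <- c(TRUE, FALSE, TRUE, FALSE)

# rowsum() adds up rows by group, so transpose to put cells in rows,
# aggregate, then transpose back to features as rows
pseudobulk <- t(rowsum(t(counts_toy), group = in_tissue))
pseudobulk
```

The result has one column per group, matching the `colnames(2): FALSE TRUE` layout in the output above.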
To select cells by their geometric region in space, the ellipse and rectangle functions can be used.
spe |>
    filter(sample_id == "section1") |>
    mutate(in_ellipse = ellipse(array_col, array_row, c(20, 40), c(20, 20))) |>
    ggplot(ggplot2::aes(x = array_col, y = array_row, colour = in_ellipse)) +
        ggplot2::geom_point()
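The geometry behind such a selection is a point-in-region test that returns one logical value per cell. The helpers below (`in_ellipse_sketch` and `in_rectangle_sketch`) are hypothetical and written only for illustration. They assume that the axis lengths are semi-axes and that the rectangle is described by its centre, height and width; the package's own functions may define these arguments differently.

```r
# Hypothetical helper, not the package's implementation:
# TRUE for points inside an axis-aligned ellipse
in_ellipse_sketch <- function(x, y, center, semi_axes) {
  ((x - center[1]) / semi_axes[1])^2 +
    ((y - center[2]) / semi_axes[2])^2 <= 1
}

# Hypothetical helper, not the package's implementation:
# TRUE for points inside an axis-aligned rectangle
in_rectangle_sketch <- function(x, y, center, height, width) {
  abs(x - center[1]) <= width / 2 & abs(y - center[2]) <= height / 2
}

# Toy coordinates (hypothetical data)
x <- c(20, 45, 20, 0)
y <- c(40, 40, 55, 0)

inside_ellipse <- in_ellipse_sketch(x, y, center = c(20, 40), semi_axes = c(20, 20))
inside_rectangle <- in_rectangle_sketch(x, y, center = c(20, 40), height = 30, width = 40)

data.frame(x, y, inside_ellipse, inside_rectangle)
```

Because the result is a logical vector aligned with the cells, it can be stored with mutate and used for filtering or as a plotting aesthetic.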
Removing the .cell column will return a tibble. This is consistent with the behaviour in other tidyomics packages.
spe |>
    select(-.cell) |>
    head()
## tidySpatialExperiment says: Key columns are missing. A data frame is returned for independent data analysis.
## # A tibble: 6 × 4
## in_tissue array_row array_col sample_id
## <lgl> <int> <int> <chr>
## 1 FALSE 0 16 section1
## 2 TRUE 50 102 section1
## 3 TRUE 3 43 section1
## 4 TRUE 59 19 section1
## 5 TRUE 14 94 section1
## 6 FALSE 43 9 section1
The sample_id column cannot be removed with tidyverse functions, and can only be modified if the changes are accepted by SpatialExperiment’s colData function.
# sample_id is not removed, despite the user's request
spe |>
    select(-sample_id)
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=50 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
# This change maintains separation of sample_ids and is permitted
spe |>
    mutate(sample_id = stringr::str_c(sample_id, "_modified")) |>
    head()
## # A SpatialExperiment-tibble abstraction: 99 × 7
## # Features=6 | Cells=99 | Assays=counts
## .cell in_tissue array_row array_col sample_id pxl_col_in_fullres
## <chr> <lgl> <int> <int> <chr> <int>
## 1 AAACAACGAATAGTTC-1 FALSE 0 16 section1… 2312
## 2 AAACAAGTATCTCCCA-1 TRUE 50 102 section1… 8230
## 3 AAACAATCTACTAGCA-1 TRUE 3 43 section1… 4170
## 4 AAACACCAATAACTGC-1 TRUE 59 19 section1… 2519
## 5 AAACAGAGCGACTCCT-1 TRUE 14 94 section1… 7679
## 6 AAACAGCTTTCAGAAG-1 FALSE 43 9 section1… 1831
## 7 AAACAGGGTCTATATT-1 FALSE 47 13 section1… 2106
## 8 AAACAGTGTTCCTGGG-1 FALSE 73 43 section1… 4170
## 9 AAACATGGTGAGAGGA-1 FALSE 62 0 section1… 1212
## 10 AAACATTTCCCGGATT-1 FALSE 61 97 section1… 7886
## # ℹ 89 more rows
## # ℹ 1 more variable: pxl_row_in_fullres <int>
# This change does not maintain separation of sample_ids and produces an error
spe |>
    mutate(sample_id = "new_sample")
## Error in .local(x, ..., value): Number of unique 'sample_id's is 2, but 1 was provided.
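The two mutate calls above differ in whether distinct samples stay distinct after the change. The following base R sketch expresses that idea as a check. The helper `keeps_sample_separation` is hypothetical, and treating the rule as a one-to-one mapping between old and new identifiers is an assumption; the error message above only states that the number of unique sample_ids must match.

```r
# Hypothetical helper: TRUE when every old sample_id maps to exactly one
# new sample_id and every new sample_id comes from exactly one old one
keeps_sample_separation <- function(old_ids, new_ids) {
  pairs <- unique(data.frame(old = old_ids, new = new_ids))
  nrow(pairs) == length(unique(old_ids)) &&
    nrow(pairs) == length(unique(new_ids))
}

# Toy sample identifiers, one per cell (hypothetical data)
old_ids <- c("section1", "section1", "section2")

# Renaming each sample keeps the two samples apart
renamed_ok <- keeps_sample_separation(old_ids, paste0(old_ids, "_modified"))

# Collapsing both samples into one name merges them
collapsed_ok <- keeps_sample_separation(old_ids, rep("new_sample", length(old_ids)))

c(renamed = renamed_ok, collapsed = collapsed_ok)
```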
The pxl_col_in_fullres and pxl_row_in_fullres columns cannot be removed or modified with tidyverse functions. This is consistent with the behaviour of dimension reduction data in other tidyomics packages.
# Attempting to remove pxl_col_in_fullres produces an error
spe |>
    select(-pxl_col_in_fullres)
## Error in `select_helper()`:
## ! Can't select columns that don't exist.
## ✖ Column `pxl_col_in_fullres` doesn't exist.
# Attempting to modify pxl_col_in_fullres produces an error
spe |>
    mutate(pxl_col_in_fullres)
## Error in `dplyr::mutate()`:
## ℹ In argument: `pxl_col_in_fullres`.
## Caused by error:
## ! object 'pxl_col_in_fullres' not found
sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] tidySpatialExperiment_1.0.0 ggplot2_3.5.1
## [3] ttservice_0.4.0 tidyr_1.3.1
## [5] dplyr_1.1.4 tidySingleCellExperiment_1.14.0
## [7] SpatialExperiment_1.14.0 SingleCellExperiment_1.26.0
## [9] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [11] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
## [13] IRanges_2.38.0 S4Vectors_0.42.0
## [15] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [17] matrixStats_1.3.0 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.3 magrittr_2.0.3
## [3] compiler_4.4.0 DelayedMatrixStats_1.26.0
## [5] vctrs_0.6.5 stringr_1.5.1
## [7] pkgconfig_2.0.3 crayon_1.5.2
## [9] fastmap_1.1.1 magick_2.8.3
## [11] XVector_0.44.0 ellipsis_0.3.2
## [13] labeling_0.4.3 scuttle_1.14.0
## [15] utf8_1.2.4 rmarkdown_2.26
## [17] UCSC.utils_1.0.0 tinytex_0.50
## [19] purrr_1.0.2 xfun_0.43
## [21] zlibbioc_1.50.0 cachem_1.0.8
## [23] beachmat_2.20.0 jsonlite_1.8.8
## [25] highr_0.10 rhdf5filters_1.16.0
## [27] DelayedArray_0.30.0 Rhdf5lib_1.26.0
## [29] BiocParallel_1.38.0 parallel_4.4.0
## [31] R6_2.5.1 bslib_0.7.0
## [33] stringi_1.8.3 limma_3.60.0
## [35] jquerylib_0.1.4 Rcpp_1.0.12
## [37] bookdown_0.39 knitr_1.46
## [39] R.utils_2.12.3 Matrix_1.7-0
## [41] tidyselect_1.2.1 abind_1.4-5
## [43] yaml_2.3.8 codetools_0.2-20
## [45] lattice_0.22-6 tibble_3.2.1
## [47] withr_3.0.0 evaluate_0.23
## [49] pillar_1.9.0 BiocManager_1.30.22
## [51] plotly_4.10.4 generics_0.1.3
## [53] sparseMatrixStats_1.16.0 munsell_0.5.1
## [55] scales_1.3.0 glue_1.7.0
## [57] lazyeval_0.2.2 tools_4.4.0
## [59] data.table_1.15.4 locfit_1.5-9.9
## [61] rhdf5_2.48.0 grid_4.4.0
## [63] DropletUtils_1.24.0 edgeR_4.2.0
## [65] colorspace_2.1-0 GenomeInfoDbData_1.2.12
## [67] HDF5Array_1.32.0 cli_3.6.2
## [69] fansi_1.0.6 S4Arrays_1.4.0
## [71] viridisLite_0.4.2 gtable_0.3.5
## [73] R.methodsS3_1.8.2 sass_0.4.9
## [75] digest_0.6.35 SparseArray_1.4.0
## [77] dqrng_0.3.2 farver_2.1.1
## [79] rjson_0.2.21 htmlwidgets_1.6.4
## [81] htmltools_0.5.8.1 R.oo_1.26.0
## [83] lifecycle_1.0.4 httr_1.4.7
## [85] statmod_1.5.0