Contents

# Loading required packages
library(Statial)
library(tidyverse)
library(SingleCellExperiment)

theme_set(theme_classic())

1 Installation

# Install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Statial")

2 Overview

There are over 37 trillion cells in the human body, each taking up different forms and functions. The behaviour of these cells can be described by canonical characteristics, but their functions can also dynamically change based on their environmental context, leading to cells with diverse states. Understanding changes in cell state and the interplay between cells is key to understanding their mechanisms of action and how they contribute to human disease. Statial is a suite of functions for identifying changes in cell state. This guide will provide a step-by-step overview of some key functions within Statial.

3 Evaluating cell localisation

Konditional is a method to evaluate the localisation relationship between two cell types in an image. Konditional builds on the L-function by contextualising the relationship between two cell types in reference to the typical spatial behaviour of a \(3^{rd}\) cell type/population. By taking this approach, Konditional is invariant to changes in the window of the image as well as tissue structures which may be present.

The definitions of cell types and cell states are somewhat ambiguous, cell types imply well defined groups of cells that serve different roles from one another, on the other hand cell states imply that cells are a dynamic entity which cannot be discretised, and thus exist in a continuum. For the purposes of using Konditional we treat cell states as identified clusters of cells, where larger clusters represent a “parent” cell population, and finer sub-clusters representing a “child” cell population. For example a CD4 T cell may be considered a child to a larger parent population of Immune cells. Konditional thus aims to see how a child population of cells deviate from the spatial behaviour of their parent population, and how that influences the localisation between the child cell state and another cell state.

4 Loading example data

In the following we will analyse cutaneous squamous cell carcinoma data from the head and neck region (Ferguson et al. 2022). These images are stored in a SingleCellExperiment object. The first step in analysing these images is to organise all the cells present into cell state populations or clusters, e.g. all the different B cell types are put in a vector called bcells.

# Load head and neck data
data("headSCE")

# Examine all cell types in image
unique(headSCE$cellType)
#>  [1] EC1       SC7       MC3       Undefined BC3       SC1       SC2      
#>  [8] EP        SC4       BC2       TC_CD4    SC5       EC2       BC1      
#> [15] SC3       MC2       GC        SC6       MC1       TC_CD8   
#> 20 Levels: BC1 BC2 BC3 EC1 EC2 EP GC MC1 MC2 MC3 SC1 SC2 SC3 SC4 SC5 ... Undefined

# Set up cell populations
tumour <- c("SC1", "SC2", "SC3", "SC4", "SC5", "SC6", "SC7")

bcells <- c("BC1", "BC2", "BC3")
tcells <- c("TC_CD4", "TC_CD8")
myeloid <- c("MC1", "MC2", "MC3")

endothelial <- c("EC1", "EC2")
epithelial <- c("EP")

tissue <- c(endothelial, epithelial)
immune <- c(bcells, tcells, myeloid, "GC") # GC = granulocytes

all <- c(tumour, tissue, immune, "Undefined")

5 Konditional

Konditional accepts a SingleCellExperiment object, or a single image, or list of images from a SingleCellExperiment object, this gets passed into the cells argument. The two cell types which will be evaluated are specified in the to and from arguments. A parent population must also be specified in the parent argument, note the parent cell population must include the to cell type. The argument r will specify the radius which the cell relationship will be evaluated on. Konditional supports parallel processing, the number of cores can be specified using the cores argument. Konditional can take a single value or multiple values for each argument and will test all combinations of the arguments specified.

Here we test the relationship between CD4 T cells (TC_CD4) and one type of Tumour cell (SC1) with all immune cells being our parent population. The output is a data frame with the original L-function values and konditional values as well as the values for the arguments specified. Where a negative original or konditional value represents dispersion between TC_CD4 and SC1 and a positive value indicates localisation.

CD4_Konditional <- Konditional(
  cells = headSCE,
  r = 50,
  from = "TC_CD4",
  to = "SC1",
  parent = immune,
  cores = 1
)

head(CD4_Konditional)
#>   imageID        test  original konditional  r weightQuantile inhom  edge
#> 1       1 TC_CD4__SC1 -4.990157 -4.87036888 50            0.8  TRUE FALSE
#> 2       5 TC_CD4__SC1 -5.122654 -0.02746783 50            0.8  TRUE FALSE
#> 3       6 TC_CD4__SC1 -7.945885 -4.75753347 50            0.8  TRUE FALSE
#> 4       7 TC_CD4__SC1 -1.301764  1.84603274 50            0.8  TRUE FALSE
#> 5       8 TC_CD4__SC1  4.951783  3.33973263 50            0.8  TRUE FALSE
#>   includeZeroCells window window.length
#> 1             TRUE convex            NA
#> 2             TRUE convex            NA
#> 3             TRUE convex            NA
#> 4             TRUE convex            NA
#> 5             TRUE convex            NA

The original and konditional values can be compared in a scatter plot to identify relationships which have been identified differently between the functions. The dashed red line represents the expected pattern of the data, i.e. where there exists no relationship between the two cell types. The top left quadrant therefore represents values where Konditional has identified localisation between CD4 and SC1 and the original L-function has identified dispersion, the opposite is true for the bottom right quadrant.


ggplot(CD4_Konditional, aes(x = original, y = konditional, col = imageID)) +
  geom_point(size = 2) +
  geom_hline(yintercept = 0, col = "red", linetype = "dashed") +
  geom_vline(xintercept = 0, col = "red", linetype = "dashed") +
  theme(
    axis.title.x = element_text(size = 14),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    axis.title.y = element_text(size = 14)
  )

Alternatively all pairwise cell relationships and their corresponding parent in the dataset can be tested. A data frame with all pairwise combinations can be creating using the parentCombinations function. This function takes in a vector of all the cells, as well as all the parent vectors set up earlier. As shown below the output is a data frame specifying the to, from, and parent arguments for Konditional.

# Get all relationships between cell types and their parents
parentDf <- parentCombinations(
  all = all,
  tumour,
  bcells,
  tcells,
  myeloid,
  endothelial,
  epithelial,
  tissue,
  immune
)
head(parentDf)
#>   from  to       parent parent_name
#> 1  BC1 SC1 BC1, BC2....      bcells
#> 2  BC1 SC2 BC1, BC2....      bcells
#> 3  BC1 SC3 BC1, BC2....      bcells
#> 4  BC1 SC4 BC1, BC2....      bcells
#> 5  BC1 SC5 BC1, BC2....      bcells
#> 6  BC1 SC6 BC1, BC2....      bcells

Rather than specifying to, from, and parent in Konditional, the output from parentCombinations can be inputed into Konditional using the parentDf argument. Here we will test the first 20 pairwise cell relationships in parentCombinations on the first image of the dataset.

# Selecting Image 1 as an example
image_1 <- headSCE |>
  colData() |>
  data.frame() |>
  filter(imageID == "1")


image1_Konditional <- Konditional(
  cells = image_1,
  parentDf = parentDf[1:20, ],
  r = 50,
  cores = 1
)

head(image1_Konditional)
#>   imageID             test   original konditional  r weightQuantile inhom  edge
#> 1       1 BC1__SC1__bcells  0.5812206    4.630450 50            0.8  TRUE FALSE
#> 2       1 BC1__SC2__bcells -7.1248136   -8.571246 50            0.8  TRUE FALSE
#> 3       1 BC1__SC3__bcells -9.0872476  -10.469486 50            0.8  TRUE FALSE
#> 4       1 BC1__SC4__bcells  1.3391851   -1.813705 50            0.8  TRUE FALSE
#> 5       1 BC1__SC5__bcells  6.9660272    5.263392 50            0.8  TRUE FALSE
#> 6       1 BC1__SC6__bcells -8.5564596   -4.657076 50            0.8  TRUE FALSE
#>   includeZeroCells window window.length
#> 1             TRUE convex            NA
#> 2             TRUE convex            NA
#> 3             TRUE convex            NA
#> 4             TRUE convex            NA
#> 5             TRUE convex            NA
#> 6             TRUE convex            NA

6 RsCurve

Here we examine an image were the relationship between 2 cell types depends on a parent cell population. This is a breast cancer image obtained from Keren et al, 2018 from patient 6. In the image below we can see that p53 and Immune are dispersed. However when the behaviour of p53 is placed in the context of the spatial behaviour of its parent population Keratin+Tumour, p53 and Immune now appear localised.

data("kerenImage")

kerenImage |>
  filter(cellType %in% c("Keratin+Tumour", "Immune", "p53")) |>
  arrange(cellType) |>
  ggplot(aes(x = x, y = y, color = cellType)) +
  geom_point(size = 1) +
  scale_colour_manual(values = c("#505050", "#D6D6D6", "#64BC46"))

The rsCurve function plots the L-function value and Konditional values over a range of radii. If the points lie above the red line (expected pattern) then localisation is indicated for that radius, if the points lie below the red line then dispersion is indicated. As seen in the following plot Konditional is able to correctly identify localisation between p53 and Immune in the example image for a certain range of radii. When the radius gets too large the overall relationship between p53 and Immune looks dispersed. The original L-function is not able to identify localisation at any value of radii.

rsDf <- rsCurve(
  cells = kerenImage,
  from = "p53",
  to = "Immune",
  parent = c("p53", "Keratin+Tumour"),
  rs = seq(10, 510, 100),
  cores = 1
)


ggplotRs(rsDf)
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : Chernobyl! trL>n 6

#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced
#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : Chernobyl! trL>n 6

#> Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
#> parametric, : Chernobyl! trL>n 6
#> Warning in sqrt(sum.squares/one.delta): NaNs produced

7 References

Ferguson, A. L., Sharman, A. R., Allen, R. O., Ye, T., Lee, J. H., Low, T.-H. H., Ch’ng, S., Palme, C. E., Ashford, B., Ranson, M., Clark, J. R., Patrick, E., Gupta, R., & Palendira, U. (2022). High-Dimensional and Spatial Analysis Reveals Immune Landscape–Dependent Progression in Cutaneous Squamous Cell Carcinoma. Clinical Cancer Research, OF1-OF12. (DOI)

Keren, L., Bosse, M., Marquez, D., Angoshtari, R., Jain, S., Varma, S., Yang, S. R., Kurian, A., Van Valen, D., West, R., Bendall, S. C., & Angelo, M. (2018). A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging. Cell, 174(6), 1373-1387.e1319. (DOI)

8 sessionInfo

sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
#>  [3] Biobase_2.58.0              GenomicRanges_1.50.1       
#>  [5] GenomeInfoDb_1.34.3         IRanges_2.32.0             
#>  [7] S4Vectors_0.36.0            BiocGenerics_0.44.0        
#>  [9] MatrixGenerics_1.10.0       matrixStats_0.62.0         
#> [11] forcats_0.5.2               stringr_1.4.1              
#> [13] dplyr_1.0.10                purrr_0.3.5                
#> [15] readr_2.1.3                 tidyr_1.2.1                
#> [17] tibble_3.1.8                ggplot2_3.4.0              
#> [19] tidyverse_1.3.2             Statial_1.0.1              
#> [21] BiocStyle_2.26.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] googledrive_2.0.0      colorspace_2.0-3       deldir_1.0-6          
#>  [4] ellipsis_0.3.2         XVector_0.38.0         fftwtools_0.9-11      
#>  [7] fs_1.5.2               spatstat.data_3.0-0    farver_2.1.1          
#> [10] fansi_1.0.3            lubridate_1.9.0        xml2_1.3.3            
#> [13] splines_4.2.2          codetools_0.2-18       cachem_1.0.6          
#> [16] knitr_1.40             polyclip_1.10-4        jsonlite_1.8.3        
#> [19] broom_1.0.1            dbplyr_2.2.1           spatstat.sparse_3.0-0 
#> [22] BiocManager_1.30.19    compiler_4.2.2         httr_1.4.4            
#> [25] backports_1.4.1        assertthat_0.2.1       Matrix_1.5-3          
#> [28] fastmap_1.1.0          gargle_1.2.1           cli_3.4.1             
#> [31] htmltools_0.5.3        tools_4.2.2            gtable_0.3.1          
#> [34] glue_1.6.2             GenomeInfoDbData_1.2.9 Rcpp_1.0.9            
#> [37] cellranger_1.1.0       jquerylib_0.1.4        vctrs_0.5.1           
#> [40] spatstat.explore_3.0-5 nlme_3.1-160           spatstat.random_3.0-1 
#> [43] xfun_0.35              rvest_1.0.3            timechange_0.1.1      
#> [46] lifecycle_1.0.3        googlesheets4_1.0.1    goftest_1.2-3         
#> [49] zlibbioc_1.44.0        scales_1.2.1           hms_1.1.2             
#> [52] spatstat.utils_3.0-1   parallel_4.2.2         yaml_2.3.6            
#> [55] sass_0.4.2             stringi_1.7.8          highr_0.9             
#> [58] BiocParallel_1.32.1    rlang_1.0.6            pkgconfig_2.0.3       
#> [61] bitops_1.0-7           evaluate_0.18          lattice_0.20-45       
#> [64] tensor_1.5             labeling_0.4.2         tidyselect_1.2.0      
#> [67] magrittr_2.0.3         bookdown_0.30          R6_2.5.1              
#> [70] magick_2.7.3           generics_0.1.3         DelayedArray_0.24.0   
#> [73] DBI_1.1.3              mgcv_1.8-41            pillar_1.8.1          
#> [76] haven_2.5.1            withr_2.5.0            abind_1.4-5           
#> [79] RCurl_1.98-1.9         modelr_0.1.10          crayon_1.5.2          
#> [82] utf8_1.2.2             spatstat.geom_3.0-3    tzdb_0.3.0            
#> [85] rmarkdown_2.18         grid_4.2.2             readxl_1.4.1          
#> [88] data.table_1.14.6      reprex_2.0.2           digest_0.6.30         
#> [91] munsell_0.5.0          concaveman_1.1.0       bslib_0.4.1