1 Version Info
2 Introduction
- 2.1 Motivation for submitting to Bioconductor
3 Installation
4 Loading R packages
5 Global parameters
6 Context
7 Read in images
8 Load the clinical data
- 8.1 Put the clinical data into the colData of SingleCellExperiment
9 SimpleSeg: Segment the cells in the images
10 Summarise cell features.
11 Normalise data
12 FuseSOM: Cluster cells into cell types
13 Test for association between the proportion of each cell type and progression status
14 spicyR: test spatial relationships
15 lisaClust: Find cellular neighbourhoods
16 ClassifyR: Classification
- 16.1 Visualise cross-validated prediction performance
17 Summary
18 sessionInfo

1 Version Info

R version: R version 4.3.2 Patched (2023-11-13 r85521)
Bioconductor version: 3.18

2 Introduction

Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high dimensional in situ cytometry technologies have fundamentally revolutionised our ability to observe these complex cellular relationships, providing an unprecedented characterisation of cellular heterogeneity in a tissue environment.

2.1 Motivation for submitting to Bioconductor

We have developed an analytical framework for analysing data from high dimensional in situ cytometry assays including CODEX, CycIF, IMC and High Definition Spatial Transcriptomics. Implemented in R, this framework makes use of functionality from our Bioconductor packages spicyR, lisaClust, treekoR, FuseSOM, simpleSeg and ClassifyR. Below we provide an overview of the key steps needed to interrogate the comprehensive spatial information generated by these exciting new technologies, including cell segmentation, feature normalisation, cell type identification, micro-environment characterisation, spatial hypothesis testing and patient classification. Ultimately, our modular analysis framework provides a cohesive and accessible entry point into spatially resolved single cell data analysis for R-based bioinformaticians.

3 Installation

To install the current release of spicyWorkflow, run the following code.

if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("spicyWorkflow")

4 Loading R packages

library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(tidySingleCellExperiment)

5 Global parameters

It is convenient to set the number of cores for running code in parallel. Please choose a number that is appropriate for your resources. Set the use_mc flag to TRUE if you would like to use parallel processing for the rest of the vignette. A minimum of 2 cores is suggested since running this workflow is rather computationally intensive.

use_mc <- FALSE

if (use_mc) {
  nCores <- max(parallel::detectCores() - 1, 1)
} else {
  nCores <- 2
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)

theme_set(theme_classic())

6 Context

In the following we will re-analyse some MIBI-TOF data (Risom et al, 2022) profiling the spatial landscape of ductal carcinoma in situ (DCIS), which is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). The key conclusion of this manuscript (amongst others) is that spatial information about cells can be used to predict disease progression in patients. We will use our spicy workflow to reach a similar conclusion.

The R code for this analysis is available on GitHub at https://github.com/SydneyBioX/spicyWorkflow. A mildly processed version of the data used in the manuscript is available in this repository.

7 Read in images

The images are stored in the images folder within the data folder. Here we use loadImages() from the cytomapper package to load all the tiff images into a CytoImageList object and store the images on disk as h5 files.

pathToImages <- system.file("extdata/images", package = "spicyWorkflow")

# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
  pathToImages,
  single_channel = TRUE,
  on_disk = TRUE,
  h5FilesPath = HDF5Array::getHDF5DumpDir(),
  BPPARAM = BPPARAM
)
gc()

##            used  (Mb) gc trigger   (Mb) max used  (Mb)
## Ncells 11765888 628.4   19627788 1048.3 13602602 726.5
## Vcells 19522645 149.0   33427113  255.1 23091038 176.2

8 Load the clinical data

To associate features in our images with disease progression, it is important to read in information which links image identifiers to their progression status. We will do this here, making sure that our imageID values match.

# Read in clinical data, manipulate imageID and select columns
clinical <- read.csv(
  system.file(
    "extdata/1-s2.0-S0092867421014860-mmc1.csv",
    package = "spicyWorkflow"
  )
)

clinical <- clinical |>
  mutate(imageID = paste0(
    "Point", PointNumber, "_pt", Patient_ID, "_", TMAD_Patient
  ))

image_idx <- grep("normal", clinical$Tissue_Type)
clinical$imageID[image_idx] <- paste0(clinical$imageID[image_idx], "_Normal")

clinicalVariables <- c(
  "imageID", "Patient_ID", "Status", "Age", "SUBTYPE", "PAM50", "Treatment",
  "DCIS_grade", "Necrosis"
)
rownames(clinical) <- clinical$imageID

8.1 Put the clinical data into the colData of SingleCellExperiment

We can then store the clinical information in the mcols of the CytoImageList.

# Add the clinical data to mcols of images.
mcols(images) <- clinical[names(images), clinicalVariables]
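Because the clinical table is indexed by row name, an image whose name is absent from the table would silently receive a row of NA values rather than raise an error. The following is a minimal sketch of a check for this, using toy stand-ins; image_names and clinical_toy are hypothetical and are not objects from the workflow.

```r
# Toy stand-ins for names(images) and the clinical table (hypothetical values).
image_names <- c("Point1_pt1_A", "Point2_pt2_B", "Point3_pt3_C")
clinical_toy <- data.frame(
  imageID = c("Point1_pt1_A", "Point2_pt2_B"),
  Status = c("progressor", "nonprogressor")
)
rownames(clinical_toy) <- clinical_toy$imageID

# Image names with no matching clinical record.
unmatched <- setdiff(image_names, rownames(clinical_toy))
unmatched

# Indexing by a missing row name yields a row of NAs rather than an error.
matched <- clinical_toy[image_names, ]
is.na(matched$Status)
```

In the workflow itself, the equivalent check would be setdiff(names(images), rownames(clinical)), which should return an empty vector if every image has a clinical record.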

9 SimpleSeg: Segment the cells in the images

Our simpleSeg R package (https://github.com/SydneyBioX/simpleSeg) provides a series of functions to generate simple segmentation masks of images. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBImage package. A key strength of the simpleSeg package is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise some key parameters when these aren’t specified.

9.1 Run simpleSeg

If your images are stored in a list or CytoImageList they can be segmented with a simple call to simpleSeg(). Here we have asked simpleSeg to do multiple things. First, we would like to use a combination of principal component analysis of all channels guided by the HH3 channel to summarise the nuclei signal in the images. Secondly, to estimate the cell body of the cells we will simply dilate out from the nuclei by 2 pixels. We have also requested that the channels be square root transformed and that a minimum cell size of 40 pixels be used as a size selection step.

# Generate segmentation masks
masks <- simpleSeg(
  images,
  nucleus = c("HH3"),
  cellBody = "dilate",
  transform = "sqrt",
  sizeSelection = 40,
  discSize = 2,
  pca = TRUE,
  cores = nCores
)
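To make the dilation step concrete, the following base R sketch grows a one-pixel toy nucleus outwards with a disc-shaped neighbourhood. The helper dilate_disc is hypothetical and is not how simpleSeg is implemented; treating the dilation size as a radius in pixels is also an assumption made for illustration.

```r
# Binary dilation with a disc-shaped neighbourhood (hypothetical helper).
dilate_disc <- function(mask, radius) {
  out <- mask
  # All pixel offsets that fall within the disc.
  offsets <- expand.grid(dr = -radius:radius, dc = -radius:radius)
  offsets <- offsets[offsets$dr^2 + offsets$dc^2 <= radius^2, ]
  fg <- which(mask, arr.ind = TRUE)
  for (i in seq_len(nrow(offsets))) {
    rr <- fg[, "row"] + offsets$dr[i]
    cc <- fg[, "col"] + offsets$dc[i]
    # Drop shifted pixels that fall outside the image.
    keep <- rr >= 1 & rr <= nrow(mask) & cc >= 1 & cc <= ncol(mask)
    out[cbind(rr[keep], cc[keep])] <- TRUE
  }
  out
}

# Toy nucleus: a single foreground pixel in a 7 x 7 image.
nucleus <- matrix(FALSE, 7, 7)
nucleus[4, 4] <- TRUE

cell <- dilate_disc(nucleus, radius = 2)
sum(cell) # 13 pixels lie within 2 pixels of the nucleus
```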

9.2 Visualise separation

The display and colorLabels functions in EBImage make it very easy to examine the performance of the cell segmentation. The great thing about display is that if used in an interactive session it is very easy to zoom in and out of the image.

# Visualise segmentation performance one way.
EBImage::display(colorLabels(masks[[1]]))

9.3 Visualise outlines

The plotPixels function in cytomapper makes it easy to overlay the masks on top of the intensities of 5 markers. Here we can see that the segmentation appears to be performing reasonably.

# Visualise segmentation performance another way.
cytomapper::plotPixels(
  image = images[1],
  mask = masks[1],
  img_id = "imageID",
  colour_by = c("PanKRT", "GLUT1", "HH3", "CD3", "CD20"),
  display = "single",
  colour = list(
    HH3 = c("black", "blue"),
    CD3 = c("black", "purple"),
    CD20 = c("black", "green"),
    GLUT1 = c("black", "red"),
    PanKRT = c("black", "yellow")
  ),
  bcg = list(
    HH3 = c(0, 1, 1.5),
    CD3 = c(0, 1, 1.5),
    CD20 = c(0, 1, 1.5),
    GLUT1 = c(0, 1, 1.5),
    PanKRT = c(0, 1, 1.5)
  ),
  legend = NULL
)

10 Summarise cell features.

In order to characterise the phenotypes of each of the segmented cells, measureObjects from cytomapper will calculate the average intensity of each channel within each cell as well as a few morphological features. The channel intensities will be stored in the counts assay in a SingleCellExperiment. Information on the spatial location of each cell is stored in colData in the m.cx and m.cy columns. In addition to this, it will propagate the information we have stored in the mcols of our CytoImageList to the colData of the resulting SingleCellExperiment.

# Summarise the expression of each marker in each cell
cells <- cytomapper::measureObjects(
  masks,
  images,
  img_id = "imageID",
  BPPARAM = BPPARAM
)

# Remove four images from the downstream analysis.
cells <- cells |>
  filter(!imageID %in% c(
    "Point6103_pt1008_20624", "Point6203_pt1107_31568",
    "Point6206_pt1197_31571", "Point6201_pt1027_20597"
  ))
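The idea of averaging a channel within each segmented cell can be illustrated in base R. This is a toy sketch only: the 4 x 4 mask and channel below are made up, and measureObjects computes considerably more than this.

```r
# Toy segmentation mask: 0 is background, positive integers label cells.
mask <- matrix(c(
  1, 1, 0, 0,
  1, 1, 0, 2,
  0, 0, 0, 2,
  0, 0, 0, 2
), nrow = 4, byrow = TRUE)

# Toy single-channel image with the same dimensions.
channel <- matrix(c(
  4, 4, 0, 0,
  4, 8, 0, 3,
  0, 0, 0, 6,
  0, 0, 0, 9
), nrow = 4, byrow = TRUE)

in_cell <- mask > 0

# Mean intensity of the channel within each labelled cell.
mean_intensity <- tapply(channel[in_cell], mask[in_cell], mean)

# Cell area in pixels, one of the simplest morphological features.
cell_area <- table(mask[in_cell])

mean_intensity # cell 1: 5, cell 2: 6
cell_area      # cell 1: 4 pixels, cell 2: 3 pixels
```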

11 Normalise data

We should check to see if the marker intensities of each cell require some form of transformation or normalisation. Here we extract the intensities from the counts assay. Looking at CK7 which should be expressed in the majority of the tumour cells, the intensities are clearly very skewed.

# Extract marker data and bind with information about images
df <- as.data.frame(cbind(colData(cells), t(assay(cells, "counts"))))

# Plots densities of CK7 for each image.
ggplot(df, aes(x = CK7, colour = imageID)) +
  geom_density() +
  theme(legend.position = "none")

We can transform and normalise our data using the normalizeCells function. Here we have taken the intensities from the counts assay, applied an inverse hyperbolic sine (asinh) transform, then for each image capped values at the 99th percentile, min-max scaled to 0-1 and removed the first principal component (PC1). This modified data is then stored in the norm assay by default. We can see that this normalised data appears more bimodal, not perfectly, but likely to a sufficient degree for clustering.

# Transform and normalise the marker expression of each cell.
# Use an asinh transform, trim at the 99th percentile, min-max scale, then remove PC1.
cells <- normalizeCells(cells,
  transformation = "asinh",
  method = c("trim99", "minMax", "PC1"),
  assayIn = "counts",
  cores = nCores
)

# Extract normalised marker information.
norm_df <- as.data.frame(cbind(colData(cells), t(assay(cells, "norm"))))


# Plots densities of normalised CK7 for each image.
ggplot(norm_df, aes(x = CK7, colour = imageID)) +
  geom_density() +
  theme(legend.position = "none")
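As a rough illustration of what the transform, trim and min-max steps do to a single marker, here is a base R sketch on simulated intensities. The helper normalise_marker and the toy data are hypothetical; this is not the simpleSeg implementation and it omits the PC1 step.

```r
# Normalise one marker within one image (hypothetical helper).
normalise_marker <- function(x) {
  x <- asinh(x)                                    # compress the long right tail
  x <- pmin(x, quantile(x, 0.99, names = FALSE))   # cap values above the 99th percentile
  rng <- range(x)
  if (rng[2] == rng[1]) return(rep(0, length(x)))  # constant marker: avoid 0/0
  (x - rng[1]) / (rng[2] - rng[1])                 # min-max scale to [0, 1]
}

set.seed(1)
# Two toy images whose raw intensities sit on different scales.
toy <- data.frame(
  imageID = rep(c("img1", "img2"), each = 200),
  CK7 = c(rexp(200, rate = 1), rexp(200, rate = 0.2))
)

# Apply the normalisation separately within each image.
toy$CK7_norm <- ave(toy$CK7, toy$imageID, FUN = normalise_marker)

# Each image now spans the same 0-1 range.
tapply(toy$CK7_norm, toy$imageID, range)
```

Normalising within each image, rather than across all cells at once, is what puts images with different overall intensity on a comparable scale before clustering.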

12 FuseSOM: Cluster cells into cell types

Our FuseSOM R package (https://github.com/ecool50/FuseSOM) provides a pipeline for the clustering of highly multiplexed in situ imaging cytometry assays. This pipeline uses the Self Organising Map architecture coupled with Multiview hierarchical clustering and provides functions for the estimation of the number of clusters.

Here we cluster using the runFuseSOM function. We have chosen to specify the same subset of markers used in the original manuscript for gating cell types. We have also specified the number of clusters to identify to be numClusters = 24. We do not specify a grid size for the self organising map, as FuseSOM can estimate one automatically.

12.1 Perform the clustering

# The markers used in the original publication to gate cell types.
useMarkers <- c(
  "PanKRT", "ECAD", "CK7", "VIM", "FAP", "CD31", "CK5", "SMA",
  "CD45", "CD4", "CD3", "CD8", "CD20", "CD68", "CD14", "CD11c",
  "HLADRDPDQ", "MPO", "Tryptase"
)

# Set seed.
set.seed(51773)

# Generate SOM and cluster cells into 24 groups.
cells <- runFuseSOM(
  cells,
  markers = useMarkers,
  assay = "norm",
  numClusters = 24
)

12.2 Attempt to interpret the phenotype of each cluster

We can begin the process of understanding what each of these cell clusters is by using the plotGroupedHeatmap function from scater. At the least, here we can see we capture all the major immune populations that we expect to see.

# Visualise marker expression in each cluster.
scater::plotGroupedHeatmap(
  cells,
  features = useMarkers,
  group = "clusters",
  exprs_values = "norm",
  center = TRUE,
  scale = TRUE,
  zlim = c(-3, 3),
  cluster_rows = FALSE,
  block = "clusters"
)

12.3 Check how many clusters should be used.

We can check to see how reasonable our choice of 24 clusters is using the estimateNumCluster and the optiPlot functions. Here we examine the Gap method; others such as Silhouette and Within Cluster Distance are also available.

As can be seen below, we chose the second elbow point as the optimal number of clusters.

# Generate metrics for estimating the number of clusters.
# As I've already run runFuseSOM I don't need to run generateSOM().
cells <- estimateNumCluster(cells, kSeq = 2:30)
optiPlot(cells, method = "gap")

12.4 Check cluster frequencies

We find it is always useful to check the number of cells in each cluster. Here we can see that cluster 5 contains lots of (most likely tumour) cells and cluster 19 contains very few cells.

# Check cluster frequencies.
colData(cells)$clusters |>
  table() |>
  sort()

## 
## cluster_19 cluster_20 cluster_11  cluster_3 cluster_16  cluster_6  cluster_9 
##        205        456        486        549        646        724        738 
## cluster_24 cluster_15  cluster_2  cluster_7 cluster_10  cluster_8 cluster_17 
##        794        812        839        893       1018       1073       1075 
##  cluster_1 cluster_22 cluster_13 cluster_12 cluster_14 cluster_21 cluster_23 
##       1223       1435       1928       1932       2368       2676       4384 
## cluster_18  cluster_4  cluster_5 
##       7327      14912      29804

12.5 Dimension reduction

As our data is stored in a SingleCellExperiment we can also use scater to perform dimension reduction and visualise our data in a lower dimension to look for cluster differences.

set.seed(51773)
# Perform dimension reduction using UMAP.
cells <- scater::runUMAP(
  cells,
  subset_row = useMarkers,
  exprs_values = "norm"
)

# Select a subset of images to plot.
someImages <- unique(colData(cells)$imageID)[c(1, 10, 20, 40, 50, 60)]


# UMAP by cell type cluster.
scater::plotReducedDim(
  cells[, colData(cells)$imageID %in% someImages],
  dimred = "UMAP",
  colour_by = "clusters"
)

13 Test for association between the proportion of each cell type and progression status

We recommend using a package such as diffcyt for testing for changes in abundance of cell types. However, the colTest function allows us to quickly test for associations between the proportions of the cell types and progression status using either Wilcoxon rank sum tests or t-tests. Here the smallest p-value is 0.082 (cluster_2), and after adjustment for multiple testing no cell type has a small FDR (all adjusted p-values are at least 0.85).

# Select cells which belong to individuals with progressor status.
cellsToUse <- cells$Status %in% c("nonprogressor", "progressor")

# Perform simple t-tests on the columns of the proportion matrix
# (the output below reports t statistics in the tval.t column).
testProp <- colTest(cells[, cellsToUse],
  condition = "Status",
  feature = "clusters"
)

testProp

##            mean in group nonprogressor mean in group progressor tval.t  pval
## cluster_2                       0.0087                   0.0130 -1.800 0.082
## cluster_24                      0.0083                   0.0030  1.700 0.100
## cluster_7                       0.0120                   0.0092  1.600 0.120
## cluster_17                      0.0150                   0.0120  1.400 0.160
## cluster_14                      0.0350                   0.0250  1.300 0.220
## cluster_19                      0.0027                   0.0035 -1.100 0.300
## cluster_12                      0.0240                   0.0190  0.950 0.360
## cluster_18                      0.0960                   0.1300 -0.950 0.360
## cluster_20                      0.0062                   0.0050  0.860 0.390
## cluster_3                       0.0064                   0.0082 -0.780 0.450
## cluster_1                       0.0170                   0.0220 -0.750 0.460
## cluster_11                      0.0066                   0.0049  0.720 0.480
## cluster_10                      0.0120                   0.0150 -0.690 0.500
## cluster_23                      0.0530                   0.0420  0.620 0.550
## cluster_22                      0.0210                   0.0180  0.570 0.580
## cluster_21                      0.0310                   0.0370 -0.560 0.590
## cluster_6                       0.0094                   0.0160 -0.540 0.600
## cluster_5                       0.3900                   0.3700  0.420 0.680
## cluster_8                       0.0130                   0.0120  0.350 0.730
## cluster_13                      0.0150                   0.0140  0.290 0.780
## cluster_9                       0.0082                   0.0075  0.240 0.810
## cluster_15                      0.0120                   0.0130 -0.210 0.830
## cluster_16                      0.0086                   0.0084  0.085 0.930
## cluster_4                       0.1900                   0.1900 -0.071 0.940
##            adjPval    cluster
## cluster_2     0.85  cluster_2
## cluster_24    0.85 cluster_24
## cluster_7     0.85  cluster_7
## cluster_17    0.85 cluster_17
## cluster_14    0.85 cluster_14
## cluster_19    0.85 cluster_19
## cluster_12    0.85 cluster_12
## cluster_18    0.85 cluster_18
## cluster_20    0.85 cluster_20
## cluster_3     0.85  cluster_3
## cluster_1     0.85  cluster_1
## cluster_11    0.85 cluster_11
## cluster_10    0.85 cluster_10
## cluster_23    0.85 cluster_23
## cluster_22    0.85 cluster_22
## cluster_21    0.85 cluster_21
## cluster_6     0.85  cluster_6
## cluster_5     0.91  cluster_5
## cluster_8     0.91  cluster_8
## cluster_13    0.91 cluster_13
## cluster_9     0.91  cluster_9
## cluster_15    0.91 cluster_15
## cluster_16    0.94 cluster_16
## cluster_4     0.94  cluster_4

imagesToUse <- rownames(clinical)[clinical[, "Status"] %in% c("nonprogressor", "progressor")]

prop <- getProp(cells, feature = "clusters")
clusterToUse <- rownames(testProp)[1]

boxplot(prop[imagesToUse, clusterToUse] ~ clinical[imagesToUse, "Status"])
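The logic of this section (per-image proportions, one test per cell type, then multiple-testing adjustment) can be sketched in base R. The data below are simulated and all object names are hypothetical; colTest is not implemented this way, although the adjusted p-values printed above are consistent with a Benjamini-Hochberg adjustment.

```r
set.seed(1)
# Toy cell-level table: one row per cell (hypothetical data).
toy_cells <- data.frame(
  imageID = sample(paste0("img", 1:10), 2000, replace = TRUE),
  cluster = sample(c("A", "B", "C"), 2000, replace = TRUE)
)

# Toy image-level status: the first five images are progressors.
status <- setNames(
  rep(c("progressor", "nonprogressor"), each = 5),
  paste0("img", 1:10)
)

# Proportion of each cluster within each image (each row sums to 1).
prop_toy <- prop.table(table(toy_cells$imageID, toy_cells$cluster), margin = 1)

# Two-sample t-test per cluster, then Benjamini-Hochberg adjustment.
groups <- status[rownames(prop_toy)]
pval <- apply(prop_toy, 2, function(p) t.test(p ~ groups)$p.value)
adj_pval <- p.adjust(pval, method = "fdr")

data.frame(pval, adj_pval)
```

Because the adjustment accounts for the number of cell types tested, an unadjusted p-value below 0.05 for one cluster does not by itself imply a small FDR.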

14 spicyR: test spatial relationships

Our spicyR package (https://www.bioconductor.org/packages/devel/bioc/html/spicyR.html) provides a series of functions to aid in the analysis of both immunofluorescence and mass cytometry imaging data as well as other assays that can deeply phenotype individual cells and their spatial location. Here we use the spicy function to test for changes in the spatial relationships between pair-wise combinations of cells. We quantify spatial relationships using a combination of three radii Rs = c(20, 50, 100) and mildly account for some global tissue structure using sigma = 50.

# Test for changes in pair-wise spatial relationships between cell types.
spicyTest <- spicy(
  cells[, cellsToUse],
  condition = "Status",
  cellType = "clusters",
  imageID = "imageID",
  spatialCoords = c("m.cx", "m.cy"),
  Rs = c(20, 50, 100),
  sigma = 50,
  BPPARAM = BPPARAM
)

topPairs(spicyTest, n = 10)

##                        intercept coefficient      p.value adj.pvalue       from
## cluster_2__cluster_4   -86.74796    97.22394 0.0004843381  0.2789787  cluster_2
## cluster_4__cluster_2   -80.47576    84.43728 0.0027174743  0.7216127  cluster_4
## cluster_15__cluster_6  -60.40356   366.43688 0.0056848267  0.7216127 cluster_15
## cluster_6__cluster_15  -63.15415   436.64319 0.0080157862  0.7216127  cluster_6
## cluster_10__cluster_11  72.27788   240.86667 0.0094002153  0.7216127 cluster_10
## cluster_15__cluster_4  -94.09710    48.74326 0.0121264342  0.7216127 cluster_15
## cluster_10__cluster_10 233.66256   220.18695 0.0121327499  0.7216127 cluster_10
## cluster_10__cluster_2  -18.30472   144.79724 0.0163493985  0.7216127 cluster_10
## cluster_2__cluster_10  -18.29444   151.09170 0.0169948663  0.7216127  cluster_2
## cluster_2__cluster_22  -90.00492    87.44517 0.0170390197  0.7216127  cluster_2
##                                to
## cluster_2__cluster_4    cluster_4
## cluster_4__cluster_2    cluster_2
## cluster_15__cluster_6   cluster_6
## cluster_6__cluster_15  cluster_15
## cluster_10__cluster_11 cluster_11
## cluster_15__cluster_4   cluster_4
## cluster_10__cluster_10 cluster_10
## cluster_10__cluster_2   cluster_2
## cluster_2__cluster_10  cluster_10
## cluster_2__cluster_22  cluster_22

We can visualise these tests using signifPlot, where we observe that cell type pairs appear to become less attractive (or avoid more) in the progressor samples.

# Visualise which relationships are changing the most.
signifPlot(
  spicyTest,
  breaks = c(-1.5, 3, 0.5)
)
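To give some intuition for what a pair-wise spatial relationship at a given radius measures, the following base R sketch counts how often cells of one type fall within r of cells of another type, scaled so that complete spatial randomness would give roughly pi * r^2. The helper pairwise_k is hypothetical, uses no edge correction and is not the statistic or model that spicyR fits.

```r
# Simplified Ripley's K-style estimate between two cell types, without
# edge correction (hypothetical helper, not the spicyR implementation).
pairwise_k <- function(x, y, type, from, to, r) {
  a <- which(type == from)
  b <- which(type == to)
  d <- as.matrix(dist(cbind(x, y)))[a, b, drop = FALSE]
  if (from == to) diag(d) <- Inf # a cell is not its own neighbour
  area <- diff(range(x)) * diff(range(y))
  n_pairs <- if (from == to) length(a) * (length(a) - 1) else length(a) * length(b)
  area * sum(d <= r) / n_pairs
}

# Toy image: B cells sit next to A cells, C cells sit far from them.
x <- c(0, 100, 1, 99, 0, 100)
y <- c(0, 100, 1, 99, 100, 0)
type <- c("A", "A", "B", "B", "C", "C")

k_ab <- pairwise_k(x, y, type, "A", "B", r = 20) # 5000: A and B co-localise
k_ac <- pairwise_k(x, y, type, "A", "C", r = 20) # 0: A and C avoid each other

# Compare with the value expected under complete spatial randomness.
c(k_ab = k_ab, k_ac = k_ac, csr = pi * 20^2)
```

Values above the reference suggest attraction at that radius and values below it suggest avoidance; a test such as the one above then asks whether this quantity differs between conditions.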

15 lisaClust: Find cellular neighbourhoods

Our lisaClust package (https://www.bioconductor.org/packages/devel/bioc/html/lisaClust.html) provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types are similar. This package can be used to provide a high-level summary of cell-type co-localisation in multiplexed imaging data that has been segmented at a single-cell resolution. Here we use the lisaClust function to cluster cells into 5 regions with distinct spatial ordering.

set.seed(51773)

# Cluster cells into spatial regions with similar composition.
cells <- lisaClust(
  cells,
  k = 5,
  Rs = c(20, 50, 100),
  sigma = 50,
  spatialCoords = c("m.cx", "m.cy"),
  cellType = "clusters",
  BPPARAM = BPPARAM
)

15.1 Region - cell type enrichment heatmap

We can try to interpret which spatial orderings the regions are quantifying using the regionMap function. This plots the frequency of each cell type in a region relative to what you would expect by chance.

# Visualise the enrichment of each cell type in each region
regionMap(cells, cellType = "clusters", limit = c(0.2, 5))
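The general idea of grouping cells by their local neighbourhood can be sketched in base R. This is a deliberately simplified stand-in: it clusters the cell type composition within a fixed radius using k-means, whereas lisaClust clusters local indicators of spatial association. All data and object names below are hypothetical.

```r
set.seed(1)
n <- 300
# Toy tissue: the left half is dominated by type A, the right half by type B.
x <- runif(n, 0, 200)
y <- runif(n, 0, 100)
type <- ifelse(
  x < 100,
  sample(c("A", "B"), n, replace = TRUE, prob = c(0.9, 0.1)),
  sample(c("A", "B"), n, replace = TRUE, prob = c(0.1, 0.9))
)

# For each cell, count neighbours of each type within radius r (excluding itself).
r <- 20
d <- as.matrix(dist(cbind(x, y)))
is_neighbour <- (d <= r & d > 0) * 1
type_indicator <- cbind(A = type == "A", B = type == "B") * 1
neighbour_counts <- is_neighbour %*% type_indicator

# Convert counts to fractions; cells without neighbours keep a zero row.
composition <- neighbour_counts / pmax(rowSums(neighbour_counts), 1)

# Group cells with similar neighbourhood composition into 2 regions.
region <- kmeans(composition, centers = 2, nstart = 10)$cluster

table(region, side = ifelse(x < 100, "left", "right"))
```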

15.2 Visualise regions

By default, these identified regions are stored in the region column in the colData of our object. We can quickly examine the spatial arrangement of these regions using ggplot.

# Extract cell information and filter to specific image.
df <- colData(cells) |>
  as.data.frame() |>
  filter(imageID == "Point2206_pt1116_31620")

# Colour cells by their region.
ggplot(df, aes(x = m.cx, y = m.cy, colour = region)) +
  geom_point()

While much slower, we have also implemented a function for overlaying the region information as a hatching pattern so that the information can be viewed simultaneously with the cell type calls.

# Use hatching to visualise regions and cell types.
hatchingPlot(
  cells,
  useImages = "Point2206_pt1116_31620",
  cellType = "clusters",
  spatialCoords = c("m.cx", "m.cy")
)

This plot is a ggplot object and so the scale can be modified with scale_region_manual.

# Use hatching to visualise regions and cell types.
# Relabel the hatching of the regions.
hatchingPlot(
  cells,
  useImages = "Point2206_pt1116_31620",
  cellType = "clusters",
  spatialCoords = c("m.cx", "m.cy"),
  window = "square",
  nbp = 300,
  line.spacing = 41
) +

  scale_region_manual(values = c(
    region_1 = 2,
    region_2 = 1,
    region_3 = 5,
    region_4 = 4,
    region_5 = 3
  )) +

  guides(colour = guide_legend(ncol = 2))

15.3 Test for association with progression

If needed, we can again quickly use the colTest function to test for associations between the proportions of the cells in each region and progression status using either Wilcoxon rank sum tests or t-tests. Here none of the regions has an adjusted p-value less than 0.05 (all are 0.84).

# Test if the proportion of each region is associated
# with progression status.
testRegion <- colTest(
  cells[, cellsToUse],
  feature = "region",
  condition = "Status"
)

testRegion

##          mean in group nonprogressor mean in group progressor tval.t pval
## region_2                       0.240                    0.270  -1.30 0.20
## region_1                       0.320                    0.270   0.97 0.35
## region_3                       0.240                    0.250  -0.41 0.69
## region_4                       0.072                    0.066   0.41 0.69
## region_5                       0.130                    0.140  -0.20 0.84
##          adjPval  cluster
## region_2    0.84 region_2
## region_1    0.84 region_1
## region_3    0.84 region_3
## region_4    0.84 region_4
## region_5    0.84 region_5

16 ClassifyR: Classification

Our ClassifyR package (https://github.com/SydneyBioX/ClassifyR) formalises a convenient framework for evaluating classification in R. We provide functionality to easily include four key modelling stages (data transformation, feature selection, classifier training and prediction) in a cross-validation loop. Here we use the crossValidate function to perform 100 repeats of 5-fold cross-validation to evaluate the performance of a GLM classifier (classifier = "GLM") applied to three quantifications of our MIBI-TOF data: cell type proportions, pair-wise spatial associations between cell types and region proportions.

# Create list to store data.frames
data <- list()

# Add proportions of each cell type in each image
data[["props"]] <- getProp(cells, "clusters")

# Add pair-wise associations
data[["dist"]] <- getPairwise(
  cells,
  spatialCoords = c("m.cx", "m.cy"),
  cellType = "clusters",
  Rs = c(20, 50, 100),
  sigma = 50,
  BPPARAM = BPPARAM
)
data[["dist"]] <- as.data.frame(data[["dist"]])


# Add proportions of each region in each image
# to the list of dataframes.
data[["regions"]] <- getProp(cells, "region")

# Subset the data to images with progressor or nonprogressor status.
measurements <- lapply(data, function(x) x[imagesToUse, ])

# Set seed
set.seed(51773)

# Perform cross-validation of a GLM classifier
# with 100 repeats of 5-fold cross-validation.
cv <- crossValidate(
  measurements = measurements,
  outcome = clinical[imagesToUse, "Status"],
  classifier = "GLM",
  nFolds = 5,
  nRepeats = 100,
  nCores = nCores
)
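For readers unfamiliar with repeated k-fold cross-validation, the following base R sketch shows the basic loop on simulated data: split the samples into folds, fit on all but one fold, predict the held-out fold, and summarise each repeat with an AUC. It uses stats::glm directly and hypothetical helper names (auc, cv_repeat); it is not how crossValidate is implemented and it skips feature selection and other stages that ClassifyR handles.

```r
set.seed(1)
# Toy data: 60 samples, one informative and one noise feature (hypothetical).
n <- 60
outcome <- factor(rep(c("nonprogressor", "progressor"), each = n / 2))
features <- data.frame(
  f1 = rnorm(n, mean = as.integer(outcome) - 1),
  f2 = rnorm(n)
)

# AUC from the rank-sum (Mann-Whitney) identity.
auc <- function(score, label) {
  pos <- label == levels(label)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(rank(score)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One repeat: assign samples to folds, predict each fold from the others.
cv_repeat <- function(features, outcome, n_folds = 5) {
  fold <- sample(rep(seq_len(n_folds), length.out = nrow(features)))
  score <- numeric(nrow(features))
  for (k in seq_len(n_folds)) {
    test <- fold == k
    train <- cbind(features[!test, , drop = FALSE], outcome = outcome[!test])
    fit <- glm(outcome ~ ., data = train, family = binomial)
    score[test] <- predict(fit, newdata = features[test, , drop = FALSE], type = "response")
  }
  auc(score, outcome)
}

# Repeat with different random fold assignments and summarise the AUCs.
aucs <- replicate(20, cv_repeat(features, outcome))
summary(aucs)
```

Repeating the fold assignment many times gives a distribution of AUC values per feature set, which is what makes the comparison between feature sets less dependent on any single split.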

16.1 Visualise cross-validated prediction performance

Here we use the performancePlot function to assess the AUC from each repeat of the 5-fold cross-validation. We see that the lisaClust regions appear to capture information which is predictive of progression status of the patients.

# Calculate AUC for each cross-validation repeat and plot.
performancePlot(
  cv,
  metric = "AUC",
  characteristicsList = list(x = "Assay Name")
)

17 Summary

Here we have used a pipeline of our spatial analysis R packages to demonstrate an easy way to segment, cluster, normalise, quantify and classify high dimensional in situ cytometry data all within R.

18 sessionInfo

sessionInfo()

## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ttservice_0.4.0                 tidyr_1.3.1                    
##  [3] tidySingleCellExperiment_1.12.0 lisaClust_1.10.1               
##  [5] ClassifyR_3.6.3                 survival_3.5-7                 
##  [7] BiocParallel_1.36.0             MultiAssayExperiment_1.28.0    
##  [9] generics_0.1.3                  spicyR_1.14.3                  
## [11] scater_1.30.1                   scuttle_1.12.0                 
## [13] ggpubr_0.6.0                    FuseSOM_1.4.0                  
## [15] simpleSeg_1.4.1                 ggplot2_3.4.4                  
## [17] dplyr_1.1.4                     cytomapper_1.14.0              
## [19] SingleCellExperiment_1.24.0     SummarizedExperiment_1.32.0    
## [21] Biobase_2.62.0                  GenomicRanges_1.54.1           
## [23] GenomeInfoDb_1.38.5             IRanges_2.36.0                 
## [25] S4Vectors_0.40.2                BiocGenerics_0.48.1            
## [27] MatrixGenerics_1.14.0           matrixStats_1.2.0              
## [29] EBImage_4.44.0                  BiocStyle_2.30.0               
## 
## loaded via a namespace (and not attached):
##   [1] fs_1.6.3                  spatstat.sparse_3.0-3    
##   [3] bitops_1.0-7              httr_1.4.7               
##   [5] RColorBrewer_1.1-3        prabclus_2.3-3           
##   [7] DataVisualizations_1.3.2  numDeriv_2016.8-1.1      
##   [9] tools_4.3.2               backports_1.4.1          
##  [11] utf8_1.2.4                R6_2.5.1                 
##  [13] vegan_2.6-4               HDF5Array_1.30.0         
##  [15] uwot_0.1.16               lazyeval_0.2.2           
##  [17] mgcv_1.9-1                rhdf5filters_1.14.1      
##  [19] permute_0.9-7             withr_3.0.0              
##  [21] sp_2.1-2                  analogue_0.17-6          
##  [23] gridExtra_2.3             cli_3.6.2                
##  [25] spatstat.explore_3.2-5    profileModel_0.6.1       
##  [27] labeling_0.4.3            sass_0.4.8               
##  [29] diptest_0.77-0            robustbase_0.99-1        
##  [31] brglm_0.7.2               nnls_1.5                 
##  [33] spatstat.data_3.0-4       genefilter_1.84.0        
##  [35] proxy_0.4-27              systemfonts_1.0.5        
##  [37] yulab.utils_0.1.3         ggupset_0.3.0            
##  [39] svglite_2.1.3             RSQLite_2.3.5            
##  [41] gridGraphics_0.5-1        spatstat.random_3.2-2    
##  [43] car_3.1-2                 scam_1.2-14              
##  [45] Matrix_1.6-5              ggbeeswarm_0.7.2         
##  [47] fansi_1.0.6               abind_1.4-5              
##  [49] terra_1.7-65              lifecycle_1.0.4          
##  [51] yaml_2.3.8                carData_3.0-5            
##  [53] rhdf5_2.46.1              SparseArray_1.2.3        
##  [55] blob_1.2.4                grid_4.3.2               
##  [57] promises_1.2.1            crayon_1.5.2             
##  [59] shinydashboard_0.7.2      lattice_0.22-5           
##  [61] cowplot_1.1.3             beachmat_2.18.0          
##  [63] annotate_1.80.0           KEGGREST_1.42.0          
##  [65] magick_2.8.2              pillar_1.9.0             
##  [67] knitr_1.45                rjson_0.2.21             
##  [69] boot_1.3-28.1             fpc_2.2-11               
##  [71] codetools_0.2-19          glue_1.7.0               
##  [73] FCPS_1.3.4                data.table_1.14.10       
##  [75] vctrs_0.6.5               png_0.1-8                
##  [77] gtable_0.3.4              kernlab_0.9-32           
##  [79] cachem_1.0.8              xfun_0.41                
##  [81] princurve_2.1.6           S4Arrays_1.2.0           
##  [83] mime_0.12                 coop_0.6-3               
##  [85] pheatmap_1.0.12           ellipsis_0.3.2           
##  [87] nlme_3.1-164              bit64_4.0.5              
##  [89] RcppAnnoy_0.0.22          bslib_0.6.1              
##  [91] irlba_2.3.5.1             svgPanZoom_0.3.4         
##  [93] vipor_0.4.7               DBI_1.2.1                
##  [95] colorspace_2.1-0          raster_3.6-26            
##  [97] nnet_7.3-19               mnormt_2.1.1             
##  [99] tidyselect_1.2.0          bit_4.0.5                
## [101] compiler_4.3.2            BiocNeighbors_1.20.2     
## [103] DelayedArray_0.28.0       plotly_4.10.4            
## [105] bookdown_0.37             scales_1.3.0             
## [107] DEoptimR_1.1-3            psych_2.4.1              
## [109] tiff_0.1-12               stringr_1.5.1            
## [111] SpatialExperiment_1.12.0  digest_0.6.34            
## [113] goftest_1.2-3             fftwtools_0.9-11         
## [115] spatstat.utils_3.0-4      minqa_1.2.6              
## [117] rmarkdown_2.25            XVector_0.42.0           
## [119] htmltools_0.5.7           pkgconfig_2.0.3          
## [121] jpeg_0.1-10               lme4_1.1-35.1            
## [123] sparseMatrixStats_1.14.0  highr_0.10               
## [125] fastmap_1.1.1             rlang_1.1.3              
## [127] htmlwidgets_1.6.4         shiny_1.8.0              
## [129] DelayedMatrixStats_1.24.0 farver_2.1.1             
## [131] jquerylib_0.1.4           jsonlite_1.8.8           
## [133] mclust_6.0.1              BiocSingular_1.18.0      
## [135] RCurl_1.98-1.14           magrittr_2.0.3           
## [137] modeltools_0.2-23         GenomeInfoDbData_1.2.11  
## [139] ggplotify_0.1.2           Rhdf5lib_1.24.1          
## [141] munsell_0.5.0             Rcpp_1.0.12              
## [143] viridis_0.6.4             stringi_1.8.3            
## [145] zlibbioc_1.48.0           MASS_7.3-60.0.1          
## [147] plyr_1.8.9                flexmix_2.3-19           
## [149] parallel_4.3.2            ggrepel_0.9.5            
## [151] deldir_2.0-2              Biostrings_2.70.1        
## [153] splines_4.3.2             tensor_1.5               
## [155] locfit_1.5-9.8            fastcluster_1.2.6        
## [157] spatstat.geom_3.2-7       ggsignif_0.6.4           
## [159] reshape2_1.4.4            ScaledMatrix_1.10.0      
## [161] XML_3.99-0.16.1           evaluate_0.23            
## [163] BiocManager_1.30.22       nloptr_2.0.3             
## [165] tweenr_2.0.2              httpuv_1.6.13            
## [167] purrr_1.0.2               polyclip_1.10-6          
## [169] ggforce_0.4.1             rsvd_1.0.5               
## [171] broom_1.0.5               xtable_1.8-4             
## [173] rstatix_0.7.2             later_1.3.2              
## [175] viridisLite_0.4.2         class_7.3-22             
## [177] tibble_3.2.1              lmerTest_3.1-3           
## [179] AnnotationDbi_1.64.1      memoise_2.0.1            
## [181] beeswarm_0.4.0            cluster_2.1.6            
## [183] concaveman_1.1.0

Performing a spatial analysis of multiplexed tissue imaging data.

27 July, 2022
