Contents

1 Version Info

R version: R version 4.3.2 Patched (2023-11-13 r85521)
Bioconductor version: 3.18

2 Introduction

Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high dimensional in situ cytometry technologies have fundamentally revolutionised our ability to observe these complex cellular relationships providing an unprecedented characterisation of cellular heterogeneity in a tissue environment.

2.1 Motivation for submitting to Bioconductor

We have developed an analytical framework for analysing data from high dimensional in situ cytometry assays including CODEX, CycIF, IMC and High Definition Spatial Transcriptomics. Implemented in R, this framework makes use of functionality from our Bioconductor packages spicyR, lisaClust, treekoR, FuseSOM, simpleSeg and ClassifyR. Below we will provide an overview of key steps which are needed to interrogate the comprehensive spatial information generated by these exciting new technologies including cell segmentation, feature normalisation, cell type identification, micro-environment characterisation, spatial hypothesis testing and patient classification. Ultimately, our modular analysis framework provides a cohesive and accessible entry point into spatially resolved single cell data analysis for any R-based bioinformaticians.

3 Installation

To install the current release of spicyWorkflow, run the following code.

if (!require("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

BiocManager::install("spicyWorkflow")

4 Loading R packages

library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(tidySingleCellExperiment)

5 Global paramaters

It is convenient to set the number of cores for running code in parallel. Please choose a number that is appropriate for your resources. Set the use_mc flag to TRUE if you would like to use parallel processing for the rest of the vignette. A minimum of 2 cores is suggested since running this workflow is rather computationally intensive.

use_mc <- FALSE

if (use_mc) {
  nCores <- max(parallel::detectCores() - 1, 1)
} else {
  nCores <- 2
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)

theme_set(theme_classic())

6 Context

In the following we will re-analyse some MIBI-TOF data (Risom et al, 2022) profiling the spatial landscape of ductal carcinoma in situ (DCIS), which is a pre-invasive lesion that is thought to be a precursor to invasive breast cancer (IBC). The key conclusion of this manuscript (amongst others) is that spatial information about cells can be used to predict disease progression in patients. We will use our spicy workflow to make a similar conclusion.

The R code for this analysis is available on github https://github.com/SydneyBioX/spicyWorkflow. A mildly processed version of the data used in the manuscript is available in this repository.

7 Read in images

The images are stored in the images folder within the data folder. Here we use loadImages() from the cytomapper package to load all the tiff images into a CytoImageList object and store the images as h5 file on-disk.

pathToImages <- system.file("extdata/images", package = "spicyWorkflow")

# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
  pathToImages,
  single_channel = TRUE,
  on_disk = TRUE,
  h5FilesPath = HDF5Array::getHDF5DumpDir(),
  BPPARAM = BPPARAM
)
gc()
##            used  (Mb) gc trigger   (Mb) max used  (Mb)
## Ncells 11765888 628.4   19627788 1048.3 13602602 726.5
## Vcells 19522645 149.0   33427113  255.1 23091038 176.2

8 Load the clinical data

To associate features in our image with disease progression, it is important to read in information which links image identifiers to their progression status. We will do this here, making sure that our imageID match. ## Read the clinical data

# Read in clinical data, manipulate imageID and select columns
clinical <- read.csv(
  system.file(
    "extdata/1-s2.0-S0092867421014860-mmc1.csv",
    package = "spicyWorkflow"
  )
)

clinical <- clinical |>
  mutate(imageID = paste0(
    "Point", PointNumber, "_pt", Patient_ID, "_", TMAD_Patient
  ))

image_idx <- grep("normal", clinical$Tissue_Type)
clinical$imageID[image_idx] <- paste0(clinical$imageID[image_idx], "_Normal")

clinicalVariables <- c(
  "imageID", "Patient_ID", "Status", "Age", "SUBTYPE", "PAM50", "Treatment",
  "DCIS_grade", "Necrosis"
)
rownames(clinical) <- clinical$imageID

8.1 Put the clinical data into the colData of SingleCellExperiment

We can then store the clinical information in the mcols of the CytoImageList.

# Add the clinical data to mcols of images.
mcols(images) <- clinical[names(images), clinicalVariables]

9 SimpleSeg: Segment the cells in the images

Our simpleSeg R package on https://github.com/SydneyBioX/simpleSeg provides a series of functions to generate simple segmentation masks of images. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBimage package. A key strength of the simpleSeg package is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise some key parameters when these aren’t specified.

9.1 Run simpleSeg

If your images are stored in a list or CytoImageList they can be segmented with a simple call to simpleSeg(). Here we have ask simpleSeg to do multiple things. First, we would like to use a combination of principal component analysis of all channels guided by the H33 channel to summarise the nuclei signal in the images. Secondly, to estimate the cell body of the cells we will simply dilate out from the nuclei by 2 pixels. We have also requested that the channels be square root transformed and that a minimum cell size of 40 pixels be used as a size selection step.

# Generate segmentation masks
masks <- simpleSeg(
  images,
  nucleus = c("HH3"),
  cellBody = "dilate",
  transform = "sqrt",
  sizeSelection = 40,
  discSize = 2,
  pca = TRUE,
  cores = nCores
)

9.2 Visualise separation

The display and colorLabels functions in EBImage make it very easy to examine the performance of the cell segmentation. The great thing about display is that if used in an interactive session it is very easy to zoom in and out of the image.

# Visualise segmentation performance one way.
EBImage::display(colorLabels(masks[[1]]))

9.3 Visualise outlines

The plotPixels function in cytomapper make it easy to overlay the masks on top of the intensities of 6 markers. Here we can see that the segmentation appears to be performing reasonably.

# Visualise segmentation performance another way.
cytomapper::plotPixels(
  image = images[1],
  mask = masks[1],
  img_id = "imageID",
  colour_by = c("PanKRT", "GLUT1", "HH3", "CD3", "CD20"),
  display = "single",
  colour = list(
    HH3 = c("black", "blue"),
    CD3 = c("black", "purple"),
    CD20 = c("black", "green"),
    GLUT1 = c("black", "red"),
    PanKRT = c("black", "yellow")
  ),
  bcg = list(
    HH3 = c(0, 1, 1.5),
    CD3 = c(0, 1, 1.5),
    CD20 = c(0, 1, 1.5),
    GLUT1 = c(0, 1, 1.5),
    PanKRT = c(0, 1, 1.5)
  ),
  legend = NULL
)