1 Introduction

The goal of unsupervised analysis of mass spectrometry (MS) imaging experiments is to discover regions in the data with distinct chemical profiles, and to select the m/z-values that uniquely distinguish these different regions from each other.

Algorithmically, this means clustering the data. In imaging experiments, the resulting cluster configurations are called spatial segmentations, and the clusters are called segments.

In this vignette, we present an example segmentation workflow using Cardinal.

We begin by loading the package:

library(Cardinal)

2 Segmentation of a pig fetus whole-body cross section

This example uses the PIGII_206 dataset: a cross section of a pig fetus, acquired on a Thermo LTQ instrument using desorption electrospray ionization (DESI).

First, we load the dataset from the CardinalWorkflows package. The data is stored in an older format, so we need to coerce it to an MSImagingExperiment.

data(pig206, package="CardinalWorkflows")
pig206 <- as(pig206, "MSImagingExperiment")

The dataset contains 4,959 spectra with 10,200 m/z-values.

pig206
## An object of class 'MSContinuousImagingExperiment'
##   <10200 feature, 4959 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     run(1): PIGII_206
##     raster dimensions: 111 x 66
##     coord(2): x = 10..120, y = 1..66
##     mass range:  150.0833 to 1000.0000 
##     centroided: FALSE

Optical image of the pig fetus section

In the optical image shown above, the brain (left), heart (center), and liver (large dark region) are clearly visible.

image(pig206, mz=885.5, plusminus=0.25)

The dataset has been cropped to remove the background slide pixels, leaving only the tissue section itself for analysis.

2.1 Pre-processing

For statistical analysis, it is useful to reduce the dataset to include only the peaks.

We calculate the mean spectrum using summarizeFeatures().

pig206_mean <- summarizeFeatures(pig206, "mean")
plot(pig206_mean)

In order to make the mass spectra comparable between different pixels, it is necessary to normalize the data. We will use TIC normalization.

Let’s calculate the TIC to see how it currently varies across the dataset in the raw, unprocessed spectra.

pig206_tic <- summarizePixels(pig206, c(tic="sum"))
image(pig206_tic)
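
As a rough illustration of what TIC normalization does, the base R sketch below rescales each spectrum (column) of a small made-up intensity matrix so that all spectra share the same total ion current. The matrix `x` and the helper `tic_normalize()` are hypothetical names used only for this example, and the sketch is not meant to reflect how Cardinal implements normalize() internally.

```r
# Toy intensity matrix: rows are m/z features, columns are pixels (spectra).
# The values are made up for illustration only.
x <- matrix(c(10, 20, 70,    # px1
               1,  2,  7,    # px2: same profile as px1, 10x weaker
               5,  5, 10),   # px3
            nrow = 3,
            dimnames = list(paste0("mz", 1:3), paste0("px", 1:3)))

# Hypothetical helper: scale each spectrum so its intensities sum to `target`.
# Assumes every pixel has a non-zero TIC.
tic_normalize <- function(x, target = 1) {
  tic <- colSums(x)                 # total ion current per pixel
  sweep(x, 2, tic, "/") * target    # divide each column by its TIC
}

x_norm <- tic_normalize(x)
colSums(x_norm)   # every spectrum now has the same TIC
x_norm            # px1 and px2 become identical after normalization
```

In this toy example, px1 and px2 differ only in overall signal strength, so they become directly comparable once each is divided by its own TIC.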

To process the dataset, we will first perform peak picking on the mean spectrum to create a set of reference peaks. We will then bin the peaks in the entire dataset to this reference.

Note that peak picking on the mean spectrum is the fastest option, but may miss low-intensity peaks or peaks that only occur in a small part of the dataset. If we wanted to be more thorough, we could use a similar procedure to perform peak picking on the entire dataset (or on a random sample of many spectra) to create the set of reference peaks.

pig206_ref <- pig206_mean %>%
  peakPick(SNR=3) %>%
  peakAlign(ref="mean",
            tolerance=0.5,
            units="mz") %>%
  peakFilter() %>%
  process()

Now we bin the rest of the dataset to the reference peaks.

pig206_peaks <- pig206 %>%
  normalize(method="tic") %>%
  peakBin(ref=mz(pig206_ref),
          tolerance=0.5,
          units="mz") %>%
  process()

pig206_peaks
## An object of class 'MSContinuousImagingExperiment'
##   <110 feature, 4959 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     processing complete(2): normalize peakBin
##     processing pending(0):
##     run(1): PIGII_206
##     raster dimensions: 111 x 66
##     coord(2): x = 10..120, y = 1..66
##     mass range: 157.3966 to 889.6261 
##     centroided: TRUE

This produces a centroided dataset with 110 peaks.
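
To make the idea of binning to a reference more concrete, here is a minimal base R sketch on a single made-up profile spectrum. The helper `bin_to_reference()` is hypothetical, and summing the intensities inside each tolerance window is an assumption chosen for simplicity; Cardinal's peakBin() may summarize the window differently.

```r
# Toy profile spectrum (made-up values): m/z axis and intensities.
mz  <- seq(100, 110, by = 0.5)
int <- c(0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0)

# Hypothetical reference peaks, e.g. picked from a mean spectrum.
ref <- c(101, 104, 108)

# Hypothetical helper: for each reference peak, sum the intensities
# that fall within +/- tolerance (in m/z units) of it.
bin_to_reference <- function(mz, int, ref, tolerance = 0.5) {
  vapply(ref, function(p) sum(int[abs(mz - p) <= tolerance]), numeric(1))
}

binned <- bin_to_reference(mz, int, ref, tolerance = 0.5)
data.frame(mz = ref, intensity = binned)
```

The result has one value per reference peak, which is why the binned dataset above has far fewer features than the raw spectra.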

2.2 Visualization

Before proceeding with the statistical analysis, we’ll first perform some exploratory visual analysis of the dataset.

2.2.1 Ion images

Below, we plot several hand-selected peaks corresponding to major organs.

m/z 187 appears highly abundant in the heart.

image(pig206_peaks, mz=187)

m/z 840 appears highly abundant in the brain and spinal cord.

image(pig206_peaks, mz=840)

m/z 537 appears highly abundant in the liver.

image(pig206_peaks, mz=537)

Rather than manually going through the full dataset and hand-selecting peaks, the goal of our statistical analysis will be to automatically select the peaks that distinguish such regions (e.g., the major organs).

2.2.2 Principal components analysis (PCA)

Principal component analysis (PCA) is a popular method for exploring a dataset. PCA is available in Cardinal through the PCA() method.

Below, we calculate the first 3 principal components.

pig206_pca <- PCA(pig206_peaks, ncomp=3)

Next, we overlay the first 3 principal components.

The overlay requires some contrast enhancement to see the structures clearly. In addition, the PC scores are normalized to the same range (between 0 and 1).

image(pig206_pca, contrast.enhance="histogram", normalize.image="linear")
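
For readers who want to see the two steps in isolation, the sketch below computes three principal components with base R's prcomp() on random toy data and then linearly rescales each component's scores to the range 0 to 1. It is only an analogy for what the overlay does: the helper `rescale01()` is a hypothetical name, and Cardinal's PCA() and its image normalization may differ in their details.

```r
set.seed(1)

# Toy data: 50 "pixels" (rows) by 6 "peaks" (columns) of random values.
x <- matrix(rnorm(50 * 6), nrow = 50)

# Base R PCA, keeping only the first 3 components.
pca <- prcomp(x, center = TRUE, scale. = FALSE, rank. = 3)
scores <- pca$x            # 50 x 3 matrix of PC scores
loadings <- pca$rotation   # 6 x 3 matrix of loadings

# Hypothetical helper: linearly rescale a vector to the range [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
scores01 <- apply(scores, 2, rescale01)

apply(scores01, 2, range)  # each component now spans 0 to 1
```

Putting the components on a common scale is what allows them to be overlaid as color channels without one component dominating the image.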

We can plot the loadings for the principal components as well.

plot(pig206_pca, lwd=2)

PCA can sometimes be useful for exploring a dataset. For example, here, we can see that PC3 appears to distinguish the liver, but also includes several other structures. Because a single component can mix several anatomical regions, it is difficult to rely on PCA alone for analysis.

2.3 Segmentation with spatial shrunken centroids (SSC)

To segment the dataset and automatically select peaks that distinguish each region, we will use the spatialShrunkenCentroids() method provided by Cardinal.

Important parameters to this method include:

  • method The type of spatial weights to use:

    • “gaussian” weights use a simple Gaussian smoothing kernel

    • “adaptive” weights use an adaptive kernel that sometimes preserves edges better

  • r The neighborhood smoothing radius; this should be selected based on the size and granularity of the spatial regions in your dataset

  • s The shrinkage or sparsity parameter; the higher this number, the fewer peaks will be used to determine the final segmentation.

  • k The maximum number of segments to try; empty segments are dropped, so the resulting segmentation may use fewer than this number.

It is typically best to set k relatively high and let the algorithm drop empty segments. You typically also want to try a wide range of sparsity with the s parameter.
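
To give some intuition for the r parameter and the “gaussian” weights, the following base R sketch builds a Gaussian kernel over a square neighborhood of radius r and uses it to smooth the center of a toy image patch. The kernel width `sigma` is an assumed value picked only so the weights decay visibly across the window; it is not necessarily what Cardinal uses, and the “adaptive” weights are not shown.

```r
# Neighborhood radius, playing the role of the `r` parameter above.
r <- 2

# Offsets of all pixels within Chebyshev distance r of a center pixel:
# a (2r + 1) x (2r + 1) square window.
offsets <- expand.grid(dx = -r:r, dy = -r:r)

# Assumed kernel width (an illustration, not Cardinal's documented value).
sigma <- (2 * r + 1) / 4

# Gaussian weights based on distance from the center pixel,
# normalized so they sum to 1.
w <- exp(-(offsets$dx^2 + offsets$dy^2) / (2 * sigma^2))
w <- w / sum(w)

# Smooth the center pixel of a toy 5 x 5 patch (made-up intensities).
patch <- matrix(1:25, nrow = 5)
smoothed_center <- sum(w * as.vector(patch))

round(matrix(w, nrow = 2 * r + 1), 3)   # weights are largest at the center
```

A larger r pulls in more distant neighbors, which smooths the segmentation more strongly but can blur small structures.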

set.seed(1)
pig206_ssc <- spatialShrunkenCentroids(pig206_peaks, method="adaptive",
                                       r=2, s=c(0,5,10,15,20,25), k=10)
summary(pig206_ssc)
## Spatially-aware nearest shrunken centroids:
##  
##  Segmentation / clustering 
##  Method = adaptive 
##  Distance = chebyshev
##  
##   Radius (r) Init (k) Shrinkage (s) Classes Features/Class
## 1          2       10             0      10         110.00
## 2          2       10             5      10          68.90
## 3          2       10            10       9          41.56
## 4          2       10            15       6          33.00
## 5          2       10            20       6          21.33
## 6          2       10            25       6          12.67

As shown in the summary, the resulting number of segments typically decreases as s increases. This is because fewer peaks are used to determine the segmentation.

First, non-informative peaks are removed, but as s increases meaningful peaks may be removed as well. The most interesting and useful segmentations tend to be the ones with the highest value of s just before the resulting number of segments decreases too much.
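
One simple way to see where the number of segments starts to fall is to tabulate the change between successive values of s. The sketch below re-enters the values from the summary shown above into a plain data frame (the column names are hypothetical) and computes how many segments are lost at each step. It is only a bookkeeping aid and does not replace inspecting the segmentations visually.

```r
# Values copied from the summary() output shown above.
ssc_summary <- data.frame(
  s        = c(0, 5, 10, 15, 20, 25),
  classes  = c(10, 10, 9, 6, 6, 6),
  features = c(110.00, 68.90, 41.56, 33.00, 21.33, 12.67)
)

# Number of segments lost relative to the previous value of s.
ssc_summary$dropped <- c(0, -diff(ssc_summary$classes))
ssc_summary

# Value of s at which the largest drop in segments occurs.
ssc_summary$s[which.max(ssc_summary$dropped)]
```

Here the largest drop occurs when moving from s = 10 to s = 15, after which the number of segments stays at 6 while the number of features per segment keeps shrinking.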

2.3.1 Plotting the segmentation

Let’s plot the resulting segmentations for s = 10, 15, 20, 25.

image(pig206_ssc, model=list(s=c(10,15,20,25)))

It is useful to see how the segmentation changes as fewer peaks are used and the number of segments decreases. Noisy, less-meaningful segments tend to be removed first, so we want to explore the segmentation with the highest value of s that still captures most of the regions we would expect to see.

image(pig206_ssc, model=list(s=20))

Here, we can see the heart, brain, and liver distinguished as segments 1, 5, and 6.

2.3.2 Plotting the (shrunken) mean spectra

Plotting the shrunken centroids is analogous to plotting the mean spectrum of each segment.

plot(pig206_ssc, model=list(s=20), lwd=2)
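
As a point of comparison, the unshrunken mean spectrum of each segment can be computed by averaging the pixels assigned to it. The base R sketch below does this for a small made-up matrix and a made-up segment assignment; all object names are hypothetical.

```r
# Toy data: 3 peaks (rows) x 6 pixels (columns), made-up intensities.
x <- matrix(c(9, 1, 2,    # px1
              8, 2, 2,    # px2
              1, 7, 3,    # px3
              2, 9, 3,    # px4
              1, 8, 2,    # px5
              9, 1, 1),   # px6
            nrow = 3)

# Made-up segment label for each pixel.
segment <- factor(c(1, 1, 2, 2, 2, 1))

# Mean spectrum of each segment: average the pixel columns within a segment.
segment_means <- sapply(levels(segment), function(k) {
  rowMeans(x[, segment == k, drop = FALSE])
})
segment_means   # one column per segment, one row per peak
```

The shrunken centroids plotted above differ from these plain means in that small deviations from the global mean have been shrunk away.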

Let’s break out the centroids for the heart, brain, and liver segments.

cols <- discrete.colors(6)
setup.layout(c(3,1))
plot(pig206_ssc, model=list(s=20), column=1, col=cols[1], lwd=2, layout=NULL)
plot(pig206_ssc, model=list(s=20), column=5, col=cols[5], lwd=2, layout=NULL)
plot(pig206_ssc, model=list(s=20), column=6, col=cols[6], lwd=2, layout=NULL)

Some differences are visible, but it can be difficult to tell exactly which peaks are changing between different segments based on the mean spectra alone.

2.3.3 Plotting and interpreting t-statistics of the m/z values

Plotting the t-statistics tells us exactly the relationship between each segment’s centroid and the global mean spectrum. The t-statistics are the difference between a segment’s centroid and the global mean, divided by a standard error.

Positive t-statistics indicate that a peak is systematically higher in that segment as compared to the global mean spectrum.

Negative t-statistics indicate that a peak is systematically lower in that segment as compared to the global mean spectrum.

Spatial shrunken centroids works by shrinking these t-statistics toward 0 by s, and using the new t-statistics to recompute the segment centroids. The effect is that peaks that are not very different between a specific segment and the global mean are effectively eliminated from the segmentation.

plot(pig206_ssc, model=list(s=20), values="statistic", lwd=2)
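
The shrinkage step described above can be sketched in a few lines of base R. The example assumes the usual soft-thresholding form, in which each t-statistic is moved toward 0 by s and set to exactly 0 if its magnitude is below s. The helper `soft_threshold()` and all numeric values are made up for illustration; Cardinal's internal computation of the standard errors is not reproduced here.

```r
# Hypothetical helper: shrink t-statistics toward 0 by s (soft thresholding).
soft_threshold <- function(t, s) sign(t) * pmax(abs(t) - s, 0)

# Made-up values for one segment across five peaks.
global_mean <- c(100, 80, 50, 20, 60)       # global mean spectrum
se          <- c(2, 1, 3, 5, 1)             # standard errors
tstat       <- c(35.2, -22.5, 4.1, -0.8, 20.0)

# Shrink the t-statistics with s = 20.
tstat_shrunk <- soft_threshold(tstat, s = 20)
tstat_shrunk

# Recompute the segment centroid from the shrunken t-statistics.
# Peaks whose statistic was shrunk to 0 fall back to the global mean.
centroid_shrunk <- global_mean + se * tstat_shrunk
centroid_shrunk
```

In this toy case only the first two peaks keep a non-zero statistic, so only they can still distinguish the segment from the global mean.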

If we break out the t-statistics for the heart, brain, and liver segments we can learn something interesting.

setup.layout(c(3,1))
plot(pig206_ssc, model=list(s=20), values="statistic",
     column=1, col=cols[1], lwd=2, layout=NULL)
plot(pig206_ssc, model=list(s=20), values="statistic",
     column=5, col=cols[5], lwd=2, layout=NULL)
plot(pig206_ssc, model=list(s=20), values="statistic",
     column=6, col=cols[6], lwd=2, layout=NULL)

Very few peaks distinguish the heart, while many more distinguish the brain and liver.

2.3.4 Retrieving the top m/z-values

Use the topFeatures() method to extract the m/z values of the peaks that most distinguish each segment, ranked by t-statistic.

Peaks associated with the heart:

topFeatures(pig206_ssc, model=list(s=20), class==1)
## Top-ranked features: 
##          mz r  k  s class   centers statistic
## 1  261.4384 2 10 20     1 217.70697 33.959982
## 2  487.4983 2 10 20     1 204.94474 27.679345
## 3  263.4262 2 10 20     1  70.15717 19.205054
## 4  203.3418 2 10 20     1  41.06041 16.425219
## 5  167.3431 2 10 20     1  37.91037 12.980175
## 6  489.4989 2 10 20     1  72.09721 12.881436
## 7  214.3259 2 10 20     1 194.64263  9.787334
## 8  246.2845 2 10 20     1  30.18695  8.386133
## 9  488.5507 2 10 20     1  47.85836  7.849920
## 10 159.3753 2 10 20     1  28.50508  5.978385

Peaks associated with the brain:

topFeatures(pig206_ssc, model=list(s=20), class==5)
## Top-ranked features: 
##          mz r  k  s class   centers statistic
## 1  885.6166 2 10 20     5 164.47056  40.41442
## 2  834.4763 2 10 20     5  75.41334  35.13439
## 3  840.5077 2 10 20     5  77.75684  31.27584
## 4  838.4872 2 10 20     5  48.45794  26.76828
## 5  886.6237 2 10 20     5  77.47821  26.56689
## 6  812.5135 2 10 20     5  90.13775  25.97902
## 7  810.5023 2 10 20     5  67.40802  25.62788
## 8  305.5679 2 10 20     5  78.51007  25.39787
## 9  887.6398 2 10 20     5 163.45893  25.00990
## 10 303.5484 2 10 20     5  93.91384  20.92517

Peaks associated with the liver:

topFeatures(pig206_ssc, model=list(s=20), class==6)
## Top-ranked features: 
##          mz r  k  s class   centers statistic
## 1  215.3414 2 10 20     6 500.39801 57.594538
## 2  217.3652 2 10 20     6 201.39237 41.639237
## 3  269.3483 2 10 20     6 346.73917 39.299140
## 4  271.3883 2 10 20     6  59.92785 18.478022
## 5  166.3010 2 10 20     6  73.19267 15.020955
## 6  201.3838 2 10 20     6  53.24275  9.273022
## 7  211.3150 2 10 20     6  33.86107  9.075501
## 8  157.3966 2 10 20     6  14.36833  0.000000
## 9  159.3753 2 10 20     6  25.57561  0.000000
## 10 167.3431 2 10 20     6  31.54123  0.000000

The top m/z values for each segment match up well with the hand-selected peaks.
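
Note that some rows in these tables have a t-statistic of exactly 0, meaning those peaks were shrunk out of the model for that segment. The sketch below re-enters the liver table from above as a plain data frame and keeps only the peaks with a non-zero statistic. The data frame is built by hand here; treating the output of topFeatures() itself as a data frame is an assumption that is not shown.

```r
# m/z values and statistics copied from the liver (class 6) table above.
liver <- data.frame(
  mz = c(215.3414, 217.3652, 269.3483, 271.3883, 166.3010,
         201.3838, 211.3150, 157.3966, 159.3753, 167.3431),
  statistic = c(57.594538, 41.639237, 39.299140, 18.478022, 15.020955,
                9.273022, 9.075501, 0, 0, 0)
)

# Keep only peaks that survived shrinkage (positive t-statistic).
liver_selected <- subset(liver, statistic > 0)
liver_selected
nrow(liver_selected)
```

Of the ten liver rows shown, seven have a positive statistic; the remaining three carry no information about this segment at s = 20.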

3 Segmentation of a cardinal painting

It can be difficult to evaluate unsupervised methods (like segmentation) on data where we do not know the ground truth.

In this section, we use an MS image of a painting, where we know the ground truth.

data(cardinal, package="CardinalWorkflows")
cardinal <- as(cardinal, "MSImagingExperiment")

Cardinal painting

In this experiment, DESI spectra were collected from an oil painting of a cardinal.

cardinal
## An object of class 'MSContinuousImagingExperiment'
##   <10800 feature, 12600 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     run(1): Bierbaum_demo_
##     raster dimensions: 120 x 105
##     coord(2): x = 1..120, y = 1..105
##     mass range:  100.0833 to 1000.0000 
##     centroided: FALSE

The dataset includes 12,600 spectra with 10,800 m/z values.

3.1 Pre-processing

First, we will pre-process the dataset as before, by applying peak picking to the mean spectrum.

cardinal_mean <- summarizeFeatures(cardinal, "mean")
cardinal_ref <- cardinal_mean %>%
  peakPick(SNR=3) %>%
  peakAlign(ref="mean",
            tolerance=0.5,
            units="mz") %>%
  peakFilter() %>%
  process()
cardinal_peaks <- cardinal %>%
  normalize(method="tic") %>%
  peakBin(ref=mz(cardinal_ref),
          tolerance=0.5,
          units="mz") %>%
  process()

cardinal_peaks
## An object of class 'MSContinuousImagingExperiment'
##   <106 feature, 12600 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     processing complete(2): normalize peakBin
##     processing pending(0):
##     run(1): Bierbaum_demo_
##     raster dimensions: 120 x 105
##     coord(2): x = 1..120, y = 1..105
##     mass range: 101.0506 to 650.1482 
##     centroided: TRUE

This results in a centroided dataset with 106 peaks.

3.2 Segmentation with SSC

Now we use spatial shrunken centroids to segment the dataset.

set.seed(1)
cardinal_ssc <- spatialShrunkenCentroids(cardinal_peaks, method="adaptive",
                                       r=2, s=c(10,20,30,40), k=10)
summary(cardinal_ssc)
## Spatially-aware nearest shrunken centroids:
##  
##  Segmentation / clustering 
##  Method = adaptive 
##  Distance = chebyshev
##  
##   Radius (r) Init (k) Shrinkage (s) Classes Features/Class
## 1          2       10            10      10          32.00
## 2          2       10            20       9          18.67
## 3          2       10            30       7          14.29
## 4          2       10            40       7           9.86
image(cardinal_ssc)

We can see that with s = 10 and s = 20, two segments are capturing an unwanted background gradient. At s = 30, this background gradient is eliminated.

Now we can use the segmentation (here, the model with s = 40) to reconstruct the original painting.

image(cardinal_ssc, model=list(s=40),
      col=c("1"=NA, "2"="gray", "3"="black", "4"="firebrick",
            "5"="brown", "6"="darkred", "7"="red"))

Let’s find the m/z values associated with the cardinal’s body.

topFeatures(cardinal_ssc, model=list(s=40), class==7)
## Top-ranked features: 
##          mz r  k  s class    centers statistic
## 1  265.1950 2 10 40     7 1434.46497  94.39643
## 2  101.0506 2 10 40     7   42.69946   0.00000
## 3  112.9770 2 10 40     7   51.59129   0.00000
## 4  115.0566 2 10 40     7  117.74032   0.00000
## 5  120.0000 2 10 40     7   19.10327   0.00000
## 6  121.0259 2 10 40     7   14.09180   0.00000
## 7  127.0170 2 10 40     7   30.61063   0.00000
## 8  129.0556 2 10 40     7   76.83087   0.00000
## 9  133.9931 2 10 40     7   39.91965   0.00000
## 10 137.0290 2 10 40     7   49.09114   0.00000
image(cardinal, mz=207)

And let’s find the m/z values associated with the “DESI-MS” text.

topFeatures(cardinal_ssc, model=list(s=40), class==3)
## Top-ranked features: 
##          mz r  k  s class  centers statistic
## 1  255.2727 2 10 40     3 949.6528 107.36406
## 2  227.2328 2 10 40     3 457.5308  92.95730
## 3  241.2235 2 10 40     3 349.4095  77.81783
## 4  157.1185 2 10 40     3 417.4835  77.73334
## 5  283.3018 2 10 40     3 390.3660  72.24834
## 6  269.2578 2 10 40     3 194.0101  47.01588
## 7  171.1399 2 10 40     3 241.6526  31.03703
## 8  199.1874 2 10 40     3 178.9121  27.22540
## 9  256.2652 2 10 40     3 109.0887  24.00488
## 10 213.1741 2 10 40     3 105.9698  13.94293
image(cardinal, mz=649)

4 Session information

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## Random number generation:
##  RNG:     L'Ecuyer-CMRG 
##  Normal:  Inversion 
##  Sample:  Rejection 
##  
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] CardinalWorkflows_1.30.0 Cardinal_3.0.1           S4Vectors_0.36.0        
## [4] EBImage_4.40.0           BiocParallel_1.32.1      BiocGenerics_0.44.0     
## [7] ProtGenerics_1.30.0      BiocStyle_2.26.0        
## 
## loaded via a namespace (and not attached):
##  [1] biglm_0.9-2.1       locfit_1.5-9.6      xfun_0.35          
##  [4] bslib_0.4.1         lattice_0.20-45     htmltools_0.5.3    
##  [7] viridisLite_0.4.1   yaml_2.3.6          rlang_1.0.6        
## [10] jquerylib_0.1.4     DBI_1.1.3           sp_1.5-1           
## [13] jpeg_0.1-9          stringr_1.4.1       htmlwidgets_1.5.4  
## [16] codetools_0.2-18    evaluate_0.18       Biobase_2.58.0     
## [19] knitr_1.40          fastmap_1.1.0       irlba_2.3.5.1      
## [22] parallel_4.2.2      highr_0.9           Rcpp_1.0.9         
## [25] BiocManager_1.30.19 cachem_1.0.6        magick_2.7.3       
## [28] jsonlite_1.8.3      abind_1.4-5         png_0.1-7          
## [31] digest_0.6.30       stringi_1.7.8       tiff_0.1-11        
## [34] bookdown_0.30       grid_4.2.2          cli_3.4.1          
## [37] tools_4.2.2         bitops_1.0-7        magrittr_2.0.3     
## [40] sass_0.4.2          RCurl_1.98-1.9      MASS_7.3-58.1      
## [43] Matrix_1.5-3        matter_2.0.1        rmarkdown_2.18     
## [46] R6_2.5.1            fftwtools_0.9-11    mclust_6.0.0       
## [49] signal_0.7-7        nlme_3.1-160        compiler_4.2.2