Contents

1 Introduction

Cardinal 2 provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

CardinalWorkflows provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.

2 Exploratory analysis

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

set.seed(2020)
mse <- simulateImage(preset=2, npeaks=10, dim=c(20,20), sdnoise=0.5,
                    peakheight=c(2,4), representation="centroid")

design <- makeFactor(circle=mse$circle, square=mse$square,
                        bg=!(mse$circle | mse$square))

image(mse, design ~ x * y, key=TRUE)

image(mse, feature=c(1,4,7), layout=c(1,3))

2.1 Principal components analysis (PCA)

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of “principal components” that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.

Use PCA() to perform PCA on a MSImagingExperiment.

pca <- PCA(mse, ncomp=3)

summary(pca)
## Principal components analysis:
##  
##  Component Standard deviation
##          1          4.7061494
##          2          2.6134145
##          3          0.6136734

We can see that the first two principal components explain most of the variation in the data.

image(pca, values="scores", superpose=FALSE, layout=c(1,3))

The loadings of the components show how each mass feature contributes to each component.

plot(pca, values="loadings", superpose=FALSE, layout=c(1,3), lwd=2)

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

pca_scores <- DataFrame(resultData(pca, 1, "scores"))

plot(pca_scores, PC1 ~ PC2, groups=design, pch=20)

2.2 Feature colocalization

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use colocalize() to find mass features that are colocalized with another image.

coloc <- colocalized(mse, mz=1023)
coloc
## Colocalized features: 
##           mz   circle   square correlation    M1    M2
## 1  1023.7081 2.011661 4.063644   1.0000000 1.000 1.000
## 2  1135.9335 2.434873 3.985370   0.9430259 0.875 0.875
## 3  1200.4653 2.219637 4.166854   0.9292093 0.865 0.865
## 4  1361.2682 0.000000 4.259568   0.6712111 0.710 0.710
## 5  1227.9380 0.000000 4.039750   0.6671688 0.675 0.675
## 6  1453.5096 0.000000 4.187344   0.6657311 0.695 0.695
## 7  1858.8985 0.000000 3.970513   0.6620943 0.705 0.705
## 8   781.2367 1.392247 0.000000   0.3891237 0.650 0.650
## 9   473.9206 2.340799 0.000000   0.3632409 0.600 0.600
## 10  788.8633 1.542205 0.000000   0.3378016 0.605 0.605

By default, Pearson correlation is used to rank the colocalized features. Manders’ colocalization coefficients (M1 and M2) are also provided.

image(mse, mz=coloc$mz[1:3], layout=c(1,3))

3 Image segmentation

Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

3.1 Spatial shrunken centroids clustering

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius r, initial number of clusters k, and sparsity parameters s must be provided.

The larger the sparsity parameter s, the fewer mass features will contribute to the segmentation.

Spatial shrunken centroids may result in fewer clusters than the initial number of clusters k, so it is recommended to use a value for k that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.

ssc <- spatialShrunkenCentroids(mse, r=1, k=5, s=c(0,3,6,9))

summary(ssc)
## Spatially-aware nearest shrunken centroids:
##  
##  Segmentation / clustering 
##  Method = gaussian 
##  Distance = chebyshev
##  
##  Radius (r) Init (k) Sparsity (s) Classes Features / Class      BIC
##           1        5            0       4            10.00 346.8825
##           1        5            3       3            10.00 243.5810
##           1        5            6       3             8.67 224.4511
##           1        5            9       3             7.33 213.2929

Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.

image(ssc, model=list(s=9), values="probability")

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.

plot(ssc, model=list(s=9), values="statistic", lwd=2)

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.

Use topFeatures() to rank mass features by t-statistic.

ssc_top <- topFeatures(ssc, model=list(s=9), class == 1)
ssc_top
## Top-ranked features: 
##           mz   circle   square r k s class   centers  statistic
## 1   473.9206 2.340799 0.000000 1 5 9     1 2.3475172 11.9165555
## 2  1135.9335 2.434873 3.985370 1 5 9     1 4.3294593  4.0936001
## 3   788.8633 1.542205 0.000000 1 5 9     1 0.7122964  0.9371928
## 4   781.2367 1.392247 0.000000 1 5 9     1 0.6067728  0.3683360
## 5  1023.7081 2.011661 4.063644 1 5 9     1 2.8075020  0.0000000
## 6  1200.4653 2.219637 4.166854 1 5 9     1 2.4490113  0.0000000
## 7  1858.8985 0.000000 3.970513 1 5 9     1 1.2245057  0.0000000
## 8  1361.2682 0.000000 4.259568 1 5 9     1 1.3563276 -0.3583295
## 9  1453.5096 0.000000 4.187344 1 5 9     1 1.3431417 -1.3316029
## 10 1227.9380 0.000000 4.039750 1 5 9     1 1.2913549 -2.2072382

3.2 Spatial Dirichlet Gaussian mixture modeling

Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.

dgmm <- spatialDGMM(mse, r=1, k=5, method="adaptive")

summary(dgmm)
## Spatially-aware Dirichlet Gaussian mixture models:
##  
##  Segmentation on 1 group: run0 
##  Method = adaptive 
##  Distance = chebyshev
##  
##  Radius (r) Init (k) Feature Classes / Group
##           1        5       1               2
##           1        5       2               4
##           1        5       3               1
##           1        5       4               3
##           1        5       5               3
##           1        5       6               4
##           1        5       7               2
##           1        5       8               2
##           1        5       9               2
##           1        5      10               2

A different segmentation is fit for each mass feature.

image(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.

When applied to a SpatialDGMM object, colocalize() is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.

coloc2 <- colocalized(dgmm, mse$square)
select(coloc2, -r, -k, -group)
##           mz   circle   square feature class    Mscore        M1        M2
## 1  1227.9380 0.000000 4.039750       7     2 0.9745223 0.9807692 0.9935065
## 2  1453.5096 0.000000 4.187344       9     2 0.9622642 0.9807692 0.9807692
## 3  1361.2682 0.000000 4.259568       8     2 0.9426752 0.9487179 0.9932886
## 4  1858.8985 0.000000 3.970513      10     2 0.9050633 0.9166667 0.9862069
## 5  1023.7081 2.011661 4.063644       4     3 0.5240642 0.6282051 0.7596899
## 6   473.9206 2.340799 0.000000       1     1 0.4834437 0.9358974 0.5000000
## 7  1135.9335 2.434873 3.985370       5     3 0.3960396 0.5128205 0.6349206
## 8   788.8633 1.542205 0.000000       3     1 0.3900000 1.0000000 0.3900000
## 9   781.2367 1.392247 0.000000       2     1 0.3705036 0.6602564 0.4577778
## 10 1135.9335 2.434873 3.985370       5     2 0.3636364 0.4871795 0.5891473

4 Classification and cross-validation

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

set.seed(2020)
mse2 <- simulateImage(preset=7, npeaks=10, dim=c(10,10), sdnoise=0.5,
                    nruns=3, peakdiff=2, representation="centroid")

class <- makeFactor(A=mse2$circleA, B=mse2$circleB)

image(mse2, class ~ x * y, key=TRUE, layout=c(1,3))

image(mse2, feature=1, layout=c(1,3))

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.

4.1 Projection to latent structures (PLS)

Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.

cv_pls <- crossValidate(mse2, .y=class, .fun=PLS, ncomp=1:5, .fold=run(mse2))

summary(cv_pls)
## Cross validation:
##  
##  Classification on 2 classes: A B 
##  Summarized 3 folds: run0 run1 run2
##  
##  ncomp  Accuracy Sensitivity Specificity
##      1 0.6608485   0.0000000   1.0000000
##      2 0.8100534   0.4811594   0.9690476
##      3 0.9200094   0.8405797   0.9776557
##      4 0.9132067   0.8550725   0.9553114
##      5 0.9088528   0.7884058   0.9648352

We can see that using 3 PLS components produces the best cross-validated accuracy.

pls <- PLS(mse2, y=class, ncomp=3)

summary(pls)
## Projection to latent components:
##  
##  Classification on 2 classes: A B 
##  Method = pls
##  
##  Number of Components Accuracy Sensitivity Specificity
##                     3      0.9   0.7647059   0.9775281

We can plot the fitted values to visualize the prediction.

image(pls, values="fitted", layout=c(1,3))

The PLS regression coefficients can be used to select influential features.

plot(pls, values="coefficients", lwd=2)

Like PCA, it can be useful to plot the PLS scores against each other to visualize the separation between classes.

pls_scores <- DataFrame(resultData(pls, 1, "scores"))

plot(pls_scores, C1 ~ C2, groups=class, pch=20)

Note that orthgonal PLS (O-PLS) is also available via method="opls" or by using the separate OPLS() method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.

4.2 Spatial shrunken centroids classification

Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.

Like in the clustering case of spatial shrunken centroids, a smoothing radius r must be provided along with sparsity parameters s.

cv_ssc <- crossValidate(mse2, .y=class,
                        .fun=spatialShrunkenCentroids,
                        r=1, s=c(0,3,6,9), .fold=run(mse2))

summary(cv_ssc)
## Cross validation:
##  
##  Classification on 2 classes: A B 
##  Summarized 3 folds: run0 run1 run2
##  
##  r s  Accuracy Sensitivity Specificity
##  1 0 0.9661925   1.0000000   0.9505495
##  1 3 0.9429979   0.8840580   0.9871795
##  1 6 0.8593713   0.6666667   1.0000000
##  1 9 0.7955665   0.5217391   1.0000000

We can see that in this case, the fully dense model (s=0) that uses all mass features has the best cross-validated accuracy for the data.

ssc2 <- spatialShrunkenCentroids(mse2, y=class, r=1, s=0)

summary(ssc2)
## Spatially-aware nearest shrunken centroids:
##  
##  Classification on 2 classes: A B 
##  Method = gaussian 
##  Distance = chebyshev
##  
##  Radius (r) Sparsity (s) Features / Class  Accuracy Sensitivity Specificity
##           1            0               10 0.9714286           1   0.9550562

Plotting the predicted class probabilities produces a more easily interpretable visualization than PLS in this case.

image(ssc2, layout=c(1,3))

Plotting t-statistics shows the first three features have much higher abundance in condition “B”.

plot(ssc2, values="statistic", lwd=2)