*Cardinal 2* provides statistical methods for both supervised and unsupervised analysis of mass spectrometry (MS) imaging experiments. Class comparison can also be performed, provided an appropriate experimental design and sample size.

Before statistical analysis, it is important to identify the statistical goal of the experiment:

**Unsupervised analysis**. The data has no class labels or conditions, and we are interested in exploratory analysis to*discover*regions of interest in the data.**Supervised analysis**. The data has class labels and we want to train a statistical or machine learning model to*predict*the class labels of new data.**Class comparison**. The data has class labels or conditions, and we want to*test*whether the abundance of the mass features is different between conditions.

*CardinalWorkflows* provides real experimental data and more detailed discussion of the statistical methods than will be covered in this brief overview.

Suppose we are exploring an unlabeled dataset, and wish to understand the structure of the data.

```
set.seed(2020)
mse <- simulateImage(preset=2, npeaks=10, dim=c(20,20), sdnoise=0.5,
peakheight=c(2,4), representation="centroid")
design <- makeFactor(circle=mse$circle, square=mse$square,
bg=!(mse$circle | mse$square))
image(mse, design ~ x * y, key=TRUE)
```

`image(mse, feature=c(1,4,7), layout=c(1,3))`

Principal components analysis is an unsupervised dimension reduction technique. It reduces the data to some number of “principal components” that are a linear combination of the original mass features, where each component is orthogonal to the last, and explains as much of the variance in the data as possible.

Use `PCA()`

to perform PCA on a `MSImagingExperiment`

.

```
pca <- PCA(mse, ncomp=3)
summary(pca)
```

```
## Principal components analysis:
##
## Component Standard deviation
## 1 4.7061494
## 2 2.6134145
## 3 0.6136734
```

We can see that the first two principal components explain most of the variation in the data.

`image(pca, values="scores", superpose=FALSE, layout=c(1,3))`

The loadings of the components show how each mass feature contributes to each component.

`plot(pca, values="loadings", superpose=FALSE, layout=c(1,3), lwd=2)`

Plotting the principal component scores against each other is a useful way of visualization the separation between data classes.

```
pca_scores <- DataFrame(resultData(pca, 1, "scores"))
plot(pca_scores, PC1 ~ PC2, groups=design, pch=20)
```

Finding other mass features colocalized with a particular image is a common task in analysis of MS imaging experiments.

Use `colocalize()`

to find mass features that are colocalized with another image.

```
coloc <- colocalized(mse, mz=1023)
coloc
```

```
## Colocalized features:
## mz circle square correlation M1 M2
## 1 1023.7081 2.011661 4.063644 1.0000000 1.000 1.000
## 2 1135.9335 2.434873 3.985370 0.9430259 0.875 0.875
## 3 1200.4653 2.219637 4.166854 0.9292093 0.865 0.865
## 4 1361.2682 0.000000 4.259568 0.6712111 0.710 0.710
## 5 1227.9380 0.000000 4.039750 0.6671688 0.675 0.675
## 6 1453.5096 0.000000 4.187344 0.6657311 0.695 0.695
## 7 1858.8985 0.000000 3.970513 0.6620943 0.705 0.705
## 8 781.2367 1.392247 0.000000 0.3891237 0.650 0.650
## 9 473.9206 2.340799 0.000000 0.3632409 0.600 0.600
## 10 788.8633 1.542205 0.000000 0.3378016 0.605 0.605
```

By default, Pearson correlation is used to rank the colocalized features. Manders’ colocalization coefficients (M1 and M2) are also provided.

`image(mse, mz=coloc$mz[1:3], layout=c(1,3))`

Segmentation (clustering) a dataset is a useful way to summarize an MS imaging experiment and discover regions of interest within the sample.

Spatially-aware nearest shrunken centroids clustering allows simultaneous image segmentation and feature selection.

A smoothing radius `r`

, initial number of clusters `k`

, and sparsity parameters `s`

must be provided.

The larger the sparsity parameter `s`

, the fewer mass features will contribute to the segmentation.

Spatial shrunken centroids may result in fewer clusters than the initial number of clusters `k`

, so it is recommended to use a value for `k`

that is larger than the expected number of clusters, and allow the method to automatically choose the number of clusters.

```
ssc <- spatialShrunkenCentroids(mse, r=1, k=5, s=c(0,3,6,9))
summary(ssc)
```

```
## Spatially-aware nearest shrunken centroids:
##
## Segmentation / clustering
## Method = gaussian
## Distance = chebyshev
##
## Radius (r) Init (k) Sparsity (s) Classes Features / Class BIC
## 1 5 0 4 10.00 346.8825
## 1 5 3 3 10.00 243.5810
## 1 5 6 3 8.67 224.4511
## 1 5 9 3 7.33 213.2929
```

Plotting the predicted cluster probabilities shows a clear segmentation into the ground truth image.

`image(ssc, model=list(s=9), values="probability")`

Spatial shrunken centroids calculates t-statistics for each segment and each mass feature. These t-statistics a measure of the difference between the cluster center and the global mean.

`plot(ssc, model=list(s=9), values="statistic", lwd=2)`

Mass features with t-statistics of zero do not contribute to the segmentation. The sign of the t-statistic indicates whether the mass feature is over- or under-expressed in the given cluster relative to the global mean.

Use `topFeatures()`

to rank mass features by t-statistic.

```
ssc_top <- topFeatures(ssc, model=list(s=9), class == 1)
ssc_top
```

```
## Top-ranked features:
## mz circle square r k s class centers statistic
## 1 473.9206 2.340799 0.000000 1 5 9 1 2.3475172 11.9165555
## 2 1135.9335 2.434873 3.985370 1 5 9 1 4.3294593 4.0936001
## 3 788.8633 1.542205 0.000000 1 5 9 1 0.7122964 0.9371928
## 4 781.2367 1.392247 0.000000 1 5 9 1 0.6067728 0.3683360
## 5 1023.7081 2.011661 4.063644 1 5 9 1 2.8075020 0.0000000
## 6 1200.4653 2.219637 4.166854 1 5 9 1 2.4490113 0.0000000
## 7 1858.8985 0.000000 3.970513 1 5 9 1 1.2245057 0.0000000
## 8 1361.2682 0.000000 4.259568 1 5 9 1 1.3563276 -0.3583295
## 9 1453.5096 0.000000 4.187344 1 5 9 1 1.3431417 -1.3316029
## 10 1227.9380 0.000000 4.039750 1 5 9 1 1.2913549 -2.2072382
```

Spatially-aware Dirichlet Gaussian mixture models (spatial-DGMM) is a method of image segmentation applied to each mass feature individually, rather than the dataset as a whole.

This is useful for summarizing molecular ion images, and for discovering structures that clustering using all mass features together may miss.

```
dgmm <- spatialDGMM(mse, r=1, k=5, method="adaptive")
summary(dgmm)
```

```
## Spatially-aware Dirichlet Gaussian mixture models:
##
## Segmentation on 1 group: run0
## Method = adaptive
## Distance = chebyshev
##
## Radius (r) Init (k) Feature Classes / Group
## 1 5 1 2
## 1 5 2 4
## 1 5 3 1
## 1 5 4 3
## 1 5 5 3
## 1 5 6 4
## 1 5 7 2
## 1 5 8 2
## 1 5 9 2
## 1 5 10 2
```

A different segmentation is fit for each mass feature.

`image(dgmm, model=list(feature=c(1,4,7)), layout=c(1,3))`

Spatial-DGMM segmentations can be especially useful for finding mass features colocalized with a region-of-interest.

When applied to a `SpatialDGMM`

object, `colocalize()`

is able to use match scores that can have a higher specificity than using Pearson correlation on the raw ion images.

```
coloc2 <- colocalized(dgmm, mse$square)
select(coloc2, -r, -k, -group)
```

```
## mz circle square feature class Mscore M1 M2
## 1 1227.9380 0.000000 4.039750 7 2 0.9745223 0.9807692 0.9935065
## 2 1453.5096 0.000000 4.187344 9 2 0.9622642 0.9807692 0.9807692
## 3 1361.2682 0.000000 4.259568 8 2 0.9426752 0.9487179 0.9932886
## 4 1858.8985 0.000000 3.970513 10 2 0.9050633 0.9166667 0.9862069
## 5 1023.7081 2.011661 4.063644 4 3 0.5240642 0.6282051 0.7596899
## 6 473.9206 2.340799 0.000000 1 1 0.4834437 0.9358974 0.5000000
## 7 1135.9335 2.434873 3.985370 5 3 0.3960396 0.5128205 0.6349206
## 8 788.8633 1.542205 0.000000 3 1 0.3900000 1.0000000 0.3900000
## 9 781.2367 1.392247 0.000000 2 1 0.3705036 0.6602564 0.4577778
## 10 1135.9335 2.434873 3.985370 5 2 0.3636364 0.4871795 0.5891473
```

Classification of pixels into different known classes (e.g., cancer vs normal) based on the mass spectra is a common application for MS imaging.

```
set.seed(2020)
mse2 <- simulateImage(preset=7, npeaks=10, dim=c(10,10), sdnoise=0.5,
nruns=3, peakdiff=2, representation="centroid")
class <- makeFactor(A=mse2$circleA, B=mse2$circleB)
image(mse2, class ~ x * y, key=TRUE, layout=c(1,3))
```

`image(mse2, feature=1, layout=c(1,3))`

When performing classification, it is important to use cross-validation so that reported accuracies are not overly optimistic.

We strongly recomend making sure that all spectra from the same experiment run belong to the same fold, to reduce predictive bias due to run effects.

Projection to latent structures (PLS), also called partial least squares, is a supervised dimension reduction technique. It can be thought of as being similar to PCA, but for classification or regression.

```
cv_pls <- crossValidate(mse2, .y=class, .fun=PLS, ncomp=1:5, .fold=run(mse2))
summary(cv_pls)
```

```
## Cross validation:
##
## Classification on 2 classes: A B
## Summarized 3 folds: run0 run1 run2
##
## ncomp Accuracy Sensitivity Specificity
## 1 0.6608485 0.0000000 1.0000000
## 2 0.8100534 0.4811594 0.9690476
## 3 0.9200094 0.8405797 0.9776557
## 4 0.9132067 0.8550725 0.9553114
## 5 0.9088528 0.7884058 0.9648352
```

We can see that using 3 PLS components produces the best cross-validated accuracy.

```
pls <- PLS(mse2, y=class, ncomp=3)
summary(pls)
```

```
## Projection to latent components:
##
## Classification on 2 classes: A B
## Method = pls
##
## Number of Components Accuracy Sensitivity Specificity
## 3 0.9 0.7647059 0.9775281
```

We can plot the fitted values to visualize the prediction.

`image(pls, values="fitted", layout=c(1,3))`

The PLS regression coefficients can be used to select influential features.

`plot(pls, values="coefficients", lwd=2)`

Like PCA, it can be useful to plot the PLS scores against each other to visualize the separation between classes.

```
pls_scores <- DataFrame(resultData(pls, 1, "scores"))
plot(pls_scores, C1 ~ C2, groups=class, pch=20)
```

Note that orthgonal PLS (O-PLS) is also available via `method="opls"`

or by using the separate `OPLS()`

method. Typically, both methods perform similarly, although O-PLS can sometimes produce more easily interpretable regression coefficients.

Spatially-aware nearest shrunken centroids classification is an extension of nearest shrunken centroids that incorporates spatial information into the model.

Like in the clustering case of spatial shrunken centroids, a smoothing radius `r`

must be provided along with sparsity parameters `s`

.

```
cv_ssc <- crossValidate(mse2, .y=class,
.fun=spatialShrunkenCentroids,
r=1, s=c(0,3,6,9), .fold=run(mse2))
summary(cv_ssc)
```

```
## Cross validation:
##
## Classification on 2 classes: A B
## Summarized 3 folds: run0 run1 run2
##
## r s Accuracy Sensitivity Specificity
## 1 0 0.9661925 1.0000000 0.9505495
## 1 3 0.9429979 0.8840580 0.9871795
## 1 6 0.8593713 0.6666667 1.0000000
## 1 9 0.7955665 0.5217391 1.0000000
```

We can see that in this case, the fully dense model (`s=0`

) that uses all mass features has the best cross-validated accuracy for the data.

```
ssc2 <- spatialShrunkenCentroids(mse2, y=class, r=1, s=0)
summary(ssc2)
```

```
## Spatially-aware nearest shrunken centroids:
##
## Classification on 2 classes: A B
## Method = gaussian
## Distance = chebyshev
##
## Radius (r) Sparsity (s) Features / Class Accuracy Sensitivity Specificity
## 1 0 10 0.9714286 1 0.9550562
```

Plotting the predicted class probabilities produces a more easily interpretable visualization than PLS in this case.

`image(ssc2, layout=c(1,3))`

Plotting t-statistics shows the first three features have much higher abundance in condition “B”.

`plot(ssc2, values="statistic", lwd=2)`