Contents

1 Introduction

Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments–specifically mass spectrometry (MS) imaging experiments.

MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.

The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.

Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:

Classes from older versions of Cardinal should be coerced to their Cardinal 2 equivalents. For example, to return an updated MSImageSet object called x, use as(x, "MSImagingExperiment").

2 Installation

Cardinal can be installed via the BiocManager package.

install.packages("BiocManager")
BiocManager::install("Cardinal")

The same function can be used to update Cardinal and other Bioconductor packages.

Once installed, Cardinal can be loaded with library():

library(Cardinal)

3 Components of an MSImagingExperiment

In Cardinal, imaging experiment datasets are composed of multiple sets of metadata, in addition to the actual experimental data. These are (1) pixel metadata, (2) feature (\(m/z\)) metadata, (3) the actual imaging data, and (4) a class that holds all of these and represents the experiment as a whole.

MSImagingExperiment is a matrix-like container for complete MS imaging experiments, where the “rows” represent the mass features, and the columns represent the pixels. An MSImagingExperiment object contains the following components:

Unlike many software packages designed for analysis of MS imaging experiments, Cardinal is designed to work with multiple datasets simultaneously and can integrate all aspects of experimental design and metadata.

3.1 Example data

We will use simulateImage() to prepare an example dataset.

set.seed(2020)
mse <- simulateImage(preset=1, npeaks=10, nruns=2, baseline=1)
mse
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE

3.2 Pixel metadata with PositionDataFrame

The pixelData() accessor extracts the pixel information for an MSImagingExperiment. The pixel data are stored in a PositionDataFrame object, which is a type of data frame that separately tracks pixel coordinates and experimental run information.

pixelData(mse)
## PositionDataFrame with 800 rows and 1 column
##        :run:   coord:x   coord:y    circle
##     <factor> <integer> <integer> <logical>
## 1       run0         1         1     FALSE
## 2       run0         2         1     FALSE
## 3       run0         3         1     FALSE
## 4       run0         4         1     FALSE
## 5       run0         5         1     FALSE
## ...      ...       ...       ...       ...
## 796     run1        16        20     FALSE
## 797     run1        17        20     FALSE
## 798     run1        18        20     FALSE
## 799     run1        19        20     FALSE
## 800     run1        20        20     FALSE

The coord() accessor specifically extracts the data frame of pixel coordinates.

coord(mse)
## DataFrame with 800 rows and 2 columns
##             x         y
##     <integer> <integer>
## 1           1         1
## 2           2         1
## 3           3         1
## 4           4         1
## 5           5         1
## ...       ...       ...
## 796        16        20
## 797        17        20
## 798        18        20
## 799        19        20
## 800        20        20

The run() accessor specifically extracts the vector of experimental runs.

run(mse)[1:10]
##  [1] run0 run0 run0 run0 run0 run0 run0 run0 run0 run0
## Levels: run0 run1

3.3 Feature metadata with MassDataFrame

The featureData() accessor extracts the feature information for an MSImagingExperiment. The feature data are stored in a MassDataFrame object, which is a type of data frame that separately tracks the m/z-values associated with the mass spectral features.

featureData(mse)
## MassDataFrame with 3919 rows and 0 columns
##           :mz:
##      <numeric>
## 1      426.529
## 2      426.699
## 3      426.870
## 4      427.041
## 5      427.211
## ...        ...
## 3915   2041.17
## 3916   2041.99
## 3917   2042.81
## 3918   2043.62
## 3919   2044.44

The mz() accessor specifically extracts the vector of m/z-values.

mz(mse)[1:10]
##  [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
##  [9] 427.8956 428.0668

3.4 Accessing spectra and image data

The imageData() accessor extracts the image data for an MSImagingExperiment. The data are stored in an ImageArrayList, which is a list of matrix-like objects.

It is possible to store multiple matrices of intensities in this list. Typically, only the first entry will be used by pre-processing and analysis methods.

imageData(mse)
## MSContinuousImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):       matrix
##    dim(1): <3919 x 800>
##    mem(1):      25.1 MB

Entries in this list can be extracted by name with iData().

iData(mse, "intensity")

The spectra() accessor is a shortcut for accessing the first data matrix.

spectra(mse)[1:5, 1:5]
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.9295940 0.9779923 0.9415157 0.9115036 0.8960595
## [2,] 1.0087009 1.3108664 1.0928983 1.0243944 1.0706272
## [3,] 1.0578001 1.0625834 1.2407371 0.9319758 0.8822412
## [4,] 0.8949165 1.1568158 0.9250994 0.9499621 1.0127282
## [5,] 1.0660395 1.0123048 1.0291570 0.8999156 1.1126816

Rows of these matrices correspond to mass features. Columns correspond to pixels. In other words, each column is a mass spectrum, and each row is a flattened vector of images.

3.5 Specialized classes

In order to specialize to the needs of different datasets, Cardinal provides specialized versions of MSImagingExperiment that reflect different spectra storage strategies.

3.5.1 “Continuous” MS imaging experiments

MSContinuousImagingExperiment is a specialization that enforces the data matrices be stored as dense, column-major matrices. These include R’s native matrix and matter_matc objects from the matter package.

mse
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE
imageData(mse)
## MSContinuousImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):       matrix
##    dim(1): <3919 x 800>
##    mem(1):      25.1 MB

3.5.2 “Processed” MS imaging experiments

MSProcessedImagingExperiment is a specialization that enforces the data matrices be stored as sparse, column-major matrices. These include sparse_matc objects from the matter package. This specialization is useful for processed data.

set.seed(2020)
mse2 <- simulateImage(preset=3, npeaks=10, nruns=2)
mse3 <- as(mse2, "MSProcessedImagingExperiment")
mse3
## An object of class 'MSProcessedImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE
imageData(mse3)
## MSProcessedImagingSpectraList of length 1 
##  names(1):    intensity
##  class(1):  sparse_matc
##    dim(1): <3919 x 800>
##    mem(1):      25.2 MB

Because the data is stored sparsely, spectra from MSProcessedImagingExperiment objects are binned on-the-fly to the m/z-values specified by mz().

The resolution of the binning can be accessed by resolution().

resolution(mse3)
## ppm 
## 400

The resolution can be set to change how the binning is performed.

resolution(mse3) <- c(ppm=400)
## nrows changed from 3919 to 3918

Changing the binned mass resolution will typically change the effective dimensions of the experiment.

dim(mse2)
## Features   Pixels 
##     3919      800
dim(mse3)
## Features   Pixels 
##     3918      800

The effective m/z-values are also updated to reflect the new bins.

mz(mse2)[1:10]
##  [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
##  [9] 427.8956 428.0668
mz(mse3)[1:10]
##  [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
##  [9] 427.8956 428.0668

Note that the underlying data will remain unchanged, but the binned values for the intensities will be different.

4 Data import and export

Cardinal 2 natively supports reading and writing imzML (both “continuous” and “processed” versions) and Analyze 7.5 formats via the readMSIData() and writeMSIData() functions.

We will focus on imzML.

The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Many other formats can be converted to imzML with the help of free applications available online at .

Let’s create a small image to demonstrate data import/export.

set.seed(2020)
tiny <- simulateImage(preset=1, from=500, to=600, dim=c(3,3))
tiny
## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSE

We’ll also create a “processed” version for writing “processed” imzML.

tiny2 <- as(tiny, "MSProcessedImagingExperiment")
tiny2
## An object of class 'MSProcessedImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSE

Note that despite the name, the only difference between “continuous” and “processed” imzML is how the data are stored, rather than what processing has been applied to the data. “Continuous” imzML stores spectra densely, with each spectrum sharing the same m/z-values. “Processed” imzML stores spectra sparsely, and each spectrum can have its own distinct m/z-values.

4.1 Writing imzML

Use writeMSIData() to write datasets to imzML and Analyze formats.

4.1.1 Writing “continuous” imzML

Internally, writeMSIData() will call either writeImzML() or writeAnalyze() depending on the value of outformat. The default is outformat="imzML".

path <- tempfile()
writeMSIData(tiny, file=path, outformat="imzML")
## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL

## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL
## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

Two files are produced with extensions “.imzML” and “.ibd”. The former contains an XML description of the dataset, and the latter contains the actual intensity data.

## [1] "file22f070d9c748.ibd"   "file22f070d9c748.imzML"

Because tiny is a MSContinuousImagingExperiment, it is written as “continuous” imzML.

mzml <- readLines(paste0(path, ".imzML"))
grep("continuous", mzml, value=TRUE)
## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000030\" name=\"continuous\" value=\"\" />"

4.1.2 Writing “processed” imzML

We can also write “processed” imzML if we export a MSProcessedImagingExperiment file.

path2 <- tempfile()
writeMSIData(tiny2, file=path2, outformat="imzML")
## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL
## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL

## Warning in .Call("C_setListElements", x, i - 1, NULL, value, PACKAGE =
## "matter"): converting NULL pointer to R NULL
mzml2 <- readLines(paste0(path2, ".imzML"))
grep("processed", mzml2, value=TRUE)
## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000031\" name=\"processed\" value=\"\" />"

4.1.3 Writing datasets with multiple runs

If an experiment contains multiple runs, then each run will be written to a separate imzML file.

set.seed(2020)
tiny3 <- simulateImage(preset=1, from=500, to=600, dim=c(3,3), nruns=2)
runNames(tiny3)
## [1] "run0" "run1"
path3 <- tempfile()
writeMSIData(tiny3, file=path3, outformat="imzML")
## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL

## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL
## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL
## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL

## Warning in .Call("C_setVector", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL
## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL

## Warning in .Call("C_setMatrixCols", x, j - 1, value, PACKAGE = "matter"):
## converting NULL pointer to R NULL
## [1] "file22f04ca9a6cb-run0.ibd"   "file22f04ca9a6cb-run0.imzML"
## [3] "file22f04ca9a6cb-run1.ibd"   "file22f04ca9a6cb-run1.imzML"

4.2 Reading imzML

Use readMSIData() to read datasets in imzML or Analyze formats.

4.2.1 Reading “continuous” imzML

Internally, readMSIData() will guess the format based on the file extension (which must be included) and call either readImzML() or readAnalyze().

path_in <- paste0(path, ".imzML")
tiny_in <- readMSIData(path_in, attach.only=TRUE)
tiny_in
## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file22f070d9c748
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSE

The attach.only argument is used to specify that the intensity data should not be loaded into memory, but instead attached as a file-based matrix using the matter package.

Starting in Cardinal 2, the default is attach.only=TRUE. This is more memory-efficient, but some methods may be more time-consuming due to the file I/O.

imageData(tiny_in)
## MSContinuousImagingSpectraList of length 1 
##  names(1):   intensity
##  class(1): matter_matc
##    dim(1):   <456 x 9>
##    mem(1):     10.6 KB
spectra(tiny_in)
## <456 row, 9 column> matter_matc :: out-of-memory numeric matrix
##                   [,1]               [,2]               [,3]               [,4]
## [1,]                 0 0.0408783257007599                  0                  0
## [2,]                 0  0.162409648299217                  0                  0
## [3,]                 0                  0 0.0359107814729214 0.0125106507912278
## [4,] 0.187441676855087 0.0405706577003002 0.0885359272360802 0.0557640492916107
## [5,]                 0  0.193072378635406                  0  0.124683283269405
## [6,]  0.30000975728035                  0 0.0878914147615433                  0
## ...                ...                ...                ...                ...
##                    [,5]               [,6] ...
## [1,] 0.0166400671005249 0.0327425710856915 ...
## [2,]                  0                  0 ...
## [3,] 0.0794587582349777 0.0879494920372963 ...
## [4,]                  0 0.0971635207533836 ...
## [5,]                  0  0.117957629263401 ...
## [6,]  0.115523137152195 0.0507860481739044 ...
## ...                 ...                ... ...
## (10.6 KB real | 16.4 KB virtual)

An out-of-memory matter matrix can be subsetted like an ordinary R matrix. The data values are only read from file and loaded into memory when they are requested (via subsetting).

spectra(tiny_in)[1:5, 1:5]
##           [,1]       [,2]       [,3]       [,4]       [,5]
## [1,] 0.0000000 0.04087833 0.00000000 0.00000000 0.01664007
## [2,] 0.0000000 0.16240965 0.00000000 0.00000000 0.00000000
## [3,] 0.0000000 0.00000000 0.03591078 0.01251065 0.07945876
## [4,] 0.1874417 0.04057066 0.08853593 0.05576405 0.00000000
## [5,] 0.0000000 0.19307238 0.00000000 0.12468328 0.00000000

If loading the data fully into memory is desired, either set attach.only=FALSE when reading the data, or use as.matrix() on the intensity matrix.

spectra(tiny_in) <- as.matrix(spectra(tiny_in))
imageData(tiny_in)
## MSContinuousImagingSpectraList of length 1 
##  names(1): intensity
##  class(1):    matrix
##    dim(1): <456 x 9>
##    mem(1):     33 KB

Using collect() on the MSImagingExperiment will also load all data into memory.

tiny_in <- collect(tiny_in)

4.2.2 Reading “processed” imzML

For “processed” imzML files, the spectra must be binned to common m/z-values. By default, readImzML() will detect the mass range from the data. This requires reading a large proportion of data from the file, even if attach.only=TRUE.

path2_in <- paste0(path2, ".imzML")
tiny2_in <- readMSIData(path2_in)
tiny2_in
## An object of class 'MSProcessedImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file22f03ae8c16e
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSE

If known, it can be much more efficient to specify mass.range when importing “processed” imzML data. This can also be used to pre-filter the data to a smaller mass range.

The size of the m/z bins can be changed with the resolution argument.

tiny2_in <- readMSIData(path2_in, mass.range=c(510,590),
                        resolution=100, units="ppm")
tiny2_in
## An object of class 'MSProcessedImagingExperiment'
##   <1458 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(2): 3DPositionX 3DPositionY
##     metadata(8): spectrum representation ibd binary type ... files name
##     run(1): file22f03ae8c16e
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 510.000 to 589.993 
##     centroided: FALSE

The resolution for the m/z bins can be changed after importing the data by setting the resolution() of the dataset.

resolution(tiny2_in) <- c(ppm=400)
## nrows changed from 1458 to 365

4.3 Building from scratch

While importing formats besides imzML and Analyze are not explicitly supported by Cardinal, if the data can be read into R, it is easy to construct a MSImagingExperiment object from its components.

set.seed(2020)
s <- simulateSpectrum(n=9, peaks=10, from=500, to=600)

coord <- expand.grid(x=1:3, y=1:3)
run <- factor(rep("run0", nrow(coord)))

fdata <- MassDataFrame(mz=s$mz)
pdata <- PositionDataFrame(run=run, coord=coord)

out <- MSImagingExperiment(imageData=s$intensity,
                            featureData=fdata,
                            pixelData=pdata)
out
## An object of class 'MSContinuousImagingExperiment'
##   <456 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(0):
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 500.0000 to 599.8071 
##     centroided: FALSE

For loading data into R, read.csv() and read.table() can be used to read CSV and tab-delimited text files, respectively.

Likewise, write.csv() and write.table() can be used to write pixel metadata and feature metadata after coercing them to an ordinary R data.frame with as.data.frame().

Use saveRDS() and readRDS() to save and read and entire R object such as a MSImagingExperiment. Note that if intensity data is to be saved as well, it should be pulled into memory and coerced to an R matrix with as.matrix() first.

5 Visualization

Visualization of mass spectra and molecular ion images is vital for exploratory analysis of MS imaging experiments. Cardinal provides a plot() method for plotting mass spectra and an image() method for plotting ion images.

Cardinal 2 provides some useful visualization updates compared to previous versions:

5.1 Visualizing mass spectra with plot()

Use plot() to plot mass spectra. Either pixel or coord can be specified to determine which mass spectra should be plotted.

plot(mse, pixel=c(211, 611))

The plusminus argument can be used to specify that multiple spectra should be averaged and plotted together.

plot(mse, coord=list(x=10, y=10), plusminus=1, fun=mean)

A formula can be specified to further customize mass spectra plotting. The LHS of the formula should specify one or more elements of imageData() and the RHS of the formula should be a variable in featureData().

plot(mse, intensity + I(-log1p(intensity)) ~ mz, pixel=211, superpose=TRUE)

5.2 Visualizing images with image()

Use image() to plot ion images. Either feature or mz can be specified to determine which mass features should be plotted.

image(mse, mz=1200)

The plusminus argument can be used to specify that multiple mass features should be averaged and plotted together.

image(mse, mz=1227, plusminus=0.5, fun=mean)

By default, images from all experimental runs are plotted. If this is not desired, a subset can be specified instead.

image(mse, mz=1227, subset=run(mse) == "run0")

image(mse, mz=c(1200, 1227), subset=circle)

Multiplicative variance in spectral intensities can cause images to be noisy and dark due to hot spots.

Often, images may require some type of processing and enhancement to improve interpretation.

image(mse, mz=1200, smooth.image="gaussian")

image(mse, mz=1200, contrast.enhance="histogram")

The default viridis colorscale is chosen to enable ease of interpretation. However, other colorscales can be chosen. All colorscales from the viridisLite package are available in Cardinal, including cividis, magma, inferno, and plasma.

The magma colorscale is used below with a different dataset.

image(mse2, mz=1136, colorscale=magma)

Cardinal will try to choose an appropriate layout for the images. However, a user-defined one can be specified by layout. Use layout=NULL to use the current plotting device as-is.

image(mse2, mz=c(781, 1361), layout=c(1,4), colorscale=magma)

Multiple images can be superposed with superpose=TRUE. Use normalize.image="linear" to normalize all images to the same intensity range.

image(mse2, mz=c(781, 1361), superpose=TRUE, normalize.image="linear")

Like plot(), a formula can be specified. The LHS should specify one or more elements of imageData() and the RHS should specify two to three variables from pixelData().

image(mse2, log1p(intensity) ~ x * y, mz=1136, colorscale=magma)

5.3 Visualizing 3D imaging experiments

3D imaging datasets can be plotted with image3D().

set.seed(1)
mse3d <- simulateImage(preset=9, nruns=7, dim=c(7,7), npeaks=10,
                        peakheight=c(2,4), representation="centroid")

image3D(mse3d, mz=c(1001, 1159), colorscale=plasma, cex=3, theta=30, phi=30)

Any dataset with 3 or more spatial coordinates (columns in coord()), can be plotted in 3D.

5.4 Region-of-interest selection

Use selectROI() to select regions-of-interest on an ion image. It is important to specify a subset so that selection is only made on a single experimental run, otherwise results may be unexpected. The form of selectROI() is the same as image().

sampleA <- selectROI(mse, mz=1200, subset=run(mse) == "run0")
sampleB <- selectROI(mse, mz=1200, subset=run(mse) == "run1")

selectROI() returns a logical vector specifying which pixels from the imaging experiment are contained in the selected region.

makeFactor() can then be used to combine multiple logical vectors (e.g., from selectROI()) into a single factor.

regions <- makeFactor(A=sampleA, B=sampleB)

5.5 Saving plots and images

Plots and images can be saved to a file by using R’s built-in graphics devices.

Use pdf() to initialize a PDF graphics device, create the plot, and then use dev.off() to turn off the graphics device.

Any plots printed while the graphics device is active will be saved to the specified file(s).

pdf_file <- paste0(tempfile(), ".pdf")

pdf(file=pdf_file, width=9, height=4)
image(mse, mz=1200)
dev.off()

Graphics devices for png(), jpeg(), bmp(), and tiff() are also available. See their documentation for usage.

5.6 Dark themes

While many software for MS imaging data use a light-on-dark theme, Cardinal uses a transparent background by default. However, a dark theme is also provided through darkmode().

darkmode()
image(mse, mz=1200)

darkmode()
image(mse2, mz=1135, colorscale=magma)

Any future plots will use the new mode. The default mode can be restored with lightmode().

lightmode()

5.7 A note on plotting speed

While plotting spectra should typically be fast for most datasets regardless of data format, plotting images will typically be slower for out-of-memory (file-based) datasets and for MSProcessedImagingExperiment objects in general.

This is due to the way the spectra are stored in imzML and Analyze files, and the way sparse spectra are stored (regardless of location). Calculating and decompressing the images simply takes longer than reading the spectra.

For the fastest visualization of images, experiments should be coerced to an in-memory MSContinuousImagingExperiment.

Also note that all Cardinal 2 visualizations produce a print()-able object that can be assigned to a variable and print()-ed later without the need to read the data again.

g <- image(mse, mz=1200)
print(g)

This is useful for re-creating or updating plots without accessing the data again.

6 Common operations on MSImagingExperiment

6.1 Subsetting

MSImagingExperiment are matrix-like objects that can be subsetted using the [ operator.

When subsetting, the “rows” are the mass features, and the “columns” are the pixels.

# subset first 10 mass features and first 5 pixels
mse[1:10, 1:5]
## An object of class 'MSContinuousImagingExperiment'
##   <10 feature, 5 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 5 x 1
##     coord(2): x = 1..5, y = 1..1
##     mass range: 426.5285 to 428.0668 
##     centroided: FALSE

Subsetting the dataset this way requires knowing the desired row and column indices.

features() returns row indices based on specified feature metadata.

# get row index corresponding to m/z 1200
features(mse, mz=1200)
## [1] 2587
# get row indices corresponding to m/z 1199 - 1201
features(mse, 1199 < mz & mz < 1201)
## [1] 2585 2586 2587 2588 2589

pixels() returns column indices based on specified pixel metadata.

# get column indices corresponding to x = 10, y = 10 in all runs
pixels(mse, coord=list(x=10, y=10))
## [1] 190 590
# get column indices corresponding to x <= 3, y <= 3 in "run0"
pixels(mse, x <= 3, y <= 3, run == "run0")
## [1]  1  2  3 21 22 23 41 42 43

These methods can be used to determine row/column indices of particular m/z-values or pixel coordinates to use for subsetting.

fid <- features(mse, 1199 < mz & mz < 1201)
pid <- pixels(mse, x <= 3, y <= 3, run == "run0")
mse[fid, pid]
## An object of class 'MSContinuousImagingExperiment'
##   <5 feature, 9 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 1199.043 to 1200.963 
##     centroided: FALSE

6.2 Slicing

MSImagingExperiment represents the data as a matrix, where each column is a mass spectrum, rather than as a true “data cube”. This is typically simpler when operating on the mass spectra, and more space efficient when the data is pixel-sparse (i.e., non-rectangular).

Sometimes, however, it is useful to index into the data as an actual “data cube”, with explicit array dimensions for each spatial dimension.

Use slice() to slice an MSImagingExperiment as a data cube and extract images.

# slice image for first mass feature
a <- slice(mse, 1)
dim(a)
##   x   y run 
##  20  20   2

Any arguments to slice() are passed to features(), making it easy to select the desired image slices.

By default, array dimensions with only one level are dropped; use .preserve=TRUE to keep all dimensions.

# slice image for m/z 1200
a2 <- slice(mse, mz=1200, drop=FALSE)
dim(a2)
##       x       y     run feature 
##      20      20       2       1

Note that when plotting images from raw arrays, the images are upside-down due to differing coordinate conventions used by graphics::image().

image(a2[,,1,1], col=bw.colors(100))

6.3 Combining

Because MSImagingExperiment is matrix-like, rbind() and cbind() can be used to combine multiple MSImagingExperiment objects by “row” or “column”, assumping certain conditions are met.

Use cbind() to combine datasets from different experimental runs. The m/z-values must match between all datasets to succesfully combine them.

# divide dataset into separate objects for each run
mse_run0 <- mse[,run(mse) == "run0"]
mse_run1 <- mse[,run(mse) == "run1"]
mse_run0
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 400 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE
mse_run1
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 400 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(1): run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE
# recombine the separate datasets back together
mse_cbind <- cbind(mse_run0, mse_run1)
mse_cbind
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(1): circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE

Some processing may be necessary to ensure datasets are compatible before combining them.

6.4 Getters and setters

Most components of an MSImagingExperiment that can be accessed through getter functions like pixelData(), featureData(), and imageData(), can also be re-assigned with analogous setter functions. These can likewise be used to get and set columns of the pixel and feature metadata.

pData() and fData() are aliases for pixelData() and featureData(), respectively.

The $ operator will access the corresponding columns of pixelData().

mse$region <- makeFactor(circle=mse$circle,
                            bg=!mse$circle)
pData(mse)
## PositionDataFrame with 800 rows and 2 columns
##        :run:   coord:x   coord:y    circle   region
##     <factor> <integer> <integer> <logical> <factor>
## 1       run0         1         1     FALSE       bg
## 2       run0         2         1     FALSE       bg
## 3       run0         3         1     FALSE       bg
## 4       run0         4         1     FALSE       bg
## 5       run0         5         1     FALSE       bg
## ...      ...       ...       ...       ...      ...
## 796     run1        16        20     FALSE       bg
## 797     run1        17        20     FALSE       bg
## 798     run1        18        20     FALSE       bg
## 799     run1        19        20     FALSE       bg
## 800     run1        20        20     FALSE       bg

iData() can be used to access elements of the imageData() list by name or index.

Using iData() with no arguments besides the dataset will get or set the first element of imageData(). Providing a name or index will get or set that element.

iData(mse, "log2intensity") <- log2(iData(mse) + 1)
imageData(mse)
## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):       matrix        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):      25.1 MB       25.1 MB

For MSImagingExperiment, spectra() is an alias for iData().

spectra(mse, "log2intensity")[1:5, 1:5]
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.9482973 0.9840368 0.9571834 0.9347079 0.9230042
## [2,] 1.0062627 1.2084339 1.0655022 1.0174904 1.0500678
## [3,] 1.0411028 1.0444525 1.1639734 0.9500770 0.9124515
## [4,] 0.9221343 1.1089029 0.9449329 0.9634461 1.0091524
## [5,] 1.0468679 1.0088489 1.0208805 0.9259353 1.0790753

Whether or not the spectra have been centroided or not can be accessed using centroided()

centroided(mse)
## [1] FALSE

This can also be used to set whether the spectra should be treated as centroided or not.

centroided(mse) <- FALSE

6.5 Summarization (e.g., mean spectra)

Cardinal 2 implements several convenient data manipulation verbs for subsetting and summarizing MSImagingExperiment objects.

  • subset() subsets an MSImagingExperiment

  • subsetFeatures() subsets an MSImagingExperiment by row, i.e., mass features

  • subsetPixels() subsets an MSImagingExperiment by column, i.e., pixels

  • aggregate() summarizes an MSImagingExperiment by either feature or pixel

  • summarizeFeatures() summarizes an MSImagingExperiment by feature (e.g., mean spectrum)

  • summarizePixels() summarizes an MSImagingExperiment by pixel (e.g., TIC)

The %>% operator can be used to chain these operations together. For file-based data, the subsetting should be quick, as only metadata is modified.

# subset by mass range
subsetFeatures(mse, mz > 700, mz < 900)
## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE
# subset by pixel coordinates
subsetPixels(mse, x <= 3, y <= 3, run == "run0")
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE
# subset by mass range + pixel coordinates
subset(mse, mz > 700 & mz < 900, x <=3 & y <= 3 & run == "run0")
## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE
# chain verbs together
mse %>%
    subsetFeatures(mz > 700, mz < 900) %>%
    subsetPixels(x <= 3, y <= 3, run == "run0")
## An object of class 'MSContinuousImagingExperiment'
##   <628 feature, 9 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     run(1): run0
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 700.1392 to 899.7160 
##     centroided: FALSE
# calculate mean spectrum
summarizeFeatures(mse, "mean", as="DataFrame")
## MassDataFrame with 3919 rows and 1 column
##           :mz:      mean
##      <numeric> <numeric>
## 1      426.529  1.002647
## 2      426.699  0.997441
## 3      426.870  0.998042
## 4      427.041  1.001188
## 5      427.211  0.995394
## ...        ...       ...
## 3915   2041.17 0.0377938
## 3916   2041.99 0.0428236
## 3917   2042.81 0.0411181
## 3918   2043.62 0.0377568
## 3919   2044.44 0.0410845
# calculate tic
summarizePixels(mse, c(tic="sum"), as="DataFrame")
## PositionDataFrame with 800 rows and 1 column
##        :run:   coord:x   coord:y       tic
##     <factor> <integer> <integer> <numeric>
## 1       run0         1         1   932.111
## 2       run0         2         1   939.237
## 3       run0         3         1   937.598
## 4       run0         4         1   933.216
## 5       run0         5         1   937.512
## ...      ...       ...       ...       ...
## 796     run1        16        20   935.639
## 797     run1        17        20   936.751
## 798     run1        18        20   941.756
## 799     run1        19        20   948.589
## 800     run1        20        20   928.441
# calculate mean spectrum of circle region
mse %>%
    subsetPixels(region == "circle") %>%
    summarizeFeatures("mean", as="DataFrame")
## MassDataFrame with 3919 rows and 1 column
##           :mz:      mean
##      <numeric> <numeric>
## 1      426.529  1.001174
## 2      426.699  0.999913
## 3      426.870  1.004862
## 4      427.041  1.002847
## 5      427.211  0.996085
## ...        ...       ...
## 3915   2041.17 0.0367079
## 3916   2041.99 0.0441397
## 3917   2042.81 0.0396913
## 3918   2043.62 0.0404076
## 3919   2044.44 0.0429847

6.6 Pull data into memory

By default, Cardinal 2 does not load the spectra from imzML and Analyze files into memory, but retrieves them from files when necessary.

For very large datasets, this is necessary and memory-efficient.

However, for datasets that are known to fit in computer memory, this may be unnecessarily slow, especially when plotting images (which are perpendicular to how data are stored in the files).

# coerce spectra to file-based matter matrix
spectra(mse) <- matter::as.matter(spectra(mse))
## Warning in .Call("C_setMatrix", x, value, PACKAGE = "matter"): converting NULL
## pointer to R NULL
spectra(mse)
## <3919 row, 800 column> matter_matc :: out-of-memory numeric matrix
##                   [,1]              [,2]              [,3]              [,4]
## [1,] 0.929593985749523 0.977992334255707 0.941515698898047 0.911503628041379
## [2,]  1.00870086651911  1.31086642293593  1.09289833045394  1.02439444262814
## [3,]  1.05780006935272  1.06258344855023  1.24073708584273  0.93197575318083
## [4,] 0.894916515700226  1.15681575755041  0.92509939130322 0.949962119177309
## [5,]  1.06603952985143  1.01230482004299  1.02915697989109 0.899915591256897
## [6,] 0.921759798412127  1.06647982270475 0.983971732163821 0.844834986167779
## ...                ...               ...               ...               ...
##                   [,5]              [,6] ...
## [1,] 0.896059510791875 0.995104040677534 ...
## [2,]  1.07062719616811  1.06550164549397 ...
## [3,] 0.882241208579355 0.977646123538851 ...
## [4,]  1.01272818923958  1.15616832323534 ...
## [5,]  1.11268157414036 0.955812180737613 ...
## [6,]  1.13482107107674  1.08524715504496 ...
## ...                ...               ... ...
## (13.3 KB real | 25.1 MB virtual)
imageData(mse)
## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):  matter_matc        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):      13.3 KB       25.1 MB

Use pull() to pull all imageData() elements in an MSImagingExperiment into memory.

mse <- pull(mse)
imageData(mse)
## MSContinuousImagingSpectraList of length 2 
##  names(2):    intensity log2intensity
##  class(2):       matrix        matrix
##    dim(2): <3919 x 800>  <3919 x 800>
##    mem(2):      25.1 MB       25.1 MB

By default, the spectra from MSProcessedImagingExperiment objects will still be represented as sparse matrices after using pull().

To coerce sparse spectra to an R matrix as well, use as.matrix=TRUE.

mse3 <- pull(mse3, as.matrix=TRUE)

6.7 Coercion to/from other classes

Use as() to coerce between different MSImagingExperiment sub-classes.

mse3
## An object of class 'MSProcessedImagingExperiment'
##   <3918 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2043.6224 
##     centroided: FALSE
as(mse3, "MSContinuousImagingExperiment")
## An object of class 'MSContinuousImagingExperiment'
##   <3918 feature, 800 pixel> imaging dataset
##     imageData(1): intensity
##     featureData(0):
##     pixelData(3): square1 square2 circle
##     metadata(1): design
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2043.6224 
##     centroided: FALSE

This will often change the underlying data representation, so some information may be lost depending on the coercion.

Objects of the older MSImageSet class can also be coerced to MSImagingExperiment this way.

# requires CardinalWorkflows installed
data(cardinal, package="CardinalWorkflows")
cardinal2 <- as(cardinal, "MSImagingExperiment")

7 Pre-processing

A major change in Cardinal 2 from earlier versions is how pre-processing is handled.

Instead of applying pre-processing immediately, each pre-processing step is queued to the dataset, and only applied once process() is called.

This approach is more computationally and memory efficient in most cases, as ideally each spectrum is only processed once, and no extraneous copies of the data are made.

7.1 Delayed processing with process()

On its own, the process() method queues a new pre-processing function, and then applies all currently queued processing functions.

For example, the following code applies very basic total-ion-current (TIC) normalization to all spectra.

mse_tic <- process(mse, function(x) length(x) * x / sum(x), label="norm")
mse_tic
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(1): norm
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE

By default, this is applied immediately. However, delay=TRUE delays this, and allows us to queue multiple pre-processing functions at once.

mse_pre <- mse %>%
    process(function(x) length(x) * x / sum(x), label="norm", delay=TRUE) %>%
    process(function(x) signal::sgolayfilt(x), label="smooth", delay=TRUE)

processingData(mse_pre)
## List of length 2
## names(2): norm smooth

We can view all pending and completed pre-processing steps in more detail.

mcols(processingData(mse_pre))[,-1]
## DataFrame with 2 rows and 6 columns
##               kind   pending  complete   has_pre  has_post  has_plot
##        <character> <logical> <logical> <logical> <logical> <logical>
## norm         pixel      TRUE     FALSE     FALSE     FALSE     FALSE
## smooth       pixel      TRUE     FALSE     FALSE     FALSE     FALSE

Calling process() on the dataset again without any other arguments will apply all queued pre-processing steps.

mse_proc <- process(mse_pre)
mse_proc
## An object of class 'MSContinuousImagingExperiment'
##   <3919 feature, 800 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(2): norm smooth
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 20 x 20
##     coord(2): x = 1..20, y = 1..20
##     mass range:  426.5285 to 2044.4400 
##     centroided: FALSE

Note that subsetting retains any queued pre-processing steps.

mse_pre %>%
    subsetPixels(x <= 3, y <= 3) %>%
    subsetFeatures(mz <= 1000) %>%
    process()
## An object of class 'MSContinuousImagingExperiment'
##   <2131 feature, 18 pixel> imaging dataset
##     imageData(2): intensity log2intensity
##     featureData(0):
##     pixelData(2): circle region
##     metadata(1): design
##     processing complete(2): norm smooth
##     processing pending(0):
##     run(2): run0 run1
##     raster dimensions: 3 x 3
##     coord(2): x = 1..3, y = 1..3
##     mass range: 426.5285 to 999.9239 
##     centroided: FALSE

All pre-processing methods for MSImagingExperiment queue delayed processing by default.

If this is not desired, you can set options(Cardinal.delay=FALSE) to apply all pre-processing steps immediately.

7.2 Normalization

Use normalize() to queue normalization to an MSImagingExperiment.

mse_pre <- normalize(mse, method="tic")
  • method="tic" performs total-ion-current (TIC) normalization

  • method="rms" performs root-mean-square (RMS) normalization

  • method="reference" normalizes all spectra to a reference feature

TIC normalization is one of the most common normalization methods for mass spectrometry imaging. For comparison between datasets, TIC normalization requires that all spectra are the same length. RMS normalization is more appropriate when spectra are of different lengths.

Normalization to a reference is the most reliable form of normalization, but is only possible when the experiment contains a known reference peak with a constant abundance throughout the dataset. This is often not possible in unsupervised and exploratory experiments.

7.3 Spectral smoothing

Use smoothSignal() to queue spectral smoothing to an MSImagingExperiment.

mse %>% smoothSignal(method="gaussian") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE,
            par=list(main="Gaussian smoothing", layout=c(3,1)))

mse %>% smoothSignal(method="ma") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Moving average smoothing"))

mse %>% smoothSignal(method="sgolay") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Savitzky-Golay smoothing"))

mse_pre <- smoothSignal(mse_pre, method="gaussian")
  • method="gaussian" performs smoothing with a Gaussian kernel

  • method="ma" performs moving average smoothing

  • method="sgolay" applies a Savitzky-Golay smoothing filter

7.4 Baseline correction

Use reduceBaseline() to queue baseline correction to an MSImagingExperiment.

mse %>% reduceBaseline(method="locmin") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Local minima", layout=c(2,1)))

mse %>% reduceBaseline(method="median") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Median interpolation"))

mse_pre <- reduceBaseline(mse_pre, method="locmin")
  • method="locmin" interpolates a baseline from local minima

  • method="median" splits a spectrum into blocks and interpolates from binned medians

7.5 Peak processing

Peak processing encompasses multiple steps, including (1) picking peaks, (2) aligning peaks, (3) filtering peaks, and (4) binning profile spectra to the detected peaks.

Prior to peak detection is a good time to apply the previous processing steps.

mse_pre <- process(mse_pre)

(This is optional, and not necessary if only the peaks are desired, and if it is acceptable to have peaks with intensities of zero in pixels where that peak was not detected.)

7.5.1 Peak picking

Use peakPick() to queue peak picking to an MSImagingExperiment.

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="mad") %>%
    process(plot=TRUE, par=list(main="MAD noise", layout=c(3,1)))

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="simple") %>%
    process(plot=TRUE,
            par=list(main="Simple SD noise"))

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="adaptive") %>%
    process(plot=TRUE, par=list(main="Adaptive SD noise"))

mse_peaks <- peakPick(mse_pre, method="mad")
  • method="mad" calculates adaptive noise from interpolating local mean absolute deviations

  • method="simple" calculates a constant noise from standard deviations of low-kurtosis bins

  • method="adaptive" calculates adaptive noise from standard deviations of low-kurtosis bins

7.5.2 Peak alignment

Use peakAlign() to queue peak alignment to an MSImagingExperiment.

mse_peaks <- peakAlign(mse_pre, tolerance=200, units="ppm")

Peaks are matched based on proximity of their m/z-values, according to tolerance, in either parts-per-millin (“ppm”) or absolute (“mz”) units.

The m/z-values of known reference peaks can be provided. If no reference is provided, the mean spectrum is calculated, and the local maxima of the mean spectrum are used as the reference.

7.5.3 Peak filtering

Use peakFilter() to queue peak filtering to an MSImagingExperiment.

mse_peaks <- peakFilter(mse_pre, freq.min=0.05)

The proportions of pixels where a peak was detected at each m/z-value are calculated. Only peaks with frequencies greater than freq.min are retained.

7.5.4 Peak binning

Use peakBin() to queue binning of spectra to reference peaks.

Typically, this is applied to processed profile spectra after peak detection, to extract a more accurate representation of the peak intensities.

mse_peaks <- process(mse_peaks)
mse_peaks <- peakBin(mse_pre, ref=mz(mse_peaks), type="height")
mse_peaks <- mse_peaks %>% process()

A tolerance in either parts-per-million (“ppm”) or absolute (“mz”) units is used to match the reference peaks to local maxima in each spectrum.

The peak is then expanded to the nearest local minima in both directions. The intensity of the peak is then summarized either by the maximum intensity (type="height") or sum of intensities (type="area").

Rather than use the m/z-values of the detected peaks, we can also use known reference peaks (in this case, from the design of the simulated data).

mzref <- mz(metadata(mse)$design$featureData)
mse_peaks <- peakBin(mse_pre, ref=mzref, type="height")
mse_peaks <- mse_peaks %>% subsetPixels(x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_peaks, coord=list(x=10, y=10))

We typically recommend binning the processed profile spectra to detected or reference peaks to produce centroided spectra suitable for downstream analysis.

It is possible to use the detected and aligned peaks directly (after applying peakAlign() and peakFilter()), but pixels where a peak was not detected will have zero intensities for those peaks. Moreover, all peak intensities will be the peak heights, even when peak areas may be desired instead. However, using the detected peaks directly does mean that there is no need to calculate and store the processed profile spectra, so this saves on storage space.

Generally, it is preferable and more accurate to bin the processed profile spectra to the detected peaks whenever possible and reasonable.

7.6 Mass alignment

Although peak alignment and peak binning will generally account for small differences in m/z between spectra, alignment of the profile spectra is sometimes desireable as well.

Use mzAlign() to queue alignment of spectra so that peaks will have a consistent m/z-value.

First, we need to simulate spectra that are in need of calibration.

set.seed(2020)
mse4 <- simulateImage(preset=1, npeaks=10, from=500, to=600,
                            sdmz=750, units="ppm")

plot(mse4, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)

If no reference spectrum is provided, the mean spectrum is calculated and used as the reference.

mse4_align <- mzAlign(mse4, tolerance=2000, units="ppm")
mse4_align <- process(mse4_align)
plot(mse4_align, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)

The algorithm will try to match the most intense peaks in the reference to local maxima in each spectrum, within tolerance. If tolerance is too small, the matching local maxima may not be found. If tolerance is too large, then peaks may be matched to the wrong local maxima.

This method is still experimental and under development.

7.7 Mass binning

While the m/z binning scheme of MSProcessedImagingExperiment objects can be adjusted on-the-fly, this does not apply to other types of MSImagingExperiment.

Use mzBin() to queue binning of spectra to new m/z-values.

mse_bin <- mzBin(mse_pre, from=1000, to=1600, resolution=1000, units="ppm")
mse_bin <- subsetPixels(mse_bin, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_bin, coord=list(x=10, y=10))

This is useful if you need to combine datasets with different m/z-values.

7.8 Mass filtering

Sometimes it is necessary to filter the mass features of a dataset that has not been peak picked and aligned. For example, to remove noisy or low-intensity m/z-values.

Use mzFilter() to queue filtering of mass features.

mse_filt <- mzFilter(mse_pre, freq.min=0.05)
mse_filt <- subsetPixels(mse_filt, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_filt, coord=list(x=10, y=10), type='h')

mzFilter() and peakFilter() are actually the same function internally, but with different defaults for profile and centroided spectra, respectively.

Note that mzFilter() this does not set centroided() to TRUE; it is up to the user to decide whether the result represent centroided spectra or not, and set centroided() appropriately.

7.9 Example processing workflow

Delayed pre-processing makes it easy to chain together multiple pre-processing steps with the %>% operator, and then apply them all at once.

mse_proc <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakPick()

# preview processing
mse_proc %>%
    subsetPixels(x == 10, y == 10, run == "run0") %>%
    process(plot=TRUE, par=list(layout=c(2,2)))

# process detected peaks
mse_peakref <- mse_proc %>%
    peakAlign() %>%
    peakFilter() %>%
    process()

# bin profile spectra to peaks
mse_peaks <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakBin(mz(mse_peakref))

8 Advanced operations on MSImagingExperiment

Internally, most methods that iterate over pixels or mass features use some combination of pixelApply(), featureApply(), and spatialApply().

While summarize() is useful for summarizing a MSImagingExperiment, it is limited in that it can only apply summary functions that return a single value, which can be simplified into the columns of a MassDataFrame or PositionDataFrame.

By contrast, these functions (like apply() and lapply()) can return any value.

8.1 Using pixelApply() and featureApply()

pixelApply() is used to apply functions over pixels.

# calculate the TIC for every pixel
tic <- pixelApply(mse, sum)

image(mse, tic ~ x * y)

featureApply() is used to apply functions over features.

# calculate the mean spectrum
smean <- featureApply(mse, mean)

plot(mse, smean ~ mz)

Note that featureApply() will typically be slower than pixelApply() for out-of-memory data for MSProcessedImagingExperiment objects, due to the way the data is stored.

8.2 Using spatialApply()

spatialApply() is used to iterate over spatial neighborhoods of an imaging experiment.

Rather than vectors, it operates on matrices, where each column is a mass spectrum from a different pixel in the neighborhood.

# calculate a spatially-smoothed TIC
sptic <- spatialApply(mse, .r=1, .fun=mean)

image(mse, sptic ~ x * y)

See ?spatialApply for more details on attributes that are made available to the calling function (e.g., information about the neighborhood offsets, center pixel, etc.).

9 Parallel computing using BiocParallel

All pre-processing methods and most statistical analysis methods in Cardinal 2 can be executed in parallel using BiocParallel.

By default, a serial backend is used (no parallelization). This is for maximum stability and compatibility.

9.1 Using BPPARAM

Any method that supports parallelization includes BPPARAM as an argument (see method documentation). The BPPARAM argument can be used to specify a parallel backend for the operation, such as SerialParam(), MulticoreParam(), SnowParam(), etc.

# run in parallel, rather than serially
tic <- pixelApply(mse, sum, BPPARAM=MulticoreParam())

9.2 Backend types

Several parallelization backends are available, depending on OS:

  • SerialParam() creates a serial (non-parallel) backend. Use this to avoid potential issues caused by parallelization.

  • MulticoreParam() creates a multicore backend by forking the current R session. This is typically the fastest parallelization option, but is only available on macOS and Linux.

  • SnowParam() creates a SNOW backend by creating new R sessions via socket connections. This is typically slower than multicore, but is available on all platforms including Windows.

Use of MulticoreParam() will frequently improve speed on macOS and Linux dramatically. However, due to the extra overhead of SnowParam(), Windows users may prefer the default SerialParam(), depending on the size of the dataset.

9.3 Getting available backends

Available backends can be viewed with BiocParallel::registered().

BiocParallel::registered()
## $MulticoreParam
## class: MulticoreParam
##   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
##   bpexportglobals: TRUE
##   bplogdir: NA
##   bpresultdir: NA
##   cluster type: FORK
## 
## $SnowParam
## class: SnowParam
##   bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
##   bpexportglobals: TRUE
##   bplogdir: NA
##   bpresultdir: NA
##   cluster type: SOCK
## 
## $SerialParam
## class: SerialParam
##   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
##   bpexportglobals: TRUE
##   bplogdir: NA
##   bpresultdir: NA

The current backend used by Cardinal can be viewed with getCardinalBPPARAM():

getCardinalBPPARAM()
## class: SerialParam
##   bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
##   bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
##   bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
##   bpexportglobals: TRUE
##   bplogdir: NA
##   bpresultdir: NA

9.4 Setting a parallel backend

A new default backend can be set for use with Cardinal by calling setCardinalBPPARAM().

# register a SNOW backend
setCardinalBPPARAM(SnowParam())

See the BiocParallel package documentation for more details on available parallel backends.

9.5 RNG and reproducibility

For methods that rely on random number generation to be reproducible when run in parallel, the RNG must be set to “L’Ecuyer-CMRG” before setting a seed.

RNGkind("L'Ecuyer-CMRG")
set.seed(1)

10 Session information

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] Cardinal_2.6.0      ProtGenerics_1.20.0 S4Vectors_0.26.0   
## [4] EBImage_4.30.0      BiocParallel_1.22.0 BiocGenerics_0.34.0
## [7] BiocStyle_2.16.0   
## 
## loaded via a namespace (and not attached):
##  [1] biglm_0.9-1         locfit_1.5-9.4      tidyselect_1.0.0   
##  [4] xfun_0.13           purrr_0.3.4         lattice_0.20-41    
##  [7] vctrs_0.2.4         htmltools_0.4.0     viridisLite_0.3.0  
## [10] yaml_2.2.1          rlang_0.4.5         pillar_1.4.3       
## [13] glue_1.4.0          DBI_1.1.0           sp_1.4-1           
## [16] jpeg_0.1-8.1        lifecycle_0.2.0     stringr_1.4.0      
## [19] htmlwidgets_1.5.1   evaluate_0.14       Biobase_2.48.0     
## [22] knitr_1.28          irlba_2.3.3         Rcpp_1.0.4.6       
## [25] BiocManager_1.30.10 magick_2.3          abind_1.4-5        
## [28] png_0.1-7           digest_0.6.25       stringi_1.4.6      
## [31] tiff_0.1-5          bookdown_0.18       dplyr_0.8.5        
## [34] grid_4.0.0          tools_4.0.0         bitops_1.0-6       
## [37] magrittr_1.5        RCurl_1.98-1.2      tibble_3.0.1       
## [40] crayon_1.3.4        pkgconfig_2.0.3     MASS_7.3-51.6      
## [43] ellipsis_0.3.0      Matrix_1.2-18       matter_1.14.0      
## [46] assertthat_0.2.1    rmarkdown_2.1       R6_2.4.1           
## [49] fftwtools_0.9-8     mclust_5.4.6        signal_0.7-6       
## [52] nlme_3.1-147        compiler_4.0.0