Cardinal 3 lays the groundwork for future improvements to the existing toolbox of pre-processing, visualization, and statistical methods for mass spectrometry (MS) imaging experiments. Cardinal has been updated to support matter 2, and legacy support has been dropped.
Despite minimal user-visible changes in Cardinal (at first), the entire matter package that provides the backend for Cardinal’s computing on larger-than-memory MS imaging datasets has been rewritten. This should provide more robust support for larger-than-memory computations, as well as greater flexibility in handling many data files in the future.
Further changes will be coming soon to Cardinal 3 in future point updates that are aimed to greatly improve the user experience and simplify the code that users need to write to process and analyze MS imaging data.
Major improvements from earlier versions are further described below.
Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments–specifically MS imaging experiments.
MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.
The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.
Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:
New imaging experiment classes such as ImagingExperiment
, SparseImagingExperiment
, and MSImagingExperiment
provide better support for out-of-memory datasets and a more flexible representation of complex experiments
New imaging metadata classes such as PositionDataFrame
and MassDataFrame
make it easier to manipulate experimental runs, pixel coordinates, and m/z-values by storing them as separate slots rather than ordinary columns
New plot()
and image()
visualization methods that can handle non-gridded pixel coordinates and allow assigning the resulting plot (and data) to a variable for later re-plotting
Support for writing imzML in addition to reading it; more options and support for importing out-of-memory imzML for both “continuous” and “processed” formats
Data manipulation and summarization verbs such as subset()
, aggregate()
, and summarizeFeatures()
, etc. for easier subsetting and summarization of imaging datasets
Delayed pre-processing via a new process()
method that allows queueing of delayed pre-processing methods such as normalize()
and peakPick()
for later execution
Parallel processing support via the BiocParallel package for all pre-processing methods and any statistical analysis methods with a BPPARAM
option
Classes from older versions of Cardinal should be coerced to their Cardinal 2 equivalents. For example, to return an updated MSImageSet
object called x
, use as(x, "MSImagingExperiment")
.
Cardinal can be installed via the BiocManager package.
install.packages("BiocManager")
BiocManager::install("Cardinal")
The same function can be used to update Cardinal and other Bioconductor packages.
Once installed, Cardinal can be loaded with library()
:
library(Cardinal)
MSImagingExperiment
In Cardinal, imaging experiment datasets are composed of multiple sets of metadata, in addition to the actual experimental data. These are (1) pixel metadata, (2) feature (\(m/z\)) metadata, (3) the actual imaging data, and (4) a class that holds all of these and represents the experiment as a whole.
MSImagingExperiment
is a matrix-like container for complete MS imaging experiments, where the “rows” represent the mass features, and the columns represent the pixels. An MSImagingExperiment
object contains the following components:
pixelData()
accesses the pixel information, stored in a PositionDataFrame
featureData()
accesses the feature information, stored in MassDataFrame
imageData()
accesses the spectral information, stored in a ImageArrayList
Unlike many software packages designed for analysis of MS imaging experiments, Cardinal is designed to work with multiple datasets simultaneously and can integrate all aspects of experimental design and metadata.
We will use simulateImage()
to prepare an example dataset.
set.seed(2020)
mse <- simulateImage(preset=1, npeaks=10, nruns=2, baseline=1)
mse
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
PositionDataFrame
The pixelData()
accessor extracts the pixel information for an MSImagingExperiment
. The pixel data are stored in a PositionDataFrame
object, which is a type of data frame that separately tracks pixel coordinates and experimental run information.
pixelData(mse)
## PositionDataFrame with 800 rows and 1 column
## :run: coord:x coord:y circle
## <factor> <integer> <integer> <logical>
## 1 run0 1 1 FALSE
## 2 run0 2 1 FALSE
## 3 run0 3 1 FALSE
## 4 run0 4 1 FALSE
## 5 run0 5 1 FALSE
## ... ... ... ... ...
## 796 run1 16 20 FALSE
## 797 run1 17 20 FALSE
## 798 run1 18 20 FALSE
## 799 run1 19 20 FALSE
## 800 run1 20 20 FALSE
The coord()
accessor specifically extracts the data frame of pixel coordinates.
coord(mse)
## DataFrame with 800 rows and 2 columns
## x y
## <integer> <integer>
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## ... ... ...
## 796 16 20
## 797 17 20
## 798 18 20
## 799 19 20
## 800 20 20
The run()
accessor specifically extracts the vector of experimental runs.
run(mse)[1:10]
## [1] run0 run0 run0 run0 run0 run0 run0 run0 run0 run0
## Levels: run0 run1
MassDataFrame
The featureData()
accessor extracts the feature information for an MSImagingExperiment
. The feature data are stored in a MassDataFrame
object, which is a type of data frame that separately tracks the m/z-values associated with the mass spectral features.
featureData(mse)
## MassDataFrame with 3919 rows and 0 columns
## :mz:
## <numeric>
## 1 426.529
## 2 426.699
## 3 426.870
## 4 427.041
## 5 427.211
## ... ...
## 3915 2041.17
## 3916 2041.99
## 3917 2042.81
## 3918 2043.62
## 3919 2044.44
The mz()
accessor specifically extracts the vector of m/z-values.
mz(mse)[1:10]
## [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
## [9] 427.8956 428.0668
The imageData()
accessor extracts the image data for an MSImagingExperiment
. The data are stored in an ImageArrayList
, which is a list of matrix-like objects.
It is possible to store multiple matrices of intensities in this list. Typically, only the first entry will be used by pre-processing and analysis methods.
imageData(mse)
## MSContinuousImagingSpectraList of length 1
## names(1): intensity
## class(1): matrix
## dim(1): <3919 x 800>
## mem(1): 25.08 MB
Entries in this list can be extracted by name with iData()
.
iData(mse, "intensity")
The spectra()
accessor is a shortcut for accessing the first data matrix.
spectra(mse)[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.9295940 0.9779923 0.9415157 0.9115036 0.8960595
## [2,] 1.0087009 1.3108664 1.0928983 1.0243944 1.0706272
## [3,] 1.0578001 1.0625834 1.2407371 0.9319758 0.8822412
## [4,] 0.8949165 1.1568158 0.9250994 0.9499621 1.0127282
## [5,] 1.0660395 1.0123048 1.0291570 0.8999156 1.1126816
Rows of these matrices correspond to mass features. Columns correspond to pixels. In other words, each column is a mass spectrum, and each row is a flattened vector of images.
In order to specialize to the needs of different datasets, Cardinal provides specialized versions of MSImagingExperiment
that reflect different spectra storage strategies.
MSContinuousImagingExperiment
is a specialization that enforces the data matrices be stored as dense, column-major matrices. These include R’s native matrix
and matter_matc
objects from the matter package.
mse
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
imageData(mse)
## MSContinuousImagingSpectraList of length 1
## names(1): intensity
## class(1): matrix
## dim(1): <3919 x 800>
## mem(1): 25.08 MB
MSProcessedImagingExperiment
is a specialization that enforces the data matrices be stored as sparse, column-major matrices. These include sparse_matc
objects from the matter package. This specialization is useful for processed data.
set.seed(2020)
mse2 <- simulateImage(preset=3, npeaks=10, nruns=2)
mse3 <- as(mse2, "MSProcessedImagingExperiment")
mse3
## An object of class 'MSProcessedImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(3): square1 square2 circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
imageData(mse3)
## MSProcessedImagingSpectraList of length 1
## names(1): intensity
## class(1): sparse_mat
## dim(1): <3919 x 800>
## mem(1): 25.15 MB
Because the data is stored sparsely, spectra from MSProcessedImagingExperiment
objects are binned on-the-fly to the m/z-values specified by mz()
.
The resolution of the binning can be accessed by resolution()
.
resolution(mse3)
## ppm
## 400
The resolution can be set to change how the binning is performed.
resolution(mse3) <- c(ppm=200)
Changing the binned mass resolution will typically change the effective dimensions of the experiment.
dim(mse2)
## Features Pixels
## 3919 800
dim(mse3)
## Features Pixels
## 7837 800
The effective m/z-values are also updated to reflect the new bins.
mz(mse2)[1:10]
## [1] 426.5285 426.6991 426.8699 427.0406 427.2115 427.3824 427.5534 427.7245
## [9] 427.8956 428.0668
mz(mse3)[1:10]
## [1] 426.5285 426.6138 426.6991 426.7845 426.8699 426.9552 427.0406 427.1261
## [9] 427.2115 427.2969
Note that the underlying data will remain unchanged, but the binned values for the intensities will be different.
Cardinal 2 natively supports reading and writing imzML (both “continuous” and “processed” versions) and Analyze 7.5 formats via the readMSIData()
and writeMSIData()
functions.
We will focus on imzML.
The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Many other formats can be converted to imzML with the help of free applications available online at .
Let’s create a small image to demonstrate data import/export.
set.seed(2020)
tiny <- simulateImage(preset=1, from=500, to=600, dim=c(3,3))
tiny
## An object of class 'MSContinuousImagingExperiment'
## <456 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 500.0000 to 599.8071
## centroided: FALSE
We’ll also create a “processed” version for writing “processed” imzML.
tiny2 <- as(tiny, "MSProcessedImagingExperiment")
tiny2
## An object of class 'MSProcessedImagingExperiment'
## <456 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 500.0000 to 599.8071
## centroided: FALSE
Note that despite the name, the only difference between “continuous” and “processed” imzML is how the data are stored, rather than what processing has been applied to the data. “Continuous” imzML stores spectra densely, with each spectrum sharing the same m/z-values. “Processed” imzML stores spectra sparsely, and each spectrum can have its own distinct m/z-values.
Use writeMSIData()
to write datasets to imzML and Analyze formats.
Internally, writeMSIData()
will call either writeImzML()
or writeAnalyze()
depending on the value of outformat
. The default is outformat="imzML"
.
path <- tempfile()
writeMSIData(tiny, file=path, outformat="imzML")
Two files are produced with extensions “.imzML” and “.ibd”. The former contains an XML description of the dataset, and the latter contains the actual intensity data.
## [1] "file1b2a7623b2e42e.ibd" "file1b2a7623b2e42e.imzML"
Because tiny
is a MSContinuousImagingExperiment
, it is written as “continuous” imzML.
mzml <- readLines(paste0(path, ".imzML"))
grep("continuous", mzml, value=TRUE)
## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000030\" name=\"continuous\" value=\"\" />"
We can also write “processed” imzML if we export a MSProcessedImagingExperiment
file.
path2 <- tempfile()
writeMSIData(tiny2, file=path2, outformat="imzML")
mzml2 <- readLines(paste0(path2, ".imzML"))
grep("processed", mzml2, value=TRUE)
## [1] "\t\t\t<cvParam cvRef=\"IMS\" accession=\"IMS:1000031\" name=\"processed\" value=\"\" />"
If an experiment contains multiple runs, then each run will be written to a separate imzML file.
set.seed(2020)
tiny3 <- simulateImage(preset=1, from=500, to=600, dim=c(3,3), nruns=2)
runNames(tiny3)
## [1] "run0" "run1"
path3 <- tempfile()
writeMSIData(tiny3, file=path3, outformat="imzML")
## [1] "file1b2a7610fd5bcf-run0.ibd" "file1b2a7610fd5bcf-run0.imzML"
## [3] "file1b2a7610fd5bcf-run1.ibd" "file1b2a7610fd5bcf-run1.imzML"
Use readMSIData()
to read datasets in imzML or Analyze formats.
Internally, readMSIData()
will guess the format based on the file extension (which must be included) and call either readImzML()
or readAnalyze()
.
path_in <- paste0(path, ".imzML")
tiny_in <- readMSIData(path_in, attach.only=TRUE)
tiny_in
## An object of class 'MSContinuousImagingExperiment'
## <456 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(2): 3DPositionX 3DPositionY
## metadata(8): spectrum representation ibd binary type ... files name
## run(1): file1b2a7623b2e42e
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 500.0000 to 599.8071
## centroided: FALSE
The attach.only
argument is used to specify that the intensity data should not be loaded into memory, but instead attached as a file-based matrix using the matter package.
Starting in Cardinal 2, the default is attach.only=TRUE
. This is more memory-efficient, but some methods may be more time-consuming due to the file I/O.
imageData(tiny_in)
## MSContinuousImagingSpectraList of length 1
## names(1): intensity
## class(1): matter_mat
## dim(1): <456 x 9>
## mem(1): 6.22 KB
spectra(tiny_in)
## <456 row x 9 col> matter_mat :: out-of-memory double matrix
## [,1] [,2] [,3] [,4] [,5] [,6] ...
## [1,] 0.00000000 0.04087833 0.00000000 0.00000000 0.13038468 0.03781159 ...
## [2,] 0.00000000 0.16240965 0.00000000 0.16388728 0.00000000 0.00000000 ...
## [3,] 0.00000000 0.00000000 0.03591078 0.00000000 0.00000000 0.09754688 ...
## [4,] 0.18744168 0.04057066 0.08853593 0.00000000 0.01664007 0.03274257 ...
## [5,] 0.00000000 0.19307238 0.00000000 0.00000000 0.00000000 0.00000000 ...
## [6,] 0.30000976 0.00000000 0.08789141 0.01251065 0.07945876 0.08794949 ...
## ... ... ... ... ... ... ... ...
## (6.22 KB real | 16.42 KB virtual)
An out-of-memory matter matrix can be subsetted like an ordinary R matrix. The data values are only read from file and loaded into memory when they are requested (via subsetting).
spectra(tiny_in)[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.0000000 0.04087833 0.00000000 0.0000000 0.13038468
## [2,] 0.0000000 0.16240965 0.00000000 0.1638873 0.00000000
## [3,] 0.0000000 0.00000000 0.03591078 0.0000000 0.00000000
## [4,] 0.1874417 0.04057066 0.08853593 0.0000000 0.01664007
## [5,] 0.0000000 0.19307238 0.00000000 0.0000000 0.00000000
If loading the data fully into memory is desired, either set attach.only=FALSE
when reading the data, or use as.matrix()
on the intensity matrix.
spectra(tiny_in) <- as.matrix(spectra(tiny_in))
imageData(tiny_in)
## MSContinuousImagingSpectraList of length 1
## names(1): intensity
## class(1): matrix
## dim(1): <456 x 9>
## mem(1): 33.05 KB
Using collect()
on the MSImagingExperiment
will also load all data into memory.
tiny_in <- collect(tiny_in)
For “processed” imzML files, the spectra must be binned to common m/z-values. By default, readImzML()
will detect the mass range from the data. This requires reading a large proportion of data from the file, even if attach.only=TRUE
.
path2_in <- paste0(path2, ".imzML")
tiny2_in <- readMSIData(path2_in)
tiny2_in
## An object of class 'MSProcessedImagingExperiment'
## <456 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(2): 3DPositionX 3DPositionY
## metadata(8): spectrum representation ibd binary type ... files name
## run(1): file1b2a76187710d5
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 500.0000 to 599.8071
## centroided: FALSE
If known, it can be much more efficient to specify mass.range
when importing “processed” imzML data. This can also be used to pre-filter the data to a smaller mass range.
The size of the m/z bins can be changed with the resolution
argument.
tiny2_in <- readMSIData(path2_in, mass.range=c(510,590),
resolution=100, units="ppm")
tiny2_in
## An object of class 'MSProcessedImagingExperiment'
## <1458 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(2): 3DPositionX 3DPositionY
## metadata(8): spectrum representation ibd binary type ... files name
## run(1): file1b2a76187710d5
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 510.000 to 589.993
## centroided: FALSE
The resolution for the m/z bins can be changed after importing the data by setting the resolution()
of the dataset.
resolution(tiny2_in) <- c(ppm=400)
While importing formats besides imzML and Analyze are not explicitly supported by Cardinal, if the data can be read into R, it is easy to construct a MSImagingExperiment
object from its components.
set.seed(2020)
s <- simulateSpectrum(n=9, peaks=10, from=500, to=600)
coord <- expand.grid(x=1:3, y=1:3)
run <- factor(rep("run0", nrow(coord)))
fdata <- MassDataFrame(mz=s$mz)
pdata <- PositionDataFrame(run=run, coord=coord)
out <- MSImagingExperiment(imageData=s$intensity,
featureData=fdata,
pixelData=pdata)
out
## An object of class 'MSContinuousImagingExperiment'
## <456 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(0):
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 500.0000 to 599.8071
## centroided: FALSE
For loading data into R, read.csv()
and read.table()
can be used to read CSV and tab-delimited text files, respectively.
Likewise, write.csv()
and write.table()
can be used to write pixel metadata and feature metadata after coercing them to an ordinary R data.frame
with as.data.frame()
.
Use saveRDS()
and readRDS()
to save and read and entire R object such as a MSImagingExperiment
. Note that if intensity data is to be saved as well, it should be pulled into memory and coerced to an R matrix with as.matrix()
first.
Visualization of mass spectra and molecular ion images is vital for exploratory analysis of MS imaging experiments. Cardinal provides a plot()
method for plotting mass spectra and an image()
method for plotting ion images.
Cardinal 2 provides some useful visualization updates compared to previous versions:
A new default color scale (viridis) for images that doesn’t use the flawed rainbow color scheme
Non-gridded pixel coordinates are allowed, and plotting of non-rastered image data is better supported
The new plot()
and image()
methods return values that can be assigned to a variable for later re-plotting
plot()
Use plot()
to plot mass spectra. Either pixel
or coord
can be specified to determine which mass spectra should be plotted.
plot(mse, pixel=c(211, 611))
The plusminus
argument can be used to specify that multiple spectra should be averaged and plotted together.
plot(mse, coord=list(x=10, y=10), plusminus=1, fun=mean)
A formula can be specified to further customize mass spectra plotting. The LHS of the formula should specify one or more elements of imageData()
and the RHS of the formula should be a variable in featureData()
.
plot(mse, intensity + I(-log1p(intensity)) ~ mz, pixel=211, superpose=TRUE)
image()
Use image()
to plot ion images. Either feature
or mz
can be specified to determine which mass features should be plotted.
image(mse, mz=1200)
The plusminus
argument can be used to specify that multiple mass features should be averaged and plotted together.
image(mse, mz=1227, plusminus=0.5, fun=mean)
By default, images from all experimental runs are plotted. If this is not desired, a subset
can be specified instead.
image(mse, mz=1227, subset=run(mse) == "run0")
image(mse, mz=c(1200, 1227), subset=circle)
Multiplicative variance in spectral intensities can cause images to be noisy and dark due to hot spots.
Often, images may require some type of processing and enhancement to improve interpretation.
image(mse, mz=1200, smooth.image="gaussian")
image(mse, mz=1200, contrast.enhance="histogram")
The default viridis
colorscale is chosen to enable ease of interpretation. However, other colorscales can be chosen. All colorscales from the viridisLite
package are available in Cardinal, including cividis
, magma
, inferno
, and plasma
.
The magma
colorscale is used below with a different dataset.
image(mse2, mz=1136, colorscale=magma)
Cardinal will try to choose an appropriate layout for the images. However, a user-defined one can be specified by layout
. Use layout=NULL
to use the current plotting device as-is.
image(mse2, mz=c(781, 1361), layout=c(1,4), colorscale=magma)
Multiple images can be superposed with superpose=TRUE
. Use normalize.image="linear"
to normalize all images to the same intensity range.
image(mse2, mz=c(781, 1361), superpose=TRUE, normalize.image="linear")
Like plot()
, a formula can be specified. The LHS should specify one or more elements of imageData()
and the RHS should specify two to three variables from pixelData()
.
image(mse2, log1p(intensity) ~ x * y, mz=1136, colorscale=magma)
3D imaging datasets can be plotted with image3D()
.
set.seed(1)
mse3d <- simulateImage(preset=9, nruns=7, dim=c(7,7), npeaks=10,
peakheight=c(2,4), representation="centroid")
image3D(mse3d, mz=c(1001, 1159), colorscale=plasma, cex=3, theta=30, phi=30)
Any dataset with 3 or more spatial coordinates (columns in coord()
), can be plotted in 3D.
Use selectROI()
to select regions-of-interest on an ion image. It is important to specify a subset so that selection is only made on a single experimental run, otherwise results may be unexpected. The form of selectROI()
is the same as image()
.
sampleA <- selectROI(mse, mz=1200, subset=run(mse) == "run0")
sampleB <- selectROI(mse, mz=1200, subset=run(mse) == "run1")
selectROI()
returns a logical vector specifying which pixels from the imaging experiment are contained in the selected region.
makeFactor()
can then be used to combine multiple logical vectors (e.g., from selectROI()
) into a single factor.
regions <- makeFactor(A=sampleA, B=sampleB)
Plots and images can be saved to a file by using R’s built-in graphics devices.
Use pdf()
to initialize a PDF graphics device, create the plot, and then use dev.off()
to turn off the graphics device.
Any plots printed while the graphics device is active will be saved to the specified file(s).
pdf_file <- paste0(tempfile(), ".pdf")
pdf(file=pdf_file, width=9, height=4)
image(mse, mz=1200)
dev.off()
Graphics devices for png()
, jpeg()
, bmp()
, and tiff()
are also available. See their documentation for usage.
While many software for MS imaging data use a light-on-dark theme, Cardinal uses a transparent background by default. However, a dark theme is also provided through darkmode()
.
darkmode()
image(mse, mz=1200)
darkmode()
image(mse2, mz=1135, colorscale=magma)
Any future plots will use the new mode. The default mode can be restored with lightmode()
.
lightmode()
While plotting spectra should typically be fast for most datasets regardless of data format, plotting images will typically be slower for out-of-memory (file-based) datasets and for MSProcessedImagingExperiment
objects in general.
This is due to the way the spectra are stored in imzML and Analyze files, and the way sparse spectra are stored (regardless of location). Calculating and decompressing the images simply takes longer than reading the spectra.
For the fastest visualization of images, experiments should be coerced to an in-memory MSContinuousImagingExperiment
.
Also note that all Cardinal 2 visualizations produce a print()
-able object that can be assigned to a variable and print()
-ed later without the need to read the data again.
g <- image(mse, mz=1200)
print(g)
This is useful for re-creating or updating plots without accessing the data again.
MSImagingExperiment
MSImagingExperiment
are matrix-like objects that can be subsetted using the [
operator.
When subsetting, the “rows” are the mass features, and the “columns” are the pixels.
# subset first 10 mass features and first 5 pixels
mse[1:10, 1:5]
## An object of class 'MSContinuousImagingExperiment'
## <10 feature, 5 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run0
## raster dimensions: 5 x 1
## coord(2): x = 1..5, y = 1..1
## mass range: 426.5285 to 428.0668
## centroided: FALSE
Subsetting the dataset this way requires knowing the desired row and column indices.
features()
returns row indices based on specified feature metadata.
# get row index corresponding to m/z 1200
features(mse, mz=1200)
## [1] 2587
# get row indices corresponding to m/z 1199 - 1201
features(mse, 1199 < mz & mz < 1201)
## [1] 2585 2586 2587 2588 2589
pixels()
returns column indices based on specified pixel metadata.
# get column indices corresponding to x = 10, y = 10 in all runs
pixels(mse, coord=list(x=10, y=10))
## [1] 190 590
# get column indices corresponding to x <= 3, y <= 3 in "run0"
pixels(mse, x <= 3, y <= 3, run == "run0")
## [1] 1 2 3 21 22 23 41 42 43
These methods can be used to determine row/column indices of particular m/z-values or pixel coordinates to use for subsetting.
fid <- features(mse, 1199 < mz & mz < 1201)
pid <- pixels(mse, x <= 3, y <= 3, run == "run0")
mse[fid, pid]
## An object of class 'MSContinuousImagingExperiment'
## <5 feature, 9 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 1199.043 to 1200.963
## centroided: FALSE
MSImagingExperiment
represents the data as a matrix, where each column is a mass spectrum, rather than as a true “data cube”. This is typically simpler when operating on the mass spectra, and more space efficient when the data is pixel-sparse (i.e., non-rectangular).
Sometimes, however, it is useful to index into the data as an actual “data cube”, with explicit array dimensions for each spatial dimension.
Use slice()
to slice an MSImagingExperiment
as a data cube and extract images.
# slice image for first mass feature
a <- slice(mse, 1)
dim(a)
## x y run
## 20 20 2
Any arguments to slice()
are passed to features()
, making it easy to select the desired image slices.
By default, array dimensions with only one level are dropped; use .preserve=TRUE
to keep all dimensions.
# slice image for m/z 1200
a2 <- slice(mse, mz=1200, drop=FALSE)
dim(a2)
## x y run feature
## 20 20 2 1
Note that when plotting images from raw arrays, the images are upside-down due to differing coordinate conventions used by graphics::image()
.
image(a2[,,1,1], col=bw.colors(100))
Because MSImagingExperiment
is matrix-like, rbind()
and cbind()
can be used to combine multiple MSImagingExperiment
objects by “row” or “column”, assumping certain conditions are met.
Use cbind()
to combine datasets from different experimental runs. The m/z-values must match between all datasets to succesfully combine them.
# divide dataset into separate objects for each run
mse_run0 <- mse[,run(mse) == "run0"]
mse_run1 <- mse[,run(mse) == "run1"]
mse_run0
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 400 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run0
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
mse_run1
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 400 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(1): run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
# recombine the separate datasets back together
mse_cbind <- cbind(mse_run0, mse_run1)
mse_cbind
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(1): circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
Some processing may be necessary to ensure datasets are compatible before combining them.
Most components of an MSImagingExperiment
that can be accessed through getter functions like pixelData()
, featureData()
, and imageData()
, can also be re-assigned with analogous setter functions. These can likewise be used to get and set columns of the pixel and feature metadata.
pData()
and fData()
are aliases for pixelData()
and featureData()
, respectively.
The $
operator will access the corresponding columns of pixelData()
.
mse$region <- makeFactor(circle=mse$circle,
bg=!mse$circle)
pData(mse)
## PositionDataFrame with 800 rows and 2 columns
## :run: coord:x coord:y circle region
## <factor> <integer> <integer> <logical> <factor>
## 1 run0 1 1 FALSE bg
## 2 run0 2 1 FALSE bg
## 3 run0 3 1 FALSE bg
## 4 run0 4 1 FALSE bg
## 5 run0 5 1 FALSE bg
## ... ... ... ... ... ...
## 796 run1 16 20 FALSE bg
## 797 run1 17 20 FALSE bg
## 798 run1 18 20 FALSE bg
## 799 run1 19 20 FALSE bg
## 800 run1 20 20 FALSE bg
iData()
can be used to access elements of the imageData()
list by name or index.
Using iData()
with no arguments besides the dataset will get or set the first element of imageData()
. Providing a name or index will get or set that element.
iData(mse, "log2intensity") <- log2(iData(mse) + 1)
imageData(mse)
## MSContinuousImagingSpectraList of length 2
## names(2): intensity log2intensity
## class(2): matrix matrix
## dim(2): <3919 x 800> <3919 x 800>
## mem(2): 25.08 MB 25.08 MB
For MSImagingExperiment
, spectra()
is an alias for iData()
.
spectra(mse, "log2intensity")[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.9482973 0.9840368 0.9571834 0.9347079 0.9230042
## [2,] 1.0062627 1.2084339 1.0655022 1.0174904 1.0500678
## [3,] 1.0411028 1.0444525 1.1639734 0.9500770 0.9124515
## [4,] 0.9221343 1.1089029 0.9449329 0.9634461 1.0091524
## [5,] 1.0468679 1.0088489 1.0208805 0.9259353 1.0790753
Whether or not the spectra have been centroided or not can be accessed using centroided()
centroided(mse)
## [1] FALSE
This can also be used to set whether the spectra should be treated as centroided or not.
centroided(mse) <- FALSE
Cardinal 2 implements several convenient data manipulation verbs for subsetting and summarizing MSImagingExperiment
objects.
subset()
subsets an MSImagingExperiment
subsetFeatures()
subsets an MSImagingExperiment
by row, i.e., mass features
subsetPixels()
subsets an MSImagingExperiment
by column, i.e., pixels
aggregate()
summarizes an MSImagingExperiment
by either feature or pixel
summarizeFeatures()
summarizes an MSImagingExperiment
by feature (e.g., mean spectrum)
summarizePixels()
summarizes an MSImagingExperiment
by pixel (e.g., TIC)
The %>%
operator can be used to chain these operations together. For file-based data, the subsetting should be quick, as only metadata is modified.
# subset by mass range
subsetFeatures(mse, mz > 700, mz < 900)
## An object of class 'MSContinuousImagingExperiment'
## <628 feature, 800 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 700.1392 to 899.7160
## centroided: FALSE
# subset by pixel coordinates
subsetPixels(mse, x <= 3, y <= 3, run == "run0")
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 9 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
# subset by mass range + pixel coordinates
subset(mse, mz > 700 & mz < 900, x <=3 & y <= 3 & run == "run0")
## An object of class 'MSContinuousImagingExperiment'
## <628 feature, 9 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 700.1392 to 899.7160
## centroided: FALSE
# chain verbs together
mse %>%
subsetFeatures(mz > 700, mz < 900) %>%
subsetPixels(x <= 3, y <= 3, run == "run0")
## An object of class 'MSContinuousImagingExperiment'
## <628 feature, 9 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## run(1): run0
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 700.1392 to 899.7160
## centroided: FALSE
# calculate mean spectrum
summarizeFeatures(mse, "mean", as="DataFrame")
## MassDataFrame with 3919 rows and 1 column
## :mz: mean
## <numeric> <numeric>
## 1 426.529 0.997692
## 2 426.699 0.999218
## 3 426.870 0.996708
## 4 427.041 0.994184
## 5 427.211 0.994789
## ... ... ...
## 3915 2041.17 0.0362793
## 3916 2041.99 0.0418589
## 3917 2042.81 0.0406153
## 3918 2043.62 0.0362828
## 3919 2044.44 0.0410249
# calculate tic
summarizePixels(mse, c(tic="sum"), as="DataFrame")
## PositionDataFrame with 800 rows and 1 column
## :run: coord:x coord:y tic
## <factor> <integer> <integer> <numeric>
## 1 run0 1 1 932.111
## 2 run0 2 1 939.237
## 3 run0 3 1 937.598
## 4 run0 4 1 933.216
## 5 run0 5 1 937.512
## ... ... ... ... ...
## 796 run1 16 20 934.901
## 797 run1 17 20 934.547
## 798 run1 18 20 938.088
## 799 run1 19 20 947.894
## 800 run1 20 20 931.592
# calculate mean spectrum of circle region
mse %>%
subsetPixels(region == "circle") %>%
summarizeFeatures("mean", as="DataFrame")
## MassDataFrame with 3919 rows and 1 column
## :mz: mean
## <numeric> <numeric>
## 1 426.529 1.002838
## 2 426.699 1.003761
## 3 426.870 0.995426
## 4 427.041 0.990888
## 5 427.211 0.999006
## ... ... ...
## 3915 2041.17 0.0351678
## 3916 2041.99 0.0430222
## 3917 2042.81 0.0389528
## 3918 2043.62 0.0376079
## 3919 2044.44 0.0414156
By default, Cardinal 2 does not load the spectra from imzML and Analyze files into memory, but retrieves them from files when necessary.
For very large datasets, this is necessary and memory-efficient.
However, for datasets that are known to fit in computer memory, this may be unnecessarily slow, especially when plotting images (which are perpendicular to how data are stored in the files).
# coerce spectra to file-based matter matrix
spectra(mse) <- matter::as.matter(spectra(mse))
spectra(mse)
## <3919 row x 800 col> matter_mat :: out-of-memory double matrix
## [,1] [,2] [,3] [,4] [,5] [,6] ...
## [1,] 0.9295940 0.9779923 0.9415157 0.9115036 0.8960595 0.9951040 ...
## [2,] 1.0087009 1.3108664 1.0928983 1.0243944 1.0706272 1.0655016 ...
## [3,] 1.0578001 1.0625834 1.2407371 0.9319758 0.8822412 0.9776461 ...
## [4,] 0.8949165 1.1568158 0.9250994 0.9499621 1.0127282 1.1561683 ...
## [5,] 1.0660395 1.0123048 1.0291570 0.8999156 1.1126816 0.9558122 ...
## [6,] 0.9217598 1.0664798 0.9839717 0.8448350 1.1348211 1.0852472 ...
## ... ... ... ... ... ... ... ...
## (5.82 KB real | 25.08 MB virtual)
imageData(mse)
## MSContinuousImagingSpectraList of length 2
## names(2): intensity log2intensity
## class(2): matter_mat matrix
## dim(2): <3919 x 800> <3919 x 800>
## mem(2): 5.82 KB 25.08 MB
Use as.matrix()
on the spectra()
to pull the spectra into memory as a dense matrix.
spectra(mse) <- as.matrix(spectra(mse))
imageData(mse)
## MSContinuousImagingSpectraList of length 2
## names(2): intensity log2intensity
## class(2): matrix matrix
## dim(2): <3919 x 800> <3919 x 800>
## mem(2): 25.08 MB 25.08 MB
Use as()
to coerce between different MSImagingExperiment
sub-classes.
mse3
## An object of class 'MSProcessedImagingExperiment'
## <7837 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(3): square1 square2 circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
as(mse3, "MSContinuousImagingExperiment")
## An object of class 'MSContinuousImagingExperiment'
## <7837 feature, 800 pixel> imaging dataset
## imageData(1): intensity
## featureData(0):
## pixelData(3): square1 square2 circle
## metadata(1): design
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
This will often change the underlying data representation, so some information may be lost depending on the coercion.
Objects of the older MSImageSet
class can also be coerced to MSImagingExperiment
this way.
# requires CardinalWorkflows installed
data(cardinal, package="CardinalWorkflows")
cardinal2 <- as(cardinal, "MSImagingExperiment")
A major change in Cardinal 2 from earlier versions is how pre-processing is handled.
Instead of applying pre-processing immediately, each pre-processing step is queued to the dataset, and only applied once process()
is called.
This approach is more computationally and memory efficient in most cases, as ideally each spectrum is only processed once, and no extraneous copies of the data are made.
process()
On its own, the process()
method queues a new pre-processing function, and then applies all currently queued processing functions.
For example, the following code applies very basic total-ion-current (TIC) normalization to all spectra.
mse_tic <- process(mse, function(x) length(x) * x / sum(x), label="norm")
mse_tic
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## processing complete(1): norm
## processing pending(0):
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
By default, this is applied immediately. However, delay=TRUE
delays this, and allows us to queue multiple pre-processing functions at once.
mse_pre <- mse %>%
process(function(x) length(x) * x / sum(x), label="norm", delay=TRUE) %>%
process(function(x) signal::sgolayfilt(x), label="smooth", delay=TRUE)
processingData(mse_pre)
## List of length 2
## names(2): norm smooth
We can view all pending and completed pre-processing steps in more detail.
mcols(processingData(mse_pre))[,-1]
## DataFrame with 2 rows and 6 columns
## kind pending complete has_pre has_post has_plot
## <character> <logical> <logical> <logical> <logical> <logical>
## norm pixel TRUE FALSE FALSE FALSE FALSE
## smooth pixel TRUE FALSE FALSE FALSE FALSE
Calling process()
on the dataset again without any other arguments will apply all queued pre-processing steps.
mse_proc <- process(mse_pre)
mse_proc
## An object of class 'MSContinuousImagingExperiment'
## <3919 feature, 800 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## processing complete(2): norm smooth
## processing pending(0):
## run(2): run0 run1
## raster dimensions: 20 x 20
## coord(2): x = 1..20, y = 1..20
## mass range: 426.5285 to 2044.4400
## centroided: FALSE
Note that subsetting retains any queued pre-processing steps.
mse_pre %>%
subsetPixels(x <= 3, y <= 3) %>%
subsetFeatures(mz <= 1000) %>%
process()
## An object of class 'MSContinuousImagingExperiment'
## <2131 feature, 18 pixel> imaging dataset
## imageData(2): intensity log2intensity
## featureData(0):
## pixelData(2): circle region
## metadata(1): design
## processing complete(2): norm smooth
## processing pending(0):
## run(2): run0 run1
## raster dimensions: 3 x 3
## coord(2): x = 1..3, y = 1..3
## mass range: 426.5285 to 999.9239
## centroided: FALSE
All pre-processing methods for MSImagingExperiment
queue delayed processing by default.
If this is not desired, you can set options(Cardinal.delay=FALSE)
to apply all pre-processing steps immediately.
Use normalize()
to queue normalization to an MSImagingExperiment
.
mse_pre <- normalize(mse, method="tic")
method="tic"
performs total-ion-current (TIC) normalization
method="rms"
performs root-mean-square (RMS) normalization
method="reference"
normalizes all spectra to a reference feature
TIC normalization is one of the most common normalization methods for mass spectrometry imaging. For comparison between datasets, TIC normalization requires that all spectra are the same length. RMS normalization is more appropriate when spectra are of different lengths.
Normalization to a reference is the most reliable form of normalization, but is only possible when the experiment contains a known reference peak with a constant abundance throughout the dataset. This is often not possible in unsupervised and exploratory experiments.
Use smoothSignal()
to queue spectral smoothing to an MSImagingExperiment
.
mse %>% smoothSignal(method="gaussian") %>%
subsetPixels(x==10, y==10, run=="run0") %>%
process(plot=TRUE,
par=list(main="Gaussian smoothing", layout=c(3,1)))
mse %>% smoothSignal(method="ma") %>%
subsetPixels(x==10, y==10, run=="run0") %>%
process(plot=TRUE, par=list(main="Moving average smoothing"))
mse %>% smoothSignal(method="sgolay") %>%
subsetPixels(x==10, y==10, run=="run0") %>%
process(plot=TRUE, par=list(main="Savitzky-Golay smoothing"))
mse_pre <- smoothSignal(mse_pre, method="gaussian")
method="gaussian"
performs smoothing with a Gaussian kernel
method="ma"
performs moving average smoothing
method="sgolay"
applies a Savitzky-Golay smoothing filter
Use reduceBaseline()
to queue baseline correction to an MSImagingExperiment
.
mse %>% reduceBaseline(method="locmin") %>%
subsetPixels(x==10, y==10, run=="run0") %>%
process(plot=TRUE, par=list(main="Local minima", layout=c(2,1)))
mse %>% reduceBaseline(method="median") %>%
subsetPixels(x==10, y==10, run=="run0") %>%
process(plot=TRUE, par=list(main="Median interpolation"))
mse_pre <- reduceBaseline(mse_pre, method="locmin")
method="locmin"
interpolates a baseline from local minima
method="median"
splits a spectrum into blocks and interpolates from binned medians
Peak processing encompasses multiple steps, including (1) picking peaks, (2) aligning peaks, (3) filtering peaks, and (4) binning profile spectra to the detected peaks.
Prior to peak detection is a good time to apply the previous processing steps.
mse_pre <- process(mse_pre)
(This is optional, and not necessary if only the peaks are desired, and if it is acceptable to have peaks with intensities of zero in pixels where that peak was not detected.)
Use peakPick()
to queue peak picking to an MSImagingExperiment
.
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
process() %>%
peakPick(method="mad") %>%
process(plot=TRUE, par=list(main="MAD noise", layout=c(3,1)))
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
process() %>%
peakPick(method="simple") %>%
process(plot=TRUE,
par=list(main="Simple SD noise"))
mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
process() %>%
peakPick(method="adaptive") %>%
process(plot=TRUE, par=list(main="Adaptive SD noise"))
mse_peaks <- peakPick(mse_pre, method="mad")
method="mad"
calculates adaptive noise from interpolating local mean absolute deviations
method="simple"
calculates a constant noise from standard deviations of low-kurtosis bins
method="adaptive"
calculates adaptive noise from standard deviations of low-kurtosis bins
Use peakAlign()
to queue peak alignment to an MSImagingExperiment
.
mse_peaks <- peakAlign(mse_pre, tolerance=200, units="ppm")
Peaks are matched based on proximity of their m/z-values, according to tolerance
, in either parts-per-millin (“ppm”) or absolute (“mz”) units
.
The m/z-values of known reference peaks can be provided. If no reference is provided, the mean spectrum is calculated, and the local maxima of the mean spectrum are used as the reference.
Use peakFilter()
to queue peak filtering to an MSImagingExperiment
.
mse_peaks <- peakFilter(mse_pre, freq.min=0.05)
The proportions of pixels where a peak was detected at each m/z-value are calculated. Only peaks with frequencies greater than freq.min
are retained.
Use peakBin()
to queue binning of spectra to reference peaks.
Typically, this is applied to processed profile spectra after peak detection, to extract a more accurate representation of the peak intensities.
mse_peaks <- process(mse_peaks)
mse_peaks <- peakBin(mse_pre, ref=mz(mse_peaks), type="height")
mse_peaks <- mse_peaks %>% process()
A tolerance
in either parts-per-million (“ppm”) or absolute (“mz”) units
is used to match the reference peaks to local maxima in each spectrum.
The peak is then expanded to the nearest local minima in both directions. The intensity of the peak is then summarized either by the maximum intensity (type="height")
or sum of intensities (type="area")
.
Rather than use the m/z-values of the detected peaks, we can also use known reference peaks (in this case, from the design of the simulated data).
mzref <- mz(metadata(mse)$design$featureData)
mse_peaks <- peakBin(mse_pre, ref=mzref, type="height")
mse_peaks <- mse_peaks %>% subsetPixels(x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_peaks, coord=list(x=10, y=10))
We typically recommend binning the processed profile spectra to detected or reference peaks to produce centroided spectra suitable for downstream analysis.
It is possible to use the detected and aligned peaks directly (after applying peakAlign()
and peakFilter()
), but pixels where a peak was not detected will have zero intensities for those peaks. Moreover, all peak intensities will be the peak heights, even when peak areas may be desired instead. However, using the detected peaks directly does mean that there is no need to calculate and store the processed profile spectra, so this saves on storage space.
Generally, it is preferable and more accurate to bin the processed profile spectra to the detected peaks whenever possible and reasonable.
Although peak alignment and peak binning will generally account for small differences in m/z between spectra, alignment of the profile spectra is sometimes desireable as well.
Use mzAlign()
to queue alignment of spectra so that peaks will have a consistent m/z-value.
First, we need to simulate spectra that are in need of calibration.
set.seed(2020)
mse4 <- simulateImage(preset=1, npeaks=10, from=500, to=600,
sdmz=750, units="ppm")
plot(mse4, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)
To align the spectra, we need to provide a vector of reference m/z values of expected peaks. Here, we will simply use the peaks of the mean spectrum.
mse4_mean <- summarizeFeatures(mse4)
mse4_peaks <- peakPick(mse4_mean, SNR=2)
mse4_peaks <- peakAlign(mse4_peaks, tolerance=1000, units="ppm")
mse4_peaks <- process(mse4_peaks)
fData(mse4_peaks)
## MassDataFrame with 7 rows and 0 columns
## :mz:
## <numeric>
## 1 509.897
## 2 527.953
## 3 541.644
## 4 548.182
## 5 553.470
## 6 561.049
## 7 590.050
mse4_align1 <- mzAlign(mse4, ref=mz(mse4_peaks), tolerance=2000, units="ppm")
mse4_align1 <- process(mse4_align1)
plot(mse4_align1, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)
If no reference spectrum is provided, the mean spectrum is calculated automatically and used as the reference.
mse4_align2 <- mzAlign(mse4, tolerance=2000, units="ppm")
mse4_align2 <- process(mse4_align2)
plot(mse4_align2, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)
The algorithm will try to match the most intense peaks in the reference to local maxima in each spectrum, within tolerance
. If tolerance
is too small, the matching local maxima may not be found. If tolerance
is too large, then peaks may be matched to the wrong local maxima.
While the m/z binning scheme of MSProcessedImagingExperiment
objects can be adjusted on-the-fly, this does not apply to other types of MSImagingExperiment
.
Use mzBin()
to queue binning of spectra to new m/z-values.
mse_bin <- mzBin(mse_pre, from=1000, to=1600, resolution=1000, units="ppm")
mse_bin <- subsetPixels(mse_bin, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_bin, coord=list(x=10, y=10))
This is useful if you need to combine datasets with different m/z-values.
Sometimes it is necessary to filter the mass features of a dataset that has not been peak picked and aligned. For example, to remove noisy or low-intensity m/z-values.
Use mzFilter()
to queue filtering of mass features.
mse_filt <- mzFilter(mse_pre, freq.min=0.05)
mse_filt <- subsetPixels(mse_filt, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_filt, coord=list(x=10, y=10), type='h')
mzFilter()
and peakFilter()
are actually the same function internally, but with different defaults for profile and centroided spectra, respectively.
Note that mzFilter()
this does not set centroided()
to TRUE
; it is up to the user to decide whether the result represent centroided spectra or not, and set centroided()
appropriately.
Delayed pre-processing makes it easy to chain together multiple pre-processing steps with the %>%
operator, and then apply them all at once.
mse_proc <- mse %>%
normalize() %>%
smoothSignal() %>%
reduceBaseline() %>%
peakPick()
# preview processing
mse_proc %>%
subsetPixels(x == 10, y == 10, run == "run0") %>%
process(plot=TRUE, par=list(layout=c(2,2)))
# process detected peaks
mse_peakref <- mse_proc %>%
peakAlign() %>%
peakFilter() %>%
process()
# bin profile spectra to peaks
mse_peaks <- mse %>%
normalize() %>%
smoothSignal() %>%
reduceBaseline() %>%
peakBin(mz(mse_peakref))
MSImagingExperiment
Internally, most methods that iterate over pixels or mass features use some combination of pixelApply()
, featureApply()
, and spatialApply()
.
While summarize()
is useful for summarizing a MSImagingExperiment
, it is limited in that it can only apply summary functions that return a single value, which can be simplified into the columns of a MassDataFrame
or PositionDataFrame
.
By contrast, these functions (like apply()
and lapply()
) can return any value.
pixelApply()
and featureApply()
pixelApply()
is used to apply functions over pixels.
# calculate the TIC for every pixel
tic <- pixelApply(mse, sum)
image(mse, tic ~ x * y)
featureApply()
is used to apply functions over features.
# calculate the mean spectrum
smean <- featureApply(mse, mean)
plot(mse, smean ~ mz)
Note that featureApply()
will typically be slower than pixelApply()
for out-of-memory data for MSProcessedImagingExperiment
objects, due to the way the data is stored.
All pre-processing methods and most statistical analysis methods in Cardinal 2 can be executed in parallel using BiocParallel.
By default, a serial backend is used (no parallelization). This is for maximum stability and compatibility.
BPPARAM
Any method that supports parallelization includes BPPARAM
as an argument (see method documentation). The BPPARAM
argument can be used to specify a parallel backend for the operation, such as SerialParam()
, MulticoreParam()
, SnowParam()
, etc.
# run in parallel, rather than serially
tic <- pixelApply(mse, sum, BPPARAM=MulticoreParam())
Several parallelization backends are available, depending on OS:
SerialParam()
creates a serial (non-parallel) backend. Use this to avoid potential issues caused by parallelization.
MulticoreParam()
creates a multicore backend by forking the current R session. This is typically the fastest parallelization option, but is only available on macOS and Linux.
SnowParam()
creates a SNOW backend by creating new R sessions via socket connections. This is typically slower than multicore, but is available on all platforms including Windows.
Use of MulticoreParam()
will frequently improve speed on macOS and Linux dramatically. However, due to the extra overhead of SnowParam()
, Windows users may prefer the default SerialParam()
, depending on the size of the dataset.
Available backends can be viewed with BiocParallel::registered()
.
BiocParallel::registered()
## $MulticoreParam
## class: MulticoreParam
## bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: TRUE; bpfallback: TRUE
## bplogdir: NA
## bpresultdir: NA
## cluster type: FORK
##
## $SnowParam
## class: SnowParam
## bpisup: FALSE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: TRUE; bpexportvariables: TRUE; bpforceGC: FALSE; bpfallback: TRUE
## bplogdir: NA
## bpresultdir: NA
## cluster type: SOCK
##
## $SerialParam
## class: SerialParam
## bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE; bpfallback: FALSE
## bplogdir: NA
## bpresultdir: NA
The current backend used by Cardinal can be viewed with getCardinalBPPARAM()
:
getCardinalBPPARAM()
## class: SerialParam
## bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
## bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
## bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
## bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE; bpfallback: FALSE
## bplogdir: NA
## bpresultdir: NA
A new default backend can be set for use with Cardinal by calling setCardinalBPPARAM()
.
# register a SNOW backend
setCardinalBPPARAM(SnowParam())
See the BiocParallel package documentation for more details on available parallel backends.
For methods that rely on random number generation to be reproducible when run in parallel, the RNG must be set to “L’Ecuyer-CMRG” before setting a seed.
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Cardinal_3.0.1 S4Vectors_0.36.0 EBImage_4.40.0
## [4] BiocParallel_1.32.1 BiocGenerics_0.44.0 ProtGenerics_1.30.0
## [7] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] biglm_0.9-2.1 locfit_1.5-9.6 xfun_0.34
## [4] bslib_0.4.1 lattice_0.20-45 htmltools_0.5.3
## [7] viridisLite_0.4.1 yaml_2.3.6 rlang_1.0.6
## [10] jquerylib_0.1.4 DBI_1.1.3 sp_1.5-1
## [13] jpeg_0.1-9 stringr_1.4.1 htmlwidgets_1.5.4
## [16] codetools_0.2-18 evaluate_0.18 Biobase_2.58.0
## [19] knitr_1.40 fastmap_1.1.0 irlba_2.3.5.1
## [22] parallel_4.2.2 highr_0.9 Rcpp_1.0.9
## [25] BiocManager_1.30.19 cachem_1.0.6 magick_2.7.3
## [28] jsonlite_1.8.3 abind_1.4-5 png_0.1-7
## [31] digest_0.6.30 stringi_1.7.8 tiff_0.1-11
## [34] bookdown_0.30 grid_4.2.2 cli_3.4.1
## [37] tools_4.2.2 bitops_1.0-7 magrittr_2.0.3
## [40] sass_0.4.2 RCurl_1.98-1.9 MASS_7.3-58.1
## [43] Matrix_1.5-3 matter_2.0.0 rmarkdown_2.18
## [46] R6_2.5.1 fftwtools_0.9-11 mclust_6.0.0
## [49] signal_0.7-7 nlme_3.1-160 compiler_4.2.2