xcms 3.14.1
Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2021-05-19 14:51:41
Compiled: Tue Jul 27 23:43:04 2021
Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense m/z values from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented).
Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.
This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.
Below we load the example DDA data set using the readMSData
function from the
MSnbase package.
library(xcms)
dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
dda_data <- readMSData(dda_file, mode = "onDisk")
The variable dda_data
contains now all MS1 and MS2 spectra from the specified
mzML file. The number of spectra for each MS level is listed below.
table(msLevel(dda_data))
##
## 1 2
## 1504 2238
For the MS2 spectra we can get the m/z of the precursor ion with the
precursorMz
function. Below we first filter the data set by MS level, extract
the precursor m/z and call head
to just show the first 6 elements. For easier
readability we use the forward pipe operator %>%
from the magrittr
package.
library(magrittr)
dda_data %>%
filterMsLevel(2L) %>%
precursorMz() %>%
head()
## F1.S1570 F1.S1588 F1.S1592 F1.S1594 F1.S1595 F1.S1596
## 130.96578 388.25426 89.93779 83.99569 371.22409 388.25226
With the precursorIntensity
function it is also possible to extract the
intensity of the precursor ion.
dda_data %>%
filterMsLevel(2L) %>%
precursorIntensity() %>%
head()
## F1.S1570 F1.S1588 F1.S1592 F1.S1594 F1.S1595 F1.S1596
## 0 0 0 0 0 0
Some manufacturers (like Sciex for the present test data) don’t define/export
the precursor intensity and thus either NA
or 0
is reported. We can however
use the estimatePrecursorIntensity
function to determine the precursor
intensity for a MS 2 spectrum based on the intensity of the respective ion in
the previous MS1 scan (note that with method = "interpolation"
the precursor
intensity would be defined based on interpolation between the intensity in the
previous and subsequent MS1 scan). Below we estimate the precursor intensities,
on the full data (for MS1 spectra a NA
value is reported)
prec_int <- estimatePrecursorIntensity(dda_data)
We next set the precursor intensity in the spectrum metadata of dda_data
. So
that it can be extracted later with the precursorIntensity
function.
fData(dda_data)$precursorIntensity <- prec_int
dda_data %>%
filterMsLevel(2L) %>%
precursorIntensity() %>%
head()
## F1.S1570 F1.S1588 F1.S1592 F1.S1594 F1.S1595 F1.S1596
## 0.9691072 3.0772917 0.3885723 0.3215049 1.6329483 4.4057989
Next we perform the chromatographic peak detection on the MS level 1 data with
the findChromPeaks
method. Below we define the settings for a centWave-based
peak detection and perform the analysis.
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp)
In total 112 peaks were identified in the present data set.
The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the
corresponding MS2 spectra measured. Thus, for some of the ions (identified as
MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the
annotation of the respective MS1 chromatographic peaks (or MS1 features after a
correspondence analysis). Spectra for identified chromatographic peaks can be
extracted with the chromPeakSpectra
method. MS2 spectra with their precursor
m/z and retention time within the rt and m/z range of the chromatographic peak
are returned. Parameter return.type
allows to define in which format these are
returned. With return.type = "List"
or return.type = "Spectra"
the data is
represented by a Spectra
object from the Spectra.
library(Spectra)
dda_spectra <- chromPeakSpectra(
dda_data, msLevel = 2L, return.type = "Spectra")
dda_spectra
## MSn data (Spectra) with 150 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## F1.S1812 2 237.869 1812
## F1.S1846 2 241.299 1846
## F1.S2446 2 326.583 2446
## F1.S2450 2 327.113 2450
## F1.S2502 2 330.273 2502
## ... ... ... ...
## F1.S5110 2 574.725 5110
## F1.S5115 2 575.255 5115
## F1.S5272 2 596.584 5272
## F1.S5236 2 592.424 5236
## F1.S5266 2 596.054 5266
## ... 38 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
By default chromPeakSpectra
returns all spectra associated with a MS1
chromatographic peak, but parameter method
allows to choose and return only
one spectrum per peak (have a look at the ?chromPeakSpectra
help page for more
details). Also, it would be possible to extract MS1 spectra for each peak by
specifying msLevel = 1L
in the call above (e.g. to evaluate the full MS1
signal at the peak’s apex position).
In the example above we selected to return the data as a Spectra
object. Spectra variables "peak_id"
and "peak_index"
contain the identifiers
and the index (in the chromPeaks
matrix) of the chromatographic peaks the MS2
spectrum is associated with.
dda_spectra$peak_id
## [1] "CP004" "CP004" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011" "CP011"
## [10] "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014" "CP014"
## [19] "CP014" "CP018" "CP022" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
## [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
## [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
## [46] "CP042" "CP042" "CP042" "CP042" "CP042" "CP043" "CP048" "CP048" "CP050"
## [55] "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP052" "CP052" "CP052"
## [64] "CP054" "CP055" "CP055" "CP056" "CP056" "CP056" "CP057" "CP057" "CP057"
## [73] "CP057" "CP057" "CP061" "CP061" "CP061" "CP061" "CP065" "CP065" "CP066"
## [82] "CP066" "CP067" "CP067" "CP069" "CP069" "CP069" "CP069" "CP070" "CP070"
## [91] "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073" "CP074" "CP074"
## [100] "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077" "CP077" "CP079"
## [109] "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081" "CP087" "CP087"
## [118] "CP087" "CP087" "CP087" "CP089" "CP089" "CP089" "CP090" "CP090" "CP090"
## [127] "CP092" "CP092" "CP094" "CP094" "CP095" "CP095" "CP095" "CP096" "CP096"
## [136] "CP096" "CP097" "CP097" "CP097" "CP099" "CP099" "CP099" "CP099" "CP099"
## [145] "CP100" "CP100" "CP100" "CP101" "CP102" "CP102"
Note also that with return.type = "List"
a list parallel to the chromPeaks
matrix would be returned, i.e. each element in that list would contain the
spectra for the chromatographic peak with the same index. This data
representation might eventually simplify further processing.
We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.
ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
## mz mzmin mzmax rt rtmin rtmax into intb
## CP057 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 12884.14
## maxo sn sample
## CP057 3978.987 79 1
A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.
ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## F1.S3505 2 418.926 3505
## F1.S3510 2 419.306 3510
## F1.S3582 2 423.036 3582
## F1.S3603 2 423.966 3603
## F1.S3609 2 424.296 3609
## ... 38 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
There are 5 MS2 spectra representing fragmentation of the ion(s) measured
in our candidate chromatographic peak. We next reduce this to a single MS2
spectrum using the combineSpectra
method employing the combinePeaks
function to determine which peaks to keep in the resulting spectrum (have a look
at the ?combinePeaks
help page for details). Parameter f
allows to specify
which spectra in the input object should be combined into one.
ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8,
intensityFun = median, mzFun = median,
f = ex_spectra$peak_id)
ex_spectrum
## MSn data (Spectra) with 1 spectra in a MsBackendDataFrame backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## F1.S3505 2 418.926 3505
## ... 38 more variables/columns.
## Processing:
## Switch backend from MsBackendMzR to MsBackendDataFrame [Tue Jul 27 23:43:10 2021]
Mass peaks from all input spectra with a difference in m/z smaller 20 ppm
(parameter ppm
) were combined into one peak and the median m/z and intensity
is reported for these. Due to parameter minProp = 0.8
, the resulting MS2
spectrum contains only peaks that were present in 80% of the input spectra.
A plot of this consensus spectrum is shown below.
plotSpectra(ex_spectrum)
We could now match the consensus spectrum against a database of MS2 spectra. In
our example we simply load MS2 spectra for the two compounds with matching m/z
exported from Metlin. For each of the compounds MS2 spectra created with
collision energies of 0V, 10V, 20V and 40V are available. Below we import the
respective data and plot our candidate spectrum against the MS2 spectra of
Flumanezil and Fenamiphos (from a collision energy of 20V). To import files in
MGF format we have to load the MsBackendMgf
R package which adds MGF file
support to the Spectra
package. This package can be installed with
BiocManager::install("RforMassSpectrometry/MsBackendMgf")
.
Prior plotting we normalize our experimental spectra.
norm_fun <- function(z, ...) {
z[, "intensity"] <- z[, "intensity"] /
max(z[, "intensity"], na.rm = TRUE) * 100
z
}
ex_spectrum <- addProcessing(ex_spectrum, FUN = norm_fun)
library(MsBackendMgf)
flumanezil <- Spectra(
system.file("mgf", "metlin-2724.mgf", package = "xcms"),
source = MsBackendMgf())
## Start data import from 1 files ... done
fenamiphos <- Spectra(
system.file("mgf", "metlin-72445.mgf", package = "xcms"),
source = MsBackendMgf())
## Start data import from 1 files ... done
par(mfrow = c(1, 2))
plotSpectraMirror(ex_spectrum, flumanezil[3], main = "against Flumanezil",
ppm = 40)
plotSpectraMirror(ex_spectrum, fenamiphos[3], main = "against Fenamiphos",
ppm = 40)
Our candidate spectrum matches Fenamiphos, thus, our example chromatographic
peak represents signal measured for this compound. In addition to plotting the
spectra, we can also calculate similarities between them with the
compareSpectra
method (which uses by default the normalized dot-product to
calculate the similarity).
compareSpectra(ex_spectrum, flumanezil, ppm = 40)
## [1] 4.520957e-02 3.283806e-02 2.049379e-03 3.374354e-05
compareSpectra(ex_spectrum, fenamiphos, ppm = 40)
## [1] 0.1326234432 0.4879399946 0.7198406271 0.3997922658 0.0004876129
## [6] 0.0028408885 0.0071030051 0.0053809736
Clearly, the candidate spectrum does not match Flumanezil, while it has a high
similarity to Fenamiphos. While we performed here the MS2-based annotation on a
single chromatographic peak, this could be easily extended to the full list of
MS2 spectra (returned by chromPeakSpectra
) for all chromatographic peaks in an
experiment. See also here.
In the present example we used only a single data file and we did thus not need
to perform a sample alignment and correspondence analysis. These tasks could
however be performed similarly to plain LC-MS data, retention times of
recorded MS2 spectra would however also be adjusted during alignment based on
the MS1 data. After correspondence analysis (peak grouping) MS2 spectra for
features can be extracted with the featureSpectra
function which returns all
MS2 spectra associated with any chromatographic peak of a feature.
Note also that this workflow can be included into the Feature-Based Molecular Networking FBMN to match MS2 spectra against GNPS. See here for more details and examples.
In this section we analyze a small SWATH data set consisting of a single mzML
file with data from the same sample analyzed in the previous section but
recorded in SWATH mode. We again read the data with the readMSData
function. The resulting object will contain all recorded MS1 and MS2
spectra in the specified file.
swath_file <- system.file("TripleTOF-SWATH",
"PestMix1_SWATH.mzML",
package = "msdata")
swath_data <- readMSData(swath_file, mode = "onDisk")
Below we determine the number of MS level 1 and 2 spectra in the present data set.
table(msLevel(swath_data))
##
## 1 2
## 444 3556
As described in the introduction, in SWATH mode all ions within pre-defined
isolation windows are fragmented and MS2 spectra measured. The definition of
these isolation windows (SWATH pockets) is imported from the mzML files and
stored in the object’s fData
(which provides additional annotations for each
individual spectrum). Below we inspect the respective information for the first
few spectra. The upper and lower isolation window m/z can be extracted with the
isolationWindowLowerMz
and isolationWindowUpperMz
.
head(fData(swath_data)[, c("isolationWindowTargetMZ",
"isolationWindowLowerOffset",
"isolationWindowUpperOffset",
"msLevel", "retentionTime")])
## isolationWindowTargetMZ isolationWindowLowerOffset
## F1.S2000 208.95 21.95
## F1.S2001 244.05 14.15
## F1.S2002 270.85 13.65
## F1.S2003 299.10 15.60
## F1.S2004 329.80 16.10
## F1.S2005 367.35 22.45
## isolationWindowUpperOffset msLevel retentionTime
## F1.S2000 21.95 2 200.084
## F1.S2001 14.15 2 200.181
## F1.S2002 13.65 2 200.278
## F1.S2003 15.60 2 200.375
## F1.S2004 16.10 2 200.472
## F1.S2005 22.45 2 200.569
head(isolationWindowLowerMz(swath_data))
## [1] 187.0 229.9 257.2 283.5 313.7 344.9
head(isolationWindowUpperMz(swath_data))
## [1] 230.9 258.2 284.5 314.7 345.9 389.8
In the present data set we use the value of the isolation window target m/z to define the individual SWATH pockets. Below we list the number of spectra that are recorded in each pocket/isolation window.
table(isolationWindowTargetMz(swath_data))
##
## 163.75 208.95 244.05 270.85 299.1 329.8 367.35 601.85
## 444 445 445 445 445 444 444 444
We have thus 1,000 MS2 spectra measured in each isolation window.
Similar to a conventional LC-MS analysis, we perform first a chromatographic
peak detection (on the MS level 1 data) with the findChromPeaks
method. Below
we define the settings for a centWave-based peak detection and perform the
analysis.
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaks(swath_data, param = cwp)
Next we perform a chromatographic peak detection in the MS level 2 data of each
isolation window. We use the findChromPeaksIsolationWindow
function employing
the same peak detection algorithm reducing however the required signal-to-noise
ratio. The isolationWindow
parameter allows to specify which MS2 spectra
belong to which isolation window and hence defines in which set of MS2 spectra
chromatographic peak detection should be performed. While the default value for
this parameter uses isolation windows provided by calling
isolationWindowTargetMz
on the object, it would also be possible to manually
define the isolation windows, e.g. if the corresponding information is not
available in the input mzML files.
cwp <- CentWaveParam(snthresh = 3, noise = 10, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaksIsolationWindow(swath_data, param = cwp)
The findChromPeaksIsolationWindow
function added all peaks identified in the
individual isolation windows to the chromPeaks
matrix containing already the
MS1 chromatographic peaks. These newly added peaks can be identified by the
value of the "isolationWindow"
column in the corresponding row in
chromPeakData
, which lists also the MS level in which the peak was identified.
chromPeakData(swath_data)
## DataFrame with 368 rows and 6 columns
## ms_level is_filled isolationWindow isolationWindowTargetMZ
## <integer> <logical> <factor> <numeric>
## CP01 1 FALSE NA NA
## CP02 1 FALSE NA NA
## CP03 1 FALSE NA NA
## CP04 1 FALSE NA NA
## CP05 1 FALSE NA NA
## ... ... ... ... ...
## CP364 2 FALSE 601.85 601.85
## CP365 2 FALSE 601.85 601.85
## CP366 2 FALSE 601.85 601.85
## CP367 2 FALSE 601.85 601.85
## CP368 2 FALSE 601.85 601.85
## isolationWindowLowerMz isolationWindowUpperMz
## <numeric> <numeric>
## CP01 NA NA
## CP02 NA NA
## CP03 NA NA
## CP04 NA NA
## CP05 NA NA
## ... ... ...
## CP364 388.8 814.9
## CP365 388.8 814.9
## CP366 388.8 814.9
## CP367 388.8 814.9
## CP368 388.8 814.9
Below we count the number of chromatographic peaks identified within each isolation window (the number of chromatographic peaks identified in MS1 is 62).
table(chromPeakData(swath_data)$isolationWindow)
##
## 163.75 208.95 244.05 270.85 299.1 329.8 367.35 601.85
## 2 38 32 14 105 23 61 31
We thus successfully identified chromatographic peaks in the different MS levels and isolation windows, but don’t have any actual MS2 spectra yet. These have to be reconstructed from the available chromatographic peak data which we will done in the next section.
Identifying the signal of the fragment ions for the precursor measured by each MS1 chromatographic peak is a non-trivial task. The MS2 spectrum of the fragment ion for each MS1 chromatographic peak has to be reconstructed from the available MS2 signal (i.e. the chromatographic peaks identified in MS level 2). For SWATH data, fragment ion signal should be present in the isolation window that contains the m/z of the precursor ion and the chromatographic peak shape of the MS2 chromatographic peaks of fragment ions of a specific precursor should have a similar retention time and peak shape than the precursor’s MS1 chromatographic peak.
After detection of MS1 and MS2 chromatographic peaks has been performed, we can
reconstruct the MS2 spectra using the reconstructChromPeakSpectra
function. This function defines an MS2 spectrum for each MS1 chromatographic
peak based on the following approach:
diffRt
parameter).minCor
are retained.To illustrate this process we perform the individual steps on the example of Fenamiphos (exact mass 303.105800777 and m/z of [M+H]+ adduct 304.113077). As a first step we extract the chromatographic peak for this ion.
fenamiphos_mz <- 304.113077
fenamiphos_ms1_peak <- chromPeaks(swath_data, mz = fenamiphos_mz, ppm = 2)
fenamiphos_ms1_peak
## mz mzmin mzmax rt rtmin rtmax into intb
## CP34 304.1124 304.1121 304.1126 423.945 419.445 428.444 10697.34 10688.34
## maxo sn sample
## CP34 2401.849 618 1
Next we identify all MS2 chromatographic peaks that were identified in the
isolation window containing the m/z of Fenamiphos. The information on the
isolation window in which a chromatographic peak was identified is available in
the chromPeakData
(which contains arbitrary additional annotations to each
individual chromatographic peak).
keep <- chromPeakData(swath_data)$isolationWindowLowerMz < fenamiphos_mz &
chromPeakData(swath_data)$isolationWindowUpperMz > fenamiphos_mz
We also require the retention time of the MS2 chromatographic peaks to be similar to the retention time of the MS1 peak and extract the corresponding peak information.
keep <- keep &
chromPeaks(swath_data)[, "rtmin"] < fenamiphos_ms1_peak[, "rt"] &
chromPeaks(swath_data)[, "rtmax"] > fenamiphos_ms1_peak[, "rt"]
fenamiphos_ms2_peak <- chromPeaks(swath_data)[which(keep), ]
In total 24 MS2 chromatographic peaks match all the
above condition. Next we extract their corresponding ion chromatograms, as well
as the ion chromatogram of the MS1 peak. In addition we have to filter the
object first by isolation window, keeping only spectra that were measured in
that specific window and to specify to extract the chromatographic data from MS2
spectra (with msLevel = 2L
).
rtr <- fenamiphos_ms1_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms1_peak[, c("mzmin", "mzmax")]
fenamiphos_ms1_chr <- chromatogram(swath_data, rt = rtr, mz = mzr)
rtr <- fenamiphos_ms2_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms2_peak[, c("mzmin", "mzmax")]
fenamiphos_ms2_chr <- chromatogram(
filterIsolationWindow(swath_data, mz = fenamiphos_mz),
rt = rtr, mz = mzr, msLevel = 2L)
We can now plot the extracted ion chromatogram of the MS1 and the extracted MS2 data.
plot(rtime(fenamiphos_ms1_chr[1, 1]),
intensity(fenamiphos_ms1_chr[1, 1]),
xlab = "retention time [s]", ylab = "intensity", pch = 16,
ylim = c(0, 5000), col = "blue", type = "b", lwd = 2)
#' Add data from all MS2 peaks
tmp <- lapply(fenamiphos_ms2_chr@.Data,
function(z) points(rtime(z), intensity(z),
col = "#00000080",
type = "b", pch = 16))