library(Risa)
library(xcms)
library(CAMERA)
library(pcaMethods)

Introduction

Indole-3-acetaldoxime (IAOx) is an early intermediate in the biosynthesis of a variety of indolic secondary metabolites, including the phytoanticipin indol-3-ylmethyl glucosinolate and the phytoalexin camalexin (3-thiazol-2’-yl-indole). Arabidopsis thaliana cyp79B2 cyp79B3 double knockout plants are completely impaired in the conversion of tryptophan to IAOx and no longer accumulate IAOx-derived metabolites. Consequently, a comparative analysis of wild-type and cyp79B2 cyp79B3 plant lines can be used to explore the complete range of IAOx-derived indolic secondary metabolites.

Since 2006, the Bioconductor package xcms (Smith et al., 2006) has provided a rich set of algorithms for mass spectrometry data processing. Typically, xcms creates an xcmsSet object from the raw data files of an assay, which are obtained from the samples in the study.
Allowed raw data formats are netCDF, mzData, mzXML and mzML.

In this vignette, we demonstrate the processing of the MTBLS2 dataset, which was described in Neumann 2012.

A few global settings

A few settings are worth defining at the beginning of an analysis:

## How many CPU cores does your machine (or cluster) have?
nSlaves <- 1

# prefilter <- c(3, 200)  ## standard
prefilter <- c(6, 750)    ## quick run for debugging
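
To make the effect of the prefilter setting more concrete: in centWave, prefilter = c(k, I) retains a mass trace only if it contains at least k data points with an intensity of at least I. The following is a minimal base-R sketch of that rule applied to made-up intensity vectors. The helper passes_prefilter() and both example traces are hypothetical illustrations and not part of xcms.

```r
## Hypothetical helper (not an xcms function): TRUE if a mass trace has
## at least k data points with intensity >= I, where prefilter = c(k, I).
passes_prefilter <- function(intensities, prefilter) {
  k <- prefilter[1]
  I <- prefilter[2]
  sum(intensities >= I) >= k
}

## Two made-up mass traces (intensity values only)
weak_trace   <- c(150, 220, 310, 250, 210, 180)
strong_trace <- c(400, 800, 1500, 2400, 1900, 1100, 900, 760)

standard <- c(3, 200)  # the commented-out standard setting
quick    <- c(6, 750)  # the quick-run setting

standard_result <- c(weak   = passes_prefilter(weak_trace, standard),
                     strong = passes_prefilter(strong_trace, standard))
quick_result    <- c(weak   = passes_prefilter(weak_trace, quick),
                     strong = passes_prefilter(strong_trace, quick))
```

With the stricter quick-run setting, the weak trace is discarded before peak detection, which is why the quick run is faster but may miss low-abundance features.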

Raw data conversion

Raw data can be converted with the vendor tools or with the open-source ProteoWizard converter. The preferred formats are mzML or mzData/mzXML. An overview of formats (and their problems) is available in the xcms online help pages.

R and ISAtab

An ISAtab archive contains the metadata description in several tab-separated files. The assay file (or one of them, if there are several) contains the column Raw Spectral Data File, which holds the paths to the mass spectral raw data files in one of the above formats.

ISAmtbls2 <- readISAtab(find.package("mtbls2"))
a.filename <- ISAmtbls2["assay.filenames"][[1]]
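
To illustrate what the Raw Spectral Data File column of an assay file looks like from R, the sketch below writes a tiny made-up tab-separated assay table to a temporary file and reads the column back with base R. The sample names and file paths are invented for this example. A real ISAtab assay file has many more columns and is better read with readISAtab() as shown above.

```r
## Made-up miniature assay table; real assay files contain more columns.
assay <- data.frame(
  "Sample Name"            = c("sample_A", "sample_B"),
  "Raw Spectral Data File" = c("raw/sample_A.mzML", "raw/sample_B.mzML"),
  check.names      = FALSE,
  stringsAsFactors = FALSE
)

## Write it as a tab-separated file, as ISAtab does
assay_file <- tempfile(fileext = ".txt")
write.table(assay, assay_file, sep = "\t", quote = TRUE, row.names = FALSE)

## check.names = FALSE keeps the spaces in the column headers intact
assay_read <- read.delim(assay_file, check.names = FALSE,
                         stringsAsFactors = FALSE)
raw_files  <- assay_read[["Raw Spectral Data File"]]
```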

ISAtab, Risa and xcms

With the combination of Risa and xcms, we can convert the MS raw data in an ISAtab archive into an xcmsSet:

mtbls2Set <- processAssayXcmsSet(ISAmtbls2, a.filename,
                                 method="centWave", prefilter=prefilter, 
                                 snthr=25, ppm=25, 
                                 peakwidth=c(5,12),
                                 nSlaves=nSlaves)
## Use of argument 'nSlaves' is deprecated, please use 'BPPARAM' instead.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 11335 regions of interest ... OK: 3707 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 11193 regions of interest ... OK: 3690 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 11156 regions of interest ... OK: 3722 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 11146 regions of interest ... OK: 3771 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12661 regions of interest ... OK: 4045 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12041 regions of interest ... OK: 3862 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12489 regions of interest ... OK: 3969 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 11940 regions of interest ... OK: 3897 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12627 regions of interest ... OK: 4124 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 13344 regions of interest ... OK: 4226 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12897 regions of interest ... OK: 4185 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 12689 regions of interest ... OK: 4190 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 13007 regions of interest ... OK: 4429 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 13666 regions of interest ... OK: 4407 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 13033 regions of interest ... OK: 4337 found.
## Detecting mass traces at 25 ppm ... OK
## Detecting chromatographic peaks in 13646 regions of interest ... OK: 4377 found.
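
The ppm=25 setting used above is a relative m/z tolerance, so the absolute window it allows grows with m/z. The short base-R calculation below converts 25 ppm into an absolute m/z deviation at a few example masses. The helper name ppm_to_da() and the chosen m/z values are illustrative assumptions.

```r
## Hypothetical helper: absolute m/z tolerance for a relative ppm value
ppm_to_da <- function(mz, ppm) mz * ppm / 1e6

## Example masses spanning roughly the m/z range of a small-molecule run
mz_values <- c(100, 400, 1000)
tolerance <- ppm_to_da(mz_values, ppm = 25)
```

At m/z 400, for instance, 25 ppm corresponds to a deviation of 0.01, so data points in consecutive scans must agree within that window to be joined into one mass trace.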

The result is the same type of xcmsSet object that xcms itself creates:

show(mtbls2Set)
## An "xcmsSet" object with 16 samples
## 
## Time range: 18.4-1147.6 seconds (0.3-19.1 minutes)
## Mass range: 99.5288-1003.5005 m/z
## Peaks: 64938 (about 4059 per sample)
## Peak Groups: 0 
## Sample classes: Col-0.Exp1, cyp79.Exp1, Col-0.Exp2, cyp79.Exp2 
## 
## Feature detection:
##  o Peak picking performed on MS1.
## Profile settings: method = bin
##                   step = 0.1
## 
## Memory usage: 6.37 MB

Several options exist to quantify the individual intensities. For each feature, additional attributes are available, such as the minimum/maximum and average retention time and m/z values.
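
As a sketch of these options: in a centWave result, the peak table returned by peaks() is a numeric matrix with columns such as mz, mzmin, mzmax, rt, rtmin, rtmax, into (integrated raw intensity), intb (baseline-corrected integrated intensity) and maxo (maximum intensity). The example below builds a small mock matrix with that layout, so that it runs without xcms. All numbers are made up.

```r
## Made-up peak table mimicking the column layout of a centWave result
mock_peaks <- cbind(
  mz     = c(205.097, 205.098, 369.119),
  mzmin  = c(205.095, 205.096, 369.117),
  mzmax  = c(205.099, 205.100, 369.121),
  rt     = c(312.4, 312.9, 540.2),
  rtmin  = c(309.1, 309.8, 536.0),
  rtmax  = c(316.0, 316.3, 545.1),
  into   = c(15200, 14100, 88300),
  intb   = c(14900, 13800, 87500),
  maxo   = c(4100, 3900, 21000),
  sample = c(1, 2, 1)
)

## Three alternative intensity measures for the same peaks
area_raw      <- mock_peaks[, "into"]
area_baseline <- mock_peaks[, "intb"]
apex_height   <- mock_peaks[, "maxo"]

## Example summary: mean raw peak area per sample
mean_into_per_sample <- tapply(area_raw, mock_peaks[, "sample"], mean)
```

On a real object, the same columns would be accessed via peaks(mtbls2Set).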

Grouping and retention time correction

In the following steps, we group the peaks across samples. Because the UPLC system used here has very stable retention times, we use the retention time correction step only as quality control of the raw data and do not keep the corrected object. After that, fillPeaks() integrates the raw data for those features that were not detected in some of the samples.

mtbls2Set <- group(mtbls2Set, minfrac=1, bw=3)
## Processing 7233 mz slices ... OK
retcor(mtbls2Set, plottype="mdevden")
## Performing retention time correction using 1293 peak groups.
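
The minfrac=1 argument passed to group() above is a strict setting: a peak group is kept only if, in at least one sample class, the fraction of samples contributing a peak reaches minfrac. The base-R sketch below illustrates this rule. The helper keep_group() is hypothetical, and the even split of four samples per class is an assumption made for the example.

```r
## Hypothetical helper: keep a peak group if at least one sample class
## has a fraction of samples with a detected peak >= minfrac.
keep_group <- function(present, classes, minfrac) {
  frac_per_class <- tapply(present, classes, mean)
  any(frac_per_class >= minfrac)
}

## 16 samples in four classes (assumed: four samples per class)
classes <- rep(c("Col-0.Exp1", "cyp79.Exp1", "Col-0.Exp2", "cyp79.Exp2"),
               each = 4)

## Feature detected in every sample of one class only
complete_in_one_class <- classes == "Col-0.Exp1"
## Feature detected in three of four samples in every class
three_of_four_everywhere <- rep(c(TRUE, TRUE, TRUE, FALSE), times = 4)

kept_strict    <- keep_group(complete_in_one_class, classes, minfrac = 1)
dropped_strict <- keep_group(three_of_four_everywhere, classes, minfrac = 1)
kept_relaxed   <- keep_group(three_of_four_everywhere, classes, minfrac = 0.75)
```

A feature present in all samples of a single class passes, whereas a feature missing from one sample in every class is rejected at minfrac=1 and accepted only with a more relaxed threshold.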