xcms 3.0.2
Package: xcms
Authors: Johannes Rainer
Modified: 2017-10-30 17:18:20
Compiled: Sat Mar 3 18:19:25 2018
This document describes data import, exploration, preprocessing and analysis of
LC/MS experiments with xcms version >= 3. The examples and basic workflow were
adapted from the original LC/MS Preprocessing and Analysis with xcms vignette
by Colin A. Smith.
xcms supports analysis of LC/MS data from files in (AIA/ANDI) NetCDF, mzML/mzXML
and mzData format. For the actual data import, Bioconductor's mzR package is used.
For demonstration purposes we will analyze a subset of the data from [1], in which
the metabolic consequences of knocking out the fatty acid amide hydrolase (FAAH)
gene in mice were investigated. The raw data files (in NetCDF format) are provided
with the faahKO data package. The data set consists of samples from the spinal
cords of 6 knock-out and 6 wild-type mice. Each file contains data in centroid
mode acquired in positive ion mode from 200-600 m/z and 2500-4500 seconds.
Below we load all required packages, locate the raw CDF files within the faahKO
package and build a phenodata data frame describing the experimental setup.
library(xcms)
library(faahKO)
library(RColorBrewer)
library(pander)
## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)
## Create a phenodata data.frame; the group assignment assumes that dir()
## lists the 6 KO files before the 6 WT files.
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 6), rep("WT", 6)),
                 stringsAsFactors = FALSE)
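The group labels above are assigned purely by file order. A slightly more robust variant derives the group from the name of the folder containing each file. The sketch below runs on hypothetical file paths and assumes, as an illustration only, that the CDF files sit in "KO" and "WT" subdirectories; it uses new variable names so it does not overwrite cdfs or pd.

```r
## Hypothetical paths (assumption: files are stored in KO/ and WT/ folders).
toy_files <- c("/path/to/faahKO/cdf/KO/ko15.CDF",
               "/path/to/faahKO/cdf/KO/ko16.CDF",
               "/path/to/faahKO/cdf/WT/wt15.CDF")
## Strip the ".CDF" suffix for the sample name and take the name of the
## enclosing directory as the sample group.
toy_pd <- data.frame(sample_name = sub(basename(toy_files), pattern = ".CDF",
                                       replacement = "", fixed = TRUE),
                     sample_group = basename(dirname(toy_files)),
                     stringsAsFactors = FALSE)
toy_pd
```

With this approach a change in the order returned by dir() cannot silently swap the group labels.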
Subsequently we load the raw data as an OnDiskMSnExp object using the readMSData
method from the MSnbase package. While the MSnbase package was originally
developed for proteomics data processing, much of its functionality, including
raw data import and data representation, can be shared and reused in
metabolomics data analysis.
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")
The OnDiskMSnExp object contains general information about the number of
spectra, retention times, the measured total ion current etc., but does not
contain the full raw data (i.e. the m/z and intensity values from each measured
spectrum). Its memory footprint is thus rather small, making it an ideal object
to represent large metabolomics experiments while still allowing simple quality
control, data inspection and exploration, as well as data subsetting
operations. The m/z and intensity values are imported from the raw data files
on demand, hence the raw data files should not be moved after the initial data
import.
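The on-demand principle, and why moving the files breaks access, can be illustrated with a toy sketch in base R. This is not how MSnbase is implemented: the header table, the temporary RDS file and the helper get_mz are all hypothetical stand-ins chosen only to show that the object keeps a file path plus lightweight per-spectrum metadata and reads the actual values when they are requested.

```r
## Toy illustration (not MSnbase code): peak data live in a file on disk.
peaks_file <- tempfile(fileext = ".rds")
saveRDS(list(c(200.1, 201.3), c(350.2, 410.7, 599.9)), peaks_file)
## Only lightweight per-spectrum information is kept in memory.
toy_header <- data.frame(spectrum = 1:2, rtime = c(2501.378, 2502.943),
                         file = peaks_file, stringsAsFactors = FALSE)
## Hypothetical accessor: reads the file each time m/z values are requested.
get_mz <- function(header, i) {
    readRDS(header$file[i])[[header$spectrum[i]]]
}
mz_second <- get_mz(toy_header, 2)
mz_second
## After moving the file the stored path is stale and the read fails.
moved <- file.rename(peaks_file, tempfile(fileext = ".rds"))
res <- tryCatch(get_mz(toy_header, 2),
                warning = function(w) NULL, error = function(e) NULL)
is.null(res)
```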
The OnDiskMSnExp organizes the MS data by spectrum and provides the methods
intensity, mz and rtime to access the raw data from the files (the measured
intensity values and the corresponding m/z and retention time values). In
addition, the spectra method can be used to return all data encapsulated in
Spectrum objects. Below we extract the retention time values from the object.
head(rtime(raw_data))
## F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006
## 2501.378 2502.943 2504.508 2506.073 2507.638 2509.203
All data is returned as one-dimensional vectors (a numeric vector for rtime and
a list of numeric vectors for mz and intensity, each containing the values from
one spectrum), even if the experiment consists of multiple files/samples. The
fromFile function returns a numeric vector that maps each value to its
originating file. Below we use the fromFile indices to organize the mz values
by file.
mzs <- mz(raw_data)
## Split the list by file
mzs_by_file <- split(mzs, f = fromFile(raw_data))
length(mzs_by_file)
## [1] 12
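The same splitting logic can be tried on made-up data without any raw files. In the sketch below toy_mzs plays the role of mz(raw_data) and toy_from_file the role of fromFile(raw_data); both are hypothetical values. split keeps the spectra in their original order within each file.

```r
## Made-up m/z values of four spectra (one numeric vector per spectrum).
toy_mzs <- list(c(200.1, 300.2), c(210.5), c(400.0, 410.3, 420.9), c(599.1))
## File index of each spectrum: the first two come from file 1, the rest from file 2.
toy_from_file <- c(1L, 1L, 2L, 2L)
toy_by_file <- split(toy_mzs, f = toy_from_file)
## Number of files and number of spectra per file.
length(toy_by_file)
lengths(toy_by_file)
```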
As a first evaluation of the data we plot below the base peak chromatogram (BPC)
for each file in our experiment. We use the chromatogram method and set
aggregationFun to "max" to return the maximal intensity of each spectrum and
hence create the BPC from the raw data. To create a total ion chromatogram (TIC)
we could set aggregationFun to "sum".
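Conceptually, the two aggregation functions reduce the intensities of each spectrum to a single value. The base R sketch below uses made-up intensities (toy_ints stands in for the per-spectrum output of intensity) and is only meant to show the difference between "max" and "sum"; it is not how chromatogram is implemented internally.

```r
## Made-up intensities of three spectra, one numeric vector per spectrum.
toy_ints <- list(c(10, 250, 40), c(5, 80), c(300, 20, 20, 60))
## BPC: the maximal intensity per spectrum.
toy_bpc <- vapply(toy_ints, max, numeric(1))
## TIC: the summed intensity per spectrum.
toy_tic <- vapply(toy_ints, sum, numeric(1))
toy_bpc
toy_tic
```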
## Get the base peak chromatograms. This reads data from the files.
bpis <- chromatogram(raw_data, aggregationFun = "max")
## Define colors for the two groups
group_colors <- brewer.pal(3, "Set1")[1:2]
names(group_colors) <- c("KO", "WT")
## Plot all chromatograms.
plot(bpis, col = group_colors[raw_data$sample_group])
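The col argument above relies on indexing a named character vector with the sample group of each file, which yields one color per sample. A minimal sketch with hypothetical colors and groups:

```r
## Hypothetical group colors (any valid R colors would work).
toy_colors <- c(KO = "red", WT = "blue")
## Hypothetical sample groups, one entry per file.
toy_groups <- c("KO", "KO", "WT", "WT")
## Indexing by name returns the color of each sample's group.
toy_sample_colors <- toy_colors[toy_groups]
toy_sample_colors
```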