Package: peakPantheR
Authors: Arnaud Wolfer

Package for Peak Picking and ANnoTation of High resolution Experiments in R, implemented in R and Shiny

1 Overview

peakPantheR implements functions to detect, integrate and report pre-defined features in MS files (e.g. compounds, fragments, adducts, …).

It is designed for:

  • Real time feature detection and integration (see Real Time Annotation)
    • process multiple compounds in one file at a time
  • Post-acquisition feature detection, integration and reporting (see Parallel Annotation)
    • process multiple compounds in multiple files in parallel, store results in a single object

peakPantheR can process LC/MS data files in NetCDF, mzML/mzXML and mzData formats, because data import relies on Bioconductor’s mzR package.
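As a rough illustration of screening candidate files before import, the sketch below flags file names whose extension matches one of the formats listed above. The helper `is_supported_ms_file` and the extension list are assumptions made for this example; they are not part of peakPantheR or mzR, and local naming conventions may differ.

```r
# Assumed file extensions for the formats named above (illustrative only).
ms_extensions <- c("cdf", "nc", "mzml", "mzxml", "mzdata")

# Hypothetical helper: TRUE where the file extension looks like a supported format.
is_supported_ms_file <- function(paths) {
    tolower(tools::file_ext(paths)) %in% ms_extensions
}

# Example file names, made up for the demonstration
is_supported_ms_file(c("ko15.CDF", "sample.mzML", "notes.txt"))
```

The check only looks at the file name; whether a file can actually be read still depends on mzR.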

2 Installation

To install peakPantheR from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("peakPantheR")

Install the development version of peakPantheR directly from GitHub with:

# Install devtools
if(!require("devtools")) install.packages("devtools")
devtools::install_github("phenomecentre/peakPantheR")

3 Input Data

Both real time and parallel compound integration require a common set of information:

  • Path(s) to NetCDF / mzML MS file(s)
  • An expected region of interest (RT / m/z window) for each compound.

3.1 MS files

For demonstration purposes we can annotate a set of raw MS spectra (in NetCDF format) provided by the faahKO package. Briefly, this subset of the data from (Saghatelian et al. 2004) investigates the metabolic consequences of knocking out the fatty acid amide hydrolase (FAAH) gene in mice. The dataset consists of samples from the spinal cords of 6 knock-out and 6 wild-type mice. Each file contains data in centroid mode acquired in positive ion mode from 200-600 m/z and 2500-4500 seconds.

Below we install the faahKO package and locate raw CDF files of interest:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("faahKO")
library(faahKO)
## file paths
input_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "/home/biocbuild/bbs-3.11-bioc/R/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "/home/biocbuild/bbs-3.11-bioc/R/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "/home/biocbuild/bbs-3.11-bioc/R/library/faahKO/cdf/KO/ko18.CDF"
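Because `system.file()` returns an empty string when a file cannot be found, it can be worth confirming that every path resolves before annotation. The sketch below uses a hypothetical helper, `check_spectra_paths`, which is not part of peakPantheR, and demonstrates it on a temporary file so that it runs without faahKO installed.

```r
# Hypothetical helper (not part of peakPantheR): TRUE for usable paths,
# with a warning when any path is empty or does not exist on disk.
check_spectra_paths <- function(paths) {
    unusable <- !nzchar(paths) | !file.exists(paths)
    if (any(unusable)) {
        warning(sum(unusable), " spectra path(s) empty or not found")
    }
    !unusable
}

# Demonstration with one existing temporary file and one empty path
tmp_file <- tempfile(fileext = ".CDF")
file.create(tmp_file)
suppressWarnings(check_spectra_paths(c(tmp_file, "")))
```

In practice the same call could be applied to `input_spectraPaths`.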

3.2 Expected regions of interest

Expected regions of interest (targeted features) are specified using the following information:

  • cpdID (numeric)
  • cpdName (character)
  • rtMin (sec)
  • rtMax (sec)
  • rt (sec, optional / NA)
  • mzMin (m/z)
  • mzMax (m/z)
  • mz (m/z, optional / NA)
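One common way to derive mzMin and mzMax is a symmetric tolerance in parts per million (ppm) around the expected m/z. The sketch below assumes that convention; the helper `mz_window` is hypothetical and not part of peakPantheR, which only requires the bounds themselves.

```r
# Hypothetical helper (not part of peakPantheR): symmetric ppm window
# around a single expected m/z value.
mz_window <- function(mz, ppm) {
    delta <- mz * ppm / 1e6
    c(mzMin = mz - delta, mzMax = mz + delta)
}

# 10 ppm around m/z 522.2, chosen purely as an illustration
mz_window(522.2, ppm = 10)
```

For m/z 522.2 and 10 ppm the half-width is 0.005222, which gives a window of about 522.194778 to 522.205222.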

Below we define 2 features of interest that are present in the faahKO dataset and can be employed in subsequent vignettes:

# targetFeatTable: empty 2 x 8 data.frame with the expected column names
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(),
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin",
                        "mz", "mzMax"))), stringsAsFactors=FALSE)
# each row mixes numbers and text, so c() stores every value as character
input_targetFeatTable[1,] <- c(1, "Cpd 1", 3310., 3344.888, 3390., 522.194778,
                                522.2, 522.205222)
input_targetFeatTable[2,] <- c(2, "Cpd 2", 3280., 3385.577, 3440., 496.195038,
                                496.2, 496.204962)
# convert every column except cpdName back to numeric
input_targetFeatTable[,c(1,3:8)] <- sapply(input_targetFeatTable[,c(1,3:8)],
                                            as.numeric)

cpdID  cpdName  rtMin  rt        rtMax  mzMin       mz     mzMax
1      Cpd 1    3310   3344.888  3390   522.194778  522.2  522.205222
2      Cpd 2    3280   3385.577  3440   496.195038  496.2  496.204962
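
The same two features can also be built column by column, so that each column gets its final type directly, and then checked against the constraints implied by the field list. This is a minimal sketch: the names `target_features` and `validate_target_features` are hypothetical, and the rules checked (window bounds in increasing order, rt and mz inside their windows when not NA) are assumptions rather than the validation peakPantheR itself performs.

```r
# Direct construction: every column is created with its final type.
target_features <- data.frame(
    cpdID   = c(1, 2),
    cpdName = c("Cpd 1", "Cpd 2"),
    rtMin   = c(3310, 3280),
    rt      = c(3344.888, 3385.577),
    rtMax   = c(3390, 3440),
    mzMin   = c(522.194778, 496.195038),
    mz      = c(522.2, 496.2),
    mzMax   = c(522.205222, 496.204962),
    stringsAsFactors = FALSE
)

# Hypothetical check (not part of peakPantheR): one logical value per row.
validate_target_features <- function(tbl) {
    required <- c("cpdID", "cpdName", "rtMin", "rt", "rtMax",
                  "mzMin", "mz", "mzMax")
    missing_cols <- setdiff(required, names(tbl))
    if (length(missing_cols) > 0) {
        stop("missing column(s): ", paste(missing_cols, collapse = ", "))
    }
    # rt and mz are optional (may be NA); only test them where present
    inside <- function(lo, x, hi) is.na(x) | (lo <= x & x <= hi)
    tbl$rtMin < tbl$rtMax & tbl$mzMin < tbl$mzMax &
        inside(tbl$rtMin, tbl$rt, tbl$rtMax) &
        inside(tbl$mzMin, tbl$mz, tbl$mzMax)
}

validate_target_features(target_features)
```

A row with rt or mz set to NA passes the check as long as its window bounds are in increasing order.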

References

Saghatelian, A., S. A. Trauger, E. J. Want, E. G. Hawkins, G. Siuzdak, and B. F. Cravatt. 2004. “Assignment of Endogenous Substrates to Enzymes by Global Metabolite Profiling.” Biochemistry 43:14332–9. http://dx.doi.org/10.1021/bi0480335.