Package: peakPantheR
Authors: Arnaud Wolfer, Goncalo Correia

1 Introduction

The peakPantheR package is designed for the detection, integration and reporting of pre-defined features in MS files (e.g. compounds, fragments, adducts, …).

Parallel Annotation detects and integrates multiple compounds across multiple files in parallel and stores the results in a single object. It is suited to integrating a large number of expected features across a dataset.

Using the faahKO raw MS dataset as an example, this vignette will:

  • Detail the Parallel Annotation concept
  • Apply the Parallel Annotation to a subset of pre-defined features in the faahKO dataset

1.1 Abbreviations

  • ROI: Regions Of Interest
    • reference RT / m/z windows in which to search for a feature
  • uROI: updated Regions Of Interest
    • modified ROI adapted to the current dataset, which override the reference ROI
  • FIR: Fallback Integration Regions
    • RT / m/z window to integrate if no peak is found
  • TIC: Total Ion Chromatogram
    • the intensities summed across all masses for each scan
  • EIC: Extracted Ion Chromatogram
    • the intensities summed over a mass range, for each scan
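
The TIC and EIC definitions above can be illustrated with a minimal base R sketch. The scan values and the helper name eic() are hypothetical and chosen only for illustration; this is not how peakPantheR reads or stores raw data.

```r
# Toy data: three scans, each holding m/z values and their intensities.
# (Hypothetical values, not taken from the faahKO files.)
scans <- list(
  data.frame(mz = c(496.20, 522.20, 600.00), intensity = c(100, 50, 10)),
  data.frame(mz = c(496.20, 522.21, 600.00), intensity = c(200, 80, 20)),
  data.frame(mz = c(496.19, 522.20, 600.00), intensity = c(150, 60, 30))
)

# TIC: intensities summed across all masses, for each scan
tic <- vapply(scans, function(s) sum(s$intensity), numeric(1))

# EIC: intensities summed over an m/z range, for each scan
# (eic() is a hypothetical helper defined here, not a peakPantheR function)
eic <- function(scans, mzMin, mzMax) {
  vapply(scans, function(s) {
    sum(s$intensity[s$mz >= mzMin & s$mz <= mzMax])
  }, numeric(1))
}

eic_cpd1 <- eic(scans, mzMin = 522.194778, mzMax = 522.205222)
```

Narrowing the m/z window is what separates an EIC from the TIC: the second scan contributes nothing to eic_cpd1 because its closest mass (522.21) falls outside the window.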

2 Parallel Annotation Concept

Parallel compound integration processes multiple compounds in multiple files in parallel and stores the results in a single object.

To achieve this, peakPantheR will:

  1. load a list of expected RT / m/z ROI and a list of files to process
  2. initialise an output object with expected ROI and file paths
  3. first pass (without peak filling) on a subset of representative samples (e.g. QC samples):
    • for each file, detect features in each ROI and keep highest intensity
    • determine peak statistics for each feature
    • store results + EIC for each ROI
  4. visual inspection of first pass results, update ROI:
    • diagnostic plots: all EICs, peak apex RT / m/z & peak width evolution
    • correct ROI (remove interfering feature, correct RT shift)
    • define fallback integration regions (FIR) if no feature is detected (median RT / m/z start and end of found features)
  5. initialise a new output object, with updated regions of interest (uROI) and fallback integration regions (FIR), with all samples
  6. second pass (with peak filling) on all samples:
    • for each file, detect features in each uROI and keep highest intensity
    • determine peak statistics for each feature
    • integrate FIR when no peaks are found
    • store results + EIC for each uROI
  7. summary statistics:
    • plot EICs, apex and peak width evolution
    • compare first and second pass
  8. return the resulting object and/or table (row: file, col: compound)
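
Two of the steps above, keeping the highest-intensity feature in a ROI and defining the FIR from the median bounds of the found features, can be sketched in base R. All values and object names below (candidates, found, fir) are hypothetical; peakPantheR performs these steps internally and this is only an illustration of the idea, not the package's code.

```r
# Candidate peaks detected in one ROI of one file (hypothetical values);
# only the most intense one is kept
candidates <- data.frame(rt           = c(3335.2, 3344.9, 3371.0),
                         maxIntensity = c(1.2e5, 8.9e5, 3.0e4))
best <- candidates[which.max(candidates$maxIntensity), ]

# First-pass bounds of the kept peak for one compound in four samples;
# the NA row stands for a sample in which no peak was found
found <- data.frame(
  rtMin = c(3310, 3312, NA, 3308),
  rtMax = c(3380, 3384, NA, 3378),
  mzMin = c(522.194, 522.195, NA, 522.196),
  mzMax = c(522.205, 522.206, NA, 522.204)
)

# FIR: median RT / m/z start and end of the found peaks, ignoring missing ones
fir <- vapply(found, median, numeric(1), na.rm = TRUE)
```

In the second pass, a window such as fir would be integrated for any sample in which no peak is detected, so that every compound has a value in every file.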

Diagram of the workflow and functions used for parallel annotation.

3 Parallel Annotation Example

We can target 2 pre-defined features in 3 raw MS spectra files from the faahKO package using peakPantheR_parallelAnnotation(). For more details on the installation and input data employed, please consult the Getting Started with peakPantheR vignette.

3.1 Input Data

First, the paths to 3 MS files from the faahKO package are located and used as input spectra. In this example, these 3 samples are considered representative of the whole run (e.g. Quality Control samples):

library(faahKO)
## file paths
input_spectraPaths  <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko16.CDF', package = "faahKO"),
                        system.file('cdf/KO/ko18.CDF', package = "faahKO"))
input_spectraPaths
#> [1] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko15.CDF"
#> [2] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko16.CDF"
#> [3] "/home/biocbuild/bbs-3.15-bioc/R/library/faahKO/cdf/KO/ko18.CDF"
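
Because system.file() returns an empty string when a file cannot be found, it can be worth checking the paths before starting an annotation. The sketch below uses a hypothetical helper, check_paths(), and a temporary file standing in for a real .CDF file; it is not part of peakPantheR.

```r
# Hypothetical helper: stop early if any path is empty or missing on disk
check_paths <- function(paths) {
  ok <- nzchar(paths) & file.exists(paths)
  if (!all(ok)) {
    stop("Missing input file(s) at position(s): ",
         paste(which(!ok), collapse = ", "))
  }
  invisible(paths)
}

# Demonstration with a temporary file standing in for a .CDF file
tmp <- tempfile(fileext = ".CDF")
file.create(tmp)

check_paths(tmp)                                    # passes silently
res <- try(check_paths(c(tmp, "")), silent = TRUE)  # fails: 2nd path is empty
```

In this vignette, the same check could be applied to input_spectraPaths before it is passed on.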

Two targeted features (e.g. compounds, fragments, adducts, …) are defined and stored in a table with the following columns:

  • cpdID (character, e.g. "ID-1")
  • cpdName (character)
  • rtMin (sec)
  • rtMax (sec)
  • rt (sec, optional / NA)
  • mzMin (m/z)
  • mzMax (m/z)
  • mz (m/z, optional / NA)

# targetFeatTable
input_targetFeatTable <- data.frame(matrix(vector(), 2, 8, dimnames=list(c(), 
                        c("cpdID", "cpdName", "rtMin", "rt", "rtMax", "mzMin", 
                            "mz", "mzMax"))), stringsAsFactors=FALSE)
input_targetFeatTable[1,] <- c("ID-1", "Cpd 1", 3310., 3344.888, 3390., 
                                522.194778, 522.2, 522.205222)
input_targetFeatTable[2,] <- c("ID-2", "Cpd 2", 3280., 3385.577, 3440., 
                                496.195038, 496.2, 496.204962)
input_targetFeatTable[,c(3:8)] <- sapply(input_targetFeatTable[,c(3:8)], 
                                        as.numeric)

cpdID  cpdName  rtMin  rt        rtMax  mzMin       mz     mzMax
ID-1   Cpd 1    3310   3344.888  3390   522.194778  522.2  522.205222
ID-2   Cpd 2    3280   3385.577  3440   496.195038  496.2  496.204962
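
The same table can also be built column by column, which keeps each column's type without a later as.numeric() conversion. This is an alternative sketch rather than the vignette's own code; the name targetFeatTable_alt and the sanity checks are additions for illustration.

```r
# Same two features, built column by column so each column keeps its type
targetFeatTable_alt <- data.frame(
  cpdID   = c("ID-1", "ID-2"),
  cpdName = c("Cpd 1", "Cpd 2"),
  rtMin   = c(3310, 3280),
  rt      = c(3344.888, 3385.577),
  rtMax   = c(3390, 3440),
  mzMin   = c(522.194778, 496.195038),
  mz      = c(522.2, 496.2),
  mzMax   = c(522.205222, 496.204962),
  stringsAsFactors = FALSE
)

# Sanity checks: each window is ordered and contains its expected value
with(targetFeatTable_alt, stopifnot(
  rtMin <= rt, rt <= rtMax,
  mzMin <= mz, mz <= mzMax
))
```

Checking the window bounds up front catches swapped rtMin / rtMax or mzMin / mzMax values before any file is processed.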

Additional compound and spectra metadata can be provided, but they are not used during the fitting procedure:

# spectra Metadata
input_spectraMetadata  <- data.frame(matrix(c("sample type 1", "sample type 2", 
                            "sample type 1"), 3, 1, 
                            dimnames=list(c(),c("sampleType"))),
                            stringsAsFactors=FALSE)

sampleType
sample type 1
sample type 2
sample type 1
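
A shorter way to build the same one-column table is to name the column directly. The sketch below also checks that there is one metadata row per input file, which is what the example above provides (3 rows for 3 files); that one-row-per-file requirement is an assumption made here, and spectraMetadata_alt and n_files are hypothetical names.

```r
# Same metadata, built by naming the column directly
spectraMetadata_alt <- data.frame(
  sampleType = c("sample type 1", "sample type 2", "sample type 1"),
  stringsAsFactors = FALSE
)

# Assumption: one metadata row per input file, in the same order as the paths
n_files <- 3  # number of files in this example
stopifnot(nrow(spectraMetadata_alt) == n_files)
```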

3.2 Initialise and Run Parallel Annotation

A peakPantheRAnnotation object is first initialised with the paths to the files to process (spectraPaths), the features to integrate (targetFeatTable), and optional information and parameters such as spectraMetadata, uROI, FIR and whether they should be used (useUROI=TRUE, useFIR=TRUE):

library(peakPantheR)
init_annotation <- peakPantheRAnnotation(spectraPaths = input_spectraPaths,
                        targetFeatTable = input_targetFeatTable,
                        spectraMetadata = input_spectraMetadata)

The resulting peakPantheRAnnotation object is not annotated, and neither contains nor uses uROI and FIR:

init_annotation
#> An object of class peakPantheRAnnotation
#>  2 compounds in 3 samples. 
#>   updated ROI do not exist (uROI)
#>   does not use updated ROI (uROI)
#>   does not use fallback integration regions (FIR)
#>   is not annotated

peakPantheR_parallelAnnotation() runs the annotation across files, in parallel if ncores > 0 and serially otherwise, and returns the successful annotations (result$annotation) and the failures (result$failures):

# annotate files serially
annotation_result <- peakPantheR_parallelAnnotation(init_annotation, ncores=0,
                                                    curveModel='skewedGaussian',