4 Hands-on

4.1 Peak detection with `proFIAset`

We first load the two packages containing the software and the dataset:

# loading the packages
library(proFIA)
library(plasFIA)

# finding the directory of the raw files
path <- system.file(package="plasFIA", "mzML")
list.files(path)

## [1] "plasFIA_1000_A.mzML" "plasFIA_1000_B.mzML" "plasFIA_100_A.mzML" 
## [4] "plasFIA_100_B.mzML"  "plasFIA_10_A.mzML"   "plasFIA_10_B.mzML"

The first step of the workflow is the proFIAset function which takes as input the path to the raw files. This function performs noise model building, followed by m/z strips detection and filtering. The important parameters to keep in mind are:

noiseEstimation (logical): shall noise model be constructed to filter signal? (recommended).
ppm (numeric): maximum deviation between scans during strips detection.
parallel (logical): shall parallel computation be used.

Note: 1. As all files need to be processed 2 times, one for noise estimation and one for model estimation, this step is the most time consuming of the workflow. 2. The ppm parameter is the most important of the workflow; the package was designed to work optimally for high-resolution data with an accuracy inferior or equal to 5 ppm.

# defining the ppm parameter adapted to the Orbitrap Fusion
ppm <- 3

# performing the first step of the workflow
plasSet <- proFIAset(path, ppm=ppm, parallel=FALSE)

The quality of peak detection can be assessed by using the plotRaw method to visualize the corresponding areas in the raw data.

# loading the spiked molecules data frame
data("plasMols")

# plotting the raw region aroung the Diphenhydramine mass signal
plasMols[7,]

##    formula           names classes     mass mass_M+H
## 7 C17H21NO Diphenhydramine Benzene 255.1623 256.1696

mzrange <- c(plasMols[7,"mass_M+H"]-0.1,plasMols[7,"mass_M+H"]+0.1)
plotRaw(plasSet, type="r", sample=3, ylim=mzrange, size=0.6)

In the example above, we see that a signal at 256.195 m/z corresponding to the solvent has been correctly discarded by proFIA.

# plotting the filter Dipehnhydramine region.
plotRaw(plasSet, type="p", sample=3, ylim=mzrange, size=0.6)

Peak detection in proFIA is based on matched filtering. It therefore relies on a peak model which is tuned on the signals from the most intense ions. The plotInjectionPeaks method allows to check visually the consistency of these reconstructed filters.

# plotting the injection peak
plotInjectionPeaks(plasSet)

4.2 Peak grouping with `group.FIA`

The second step of the workflow consists in matching the signals between the samples. The group.FIA methods uses an estimation of the density in the mass dimension. The two important parameters are:

ppm (numeric): accuracy of the mass spectrometer; must be inferior or equal to the corresponding value in proFIAset,
fracGroup (numeric): minimum fraction of samples with detected peaks in at least one class for a group to be created.

# selecting the parameters
ppmgroup <- 1

# due to the experimental design, sample fraction was set to 0.2
fracGroup <- 0.2

# grouping
plasSet <- group.FIA(plasSet, ppm=ppmgroup, fracGroup=fracGroup)

## 
##  647  groups have been done .

The groups may be visualized using the plotEICs function, which take as input a mass and a ppm tolerance, or an index.

# plotting the EICs of the parameters.
plotEICs(plasSet,mz=plasMols[4,"mass_M+H"])

At this stage, it is possible to check whether a molecule (i.e., a group) has been detected in the dataset by using the findMzGroup method.

# searching for match group with 2 ppm tolerance
lMatch <- findMzGroup(plasSet,plasMols[,"mass_M+H"],tol=3)

# index of the 40 molecules which may be used with plotEICs
molFound <- data.frame(names=plasMols[,"names"],found=lMatch)
head(molFound)

##                          names found
## 1            Acetyl-L-carnitin   145
## 2            3-Methylhistamine    20
## 3                    Panthenol   151
## 4                    Metformin    27
## 5                 Azelaic acid    NA
## 6 D-erythro-Dihydrosphingosine   288

# getting the molecules which are not detected
plasMols[which(is.na(lMatch)),]

##    formula                           names            classes     mass
## 5  C9H16O4                    Azelaic acid Dicarboxylic Acids 188.1049
## 16 C6H10O5 3-Hydroxy-3-methylglutaric acid Dicarboxylic Acids 162.0528
##    mass_M+H
## 5  189.1121
## 16 163.0601

We see that molecules 5 and 16 were not found, which is coherent with their chemical classes as they are both Dicarboxylic Acids, which ionizes in negative modes.

4.3 Peak table with `makeDataMatrix`

The data matrix (peak table) can be built with the makeDataMatrix method: ion intensities can be computed either as the areas of the peaks (maxo=F) which is considered to be more robust, or as the maximum intensities (maxo=T).

# building the data matrix
plasSet <- makeDataMatrix(plasSet, maxo=FALSE)

4.4 Imputation with `fillPeaks.WKNN`

Imputation of the data matrix is performed by using a weighted k-nearest neighbours approach. A good number for k is half the size of the smaller class. * k (numeric): number of neighbours

# since there is only 1 replicate, k is set to 1, which should not be the case in a standard experiment.
k <- 1

# imputation
plasSet <- fillPeaks.WKNN(plasSet, k=k)

Here you see a warning because doing 1-Nearest-Neighbour is a bad partice, but for purpose of demonstration and to avoid long computations, we only furnished two sample by classes, which means that 1-Nearest-Neighbour is the best option in this case.

4.5 Running the whole workflow with `analyzeAcquisitionFIA`

The whole workflow described previously can be run by a single call to the analyzeAcquisitionFIA function:

# selecting the parameters
ppm <- 2
ppmgroup <- 1
fracGroup <- 0.2
k <- 1

# running the whole workflow in a single step
plasSet <- analyzeAcquisitionFIA(path, ppm=ppm, ppmgroup=ppmgroup, k=k,fracGroup = fracGroup,parallel=FALSE)

4.6 Export

The processed data can be exported either as:

A peak table in a format similar to the XCMS output.
An ExpressionSet object (see the Biobase bioconductor package).
A peak table which may be created using the exportPeakTable function.
3 .tsv tabular files corresponding to the dataMatrix, the sampleMetadata, and the variableMetadata, and which are compatible with the Workflow4metabolomics format.

# Expression Set.
eset <- exportExpressionSet(plasSet)
eset

## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 647 features, 6 samples 
##   element names: exprs 
## protocolData: none
## phenoData: none
## featureData
##   featureNames: M102.0914 M104.1070 ... M599.2833 (647 total)
##   fvarLabels: mzMed scanMin ... SigSolMean (6 total)
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

# Peak Table.
peakTable <- exportPeakTable(plasSet)


# 3 W4M Tables:
dataMatrix <- exportDataMatrix(plasSet)
sampleMetadata <- exportSampleMetadata(plasSet)
variableMetadata <- exportVariableMetadata(plasSet)

4.7 Examples of downstream statistical analyzes

Univariate and multivariate analyzes can be applied to the processed peak table. As an example, we perform a modeling of the spiking dilution with Orthogonal Partial Least Squares, by using the ropls bioconductor package. This allows us to illustrate the efficiency of the matrix effect indicator.

# getting the concentrations of the spiked compounds (log10 scale)
data("plasSamples")
vconc <- log10(plasSamples[,"concentration_ng_ml"])

# building the OPLS model
library(ropls)
plasSet.opls <- opls(dataMatrix, vconc, predI = 1, orthoI = NA, log10L = TRUE, crossvalI=5)

## OPLS
## 6 samples x 647 variables and 1 response
## standard scaling of predictors and response(s)
##       R2X(cum) R2Y(cum) Q2(cum)  RMSEE pre ort pR2Y  pQ2
## Total     0.82    0.999   0.946 0.0283   1   1 0.05 0.05

We observe that the model significantly explains the majority of the response variance (Q2Y > 0.94, pQ2Y < 5%). The score plot and the observation diagnostic do not evidence any outlier sample (not shown). As expected, the concentration of the samples increases with the predictive scores. Furthermore, the matrix effect contributes to the orthogonal component: this can be observed by plotting the predictive and orthogonal VIP values, and coloring according to the matrix effect metric (inversely proportional to the ‘corMean’ quality metric of the peak):

# matrix effec metric
MEmetricVn <- 1 - variableMetadata[, "corMean"]
nnaVl <- !is.na(MEmetricVn)
MEmetricVn <- MEmetricVn[nnaVl]
ordVi <- order(MEmetricVn)
MEmetricVn <- MEmetricVn[ordVi]

# predictive and orthogonal VIP
vipVn <- getVipVn(plasSet.opls)[nnaVl][ordVi]
orthoVipVn <- getVipVn(plasSet.opls, orthoL = TRUE)[nnaVl][ordVi]

# plot
colVc <- rev(rainbow(sum(nnaVl), end = 4/6))
plot(vipVn, orthoVipVn, pch = 16, col = colVc,
     xlab = "VIP_pred", ylab = "VIP_ortho", main = "VIP_ortho vs VIP_pred",lwd=3)

# adding black circles around spiked compounds
matchVl <- (1:nrow(variableMetadata) %in% lMatch[!is.na(lMatch)])[nnaVl][ordVi]
points(vipVn[matchVl], orthoVipVn[matchVl], cex=1.2,pch=1,col="black",lwd=2)
legend("topright", legend = c(paste0(c("ME max: ", "ME min: "), round(rev(range(MEmetricVn)), 2)),"Spiked molecules"), pch=c(16,16,1),col = c(rev(colVc[c(1, length(colVc))]),1))

We see that high predictive VIP values correspond to low matrix effect (which is in particular the case for the spiked molecules; the two clusters probably result from the fact that some of the spiked molecules are naturally present in plasma). In contrast, the orthogonal VIP axis correlates with the matrix effect. This illustrates the capacity of proFIA to extract relevant signal from raw data and to provide useful visualization to assess the quality of the data.

proFIA: Preprocessing data from flow injection analysis coupled to high-resolution mass spectrometry (FIA-HRMS)

Alexis Delabriere and Etienne Thevenot

2016-10-28

Contents

1 Context

2 Workflow

3 The plasFIA data package

4 Hands-on

4.1 Peak detection with `proFIAset`

4.2 Peak grouping with `group.FIA`

4.3 Peak table with `makeDataMatrix`

4.4 Imputation with `fillPeaks.WKNN`

4.5 Running the whole workflow with `analyzeAcquisitionFIA`

4.6 Export

4.7 Examples of downstream statistical analyzes

5 Session info

proFIA: Preprocessing data from flow injection analysis coupled to high-resolution mass spectrometry (FIA-HRMS)

Alexis Delabriere and Etienne Thevenot

2016-10-28

Contents

1 Context

2 Workflow

3 The plasFIA data package

4 Hands-on

4.1 Peak detection with proFIAset

4.2 Peak grouping with group.FIA

4.3 Peak table with makeDataMatrix

4.4 Imputation with fillPeaks.WKNN

4.5 Running the whole workflow with analyzeAcquisitionFIA

4.6 Export

4.7 Examples of downstream statistical analyzes

5 Session info

4.1 Peak detection with `proFIAset`

4.2 Peak grouping with `group.FIA`

4.3 Peak table with `makeDataMatrix`

4.4 Imputation with `fillPeaks.WKNN`

4.5 Running the whole workflow with `analyzeAcquisitionFIA`