1 Introduction

This document provides a brief summary on how to use the PepsNMR package. In this package, pre-processing functions transform raw FID signals from 1H NMR spectroscopy into a set of interpretable spectra.

2 Installation

The PepsNMR package is available on Bioconductor and can be installed via BiocManager::install:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("PepsNMR")

The package needs to be loaded once installed to be used:

library(PepsNMR)

The package development version is available on Github: https://github.com/ManonMartin/PepsNMR, although it is highly recommended to rely on the Bioconductor release version of the package to avoid any package version mismatch.

3 Data importation

The first step is meant to access the raw data files. To import Free Induction Decays (FIDs) in Bruker format, use the ReadFids function. This function will return a list with the FID data matrix (saved in Fid_data) and metadata about these FIDs (saved in Fid_info).

fidList <- ReadFids(file.path(path, dataset_name))

Fid_data <- fidList[["Fid_data"]]
Fid_info <- fidList[["Fid_info"]]

The possible directory structures are illustrated here:

Accepted directory structures for the raw Bruker files

Figure 1: Accepted directory structures for the raw Bruker files

And is to be linked with the possible options of the ReadFids function:

  1. If use of title file and presence of sub-directories: set the FID Title line, subdirs = TRUE, dirs.names = FALSE
  2. If use of title file and no sub-directories: set the FID Title line, subdirs = FALSE, dirs.names = FALSE
  3. If no use of title file and presence of sub-directories: subdirs = TRUE, dirs.names = TRUE
  4. If no use of title file and no sub-directories: subdirs = FALSE, dirs.names = TRUE

4 Pre-processing steps

Here is the recommended pre-processing workflow on the FIDs and the spectra once the raw data have been loaded in R:

Table 1: Pre-processing steps
They are presented in the suggested order.
Steps Description
Group Delay Correction Correct for the Bruker Group Delay.
Solvent Suppression Remove the solvent signal from the FIDs.
Apodization Increase the Signal-to-Noise ratio of the FIDs.
Fourier Transform Transform the FIDs into a spectrum and convert the frequency scale (Hz -> ppm).
Zero Order Phase Correction Correct for the zero order phase dephasing.
Internal Referencing Calibrate the spectra with an internal reference compound. Referencing with an internal (e.g. TMSP at 0 ppm)
Baseline Correction Remove the spectral baseline.
Negative Values Zeroing Set negatives values to 0.
Warping Warp the spectra according to a reference spectrum.
Window Selection Select the informative part of the spectrum.
Bucketing Data reduction.
Region Removal Set a desired spectral region to 0.
Zone Aggregation Aggregate a spectral region into a single peak.
Normalization Normalize the spectra.

Information on each function is provided in R, (e.g. type ?ReadFids in the R console) and methodological details are found in Martin et al. (2018).

5 Available datasets

Human serum (HS) and urine (HU) datasets are available as raw data (FIDs in Bruker format) and as (partially) pre-processed signals/spectra in the ad hoc PepsNMRData package automatically installed with PepsNMR.

Here are examples of available datasets:

library(PepsNMRData)
str(FidData_HU)
#>  cplx [1:24, 1:29411] 0+0i 0+0i 0+0i ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:24] "S1-D0-E1" "S1-D0-E2" "S1-D1-E2" "S2-D0-E2" ...
#>   ..$ : chr [1:29411] "0" "5.1e-05" "0.000102" "0.000153" ...

str(FinalSpectra_HS)
#>  cplx [1:32, 1:500] 0-371030i 0-362686i 0-216899i ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : chr [1:32] "J1-D1-1D-T1" "J3-D2-1D-T8" "J3-D3-1D-T9" "J3-D4-1D-T14" ...
#>   ..$ : chr [1:500] "9.98986984692192" "9.97027064485524" "9.95067144278856" "9.93107224072187" ...

Information for each dataset is available, (e.g. enter ?FidData_HS in the R Console).

6 Demo of the pre-processing functions on the Human Serum dataset

6.1 Load the data

Raw Bruker FIDs can be loaded from where PepsNMRData has been intalled:

data_path <- system.file("extdata", package = "PepsNMRData")
dir(data_path)
#> [1] "Group_HS.csv" "HumanSerum"

6.2 Read the FID data file

To import FIDs in Bruker format, the ReadFids function is used. This function will return a list with the FID data matrix (here saved as Fid_data) and information about these FIDs (here saved as Fid_info).

# ==== read the FIDs and their metadata =================
fidList <- ReadFids(file.path(data_path, "HumanSerum"))
#> Begin ReadFids 
#> dim Fid_data:  32 32768 
#> IDs:  J1-D1-1D-T1 J3-D2-1D-T8 J3-D3-1D-T9 J3-D4-1D-T14 J4-D1-1D-T4 J4-D2-1D-T7 J4-D3-1D-T10 J4-D4-1D-T13 J5-D1-1D-T1 J5-D2-1D-T6 J5-D3-1D-T11 J5-D4-1D-T16 J1-D2-1D-T6 J6-D1-1D-T2 J6-D2-1D-T5 J6-D3-1D-T12 J6-D4-1D-T15 J7-D1-1D-T3 J7-D2-1D-T8 J7-D3-1D-T9 J7-D4-1D-T14 J8-D1-1D-T4 J8-D2-1D-T7 J8-D3-1D-T10 J1-D3-1D-T11 J8-D4-1D-T13 J1-D4-1D-T16 J2-D1-1D-T2 J2-D2-1D-T5 J2-D3-1D-T12 J2-D4-1D-T15 J3-D1-1D-T3 
#> non-unique IDs? 0 
#> End ReadFids 
#> It lasted 0.62 s user time, 0.028 s system time and 1.34 s elapsed time.
Fid_data0 <- fidList[["Fid_data"]]
Fid_info <- fidList[["Fid_info"]]
kable(head(Fid_info))
TD BYTORDA DIGMOD DECIM DSPFVS SW_h SW O1 DT
J1-D1-1D-T1 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
J3-D2-1D-T8 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
J3-D3-1D-T9 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
J3-D4-1D-T14 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
J4-D1-1D-T4 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
J4-D2-1D-T7 65536 1 1 16 12 10245.9 20.48638 2352.222 4.88e-05
Raw FID.

Figure 2: Raw FID

6.3 Group Delay Correction

The Bruker’s digital filter originally produces a Group Delay that is removed during this step.

# ==== GroupDelayCorrection =================
Fid_data.GDC <- GroupDelayCorrection(Fid_data0, Fid_info)
#> Begin GroupDelayCorrection 
#> End GroupDelayCorrection 
#> It lasted 0.968 s user time, 0.088 s system time and 1.055 s elapsed time.
Signal before and after the Group Delay removal.

Figure 3: Signal before and after the Group Delay removal

6.4 Solvent Suppression

Pre-acquisition techniques for solvent suppression can be unsufficient to remove the solvent signal from the FIDs. SolventSuppression estimates and removes this residual solvent signal based on a Whittaker smoother.

# ==== SolventSuppression =================
SS.res <- SolventSuppression(Fid_data.GDC, returnSolvent = TRUE)
#> Begin SolventSuppression 
#> End SolventSuppression 
#> It lasted 0.608 s user time, 0.088 s system time and 0.709 s elapsed time.

Fid_data.SS <- SS.res[["Fid_data"]]
SolventRe <- SS.res[["SolventRe"]]
Signal before and after the Residual Solvent Removal.

Figure 4: Signal before and after the Residual Solvent Removal

6.5 Apodization

The apodization step increases the spectral Signal-to-Noise ratio by multiplying the FIDs by an appropriate factor (by default a negative exponential).

# ==== Apodization =================
Fid_data.A <- Apodization(Fid_data.SS, Fid_info)
#> Begin Apodization 
#> End Apodization 
#> It lasted 0.044 s user time, 0.008 s system time and 0.054 s elapsed time.
Signal before and after Apodization.

Figure 5: Signal before and after Apodization

6.6 FourierTransform

The Fourier Transform is applied to the FIDs to obtain spectra expressed in the frequency domain. The FourierTransform function further converts this frequency scale originally in hertz into parts per million (ppm).

# ==== FourierTransform =================
RawSpect_data.FT <- FourierTransform(Fid_data.A, Fid_info)
#> Begin FourierTransform 
#> End FourierTransform 
#> It lasted 0.36 s user time, 0.072 s system time and 0.432 s elapsed time.
Spectrum after FT.

Figure 6: Spectrum after FT

6.7 Zero Order Phase Correction

After Fourier Transform, the spectra need to be phased for the real part to be in pure absoptive mode (i.e. with positive intensities).

# ==== ZeroOrderPhaseCorrection =================
Spectrum_data.ZOPC <- ZeroOrderPhaseCorrection(RawSpect_data.FT)
#> Begin ZeroOrderPhaseCorrection 
#> End ZeroOrderPhaseCorrection 
#> It lasted 1.02 s user time, 0.08 s system time and 1.102 s elapsed time.
Spectrum after Zero Order Phase Correction.

Figure 7: Spectrum after Zero Order Phase Correction

6.8 Internal Referencing

During this step, the spectra are aligned with an internal reference compound (e.g. with the TMSP). The user determines the ppm value of the reference compound which is by default 0.

# ==== InternalReferencing =================
target.value <- 0
Spectrum_data.IR <- InternalReferencing(Spectrum_data.ZOPC, Fid_info, ppm.value = target.value)
#> Begin InternalReferencing 
#> End InternalReferencing 
#> It lasted 0.348 s user time, 0.072 s system time and 0.421 s elapsed time.
Spectrum after Internal Referencing.

Figure 8: Spectrum after Internal Referencing

6.9 Baseline Correction

The spectral baseline is estimated and removed from the spectral profiles with the Asymetric Least Squares smoothing algorithm.

# ==== BaselineCorrection =================
BC.res <- BaselineCorrection(Spectrum_data.IR, returnBaseline = TRUE, lambda.bc = 1e+08, 
    p.bc = 0.01)
#> Begin BaselineCorrection 
#> End BaselineCorrection 
#> It lasted 2.228 s user time, 0.044 s system time and 2.275 s elapsed time.
Spectrum before and after the Baseline Correction.

Figure 9: Spectrum before and after the Baseline Correction

6.10 Negative Values Zeroing

The remaining negative values are set to 0 since they cannot be interpreted.

# ==== NegativeValuesZeroing =================
Spectrum_data.NVZ <- NegativeValuesZeroing(Spectrum_data.BC)
#> Begin NegativeValuesZeroing 
#> End NegativeValuesZeroing 
#> It lasted 0.008 s user time, 0 s system time and 0.009 s elapsed time.
Spectrum after Negative Values Zeroing.

Figure 10: Spectrum after Negative Values Zeroing

6.11 Warping

The spectra are realigned based on a reference spectrum with a Semi-Parametric Time Warping technique.

# ==== Warping =================
W.res <- Warping(Spectrum_data.NVZ, returnWarpFunc = TRUE, reference.choice = "fixed")
#> Begin Warping 
#> Begin Normalization 
#> End Normalization 
#> It lasted 0.052 s user time, 0 s system time and 0.051 s elapsed time.
#> End Warping 
#> It lasted 48.872 s user time, 0.424 s system time and 49.344 s elapsed time.

Spectrum_data.W <- W.res[["Spectrum_data"]]
warp_func <- W.res[["Warp.func"]]
Spectrum before and after the Baseline Correction.

Figure 11: Spectrum before and after the Baseline Correction

6.12 Window Selection

During this step the user selects the part of the spectrum that is of interest and the other parts are removed from the spectral matrix.

# ==== WindowSelection =================
Spectrum_data.WS <- WindowSelection(Spectrum_data.W, from.ws = 10, to.ws = 0.2)
#> Begin WindowSelection 
#> End WindowSelection 
#> It lasted 0.024 s user time, 0 s system time and 0.024 s elapsed time.
Spectrum after Window Selection.

Figure 12: Spectrum after Window Selection

6.13 Bucketing

The Bucketing function reduces the number of spectral descriptors by aggregating intensities into a series of buckets.

# ==== Bucketing =================
Spectrum_data.B <- Bucketing(Spectrum_data.WS, intmeth = "t")
#> Begin Bucketing 
#> PPM range of the bucketted spectral matrix: 9.99967 0.21967 
#> PPM width between 2 buckets: 0.01960 
#> End Bucketing 
#> It lasted 0.136 s user time, 0.008 s system time and 0.143 s elapsed time.
Spectrum before and after Bucketing.

Figure 13: Spectrum before and after Bucketing

6.14 Region Removal

By default, this step sets to zero spectral areas that are of no interest or have a sigificant and unwanted amount of variation (e.g. the water area).

# ==== RegionRemoval =================
Spectrum_data.RR <- RegionRemoval(Spectrum_data.B, typeofspectra = "serum")
#> Begin RegionRemoval 
#> End RegionRemoval 
#> It lasted 0 s user time, 0 s system time and 0.001 s elapsed time.
Spectrum after Region Removal

Figure 14: Spectrum after Region Removal

6.15 Normalization

The normalization copes with the dilution factor and other issues that render the spectral profiles non-comparable to each other.

# ==== Normalization =================
Spectrum_data.N <- Normalization(Spectrum_data.RR, type.norm = "mean")
#> Begin Normalization 
#> End Normalization 
#> It lasted 0 s user time, 0 s system time and 0.001 s elapsed time.
Spectrum before and after Normalization

Figure 15: Spectrum before and after Normalization

Note: the function ZoneAggregation is not used here, but is can be applied e.g. to urine spectra to aggregate the citrate peak.

7 Final spectra visualisation

The Draw function enables to produce line plots with type.draw = "signal" or PCA results with type.draw = "pca" of FIDs or spectra. These plots can be saved as pdf files with the option output = "pdf", see ?Draw, ?DrawSignal and ?DrawPCA for more details on the other available options.

Draw(Spectrum_data.N[1:4, ], type.draw = c("signal"), subtype = "stacked", output = c("default"))
4 first pre-processed spectra

Figure 16: 4 first pre-processed spectra

Draw(Spectrum_data.N, type.draw = c("pca"), output = c("default"), Class = Group_HS, 
    type.pca = "scores")
PCA scores of the pre-processed spectra

Figure 17: PCA scores of the pre-processed spectra

Draw(Spectrum_data.N, type.draw = c("pca"), output = c("default"), Class = Group_HS, 
    type.pca = "loadings")
PCA loadings of the pre-processed spectra

Figure 18: PCA loadings of the pre-processed spectra

8 Session info

sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.5 LTS
#> 
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] PepsNMRData_1.0.0 PepsNMR_1.0.2     knitr_1.21        BiocStyle_2.10.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.0         highr_0.7          nloptr_1.2.1      
#>  [4] compiler_3.5.2     pillar_1.3.1       formatR_1.6       
#>  [7] BiocManager_1.30.4 plyr_1.8.4         tools_3.5.2       
#> [10] digest_0.6.18      evaluate_0.13      tibble_2.0.1      
#> [13] gtable_0.2.0       lattice_0.20-38    png_0.1-7         
#> [16] pkgconfig_2.0.2    rlang_0.3.1        Matrix_1.2-16     
#> [19] yaml_2.2.0         xfun_0.5           gridExtra_2.3     
#> [22] stringr_1.4.0      dplyr_0.8.0.1      grid_3.5.2        
#> [25] tidyselect_0.2.5   glue_1.3.0         R6_2.4.0          
#> [28] ptw_1.9-13         rmarkdown_1.11     bookdown_0.9      
#> [31] reshape2_1.4.3     ggplot2_3.1.0      purrr_0.3.1       
#> [34] magrittr_1.5       matrixStats_0.54.0 scales_1.0.0      
#> [37] htmltools_0.3.6    assertthat_0.2.0   colorspace_1.4-0  
#> [40] labeling_0.3       stringi_1.3.1      lazyeval_0.2.1    
#> [43] munsell_0.5.0      crayon_1.3.4

References

Martin, Manon, Benoît Legat, Justine Leenders, Julien Vanwinsberghe, Réjane Rousseau, Bruno Boulanger, Paul H.C. Eilers, Pascal De Tullio, and Bernadette Govaerts. 2018. “PepsNMR for 1H NMR Metabolomic Data Pre-Processing.” Analytica Chimica Acta 1019:1–13. https://doi.org/10.1016/j.aca.2018.02.067.