ncGTW 1.0.0
Neighbor-wise Compound-specific Graphical Time Warping (ncGTW) [1] is an alignment algorithm that can align LC-MS profiles by leveraging expected retention time (RT) drift structures and compound-specific warping functions. This algorithm is improved from graphical time warping (GTW) [2], a popular dynamic time warping (DTW) based alignment method [3]. Specifically, ncGTW uses individualized warping functions for different compounds and assigns constraint edges on warping functions of neighboring samples. That is, ncGTW avoids the popular but not accurate assumption which assumes all the m/z bins in the same sample share the same warping function. This assumption often fails when the dataset contains hundreds of samples or the data acquisition time longer than a week. Moreover, by considering the RT drifts structure, ncGTW can align RT more accurately.
ncGTW
is an R package developed as a plug-in of xcms
, a popular LC-MS data
analysis R package [4–6]. Due to the same warping function
assumption or bad parameter settings, xcms
may have some misaligned features,
and there is a function in ncGTW
to identify such misalignments. After
identifying the misaligned features, the user can realign these features with
the alignment function in ncGTW
to obtain a better alignment result for more
accurate analysis, such as peak-regrouping or peak-filling with xcms
.
You can install the latest version of ncGTW from GitHub by
devtools::install_github("ChiungTingWu/ncGTW")
or from Bioconductor by
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("ncGTW")
To check there are misaligned features from xcms
or not, one can input two
xcms
grouping results with different values of RT window parameter (xcms
grouping parameter, bw
) to the function misalignDetect()
. One value of bw
should be the expected maximal RT drift, and another should be near to the RT
sampling resolution (the inverse of scan frequency). If there are some detected
misaligned features, the user can decide to adjust the paramters in xcms
or
use ncGTW
to realign them. Besides the xcms
aligment results, the only
paramter with no default in misalignDetect()
is ppm
, which should be set as
same as ppm
of the peak detection in xcms
.
excluGroups <- misalignDetect(xcmsLargeWin, xcmsSmallWin, ppm)
To demonstrate the workflow of ncGTW, an example dataset is included in the package. The aquisition time of the dataset is more than two weeks, in which the 20 samples are selected from a large dataset for a quick demonstration.
library(xcms)
library(ncGTW)
filepath <- system.file("extdata", package = "ncGTW")
file <- list.files(filepath, pattern = "mzxml", full.names = TRUE)
# The paths of the 20 samples
To incorporate the RT structure, the order of the paths in file
should be as
same as the sample acquisition order (run order). In the example dataset, the
index in each file name is the acquisition order, so we sort the paths according
to tempInd
. When dealing with other dataset, the user should make sure the
order of the paths is as same the order of data acquisition.
tempInd <- matrix(0, length(file), 1)
for (n in seq_along(file)){
tempCha <- file[n]
tempLen <- nchar(tempCha)
tempInd[n] <- as.numeric(substr(tempCha, regexpr("example", tempCha) + 7,
tempLen - 6))
}
file <- file[sort.int(tempInd, index.return = TRUE)$ix]
# Sort the paths by data acquisition order to incorporate the RT structure
As a plug-in, the inputs of ncGTW
are the alignment results from xcms
, so
first we need to apply xcms
on the dataset. The parameters should be decided
by the user when dealing with other datasets.
CPWmin <- 2
CPWmax <- 25
ppm <- 15
xsnthresh <- 3
LM <- FALSE
integrate <- 2
RTerror <- 6
MZerror <- 0.05
prefilter <- c(8, 1000)
# XCMS parameters
ds <- xcmsSet(file, method="centWave", peakwidth=c(CPWmin, CPWmax), ppm=ppm,
noise=xsnthresh, integrate=integrate, prefilter=prefilter)
gds <- group(ds, mzwid=MZerror, bw=RTerror)
agds <- retcor(gds, missing=5)
# XCMS peak detection and RT alignment
To detect the misaligned features, ncGTW
needs two XCMS grouping results with
different values of bw
. The larger one should be expected maximal RT drift,
and the smaller one should be the RT sampling resolution (the inverse of scan
frequency).
bwLarge <- RTerror
bwSmall <- 0.3
# Two different values of bw parameter
xcmsLargeWin <- group(agds, mzwid=MZerror, bw=bwLarge)
xcmsSmallWin <- group(agds, mzwid=MZerror, bw=bwSmall, minfrac=0)
# Two resolution of XCMS grouping results
After XCMS preprocessing, ncGTW
can be applied on the results. There are two
major steps in ncGTW
, misaligned feature detection and misaligned feature
realignment.
To detect the misaligned features, misalignDetect()
needs two different XCMS
grouping results as inputs. This function tells which features in xcmsLargeWin
could be broken into several small features in xcmsSmallWin
, and the detected
features should be misaligned features. ppm
is one criteria to decide the
small features in xcmsLargeWin
are from the same compounds or not, and should
be set as same as the one in XCMS peak detection.
excluGroups <- misalignDetect(xcmsLargeWin, xcmsSmallWin, ppm)
# Detect misaligned features
show(excluGroups)
#> index mzmed mzmin mzmax rtmed rtmin rtmax npeaks
#> [1,] 1 630.5534 630.5527 630.5546 349.4897 347.2043 355.5610 14
#> [2,] 9 931.5268 931.5251 931.5275 336.6458 335.6331 339.6014 17
#> extdata
#> [1,] 14
#> [2,] 17
There are two peak groups (features) are detected as shown in excluGroups
.
Before realigning them, the raw profile of each detected feature of each sample
needs to load from the files. loadProfile()
loads the needed information with
file paths (file
) and the detected features (excluGroups
) as inputs.
ncGTWinputs <- loadProfile(file, excluGroups)
# Load raw profiles from the files
The user can also check the detected features are really misaligned or not by
viewing the extracted ion chromatogram. plotGroup()
draws the extracted ion
chromatogram. ncGTWinputs
is the loaded information from loadProfile()
,
xcmsLargeWin@rt$corrected
is the alignment by XCMS, and ind
is just a
parameter for indexing the chromatograms. The user are free to set ind
.
for (n in seq_along(ncGTWinputs))
plotGroup(ncGTWinputs[[n]], slot(xcmsLargeWin, 'rt')$corrected, ind=n)