1 Introduction

Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval.

We present here the MsQuality package, which provides functionality to calculate quality metrics for mass spectrometry-derived, spectral data at the per-sample level. MsQuality relies on the mzQC framework of quality metrics defined by the Human Proteome Organization-Proteomics Standards Intitiative (HUPO-PSI). These metrics quantify the quality of spectral raw files using a controlled vocabulary. The package is especially addressed towards users that acquire mass spectrometry data on a large scale (e.g. data sets from clinical settings consisting of several thousands of samples): while it is easier to control for high-quality data acquisition in small-scale experiments, typically run in one or few batches, clinical data sets are often acquired over longer time frames and are prone to higher technical variation that is often unnoticed. MsQuality tries to address this problem by calculating metrics that can be stored along the spectral data sets (raw files or feature-extracted data sets). MsQuality, thus, facilitates the tracking of shifts in data quality and quantifies the quality using multiple metrics. It should be thus easier to identify samples that are of low quality (high-number of missing values, termination of chromatographic runs, low instrument sensitivity, etc.).

We would like to note here that these metrics only give an indication of data quality, and, before removing indicated low-quality samples from the analysis more advanced analytics, e.g. using the implemented functionality and visualizations in the MatrixQCvis package, should be scrutinized. Also, data quality should always be regarded in the context of the sample type and experimental settings, i.e. quality metrics should always be compared with regard to the sample type, experimental setup, instrumentation, etc..

The MsQuality package allows to calculate low-level quality metrics that require minimum information on mass spectrometry data: retention time, m/z values, and associated intensities. The list included in the mzQC framework is excessive, also including metrics that rely on more high-level information, that might not be readily accessible from .raw or .mzML files, e.g. pump pressure mean, or rely on alignment results, e.g. retention time mean shift, signal-to-noise ratio, precursor errors (ppm).

The MsQuality package is built upon the Spectra and the MsExperiment package. Metrics will be calculated based on the information stored in a Spectra object and the respective dataOrigin entries are used to distinguish between the mass spectral data of multiple samples. The MsExperiment serves as a container to store the mass spectral data of multiple samples. MsQuality enables the user to calculate quality metrics both on Spectra and MsExperiment objects.

In this vignette, we will (i) create some exemplary Spectra and MsExperiment objects, (ii) calculate the quality metrics on these data sets, and (iii) visualize some of the metrics.

1.1 Alternative software for data quality assessment

Other R packages are available in Bioconductor that are able to assess the quality of mass spectrometry data:

artMS uses MaxQuant output and enables to calculate several QC metrics, e.g. correlation matrix for technical replicates, calculation of total sum of intensities in biological replicates, total peptide counts in biological replicates, charge state distribution of PSMs identified in each biological replicates, or MS1 scan counts in each biological replicate.

MSstatsQC and the visualization tool MSstatsQCgui require csv files in long format from spectral processing tools such as Skyline and Panorama autoQC or MSnbase objects. MSstatsQC enables to generate individual, moving range, cumulative sum for mean, and/or cumulative sum for variability control charts for each metric. Metrics can be any kind of user-defined metric stored in the data columns for a given peptide, e.g. retention time and peak area.

MQmetrics provides a pipeline to analyze the quality of proteomics data sets from MaxQuant files and focuses on proteomics-/MaxQuant-specific metrics, e.g. proteins identified, peptides identified, or proteins versus peptide/protein ratio.

MatrixQCvis provides an interactive shiny-based interface to assess data quality at various processing steps (normalization, transformation, batch correction, and imputation) of rectangular matrices. The package includes several diagnostic plots and metrics such as barplots of intensity distributions, plots to visualize drifts, MA plots and Hoeffding’s D value calculation, and dimension reduction plots and provides specific tools to analyze data sets containing missing values as commonly observed in mass spectrometry.

proBatch enables to assess batch effects in (prote)omics data sets and corrects these batch effects in subsequent steps. Several tools to visualize data quality are included in the proBatch packages, such as barplots of intensity distributions, cluster and heatmap analysis tools, and PCA dimension reduction plots. Additionally, proBatch enables to assess diagnostics at the feature level, e.g. peptides or spike-ins.

2 Installation

To install this package, start R and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

## to install from Bioconductor
BiocManager::install("MsQuality")

## to install from GitHub
BiocManager::install("tnaake/MsQuality")

This will install this package and all eventually missing dependencies.

Questions and bugs

MsQuality is currently under active development. If you discover any bugs, typos or develop ideas of improving MsQuality feel free to raise an issue via GitHub or send a mail to the developer.

3 Create Spectra and MsExperiment objects

Load the Spectra package.

library("Spectra")
library("MsExperiment")
library("MsQuality")

3.1 Create Spectra and MsExperiment objects from mzML files

There are several options available to create a Spectra object. One way, as outlined in the vignette of the Spectra package is by specifying the location of mass spectrometry raw files in mzML, mzXML or CDF format and using the MsBackendMzR backend. Here we load the example files from the sciex data set of the msdata package and create a Spectra object from the two provided mzML files. The example is taken from the Spectra vignette.

## this example is taken from the Spectra vignette
fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
sps_sciex <- Spectra(fls, backend = MsBackendMzR())

The data set consists of a single sample measured in two different injections to the same LC-MS setup. An empty instance of an MsExperiment object is created and populated with information on the samples by assigning data on the samples (sampleData), information on the mzML files (MsExperimentFiles) and spectral information (spectra). In a last step, using linkSampleData, the relationships between the samples and the spectral information are defined.

## this example is taken from the Spectra vignette
lmse <- MsExperiment()
sd <- DataFrame(sample_id = c("QC1", "QC2"),
    sample_name = c("QC Pool", "QC Pool"),
    injection_idx = c(1, 3))
sampleData(lmse) <- sd
## add mzML files to the experiment
experimentFiles(lmse) <- MsExperimentFiles(mzML_files = fls)
## add the Spectra object to the experiment
spectra(lmse) <- sps_sciex
## use linkSampleData to establish and define relationships between sample 
## annotations and MS data
lmse <- linkSampleData(lmse, with = "experimentFiles.mzML_file",
    sampleIndex = c(1, 2), withIndex = c(1, 2))

3.2 Create Spectra and MsExperiment objects from (feature-extracted) intensity tables

Another common approach is the creation of Spectra objects from a DataFrames using the MsBackendDataFrame backend.

We will use here the data set of Lee et al. (2019), that contains metabolite level information measured by reverse phase liquid chromatography (RPLC) coupled to mass spectrometry and hydrophilic interaction liquid chromatography (HILIC) coupled to mass spectrometry (derived from the file STables - rev1.xlsx in the Supplementary Information).

In a separate step (see documentation for Lee2019_meta_vals and Lee2019), we have created a list containing Spectra objects for each samples (objects sps_l_rplc and sps_l_hilic) and MsExperiment objects containing the data of all samples (objects msexp_rplc and msexp_hilic). We will load here these objects:

data("Lee2019", package = "MsQuality")

The final data set contains 541 paired samples (i.e. 541 samples derived from RPLC and 541 samples derived from HILIC).

We will combine the sps_rplc and sps_hilic objects in the following and calculate on this combined document the metrics.

sps_comb <- c(sps_rplc, sps_hilic)

The most important function to assess the data quality and to calculate the metrics is the calculateMetrics function. The function takes a Spectra or MsExperiment object as input, a character vector of metrics to be calculated, and, optionally a list of parameters passed to the quality metrics functions.

4 Calculating the quality metrics on Spectra and MsExperiment objects

Currently, the following metrics are included:

qualityMetrics(msexp_rplc)
##  [1] "rtDuration"                         "rtOverTicQuantiles"                
##  [3] "rtOverMsQuarters"                   "ticQuartileToQuartileLogRatio"     
##  [5] "numberSpectra"                      "medianPrecursorMz"                 
##  [7] "rtIqr"                              "rtIqrRate"                         
##  [9] "areaUnderTic"                       "areaUnderTicRtQuantiles"           
## [11] "extentIdentifiedPrecursorIntensity" "medianTicRtIqr"                    
## [13] "medianTicOfRtRange"                 "mzAcquisitionRange"                
## [15] "rtAcquisitionRange"                 "precursorIntensityRange"           
## [17] "precursorIntensityQuartiles"        "precursorIntensityMean"            
## [19] "precursorIntensitySd"               "msSignal10xChange"                 
## [21] "ratioCharge1over2"                  "ratioCharge3over2"                 
## [23] "ratioCharge4over2"                  "meanCharge"                        
## [25] "medianCharge"

The following list gives a brief explanation for these metrics. Further information may be found at the HUPO-PSI mzQC project page or in the respective help file for the quality metric (accessible by e.g. entering ?rtDuration to the R console). Currently, all quality metrics can be calculated for both Spectra and MsExperiment objects.

  • rtDuration, RT duration (QC:4000053), “The retention time duration of the MS run in seconds, similar to the highest scan time minus the lowest scan time.” [PSI:QC];
  • rtOverTicQuantiles, RT over TIC quantile (QC:4000054), “The interval when the respective quantile of the TIC accumulates divided by retention time duration. The number of quantiles observed is given by the size of the tuple.” [PSI:QC];
  • rtOverMsQuarters, MS1 quantiles RT fraction (QC:4000055), “The interval used for acquisition of the first, second, third, and fourth quarter of all MS1 events divided by RT-Duration.” [PSI:QC];
  • rtOverMsQuarters, MS1/MS2 quantiles RT fraction (QC:4000055), “The interval used for acquisition of the first, second, third, and fourth quarter of all MS2 events divided by RT-Duration.” [PSI:QC];
  • ticQuartileToQuartileLogRatio, MS1 TIC-change quartile ratios (MS:4000057), “The log ratios of successive TIC-change quartiles. The TIC changes are the list of MS1 total ion current (TIC) value changes from one to the next scan, produced when each MS1 TIC is subtracted from the preceding MS1 TIC. The metric’s value triplet represents the log ratio of the TIC-change Q2 to Q1, Q3 to Q2, TIC-change-max to Q3” [PSI:MS], mode = "TIC_change";
  • ticQuartileToQuartileLogRatio, MS1 TIC quartile ratios (MS:4000058), The log ratios of successive TIC quartiles. The metric’s value triplet represents the log ratios of TIC-Q2 to TIC-Q1, TIC-Q3 to TIC-Q2, TIC-max to TIC-Q3." [PSI:MS], `mode = “TIC”;
  • numberSpectra, Number of MS1 spectra (QC:4000059), “The number of MS1 events in the run.” [PSI:QC];
  • numberSpectra, Number of MS2 spectra (QC:4000060), “The number of MS2 events in the run.” [PSI:QC];
  • medianPrecursorMz, Precursor median m/z for IDs (QC:4000065), “Median m/z value for all identified peptides (unique ions) after FDR.” [PSI:QC];
  • rtIqr, Interquartile RT period for peptide identifications (QC:4000072), “The interquartile retention time period, in seconds, for all peptide identifications over the complete run.” [PSI:QC];
  • rtIqrRate, Peptide identification rate of the interquartile RT period (QC:4000073), “The identification rate of peptides for the interquartile retention time period, in peptides per second.” [PSI:QC];
  • areaUnderTic, Area under TIC (QC:4000077), “The area under the total ion chromatogram.” [PSI:QC];
  • areaUnderTicRtQuantiles, Area under TIC RT quantiles (QC:4000078), “The area under the total ion chromatogram of the retention time quantiles. Number of quantiles are given by the n-tuple.” [PSI:QC];
  • extentIdentifiedPrecursorIntensity, Extent of identified precursor intensity (QC:4000125), “Ratio of 95th over 5th percentile of precursor intensity for identified peptides” [PSI:QC];
  • medianTicRtIqr, Median of TIC values in the RT range in which the middle half of peptides are identified (QC:4000130), “Median of TIC values in the RT range in which half of peptides are identified (RT values of Q1 to Q3 of identifications)” [PSI:QC];
  • medianTicOfRtRange, Median of TIC values in the shortest RT range in which half of the peptides are identified (QC:4000132), “Median of TIC values in the shortest RT range in which half of the peptides are identified” [PSI:QC]
  • mzAcquisitionRange, m/z acquisition range (QC:4000138), “Upper and lower limit of m/z values at which spectra are recorded.” [PSI:QC];
  • rtAcquisitionRange, Retention time acquisition range (QC:4000139), “Upper and lower limit of time at which spectra are recorded.” [PSI:QC];
  • precursorIntensityRange, Precursor intensity range (QC:4000144), “Minimum and maximum precursor intensity recorded.” [PSI:QC];
  • precursorIntensityQuartiles, Precursor intensity distribution Q1, Q2, Q3 (QC:4000167), “From the distribution of precursor intensities, the quartiles Q1, Q2, Q3” [PSI:QC];
  • precursorIntensityQuartiles, Identified precursor intensity distribution Q1, Q2, Q3 (QC:4000228), “From the distribution of identified precursor intensities, the quartiles Q1, Q2, Q3” [PSI:QC];
  • precursorIntensityQuartiles, Unidentified precursor intensity distribution Q1, Q2, Q3 (QC:4000233), “From the distribution of unidentified precursor intensities, the quartiles Q1, Q2, Q3” [PSI:QC];
  • precursorIntensityMean, Precursor intensity distribution mean (QC:4000168), “From the distribution of precursor intensities, the mean.” [PSI:QC];
  • precursorIntensityMean, Identified precursor intensity distribution mean (QC:4000229), “From the distribution of identified precursor intensities, the mean” [PSI:QC];
  • precursorIntensityMean, Unidentified precursor intensity distribution mean (QC:4000234), “From the distribution of unidentified precursor intensities, the mean” [PSI:QC];
  • precursorIntensitySd, Precursor intensity distribution sigma (QC:4000169), “From the distribution of precursor intensities, the sigma value.” [PSI:QC];
  • precursorIntensitySd, Identified precursor intensity distribution sigma (QC:4000230), “From the distribution of identified precursor intensities, the sigma value” [PSI:QC];
  • precursorIntensitySD, Unidentified precursor intensity distribution sigma (QC:400235), “From the distribution of unidentified precursor intensities, the sigma value” [PSI:QC];
  • msSignal10xChange, MS1 signal jump (10x) count (QC:4000172), “The count of MS1 signal jump (spectra sum) by a factor of ten or more (10x) between two subsequent scans” [PSI:QC];
  • msSignal10xChange, MS1 signal fall (10x) count (QC:4000173), “The count of MS1 signal decline (spectra sum) by a factor of ten or more (10x) between two subsequent scans” [PSI:QC];
  • RatioCharge1over2, Charged peptides ratio 1+ over 2+ (QC:4000174), “Ratio of 1+ peptide count over 2+ peptide count in identified spectra” [PSI:QC];
  • RatioCharge1over2, Charged spectra ratio 1+ over 2+ (QC:4000179), “Ratio of 1+ spectra count over 2+ spectra count in all MS2” [PSI:QC];
  • RatioCharge3over2, Charged peptides ratio 3+ over 2+ (QC:4000175), “Ratio of 3+ peptide count over 2+ peptide count in identified spectra” [PSI:QC];
  • RatioCharge3over2, Charged spectra ratio 3+ over 2+ (QC:4000180), “Ratio of 3+ peptide count over 2+ peptide count in all MS2” [PSI:QC];
  • RatioCharge4over2, Charged peptides ratio 4+ over 2+ (QC:4000176), “Ratio of 4+ peptide count over 2+ peptide count in identified spectra” [PSI:QC];
  • RatioCharge4over2, Charged spectra ratio 4+ over 2+ (QC:4000181), “Ratio of 4+ peptide count over 2+ peptide count in all MS2” [PSI:QC];
  • meanCharge, Mean charge in identified spectra (QC:4000177), “Mean charge in identified spectra” [PSI:QC];
  • meanCharge, Mean precursor charge in all MS2 (QC:4000182), “Mean precursor charge in all MS2” [PSI:QC];
  • medianCharge, Median charge in identified spectra (QC:4000178), “Median charge in identified spectra” [PSI:QC];
  • medianCharge, Median precursor charge in all MS2 (QC:4000183), “Median precursor charge in all MS2” [PSI:QC].

The most important function to assess the data quality and to calculate the metrics is the calculateMetrics function. The function takes a Spectra or MsExperiment object as input, a character vector of metrics to be calculated, and, optionally a list of parameters passed to the quality metrics functions.

When passing a Spectra/MsExperiment object to the function, a data.frame returned by calculateMetrics with the metrics specified by the argument metrics. By default, qualityMetrics(object) is taken to specify the calculation of quality metrics. calculateMetrics also accepts a list of parameters passed to the individual quality metrics functions. For each quality metrics functions, the relevant parameters are selected based on the accepted arguments.

Additional arguments can be given to the quality metrics functions. For example, the function ticQuartileToQuartileLogRatio function has the arguments relativeTo, mode, and msLevel. relativeTo specifies to which quantile the log TIC quantile is relatively related to (either to the 1st quantile or the respective previous one). mode (either "TIC_change" or "TIC") specifies if the quantiles are taken from the changes between TICs of scan events or the TICs directly. One Spectra/MsExperiment object may also contain more than one msLevel, e.g. if it also contains information on MS\(^2\) or MS\(^3\) features. If the user adds the arguments relativeTo = "Q1", mode = "TIC", msLevel = c(1L, 2L)), ticQuartileToQuartileLogRatio is run with the parameter combinations relativeTo = "Q1", mode = "TIC", msLevel = c(1L, 2L).

The results based on these parameter combinations are returned and the used parameters are returned as attributes to the returned vector.

Here, we would like to calculate the metrics of all included quality metrics functions (qualityMetrics(object)) and additionally pass the parameter relativeTo = "Q1" and relativeTo = "previous". For computational reasons, we will restrict the calculation of the metrics to the first
sample and to RPLC samples.

## subset the Spectra objects
sps_comb_subset <- sps_comb[grep("Sample.1_", sps_comb$dataOrigin), ]

## for RPLC and HILIC
metrics_sps_Q1 <- calculateMetrics(object = sps_comb_subset,
    metrics = qualityMetrics(sps_comb_subset),
    relativeTo = "Q1", msLevel = 1L)
metrics_sps_Q1
##                rtDuration rtOverTicQuantiles.0% rtOverTicQuantiles.25%
## Sample.1_RPLC      18.214                     0             0.08098166
## Sample.1_HILIC     16.000                     0             0.34375000
##                rtOverTicQuantiles.50% rtOverTicQuantiles.75%
## Sample.1_RPLC              0.08098166              0.1495004
## Sample.1_HILIC             0.34375000              0.5375000
##                rtOverTicQuantiles.100% rtOverMsQuarters.Quarter1
## Sample.1_RPLC                        1                0.02893379
## Sample.1_HILIC                       1                0.15625000
##                rtOverMsQuarters.Quarter2 rtOverMsQuarters.Quarter3
## Sample.1_RPLC                 0.08806413                 0.3102009
## Sample.1_HILIC                0.47500000                 0.6216875
##                rtOverMsQuarters.Quarter4 ticQuartileToQuartileLogRatio.Q2/Q1
## Sample.1_RPLC                          1                                 NaN
## Sample.1_HILIC                         1                                -Inf
##                ticQuartileToQuartileLogRatio.Q3/Q1
## Sample.1_RPLC                                  NaN
## Sample.1_HILIC                                 NaN
##                ticQuartileToQuartileLogRatio.Q4/Q1 numberSpectra
## Sample.1_RPLC                                  NaN           190
## Sample.1_HILIC                                 NaN           165
##                medianPrecursorMz   rtIqr rtIqrRate areaUnderTic
## Sample.1_RPLC             198.05 5.10125  18.42686    655624525
## Sample.1_HILIC            179.10 7.44700  11.14543    149945016
##                areaUnderTicRtQuantiles.25% areaUnderTicRtQuantiles.50%
## Sample.1_RPLC                   25612593.2                   389734297
## Sample.1_HILIC                    927493.9                   100961764
##                areaUnderTicRtQuantiles.75% areaUnderTicRtQuantiles.100%
## Sample.1_RPLC                    233673968                      5692460
## Sample.1_HILIC                    40996727                      5460065
##                extentIdentifiedPrecursorIntensity medianTicRtIqr
## Sample.1_RPLC                           36467.884       7496.006
## Sample.1_HILIC                           8713.906       2195.400
##                medianTicOfRtRange mzAcquisitionRange.min mzAcquisitionRange.max
## Sample.1_RPLC           13858.935                   60.1                 1377.6
## Sample.1_HILIC           2086.417                   73.0                  784.1
##                rtAcquisitionRange.min rtAcquisitionRange.max
## Sample.1_RPLC                   0.986                   19.2
## Sample.1_HILIC                  1.100                   17.1
##                precursorIntensityRange.min precursorIntensityRange.max
## Sample.1_RPLC                     100.6492                   349282909
## Sample.1_HILIC                    100.0895                    92021751
##                precursorIntensityQuartiles.Q1 precursorIntensityQuartiles.Q2
## Sample.1_RPLC                       1667.3911                       6746.195
## Sample.1_HILIC                       300.5038                       2304.383
##                precursorIntensityQuartiles.Q3 precursorIntensityMean
## Sample.1_RPLC                        75003.90              3450655.4
## Sample.1_HILIC                       33382.25               908757.7
##                precursorIntensitySd msSignal10xChange ratioCharge1over2
## Sample.1_RPLC              26563228                56               NaN
## Sample.1_HILIC              7729451                47               NaN
##                ratioCharge3over2 ratioCharge4over2 meanCharge medianCharge
## Sample.1_RPLC                NaN               NaN        NaN           NA
## Sample.1_HILIC               NaN               NaN        NaN           NA
## attr(,"relativeTo")
## [1] "Q1"
## attr(,"msLevel")
## [1] 1
metrics_sps_previous <- calculateMetrics(object = sps_comb_subset,
    metrics = qualityMetrics(sps_comb_subset),
    relativeTo = "previous", msLevel = 1L)
metrics_sps_previous
##                rtDuration rtOverTicQuantiles.0% rtOverTicQuantiles.25%
## Sample.1_RPLC      18.214                     0             0.08098166
## Sample.1_HILIC     16.000                     0             0.34375000
##                rtOverTicQuantiles.50% rtOverTicQuantiles.75%
## Sample.1_RPLC              0.08098166              0.1495004
## Sample.1_HILIC             0.34375000              0.5375000
##                rtOverTicQuantiles.100% rtOverMsQuarters.Quarter1
## Sample.1_RPLC                        1                0.02893379
## Sample.1_HILIC                       1                0.15625000
##                rtOverMsQuarters.Quarter2 rtOverMsQuarters.Quarter3
## Sample.1_RPLC                 0.08806413                 0.3102009
## Sample.1_HILIC                0.47500000                 0.6216875
##                rtOverMsQuarters.Quarter4 ticQuartileToQuartileLogRatio.Q2/Q1
## Sample.1_RPLC                          1                                 NaN
## Sample.1_HILIC                         1                                -Inf
##                ticQuartileToQuartileLogRatio.Q3/Q2
## Sample.1_RPLC                             5.831233
## Sample.1_HILIC                                 Inf
##                ticQuartileToQuartileLogRatio.Q4/Q3 numberSpectra
## Sample.1_RPLC                             8.915513           190
## Sample.1_HILIC                            8.982638           165
##                medianPrecursorMz   rtIqr rtIqrRate areaUnderTic
## Sample.1_RPLC             198.05 5.10125  18.42686    655624525
## Sample.1_HILIC            179.10 7.44700  11.14543    149945016
##                areaUnderTicRtQuantiles.25% areaUnderTicRtQuantiles.50%
## Sample.1_RPLC                   25612593.2                   389734297
## Sample.1_HILIC                    927493.9                   100961764
##                areaUnderTicRtQuantiles.75% areaUnderTicRtQuantiles.100%
## Sample.1_RPLC                    233673968                      5692460
## Sample.1_HILIC                    40996727                      5460065
##                extentIdentifiedPrecursorIntensity medianTicRtIqr
## Sample.1_RPLC                           36467.884       7496.006
## Sample.1_HILIC                           8713.906       2195.400
##                medianTicOfRtRange mzAcquisitionRange.min mzAcquisitionRange.max
## Sample.1_RPLC           13858.935                   60.1                 1377.6
## Sample.1_HILIC           2086.417                   73.0                  784.1
##                rtAcquisitionRange.min rtAcquisitionRange.max
## Sample.1_RPLC                   0.986                   19.2
## Sample.1_HILIC                  1.100                   17.1
##                precursorIntensityRange.min precursorIntensityRange.max
## Sample.1_RPLC                     100.6492                   349282909
## Sample.1_HILIC                    100.0895                    92021751
##                precursorIntensityQuartiles.Q1 precursorIntensityQuartiles.Q2
## Sample.1_RPLC                       1667.3911                       6746.195
## Sample.1_HILIC                       300.5038                       2304.383
##                precursorIntensityQuartiles.Q3 precursorIntensityMean
## Sample.1_RPLC                        75003.90              3450655.4
## Sample.1_HILIC                       33382.25               908757.7
##                precursorIntensitySd msSignal10xChange ratioCharge1over2
## Sample.1_RPLC              26563228                56               NaN
## Sample.1_HILIC              7729451                47               NaN
##                ratioCharge3over2 ratioCharge4over2 meanCharge medianCharge
## Sample.1_RPLC                NaN               NaN        NaN           NA
## Sample.1_HILIC               NaN               NaN        NaN           NA
## attr(,"relativeTo")
## [1] "previous"
## attr(,"msLevel")
## [1] 1

Alternatively, an MsExperiment object might be passed to calculateMetrics. The function will iterate over the samples (referring to rows in sampleData(msexp))) and calculate the quality metrics on the corresponding Spectras.

There are in total 541 samples respectively in the objects msexp_rplc and msexp_hilic. To improve the visualization and interpretability, we will only calculate the metrics from the first 20 of these samples.

## subset the MsExperiment objects
msexp_rplc_subset <- msexp_rplc[1:20]
msexp_hilic_subset <- msexp_hilic[1:20]

## define metrics
metrics_sps <- c("rtDuration", "rtOverTicQuantiles", "rtOverMsQuarters",
    "ticQuartileToQuartileLogRatio", "numberSpectra", "medianPrecursorMz",
    "rtIqr", "rtIqrRate", "areaUnderTic")

## for RPLC-derived MsExperiment
metrics_rplc_msexp <- calculateMetrics(object = msexp_rplc_subset,
    metrics = qualityMetrics(msexp_rplc_subset),
    relativeTo = "Q1", msLevel = 1L)

## for HILIC-derived MsExperiment
metrics_hilic_msexp <- calculateMetrics(object = msexp_hilic_subset,
    metrics = qualityMetrics(msexp_hilic_subset),
    relativeTo = "Q1", msLevel = 1L)

When passing an MsExperiment object to calculateMetrics a data.frame object is returned with the samples (derived from the rownames of sampleData(msexp)) in the rows and the metrics in columns.

We will show here the objects metrics_rplc_msexp and metrics_hilic_msexp

## [1] "metrics_rplc_msexp"
## [1] "metrics_hilic_msexp"

5 Visualizing the results

The quality metrics can be most easily compared when graphically visualized.

The MsQuality package offers the possibility to graphically display the metrics using the plotMetric and shinyMsQuality functions. The plotMetric function will create one plot based on a single metric. shinyMsQuality, on the other hand, opens a shiny application that allows to browse through all the metrics stored in the object.

As a way of example, we will plot here the number of features. A high number of missing features might indicate low data quality, however, also different sample types might exhibit contrasting number of detected features. As a general rule, only samples of the same type should be compared to adjust for sample type-specific effects.

metrics_msexp <- rbind(metrics_rplc_msexp, metrics_hilic_msexp)
plotMetric(qc = metrics_msexp, metric = "numberSpectra")

Similarly, we are able to display the area under the TIC for the retention time quantiles. This plot gives information on the perceived signal (TIC) for the differnt retention time quantiles and could indicate drifts or interruptions of sensitivity during the run.

plotMetric(qc = metrics_msexp, metric = "ticQuartileToQuartileLogRatio")

Alternatively, to browse through all metrics that were calculated in an interactive way, we can use the shinyMsQuality function.

shinyMsQuality(qc = metrics_msexp)

Appendix

Session information

All software and respective versions to build this vignette are listed here:

## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] MsExperiment_1.2.0  Spectra_1.10.0      ProtGenerics_1.32.0
## [4] BiocParallel_1.34.0 S4Vectors_0.38.0    BiocGenerics_0.46.0
## [7] MsQuality_1.0.0     knitr_1.42          BiocStyle_2.28.0   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0            viridisLite_0.4.1          
##  [3] dplyr_1.1.2                 bitops_1.0-7               
##  [5] fastmap_1.1.1               lazyeval_0.2.2             
##  [7] RCurl_1.98-1.12             promises_1.2.0.1           
##  [9] digest_0.6.31               mime_0.12                  
## [11] lifecycle_1.0.3             cluster_2.1.4              
## [13] ellipsis_0.3.2              magrittr_2.0.3             
## [15] compiler_4.3.0              rlang_1.1.0                
## [17] sass_0.4.5                  tools_4.3.0                
## [19] igraph_1.4.2                utf8_1.2.3                 
## [21] yaml_2.3.7                  data.table_1.14.8          
## [23] labeling_0.4.2              htmlwidgets_1.6.2          
## [25] curl_5.0.0                  DelayedArray_0.26.0        
## [27] RColorBrewer_1.1-3          withr_2.5.0                
## [29] purrr_1.0.1                 grid_4.3.0                 
## [31] fansi_1.0.4                 xtable_1.8-4               
## [33] colorspace_2.1-0            ggplot2_3.4.2              
## [35] scales_1.2.1                MASS_7.3-59                
## [37] MultiAssayExperiment_1.26.0 SummarizedExperiment_1.30.0
## [39] cli_3.6.1                   rmarkdown_2.21             
## [41] generics_0.1.3              httr_1.4.5                 
## [43] cachem_1.0.7                stringr_1.5.0              
## [45] zlibbioc_1.46.0             parallel_4.3.0             
## [47] AnnotationFilter_1.24.0     BiocManager_1.30.20        
## [49] XVector_0.40.0              matrixStats_0.63.0         
## [51] vctrs_0.6.2                 Matrix_1.5-4               
## [53] jsonlite_1.8.4              bookdown_0.33              
## [55] IRanges_2.34.0              clue_0.3-64                
## [57] crosstalk_1.2.0             plotly_4.10.1              
## [59] jquerylib_0.1.4             tidyr_1.3.0                
## [61] glue_1.6.2                  codetools_0.2-19           
## [63] QFeatures_1.10.0            stringi_1.7.12             
## [65] gtable_0.3.3                later_1.3.0                
## [67] GenomeInfoDb_1.36.0         GenomicRanges_1.52.0       
## [69] shinydashboard_0.7.2        munsell_0.5.0              
## [71] tibble_3.2.1                pillar_1.9.0               
## [73] htmltools_0.5.5             GenomeInfoDbData_1.2.10    
## [75] R6_2.5.1                    evaluate_0.20              
## [77] shiny_1.7.4                 lattice_0.21-8             
## [79] Biobase_2.60.0              httpuv_1.6.9               
## [81] bslib_0.4.2                 Rcpp_1.0.10                
## [83] xfun_0.39                   MsCoreUtils_1.12.0         
## [85] fs_1.6.2                    MatrixGenerics_1.12.0      
## [87] pkgconfig_2.0.3

References

Lee, H.-J., D. M. Kremer, P. Sajjakulnukit, L. Zhang, and C. A. Lyssiotis. 2019. “A Large-Scale Analysis of Targeted Metabolomics Data from Heterogeneous Biological Samples Provides Insights into Metabolite Dynamics.” Metabolomics, 103. https://doi.org/10.1007/s11306-019-1564-8.