Working with MSstatsConvert

Purpose of MSstatsConvert

The MSstatsConvert package is a member of the MSstats family of packages, alongside MSstats and MSstatsTMT. It creates an abstraction for the steps in mass spectrometry (MS) data analysis that are required before a dataset can be used for statistical modeling. In short, the package is responsible for converting output from signal processing tools such as OpenMS or MaxQuant into a format suitable for statistical analysis. This includes standardizing column names, filtering, removing shared peptides and uninformative features, summarizing duplicate measurements, handling fractionation, and creating a balanced design.

MSstatsConvert allows for transforming any MS quantification result into a format required by MSstats and MSstatsTMT packages. Additionally, it provides built-in cleaning functions for outputs of DIAUmpire, MaxQuant, OpenMS, OpenSWATH, Progenesis, ProteomeDiscoverer, Skyline, Spectromine, and Spectronaut. These functions serve as a base for converter functions (called *toMSstatsFormat or *toMSstatsTMTFormat) provided by the MSstats and MSstatsTMT packages.

MSstats data format

MSstats family packages work with label-free, SRM and TMT datasets. The following columns are required.

Additionally, if the experiment involves fractionation, a Fraction column can be added to store fraction labels.

Logging

MSstatsConvert allows for flexible logging based on the log4r package. Information about preprocessing steps can be written to a file, to a console, to both or to neither. The MSstatsLogsSettings function helps manage log settings. The user can pass a path to an existing file to the log_file_path parameter. Combined with setting append = TRUE, this allows writing all information related to a specific data analysis to a single file. If a user does not specify a file, a new file will be created automatically with a name starting with “MSstats_log”, followed by a timestamp.

library(MSstatsConvert)
# default - creates a new file
MSstatsLogsSettings(use_log_file = TRUE, append = FALSE) 

# appends to the file given by log_file_path
MSstatsLogsSettings(use_log_file = TRUE, append = TRUE, 
                    log_file_path = "log_file.log") 

# switches logging off
MSstatsLogsSettings(use_log_file = FALSE, append = FALSE) 

# switches off logs and messages
MSstatsLogsSettings(use_log_file = FALSE, verbose = FALSE) 

Additionally, session info generated by the utils::sessionInfo() function can be saved to file with the MSstatsSaveSessionInfo function.

MSstatsSaveSessionInfo()

By default, the output file name will start with “MSstats_session_info” and end with the current timestamp.

Importing and cleaning data

MS data processing by MSstatsConvert starts with importing and cleaning data. The MSstatsImport function produces a wrapper for possibly multiple files that may describe a single dataset. For example, MaxQuant output consists of two files, while OpenMS outputs just a single file.

maxquant_evidence = read.csv(system.file("tinytest/raw_data/MaxQuant/mq_ev.csv",
                                         package = "MSstatsConvert"))
maxquant_proteins = read.csv(system.file("tinytest/raw_data/MaxQuant/mq_pg.csv",
                                         package = "MSstatsConvert"))
maxquant_imported = MSstatsImport(list(evidence = maxquant_evidence,
                                       protein_groups = maxquant_proteins),
                                  type = "MSstats", tool = "MaxQuant")
is(maxquant_imported)
#> [1] "MSstatsMaxQuantFiles" "MSstatsInputFiles"

openms_input = read.csv(system.file(
  "tinytest/raw_data/OpenMSTMT/openmstmt_input.csv",
  package = "MSstatsConvert"
))
openms_imported = MSstatsImport(list(input = openms_input),
                                "MSstatsTMT", "OpenMS")
is(openms_imported)
#> [1] "MSstatsOpenMSFiles" "MSstatsInputFiles"

The getInputFile method allows the user to retrieve the files:

getInputFile(maxquant_imported, "evidence")[1:5, 1:5]
#>       Sequence Length Modifications Modifiedsequence OxidationMProbabilities
#>         <char>  <int>        <char>           <char>                  <char>
#> 1: AEAPAAAPAAK     11    Unmodified    _AEAPAAAPAAK_                        
#> 2: AEAPAAAPAAK     11    Unmodified    _AEAPAAAPAAK_                        
#> 3: AEAPAAAPAAK     11    Unmodified    _AEAPAAAPAAK_                        
#> 4: AEAPAAAPAAK     11    Unmodified    _AEAPAAAPAAK_                        
#> 5: AEAPAAAPAAK     11    Unmodified    _AEAPAAAPAAK_

As the next step of the analysis, input files are combined into a single data.table with standardized column names by the MSstatsClean function. It is a generic function with built-in support for outputs of the tools listed in the “Purpose of MSstatsConvert” section. The type parameter is equal to either MSstats or MSstatsTMT and indicates whether the data come from a labelled TMT experiment.

For some datasets, MSstatsClean may require additional parameters described in the respective help files. For our example datasets, the following calls merge input files into a single table.

maxquant_cleaned = MSstatsClean(maxquant_imported, protein_id_col = "Proteins")
head(maxquant_cleaned)
#>    ProteinName PeptideSequence Modifications PrecursorCharge
#>         <char>          <char>        <char>           <int>
#> 1:      P06959     AEAPAAAPAAK    Unmodified               2
#> 2:      P06959     AEAPAAAPAAK    Unmodified               2
#> 3:      P06959     AEAPAAAPAAK    Unmodified               2
#> 4:      P06959     AEAPAAAPAAK    Unmodified               2
#> 5:      P06959     AEAPAAAPAAK    Unmodified               2
#> 6:      P06959     AEAPAAAPAAK    Unmodified               2
#>                                            Run Intensity   Score
#>                                         <char>     <num>   <num>
#> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1   4023100  76.332
#> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2   5132500  83.081
#> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3   2761600 104.430
#> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2   4091800  94.465
#> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3   4727000  88.596
#> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2   2258400  90.050

openms_cleaned = MSstatsClean(openms_imported)
head(openms_cleaned)
#>             ProteinName                               PeptideSequence
#>                  <char>                                        <char>
#> 1: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#>    PrecursorCharge
#>              <int>
#> 1:               3
#> 2:               3
#> 3:               3
#> 4:               3
#> 5:               3
#> 6:               3
#>                                                                 PSM Condition
#>                                                              <char>    <char>
#> 1: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198   Long_HF
#> 2: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402   Long_HF
#> 3: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198   Long_HF
#> 4: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402   Long_HF
#> 5: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_6190.04195694402   Long_HF
#> 6: .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3_4359.56536443198   Long_HF
#>    BioReplicate    Run Channel Intensity Mixture TechRepMixture Fraction
#>           <int> <char>   <int>     <num>   <int>         <char>    <int>
#> 1:           21  3_3_3       1        NA       3            3_3        3
#> 2:           21  3_3_3       1        NA       3            3_3        3
#> 3:           24  3_3_3       4        NA       3            3_3        3
#> 4:           24  3_3_3       4        NA       3            3_3        3
#> 5:           26  3_3_3       6        NA       3            3_3        3
#> 6:           26  3_3_3       6  1820.072       3            3_3        3

If a user wants to use the MSstatsConvert package with data in a format that is not currently supported, it is enough to first re-format the data into a data.table with columns ProteinName, PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge (with the latter two possibly equal to NA), Run and IsotopeLabelType (in case of non-TMT data) or Channel (in case of TMT data). Moreover, the dataset may include any column that will be used for filtering the dataset (for example a column that stores q-values). In our example, such additional columns are “Modifications” and “Score” from MaxQuant files.

Annotation columns should be called Condition and BioReplicate. For TMT data, Mixture and TechRepMixture columns may be added. Fractionation should be indicated by a Fraction column.
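As a minimal sketch of the re-formatting described above, the following base R snippet builds a small label-free table by hand and checks that the listed columns are present. All values, the Qvalue column and the object names are made up for illustration; an Intensity column is included because the cleaned example tables carry one. A data.frame is used to keep the sketch dependency-free, and data.table::as.data.table() could convert it afterwards.

```r
# Hypothetical label-free measurements laid out with the column names listed above.
custom_input <- data.frame(
  ProteinName      = c("P1", "P1", "P2", "P2"),
  PeptideSequence  = c("AEAPK", "AEAPK", "LLGTR", "LLGTR"),
  PrecursorCharge  = c(2L, 2L, 3L, 3L),
  FragmentIon      = NA,   # allowed to be NA
  ProductCharge    = NA,   # allowed to be NA
  Run              = c("run1", "run2", "run1", "run2"),
  IsotopeLabelType = "L",  # for TMT data, a Channel column would be used instead
  Intensity        = c(1000, 1200, 560, NA),
  Qvalue           = c(0.001, 0.020, 0.004, 0.300),  # extra column kept for filtering
  stringsAsFactors = FALSE
)

# Check that every column named in the text is present.
required_columns <- c("ProteinName", "PeptideSequence", "PrecursorCharge",
                      "FragmentIon", "ProductCharge", "Run", "IsotopeLabelType")
missing_columns <- setdiff(required_columns, colnames(custom_input))
missing_columns
```

An empty result means the table has the expected layout and only the annotation columns (Condition and BioReplicate) remain to be supplied.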

Preprocessing

The goal of the MSstatsPreprocess function is to transform cleaned MS data into a format ready for statistical analysis with MSstats or MSstatsTMT packages. This function accepts several parameters, and each corresponds to a preprocessing step.

maxquant_annotation = read.csv(system.file(
  "tinytest/raw_data/MaxQuant/annotation.csv",
  package = "MSstatsConvert"
))
maxquant_annotation = MSstatsMakeAnnotation(maxquant_cleaned,
                                            maxquant_annotation,
                                            Run = "Rawfile")
m_filter = list(col_name = "PeptideSequence", 
                pattern = "M", 
                filter = TRUE, 
                drop_column = FALSE)

oxidation_filter = list(col_name = "Modifications", 
                        pattern = "Oxidation", 
                        filter = TRUE, 
                        drop_column = TRUE)

feature_columns = c("PeptideSequence", "PrecursorCharge")
maxquant_processed = MSstatsPreprocess(
  maxquant_cleaned, 
  maxquant_annotation,
  feature_columns,
  remove_shared_peptides = TRUE, 
  remove_single_feature_proteins = FALSE,
  pattern_filtering = list(oxidation = oxidation_filter,
                           m = m_filter),
  feature_cleaning = list(remove_features_with_few_measurements = TRUE,
                          summarize_multiple_psms = max),
  columns_to_fill = list("FragmentIon" = NA,
                         "ProductCharge" = NA,
                         "IsotopeLabelType" = "L"))
head(maxquant_processed)
#>                                            Run PeptideSequence PrecursorCharge
#>                                         <char>          <char>           <int>
#> 1: 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1     AEAPAAAPAAK               2
#> 2: 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2     AEAPAAAPAAK               2
#> 3: 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3     AEAPAAAPAAK               2
#> 4: 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2     AEAPAAAPAAK               2
#> 5: 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3     AEAPAAAPAAK               2
#> 6: 121219_S_CCES_01_08_LysC_Try_1to10_Mixt_3_2     AEAPAAAPAAK               2
#>    Intensity ProteinName Condition BioReplicate Experiment IsotopeLabelType
#>        <num>      <char>     <int>        <int>     <char>           <char>
#> 1:   4023100      P06959         1            1        1_1                L
#> 2:   5132500      P06959         1            1        1_2                L
#> 3:   2761600      P06959         1            1        1_3                L
#> 4:   4091800      P06959         2            2        2_2                L
#> 5:   4727000      P06959         2            2        2_3                L
#> 6:   2258400      P06959         3            3        3_2                L
#>    FragmentIon ProductCharge
#>         <lgcl>        <lgcl>
#> 1:          NA            NA
#> 2:          NA            NA
#> 3:          NA            NA
#> 4:          NA            NA
#> 5:          NA            NA
#> 6:          NA            NA

# OpenMS - TMT data
feature_columns_tmt = c("PeptideSequence", "PrecursorCharge")
openms_processed = MSstatsPreprocess(
  openms_cleaned, 
  NULL, 
  feature_columns_tmt,
  remove_shared_peptides = TRUE,
  remove_single_feature_proteins = TRUE,
  feature_cleaning = list(remove_features_with_few_measurements = TRUE,
                          summarize_multiple_psms = max)
)
head(openms_processed)
#>             ProteinName                               PeptideSequence
#>                  <char>                                        <char>
#> 1: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6: sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#>    PrecursorCharge                                             PSM Condition
#>              <int>                                          <char>    <char>
#> 1:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3   Long_HF
#> 2:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3   Long_HF
#> 3:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3   Long_HF
#> 4:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3   Long_HF
#> 5:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3   Long_LF
#> 6:               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3    Long_M
#>    BioReplicate    Run Channel Intensity Mixture TechRepMixture Fraction
#>           <int> <char>   <int>     <num>   <int>         <char>    <int>
#> 1:           21  3_3_3       1        NA       3            3_3        3
#> 2:           24  3_3_3       4        NA       3            3_3        3
#> 3:           26  3_3_3       6 1820.0721       3            3_3        3
#> 4:           28  3_3_3       8  445.7412       3            3_3        3
#> 5:           25  3_3_3       5 1580.9510       3            3_3        3
#> 6:           23  3_3_3       3 1508.3302       3            3_3        3
#>    FragmentIon
#>         <char>
#> 1:        <NA>
#> 2:        <NA>
#> 3:        <NA>
#> 4:        <NA>
#> 5:        <NA>
#> 6:        <NA>

Annotation is created via the MSstatsMakeAnnotation function. It takes the cleaned dataset and annotation file as input. Additionally, key-value pairs can be passed to this function to change column names (not including dots and other symbols) in the annotation from names given by values to names given by keys.
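The key-value renaming can be pictured with a small base R sketch. The helper rename_annotation_columns below is hypothetical and is not part of MSstatsConvert; the regular expression used to strip symbols from column names is likewise an assumption about what "not including dots and other symbols" means in practice.

```r
# Hypothetical helper: rename columns from the names given by values
# to the names given by keys, e.g. Run = "Rawfile".
rename_annotation_columns <- function(annotation, ...) {
  mapping <- list(...)
  for (new_name in names(mapping)) {
    old_name <- mapping[[new_name]]
    colnames(annotation)[colnames(annotation) == old_name] <- new_name
  }
  annotation
}

annotation <- data.frame(
  Raw.file     = c("run_1", "run_2"),
  Condition    = c("A", "B"),
  BioReplicate = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Assumed standardization step: drop dots and other non-alphanumeric symbols,
# so that "Raw.file" becomes "Rawfile".
colnames(annotation) <- gsub("[^[:alnum:]]", "", colnames(annotation))

renamed <- rename_annotation_columns(annotation, Run = "Rawfile")
colnames(renamed)
```

This is why the value in the earlier call is written as "Rawfile" rather than "Raw.file": the value has to match the column name after symbols have been removed.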

For programmatic applications and consistency of the interface, filtering is done with the help of lists.

For filtering based on numerical scores (for example q-value filtering), the list should consist of elements named score_column, score_threshold, direction, behavior, handle_na, fill_value, filter and drop.

For example, to remove intensities smaller than 1, we could pass the following list to the score_filtering parameter:

list(
  list(score_column = "Intensity", score_threshold = 1,
       direction = "greater", behavior = "remove", 
       handle_na = "remove", fill_value = NA, filter = TRUE, drop = FALSE
       )
)
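To make the meaning of the individual elements more concrete, here is a rough base R sketch of how such a list could be applied to a table. The function apply_score_filter is hypothetical and only reflects one reading of the element names; in particular, the treatment of values exactly equal to the threshold and the "fill" branch are assumptions, not the package's documented behavior.

```r
# Hypothetical helper mirroring the elements of a score filter list.
apply_score_filter <- function(input, score_column, score_threshold,
                               direction, behavior, handle_na,
                               fill_value, filter, drop) {
  if (!filter) return(input)
  scores <- input[[score_column]]
  # Assumption: "greater" keeps values at or above the threshold.
  passes <- if (direction == "greater") scores >= score_threshold else scores <= score_threshold
  # Rows with a missing score pass only when handle_na is "keep".
  passes[is.na(scores)] <- handle_na == "keep"
  if (behavior == "remove") {
    input <- input[passes, , drop = FALSE]
  } else {
    # Assumed alternative: overwrite the intensity instead of removing the row.
    input$Intensity[!passes] <- fill_value
  }
  if (drop) input[[score_column]] <- NULL
  input
}

measurements <- data.frame(
  Feature   = c("f1", "f2", "f3", "f4"),
  Intensity = c(0.5, 1, 2500, NA),
  stringsAsFactors = FALSE
)
score_filter <- list(score_column = "Intensity", score_threshold = 1,
                     direction = "greater", behavior = "remove",
                     handle_na = "remove", fill_value = NA,
                     filter = TRUE, drop = FALSE)

filtered <- do.call(apply_score_filter, c(list(input = measurements), score_filter))
filtered$Feature
```

Under these assumptions, the row with intensity 0.5 and the row with a missing intensity are removed, while the Intensity column itself is kept because drop = FALSE.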

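For filtering based on patterns (for example, removing oxidation peptides), the list should consist of elements named col_name, pattern, filter and drop_column, as in the oxidation_filter and m_filter lists defined above.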
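A pattern filter can be sketched in the same spirit. The helper apply_pattern_filter is hypothetical, and the use of grepl for matching is an assumption; the list itself has the same shape as the oxidation_filter used in the MSstatsPreprocess call earlier in this section.

```r
# Hypothetical helper: remove rows whose column matches a pattern,
# then optionally drop the column that was used for filtering.
apply_pattern_filter <- function(input, col_name, pattern, filter, drop_column) {
  if (filter) {
    matches <- grepl(pattern, input[[col_name]])
    input <- input[!matches, , drop = FALSE]
  }
  if (drop_column) input[[col_name]] <- NULL
  input
}

peptides <- data.frame(
  PeptideSequence = c("AEAPAAAPAAK", "AMLK", "LLGTR"),
  Modifications   = c("Unmodified", "Oxidation (M)", "Unmodified"),
  stringsAsFactors = FALSE
)
oxidation_filter <- list(col_name = "Modifications", pattern = "Oxidation",
                         filter = TRUE, drop_column = TRUE)

kept <- do.call(apply_pattern_filter, c(list(input = peptides), oxidation_filter))
kept
```

Keeping filter and drop_column as separate switches lets a column be dropped without filtering on it, or filtered on and still retained for later steps.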

For filtering based on exact values (for example, removing iRT proteins), the list should consist of elements named

Fractions and balanced design

Finally, after preprocessing, the MSstatsBalancedDesign function can be applied to handle fractions and create a balanced design.

For label-free data, fractionation or technical replicates are detected if this information is not provided. Features that overlap across multiple fractions of the same sample are assigned to a single fraction by the following rule: for each feature, the fraction with the largest number of MS runs containing a non-missing measurement is kept. If multiple fractions tie on that count, the tie is broken by choosing the fraction with the highest mean intensity. The remaining fractions’ rows for that feature are dropped. The data are then adjusted so that within each fraction, every feature has a row for each run. If the intensity value is missing, it is denoted by NA.
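The fraction selection rule for a single feature can be written out in a few lines of base R. This is an illustrative sketch of the rule as stated above, not the package's implementation; select_fraction and the example values are hypothetical.

```r
# Hypothetical helper: pick one fraction for a single feature.
select_fraction <- function(feature_rows) {
  observed <- feature_rows[!is.na(feature_rows$Intensity), ]
  # Number of runs with a non-missing measurement, per fraction.
  run_counts <- tapply(observed$Run, observed$Fraction,
                       function(runs) length(unique(runs)))
  # Mean intensity per fraction, used only to break ties.
  mean_intensity <- tapply(observed$Intensity, observed$Fraction, mean)
  best <- names(run_counts)[run_counts == max(run_counts)]
  if (length(best) > 1) {
    best <- best[which.max(mean_intensity[best])]
  }
  best
}

measurements <- data.frame(
  Feature   = "AEAPK_2",
  Fraction  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Run       = paste0("r", 1:9),
  Intensity = c(100, 120, NA, 300, 310, NA, 50, NA, NA),
  stringsAsFactors = FALSE
)

chosen <- select_fraction(measurements)
# Rows from the other fractions are dropped for this feature.
kept <- measurements[as.character(measurements$Fraction) == chosen, ]
chosen
```

In this example, fractions 1 and 2 each have two runs with a measurement, so the tie is broken by mean intensity (110 versus 305) and fraction 2 is kept. Its run without a measurement stays in the table with an NA intensity.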

For TMT data, a unique fraction is selected for each overlapped feature as well. After fraction selection, the data are adjusted so that within each run, every feature has a row for each channel. If the intensity is missing for a channel, it is denoted by NA.
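The balancing step for TMT data amounts to completing the feature by channel grid within each run. The sketch below shows one way to do this in base R with expand.grid and merge; the Feature column and the values are made up, and the package may achieve the same result differently.

```r
tmt <- data.frame(
  Feature   = c("pepA_3", "pepA_3", "pepB_2"),
  Run       = "3_3_3",
  Channel   = c(1L, 2L, 1L),
  Intensity = c(1508.3, 1580.9, 445.7),
  stringsAsFactors = FALSE
)

# Within each run, list every combination of feature and channel.
design <- do.call(rbind, lapply(split(tmt, tmt$Run), function(run_rows) {
  expand.grid(Feature = unique(run_rows$Feature),
              Run     = unique(run_rows$Run),
              Channel = sort(unique(run_rows$Channel)),
              stringsAsFactors = FALSE)
}))

# A left join onto the full grid leaves NA where no intensity was measured.
balanced <- merge(design, tmt, by = c("Feature", "Run", "Channel"), all.x = TRUE)
balanced
```

Here pepB_2 has no measurement in channel 2, so a row with Intensity equal to NA is added for it. This also explains why a balanced table can have more rows than the preprocessed one.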

Note also that this fraction-collapsing logic is distinct from the summaryforMultipleRows argument on each converter, which only combines duplicate PSMs identifying the same feature within a single MS run.
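For contrast, combining duplicate PSMs within a run is a plain grouped summary. A minimal base R sketch using max as the summary function, matching the summarize_multiple_psms = max setting used earlier, could look as follows; the Feature column and the values are illustrative.

```r
psms <- data.frame(
  Feature   = c("pepA_2", "pepA_2", "pepA_2", "pepB_3"),
  Run       = c("run1", "run1", "run2", "run1"),
  Intensity = c(4023100, 5132500, 2761600, 910000),
  stringsAsFactors = FALSE
)

# One row per feature and run; duplicates are reduced to their maximum.
summarized <- aggregate(Intensity ~ Feature + Run, data = psms, FUN = max)
summarized
```

The two PSMs for pepA_2 in run1 collapse into a single row, while measurements from different runs are never combined.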

maxquant_balanced = MSstatsBalancedDesign(maxquant_processed, feature_columns)
head(maxquant_balanced)
#>   ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1      P06959     AEAPAAAPAAK               2          NA            NA
#> 2      P06959     AEAPAAAPAAK               2          NA            NA
#> 3      P06959     AEAPAAAPAAK               2          NA            NA
#> 4      P06959     AEAPAAAPAAK               2          NA            NA
#> 5      P06959     AEAPAAAPAAK               2          NA            NA
#> 6      P06959     AEAPAAAPAAK               2          NA            NA
#>   IsotopeLabelType Condition BioReplicate
#> 1                L         1            1
#> 2                L         1            1
#> 3                L         1            1
#> 4                L         2            2
#> 5                L         2            2
#> 6                L         2            2
#>                                           Run Fraction Intensity
#> 1 121219_S_CCES_01_01_LysC_Try_1to10_Mixt_1_1        1   4023100
#> 2 121219_S_CCES_01_02_LysC_Try_1to10_Mixt_1_2        1   5132500
#> 3 121219_S_CCES_01_03_LysC_Try_1to10_Mixt_1_3        1   2761600
#> 4 121219_S_CCES_01_04_LysC_Try_1to10_Mixt_2_1        1   2932900
#> 5 121219_S_CCES_01_05_LysC_Try_1to10_Mixt_2_2        1   4091800
#> 6 121219_S_CCES_01_06_LysC_Try_1to10_Mixt_2_3        1   4727000
dim(maxquant_balanced)
#> [1] 690  11
dim(maxquant_processed)
#> [1] 625  14

openms_balanced = MSstatsBalancedDesign(openms_processed, feature_columns_tmt)
head(openms_balanced)
#>            ProteinName                               PeptideSequence
#> 1 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 2 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 3 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 4 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 5 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#> 6 sp|Q60854|SPB6_MOUSE .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR
#>   PrecursorCharge                                             PSM Mixture
#> 1               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#> 2               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#> 3               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#> 4               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#> 5               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#> 6               3 .(TMT6plex)AFVEVNEEGTEAAAATAGMM(Oxidation)TVR_3       3
#>   TechRepMixture   Run Channel BioReplicate Condition Intensity
#> 1            3_3 3_3_3       1           21   Long_HF        NA
#> 2            3_3 3_3_3       2           22      Norm  1068.580
#> 3            3_3 3_3_3       3           23    Long_M  1508.330
#> 4            3_3 3_3_3       4           24   Long_HF        NA
#> 5            3_3 3_3_3       5           25   Long_LF  1580.951
#> 6            3_3 3_3_3       6           26   Long_HF  1820.072
dim(openms_balanced)
#> [1] 330  11
dim(openms_processed)
#> [1] 370  16

The output of MSstatsBalancedDesign is a data.frame of class MSstatsValidated. Such a data.frame will be recognized by statistical processing functions from the MSstats and MSstatsTMT packages as valid input, which allows them to skip the checks and transformations otherwise needed to fit data into this format.

Metabolomics with MZMine

MZMinetoMSstatsFormat brings untargeted metabolomics into the MSstats family. It takes the wide-format feature-quantification table exported by MZMine (one row per feature, one <sample> Peak area column per sample) together with a standard MSstats annotation and produces an MSstats-ready long-format data.table.
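The wide-to-long step can be illustrated with a short base R sketch. The two-sample table below is made up, and the cleanup of run labels with a regular expression is an assumption meant to mimic the removal of symbols such as '.' from run names; the converter's own code may differ.

```r
# Hypothetical wide-format export: one row per feature, one "Peak area" column per sample.
mzmine_wide <- data.frame(
  id = c(1L, 2L),
  `sampleA.mzML Peak area` = c(1000, 5000),
  `sampleB.mzML Peak area` = c(1100, 4800),
  check.names = FALSE
)

area_columns <- grep(" Peak area$", colnames(mzmine_wide), value = TRUE)

# Stack the per-sample columns into one row per feature and run.
mzmine_long <- do.call(rbind, lapply(area_columns, function(column) {
  data.frame(
    id        = mzmine_wide$id,
    Run       = sub(" Peak area$", "", column),
    Intensity = mzmine_wide[[column]],
    stringsAsFactors = FALSE
  )
}))

# Assumed standardization of run labels: drop non-alphanumeric symbols.
mzmine_long$Run <- gsub("[^[:alnum:]]", "", mzmine_long$Run)
mzmine_long
```

Each feature now has one row per run, which is the shape that the annotation table is merged onto by Run.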

An MZMine spectral-library annotation table with id, compound_name, and score columns is required. The highest-scoring compound_name per feature is used as ProteinName. Features in the quant table with no matching annotation row are dropped from the output — there is no synthesized mz_rt fallback, because placeholder identifiers inflate the hypothesis count for downstream groupComparison without biological signal.

These are MSI Level 2 annotations (putative identification via MS/MS spectral matching against a reference library). Lower-confidence annotation sources — SIRIUS / MS2Query (Level 3) and CANOPUS (Level 4) — are out of scope for this iteration; features without a Level 2 annotation row are filtered out.
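The annotation join described above (best-scoring compound per feature, unannotated features dropped) can be sketched in base R as follows. The tables and values are hypothetical and the code only illustrates the rule; it is not the converter's implementation.

```r
quant <- data.frame(id = c(1L, 2L, 3L), Intensity = c(1000, 5000, 250))

library_hits <- data.frame(
  id            = c(1L, 1L, 2L),
  compound_name = c("Caffeine", "Theobromine", "Glucose"),
  score         = c(0.95, 0.71, 0.88),
  stringsAsFactors = FALSE
)

# Keep the highest-scoring compound_name for each feature id.
ordered_hits <- library_hits[order(library_hits$id, -library_hits$score), ]
best_hits <- ordered_hits[!duplicated(ordered_hits$id), c("id", "compound_name")]

# An inner join drops feature 3, which has no annotation row.
annotated <- merge(quant, best_hits, by = "id")
colnames(annotated)[colnames(annotated) == "compound_name"] <- "ProteinName"
annotated
```

Feature 1 has two library hits and keeps the one with the higher score, and feature 3 disappears from the result because nothing in the library table matches it.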

mzmine_input = data.table::fread(system.file(
  "tinytest/raw_data/MZMine/mzmine_input.csv",
  package = "MSstatsConvert"
))
mzmine_annotation = data.table::fread(system.file(
  "tinytest/raw_data/MZMine/annotation.csv",
  package = "MSstatsConvert"
))
mzmine_library = data.table::fread(system.file(
  "tinytest/raw_data/MZMine/mzmine_annotations.csv",
  package = "MSstatsConvert"
))

# ProteinName comes from the matched compound_name; unannotated features are dropped
mzmine_converted = MZMinetoMSstatsFormat(
  mzmine_input,
  annotation = mzmine_annotation,
  mzmine_annotations = mzmine_library,
  use_log_file = FALSE
)
#> INFO  [2026-06-03 19:06:45] ** Raw data from MZMine imported successfully.
#> INFO  [2026-06-03 19:06:45] ** MZMine: retained 4 feature(s) after annotation join: 1, 2, 3, 6
#> INFO  [2026-06-03 19:06:45] ** Raw data from MZMine cleaned successfully.
#> INFO  [2026-06-03 19:06:45] ** Using provided annotation.
#> INFO  [2026-06-03 19:06:45] ** Run labels were standardized to remove symbols such as '.' or '%'.
#> INFO  [2026-06-03 19:06:45] ** The following options are used:
#>   - Features will be defined by the columns: PeptideSequence, PrecursorCharge, FragmentIon, ProductCharge
#>   - Shared peptides will not be removed.
#>   - Proteins with single feature will not be removed.
#>   - Features with less than 3 measurements across runs will be kept.
#> INFO  [2026-06-03 19:06:45] ** Features with all missing measurements across runs are removed.
#> INFO  [2026-06-03 19:06:45] ** Multiple measurements in a feature and a run are summarized by summaryforMultipleRows: max
#> INFO  [2026-06-03 19:06:45] ** Features with all missing measurements across runs are removed.
#> INFO  [2026-06-03 19:06:45] ** Run annotation merged with quantification data.
#> INFO  [2026-06-03 19:06:45] ** Updated quantification data to make balanced design. Missing values are marked by NA
#> INFO  [2026-06-03 19:06:45] ** Finished preprocessing. The dataset is ready to be processed by the dataProcess function.
head(mzmine_converted)
#>   ProteinName PeptideSequence PrecursorCharge FragmentIon ProductCharge
#> 1    Caffeine               1              NA        <NA>            NA
#> 2    Caffeine               1              NA        <NA>            NA
#> 3    Caffeine               1              NA        <NA>            NA
#> 4    Caffeine               1              NA        <NA>            NA
#> 5 GlucoseHigh               2              NA        <NA>            NA
#> 6 GlucoseHigh               2              NA        <NA>            NA
#>   IsotopeLabelType Condition BioReplicate         Run Fraction Intensity
#> 1            Light   Control            1 sampleAmzML        1      1000
#> 2            Light   Control            2 sampleBmzML        1      1100
#> 3            Light Treatment            3 sampleCmzML        1      1200
#> 4            Light Treatment            4 sampleDmzML        1      1300
#> 5            Light   Control            1 sampleAmzML        1      5000
#> 6            Light   Control            2 sampleBmzML        1      4800

Since metabolomics features do not carry peptide-level identifiers, PeptideSequence holds the MZMine row ID (as a string), and PrecursorCharge, FragmentIon, and ProductCharge are all NA. IsotopeLabelType is set to "Light" for every row.