Package: MsBackendMsp
Authors: Steffen Neumann [aut] (https://orcid.org/0000-0002-7899-7192),
Johannes Rainer [aut, cre] (https://orcid.org/0000-0002-6977-7147),
Michael Witting [ctb] (https://orcid.org/0000-0002-1462-4426)
Compiled: Tue Nov 1 18:07:26 2022
The Spectra package provides a central infrastructure for the handling of Mass Spectrometry (MS) data. The package supports interchangeable use of different backends to import MS data from a variety of sources (such as mzML files). The MsBackendMsp package adds support for files in NIST MSP format, which are frequently used to share spectral libraries, and hence enhances small compound annotation workflows using the Spectra and MetaboAnnotation packages (Rainer et al. 2022). This vignette illustrates the usage of the MsBackendMsp package and how it can be used to import and export data in MSP file format.
To install this package, start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MsBackendMsp")
This will install this package and any missing dependencies.
The MSP file format allows storing MS/MS spectra (m/z and intensity of mass peaks) along with additional annotations for each spectrum. A single MSP file can therefore contain one or multiple spectra. Below we load the package and define the file name of an MSP file that is distributed with this package.
library(MsBackendMsp)
nist <- system.file("extdata", "spectrum2.msp", package = "MsBackendMsp")
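The following base R sketch shows roughly what a single MSP record looks like and how it splits into annotation fields and peaks. The record values are made up. `parse_msp_record` is a hypothetical helper written only for this illustration. It is not how MsBackendMsp parses files. It assumes that peak lines are whitespace-separated "m/z intensity" pairs and that every other non-empty line is a "Field: value" pair.

```r
## Illustrative MSP record (made-up values): "Field: value" header lines,
## a "Num Peaks" line, one "m/z intensity" pair per line and an empty line
## that separates this record from the next one.
msp_txt <- c(
    "NAME: Example compound",
    "PRECURSORMZ: 284.0917",
    "PRECURSORTYPE: [M+H]+",
    "Num Peaks: 3",
    "120.0808 15.2",
    "166.0863 100",
    "284.0917 42.7",
    ""
)

## Hypothetical helper, not part of MsBackendMsp: split a single record
## into its named header fields and a two-column peak matrix.
parse_msp_record <- function(x) {
    x <- x[nzchar(trimws(x))]                                   # drop empty lines
    is_peak <- grepl("^[0-9.]+[[:space:]]+[0-9.]+", trimws(x))  # "m/z intensity" lines
    hdr <- x[!is_peak]
    keys <- sub(":.*$", "", hdr)                # field name: text before first ":"
    vals <- trimws(sub("^[^:]*:", "", hdr))     # value: text after first ":"
    pk <- strsplit(trimws(x[is_peak]), "[[:space:]]+")
    peaks <- matrix(as.numeric(unlist(lapply(pk, `[`, 1:2))),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(NULL, c("mz", "intensity")))
    list(fields = setNames(vals, keys), peaks = peaks)
}

rec <- parse_msp_record(msp_txt)
rec$fields[["PRECURSORTYPE"]]
## Consistency check between the declared and the actual number of peaks
nrow(rec$peaks) == as.integer(rec$fields[["Num Peaks"]])
```

Real MSP dialects differ in several ways: how peaks are separated (tabs, semicolons), how the peak-count field is named, and whether peak lines carry annotations. A robust reader has to handle all of these cases.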
We next import the data into a Spectra object. In the constructor function, we specify the backend class that should be used to read the data (in our case MsBackendMsp).
sp <- Spectra(nist, source = MsBackendMsp())
## Start data import from 1 files ... done
We now have full access to all imported spectra variables, which can be listed with the spectraVariables function.
spectraVariables(sp)
## [1] "msLevel" "rtime"
## [3] "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
## [7] "centroided" "smoothed"
## [9] "polarity" "precScanNum"
## [11] "precursorMz" "precursorIntensity"
## [13] "precursorCharge" "collisionEnergy"
## [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz" "name"
## [19] "adduct" "INSTRUMENTTYPE"
## [21] "instrument" "smiles"
## [23] "inchikey" "inchi"
## [25] "formula" "PUBCHEMID"
## [27] "SOURCE" "COMMENT"
## [29] "Num.Peaks"
Besides default spectra variables, such as msLevel, rtime and precursorMz, we also have additional spectra variables, such as name or adduct, which correspond to additional data fields in the MSP file.
sp$msLevel
## [1] 2 2
sp$name
## [1] "3-Hydroxy-3-(2-(2-hydroxyphenyl)-2-oxoethyl)-1,3-dihydro-2H-indol-2-one"
## [2] "5-(4-Ethoxybenzylidene)-2-(4-morpholinyl)-1,3-thiazol-4(5H)-one"
sp$adduct
## [1] "[M+H]+" "[M+H]+"
The NIST MSP file format is, however, only loosely defined. A variety of flavors (or dialects) exist that define their own data fields or use different names for the same fields. MsBackendMsp supports data import and export for these MSP format variations by defining and providing different mappings between MSP data fields and spectra variables. User-defined mappings can also be used, which makes import from any MSP flavor possible. Pre-defined mappings between MSP data fields and spectra variables (i.e., variables within the Spectra object) are returned by the spectraVariableMapping function.
spectraVariableMapping(MsBackendMsp())
## name accession formula inchikey adduct
## "NAME" "DB#" "FORMULA" "INCHIKEY" "PRECURSORTYPE"
## exactmass rtime precursorMz adduct smiles
## "EXACTMASS" "RETENTIONTIME" "PRECURSORMZ" "PRECURSORTYPE" "SMILES"
## inchi polarity instrument
## "INCHI" "IONMODE" "INSTRUMENT"
The names of this character vector are the spectra variable names and its values are the corresponding MSP data fields. Note that, by default, all data fields for which no mapping is provided are also imported, using the field name as the spectra variable name.
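To make the roles of names and values concrete, the following base R sketch applies an excerpt of the default mapping printed above to a set of parsed MSP fields. `map_fields` is a hypothetical helper written for this illustration, not the code MsBackendMsp uses internally. The sketch also leaves all values as character strings.

```r
## Excerpt of the default (NIST) mapping shown above:
## names = spectra variables, values = MSP data fields
msp_map <- c(name = "NAME", precursorMz = "PRECURSORMZ",
             adduct = "PRECURSORTYPE", instrument = "INSTRUMENT")

## Field values as they might come out of a parsed NIST-style record
fields <- c(NAME = "Example compound", PRECURSORMZ = "284.0917",
            PRECURSORTYPE = "[M+H]+", INSTRUMENTTYPE = "IT/ion trap")

## Hypothetical helper: rename mapped fields to their spectra variable name
## and keep unmapped fields under their original MSP field name
map_fields <- function(fields, mapping) {
    idx <- match(names(fields), mapping)
    setNames(fields, ifelse(is.na(idx), names(fields), names(mapping)[idx]))
}

map_fields(fields, msp_map)
```

INSTRUMENTTYPE has no entry in the excerpt, so it keeps its field name. This mirrors the INSTRUMENTTYPE spectra variable listed by spectraVariables(sp) above.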
This default mapping works well for MSP files from NIST or from other tools such as MS-DIAL. MassBank of North America (MoNA), however, uses a slightly different format. Below we read the first 6 lines of an MSP file from MoNA.
mona <- system.file("extdata", "minimona.msp", package = "MsBackendMsp")
head(readLines(mona))
## [1] "Name: Ritonavir"
## [2] "Synon: $:00in-source"
## [3] "DB#: MoNA000010"
## [4] "InChIKey: NCDNCNXCDXHOMX-XGKFQTDJSA-N"
## [5] "Instrument_type: Waters Synapt G2"
## [6] "Formula: C37H48N6O5S2"
For comparison, here are the first 6 lines of the NIST MSP file:
head(readLines(nist))
## [1] "NAME: 3-Hydroxy-3-(2-(2-hydroxyphenyl)-2-oxoethyl)-1,3-dihydro-2H-indol-2-one"
## [2] "PRECURSORMZ: 284.0917"
## [3] "PRECURSORTYPE: [M+H]+"
## [4] "INSTRUMENTTYPE: IT/ion trap"
## [5] "INSTRUMENT: Thermo Finnigan LCQ Deca"
## [6] "SMILES: NA"
MSP files in MoNA flavor use slightly different field names, not all of which are upper case, and they also define additional fields. MoNA-flavored MSP files can still be imported with the default variable mapping used above. However, most of the fields would then not be mapped to the respective spectra variables in the resulting Spectra object. For example, the precursor m/z would not be available as the expected spectra variable $precursorMz.
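The following base R sketch shows why the default mapping fails for MoNA files. It uses excerpts of the two mappings printed in this vignette. Field lookup is case sensitive, and some MoNA fields also differ by more than case, so case-insensitive matching alone would not be enough. The sketch only illustrates the matching problem. It does not reproduce how the package performs the lookup.

```r
## Excerpts of the default and MoNA mappings printed in this vignette
default_map <- c(name = "NAME", precursorMz = "PRECURSORMZ",
                 adduct = "PRECURSORTYPE")
mona_map <- c(name = "Name", precursorMz = "PrecursorMZ",
              adduct = "Precursor_type")

## Field names as they appear in a MoNA-flavored MSP file
mona_fields <- c("Name", "PrecursorMZ", "Precursor_type")

## match() is case sensitive: no MoNA field is found in the default mapping,
## so all of them would be imported under their own names
hits_default <- match(mona_fields, default_map)

## Even a case-insensitive lookup misses "Precursor_type" (underscore)
hits_upper <- match(toupper(mona_fields), default_map)

## With the MoNA mapping every field resolves to a spectra variable
vars_mona <- names(mona_map)[match(mona_fields, mona_map)]
vars_mona
```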
The spectraVariableMapping function, however, also provides the mapping for MSP files in MoNA flavor.
spectraVariableMapping(MsBackendMsp(), "mona")
## name synonym accession
## "Name" "Synon" "DB#"
## inchikey adduct precursorMz
## "InChIKey" "Precursor_type" "PrecursorMZ"
## polarity formula exactmass
## "Ion_mode" "Formula" "ExactMass"
## collision_energy_text msLevel
## "Collision_energy" "Spectrum_type"
Using this mapping in the data import ensures that the fields are mapped correctly.
sp_mona <- Spectra(mona, source = MsBackendMsp(),
                   mapping = spectraVariableMapping(MsBackendMsp(), "mona"))
## Start data import from 1 files ... done
sp_mona$precursorMz
## [1] NA 189.1603 265.1188 265.1188 263.1031 263.1031 229.1552 312.1302
## [9] 525.4990 525.4990 525.4990 525.4990 525.4990 525.4990 525.4990 525.4990
## [17] 525.4990 525.4990 525.4990 525.4990 539.5146 539.5146 539.5146 539.5146
## [25] 539.5146 539.5146 539.5146 539.5146 539.5146 539.5146
Note that in addition to the predefined variable mappings, any user-defined variable mapping can be provided with the mapping parameter. This enables import from MSP files with a highly customized format.
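A user-defined mapping is simply a named character vector with the same layout as the predefined ones: spectra variable names as names and MSP field names as values. In the sketch below, the dialect and its field names (Compound_Name, Precursor_mz, Ion_type) are hypothetical, and so is the file path my_file.msp. The import call is guarded so that the block also runs where MsBackendMsp or the file is not available.

```r
## Hypothetical MSP dialect: the field names below are invented for illustration
my_map <- c(name = "Compound_Name",
            precursorMz = "Precursor_mz",
            adduct = "Ion_type")

## Every entry needs a name, because the name is the spectra variable
stopifnot(!is.null(names(my_map)), all(nzchar(names(my_map))))

## Import with the custom mapping; "my_file.msp" is a placeholder path, so
## this step is skipped unless such a file exists and the package is installed
if (requireNamespace("MsBackendMsp", quietly = TRUE) &&
    file.exists("my_file.msp")) {
    library(MsBackendMsp)
    sp_custom <- Spectra("my_file.msp", source = MsBackendMsp(),
                         mapping = my_map)
}
```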
Multiple values for a certain spectrum are represented as duplicated fields in an MSP file, and MsBackendMsp also supports import of such data. MoNA MSP files, for example, use multiple "Synon" fields to list all synonyms of a compound. Below we extract these values for two spectra in our Spectra object with the MoNA data.
sp_mona[29:30]$synonym
## [[1]]
## [1] "$:00 ms2" "$:05 30V CID"
## [3] "$:07 In-Silico-Spectrum" "$:00in-source"
##
## [[2]]
## [1] "$:00 ms2" "$:05 30V CID"
## [3] "$:07 In-Silico-Spectrum" "$:00in-source"
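The following base R sketch shows one way repeated fields can be collected into a single multi-valued entry. The synonym strings are taken from the output above, but the record itself is made up. The grouping via `split()` is an illustration, not the package's implementation. Note that the values themselves contain ":", so only the first colon on each line separates field name from value.

```r
## Header lines of an illustrative record with a repeated "Synon" field
hdr <- c("Name: Example compound",
         "Synon: $:00 ms2",
         "Synon: $:05 30V CID",
         "DB#: EX0001")

keys <- sub(":.*$", "", hdr)             # field name: text before first ":"
vals <- trimws(sub("^[^:]*:", "", hdr))  # value: text after first ":" (may contain ":")

## Group values by field name, keeping fields in order of first appearance;
## a repeated field yields a character vector with more than one element
grouped <- split(vals, factor(keys, levels = unique(keys)))
grouped$Synon
```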
In addition to importing data from MSP files, MsBackendMsp also allows exporting Spectra objects to files in MSP format. Below, for example, we export the Spectra with the MoNA data to a temporary file using the default NIST MSP format.
tmpf <- tempfile()
export(sp_mona, backend = MsBackendMsp(), file = tmpf,
       mapping = spectraVariableMapping(MsBackendMsp()))
head(readLines(tmpf))
## [1] "NAME: Ritonavir"
## [2] "msLevel: MS2"
## [3] "synonym: $:00in-source"
## [4] "DB#: MoNA000010"
## [5] "INCHIKEY: NCDNCNXCDXHOMX-XGKFQTDJSA-N"
## [6] "Instrument_type: Waters Synapt G2"
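The following base R sketch writes a single spectrum as an MSP record. It follows the layout of the exported lines shown above ("Field: value" lines) and then appends the peaks. `format_msp_record` is a hypothetical helper, not the MsBackendMsp exporter. Writing "Num Peaks" directly before whitespace-separated peak lines is an assumption modelled on the NIST layout.

```r
## Hypothetical helper (not the package's exporter): one spectrum as
## "Field: value" lines, a "Num Peaks" line, the peaks, and an empty line
## that separates it from the next record
format_msp_record <- function(fields, mz, intensity) {
    stopifnot(length(mz) == length(intensity), !is.null(names(fields)))
    c(paste0(names(fields), ": ", fields),
      paste0("Num Peaks: ", length(mz)),
      paste(mz, intensity),
      "")
}

rec <- format_msp_record(c(NAME = "Example compound", PRECURSORMZ = "284.0917"),
                         mz = c(120.0808, 166.0863), intensity = c(15.2, 100))
tmp <- tempfile(fileext = ".msp")
writeLines(rec, tmp)
readLines(tmp)
```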
Alternatively, we can export the Spectra with the data from the NIST MSP file to an MSP file in MoNA flavor.
tmpf <- tempfile()
export(sp, backend = MsBackendMsp(), file = tmpf,
       mapping = spectraVariableMapping(MsBackendMsp(), "mona"))
head(readLines(tmpf))
## [1] "Name: 3-Hydroxy-3-(2-(2-hydroxyphenyl)-2-oxoethyl)-1,3-dihydro-2H-indol-2-one"
## [2] "Spectrum_type: MS2"
## [3] "Ion_mode: Positive"
## [4] "PrecursorMZ: 284.0917"
## [5] "Precursor_type: [M+H]+"
## [6] "INSTRUMENTTYPE: IT/ion trap"
Import and export can thus be combined to convert between MSP files of different flavors.
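Conceptually, such a conversion translates field names through the spectra variable names, which act as a common pivot between two mappings. The base R sketch below uses excerpts of the NIST and MoNA mappings printed above. `convert_fields` is a hypothetical helper for illustration only. In practice the conversion is done by importing with one mapping and exporting with another, as shown above. Fields without a mapping keep their name, consistent with INSTRUMENTTYPE in the MoNA-flavored export.

```r
## Excerpts of the NIST and MoNA mappings printed above
nist_map <- c(name = "NAME", precursorMz = "PRECURSORMZ",
              adduct = "PRECURSORTYPE")
mona_map <- c(name = "Name", precursorMz = "PrecursorMZ",
              adduct = "Precursor_type")

## Hypothetical helper: translate MSP field names from one flavor to another
## via the spectra variable name; unmapped fields keep their original name
convert_fields <- function(fields, from, to) {
    vars <- names(from)[match(fields, from)]  # MSP field -> spectra variable
    out <- to[vars]                           # spectra variable -> MSP field
    unname(ifelse(is.na(out), fields, out))
}

convert_fields(c("NAME", "PRECURSORMZ", "INSTRUMENTTYPE"), nist_map, mona_map)
```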
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MsBackendMsp_1.2.0 Spectra_1.8.0 ProtGenerics_1.30.0
## [4] BiocParallel_1.32.0 S4Vectors_0.36.0 BiocGenerics_0.44.0
## [7] BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] cluster_2.1.4 knitr_1.40 magrittr_2.0.3
## [4] MASS_7.3-58.1 MsCoreUtils_1.10.0 IRanges_2.32.0
## [7] clue_0.3-62 R6_2.5.1 rlang_1.0.6
## [10] fastmap_1.1.0 stringr_1.4.1 tools_4.2.1
## [13] parallel_4.2.1 xfun_0.34 cli_3.4.1
## [16] jquerylib_0.1.4 htmltools_0.5.3 yaml_2.3.6
## [19] digest_0.6.30 bookdown_0.29 BiocManager_1.30.19
## [22] fs_1.5.2 sass_0.4.2 codetools_0.2-18
## [25] cachem_1.0.6 evaluate_0.17 rmarkdown_2.17
## [28] stringi_1.7.8 compiler_4.2.1 bslib_0.4.0
## [31] jsonlite_1.8.3
Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12 (2): 173. https://doi.org/10.3390/metabo12020173.