As described in the GC-EI-MS tutorial, CluMSID can also be used to analyse low resolution data – although using low resolution data comes at a cost.
In this example, we will use a sample similar to the one in the General Tutorial (1 µL Pseudomonas aeruginosa PA14 cell extract), measured with similar chromatography but on a different mass spectrometer: a Bruker amaZon ion trap instrument operated in ESI-(+) mode with auto-MS/MS. In addition to introducing a workflow for low resolution LC-MS/MS data, this example also demonstrates that CluMSID can work with data from different types of mass spectrometers.
We load the file from the CluMSIDdata package:
The extraction of spectra works the same way as with high resolution LC-MS/MS data:
As in the GC-EI-MS example, we have to set mz_tolerance to a much higher value than for high resolution data, while the retention time tolerance can remain unchanged.
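To illustrate why the tolerance has to be widened, the following base R sketch checks whether two m/z readings agree within a relative tolerance. The helper `same_mz`, the two m/z readings, and the value 0.02 are assumptions made for illustration only. They are not taken from the CluMSID source or from this data set, and the sketch assumes that the tolerance is interpreted relative to the m/z value.

```r
# Hypothetical helper: TRUE if two m/z values agree within a relative tolerance.
same_mz <- function(mz1, mz2, tol) {
  abs(mz1 - mz2) <= tol * pmax(mz1, mz2)
}

# The same ion as a low resolution instrument might report it in two scans
# (made-up values).
scan1 <- 260.2
scan2 <- 260.4

same_mz(scan1, scan2, tol = 1e-5)  # tight, high resolution style setting: FALSE
same_mz(scan1, scan2, tol = 0.02)  # illustrative wide setting: TRUE
```

With the tight setting, the two readings would be treated as different ions, so spectra of the same feature would not be merged as intended.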
We obtain numbers of spectra similar to those in the General Tutorial, because we tried to keep all parameters except the mass spectrometer type comparable.
As we do not have low resolution spectral libraries at hand, we skip the integration of previous knowledge on feature identities in this example and generate a distance matrix right away:
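Taken together, the extraction, merging, and distance calculation steps described so far might be wrapped as in the sketch below. The wrapper `lowres_distmat` and its arguments `mzxml_path` and `mz_tol` are hypothetical names introduced here. The three CluMSID functions and their `mz_tolerance` arguments are used as we understand them from the package documentation, so please check `?mergeMS2spectra` and `?distanceMatrix` before relying on this.

```r
# Sketch only: wraps the three steps described above in one function.
# `mzxml_path` and `mz_tol` are placeholders, not values from this tutorial.
lowres_distmat <- function(mzxml_path, mz_tol) {
  stopifnot(requireNamespace("CluMSID", quietly = TRUE))
  # 1. Extract all MS2 spectra from the raw data file.
  ms2list <- CluMSID::extractMS2spectra(mzxml_path)
  # 2. Merge spectra of the same feature, using the widened m/z tolerance.
  featlist <- CluMSID::mergeMS2spectra(ms2list, mz_tolerance = mz_tol)
  # 3. Compute pairwise spectral distances with the same tolerance.
  CluMSID::distanceMatrix(featlist, mz_tolerance = mz_tol)
}

# Hypothetical call (path and tolerance are assumptions):
# distmat <- lowres_distmat("path/to/lowres_file.mzXML", mz_tol = 0.02)
```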
Starting from this distance matrix, we can use all the data exploration functions that CluMSID offers.
When we now make an MDS plot, we learn that the similarity data is very different from the comparable high resolution data:
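For readers unfamiliar with multidimensional scaling, the following base R sketch shows the general idea on toy data. It does not use CluMSID or the data from this example. The intensity vectors are made up, and the distance used here (Euclidean distance between unit-length vectors, which is a monotone transform of cosine similarity) is a stand-in chosen because it can always be embedded, not necessarily the distance CluMSID computes.

```r
# Toy intensity vectors for five hypothetical features across three m/z bins.
toy <- rbind(F1 = c(100, 10, 0),
             F2 = c(90, 20, 5),
             F3 = c(5, 100, 40),
             F4 = c(0, 90, 50),
             F5 = c(50, 50, 50))

# Scale every row to unit length, so that Euclidean distance between rows
# equals sqrt(2 - 2 * cosine similarity).
unit <- toy / sqrt(rowSums(toy^2))
d <- dist(unit)

# Classical MDS projects the features into two dimensions.
coords <- cmdscale(d, k = 2)
plot(coords, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(coords, labels = rownames(coords))
```

Features with similar spectra (here F1 and F2, or F3 and F4) end up close to each other in the plot.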
To get a better overview of the data and the general similarity behaviour, we create a heat map of the distance matrix:
We clearly see that the heat map is generally a lot “warmer” than in the General Tutorial (an intuition that is supported by the histogram in the top-left corner), i.e. we have a higher general degree of similarity between spectra. That is not surprising, as the m/z information has far fewer levels than in high resolution data, and fragments with different sum formulas are more likely to have indistinguishable mass-to-charge ratios.
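The effect of coarser m/z information on similarity can be sketched in base R. The function `binned_cosine` and both toy spectra are hypothetical, and rounding m/z values to a fixed number of decimals is only a crude stand-in for instrument resolution and tolerance-based peak matching, not the method CluMSID uses.

```r
# Cosine similarity of two spectra after rounding m/z values to `digits` decimals.
binned_cosine <- function(spec_a, spec_b, digits) {
  bins_a <- round(spec_a$mz, digits)
  bins_b <- round(spec_b$mz, digits)
  all_bins <- sort(unique(c(bins_a, bins_b)))
  # Intensity vector on the shared bin axis; peaks in the same bin are summed.
  to_vector <- function(bins, intensity) {
    v <- numeric(length(all_bins))
    idx <- match(bins, all_bins)
    for (i in seq_along(idx)) v[idx[i]] <- v[idx[i]] + intensity[i]
    v
  }
  a <- to_vector(bins_a, spec_a$intensity)
  b <- to_vector(bins_b, spec_b$intensity)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Two made-up spectra whose fragments share nominal masses but not exact masses.
spec_a <- list(mz = c(159.044, 172.076, 188.071), intensity = c(100, 40, 20))
spec_b <- list(mz = c(159.080, 172.040, 188.107), intensity = c(100, 40, 20))

binned_cosine(spec_a, spec_b, digits = 2)  # fine bins: no shared peaks
binned_cosine(spec_a, spec_b, digits = 0)  # nominal mass: all peaks coincide
```

In this deliberately extreme case, the two spectra share no peaks at two decimals but become indistinguishable at nominal mass, which is the mechanism behind the “warmer” heat map.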
We also see that some more or less compact clusters can be identified. This is easier to inspect in the dendrogram visualisation:
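How clusters are read off a dendrogram can be sketched with base R on a toy distance matrix. The matrix, the average linkage method, and the cut height of 0.5 are assumptions chosen for illustration. They are not the settings or data of this example.

```r
# Toy distance matrix for four hypothetical features (0 = identical spectra).
d <- matrix(c(0.0, 0.1, 0.8, 0.9,
              0.1, 0.0, 0.7, 0.8,
              0.8, 0.7, 0.0, 0.2,
              0.9, 0.8, 0.2, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("F", 1:4), paste0("F", 1:4)))

# Hierarchical clustering with average linkage.
hc <- hclust(as.dist(d), method = "average")

# Cutting the tree at a given height assigns each feature to a cluster.
clusters <- cutree(hc, h = 0.5)

plot(hc, main = "Toy dendrogram", xlab = "", sub = "")
abline(h = 0.5, lty = 2)  # dashed line marks the cut height
```

Here F1 and F2 fall into one cluster and F3 and F4 into another, because each pair merges well below the cut height while the two pairs only join at a height of 0.8.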
In conclusion, CluMSID is capable of processing low resolution LC-MS/MS data and if high resolution data is not available, it can be very useful to provide an overview of spectral similarities in low resolution data, thereby helping metabolite annotation if some individual metabolites can be identified by comparison to authentic standards. However, concerning feature annotation, high resolution methods should always be favoured for the many benefits they provide.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 parallel stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] magrittr_2.0.1 metaMSdata_1.27.0 metaMS_1.28.0
#> [4] CAMERA_1.48.0 xcms_3.14.0 MSnbase_2.18.0
#> [7] ProtGenerics_1.24.0 S4Vectors_0.30.0 mzR_2.26.0
#> [10] Rcpp_1.0.6 BiocParallel_1.26.0 Biobase_2.52.0
#> [13] BiocGenerics_0.38.0 CluMSIDdata_1.7.0 CluMSID_1.8.0
#>
#> loaded via a namespace (and not attached):
#> [1] backports_1.2.1 Hmisc_4.5-0
#> [3] plyr_1.8.6 igraph_1.2.6
#> [5] lazyeval_0.2.2 splines_4.1.0
#> [7] GenomeInfoDb_1.28.0 ggplot2_3.3.3
#> [9] digest_0.6.27 foreach_1.5.1
#> [11] htmltools_0.5.1.1 fansi_0.4.2
#> [13] checkmate_2.0.0 rle_0.9.2
#> [15] cluster_2.1.2 doParallel_1.0.16
#> [17] limma_3.48.0 readr_1.4.0
#> [19] sna_2.6 matrixStats_0.58.0
#> [21] jpeg_0.1-8.1 colorspace_2.0-1
#> [23] xfun_0.23 dplyr_1.0.6
#> [25] crayon_1.4.1 RCurl_1.98-1.3
#> [27] jsonlite_1.7.2 graph_1.70.0
#> [29] impute_1.66.0 survival_3.2-11
#> [31] iterators_1.0.13 ape_5.5
#> [33] glue_1.4.2 gtable_0.3.0
#> [35] zlibbioc_1.38.0 XVector_0.32.0
#> [37] DelayedArray_0.18.0 DEoptimR_1.0-8
#> [39] scales_1.1.1 vsn_3.60.0
#> [41] DBI_1.1.1 GGally_2.1.1
#> [43] viridisLite_0.4.0 htmlTable_2.2.1
#> [45] clue_0.3-59 foreign_0.8-81
#> [47] preprocessCore_1.54.0 Formula_1.2-4
#> [49] MsCoreUtils_1.4.0 htmlwidgets_1.5.3
#> [51] httr_1.4.2 gplots_3.1.1
#> [53] RColorBrewer_1.1-2 ellipsis_0.3.2
#> [55] farver_2.1.0 pkgconfig_2.0.3
#> [57] reshape_0.8.8 XML_3.99-0.6
#> [59] nnet_7.3-16 sass_0.4.0
#> [61] utf8_1.2.1 labeling_0.4.2
#> [63] tidyselect_1.1.1 rlang_0.4.11
#> [65] munsell_0.5.0 tools_4.1.0
#> [67] cli_2.5.0 dbscan_1.1-8
#> [69] generics_0.1.0 statnet.common_4.4.1
#> [71] evaluate_0.14 stringr_1.4.0
#> [73] mzID_1.30.0 yaml_2.2.1
#> [75] knitr_1.33 robustbase_0.93-7
#> [77] caTools_1.18.2 purrr_0.3.4
#> [79] RANN_2.6.1 ncdf4_1.17
#> [81] RBGL_1.68.0 nlme_3.1-152
#> [83] compiler_4.1.0 rstudioapi_0.13
#> [85] plotly_4.9.3 png_0.1-7
#> [87] affyio_1.62.0 MassSpecWavelet_1.58.0
#> [89] tibble_3.1.2 bslib_0.2.5.1
#> [91] stringi_1.6.2 ps_1.6.0
#> [93] highr_0.9 lattice_0.20-44
#> [95] Matrix_1.3-3 vctrs_0.3.8
#> [97] pillar_1.6.1 lifecycle_1.0.0
#> [99] BiocManager_1.30.15 jquerylib_0.1.4
#> [101] MALDIquant_1.19.3 data.table_1.14.0
#> [103] bitops_1.0-7 GenomicRanges_1.44.0
#> [105] R6_2.5.0 latticeExtra_0.6-29
#> [107] pcaMethods_1.84.0 affy_1.70.0
#> [109] network_1.16.1 KernSmooth_2.23-20
#> [111] gridExtra_2.3 IRanges_2.26.0
#> [113] codetools_0.2-18 MASS_7.3-54
#> [115] gtools_3.8.2 assertthat_0.2.1
#> [117] SummarizedExperiment_1.22.0 GenomeInfoDbData_1.2.6
#> [119] hms_1.1.0 grid_4.1.0
#> [121] rpart_4.1-15 tidyr_1.1.3
#> [123] coda_0.19-4 rmarkdown_2.8
#> [125] MatrixGenerics_1.4.0 base64enc_0.1-3