biodbLipidmaps 1.4.1
biodbLipidmaps is a biodb extension package that implements a connector to the LIPID MAPS Structure Database (Sud et al. 2007). Install it from Bioconductor with BiocManager:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbLipidmaps')
The first step in using biodbLipidmaps is to create an instance of the BiodbMain class from the main biodb package. This is done by calling the constructor of the class:
mybiodb <- biodb::newInst()
During this step, the configuration is set up, the cache system is initialized, and extension packages are loaded. As shown at the end of this vignette, the biodb instance must be terminated with a call to its terminate() method.
In biodb, connections to databases are handled by connector instances that you obtain from the factory. The following code instantiates a connector to the LIPID MAPS Structure database:
conn <- mybiodb$getFactory()$createConn('lipidmaps.structure')
## Loading required package: biodbLipidmaps
To get the number of entries stored inside the database, run:
conn$getNbEntries()
## [1] NA
To get some of the first entry IDs (accession numbers) from the database, run:
ids <- conn$getEntryIds(2)
ids
## [1] "LMFA00000001" "LMFA00000002"
To retrieve entries, use:
entries <- conn$getEntry(ids)
entries
## [[1]]
## Biodb LIPID MAPS Structure entry instance LMFA00000001.
##
## [[2]]
## Biodb LIPID MAPS Structure entry instance LMFA00000002.
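Individual values can also be read from the entry objects before any conversion to a data frame. The following is a minimal sketch that assumes each entry exposes biodb's getFieldValue() method and that the requested field is present with exactly one value. The helper name entry_field is hypothetical and not part of biodb.

```r
# Hypothetical helper (not part of biodb): collect one single-valued
# field from each entry of a list. Assumes every entry provides a
# getFieldValue(field) method and that the field holds exactly one value.
entry_field <- function(entries, field) {
    vapply(entries,
           function(e) as.character(e$getFieldValue(field)),
           character(1))
}

# Possible use with the entries retrieved above (requires network access):
# entry_field(entries, "accession")
```

The field names to pass are the same ones that appear as column names when entries are converted to a data frame, such as "accession" or "formula".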
To convert a list of entries into a data frame, run:
x <- mybiodb$entriesToDataframe(entries)
## Loading required package: biodbChebi
x
## accession chebi.id ncbi.pubchem.comp.id
## 1 LMFA00000001 178363 10930192
## 2 LMFA00000002 137783 42607281
## comp.iupac.name.syst monoisotopic.mass
## 1 2-methoxy-12-methyloctadec-17-en-5-ynoyl anhydride 626.4910
## 2 N-(3S-hydroxydecanoyl)-L-serine 275.1733
## formula name
## 1 C40H66O5 2-methoxy-12-methyloctadec-17-en-5-ynoyl anhydride;Acetylenic acids
## 2 C13H25NO5 Serratamic acid
## lipidmaps.structure.id
## 1 LMFA00000001
## 2 LMFA00000002
## inchi
## 1 InChI=1S/C40H66O5/c1-7-9-11-23-29-35(3)31-25-19-15-13-17-21-27-33-37(43-5)39(41)45-40(42)38(44-6)34-28-22-18-14-16-20-26-32-36(4)30-24-12-10-8-2/h7-8,35-38H,1-2,9-16,19-20,23-34H2,3-6H3
## 2 InChI=1S/C13H25NO5/c1-2-3-4-5-6-7-10(16)8-12(17)14-11(9-15)13(18)19/h10-11,15-16H,2-9H2,1H3,(H,14,17)(H,18,19)/t10-,11-/m0/s1
## inchikey molecular.mass
## 1 VOGBKCAANIAXCI-UHFFFAOYSA-N 626.963
## 2 NDDJIMSGSZNACM-QWRGUYRKSA-N 275.342
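Because the result is an ordinary data frame, base R subsetting is enough to narrow it down to the columns of interest. The sketch below uses a small stand-in data frame, x_demo, built by hand from a few of the values printed above so that it runs without a database connection. With a live session you would apply the same indexing to x.

```r
# Stand-in for the data frame printed above (values copied from that output).
x_demo <- data.frame(
    accession = c("LMFA00000001", "LMFA00000002"),
    formula = c("C40H66O5", "C13H25NO5"),
    monoisotopic.mass = c(626.4910, 275.1733),
    molecular.mass = c(626.963, 275.342),
    stringsAsFactors = FALSE
)

# Keep only the identifier, the formula and the monoisotopic mass.
masses <- x_demo[, c("accession", "formula", "monoisotopic.mass")]

# Sort the rows by increasing monoisotopic mass.
masses <- masses[order(masses$monoisotopic.mass), ]
masses
```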
You can access the web service “LMSDSearch” directly with the wsLmsdSearch method:
ids <- conn$wsLmsdSearch(mode='ProcessStrSearch', name="fatty", retfmt="ids")
ids
## [1] "LMFA01010000" "LMFA01140081" "LMFA01140082" "LMFA01140083" "LMFA01140084"
## [6] "LMFA01140085" "LMFA05000000" "LMFA06000000"
From this list of identifiers, we can obtain the full entry objects:
entries <- conn$getEntry(ids)
Then convert them into a data frame:
entriesDf <- mybiodb$entriesToDataframe(entries)
The result is shown in Table 1.
accession | chebi.id | kegg.compound.id | comp.iupac.name.syst | name | lipidmaps.structure.id | inchi | inchikey | molecular.mass | ncbi.pubchem.comp.id | monoisotopic.mass | formula |
---|---|---|---|---|---|---|---|---|---|---|---|
LMFA01010000 | 35366 | C00162 | fatty acid | fatty acid | LMFA01010000 | NA | NA | 45.0174 | NA | NA | NA |
LMFA01140081 | NA | NA | 2-[5]-ladderane ethanoic acid | 2-[5]-ladderane ethanoic acid;C14-[5]-ladderane fatty acid | LMFA01140081 | NA | NA | NA | 137323820 | 218.1307 | C14H18O2 |
LMFA01140082 | 187485 | NA | 2-[3]-ladderane ethanoic acid | 2-[3]-ladderane ethanoic acid;C14-[3]-ladderane fatty acid | LMFA01140082 | InChI=1S/C14H20O2/c15-12(16)6-7-1-2-10-11(5-7)14-9-4-3-8(9)13(10)14/h7-11,13-14H,1-6H2,(H,15,16) | MZLSFWGEQLSKRL-UHFFFAOYSA-N | 220.3120 | 137323821 | 220.1463 | C14H20O2 |
LMFA01140083 | NA | NA | 8-[1]-ladderane octanoic acid | 8-[1]-ladderane octanoic acid;C20-[1]-ladderane fatty acid | LMFA01140083 | NA | NA | NA | 137323822 | 302.2246 | C20H30O2 |
LMFA01140084 | NA | NA | 8-[1]-ladderane octanoic acid | 8-[1]-ladderane octanoic acid;C20-[1]-ladderane fatty acid | LMFA01140084 | NA | NA | NA | 137323823 | 304.2402 | C20H32O2 |
LMFA01140085 | NA | NA | 6-[1]-ladderane hexanoic acid | 6-[1]-ladderane hexanoic acid;C18-[1]-ladderane fatty acid | LMFA01140085 | NA | NA | NA | 137323824 | 274.1933 | C18H26O2 |
LMFA05000000 | 142622 | NA | NA | Fatty Alcohol | LMFA05000000 | NA | NA | 31.0340 | NA | NA | NA |
LMFA06000000 | 35746 | NA | NA | Fatty Aldehyde | LMFA06000000 | NA | NA | 29.0180 | NA | NA | NA |
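
Several fields in Table 1 are NA, so it can be useful to check how complete each column is before further processing. The sketch below works on a hand-built stand-in, entries_demo, that copies three rows and three columns from Table 1. The same calls would apply to entriesDf in a live session.

```r
# Stand-in for part of Table 1 (values copied from the table).
entries_demo <- data.frame(
    accession = c("LMFA01010000", "LMFA01140081", "LMFA01140082"),
    formula = c(NA, "C14H18O2", "C14H20O2"),
    inchikey = c(NA, NA, "MZLSFWGEQLSKRL-UHFFFAOYSA-N"),
    stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_counts <- colSums(is.na(entries_demo))
na_counts

# Keep only the rows that have an InChIKey.
with_inchikey <- entries_demo[!is.na(entries_demo$inchikey), ]
with_inchikey
```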
When you are done with your biodb instance, you must terminate it to ensure that resources (file handles, database connections, etc.) are released:
mybiodb$terminate()
## INFO [15:56:20.339] Closing BiodbMain instance...
## INFO [15:56:20.341] Connector "lipidmaps.structure" deleted.
## INFO [15:56:20.349] Connector "chebi" deleted.
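If the code that runs between instance creation and termination can fail, one way to make sure terminate() is still called is to register it with on.exit() inside a small wrapper. This is only a sketch: the helper name with_biodb and its new_inst argument are hypothetical and not part of biodb. Only biodb::newInst() and terminate() come from the package.

```r
# Hypothetical helper (not part of biodb): runs `f` with a fresh biodb
# instance and terminates that instance even if `f` raises an error.
# `new_inst` defaults to biodb::newInst and is only evaluated when used,
# so the constructor can be swapped out, for example for testing.
with_biodb <- function(f, new_inst = biodb::newInst) {
    mybiodb <- new_inst()
    on.exit(mybiodb$terminate(), add = TRUE)
    f(mybiodb)
}

# Possible use (requires biodbLipidmaps and network access):
# ids <- with_biodb(function(mybiodb) {
#     conn <- mybiodb$getFactory()$createConn('lipidmaps.structure')
#     conn$getEntryIds(2)
# })
```

The wrapper returns whatever f returns, so results such as identifiers or data frames remain available after the instance has been closed.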
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] biodbChebi_1.4.0 biodbLipidmaps_1.4.1 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] progress_1.2.2 tidyselect_1.2.0 xfun_0.35
## [4] bslib_0.4.1 vctrs_0.5.1 generics_0.1.3
## [7] htmltools_0.5.3 BiocFileCache_2.6.0 yaml_2.3.6
## [10] utf8_1.2.2 blob_1.2.3 XML_3.99-0.12
## [13] rlang_1.0.6 jquerylib_0.1.4 pillar_1.8.1
## [16] withr_2.5.0 glue_1.6.2 DBI_1.1.3
## [19] rappdirs_0.3.3 bit64_4.0.5 dbplyr_2.2.1
## [22] lifecycle_1.0.3 plyr_1.8.8 stringr_1.4.1
## [25] memoise_2.0.1 evaluate_0.18 knitr_1.41
## [28] fastmap_1.1.0 curl_4.3.3 fansi_1.0.3
## [31] highr_0.9 biodb_1.6.1 Rcpp_1.0.9
## [34] openssl_2.0.4 filelock_1.0.2 BiocManager_1.30.19
## [37] cachem_1.0.6 jsonlite_1.8.3 bit_4.0.5
## [40] chk_0.8.1 askpass_1.1 hms_1.1.2
## [43] digest_0.6.30 stringi_1.7.8 bookdown_0.30
## [46] dplyr_1.0.10 bitops_1.0-7 cli_3.4.1
## [49] tools_4.2.2 magrittr_2.0.3 sass_0.4.4
## [52] RCurl_1.98-1.9 RSQLite_2.2.19 tibble_3.1.8
## [55] crayon_1.5.2 pkgconfig_2.0.3 ellipsis_0.3.2
## [58] prettyunits_1.1.1 assertthat_0.2.1 rmarkdown_2.18
## [61] httr_1.4.4 lgr_0.4.4 R6_2.5.1
## [64] compiler_4.2.2
Sud, Manish, Eoin Fahy, Dawn Cotter, Alex Brown, Edward A. Dennis, Christopher K. Glass, Alfred H. Merrill Jr., et al. 2007. “LMSD: LIPID MAPS Structure Database.” Nucleic Acids Research 35 (Database issue): D527–D532. https://doi.org/10.1093/nar/gkl838.