biodbKegg 1.6.1
biodbKegg is a biodb extension package that implements a connector to the KEGG Compound database (Kanehisa and Goto 2000).
Install using Bioconductor:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install('biodbKegg')
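Once installation has finished, it can be useful to confirm that the package is visible to the current R library and which version was installed. The following is a minimal sketch using only base R and utils functions; the variable name has_kegg is illustrative.

```r
# Check whether biodbKegg is installed, without attaching it.
has_kegg <- requireNamespace("biodbKegg", quietly=TRUE)

if (has_kegg) {
    # Report the installed version, e.g. to compare it with this vignette's.
    message("biodbKegg version: ",
            as.character(utils::packageVersion("biodbKegg")))
} else {
    message("biodbKegg is not installed in the current library.")
}
```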
The first step in using biodbKegg is to create an instance of the BiodbMain class from the main biodb package. This is done by calling the constructor of the class:
mybiodb <- biodb::newInst()
During this step the configuration is set up, the cache system is initialized and extension packages are loaded.
We will see at the end of this vignette that the biodb instance needs to be terminated with a call to the terminate() method.
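Because terminate() must always be called, one option is to wrap the lifetime of the instance in a function and register the call with on.exit(), so that it runs even if an error occurs. The following is a sketch under that assumption, not part of the biodb API: the helper name with_biodb is hypothetical, and only biodb::newInst() and terminate() come from this vignette.

```r
# Hypothetical helper (not part of biodb): create an instance, run `fun`
# on it, and terminate the instance on exit, even if `fun` fails.
with_biodb <- function(fun) {
    inst <- biodb::newInst()
    on.exit(inst$terminate(), add=TRUE)
    fun(inst)
}

# Example use (requires biodb and biodbKegg to be installed, and network
# access to KEGG):
# with_biodb(function(inst) {
#     conn <- inst$getFactory()$createConn('kegg.compound')
#     inst$entriesToDataframe(conn$getEntry(c('C00133', 'C00751')),
#                             compute=FALSE)
# })
```

Returning a plain data frame rather than entry objects avoids holding on to objects that belong to an instance that has already been terminated.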
In biodb the connection to a database is handled by a connector instance that you can get from the factory. biodbKegg implements a connector to a remote database. Here is the code to instantiate a connector:
kegg.comp.conn <- mybiodb$getFactory()$createConn('kegg.compound')
## Loading required package: biodbKegg
To retrieve entries, use:
entries <- kegg.comp.conn$getEntry(c('C00133', 'C00751'))
entries
## [[1]]
## Biodb KEGG Compound entry instance C00133.
##
## [[2]]
## Biodb KEGG Compound entry instance C00751.
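Each element of the returned list is an entry object whose fields can be read individually. As a sketch, the hypothetical helper below collects one field from every entry by calling the entry method getFieldValue(). It assumes that IDs that could not be retrieved appear as NULL in the list, and maps them to NA.

```r
# Hypothetical helper: read one field from each entry in a list.
# Entries that are NULL (assumed to mean "not found") yield NA.
get_field <- function(entries, field) {
    lapply(entries, function(e) {
        if (is.null(e)) NA else e$getFieldValue(field)
    })
}

# Example use with the entries retrieved above:
# get_field(entries, 'formula')
```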
To convert a list of entries into a dataframe, run:
x <- mybiodb$entriesToDataframe(entries, compute=FALSE)
x
## accession monoisotopic.mass formula molecular.mass
## 1 C00133 89.0477 C3H7NO2 89.0932
## 2 C00751 410.3913 C30H50 410.7180
## name cas.id ncbi.pubchem.comp.id
## 1 D-Alanine;D-2-Aminopropionic acid;D-Ala 338-69-2 3433
## 2 Squalene;Spinacene;Supraene 111-02-4 4013
## chebi.id
## 1 15570
## 2 15440
## kegg.reaction.id
## 1 R00399;R00401;R01147;R01148;R01149;R01150;R01225;R01344;R02718;R04369;R04611;R05861;R07651;R08850;R09588;R09595;R11965;R12557;R12812;R12863;R12867;R12871;R12873;R12875;R12904
## 2 R02872;R02874;R02875;R02876;R06223;R07322;R07323;R08535;R09712;R10167;R10169;R11401;R12355
## kegg.enzyme.id
## 1 1.4.3.3;1.4.3.19;2.1.2.7;2.3.1.263;2.3.2.14;2.6.1.21;3.1.1.103;3.4.13.22;3.4.17.8;5.1.1.1;6.1.1.13;6.1.2.1;6.3.2.4;6.3.2.16;6.3.2.35
## 2 1.3.1.96;1.14.14.17;1.14.19.-;1.17.8.1;2.1.1.262;2.5.1.21;4.2.1.123;4.2.1.129;5.4.99.17;5.4.99.37
## kegg.pathway.id
## 1 map00470;map00550;map00552;map01100;map01502;map01503;map04742
## 2 map00100;map00909;map00996;map00999;map01060;map01062;map01066;map01070;map01100;map01110
## kegg.compound.id lipidmaps.structure.id
## 1 C00133 <NA>
## 2 C00751 LMPR0106010002
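In the data frame above, fields with several values are collapsed into a single string in which the values are separated by semicolons. To work with individual values, such a column can be split with base R. The sketch below rebuilds the kegg.pathway.id column from the printed output rather than querying KEGG; the names df and pathways are illustrative.

```r
# Pathway IDs copied from the output above, one string per compound.
df <- data.frame(
    accession=c('C00133', 'C00751'),
    kegg.pathway.id=c(
        'map00470;map00550;map00552;map01100;map01502;map01503;map04742',
        paste0('map00100;map00909;map00996;map00999;map01060;',
               'map01062;map01066;map01070;map01100;map01110')),
    stringsAsFactors=FALSE)

# Split each string on ';' to get one character vector per compound.
pathways <- strsplit(df$kegg.pathway.id, ';', fixed=TRUE)
names(pathways) <- df$accession

# Number of pathways per compound, and pathways shared by both compounds.
lengths(pathways)
intersect(pathways[['C00133']], pathways[['C00751']])
```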
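To search for entries by monoisotopic mass, pass a target value and a tolerance to searchForEntries(), then fetch the matching entries from the factory:
ids <- kegg.comp.conn$searchForEntries(list(monoisotopic.mass=list(value=64, delta=2.0)), max.results=10)
entries <- mybiodb$getFactory()$getEntry('kegg.compound', ids)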
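The search above combines a target value with a tolerance. Assuming delta is an absolute tolerance in the same unit as the mass (an assumption worth checking against the biodb documentation), the query value=64, delta=2.0 would accept masses between 62 and 66. The sketch below reproduces that window test in plain R; the function in_mass_window and the sample masses are made up for illustration and are not KEGG data.

```r
# Hypothetical helper: TRUE where a mass lies within value +/- delta.
# Assumes delta is an absolute tolerance, not a relative (ppm) one.
in_mass_window <- function(mass, value, delta) {
    abs(mass - value) <= delta
}

# Made-up masses around the window boundaries, for illustration only.
masses <- c(61.5, 62, 64.3, 66, 66.01)

# Flag the masses that fall inside [62, 66].
inside <- in_mass_window(masses, value=64, delta=2.0)
masses[inside]
```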
If you have a data frame containing a column with KEGG Compound IDs, you can add information such as associated KEGG Enzymes, associated KEGG Pathways and KEGG Modules to your data frame, for a specific organism.
For this example, we build a data frame from a short list of compound IDs:
kegg.comp.ids <- c('C06144', 'C06178', 'C02659')
mydf <- data.frame(kegg.ids=kegg.comp.ids)
Using the addInfo() method of the KeggCompoundConn class, we add information about pathways, enzymes and modules for these compounds:
kegg.comp.conn$addInfo(mydf, id.col='kegg.ids', org='mmu')
## kegg.ids kegg.enzyme.id
## 1 C06144 4.2.1.27
## 2 C06178 1.4.3.21|2.3.1.74
## 3 C02659 1.14.14.41|2.4.1.63|3.2.1.21
## kegg.reaction.id
## 1 R01611;R01367
## 2 R01853;R02382;R02529|R01613;R07987;R07988
## 3 R10030;R10034;R11597|R03625;R04948;R10037|R00026;R00306;R02558
## kegg.pathway.id
## 1 mmu00650|mmu01100
## 2 mmu00760|mmu01100
## 3 mmu01100
## kegg.pathway.name
## 1 Butanoate metabolism - Mus musculus (house mouse)|Metabolic pathways - Mus musculus (house mouse)
## 2 Nicotinate and nicotinamide metabolism - Mus musculus (house mouse)|Metabolic pathways - Mus musculus (house mouse)
## 3 Metabolic pathways - Mus musculus (house mouse)
## kegg.pathway.pathway.class kegg.module.id
## 1 Metabolism;Carbohydrate metabolism|NA M00027|M00001|M00002
## 2 Metabolism;Metabolism of cofactors and vitamins|NA M00912|M00001|M00002
## 3 <NA> M00001|M00002|M00003
## kegg.module.name
## 1 GABA (gamma-Aminobutyrate) shunt|Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds
## 2 NAD biosynthesis, tryptophan => quinolinate => NAD|Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds
## 3 Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate|Glycolysis, core module involving three-carbon compounds|Gluconeogenesis, oxaloacetate => fructose-6P
Note that, by default, the number of values for each field is limited to 3.
Please see the help page of KeggCompoundConn for more information about addInfo() and a description of all its parameters.
The list of organisms is available at https://www.genome.jp/kegg/catalog/org_list.html.
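In the addInfo() output shown above, values appear to be separated by '|', and in the kegg.reaction.id column each '|'-separated group itself holds several ';'-separated IDs. Assuming that reading of the printed output is right, a cell can be split in two steps with base R. Since '|' is a regular-expression metacharacter, fixed=TRUE is needed. The helper name split_nested is hypothetical.

```r
# Hypothetical helper: split one cell first on '|' (groups), then on ';'
# (values within each group). Returns a list with one vector per group.
split_nested <- function(x) {
    groups <- strsplit(x, '|', fixed=TRUE)[[1]]
    strsplit(groups, ';', fixed=TRUE)
}

# kegg.reaction.id value of the second row (C06178) in the output above.
cell <- 'R01853;R02382;R02529|R01613;R07987;R07988'
reactions <- split_nested(cell)

# Number of groups, and number of reaction IDs in each group.
length(reactions)
lengths(reactions)
```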
When you are done with your biodb instance, you must terminate it to ensure that resources (file handles, database connections, etc.) are released:
mybiodb$terminate()
## INFO [16:00:42.122] Closing BiodbMain instance...
## INFO [16:00:42.124] Connector "kegg.compound" deleted.
## INFO [16:00:42.132] Connector "kegg.enzyme" deleted.
## INFO [16:00:42.133] Connector "kegg.pathway" deleted.
## INFO [16:00:42.135] Connector "kegg.module" deleted.
## INFO [16:00:42.136] Connector "kegg.reaction" deleted.
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] biodbKegg_1.6.1 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.7 utf8_1.2.3
## [4] generics_0.1.3 stringi_1.7.12 RSQLite_2.3.1
## [7] hms_1.1.3 digest_0.6.33 magrittr_2.0.3
## [10] evaluate_0.21 bookdown_0.34 fastmap_1.1.1
## [13] blob_1.2.4 plyr_1.8.8 jsonlite_1.8.7
## [16] progress_1.2.2 DBI_1.1.3 BiocManager_1.30.21.1
## [19] httr_1.4.6 fansi_1.0.4 XML_3.99-0.14
## [22] jquerylib_0.1.4 cli_3.6.1 rlang_1.1.1
## [25] chk_0.9.0 crayon_1.5.2 dbplyr_2.3.3
## [28] bit64_4.0.5 withr_2.5.0 cachem_1.0.8
## [31] yaml_2.3.7 tools_4.3.1 memoise_2.0.1
## [34] biodb_1.8.0 dplyr_1.1.2 filelock_1.0.2
## [37] curl_5.0.1 vctrs_0.6.3 R6_2.5.1
## [40] magick_2.7.4 BiocFileCache_2.8.0 lifecycle_1.0.3
## [43] stringr_1.5.0 bit_4.0.5 pkgconfig_2.0.3
## [46] pillar_1.9.0 bslib_0.5.0 glue_1.6.2
## [49] Rcpp_1.0.11 lgr_0.4.4 highr_0.10
## [52] xfun_0.39 tibble_3.2.1 tidyselect_1.2.0
## [55] knitr_1.43 igraph_1.5.0.1 htmltools_0.5.5
## [58] rmarkdown_2.23 compiler_4.3.1 prettyunits_1.1.1
## [61] askpass_1.1 openssl_2.1.0
Kanehisa, Minoru, and Susumu Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Research 28 (1): 27–30. https://doi.org/10.1093/nar/28.1.27.