1 Introduction

BridgeDb is a combination of an application programming interface (API), library, and set of data files for mapping identifiers for identical objects [1]. Because BridgeDb is used by bioinformatics projects such as WikiPathways [2] and PathVisio [3], identifier mapping databases are available for gene products and metabolites.

Questions can be directed to the BridgeDb Google Group.

The Bioconductor BridgeDbR package page describes how to install the package. After installation, the library can be loaded with the following command:

library(BridgeDbR)
## Loading required package: rJava

2 Concepts

BridgeDb has a few core concepts, which are explained in this section. Much of the API assumes familiarity with these concepts, though not every concept applies in every case. The first concept is an example of that: organisms do not apply to metabolites.

2.1 Organisms

However, for genes the organism is important: the same gene has different identifiers in different organisms. BridgeDb identifies organisms by their Latin name and by a two-character code. Because identifier mapping files provided by PathVisio have names with these short codes, it can be useful to have a conversion method:

code = getOrganismCode("Rattus norvegicus")
code
## [1] "Rn"

2.2 Data Sources

Identifiers have a context, and this context is often a database. For example, metabolite identifiers can be provided by the Human Metabolome Database (HMDB), ChemSpider, PubChem, and ChEBI. Similarly, gene product identifiers can be provided by databases like Ensembl. In BridgeDb, a database that provides identifiers is called a data source.

Importantly, each such data source is identified by a human readable long name and by a short system code. This package has methods to interconvert one into the other:

fullName <- getFullName("Ce")
fullName
## [1] "ChEBI"
code <- getSystemCode("ChEBI")
code
## [1] "Ce"

2.3 Identifier Patterns

Another useful aspect of BridgeDb is that it knows about the patterns of identifiers. If a pattern is unique enough, it can be used to automatically find the data sources that match a particular identifier. For example:

getMatchingSources("HMDB00555")
##  [1] "SWISS-MODEL"                      "HGNC"                            
##  [3] "Ensembl Plants"                   "NCI Pathway Interaction Database"
##  [5] "EMBL"                             "Wikipedia"                       
##  [7] "LipidBank"                        "KEGG Pathway"                    
##  [9] "HMDB"                             "SUPFAM"                          
## [11] "NCBI Protein"
getMatchingSources("ENSG00000100030")
##  [1] "SWISS-MODEL"                      "HGNC"                            
##  [3] "Ensembl Plants"                   "Ensembl Human"                   
##  [5] "NCI Pathway Interaction Database" "EMBL"                            
##  [7] "Wikipedia"                        "LipidBank"                       
##  [9] "Ensembl"                          "SUPFAM"                          
## [11] "NCBI Protein"
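
Both result lists above share many entries, which suggests that some data sources have patterns permissive enough to accept almost any string. The idea can be sketched in base R with regular expressions. The three patterns below are illustrative assumptions, not the patterns that BridgeDb ships, and matchSources is a hypothetical helper rather than part of the package:

```r
# Hypothetical patterns for illustration only; BridgeDb maintains its own.
patterns <- c(
  "HMDB"      = "^HMDB\\d{5}$",
  "Ensembl"   = "^ENS[A-Z]*G\\d{11}$",
  "Wikipedia" = "^.+$"  # a permissive pattern matches almost anything
)

# Return the names of all sources whose pattern matches the identifier.
matchSources <- function(id, patterns) {
  hits <- vapply(patterns, function(p) grepl(p, id, perl = TRUE), logical(1))
  names(patterns)[hits]
}

matchSources("HMDB00555", patterns)
matchSources("ENSG00000100030", patterns)
```

In this sketch the specific patterns separate the two identifiers, while the permissive one matches both, which is why pattern matching alone is only reliable when the pattern is distinctive.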

2.4 Identifier Mapping Databases

The BridgeDb package primarily provides the software framework, and not the identifier mapping data. Identifier mapping databases can be downloaded from various websites. The package knows about the download location provided by PathVisio, and we can query for all gene product identifier mapping databases:

getBridgeNames()
##  [1] "Ag_Derby_Ensembl_Metazoa_39.bridge"
##  [2] "An_Derby_Ensembl_Fungi_39.bridge"  
##  [3] "At_Derby_Ensembl_Plant_39.bridge"  
##  [4] "Bs_Derby_Ensembl_91.bridge"        
##  [5] "Bt_Derby_Ensembl_91.bridge"        
##  [6] "Ce_Derby_Ensembl_91.bridge"        
##  [7] "Cf_Derby_Ensembl_91.bridge"        
##  [8] "Ci_Derby_Ensembl_91.bridge"        
##  [9] "Dm_Derby_Ensembl_91.bridge"        
## [10] "Dr_Derby_Ensembl_91.bridge"        
## [11] "Ec_Derby_Ensembl_91.bridge"        
## [12] "Gg_Derby_Ensembl_91.bridge"        
## [13] "Gm_Derby_Ensembl_Plant_39.bridge"  
## [14] "Gz_Derby_Ensembl_Fungi_39.bridge"  
## [15] "Hs_Derby_Ensembl_91.bridge"        
## [16] "Hv_Derby_Ensembl_Plant_39.bridge"  
## [17] "Ml_Derby_Ensembl_91.bridge"        
## [18] "Mm_Derby_Ensembl_91.bridge"        
## [19] "Mx_Derby_Ensembl_91.bridge"        
## [20] "Oa_Derby_Ensembl_91.bridge"        
## [21] "Oi_Derby_Ensembl_Plant_39.bridge"  
## [22] "Oj_Derby_Ensembl_Plant_39.bridge"  
## [23] "Pi_Derby_Ensembl_Plant_39.bridge"  
## [24] "Pt_Derby_Ensembl_91.bridge"        
## [25] "Qc_Derby_Ensembl_91.bridge"        
## [26] "Rn_Derby_Ensembl_91.bridge"        
## [27] "Sc_Derby_Ensembl_91.bridge"        
## [28] "Sl_Derby_Ensembl_Plant_39.bridge"  
## [29] "Ss_Derby_Ensembl_91.bridge"        
## [30] "Vv_Derby_Ensembl_Plant_39.bridge"  
## [31] "Xt_Derby_Ensembl_91.bridge"        
## [32] "Zm_Derby_Ensembl_Plant_39.bridge"
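
Because each file name in the listing starts with the two-character organism code, the file for one organism can be picked out with a prefix match. The following is a minimal sketch in base R; bridgeForCode is a hypothetical helper, and the three names are copied from the listing above so that the example runs without a network connection:

```r
# A few names copied from the listing above.
bridgeNames <- c(
  "Hs_Derby_Ensembl_91.bridge",
  "Mm_Derby_Ensembl_91.bridge",
  "Rn_Derby_Ensembl_91.bridge"
)

# Hypothetical helper: keep the file names that start with "<code>_".
bridgeForCode <- function(code, bridgeNames) {
  bridgeNames[startsWith(bridgeNames, paste0(code, "_"))]
}

bridgeForCode("Rn", bridgeNames)
```

In practice the code could come from getOrganismCode() and the names from getBridgeNames().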

2.5 Downloading

The package provides a convenience method to download such identifier mapping databases. For example, we can save the identifier mapping database for rat to the current folder with:

dbLocation <- getDatabase("Rattus norvegicus",location=getwd())

The dbLocation variable then contains the location of the identifier mapping file that was downloaded.

Mapping databases can also be manually downloaded for genes, metabolites, and gene variants from the following locations:

2.6 Loading Databases

Once you have downloaded an identifier mapping database, either manually or via the getDatabase() method, you need to load the database for the identifier mappings to become available.

mapper <- loadDatabase(dbLocation)

3 Mapping Identifiers

With a loaded database, identifiers can be mapped. The mapping method uses system codes. So, to map the human Entrez Gene identifier (system code: L) 196410 to Affy identifiers (system code: X) we use:

location <- getDatabase("Homo sapiens")
mapper <- loadDatabase(location)
map(mapper, "L", "196410", "X")

Mind you, this returns more than one identifier, as BridgeDb is generally a one-to-many mapping database.

3.1 Mapping multiple identifiers

There is currently no convenience method for mapping multiple identifiers at once, for example those in a matrix (one is planned), but it can be done with an apply-style function. Because a single identifier can map to multiple identifiers, we need a bit of extra logic to select one mapped identifier; the helper function below takes the first. Mind you, this helper function has the loaded database (“mapper”) and the source and target data sources (HMDB and Wikidata, respectively) hardcoded:

mapToSingle = function(x) {
  mappings = map(mapper, "Ch", x, "Wd")
  if (length(mappings) == 1) {
    result = mappings
  } else {
    result = mappings[1]
  }
  return(result)
}

Let’s assume we have a data frame, data, with an HMDB identifier in the second column. We can then append a column with Wikidata identifiers with this code:

wikidata = unlist(sapply(as.character(data[,2]), mapToSingle))
data2 = cbind(c(wikidata,""),data)
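
The helper above does not say what happens when an identifier has no mapping at all. One possible variant, sketched here under assumptions, returns NA for unmapped identifiers so that the result always has one entry per input. mapFirst and fakeMap are hypothetical names, and the lookup table holds made-up identifiers that stand in for a loaded database:

```r
# Hypothetical helper: reduce each one-to-many result to a single value.
# mapFun is any function that returns the mapped identifiers for one
# input identifier as a vector (possibly empty or NULL).
mapFirst <- function(ids, mapFun) {
  vapply(ids, function(id) {
    mappings <- as.character(mapFun(id))
    if (length(mappings) == 0) NA_character_ else mappings[1]
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in lookup table so the sketch runs without a database.
# The identifiers are made up.
lookup <- list(A1 = c("Q1", "Q2"), A2 = "Q3")
fakeMap <- function(id) lookup[[id]]

mapFirst(c("A1", "A2", "A3"), fakeMap)
```

With a loaded database, mapFun could wrap the map() call used in mapToSingle, assuming that call returns a plain vector of identifiers as the helper above expects.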

4 Metabolomics

While you can download the gene and protein identifier mapping databases with the getDatabase() method, these mapping databases cannot be used for metabolites. The mapping database for metabolites has to be downloaded manually from Figshare, e.g. the February 2018 release version. A full overview of mapping files can be found in this Figshare collection.

Each mapping file record will allow you to download the .bridge file with the mappings.

If reproducibility is important to you, you can download the file with (mind you, these files are quite large):

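file <- "metabolites_20180201.bridge"
# Save the download under the name in `file`; mode = "wb" avoids
# corrupting the binary file on Windows.
download.file(
  "https://ndownloader.figshare.com/files/10358973",
  destfile = file,
  mode = "wb"
)
location <- normalizePath(file)
mapper <- loadDatabase(location)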
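
Since these files are large, it can help to skip the download when the file is already on disk. The following is a minimal sketch of that idea; fetchOnce is a hypothetical helper, not part of BridgeDbR, and its downloader argument is an assumption added so that the function can be tried without network access:

```r
# Hypothetical helper: download only when the file is not already present.
# The downloader argument defaults to download.file and can be swapped
# for a stub when trying the function offline.
fetchOnce <- function(url, destfile, downloader = utils::download.file) {
  if (!file.exists(destfile)) {
    downloader(url, destfile, mode = "wb")  # "wb" keeps binary files intact
  }
  normalizePath(destfile)
}
```

The returned path can be passed to loadDatabase(); calling fetchOnce() with the Figshare URL and file name used above would reuse the local copy on later runs.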

With this mapper you can then map metabolite identifiers:

map(mapper, "456", source="Cs", target="Ck")

References

1. Iersel M van, Pico A, Kelder T, Gao J, Ho I, Hanspers K, Conklin B, Evelo C: The BridgeDb framework: Standardized access to gene, protein and metabolite identifier mapping services. BMC Bioinformatics 2010, 11:5+.

2. Pico AR, Kelder T, Iersel MP van, Hanspers K, Conklin BR, Evelo C: WikiPathways: Pathway editing for the people. PLoS Biol 2008, 6:e184+.

3. Iersel MP van, Kelder T, Pico AR, Hanspers K, Coort S, Conklin BR, Evelo C: Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 2008, 9.