How to query the Ontology Lookup Service directly from R and how to create and parse controlled vocabulary.
rols 3.0.0
rols is a Bioconductor package and should hence be installed using the dedicated functionality:
## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rols")
To get help, either post your question on the Bioconductor support site or open an issue on the rols github page.
The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.
The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this possible from within R. To do so, it relies on the httr package to query the REST interface and to access and retrieve data.
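To make the REST interaction more concrete, here is a minimal sketch of a direct query that bypasses rols. It uses httr2 as the HTTP client (it appears in the session information at the end of this document), and the endpoint URL is an assumption based on the public OLS4 API, not something taken from the rols code. In practice rols hides these details.

```r
library(httr2)

## Assumed OLS4 endpoint describing a single ontology (here bspo)
url <- "https://www.ebi.ac.uk/ols4/api/ontologies/bspo"

## Send the GET request and parse the JSON body into a nested R list
resp <- req_perform(request(url))
onto <- resp_body_json(resp)

## Inspect the top-level fields returned by the service
names(onto)
```

The field names are deliberately only listed, not accessed, since the exact layout of the response is decided by the service.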
There are 256 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.
The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.
library("rols")
The Ontology and Ontologies classes can store information about single or multiple ontologies. The latter can be easily subset using [ and [[, as one would for lists.
ol <- Ontologies()
## ⠙ Iterating 13 done (5.9/s) | 2.2s
ol
## Object of class 'Ontologies' with 256 entries
## ADO, AGRO ... CCF, CPONT
head(olsNamespace(ol))
## [1] "ado" "agro" "aism" "amphx" "apo" "apollo_sv"
ol[["bspo"]]
## Ontology: Biological Spatial Ontology (bspo)
## An ontology for respresenting spatial concepts, anatomical axes,
## gradients, regions, planes, sides and surfaces. These concepts can be
## used at multiple biological scales and in a diversity of taxa,
## including plants, animals and fungi. The BSPO is used to provide a
## source of anatomical location descriptors for logically defining
## anatomical entity classes in anatomy ontologies.
## Loaded: 2024-04-29 Updated: 2024-04-29 Version: 2023-05-27
## 169 terms 236 properties 18 individuals
It is also possible to initialise a single ontology
bspo <- Ontology("bspo")
bspo
## Ontology: Biological Spatial Ontology (bspo)
## An ontology for respresenting spatial concepts, anatomical axes,
## gradients, regions, planes, sides and surfaces. These concepts can be
## used at multiple biological scales and in a diversity of taxa,
## including plants, animals and fungi. The BSPO is used to provide a
## source of anatomical location descriptors for logically defining
## anatomical entity classes in anatomy ontologies.
## Loaded: 2024-04-29 Updated: 2024-04-29 Version: 2023-05-27
## 169 terms 236 properties 18 individuals
Single ontology terms are stored in Term objects. When more terms need to be manipulated, they are stored as Terms objects. It is easy to obtain all terms of an ontology of interest, and the resulting Terms object can be subset using [ and [[, as one would for lists.
bspotrms <- Terms(bspo) ## or Terms("bspo")
bspotrms
## Object of class 'Terms' with 169 entries
## From the BSPO ontology
## BFO:0000002, BFO:0000003 ... IAO:0000409, PATO:0000001
bspotrms[1:10]
## Object of class 'Terms' with 10 entries
## From the BSPO ontology
## BFO:0000002, BFO:0000003 ... BFO:0000023, BFO:0000031
bspotrms[["BSPO:0000092"]]
## A Term from the BSPO ontology: BSPO:0000092
## Label: anatomical compartment boundary
## to be merged into CARO
It is also possible to initialise a single term
trm <- Term(bspo, "BSPO:0000092")
termId(trm)
## [1] "BSPO:0000092"
termLabel(trm)
## [1] "anatomical compartment boundary"
It is then possible to extract the ancestors, descendants, parents and children terms. Each of these functions returns a Terms object.
parents(trm)
## Object of class 'Terms' with 1 entries
## From the BSPO ontology
## CARO:0000010
children(trm)
## Object of class 'Terms' with 6 entries
## From the BSPO ontology
## BSPO:0000094, BSPO:0000093 ... BSPO:0000041, BSPO:0000040
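The parents and children calls above only look one level up or down. As a minimal sketch, and assuming a working network connection to the OLS, the ancestors and descendants functions named in the text can be used in the same way to walk the whole hierarchy around a term.

```r
library("rols")

## Same term as above, created directly from the ontology name
trm <- Term("bspo", "BSPO:0000092")

## All terms above and below the term in the hierarchy
anc <- ancestors(trm)
dsc <- descendants(trm)

## Both results are Terms objects, so the usual accessors apply
termLabel(anc)
termId(dsc)
```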
Finally, a single term or terms object can be coerced to a data.frame using as(x, "data.frame").
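As a short illustration of this coercion, the sketch below converts the children of the term used above into a data.frame. It needs network access to the OLS. The column names are not listed here because they depend on what the package returns, so the example inspects them instead of assuming them.

```r
library("rols")

## Children of the 'anatomical compartment boundary' term
trm <- Term("bspo", "BSPO:0000092")
kids <- children(trm)

## Coerce the Terms object to a data.frame, one row per term
kids_df <- as(kids, "data.frame")

## Inspect the available columns and the number of rows
colnames(kids_df)
nrow(kids_df)
```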
Properties (relationships) of single or multiple terms or complete ontologies can be queried with the Properties method, as briefly illustrated below.
trm <- Term("uberon", "UBERON:0002107")
trm
## A Term from the UBERON ontology: UBERON:0002107
## Label: liver
## An exocrine gland which secretes bile and functions in metabolism of
## protein and carbohydrate and fat, synthesizes substances involved in
## the clotting of the blood, synthesizes vitamin A, detoxifies poisonous
## substances, stores glycogen, and breaks down worn-out erythrocytes[GO].
p <- Properties(trm)
p
## Object of class 'Properties' with 160 entries
## From the UBERON ontology
## hepatobiliary system, exocrine system ... liver serosa, liver subserosa
p[[1]]
## A Property from the UBERON ontology: UBERON:0002423
## Label: hepatobiliary system
termLabel(p[[1]])
## [1] "hepatobiliary system"
A researcher might be interested in the trans-Golgi network. Searching the OLS is handled by the OlsSearch and olsSearch classes/functions. The first step is to define the search query with OlsSearch, as shown below. This creates a search object of class OlsSearch that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (there are 16946 results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.
OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 16946)
## response(s): 0
16946 results are probably too many to be relevant. Below we show how to perform an exact search by setting exact = TRUE, how to limit the search to the GO ontology by specifying ontology = "GO", and how to do both.
OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 221)
## response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 1097)
## response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 25)
## response(s): 0
One can set the rows argument to specify the number of desired results.
OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 200 (out of 1097)
## response(s): 0
See ?OlsSearch for details about retrieving many results.
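The number of requested results can also be changed after the query object has been created. The sketch below assumes that the olsRows replacement function and allRows are the accessors documented in ?OlsSearch; check that manual page for the authoritative description. Creating the query object contacts the OLS, so network access is needed.

```r
library("rols")

## Start from a query restricted to GO
qry <- OlsSearch(q = "trans-golgi network", ontology = "GO")

## Increase the number of requested results on the existing object
olsRows(qry) <- 50
qry

## Or request every available result; this can be slow for broad queries
qry_all <- allRows(qry)
qry_all
```

Neither call downloads anything; the results are only retrieved once olsSearch is applied to the query object.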
Let’s proceed with the exact search and retrieve the results. Since we request the default 20 results, only the first 20 of the 221 matches will be retrieved. The olsSearch function updates the previously created object (called qry below) by adding the results to it.
qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)
(qry <- olsSearch(qry))
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 221)
## response(s): 20
We can now transform this search result object into a fully fledged Terms object or a data.frame.
(qtrms <- as(qry, "Terms"))
## Object of class 'Terms' with 20 entries
## From the NCIT, PR, GO, ZP, PW ontologies
## NCIT:C33802, PR:O43493 ... GO:0042147, PW:0000426
str(qdrf <- as(qry, "data.frame"))
## 'data.frame': 20 obs. of 8 variables:
## $ iri : chr "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/PR_O43493" "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/GO_0032588" ...
## $ ontology_name : chr "ncit" "pr" "go" "go" ...
## $ ontology_prefix: chr "NCIT" "PR" "GO" "GO" ...
## $ short_form : chr "NCIT_C33802" "PR_O43493" "GO_0005802" "GO_0032588" ...
## $ description :List of 20
## ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
## ..$ : chr "A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human." "Category=organism-gene."
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We "| __truncated__
## ..$ : chr "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network."
## ..$ : chr "Abnormal(ly) mislocalised (of) enterocyte of trans-Golgi network."
## ..$ : chr "A vesicle that mediates transport between the trans-Golgi network and other parts of the cell."
## ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the lumen."
## ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the cytoplasm."
## ..$ : chr "The lipid bilayer surrounding a vesicle transporting substances between the trans-Golgi network and other parts of the cell."
## ..$ : chr "The volume enclosed within the membrane of a trans-Golgi network transport vesicle."
## ..$ : chr "A clathrin coat found on a vesicle of the trans-Golgi network."
## ..$ : chr "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of rat." "Category=organism-gene."
## ..$ : chr "A process which results in the assembly, arrangement of constituent parts, or disassembly of a trans-Golgi network membrane."
## ..$ : chr "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of mouse." "Category=organism-gene."
## ..$ : chr "The directed movement of substances, in membrane-bounded vesicles, from the trans-Golgi network to the recycling endosomes."
## ..$ : chr "The directed movement of substances from the plasma membrane back to the trans-Golgi network, mediated by vesicles."
## ..$ : chr "The directed movement of proteins from the Golgi to the plasma membrane in transport vesicles that move from th"| __truncated__
## ..$ : chr "The directed movement of substances from the vacuole to the trans-Golgi network; this occurs in yeast via the p"| __truncated__
## ..$ : chr "The directed movement of membrane-bounded vesicles from endosomes back to the trans-Golgi network where they ar"| __truncated__
## ..$ : chr "In the secretory pathway, protein sorting, mainly in trans-Golgi Network (TGN), but also in other compartments,"| __truncated__
## $ label : chr "Trans-Golgi Network" "trans-Golgi network integral membrane protein 2 (human)" "trans-Golgi network" "trans-Golgi network membrane" ...
## $ obo_id : chr "NCIT:C33802" "PR:O43493" "GO:0005802" "GO:0032588" ...
## $ type : chr "class" "class" "class" "class" ...
In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, the 20 results come from the ncit, pr, go, zp and pw ontologies.
qtrms <- unique(qtrms)
termOntology(qtrms)
## NCIT:C33802 PR:O43493 GO:0005802 GO:0032588 ZP:0142408 GO:0030140
## "ncit" "pr" "go" "go" "zp" "go"
## GO:0098540 GO:0098541 GO:0012510 GO:0098564 GO:0030130 PR:P19814
## "go" "go" "go" "go" "go" "pr"
## GO:0098629 PR:Q62313 GO:0044795 GO:0035526 GO:0043001 GO:0045018
## "go" "pr" "go" "go" "go" "go"
## GO:0042147 PW:0000426
## "go" "pw"
termNamespace(qtrms)
## $`NCIT:C33802`
## NULL
##
## $`PR:O43493`
## [1] "protein"
##
## $`GO:0005802`
## [1] "cellular_component"
##
## $`GO:0032588`
## [1] "cellular_component"
##
## $`ZP:0142408`
## NULL
##
## $`GO:0030140`
## [1] "cellular_component"
##
## $`GO:0098540`
## [1] "cellular_component"
##
## $`GO:0098541`
## [1] "cellular_component"
##
## $`GO:0012510`
## [1] "cellular_component"
##
## $`GO:0098564`
## [1] "cellular_component"
##
## $`GO:0030130`
## [1] "cellular_component"
##
## $`PR:P19814`
## [1] "protein"
##
## $`GO:0098629`
## [1] "biological_process"
##
## $`PR:Q62313`
## [1] "protein"
##
## $`GO:0044795`
## [1] "biological_process"
##
## $`GO:0035526`
## [1] "biological_process"
##
## $`GO:0043001`
## [1] "biological_process"
##
## $`GO:0045018`
## [1] "biological_process"
##
## $`GO:0042147`
## [1] "biological_process"
##
## $`PW:0000426`
## [1] "pathway"
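To see how the hits are distributed across ontologies, the values returned by termOntology can be tabulated and filtered with base R. The sketch below hard-codes the values printed above so that it runs without a network connection; in a live session one would use termOntology(qtrms) directly. The variable names onts and go_ids are only illustrative.

```r
## Ontology names copied from the termOntology(qtrms) output above
onts <- c("NCIT:C33802" = "ncit", "PR:O43493" = "pr",
          "GO:0005802" = "go", "GO:0032588" = "go",
          "ZP:0142408" = "zp", "GO:0030140" = "go",
          "GO:0098540" = "go", "GO:0098541" = "go",
          "GO:0012510" = "go", "GO:0098564" = "go",
          "GO:0030130" = "go", "PR:P19814" = "pr",
          "GO:0098629" = "go", "PR:Q62313" = "pr",
          "GO:0044795" = "go", "GO:0035526" = "go",
          "GO:0043001" = "go", "GO:0045018" = "go",
          "GO:0042147" = "go", "PW:0000426" = "pw")

## Number of hits per ontology
table(onts)

## Identifiers of the hits that come from GO only
go_ids <- names(onts)[onts == "go"]
go_ids
```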
Below, we execute the same query using the GO.db package.
library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
## structures located within the Golgi apparatus on the side distal to
## the endoplasmic reticulum, from which secretory vesicles emerge.
## The trans-Golgi network is important in the later stages of protein
## secretion where it is thought to play a key role in the sorting and
## targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face
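Individual fields of the GO.db record can be extracted with the accessors provided by AnnotationDbi. This is a minimal sketch; the accessors are qualified with AnnotationDbi:: because rols uses the names Term and Ontology for its own constructors, and the explicit prefix avoids depending on which package was attached last.

```r
library("GO.db")

## The GOTerms record for the trans-Golgi network
go_trm <- GOTERM[["GO:0005802"]]

## Extract single fields from the record
AnnotationDbi::Term(go_trm)
AnnotationDbi::Ontology(go_trm)
AnnotationDbi::Synonym(go_trm)
```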
It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols or biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated with every release.
Both approaches have advantages. While online queries provide the most up-to-date information, they rely on network availability and quality. If reproducibility is a major issue, the version of the database to be queried can easily be controlled with off-line approaches. In the case of rols, although the load date of a specific ontology can be queried with olsVersion, it is not possible to query a specific version of an ontology.
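Since a specific version cannot be requested, one pragmatic option is to record what the OLS served at the time of the analysis. The sketch below, which needs network access, collects that information for the bspo ontology with olsVersion, olsLoaded and olsUpdated. The provenance variable and its element names are only illustrative.

```r
library("rols")

## Ontology used earlier in this document
bspo <- Ontology("bspo")

## Collect the version and dates reported by the OLS, for example to
## store them alongside analysis results
provenance <- c(ontology = "bspo",
                version = as.character(olsVersion(bspo)),
                loaded = as.character(olsLoaded(bspo)),
                updated = as.character(olsUpdated(bspo)))
provenance
```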
rols 2.0 has substantially changed. While the table below shows some correspondence between the old and new interfaces, a one-to-one mapping does not always exist. The new interface relies on the Ontology/Ontologies, Term/Terms and OlsSearch classes, which need to be instantiated and can then be queried, as described above.
| version < 1.99 | version >= 1.99 |
|---|---|
| ontologyLoadDate | olsLoaded and olsUpdated |
| ontologyNames | Ontologies |
| olsVersion | olsVersion |
| allIds | terms |
| isIdObsolete | isObsolete |
| rootId | olsRoot |
| olsQuery | OlsSearch and olsSearch |
Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.
rols version >= 2.99 has been refactored to use the OLS4 REST API. The changes concern httr, the Term() and Terms() constructors, and the Properties() constructor.
The CVParam class is used to handle controlled vocabulary. It can be used for user-defined parameters
CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]
or official controlled vocabulary (which triggers a query to the OLS service)
CVParam(label = "MS", accession = "MS:1000073")
## [MS, MS:1000073, electrospray ionization, ]
See ?CVParam for more details and examples.
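The bracketed form printed above can also be obtained as a character string, which is convenient when a parameter has to be written to a text file. This is a minimal sketch that assumes the coercion to "character" and the charIsCVParam helper behave as described in ?CVParam; it uses a user-defined parameter so that no OLS query is triggered.

```r
library("rols")

## A user-defined parameter, created without querying the OLS
cv <- CVParam(name = "A user param", value = "the value")

## Coerce the object to its bracketed character representation
cv_chr <- as(cv, "character")
cv_chr

## Check whether a string looks like a CVParam representation
charIsCVParam(cv_chr)
```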
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DT_0.33 rols_3.0.0 GO.db_3.19.1
## [4] AnnotationDbi_1.66.0 IRanges_2.38.0 S4Vectors_0.42.0
## [7] Biobase_2.64.0 BiocGenerics_0.50.0 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.9 RSQLite_2.3.6
## [4] digest_0.6.35 magrittr_2.0.3 evaluate_0.23
## [7] bookdown_0.39 fastmap_1.1.1 blob_1.2.4
## [10] jsonlite_1.8.8 GenomeInfoDb_1.40.0 DBI_1.2.2
## [13] BiocManager_1.30.22 httr_1.4.7 UCSC.utils_1.0.0
## [16] crosstalk_1.2.1 Biostrings_2.72.0 httr2_1.0.1
## [19] jquerylib_0.1.4 cli_3.6.2 rlang_1.1.3
## [22] crayon_1.5.2 XVector_0.44.0 bit64_4.0.5
## [25] cachem_1.0.8 yaml_2.3.8 tools_4.4.0
## [28] memoise_2.0.1 GenomeInfoDbData_1.2.12 curl_5.2.1
## [31] vctrs_0.6.5 R6_2.5.1 png_0.1-8
## [34] lifecycle_1.0.4 zlibbioc_1.50.0 KEGGREST_1.44.0
## [37] htmlwidgets_1.6.4 bit_4.0.5 pkgconfig_2.0.3
## [40] bslib_0.7.0 glue_1.7.0 xfun_0.43
## [43] knitr_1.46 htmltools_0.5.8.1 rmarkdown_2.26
## [46] compiler_4.4.0