Package: rols
Authors: Laurent Gatto lg390@cam.ac.uk, with contributions from Tiago Chedraoui Silva.
Modified: 2016-10-17 16:19:09
Compiled: Wed Jan 25 19:06:52 2017

1 Introduction

The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.

The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this interface available from within R. To do so, it relies on the httr package to query the REST interface and to access and retrieve data.
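
To make the REST layer concrete, here is a minimal sketch of the kind of request that could be sent with httr directly. The base URL, the helper names ontology_url and fetch_ontology, and the use of jsonlite for parsing are assumptions made for illustration; they are not taken from the rols sources and the endpoint may differ between OLS deployments.

```r
# Assumed base URL of the OLS REST API; adjust to the current EBI deployment.
ols_base <- "https://www.ebi.ac.uk/ols4/api"

# Hypothetical helper: build the URL describing one ontology.
ontology_url <- function(id, base = ols_base) {
  paste(base, "ontologies", tolower(id), sep = "/")
}

# Hypothetical helper: query the REST interface and parse the JSON reply.
# Requires the httr and jsonlite packages and network access.
fetch_ontology <- function(id) {
  resp <- httr::GET(ontology_url(id))
  httr::stop_for_status(resp)  # fail early on HTTP errors
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

ontology_url("GO")
# go_info <- fetch_ontology("go")   # uncomment when network access is available
# str(go_info, max.level = 1)
```

In practice the rols classes described below hide these details, so direct requests like this are mainly useful for understanding or debugging what happens behind the scenes.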

There are 162 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.

2 A Brief rols overview

The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.

library("rols")

2.1 Ontologies

The Ontology and Ontologies classes can store information about single or multiple ontologies. The latter can be easily subset using [ and [[, as one would for lists.

ol <- Ontologies()
ol
## Object of class 'Ontologies' with 162 entries
##    AEO, AERO ... OAE, OMP
head(olsNamespace(ol))
##        aeo       aero       agro   ancestro        apo       atol 
##      "aeo"     "aero"     "agro" "ancestro"      "apo"     "atol"
ol[["go"]]
## Ontology: Gene Ontology (go)  
##   An ontology for describing the function of genes and gene products
##    Loaded: 2017-01-25 Updated: 2017-01-25 Version: 2017-01-24 
##    48222 terms  66 properties  0 individuals
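
For readers less familiar with list subsetting, the difference between [ and [[ can be sketched in base R. The named list below is a hypothetical stand-in for a container of ontologies, not a real Ontologies object.

```r
# Hypothetical named list standing in for a container of ontologies
onts <- list(go = "Gene Ontology", aeo = "entry for aeo", apo = "entry for apo")

sub <- onts[c("go", "apo")]  # `[` keeps the container: still a list
one <- onts[["go"]]          # `[[` extracts a single element

class(sub)   # "list"
length(sub)  # 2
one          # "Gene Ontology"
```

By analogy, [ on an Ontologies object gives back a smaller Ontologies object, while [[ gives back a single Ontology, as in the ol[["go"]] call above.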

It is also possible to initialise a single ontology:

go <- Ontology("go")
go
## Ontology: Gene Ontology (go)  
##   An ontology for describing the function of genes and gene products
##    Loaded: 2017-01-25 Updated: 2017-01-25 Version: 2017-01-24 
##    48222 terms  66 properties  0 individuals

2.2 Terms

Single ontology terms are stored in Term objects. When more terms need to be manipulated, they are stored as Terms objects. It is easy to obtain all terms of an ontology of interest, and the resulting Terms object can be subset using [ and [[, as one would for lists.

gotrms <- terms(go) ## or terms("go")
gotrms
## Object of class 'Terms' with 48223 entries
##  From the GO ontology
##   GO:0005230, GO:0015276 ... GO:0032942, GO:0019987
gotrms[1:10]
## Object of class 'Terms' with 10 entries
##  From the GO ontology
##   GO:0005230, GO:0015276 ... GO:0032852, GO:0005234
gotrms[["GO:0090575"]]
## A Term from the GO ontology: GO:0090575 
##  Label: RNA polymerase II transcription factor complex
##   A transcription factor complex that acts at promoters of genes
##   transcribed by RNA polymerase II.

It is also possible to initialise a single term:

trm <- term(go, "GO:0090575")
termId(trm)
## [1] "GO:0090575"
termLabel(trm)
## [1] "RNA polymerase II transcription factor complex"
strwrap(termDesc(trm))
## [1] "A transcription factor complex that acts at promoters of genes"
## [2] "transcribed by RNA polymerase II."

It is then possible to extract the ancestors, descendants, parents and children terms. Each of these functions returns a Terms object.

parents(trm)
## Object of class 'Terms' with 1 entries
##  From the GO ontology
## GO:0044798
children(trm)
## Object of class 'Terms' with 20 entries
##  From the GO ontology
##   GO:0005675, GO:0005674 ... GO:0097221, GO:0036488

Similarly, the partOf and derivesFrom functions return, for an input term, the terms it is a part of and derives from.

2.3 Properties

Properties (relationships) of single or multiple terms or complete ontologies can be queried with the properties method, as briefly illustrated below.

trm <- term("uberon", "UBERON:0002107")
trm
## A Term from the UBERON ontology: UBERON:0002107 
##  Label: liver
##   An exocrine gland which secretes bile and functions in metabolism
##   of protein and carbohydrate and fat, synthesizes substances
##   involved in the clotting of the blood, synthesizes vitamin A,
##   detoxifies poisonous substances, stores glycogen, and breaks down
##   worn-out erythrocytes[GO].
p <- properties(trm)
p
## Object of class 'Properties' with 1011 entries
##  From the UBERON ontology
##   digestive gland, abdomen element ... liver bud, liver primordium
p[[1]]
## A Property from the UBERON ontology: UBERON:0006925 
##  Label: digestive gland
termLabel(p[[1]])
## [1] "digestive gland"

3 Use case

A researcher might be interested in the trans-Golgi network. Searching the OLS is done with the OlsSearch and olsSearch classes/functions. The first step is to define the search query with OlsSearch, as shown below. This creates a search object of class OlsSearch that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (there are 13688 results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.

OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 13688)
##   response(s): 0

13688 results are probably too many to be relevant. Below we show how to perform an exact search by setting exact = TRUE, how to limit the search to the GO ontology by specifying ontology = "GO", and how to do both.

OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 7)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 698)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 1)
##   response(s): 0

One can set the rows argument to specify the number of desired results.

OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 200 (out of 698)
##   response(s): 0

Alternatively, one can call the allRows function to request all results.

(tgnq <- OlsSearch(q = "trans-golgi network", ontology = "GO"))
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 698)
##   response(s): 0
(tgnq <- allRows(tgnq))
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 698 (out of 698)
##   response(s): 0

Let’s proceed with the exact search and retrieve the results. Even if we request the default 20 results, only the 7 relevant results will be retrieved. The olsSearch function updates the previously created object (called qry below) by adding the results to it.

qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)
(qry <- olsSearch(qry))
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 7)
##   response(s): 7

We can now transform this search result object into a fully fledged Terms object or a data.frame.

(qtrms <- as(qry, "Terms"))
## Object of class 'Terms' with 7 entries
##  From  7 ontologies
##   OMIT:0020822, GO:0005802 ... GO:0005802, GO:0005802
(qdrf <- as(qry, "data.frame"))
##                                                 id
## 1 omit:http://purl.obolibrary.org/obo/OMIT_0020822
## 2     go:http://purl.obolibrary.org/obo/GO_0005802
## 3    cco:http://purl.obolibrary.org/obo/GO_0005802
## 4    nbo:http://purl.obolibrary.org/obo/GO_0005802
## 5   gexo:http://purl.obolibrary.org/obo/GO_0005802
## 6   rexo:http://purl.obolibrary.org/obo/GO_0005802
## 7   reto:http://purl.obolibrary.org/obo/GO_0005802
##                                           iri   short_form       obo_id
## 1 http://purl.obolibrary.org/obo/OMIT_0020822 OMIT_0020822 OMIT:0020822
## 2   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
## 3   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
## 4   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
## 5   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
## 6   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
## 7   http://purl.obolibrary.org/obo/GO_0005802   GO_0005802   GO:0005802
##                 label ontology_name ontology_prefix       type
## 1 trans-Golgi Network          omit            OMIT      class
## 2 trans-Golgi network            go              GO      class
## 3 trans-Golgi network           cco             CCO      class
## 4 trans-Golgi network           nbo             NBO      class
## 5 trans-Golgi network          gexo            GeXO individual
## 6 trans-Golgi network          rexo            ReXO individual
## 7 trans-Golgi network          reto            ReTO individual
##   is_defining_ontology
## 1                 TRUE
## 2                 TRUE
## 3                FALSE
## 4                FALSE
## 5                FALSE
## 6                FALSE
## 7                FALSE
##                                                                                                                                                                                                                                                                                                                                                                                   description
## 1                                                                                                                                                                                                                                                                                                                                                                                        NULL
## 2 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.
## 3 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.
## 4 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.
## 5 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.
## 6 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.
## 7 The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side distal to the endoplasmic reticulum, from which secretory vesicles emerge. The trans-Golgi network is important in the later stages of protein secretion where it is thought to play a key role in the sorting and targeting of secreted proteins to the correct destination.

Here, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. For this query, that would have been equivalent to searching only the omit and go ontologies.

qtrms <- unique(qtrms)
termOntology(qtrms)
## OMIT:0020822   GO:0005802 
##       "omit"         "go"
termNamespace(qtrms)
## $`OMIT:0020822`
## NULL
## 
## $`GO:0005802`
## [1] "cellular_component"
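
A comparable reduction can be sketched on the data.frame form of the results using base R only. The small data.frame below copies three columns from the search output shown earlier; this illustrates the idea and is not necessarily how unique is implemented for Terms objects.

```r
# Three columns copied from the search result shown earlier
qdrf_small <- data.frame(
  obo_id = c("OMIT:0020822", rep("GO:0005802", 6)),
  ontology_name = c("omit", "go", "cco", "nbo", "gexo", "rexo", "reto"),
  is_defining_ontology = c(TRUE, TRUE, rep(FALSE, 5)),
  stringsAsFactors = FALSE
)

# Option 1: keep the first occurrence of each term identifier
first_hit <- qdrf_small[!duplicated(qdrf_small$obo_id), ]

# Option 2: keep only rows from the ontology that defines the term
defining <- qdrf_small[qdrf_small$is_defining_ontology, ]

first_hit$ontology_name  # "omit" "go"
defining$ontology_name   # "omit" "go"
```

Both options give the same two rows here, but they need not agree in general: the first depends on the order in which results are returned, while the second relies on the is_defining_ontology flag.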

Below, we execute the same query using the GO.db package.

library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
##     structures located within the Golgi apparatus on the side
##     distal to the endoplasmic reticulum, from which secretory
##     vesicles emerge. The trans-Golgi network is important in the
##     later stages of protein secretion where it is thought to play
##     a key role in the sorting and targeting of secreted proteins
##     to the correct destination.
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: TGN
## Synonym: trans face
## Synonym: trans Golgi network

4 On-line vs. off-line data

It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols or biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated every release.

Both approaches have advantages. While online queries provide the latest, up-to-date information, they rely on network availability and quality. If reproducibility is a major issue, the version of the database to be queried can easily be controlled with off-line approaches. In the case of rols, although the load date of a specific ontology can be queried with olsVersion, it is not possible to query a specific version of an ontology.
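
One way to combine the two approaches is to store the result of an online query locally, together with its retrieval time, and reuse that snapshot afterwards. The sketch below uses base R only; cached_query and fake_fetch are hypothetical names, and fake_fetch merely stands in for a real online query.

```r
# Hypothetical helper: run `fetch()` once, then reuse the saved snapshot
cached_query <- function(key, fetch, dir) {
  path <- file.path(dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # off-line: reuse the stored snapshot
  }
  res <- list(retrieved = Sys.time(), data = fetch())
  saveRDS(res, path)       # store the result with its retrieval time
  res
}

# Fresh snapshot directory for this example
snap_dir <- tempfile("ols-snapshots-")
dir.create(snap_dir)

# Stand-in for an online query; counts how often it is actually run
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  c("GO:0005802", "OMIT:0020822")
}

first  <- cached_query("tgn-exact", fake_fetch, snap_dir)
second <- cached_query("tgn-exact", fake_fetch, snap_dir)

calls                                # 1: the second call used the snapshot
identical(first$data, second$data)   # TRUE
```

Keeping the retrieval time next to the data makes it possible to report when the online resource was queried, which partly compensates for not being able to request a specific ontology version.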

5 Changes in rols 2.0

rols 2.0 has substantially changed. The table below shows how some functions of the old interface map onto the new one, but not every old function has a direct equivalent. The new interface relies on the Ontology/Ontologies, Term/Terms and OlsSearch classes, which need to be instantiated and can then be queried, as described above.

version < 1.99      version >= 1.99
ontologyLoadDate    olsLoaded and olsUpdated
ontologyNames       Ontologies
olsVersion          olsVersion
allIds              terms
isIdObsolete        isObsolete
rootId              olsRoot
olsQuery            OlsSearch and olsSearch

Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.

6 CVParams

The CVParam class is used to handle controlled vocabulary. It can be used for user-defined parameters:

CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]

or official controlled vocabulary (which triggers a query to the OLS service):

CVParam(label = "MS", accession = "MS:1000073")
## [MS, MS:1000073, electrospray ionization, ]
CVParam(label = "MS", name = "electrospray ionization")
## [MS, MS:1000073, electrospray ionization, ]
CVParam(label = "MS", name = "ESI") ## using a synonym
## [MS, MS:1000073, ESI, ]

See ?CVParam for more details and examples.
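
The printed form of the examples above suggests the field order label, accession, name, value, with empty fields left blank. The base R sketch below reproduces that layout; format_cvparam is a hypothetical helper written for illustration and is not part of rols.

```r
# Hypothetical helper mimicking the printed form [label, accession, name, value]
format_cvparam <- function(label = "", accession = "", name = "", value = "") {
  paste0("[", paste(c(label, accession, name, value), collapse = ", "), "]")
}

format_cvparam(name = "A user param", value = "the value")
# "[, , A user param, the value]"

format_cvparam(label = "MS", accession = "MS:1000073",
               name = "electrospray ionization")
# "[MS, MS:1000073, electrospray ionization, ]"
```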

7 Session information

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] DT_0.2               rols_2.2.5           GO.db_3.4.0         
## [4] AnnotationDbi_1.36.1 IRanges_2.8.1        S4Vectors_0.12.1    
## [7] Biobase_2.34.0       BiocGenerics_0.20.0  BiocStyle_2.2.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.9       knitr_1.15.1      magrittr_1.5     
##  [4] progress_1.1.2    R6_2.2.0          stringr_1.1.0    
##  [7] httr_1.2.1        tools_3.3.2       DBI_0.5-1        
## [10] htmltools_0.3.5   assertthat_0.1    yaml_2.1.14      
## [13] rprojroot_1.2     digest_0.6.11     htmlwidgets_0.8  
## [16] curl_2.3          memoise_1.0.0     evaluate_0.10    
## [19] RSQLite_1.1-2     rmarkdown_1.3     stringi_1.1.2    
## [22] prettyunits_1.0.2 backports_1.0.5   jsonlite_1.2