Overview

Pathways, reactions, and biological entities in the Reactome knowledgebase are systematically represented as an ordered network of molecular reactions. Graph database technology is an effective tool for modeling highly connected data, so Reactome’s relational database is imported into Neo4j to create one large interconnected graph. Instances are represented as nodes and the relationships between them as edges.

The ReactomeGraph4R package is an R interface for retrieving data with network structure from the Reactome Graph Database. There is another R package, ReactomeContentService4R, for querying specific bits of information from the Reactome Database through the RESTful API in the Content Service.

ReactomeGraph4R is built on the Neo4j driver neo4r, so the returned data are largely the same as those returned by neo4r, with minor modifications, and come in two formats:

  • “row”: a list of results in dataframes
  • “graph”: a graph object with nodes and relationships information that can be used for visualization

This package lets you interact with the data in Reactome’s graph database from R, with the aim of minimizing the number of Neo4j Cypher queries that you need to write. For example, to retrieve any Reactome information associated with the hypothetical identifier ‘123456789’, you can use matchObject(id="123456789"), which is equivalent to running the Cypher query MATCH (rgp:ReferenceGeneProduct) WHERE rgp.identifier = "123456789" RETURN rgp on the Reactome graph database.
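
To make the equivalence concrete, the sketch below builds the Cypher string quoted above from an identifier. The helper build_rgp_query is hypothetical and not part of ReactomeGraph4R; it only illustrates the kind of query text the package spares you from writing by hand.

```r
# Hypothetical helper (not part of ReactomeGraph4R): build the Cypher query
# that the text describes as equivalent to matchObject(id = ...).
build_rgp_query <- function(id) {
  sprintf(
    'MATCH (rgp:ReferenceGeneProduct) WHERE rgp.identifier = "%s" RETURN rgp',
    id
  )
}

query <- build_rgp_query("123456789")
cat(query, "\n")
```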

Aside from performing basic Cypher queries and formatting the results as R objects, the package also contains functionality that cannot easily be achieved with Cypher alone. This includes finding hierarchical data of an instance (for example, which Reactions and Pathways a Protein is found in), getting the entire Reaction sequence/context using preceding/following relationships, determining what role a PhysicalEntity plays in each of its associated Reactions (catalyst, regulator, input, etc.), searching for research papers that are cited in Reactome’s curations, and even displaying network data. Read on for detailed instructions on the ReactomeGraph4R package; it is a flexible package with plenty of useful functionality for the prospective R-Reactome user!

Setups

Follow the instructions to download and set up the Reactome Graph Database, then install the ReactomeGraph4R package.

Two questions about the Neo4j server connection need to be answered when loading the package. You can change the URL port if it’s not 7474. If Neo4j authentication is required, the username and password are the same as the ones used to log in to your local Neo4j database.

## Is the url 'http://localhost:7474'? (Yes/no/cancel) 
## Does Neo4J require authentication? (yes/No/cancel)

## Successfully connected to the local Reactome Graph Database v76!

Basic query

The basic function matchObject allows you to fetch Reactome objects using:

  • id: Reactome or non-Reactome identifier (e.g. UniProt id)
  • displayName: display name of an object
  • schemaClass: schema class
  • property: attributes of Reactome objects
  • relationship: relationship between two nodes

Moreover, you could specify the argument returnedAttributes for retrieving only a few attributes of the targeted object; species for specific species; and limit for the number of returned objects. Note that this function only returns “row” data.

Fetch by id

The “id” input can be either non-Reactome or Reactome identifiers. If you use a non-Reactome id, remember that you must also specify databaseName since the default one is “Reactome”. For example, to get the Reactome instance associated with a circadian rhythmic gene PER2:
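
A sketch of such a lookup, assuming the package is installed and already connected to a local graph database as described in Setups. Passing databaseName = NULL to search without restricting the source database is an assumption here (check ?matchObject); the guard simply skips the query outside an interactive session.

```r
# Identifier taken from the text; its source database is not known up front.
gene_id <- "PER2"

# Needs a running local Reactome graph database, so only run interactively.
if (interactive() && requireNamespace("ReactomeGraph4R", quietly = TRUE)) {
  # Assumption: databaseName = NULL lifts the default "Reactome" restriction.
  per2 <- ReactomeGraph4R::matchObject(id = gene_id, databaseName = NULL)
  print(per2[["databaseObject"]])
}
```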

## $databaseObject
##            schemaClass identifier   databaseName              displayName
## 1 ReferenceDNASequence       PER2 COSMIC (genes) COSMIC (genes):PER2 PER2
##       dbId geneName                                                     url
## 1 11509503     PER2 http://cancer.sanger.ac.uk/cosmic/gene/overview?ln=PER2

Now we know that the database name should be “COSMIC (genes)”! We can also try with a Reactome id “R-HSA-400219”:

## $databaseObject
##   schemaClass  speciesName     oldStId isInDisease releaseDate
## 1    Reaction Homo sapiens REACT_25088       FALSE  2010-09-21
##                                    displayName    stIdVersion   dbId
## 1 Beta-TrCP1 binds phosphorylated PER proteins R-HSA-400219.1 400219
##                                           name         stId category isInferred
## 1 Beta-TrCP1 binds phosphorylated PER proteins R-HSA-400219  binding       TRUE

For multiple ids, say you want more information on your significantly enriched pathways, you can use the function multiObjects. The speedUp option determines whether the doParallel method is used; for details, see ?multiObjects.

Fetch by name

Instances can also be fetched by their “displayNames”. Do note that spaces and symbols within the name are required. Here we focus on the complex SUMO1:TOP1 in nucleoplasm “SUMO1:TOP1 [nucleoplasm]” in C. elegans:

## $databaseObject
##   schemaClass            speciesName isInDisease              displayName
## 1     Complex Caenorhabditis elegans       FALSE SUMO1:TOP1 [nucleoplasm]
##       stIdVersion     dbId       name          stId
## 1 R-CEL-4641301.1 10549504 SUMO1:TOP1 R-CEL-4641301

Fetch by class

When retrieving instances belonging to one schema class, it’s better to specify the argument limit as well, to restrict the number of returned instances. For all available schema classes, see the Reactome Data Schema. For instance, to get 5 “EntitySets” in human and return only their display names and stIds:
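
A minimal sketch of that query, using only the arguments named in this section (schemaClass, species, returnedAttributes, limit). It assumes a connected local graph database and that "human" is an accepted value for species; the guard skips the call outside an interactive session.

```r
# Attributes to keep and the cap on returned instances, as stated in the text.
wanted_attributes <- c("displayName", "stId")
max_instances <- 5

if (interactive() && requireNamespace("ReactomeGraph4R", quietly = TRUE)) {
  entity_sets <- ReactomeGraph4R::matchObject(
    schemaClass = "EntitySet",
    species = "human",  # assumption: common names are accepted here
    returnedAttributes = wanted_attributes,
    limit = max_instances
  )
  print(entity_sets[["databaseObject"]])
}
```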

Fetch by property

By specifying the property, nodes with the given property (or properties), which are actually attributes/slots of Reactome instances, could be returned. Let’s try to get instances that are chimeric and are in disease.
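
A sketch of how this could look, assuming that properties are passed as a named list of attribute and value pairs (an assumption to verify against ?matchObject) and that a local graph database is connected. The limit of 10 is arbitrary.

```r
# Assumption: each list name is an attribute and each value the required match.
wanted_properties <- list(isChimeric = TRUE, isInDisease = TRUE)

if (interactive() && requireNamespace("ReactomeGraph4R", quietly = TRUE)) {
  chimeric_in_disease <- ReactomeGraph4R::matchObject(
    property = wanted_properties,
    limit = 10  # arbitrary cap to keep the result small
  )
  print(chimeric_in_disease[["databaseObject"]])
}
```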

Fetch by relationship

The actual Cypher query for this command is MATCH (n1)-[r:relationship]->(n2) RETURN n1,n2, therefore the n1 and n2 dataframes in the returned list have the same number of rows, and every two rows with the same index are connected with the given relationship.

## $n1
##   schemaClass       speciesName isInDisease                        displayName
## 1     Complex      Homo sapiens       FALSE p-GFAP:EEF1A1 [lysosomal membrane]
## 2     Complex      Homo sapiens       FALSE p-GFAP:EEF1A1 [lysosomal membrane]
## 3     Complex Rattus norvegicus       FALSE p-Gfap:Eef1a1 [lysosomal membrane]
##       stIdVersion    dbId          name isChimeric          stId
## 1 R-HSA-9626070.1 9626070 p-GFAP:EEF1A1      FALSE R-HSA-9626070
## 2 R-HSA-9626070.1 9626070 p-GFAP:EEF1A1      FALSE R-HSA-9626070
## 3 R-RNO-9626031.1 9626031 p-Gfap:Eef1a1      FALSE R-RNO-9626031
## 
## $n2
##                     schemaClass       speciesName isInDisease
## 1 EntityWithAccessionedSequence      Homo sapiens       FALSE
## 2 EntityWithAccessionedSequence      Homo sapiens       FALSE
## 3 EntityWithAccessionedSequence Rattus norvegicus       FALSE
##                   displayName     stIdVersion    dbId   name          stId
## 1 p-GFAP [lysosomal membrane] R-HSA-9626054.1 9626054 p-GFAP R-HSA-9626054
## 2 EEF1A1 [lysosomal membrane] R-HSA-9626022.1 9626022 EEF1A1 R-HSA-9626022
## 3 p-Gfap [lysosomal membrane] R-RNO-9626029.1 9626029 p-Gfap R-RNO-9626029
##   startCoordinate        referenceType endCoordinate
## 1               1 ReferenceGeneProduct           432
## 2               1 ReferenceGeneProduct           462
## 3               1 ReferenceGeneProduct           430
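
Because the rows are aligned by index, the two dataframes can be combined side by side to read each connected pair directly. The sketch below rebuilds a trimmed copy of the output above by hand (displayName and stId only) rather than querying the database.

```r
# Trimmed, hand-copied versions of the $n1 and $n2 dataframes shown above.
n1 <- data.frame(
  displayName = c("p-GFAP:EEF1A1 [lysosomal membrane]",
                  "p-GFAP:EEF1A1 [lysosomal membrane]",
                  "p-Gfap:Eef1a1 [lysosomal membrane]"),
  stId = c("R-HSA-9626070", "R-HSA-9626070", "R-RNO-9626031"),
  stringsAsFactors = FALSE
)
n2 <- data.frame(
  displayName = c("p-GFAP [lysosomal membrane]",
                  "EEF1A1 [lysosomal membrane]",
                  "p-Gfap [lysosomal membrane]"),
  stId = c("R-HSA-9626054", "R-HSA-9626022", "R-RNO-9626029"),
  stringsAsFactors = FALSE
)

# Row i of n1 is linked to row i of n2, so pairing by position gives the edges.
stopifnot(nrow(n1) == nrow(n2))
pairs <- data.frame(from = n1$stId, to = n2$stId, stringsAsFactors = FALSE)
print(pairs)

# Collect the n2 nodes reached from each distinct n1 node.
targets_by_source <- split(pairs$to, pairs$from)
print(targets_by_source)
```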

MATCHing

The following functions in the MATCH family cover several common use cases for querying Reactome data.

Hierarchy data

Reactome data are organized in a hierarchical way: Pathway --> Reaction --> PhysicalEntity, or sometimes Pathway --> Reaction --> PhysicalEntity --> ReferenceEntity, where the PhysicalEntity links to external database information via the ReferenceEntity. You can retrieve the hierarchical data of a given Event (Pathway or Reaction) or Entity (PhysicalEntity or ReferenceEntity) using matchHierarchy. In this example, we’ll take a look at an RNA sequence (PhysicalEntity) “POU5F1 mRNA [cytosol]” with stable identifier “R-HSA-500358”:

## List of 4
##  $ physicalEntity:'data.frame':  1 obs. of  12 variables:
##  $ event         :'data.frame':  3 obs. of  13 variables:
##  $ upperevent    :'data.frame':  2 obs. of  16 variables:
##  $ relationships :'data.frame':  7 obs. of  9 variables:

The RNA sequence we specified is in the physicalEntity dataframe of the result list. It is directly connected with the Events in the event dataframe, which are in turn connected with the Events in upperevent. Relationships between all these objects are in the relationships dataframe.

Reactions in associated Pathway

This method finds all ReactionLikeEvents (RLEs) connected with a given Pathway by the relationship “hasEvent”. The input can also be an RLE, in which case the result is the Pathway(s) linked via “hasEvent” together with the other RLEs linked to those Pathway(s). Here we focus on an RLE “OAS1 oligomerizes” with identifier “R-HSA-8983688”.

## List of 4
##  $ reactionLikeEvent     :'data.frame':  1 obs. of  12 variables:
##  $ pathway               :'data.frame':  1 obs. of  14 variables:
##  $ otherReactionLikeEvent:'data.frame':  14 obs. of  12 variables:
##  $ relationships         :'data.frame':  15 obs. of  9 variables:

otherReactionLikeEvent are RLEs other than “OAS1 oligomerizes” connected with Pathway “OAS antiviral response”.

##  [1] "OAS1 binds viral dsRNA"                        
##  [2] "RNASEL cleaves viral ssRNA"                    
##  [3] "OAS2 binds viral dsRNA"                        
##  [4] "RNASEL cleaves cellular ssRNA"                 
##  [5] "OAS2 produces oligoadenylates"                 
##  [6] "ABCE1 binds RNASEL"                            
##  [7] "PDE12 cleaves 2'-5' oligoadenylates "          
##  [8] "Viral 2',5'-PDE cleaves 2'-5' oligoadenylates "
##  [9] "OAS3 binds viral dsRNA"                        
## [10] "RNASEL binds 2'-5' oligoadenylate"             
## [11] "OAS3 produces oligoadenylates"                 
## [12] "OAS1 produces oligoadenylates"                 
## [13] "OASL binds DDX58"                              
## [14] "OAS2 dimerizes"

The context of these Events can be visualized in R using the exportImage function from the ReactomeContentService4R package, and it looks the same as in the Pathway Browser. To get the pathway diagram of the Pathway “OAS antiviral response” (stId: R-HSA-8983711) that we just retrieved, and highlight the RLE (stId: R-HSA-8983688) that we specified:

## Connecting...welcome to Reactome v80!

Preceding/following Events

With the diagram shown above, we can see that the Reaction highlighted in blue is in the middle of a Reaction cascade, with other RLEs immediately preceding and following it. To find out what these preceding and following Reactions are, we can use the function matchPrecedingAndFollowingEvents to find RLEs linked via “precedingEvent”. The argument depth describes the “variable length relationships”; the default value is 1 (i.e. immediately connected), or you can set all.depth = TRUE to retrieve the whole context. For details, see ?matchPrecedingAndFollowingEvents.

## List of 4
##  $ precedingEvent:'data.frame':  1 obs. of  12 variables:
##  $ event         :'data.frame':  1 obs. of  12 variables:
##  $ followingEvent:'data.frame':  2 obs. of  12 variables:
##  $ relationships :'data.frame':  3 obs. of  9 variables:
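
The depth setting corresponds to Cypher's variable-length relationship syntax. The sketch below shows how a depth value could translate into such a pattern; the helper preceding_pattern is hypothetical, and the exact query issued by the package may differ.

```r
# Hypothetical helper: turn a depth setting into a Cypher variable-length
# relationship pattern over "precedingEvent".
preceding_pattern <- function(depth = 1, all.depth = FALSE) {
  # "*" alone means any number of hops; "*1..n" bounds the path length.
  hops <- if (all.depth) "*" else sprintf("*1..%d", as.integer(depth))
  sprintf("-[:precedingEvent%s]->", hops)
}

immediate <- preceding_pattern()                  # directly connected only
three_hops <- preceding_pattern(depth = 3)        # up to three hops away
whole_chain <- preceding_pattern(all.depth = TRUE)

print(c(immediate, three_hops, whole_chain))
```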

Referrals

Usually we query data from parent to child, (parent) --> (child), providing information about the parent. With the Graph Database, we can also search in the reverse direction, from child to parent, (parent) <-- (child), using only the child’s information. This “child-to-parent” relationship is called a Referral. You can fetch referrals with matchReferrals, which supports the Classes “Event”, “PhysicalEntity”, “Regulation”, “CatalystActivity”, “ReferenceEntity”, “Interaction”, and “AbstractModifiedResidue”. Depth-related arguments can also be specified here. For more details, see ?matchReferrals.

We will look at a Regulation “Negative gene expression regulation by 'EGR2 [nucleoplasm]'” with dbId “6810147”:

## $Regulation
##                        schemaClass
## 1 NegativeGeneExpressionRegulation
##                                                   displayName     stIdVersion
## 1 Negative gene expression regulation by 'EGR2 [nucleoplasm]' R-HSA-6810147.1
##      dbId          stId
## 1 6810147 R-HSA-6810147
## 
## $databaseObject
##     schemaClass               displayName     stIdVersion    dbId          stId
## 1 BlackBoxEvent HOXB1 gene is transcribed R-HSA-5617454.3 5617454 R-HSA-5617454
##    speciesName isInDisease releaseDate                      name isChimeric
## 1 Homo sapiens       FALSE  2015-12-15 HOXB1 gene is transcribed      FALSE
##   category isInferred
## 1  omitted       TRUE
## 
## $relationships
##   neo4jId        type startNode.neo4jId startNode.dbId startNode.schemaClass
## 1 5505941 regulatedBy           1330001        5617454         BlackBoxEvent
##   endNode.neo4jId endNode.dbId              endNode.schemaClass properties
## 1         1330002      6810147 NegativeGeneExpressionRegulation       1, 4

The dbId of endNode (endNode.dbId in $relationships) is exactly the dbId we just specified.
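
That check can be done programmatically. The sketch below uses a hand-copied subset of the $relationships dataframe above instead of a live query, confirms that the queried instance is the end node of every edge, and then lists the start nodes, which are the referrals.

```r
# Hand-copied subset of the $relationships dataframe shown above.
relationships <- data.frame(
  type = "regulatedBy",
  startNode.dbId = 5617454,
  startNode.schemaClass = "BlackBoxEvent",
  endNode.dbId = 6810147,
  endNode.schemaClass = "NegativeGeneExpressionRegulation",
  stringsAsFactors = FALSE
)

queried_dbId <- 6810147

# The queried instance sits at the end of each edge.
stopifnot(all(relationships$endNode.dbId == queried_dbId))

# The referrals are the start nodes pointing to it.
referrals <- relationships[, c("startNode.dbId", "startNode.schemaClass", "type")]
print(referrals)
```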

Interactors

Interactions of a PhysicalEntity (PE) can be retrieved with matchInteractors. This method first finds the ReferenceEntity matched with the PE, then gets the Interactions that have an “interactor” relationship with that ReferenceEntity. For example, to get interactions of “FANCM [nucleoplasm]” with stable id “R-HSA-419535”:

## List of 4
##  $ physicalEntity :'data.frame': 1 obs. of  12 variables:
##  $ referenceEntity:'data.frame': 1 obs. of  17 variables:
##  $ interaction    :'data.frame': 7 obs. of  8 variables:
##  $ relationships  :'data.frame': 8 obs. of  9 variables:

PhysicalEntity roles

The roles of PhysicalEntities include “input”, “output”, “regulator”, and “catalyst”, which are represented as the relationships “input”, “output”, “regulatedBy”, and “catalystActivity” respectively. We can therefore retrieve instances that are possibly connected with the given PhysicalEntity via these relationships, and read the exact role(s) from the existing relationships. We’ll take a look at a Polymer “HSBP1 oligomer [cytosol]” and pass it to matchPEroles. Either id or displayName can be specified.

## List of 3
##  $ physicalEntity:'data.frame':  6 obs. of  10 variables:
##  $ databaseObject:'data.frame':  6 obs. of  12 variables:
##  $ relationships :'data.frame':  6 obs. of  9 variables:
## [1] "output"

Diseases

Diseases related to a PhysicalEntity or an Event can be found using the function matchDisease. Conversely, you can also get the PhysicalEntities/Events associated with a Disease.

## $disease
##   schemaClass identifier               synonym databaseName displayName    dbId
## 1     Disease        870 peripheral neuropathy         DOID  neuropathy 9635395
##         name                                                         definition
## 1 neuropathy A nervous system disease that is located_in nerves or nerve cells.
##                                                               url
## 1 https://www.ebi.ac.uk/ols/ontologies/doid/terms?obo_id=DOID:870
## 
## $databaseObject
##    schemaClass                        displayName    dbId        name
## 1 ChemicalDrug pralidoxime [extracellular region] 9635003 pralidoxime
##   isInDisease     stIdVersion          stId
## 1        TRUE R-ALL-9635003.1 R-ALL-9635003
## 
## $relationships
##   neo4jId    type startNode.neo4jId startNode.dbId startNode.schemaClass
## 1 1031748 disease            263887        9635003          ChemicalDrug
##   endNode.neo4jId endNode.dbId endNode.schemaClass properties
## 1          263890      9635395             Disease       1, 0

Papers

Given the PubMed id or the title for a paper, Reactome instances related to this paper could be found by matchPaperObjects. The DatabaseObjects are connected with the LiteratureReference (i.e. paper) via “literatureReference” relationship. Let’s try with a paper “Aggresomes: a cellular response to misfolded proteins”.

## $literatureReference
##   volume         schemaClass       journal   pages year
## 1    143 LiteratureReference J. Cell Biol. 1883-98 1998
##                                             displayName    dbId
## 1 Aggresomes: a cellular response to misfolded proteins 9646681
##   pubMedIdentifier                                                 title
## 1          9864362 Aggresomes: a cellular response to misfolded proteins
## 
## $databaseObject
##   schemaClass                                               displayName    dbId
## 1    Reaction PolyUb-Misfolded proteins bind vimentin to form aggresome 9646679
##    speciesName isInDisease releaseDate     stIdVersion
## 1 Homo sapiens       FALSE  2019-12-10 R-HSA-9646679.2
##                                                        name isChimeric
## 1 PolyUb-Misfolded proteins bind vimentin to form aggresome      FALSE
##            stId category isInferred
## 1 R-HSA-9646679  binding      FALSE
## 
## $relationships
##   neo4jId                type startNode.neo4jId startNode.dbId
## 1   46368 literatureReference             12675        9646679
##   startNode.schemaClass endNode.neo4jId endNode.dbId endNode.schemaClass
## 1              Reaction           12676      9646681 LiteratureReference
##   properties
## 1       1, 0

Network graph

The ability to view network graphs is a major advantage of a graph database. Fortunately, R has developed into a powerful tool for network analysis. A number of R packages target network analysis and visualization, so we can get a graph just like the one in the Neo4j server, and even set more visualization options!

Don’t forget that results can also be returned in the “graph” format, which is used to create the network visualization in R! The comprehensive tutorial Network visualization with R (Ognyanova, K., 2019) walks through each step of creating network graphs in R.

Here we will show a couple of examples to generate an interactive network graph after retrieving the specific Reactome graph data. Let’s say we want to visualize the hierarchical data of a ReferenceEntity “UniProt:P33992 MCM5”.

First install and load the following packages.

visNetwork

We will try the visNetwork package, which visualizes networks using the vis.js JavaScript library.

We are going to change the visual parameters of nodes and edges by adding them as columns in the dataframes. For more customizations, see the visNetwork documentation or vignette("Introduction-to-visNetwork").
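
A minimal sketch of the column-based styling, using a toy graph built from the “OAS antiviral response” example earlier in this document instead of a live query. The integer ids and the colours are arbitrary placeholders; the plot is only drawn if visNetwork is installed.

```r
# Toy graph based on the pathway example above; the ids are placeholders,
# not Reactome identifiers.
nodes <- data.frame(
  id = 1:3,
  label = c("OAS antiviral response", "OAS1 oligomerizes", "OAS2 dimerizes"),
  group = c("Pathway", "ReactionLikeEvent", "ReactionLikeEvent"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c(1, 1), to = c(2, 3), label = "hasEvent",
  stringsAsFactors = FALSE
)

# Visual parameters are plain columns that visNetwork picks up by name.
nodes$shape <- ifelse(nodes$group == "Pathway", "box", "ellipse")
nodes$color <- ifelse(nodes$group == "Pathway", "#E6A0C4", "#7294D4")
nodes$title <- nodes$label  # tooltip shown on hover
edges$arrows <- "to"        # draw an arrowhead at the target node
edges$width <- 2

if (requireNamespace("visNetwork", quietly = TRUE)) {
  network <- visNetwork::visNetwork(nodes, edges, main = "hasEvent relationships")
  network
}
```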

Add a drop-down menu:
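
One way to do this is visOptions, sketched here on a toy graph with placeholder ids rather than the data retrieved from the database. selectedBy adds a menu that highlights nodes sharing a column value, and nodesIdSelection adds a second menu for picking a single node.

```r
# Toy nodes and edges; the ids are placeholders, not Reactome identifiers.
nodes <- data.frame(
  id = 1:3,
  label = c("OAS antiviral response", "OAS1 oligomerizes", "OAS2 dimerizes"),
  group = c("Pathway", "ReactionLikeEvent", "ReactionLikeEvent"),
  stringsAsFactors = FALSE
)
edges <- data.frame(from = c(1, 1), to = c(2, 3))

if (requireNamespace("visNetwork", quietly = TRUE)) {
  network <- visNetwork::visNetwork(nodes, edges)
  # Drop-down menus: one by the "group" column, one by node id.
  network <- visNetwork::visOptions(
    network,
    selectedBy = "group",
    nodesIdSelection = TRUE
  )
  network
}
```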

networkD3

We can also take a look at another package, networkD3, which generates network graphs using the D3 JavaScript library.
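
A minimal sketch with forceNetwork, again on a toy graph built from names that appear earlier in this document rather than on queried data. The main preparation step is that networkD3 expects link endpoints as zero-based row indices into the nodes dataframe.

```r
# Toy nodes and edges taken from the pathway example above.
nodes <- data.frame(
  name = c("OAS antiviral response", "OAS1 oligomerizes", "OAS2 dimerizes"),
  group = c("Pathway", "ReactionLikeEvent", "ReactionLikeEvent"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("OAS antiviral response", "OAS antiviral response"),
  to = c("OAS1 oligomerizes", "OAS2 dimerizes"),
  stringsAsFactors = FALSE
)

# Convert node names to zero-based positions in the nodes dataframe.
links <- data.frame(
  source = match(edges$from, nodes$name) - 1,
  target = match(edges$to, nodes$name) - 1
)

if (requireNamespace("networkD3", quietly = TRUE)) {
  networkD3::forceNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target",
    NodeID = "name", Group = "group",
    opacity = 0.9, zoom = TRUE, legend = TRUE
  )
}
```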

To modify the forceNetwork graph, one can execute custom JavaScript code with the htmlwidgets R package, but that won’t be discussed here.

SessionInfo

## R version 4.2.0 RC (2022-04-19 r82224)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] wesanderson_0.3.6              networkD3_0.4                 
## [3] visNetwork_2.1.0               stringr_1.4.0                 
## [5] ReactomeContentService4R_1.4.0 ReactomeGraph4R_1.4.0         
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.2  xfun_0.30         bslib_0.3.1       purrr_0.3.4      
##  [5] vctrs_0.4.1       generics_0.1.2    htmltools_0.5.2   yaml_2.3.5       
##  [9] utf8_1.2.2        rlang_1.0.2       jquerylib_0.1.4   later_1.3.0      
## [13] pillar_1.7.0      glue_1.6.2        DBI_1.1.2         foreach_1.5.2    
## [17] lifecycle_1.0.1   htmlwidgets_1.5.4 codetools_0.2-18  evaluate_0.15    
## [21] knitr_1.38        fastmap_1.1.0     doParallel_1.0.17 httpuv_1.6.5     
## [25] curl_4.3.2        parallel_4.2.0    fansi_1.0.3       highr_0.9        
## [29] Rcpp_1.0.8.3      xtable_1.8-4      promises_1.2.0.1  magick_2.7.3     
## [33] jsonlite_1.8.0    mime_0.12         png_0.1-7         digest_0.6.29    
## [37] stringi_1.7.6     dplyr_1.0.8       shiny_1.7.1       getPass_0.2-2    
## [41] cli_3.3.0         tools_4.2.0       magrittr_2.0.3    sass_0.4.1       
## [45] tibble_3.1.6      crayon_1.5.1      tidyr_1.2.0       pkgconfig_2.0.3  
## [49] ellipsis_0.3.2    data.table_1.14.2 attempt_0.3.1     neo4r_0.1.1      
## [53] assertthat_0.2.1  rmarkdown_2.14    httr_1.4.2        rstudioapi_0.13  
## [57] iterators_1.0.14  R6_2.5.1          igraph_1.3.1      compiler_4.2.0