
1 Introduction

Networks are a powerful and flexible methodology for expressing biological knowledge for computation and communication. Biological networks can hold a variety of different types of information, such as genetic or metabolic interactions, gene and transcriptional regulation, protein-protein interaction (PPI), or cell signaling networks and pathways. They often form a valuable resource for hypothesis generation and further investigations, and in the course of the analyses, they are processed and enriched with additional information from experiments. As a result, further networks are generated, whether as intermediate results that should be documented in the process, as the outcome of those analyses, or as visual representations and illustrations used in reports and publications. These resulting networks no longer follow the strict rules to which the source networks were subjected; therefore, a more flexible format is needed to capture their content.

In addition, formats suited for transmission often conflict with those suited for storage or for use in applications and analyses. Therefore, seamless conversion between these formats becomes as important as the data itself.

1.1 The Cytoscape Exchange (CX)

One possible solution for a flexible transmission format is the Cytoscape Exchange (CX) format. CX is a JSON-based, aspect-oriented data structure, which means that the network is divided into several independent modules (“aspects”). This way, every aspect of a network, meaning nodes, edges, their attributes, and visual representations, can be handled individually without interference. Each aspect has its own schema for the information it contains, and links between aspects are realized by referencing the unique internal IDs of other aspects. The CX data model was developed by the NDEx project in collaboration with the Cytoscape Consortium (http://www.cytoscapeconsortium.org/) as a transmission format between their tools, and has since been adopted by many others. More details about the CX data model can be found on its documentation website: https://home.ndexbio.org/data-model/
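
To make the aspect idea more concrete, the following base R sketch mimics a tiny CX-like network as a list of aspects. The element keys (`@id`, `n`, `s`, `t`, `i`, `po`, `v`) follow the abbreviated keys of the CX data model as far as we understand it; the network content and the variable names are invented for illustration, and the sketch covers only a small part of the format.

```r
## A toy, CX-like network: each aspect is a separate list of elements.
## The content is invented for illustration only.
cxLike <- list(
  nodes = list(
    list(`@id` = 0, n = "BCR-ABL"),
    list(`@id` = 1, n = "Imatinib")
  ),
  edges = list(
    list(`@id` = 0, s = 1, t = 0, i = "inhibits")
  ),
  nodeAttributes = list(
    list(po = 1, n = "type", v = "drug")
  )
)

## Aspects are linked only through IDs: edges reference node IDs via
## source (s) and target (t), node attributes via "property of" (po).
nodeIds <- vapply(cxLike$nodes, function(x) x[["@id"]], numeric(1))
edgeEnds <- unlist(lapply(cxLike$edges, function(x) c(x$s, x$t)))
attributeOwners <- vapply(cxLike$nodeAttributes, function(x) x$po, numeric(1))

## Every reference should point to an existing node ID
all(edgeEnds %in% nodeIds)
## [1] TRUE
all(attributeOwners %in% nodeIds)
## [1] TRUE
```

Because the aspects only share IDs, one of them (for example the node attributes) can be replaced or dropped without touching the others.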

1.2 The NDEx platform

The Network Data Exchange, or NDEx, is an online commons for biological networks (Pratt et al., 2015, Cell Systems 1, 302-305). It is an open-source software framework to manipulate, store, and exchange networks of various types and formats. NDEx can be used to upload, share, and publicly distribute networks, while providing output in formats that can be used by many other applications.

The public NDEx server is a network data commons that provides pathway collections like the Pathway Interaction Database of the NCI (http://www.ndexbio.org/#/user/301a91c6-a37b-11e4-bda0-000c29202374) and the Cancer Cell Maps Initiative (http://www.ndexbio.org/#/user/b47268a6-8112-11e6-b0a6-06603eb7f303). Public networks can be searched and retrieved from the platform for further use. Users can upload their own networks and either share them privately with selected collaborators or groups, or provide them publicly to the community. Furthermore, private installations of the NDEx platform can be used to store and collaborate on networks locally.

The ndexr package available on Bioconductor (https://doi.org/doi:10.18129/B9.bioc.ndexr) allows connecting with the NDEx platform from within R. This package provides an interface to query the public NDEx server, as well as private installations, to upload, download or modify biological networks.

1.3 Cytoscape

The most prominent software environment for biological network analysis and visualization is Cytoscape (https://cytoscape.org/). It provides support for large networks and comes with a rich set of features for custom visualization, as well as advanced layout and analysis algorithms. One of these visualization features is the “attribute-to-visual mapping”, where the network’s data translates to its visual representation. Based on this visualization strategy, Cytoscape contributed aspects to the CX format to capture the visual representation as part of the network. Because of these aspects, the visualization can not only be documented along with the network, but also reproduced on other platforms, and even shared between networks that have the same attributes used for creating the visualization.

1.4 RCX - an adaption of the CX format

CX is a JSON-based data structure designed for transmitting networks with a focus on flexibility, modularity, and extensibility. Although those features are widely used in common REST protocols, they don’t quite fit the R way of thinking about data.

This package provides an adaption of the CX format to standard R data formats and types to create and modify, load, export, and visualize those networks. This document aims to help the user install the package and benefit from the wide range of functionality of this implementation. For an overview of the differences between the RCX implementation and the CX specification see Appendix: The RCX and CX Data Model.

2 Installation

The devtools package is the most common approach for installing packages from GitHub. However, it requires XML libraries to be installed on the system, which can cause problems during installation due to unmet dependencies. The remotes package covers the functionality to download and install R packages stored in ‘GitHub’, ‘GitLab’, ‘Bitbucket’, ‘Bioconductor’, or plain ‘subversion’ or ‘git’ repositories without depending on XML libraries. If devtools is already installed, it can of course be used; otherwise, the lightweight remotes package is recommended.

From GitHub using remotes:

if (!"remotes" %in% installed.packages()) {
    install.packages("remotes")
}
if (!"RCX" %in% installed.packages()) {
    remotes::install_github("frankkramer-lab/RCX")
}
library(RCX)

From GitHub using devtools:

if (!"devtools" %in% installed.packages()) {
    install.packages("devtools")
}
if (!"RCX" %in% installed.packages()) {
    devtools::install_github("frankkramer-lab/RCX")
}
library(RCX)

3 The basics

The following sections explain how to read and write networks from/to CX files, create and modify RCX networks, validate their contents, and finally visualize them.

3.1 Read and write CX files

Networks can be downloaded from the NDEx platform as CX files in JSON format. Those files can be read and are automatically transformed into RCX networks that can be used in R. Here we load a provided example network from file:

cxFile <- system.file(
  "extdata", 
  "Imatinib-Inhibition-of-BCR-ABL-66a902f5-2022-11e9-bb6a-0ac135e8bacf.cx", 
  package = "RCX"
)

rcx <- readCX(cxFile)

This network can also be accessed and downloaded from NDEx at https://www.ndexbio.org/viewer/networks/66a902f5-2022-11e9-bb6a-0ac135e8bacf

RCX networks can be saved in a similar manner:

writeCX(rcx, "path/to/some-file.cx")

However, errors might occur while reading a CX file. This can happen because the definition of CX, and with it the definition of some aspects, has changed over time. It is therefore possible that some networks stored on the NDEx platform still follow a deprecated format. In those cases, it might be helpful to process the CX network step by step:

1. just read the JSON content without parsing

json <- readJSON(cxFile)

substr(json, 1, 77)
## [{"numberVerification":[{"longNumber":281474976710655}]},{"metaData":[{"name"

This also makes it possible to handle a CX network in JSON format even if it comes from a source other than a file.

2. parse the JSON

aspectList <- parseJSON(json)

str(aspectList, 2)
## List of 12
##  $ :List of 1
##   ..$ numberVerification:List of 1
##  $ :List of 1
##   ..$ metaData:List of 9
##  $ :List of 1
##   ..$ provenanceHistory:List of 1
##  $ :List of 1
##   ..$ nodes:List of 75
##  $ :List of 1
##   ..$ edges:List of 159
##  $ :List of 1
##   ..$ networkAttributes:List of 10
##  $ :List of 1
##   ..$ nodeAttributes:List of 1129
##  $ :List of 1
##   ..$ edgeAttributes:List of 229
##  $ :List of 1
##   ..$ cartesianLayout:List of 75
##  $ :List of 1
##   ..$ cyVisualProperties:List of 3
##  $ :List of 1
##   ..$ cyHiddenAttributes:List of 1
##  $ :List of 1
##   ..$ status:List of 1

The result of the parsing is a set of nested lists containing all aspects and their contents. This format is not easy to handle in R, but it allows errors to be corrected before the aspect and RCX objects are formed.

3. process the aspect data

rcx <- processCX(aspectList)
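
The error correction mentioned in step 2 could, for example, consist of removing a problematic aspect from the parsed list before it is processed. The following base R sketch illustrates this on a small mock list. It assumes the structure printed by str(aspectList, 2), that is, a list of single-element lists named after their aspect; the mock content, the helper dropAspects, and its arguments are hypothetical and not part of the package.

```r
## Mock of a parsed CX: a list of single-element lists, one per aspect
## (invented content, mirroring the structure of the parsed example)
mockAspectList <- list(
  list(metaData = list(list(name = "nodes"))),
  list(provenanceHistory = list(list(entity = "unknown"))),
  list(nodes = list(list(`@id` = 0, n = "A"), list(`@id` = 1, n = "B")))
)

## Hypothetical helper: remove all aspects with one of the given names
dropAspects <- function(aspectList, aspectNames) {
  keep <- vapply(
    aspectList,
    function(aspect) !any(names(aspect) %in% aspectNames),
    logical(1)
  )
  aspectList[keep]
}

cleaned <- dropAspects(mockAspectList, "provenanceHistory")
vapply(cleaned, names, character(1))
## [1] "metaData" "nodes"
```

With a real parsed network, the cleaned list would then be handed to processCX in place of the original list.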

All of the above functions for processing CX networks come with an option to show the performed steps. This might be helpful for locating errors:

rcx <- readCX(cxFile, verbose = TRUE)
## Read file "/tmp/RtmpdSUcei/Rinst32101366321677/RCX/extdata/Imatinib-Inhibition-of-BCR-ABL-66a902f5-2022-11e9-bb6a-0ac135e8bacf.cx"...done!
## Parse json...done!
## Parsing nodes...create aspect...done!
## Create RCX from parsed nodes...done!
## Parsing edges...create aspect...done!
## Add aspect "edges" to RCX...done!
## Parsing node attributes...create aspect...done!
## Add aspect "nodeAttributes" to RCX...done!
## Parsing edge attributes...create aspect...done!
## Add aspect "edgeAttributes" to RCX...done!
## Parsing network attributes...create aspect...done!
## Add aspect "networkAttributes" to RCX...done!
## Parsing cartesian layout...create aspect...done!
## Add aspect "cartesianLayout" to RCX...done!
## Parsing Cytoscape visual properties...done!
## - Create sub-objects...done!
## - Create aspect...done!
## Add aspect "cyVisualProperties" to RCX...done!
## Parsing Cytoscape hidden attributes...create aspect...done!
## Add aspect "cyHiddenAttributes" to RCX...done!
## Parsing meta-data...done!
## Ignore "numberVerification" aspect, not necessary in RCX!
## Can't process aspect "numberVerification", so skip it...done!
## Don't know what to do with a "provenanceHistory" aspect!
## Can't process aspect "provenanceHistory", so skip it...done!
## Ignore "status" aspect, not necessary in RCX!
## Can't process aspect "status", so skip it...done!

This shows that some aspects contained in the CX file are ignored while creating the RCX network. Those are, for example, aspects needed for the transmission of the CX (status, numberVerification) or deprecated aspects (provenanceHistory). For more details about the differences in aspects see Appendix: The RCX and CX Data Model.

3.2 Explore the RCX object

The simplest way to have a look at the content of an RCX object is by printing it:

print(rcx)
## OR:
rcx

However, especially for large networks, this can produce long and hard-to-read output. For a better overview, the mandatory and automatically generated metaData aspect summarizes the aspects contained in the network. This includes the number of elements and, if an aspect uses internal IDs, the highest ID in use:

rcx$metaData
## Meta-data:
##                 name version idCounter elementCount consistencyGroup
## 1              nodes     1.0     11551           75                1
## 2              edges     1.0     11554          159                1
## 3     nodeAttributes     1.0        NA         1129                1
## 4     edgeAttributes     1.0        NA          229                1
## 5  networkAttributes     1.0        NA           10                1
## 6    cartesianLayout     1.0        NA           75                1
## 7 cyVisualProperties     1.0        NA            3                1
## 8 cyHiddenAttributes     1.0        NA            1                1
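
The two counters are easy to mix up: elementCount is the number of elements, while idCounter is the highest ID in use, and the IDs need not be consecutive (here, 75 nodes with IDs up to 11551). The base R sketch below illustrates the difference with a made-up data frame; the column names id and name are assumptions chosen for illustration, and the data frame is not a real RCX aspect.

```r
## Made-up node table with non-consecutive IDs (not a real RCX aspect)
mockNodes <- data.frame(
  id = c(11321, 11400, 11551),
  name = c("A", "B", "C")
)

## Number of elements vs. highest ID in use
elementCount <- nrow(mockNodes)
idCounter <- max(mockNodes$id)

c(elementCount = elementCount, idCounter = idCounter)
```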

Besides exploring the RCX object manually, a summary of the object or of single aspects can provide more insight:

summary(rcx$nodeAttributes)
##       propertyOf        name               value    
##  Total     : 1129   Length:1129        Boolean:185  
##  Unique ids:   75   Unique:34          Double : 92  
##  Min.      :11321   Class :character   Integer: 68  
##  Max.      :11551                      String :784

We can quickly see that many different node attributes are used. The different node attributes are:

unique(rcx$nodeAttributes$name)
##  [1] "sbo"                   "metaId"                "compartmentCode"      
##  [4] "sbml type"             "sbml id"               "reversible"           
##  [7] "sbml compartment"      "cyId"                  "label"                
## [10] "sbml type ext"         "fast"                  "constant"             
## [13] "units"                 "boundaryCondition"     "derivedUnits"         
## [16] "sbml initial amount"   "value"                 "substanceUnits"       
## [19] "hasOnlySubstanceUnits" "uniprot"               "inchikey"             
## [22] "chebi"                 "biocyc"                "cas"                  
## [25] "chemspider"            "kegg.compound"         "inchi"                
## [28] "pubchem.compound"      "hmdb"                  "scale"                
## [31] "exponent"              "kind"                  "multiplier"           
## [34] "unitSid"
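
To see how often each of these attributes occurs, the names can be tabulated with base R. The sketch below reloads the example network under a separate, arbitrarily chosen variable name (exampleRcx) so that it runs on its own; it assumes that the RCX package is installed and relies only on the name column used above.

```r
library(RCX)

## Load the example network shipped with the package
cxFile <- system.file(
  "extdata",
  "Imatinib-Inhibition-of-BCR-ABL-66a902f5-2022-11e9-bb6a-0ac135e8bacf.cx",
  package = "RCX"
)
exampleRcx <- readCX(cxFile)

## Count how many times each node attribute name is used,
## most frequent first
attributeCounts <- sort(
  table(exampleRcx$nodeAttributes$name),
  decreasing = TRUE
)
head(attributeCounts)
```

Attributes that occur only a few times often belong to a small group of special nodes, which can be a useful hint when deciding which attributes to use for a visual mapping.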

3.3 Visualize the network

This package provides simple functions to visualize the network encoded in the RCX object.

visualize(rcx)

Imatinib Inhibition of BCR-ABL (66a902f5-2022-11e9-bb6a-0ac135e8bacf)

The visualization uses the same JavaScript library as the NDEx platform, so the visual result is the same as when the network is uploaded to the NDEx platform. Additionally, this allows the visualization to be exported as a single HTML file, which can be hosted directly on a web server or included in existing websites:

writeHTML(rcx, "path/to/some-file.html")

Networks with many nodes and edges, or those without a provided layout, are often difficult to interpret visually. To untangle the “hairball”, it is possible to apply the different layout options provided by the Cytoscape.js framework. A force-directed layout is a good starting point in those cases. For demonstration purposes, let’s first delete the Cytoscape visual properties of our network:

## save them for later
originalVisualProperties <- rcx$cyVisualProperties

## and delete them from the RCX network
rcx$cyVisualProperties <- NULL
rcx <- updateMetaData(rcx)

rcx$metaData
## Meta-data:
##                 name version idCounter elementCount consistencyGroup
## 1              nodes     1.0     11551           75                1
## 2              edges     1.0     11554          159                1
## 3     nodeAttributes     1.0        NA         1129                1
## 4     edgeAttributes     1.0        NA          229                1
## 5  networkAttributes     1.0        NA           10                1
## 6    cartesianLayout     1.0        NA           75                1
## 7 cyHiddenAttributes     1.0        NA            1                1

Let’s have a look at the visualization now:

visualize(rcx, layout = c(name = "cose"))