1 About

SBGNview is a tool set for visualizing omics data on pathway maps and pathway related data analysis. Pathway is rendered with community standard notation: Systems Biology Graphical Notation (SBGN)(Le Novere et al. 2009). Given an omics data table and a pathway file (SBGN-ML format with layout information), SBGNview can display omics data as colors on glyphs and output image files. For omics data, SBGNview supports automatic ID mapping of common gene/protein/compound ID types (e.g. Entrez Gene ID, UNIPROT, ChEBI etc.). For pathway files, SBGNview can automatically retrieve SBGN-ML files from common pathway databases (e.g. Reactome, MetaCyc, SMPDB, PANTHER, METACROP etc.). To support visualizing multiple types of data on the same glyph/arc, SBGNview provides extensive options to control glyph and edge features (e.g. color, line width, label color/size etc.). To facilitate pathway based analysis, SBGNview can search for pathways by keywords, extract node information (e.g. gene set, compound set) and highlight shortest path between two nodes etc.

2 Introduction

Molecular pathways have been widely used in omics data analysis. We previously developed an R/BioConductor package called Pathview, which maps, integrates and visualizes omics data onto KEGG pathway graphs(Luo and Brouwer 2013). Since its publication, Pathview has been widely used in numerous omics studies and analysis tools. Here we introduce the SBGNview package, which adopts Systems Biology Graphical Notation (SBGN)(Le Novere et al. 2009) and greatly extends the Pathview project by supportting multiple major pathway databases besides KEGG.

Key features:

Pathway maps use different glyphs shapes to represent different types of molecules (macromolecules or simple chemicals) and different arc shapes to represent different reaction types (consumption or catalysis), collectively, they are called graphical notations. SBGN is a community developed notation standard and has been used by major pathway databases (e.g. Reactome, SMPDB, PANTHER pathways, MetaCrop etc.). For details about SBGN, please check http://sbgn.github.io/sbgn

As a community standard, SBGN is adopted by major pathway databases, including Reactome, Panther, PathwayCommons, MetaCrop etc. Therefore, users can use SBGNview to visualize and interpret their omics data on any pathways from these databases. In additions, molecular biologists often summarize new discoveries or literature knowledge in pathways, and create their own pathway maps. This can be done in SBGN editing/drawing tools: https://sbgn.github.io/software and the pathways can be saved as SBGN-ML files as input for SBGNview. This makes SBGNview much more flexible than existing tools such as Pathview and PaintOmics, which only support KEGG pathways.

Like Pathview, SBGNview supports multiple samples for each gene/compound. In addition, it provides rich options to control glyph/arc attributes such as line color/width and text size/color/wrapping/positioning etc. This gives users maximal control over the pathway graphs.

3 Installation

3.1 Install SBGNview

Install SBGNview through Bioconductor.

BiocManager::install(c("SBGNview"))

4 Overview

To visualize omics data on SBGN pathway map, we need two inputs:

  1. A SBGN-ML file containing the pathway information: nodes,edges and their layout (coordidnates).

  2. An omics data table in which rows are genes/compounds and columns are different measurements. The measurements can be any numeric values, such as fold change, abundance etc.

Given these two inputs, SBGNview will parse SBGN-ML file and render a .svg graph with SBGN notation, then displays omics data of each gene/compound on its corresponding glyph on the SBGN map. Each measured value will be displayed as a color corresponding to the value. When there are multiple samples/measurements for each molecule, nodes are divided into multiple slices correspondingly. The output images can be in SVG, PDF, PNG or PS format.

4.1 A quick example

A quick example to visualize a demo gene expression dataset on pathway “Adrenaline and noradrenaline biosynthesis” and highlight several interesting nodes, edges and path.

# load demo dataset and pathway information of built-in collection of SBGN-ML files. 
# We use a cancer microarray dataset 'gse16837.d' from package 'pathview'
library(SBGNview)
data("gse16873.d","pathways.info","sbgn.xmls")
# search for pathways with user defined keywords
input.pathways <- findPathways("Adrenaline and noradrenaline biosynthesis")
# render SBGN pathway graph and output image files
SBGNview.obj <- SBGNview(
          gene.data = gse16873.d[,1:3], 
          gene.id.type = "entrez",
          input.sbgn = input.pathways$pathway.id,
          output.file = "quick.start", 
          output.formats = c("png")
          ) 
print(SBGNview.obj)

Two image files (a .svg file and a .pdf file) will be created in the current working directory:

list.files( pattern = "quick.start", full.names = TRUE) 
\label{fig:quickStartFig}Quick start example: Adrenaline and noradrenaline biosynthesis pathway.

Figure 4.1: Quick start example: Adrenaline and noradrenaline biosynthesis pathway.

Link to SBGN notation

In this example, the original pathway SBGN-ML file is from pathwayCommons with improved layout(node-edge overlaps are removed by routed edges).

We can highlight nodes, edges and path:

outputFile(SBGNview.obj) <- "quick.start.highlight.elements"
SBGNview.obj + 
        highlightArcs(class = "production",color = "red") +
        highlightArcs(class = "consumption",color = "blue") +
        highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                       stroke.width = 4, stroke.color = "green") + 
        highlightPath(from.node = "tyrosine", to.node = "dopamine",
                      from.node.color = "green",
                      to.node.color = "blue",
                      shortest.paths.cols = "purple",
                      input.node.stroke.width = 6,
                      path.node.stroke.width = 5,
                      path.node.color = "purple",
                      path.stroke.width = 5,
                      tip.size = 10 )
\label{fig:quickStartFigModified}Quick start example: Adrenaline and noradrenaline biosynthesis pathway. Highlight nodes and edges.

Figure 4.2: Quick start example: Adrenaline and noradrenaline biosynthesis pathway. Highlight nodes and edges.

The color of consumption arcs and production arcs are set to blue and red, respectively.

Tyrosine and epinephrine are highlighted by thicker border (stroke width) and green color. Note that there are four nodes mapped to (+-)-epinephrine.

A shortest path from tyrosine to dopamine is highlighted with purple arcs and nodes. The start (Tyrosine) and end (epinephrine) nodes have thicker border and different colors. Since there are multiple dopamines in the map, a random dopamine node is selected. If user wants a specific node, function changeIds can help find the node IDs corresponding to input IDs. Then user can run highlightNodes and/or highlightPath again with “node IDs” instead of “compound name”. See this example for details.

5 Getting started

SBGNview is the main function to overylay omics data on SBGN pathway maps. It extracts node and edge data from SBGN-ML file and creates a SBGN graph in SVG format. Then it maps omics data to the glyphs and renders the graph with mapped data as colors. Currently it maps gene/protein omics data to “macromolecule” glyphs and maps compound omics data to “simple chemical” glyphs. Please see the documentation of function SBGNview for more details. The SBGNview function returns a SBGNview object, it contains information necessary to render SBGN graph and can be further modified to change graph features. See this section for more details.

5.1 SBGN pathway file (SBGN-ML)

SBGN pathway is defined in a special XML format (SBGN-ML file). It contains information of the pathway content (molecules and their reactions) as well as graph layout information. There are two main types of data in SBGN-ML files: 1. node data (in tag “glyph”), such as node location, width, hight and node class (macromolecule, simple chemical etc.). 2. edge data (in tag “arc”), such as arc class, start node and end node. For more details, see: https://github.com/sbgn/sbgn/wiki/SBGN_ML

5.1.1 SBGN-ML pathway file from online databases

Several online databases provide SBGN-ML files, such as pathwayCommons, Reactome and MetaCrop. They can be downloaded from their webpage or FTP site.

5.1.2 SBGNview’s SBGN-ML file collection

Many pathways from the above databases don’t have desirable layout and often have extensive node-node overlaps and node-edge crossings. Thus we refined the layout and removed node-node overlaps. For node-edge crossings, we computed spline edges to resolve this issue and added additional elements in the SBGN-ML file to encode spline edges. The resulting collection of SBGN-ML files are available in a separate GitHub repository. SBGNview can automatically search in this pathway collection and download the SBGN-ML files. Users can further modify the SBGN-ML files using other tools (e.g. newt editor) for desired node layout. The package used to layout nodes and route spline edges is currently under development and will be released in the near future.

5.1.2.1 Information about pre-generated SBGN-ML file collection

We can check the information of all pre-generated SBGN-ML files

data("pathways.stat")
gse16873 <- gse16873.d[,1:3]
input.pathway.ids = input.pathways$pathway.id
head(pathways.info)
##                                            file.name       database
## 1 http___identifiers.org_panther.pathway_P00001.sbgn pathwayCommons
## 2 http___identifiers.org_panther.pathway_P00002.sbgn pathwayCommons
## 3 http___identifiers.org_panther.pathway_P00003.sbgn pathwayCommons
## 4 http___identifiers.org_panther.pathway_P00004.sbgn pathwayCommons
## 5 http___identifiers.org_panther.pathway_P00005.sbgn pathwayCommons
## 6 http___identifiers.org_panther.pathway_P00006.sbgn pathwayCommons
##                                             uri    sub.database pathway.id
## 1 http://identifiers.org/panther.pathway/P00001 panther.pathway     P00001
## 2 http://identifiers.org/panther.pathway/P00002 panther.pathway     P00002
## 3 http://identifiers.org/panther.pathway/P00003 panther.pathway     P00003
## 4 http://identifiers.org/panther.pathway/P00004 panther.pathway     P00004
## 5 http://identifiers.org/panther.pathway/P00005 panther.pathway     P00005
## 6 http://identifiers.org/panther.pathway/P00006 panther.pathway     P00006
##                                  pathway.name macromolecule.ID.type
## 1   Adrenaline and noradrenaline biosynthesis        pathwayCommons
## 2 Alpha adrenergic receptor signaling pathway        pathwayCommons
## 3 Alzheimer disease-amyloid secretase pathway        pathwayCommons
## 4        Alzheimer disease-presenilin pathway        pathwayCommons
## 5                                Angiogenesis        pathwayCommons
## 6                 Apoptosis signaling pathway        pathwayCommons
##   simple.chemical.ID.type
## 1          pathwayCommons
## 2          pathwayCommons
## 3          pathwayCommons
## 4          pathwayCommons
## 5          pathwayCommons
## 6          pathwayCommons
pathways.stat
##          database    sub.database Freq
## 1        MetaCrop        MetaCrop   62
## 5         MetaCyc         MetaCyc 2518
## 9  pathwayCommons panther.pathway  149
## 12 pathwayCommons        reactome 1746
## 15 pathwayCommons           smpdb  725

There are two common scenarios of using SBGNview

In this scenario, the SBGN-ML file contents are stored in data sbgn.xmls (included in package SBGNview.data). SBGNview can automatically retrieve SBGN-ML contents from this dataset by pathway IDs.

  • Using SBGN-ML files from other sources.

In this scenario, more parameters and/or ID mapping files (map between omics data IDs and SBGN-ML file glyph IDs) are needed. Please see the documentation of function SBGNview for more details.

5.1.2.2 Search for pathways by keywords

SBGNview has several functions to search for pathways by keyword and automatically download SBGN-ML files.

pathways <- findPathways(c("bile acid","bile salt"))
head(pathways)
pathways.local.file <- downloadSbgnFile(pathways$pathway.id[1:3])
pathways.local.file

By default findPathways searches for keywords in pathway names. It can also search by different ID types

pathways <- findPathways(c("tp53","Trp53"),keyword.type = "SYMBOL")
head(pathways)
pathways <- findPathways(c("K04451","K10136"),keyword.type = "KO")
head(pathways)

5.1.2.3 Different layout for the same pathway

Researchers may have different tastes for a “good looking” layout. We have created differen layouts for each pathway. User can download them from pre-generated SBGN-ML files and try.

5.1.3 User customized SBGN-ML file

We can also create a SBGN-ML file from scratch. Several tools like Newt editor (http://newteditor.org/) can let the user draw a pathway diagram and save it as SBGN-ML file. The tools may also able to generate a primitive pathway layout. But these layouts often have too many node-node overlaps and edge-node crossings. Therefore, we recommend the user to use our SBGN-ML file collection mentioned above, which have been optimized to solve these problems.

5.2 Omics data

SBGNview can visualize a range of omics data, including both gene (or transcript, protein, enzyme) data and compound (or metabolite, chemical, small molecules) data.

5.2.1 Gene expression data

Gene/protein related data will be mapped to “macromolecule” nodes on a SBGN map.

5.2.2 Chemical compound data

Chemical compound data will be mapped to “simple chemical” nodes on a SBGN map.

Here we simulate a compound dataset.

cpd.data <- sim.mol.data(mol.type = "cpd", id.type = "KEGG COMPOUND accession", nmol = 50000, nexp = 2)
head(cpd.data)

5.3 Visualize gene data

Most of the SBGN-ML files from online resources have their unique ID types and it is different from the ID type in the omics data. If the ID types are different, we need to map the omics IDs to the node IDs in the SBGN-ML file. In the quick start example (“Adrenaline and noradrenaline biosynthesis” pathway), the SBGN-ML file uses pathwayCommons IDs for gene/protein nodes, whereas the omics dataset uses Entrez gene IDs. The function SBGNview can automatically map common ID types such as ENTREZ, UniProt etc. to nodes in our pre-generated SBGN-ML files as shown in the quick start example. We can also do it manually using function changeDataId, which is called by SBGNview to do ID mapping. Supprted ID type pairs can be found in data(mapped.ids). changeDataId uses pre-generated mapping tables or pathview to do the mapping. If the input-output ID type pair is not in data(mapped.ids) or can’t be mapped bypathview, user needs to provide the mapping table explicitly using the “id.mapping.table” argument.

Let’s change the IDs in the gene expression omics data.

gene.data <- gse16873
head(gene.data[,1:2])
##                DCIS_1      DCIS_2
## 10000     -0.30764480 -0.14722769
## 10001      0.41586805 -0.33477259
## 10002      0.19854925  0.03789588
## 10003     -0.23155297 -0.09659311
## 100048912 -0.04490724 -0.05203146
## 10004     -0.08756237 -0.05027725
gene.data <- changeDataId(data.input.id = gene.data,
                           input.type = "entrez",
                           output.type = "pathwayCommons",
                           cpd.or.gene = "gene",
                           sum.method = "sum"
                           )
head(gene.data[,1:2])
##                              DCIS_1     DCIS_2
## Protein100049:@:PWY-5188 0.02357413 0.35777714
## Protein100090:@:PWY-5188 0.02357413 0.35777714
## Protein100099:@:PWY-5188 0.02357413 0.35777714
## Protein100143:@:PWY-5424 0.23976217 0.03226769
## Protein100158:@:PWY-5188 0.02357413 0.35777714
## Protein100294:@:PWY-5188 0.23614703 0.52414811

Now we run SBGNview, the main function to overlay omics data on SBGN map.

SBGNview.obj <- SBGNview(
              gene.data = gene.data,
              input.sbgn = "P00001",
              output.file = "test_output",
              gene.id.type = "pathwayCommons",
              output.formats =  c("svg")
    )
SBGNview.obj

By default SBGNview will generate a .svg file. Other formats can be added also. In this example, three additional files (pdf, ps, png) will be created in the same folder.

\label{fig:figGeneData}Visualization of gene expression data.

Figure 5.1: Visualization of gene expression data.

5.4 Visualize both gene data and compound data

Here for demo purpose, we change the kegg compound IDs to pathwayCommons compound IDs. Although SBGNview can do this automatically (e.g. in the quick start example).

cpd.data <- changeDataId(data.input.id = cpd.data,
                           input.type = "kegg.ligand",
                           output.type = "pathwayCommons",
                           cpd.or.gene = "compound",
                           sum.method = "sum"
                           )
head(cpd.data)

Now we can visualize both gene and compound data. In this example, we use the original gene expression data with “ENTREZ” IDs to show SBGNview’s automatic ID mapping ability.

SBGNview.obj <- SBGNview(
                gene.data = gse16873,
                cpd.data = cpd.data,
                input.sbgn = "P00001",
                output.file = "test_output.gene.compound",
                gene.id.type = "entrez",
                cpd.id.type = "pathwayCommons",
                output.formats =  c("svg")
                )
SBGNview.obj
\label{fig:figGeneAndCpdData}Visualization of both gene expression and compound abundance data.

Figure 5.2: Visualization of both gene expression and compound abundance data.

5.5 About SBGNview object

SBGNview operates in a way similar to ggplot2: The main function SBGNview returns a SBGNview object (similar to the ggplot object “p” returned by function ggplot in ggplot2), which contains all information needed to render SBGN graph, including output file path. Printing this object will render the graph and write output image files. SBGNview object can be further modified by several built-in functions to highlight nodes/edges/paths (e.g. SBGNview.obj+highlightNodes(), similar to p+geom_boxplot() in ggplot2).

  • These operations will generate plot files:
    • SBGNview(…) + highlightNodes(…) or SBGNview.obj + highlightNodes(…)

      How it works: The functions will return a *SBGNview* object to R console. The returned object is executed as a top-level R expression, thus will be implicitly printed using a *print.SBGNview* function in **SBGNview** package. For more details, please see the documentaion of function *print.SBGNview*.
    • run SBGNview.obj in R console

      The mechanism is the same as above. The object run in R console is implicitly printed.
    • print(SBGNview.obj)

      In this case the "print.SBGNview" function is run explicitly.
    • for (i in 1:2) {print(SBGNview.obj)}

      Same as above: the "print.SBGNview" function is run explicitly.
  • These commands will NOT generate plot files:
    • SBGNview.obj = SBGNview(…)+highlightNodes(…)

      In this case, the assign operation "=" made the returned object invisible thus not printed
    • for (i in 1:2) {SBGNview.obj}

      In this case SBGNview.obj is no longer a top-level R expression thus won't be implicitly printed.

5.5.1 Structure of SBGNview object

result.one.sbgn <- SBGNview.obj$data[[1]]
names(result.one.sbgn)
glyphs <- result.one.sbgn$glyphs.list
arcs <- result.one.sbgn$arcs.list
str(glyphs[[1]])
str(arcs[[1]])

5.5.2 Change output file in a SBGNview object

We can change the output file using built-in function outputFile:

outputFile(SBGNview.obj)
outputFile(SBGNview.obj) <- "test.change.output.file"
outputFile(SBGNview.obj)
SBGNview.obj
outputFile(SBGNview.obj) <- "test.print"
outputFile(SBGNview.obj)
print(SBGNview.obj)

6 Try different layout for the same pathway.

If the default layout is not ideal, users have two options:

download.file("https://raw.githubusercontent.com/datapplab/SBGNhub/master/data/SBGN.with.stamp/pathwayCommons/http___identifiers.org_panther.pathway_P00001.1.sbgn",destfile = "P00001.new.layout.sbgn")
SBGNview(
          gene.data = gse16873, 
          gene.id.type = "entrez",
          input.sbgn = "P00001.new.layout.sbgn",
          sbgn.gene.id.type = "pathwayCommons",
          output.file = "test.different.layout", 
          output.formats =  c("svg")
          )
\label{fig:differentLayoutFig}Graph with different layout.

Figure 6.1: Graph with different layout.

7 Modify graph elements

It is useful to highlight interesting nodes, edges or paths in a pathway map. This can be done by modifying the SBGNview object, which contains all information needed to render a SBGN map.

7.1 Built-in functions

Like ggplot2, the SBGNview object can be further modified by concatenating it with modification functions using binary operator + (see quick start for example).

7.2 Hightlight nodes

7.2.1 Highlight all nodes

highlight.all.nodes.sbgn.obj <-  SBGNview.obj +
        highlightNodes( 
# Here we set argument "node.set" to select all nodes
            node.set = "all",
            stroke.width = 4, stroke.color = "green")
outputFile(highlight.all.nodes.sbgn.obj) = "highlight.all.nodes"
print(highlight.all.nodes.sbgn.obj)
\label{fig:highlightAllNodes}Highlight all nodes.

Figure 7.1: Highlight all nodes.

7.2.2 Highlight nodes by class

highlight.macromolecule.sbgn.obj <-  SBGNview.obj +
        highlightNodes(
# Here we set argument "select.glyph.class" to select macromolecule nodes
            select.glyph.class = "macromolecule",
            stroke.width = 4, stroke.color = "green")
outputFile(highlight.macromolecule.sbgn.obj) = "highlight.macromolecule"
print(highlight.macromolecule.sbgn.obj)
\label{fig:highlightMacromolecule}Highlight macromolecule nodes.

Figure 7.2: Highlight macromolecule nodes.

7.2.3 Show node IDs instead of node labels

highlight.all.nodes.sbgn.obj <-  SBGNview.obj +
        highlightNodes( node.set = "(+-)-epinephrine", stroke.width = 4, stroke.color = "green",
# Here we set argument "show.glyph.id" to display node ID instead of the original label.
                         show.glyph.id = TRUE,
                        label.font.size = 10)
outputFile(highlight.all.nodes.sbgn.obj) = "highlight.all.id.nodes"
print(highlight.all.nodes.sbgn.obj)
\label{fig:highlightNodes}Highlight nodes using node IDs.

Figure 7.3: Highlight nodes using node IDs.

7.3 Adjust node labels.

7.3.1 Label position, font size, color, change labels

The function highlightNodes also can be used to adjust labels. In this example, we move the label horizontally and vertically, change their color and font size.

my.labels <- c("Tyr","epinephrine")
names(my.labels) <- c("tyrosine", "(+-)-epinephrine")
SBGNview.obj.adjust.label <-  SBGNview.obj +
        highlightNodes( node.set = c("tyrosine", "(+-)-epinephrine"), stroke.width = 4, stroke.color = "green",
                         label.x.shift = 0,
# Labels are moved up a little bit                        
                         label.y.shift = -20,
                         label.color = "red",
                         label.font.size = 30,
                         label.spliting.string = "", 
# node labels can be customized by a named vector. The names of the vector is the IDs assigned to argument "node.set". Values of the vector are the new labels for display.                
                         labels = my.labels)
outputFile(SBGNview.obj.adjust.label) <- "adjust.label"
print(SBGNview.obj.adjust.label)
\label{fig:changeLabel}Modify node labels.

Figure 7.4: Modify node labels.

7.3.2 Label text wrapping into multiple lines

Some nodes may have long labels thus overlap with surrounding graph elements. In this case we can set the parameter label.spliting.string to “any” so the label will be wrapped in multiple lines that fit the width of the node.

SBGNview.obj.change.label.wrapping <-  SBGNview.obj +
        highlightNodes( node.set = c("tyrosine", "(+-)-epinephrine"), stroke.width = 4, stroke.color = "green",
                         show.glyph.id = TRUE,
                         label.x.shift = 10,label.y.shift = 20,label.color = "red",
                         label.font.size = 10,label.spliting.string = "any")
outputFile(SBGNview.obj.change.label.wrapping) = "change.label.wrapping"
print(SBGNview.obj.change.label.wrapping)
\label{fig:changeWrapping}Change how labels are wrapped.

Figure 7.5: Change how labels are wrapped.

7.4 When one input ID maps to multiple nodes

In the example above, we saw that one input ID (e.g. “(+-)-epinephrine”) can be mapped to multiple nodes in the graph. If we just want to focus on several particular ones, we can use function highlightNodes to find the node IDs, which is unique to each node:

test.show.glyph.id <- SBGNview.obj+
    highlightNodes( node.set = c("tyrosine", "(+-)-epinephrine"), stroke.width = 4,
                     stroke.color = "green", show.glyph.id = TRUE,
                     label.x.shift = 10,label.y.shift = 20,label.color = "red",
                     label.font.size = 10,
# When "label.spliting.string" is set to a string that is not in the label (including an empty string ""), the label will not be wrapped into multiple lines.                     
                     label.spliting.string = "")
outputFile(test.show.glyph.id) <- "test.show.glyph.id"
print(test.show.glyph.id)
\label{fig:displayNodeIds}Show node IDs of mapped nodes.

Figure 7.6: Show node IDs of mapped nodes.

We can find the mapping between input IDs and node IDs:

mapping <- changeIds(input.ids = c("tyrosine", "(+-)-epinephrine"),
           input.type = "CompoundName",
           output.type = "pathwayCommons",
           cpd.or.gene = "compound",
           limit.to.pathways = input.pathway.ids[1] )
mapping
## $tyrosine
## [1] "SmallMolecule_96737c854fd379b17cb3b7715570b733"
## 
## $`(+-)-epinephrine`
## [1] "SmallMolecule_db0211694be3283c6b1fb217c8f331ac"
## [2] "SmallMolecule_7753c3822ee83d806156d21648c931e6"
## [3] "SmallMolecule_a25ed355cc2a6a0800babeb708e2cba4"

We can pick two nodes to highlight and find a shortest path between them.

outputFile(SBGNview.obj) <- "highlight.by.node.id"
SBGNview.obj+  highlightNodes(node.set = c("tyrosine", "(+-)-epinephrine"),
                       stroke.width = 4, stroke.color = "red") + 
    highlightPath(from.node =  "SmallMolecule_96737c854fd379b17cb3b7715570b733",
                   to.node =   "SmallMolecule_7753c3822ee83d806156d21648c931e6",
                   node.set.id.type = "pathwayCommons",
                      from.node.color = "green",
                      to.node.color = "blue",
                      shortest.paths.cols = c("purple"),
                      input.node.stroke.width = 6,
                      path.node.stroke.width = 3,
                      path.node.color = "purple",
                      path.stroke.width = 5,
                      tip.size = 10)
\label{fig:highlightNodesById}Highlight nodes and shortest path using node IDs.

Figure 7.7: Highlight nodes and shortest path using node IDs.

7.5 Modify SBGNview object directly

More graph features can be controlled by directly modifing the SBGNview object.

result.one.sbgn <- SBGNview.obj$data[[1]]
names(result.one.sbgn)
glyphs <- result.one.sbgn$glyphs.list
arcs <- result.one.sbgn$arcs.list
str(glyphs[[1]])
str(arcs[[1]])

8 Retrieve pathway related information

8.1 Extract node information

Node information can be extracted using function sbgnNodes.

node.info <- sbgnNodes(input.sbgn = c("P00001","P00002"),
                       output.gene.id.type = "SYMBOL",
                       output.cpd.id.type = "chebi",
                       species = "hsa"
                       )

The returned list contains information about all nodes in the SBGN-ML file.

head(node.info[[1]])
##            glyph.id                                                            
## nodes.info "SmallMolecule_7900c8b1b134ced992834480db9f57ce"                    
## nodes.info "SmallMolecule_d6e1edab5acc63ab37e4709f277a00f3"                    
## nodes.info "Protein_608bc3adf283a9858c923f468826fcb4"                          
## nodes.info "BiochemicalReaction_69ac0e38aa1214bbe7b90c612a7f07d8_LEFT_TO_RIGHT"
## nodes.info "SmallMolecule_7753c3822ee83d806156d21648c931e6"                    
## nodes.info "PhysicalEntity_647ec4d047adde4a098bef258c43cdfe"                   
##            id                                                                  
## nodes.info "18357"                                                             
## nodes.info "18243"                                                             
## nodes.info ""                                                                  
## nodes.info "BiochemicalReaction_69ac0e38aa1214bbe7b90c612a7f07d8_LEFT_TO_RIGHT"
## nodes.info "33568"                                                             
## nodes.info "PhysicalEntity_647ec4d047adde4a098bef258c43cdfe"                   
##            class                parent.complex parent.submap
## nodes.info "simple chemical"    ""             ""           
## nodes.info "simple chemical"    ""             ""           
## nodes.info "macromolecule"      ""             ""           
## nodes.info "process"            ""             ""           
## nodes.info "simple chemical"    ""             ""           
## nodes.info "unspecified entity" ""             ""           
##            parent.compartment      parent.macromolecule parent.node
## nodes.info "free.node.compartment" ""                   ""         
## nodes.info "free.node.compartment" ""                   ""         
## nodes.info "free.node.compartment" ""                   ""         
## nodes.info "free.node.compartment" ""                   ""         
## nodes.info "free.node.compartment" ""                   ""         
## nodes.info "free.node.compartment" ""                   ""

For example, the complex membership information can be retrieved by accessing the “complex” element. Macromolecules are represented by gene symbols. Simple chemicals are represented by ChEBI IDs (e.g. 33568). When there are multiple IDs of output type match the same node in SBGN-ML file, the target IDs are concatenated by “;”. In the following example, complex with ID “Complex_4e65cdd554d14679587b7822e6426705” has two members: 1. a protein (symbol Slc18A2 etc.) and 2. a simple chemical (ChEBI 33568)

9 ID mapping

SBGNview can automatically map common ID types to SBGN-ML glyphs in our pre-generated SBGN-ML files. Supported ID types can be accessed as follow:

data("mapped.ids")

9.1 Map between two types of IDs

Besides changeDataId which changes ID for omics data, SBGNview provides functions to map between different types IDs:

mapping <- changeIds(
  input.ids = c("tyrosine", "(+-)-epinephrine"),
  input.type = "CompoundName",
  output.type = "pathwayCommons",
  cpd.or.gene = "compound",
  limit.to.pathways = "P00001"
)
head(mapping)
## $tyrosine
## [1] "SmallMolecule_96737c854fd379b17cb3b7715570b733"
## 
## $`(+-)-epinephrine`
## [1] "SmallMolecule_db0211694be3283c6b1fb217c8f331ac"
## [2] "SmallMolecule_7753c3822ee83d806156d21648c931e6"
## [3] "SmallMolecule_a25ed355cc2a6a0800babeb708e2cba4"
mapping <- changeIds(
  input.ids = c("tyrosine", "(+-)-epinephrine"),
  input.type = "CompoundName",
  output.type = "chebi",
  cpd.or.gene = "compound"
)
head(mapping)
## $tyrosine
## [1] "15277" "18186"
## 
## $`(+-)-epinephrine`
## [1] "33568"

9.2 Re-use downloaded ID mapping tables

SBGNview has generated pairwise ID mapping tables (between various gene/compound ID types and pathway glyph IDs and pathway IDs) for the pre-collected SBGN-ML files. SBGNview automatically downloads these mapping tables into a folder specified by parameter “SBGNview.data.folder”, if the file is not in that folder. Therefore, user can retain the downloaded files and specify “SBGNview.data.folder” to re-use the downloaded ID mapping files. The default SBGNview.data.folder is “SBGNview.tmp.data” in the working directory. In the following example, we set “SBGNview.tmp.data” so SBGNview doesn’t need to download the ID mapping table again.

mapping <- changeIds(
  input.ids = c("tyrosine"),
  input.type = "CompoundName",
  output.type = "chebi",
  cpd.or.gene = "compound",
  SBGNview.data.folder = "./SBGNview.tmp.data"
)
head(mapping)
## $tyrosine
## [1] "15277" "18186"

9.3 Extract molecule list from pathways

mol.list <- getMolList(
                        database = "metacrop"
                        ,mol.list.ID.type = "ENZYME"
                        ,org = "ath"
                        ,output.pathway.name = FALSE
                        ,truncate.name.length = 50
)
mol.list[[1]]
## [1] "2.6.1.2"  "2.6.1.44"
mol.list <- getMolList(
                 database = "pathwayCommons",
                 mol.list.ID.type = "ENTREZID",
                 org = "hsa"
)
mol.list[[1]]
## [1] "217" "219" "224"
mol.list <- getMolList(
                 database = "pathwayCommons",
                 mol.list.ID.type = "ENTREZID",
                 org = "mmu"
)
mol.list[[2]]
## [1] "69574"
mol.list <- getMolList(
                 database = "MetaCyc",
                 mol.list.ID.type = "KO",
                 org = "eco"
)
mol.list[[2]]
## [1] "K01061"
mol.list <- getMolList(
                 database = "pathwayCommons",
                 mol.list.ID.type = "chebi",
                 cpd.or.gene = "compound"
)
mol.list[[2]]
##  [1] "13390" "13392" "15377" "15378" "15379" "15775" "16468" "16793" "17996"
## [10] "19375" "28318" "28618" "28957" "33191" "48981"

10 Example using selected database

10.1 Use Reactome pathway database

is.reactome <- pathways.info[,"sub.database"]== "reactome"
reactome.ids <- pathways.info[is.reactome ,"pathway.id"]
SBGNview.obj <- SBGNview(
              gene.data = gse16873, 
              gene.id.type = "entrez",
              input.sbgn =  reactome.ids[1:2],
              output.file = "demo.reactome", 
              output.formats =  c("svg")
            )
SBGNview.obj

10.2 Use MetaCrop pathway database

is.metacrop <- pathways.info[,"sub.database"]== "MetaCrop"
metacrop.ids <- pathways.info[is.metacrop ,"pathway.id"]
SBGNview.obj <- SBGNview(
              gene.data = c(), 
              input.sbgn =  metacrop.ids[1:2],
              output.file = "demo.metacrop", 
              output.formats =  c("svg")
            )
SBGNview.obj

11 Test SBGN reference cards

downloadSbgnFile(c("AF_Reference_Card.sbgn"
                 ,"PD_Reference_Card.sbgn"
                 ,"ER_Reference_Card.sbgn"
                 ))

SBGNview.obj <- SBGNview(
             gene.data = c()
             ,input.sbgn = c("AF_Reference_Card.sbgn"
                             ,"PD_Reference_Card.sbgn"
                             ,"ER_Reference_Card.sbgn"
                             )
             ,sbgn.gene.id.type ="glyph"
             
             ,output.file = "./test.refcards" 
             ,output.formats = c("pdf")
             ,font.size = 1
             ,logic.node.font.scale = 10
             ,status.node.font.scale = 10
           )
SBGNview.obj

12 FAQs

12.1 Color key

12.1.1 Turn off color key

# Not run!
SBGNview(
 key.pos = "none"
)

13 References

14 Session Info

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] SBGNview_1.2.0       SBGNview.data_1.1.0  pathview_1.28.0     
##  [4] org.Hs.eg.db_3.10.0  AnnotationDbi_1.50.0 IRanges_2.22.0      
##  [7] S4Vectors_0.26.0     Biobase_2.48.0       BiocGenerics_0.34.0 
## [10] knitr_1.28          
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.28.0             SummarizedExperiment_1.18.0
##  [3] xfun_0.13                   lattice_0.20-41            
##  [5] vctrs_0.2.4                 htmltools_0.4.0            
##  [7] yaml_2.2.1                  blob_1.2.1                 
##  [9] XML_3.99-0.3                rlang_0.4.5                
## [11] DBI_1.1.0                   Rgraphviz_2.32.0           
## [13] bit64_0.9-7                 matrixStats_0.56.0         
## [15] GenomeInfoDbData_1.2.3      stringr_1.4.0              
## [17] zlibbioc_1.34.0             Biostrings_2.56.0          
## [19] memoise_1.1.0               evaluate_0.14              
## [21] rsvg_1.3                    gbRd_0.4-11                
## [23] GenomeInfoDb_1.24.0         highr_0.8                  
## [25] Rcpp_1.0.4.6                DelayedArray_0.14.0        
## [27] graph_1.66.0                XVector_0.28.0             
## [29] bit_1.1-15.2                png_0.1-7                  
## [31] digest_0.6.25               stringi_1.4.6              
## [33] bookdown_0.18               GenomicRanges_1.40.0       
## [35] grid_4.0.0                  bibtex_0.4.2.2             
## [37] Rdpack_0.11-1               tools_4.0.0                
## [39] bitops_1.0-6                magrittr_1.5               
## [41] RCurl_1.98-1.2              RSQLite_2.2.0              
## [43] crayon_1.3.4                pkgconfig_2.0.3            
## [45] Matrix_1.2-18               xml2_1.3.2                 
## [47] KEGGgraph_1.48.0            rmarkdown_2.1              
## [49] httr_1.4.1                  R6_2.4.1                   
## [51] igraph_1.2.5                compiler_4.0.0

Le Novere, Nicolas, Michael Hucka, Huaiyu Mi, Stuart Moodie, Falk Schreiber, Anatoly Sorokin, Emek Demir, et al. 2009. “The Systems Biology Graphical Notation.” Nature Biotechnology 27 (8). Nature Publishing Group:735.

Luo, Weijun, and Cory Brouwer. 2013. “Pathview: An R/Bioconductor Package for Pathway-Based Data Integration and Visualization.” Bioinformatics 29 (14). Oxford University Press:1830–1.