This vignette describes how to use the RandomWalkRestartMH package to run Random Walk with Restart (RWR) algorithms on monoplex, multiplex, heterogeneous, multiplex-heterogeneous and full multiplex-heterogeneous networks. It is based on the work we presented in the following article:
https://academic.oup.com/bioinformatics/article/35/3/497/5055408
We have since extended the method to take into account weighted networks and full multiplex-heterogeneous networks (in which both networks connected by bipartite interactions are multiplex).
RandomWalkRestartMH 1.20.0
RandomWalkRestartMH (Random Walk with Restart on Multiplex and Heterogeneous Networks) is an R package built to provide an easy interface to perform Random Walk with Restart on different types of complex networks: monoplex, multiplex, heterogeneous, and multiplex-heterogeneous networks.
It is based on the work we presented in the article:
https://academic.oup.com/bioinformatics/article/35/3/497/5055408
We have recently extended the method to take into account weighted networks. In addition, the package is now able to perform Random Walk with Restart on full multiplex-heterogeneous networks.
RWR simulates an imaginary particle that starts on one or more seed nodes and randomly follows the edges of a network. At each step, there is a restart probability, r, meaning that the particle can jump back to the seed(s) (Pan et al. 2004). This imaginary particle can explore the following types of networks:
A monoplex or single network, which contains solely nodes of the same nature. In addition, all the edges belong to the same category.
A multiplex network, defined as a collection of monoplex networks considered as layers of the multiplex network. In a multiplex network, the different layers share the same set of nodes, but the edges represent relationships of different nature (Battiston, Nicosia, and Latora 2014). In this case, the RWR particle can jump from one node to its counterparts on different layers.
A heterogeneous network, which is composed of two monoplex networks containing nodes of different nature. These different kinds of nodes can be connected by bipartite edges, allowing the RWR particle to jump between the two networks.
A multiplex and heterogeneous network, which is built by linking the nodes in every layer of a multiplex network to nodes of different nature thanks to bipartite edges.
A full multiplex and heterogeneous network, in which the two networks connected by bipartite interactions are of multiplex nature. The RWR particle can now explore the full multiplex-heterogeneous network.
The user can integrate single networks (monoplex networks) to create a multiplex network. The multiplex network can also be integrated, through bipartite relationships, with another multiplex network containing nodes of different nature. Proceeding this way generates a network that is both multiplex and heterogeneous. To do so, follow the instructions detailed below.
Please note that this version of the package does not deal with directed networks. New features will be included in future updated versions of RandomWalkRestartMH.
First of all, you need a current version of R. RandomWalkRestartMH is a freely available package deposited on Bioconductor and GitHub. You can install it by running the following commands on an R console:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RandomWalkRestartMH")
Alternatively, to install the latest version from GitHub before it is released on Bioconductor:
devtools::install_github("alberto-valdeolivas/RandomWalkRestartMH")
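After installing, it can be useful to confirm that the package is actually available in the current R library. The following is a minimal sketch using only base R and `utils` functions; the variable names `pkg` and `is_installed` are illustrative.

```r
# Check whether the package can be loaded and, if so, report its version.
pkg <- "RandomWalkRestartMH"
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
    message(pkg, " version: ", as.character(utils::packageVersion(pkg)))
} else {
    message(pkg, " is not installed; run one of the installation commands first.")
}
```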
In the following paragraphs, we describe how to use the RandomWalkRestartMH package to perform RWR on different types of biological networks. Concretely, we use a protein-protein interaction (PPI) network, a pathway network, a disease-disease similarity network and combinations thereof. These networks are obtained as detailed in (Valdeolivas et al. 2018). To reduce the computation time of this vignette, the PPI and pathway networks were restricted to genes/proteins expressed in adipose tissue.
The goal in the example presented here is, as described in (Valdeolivas et al. 2018), to find candidate genes potentially associated with diseases by a guilt-by-association approach. This is based on the fact that genes/proteins with similar functions or similar phenotypes tend to lie closer in biological networks. Therefore, the larger the RWR score of a gene, the more likely it is to be functionally related to the seeds.
We focus on a real biological example: SHORT syndrome (MIM code: 269880) and its causative gene PIK3R1, as described in (Valdeolivas et al. 2018). Throughout the following paragraphs, we will see how the RWR results evolve with the integration and exploration of additional networks.
RWR has usually been applied within the framework of single PPI networks in bioinformatics (Kohler et al. 2008). A gene or a set of genes, the so-called seed(s), known to be implicated in a concrete function or in a specific disease, are chosen as the starting point(s) of the algorithm. The RWR particle explores the neighbourhood of the seeds and the algorithm computes a score for all the nodes of the network. The larger the score of a node, the closer it is to the seed(s).
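For reference, the iteration commonly used to define RWR on a single network can be written as below. The notation is introduced here for illustration and is not taken from the package: M stands for the column-normalized adjacency matrix, p_t for the vector of node probabilities at step t, p_0 for the initial distribution concentrated on the seed(s), and r for the restart probability.

```latex
\mathbf{p}_{t+1} = (1 - r)\, M\, \mathbf{p}_t + r\, \mathbf{p}_0
```

The update is repeated until the difference between successive vectors falls below a small threshold; the entries of the resulting vector are the node scores.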
Let us generate an object of the class Multiplex, even if it is a monoplex network, with our PPI network.
library(RandomWalkRestartMH)
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
data(PPI_Network) # We load the PPI_Network
## We create a Multiplex object composed of 1 layer (it is a monoplex network)
## and we display what it looks like
PPI_MultiplexObject <- create.multiplex(list(PPI=PPI_Network))
PPI_MultiplexObject
## Number of Layers:
## [1] 1
##
## Number of Nodes:
## [1] 4317
##
## IGRAPH 8b8a201 UNW- 4317 18062 --
## + attr: name (v/c), weight (e/n), type (e/c)
## + edges from 8b8a201 (vertex names):
## [1] AAMP --VPS52 AAMP --BHLHE40 AAMP --GABARAPL2 AAMP --MAP1LC3B
## [5] VPS52 --TXN2 VPS52 --DDX6 VPS52 --MFAP1 VPS52 --PRKAA1
## [9] VPS52 --LMO4 VPS52 --STX11 VPS52 --KANK2 VPS52 --PPP1R18
## [13] VPS52 --TXLNA VPS52 --KIAA1217 VPS52 --VPS28 VPS52 --ATP6V1D
## [17] VPS52 --TPM3 VPS52 --KIF5B VPS52 --NOP2 VPS52 --RNF41
## [21] VPS52 --WTAP VPS52 --MAPK3 VPS52 --ZMAT2 VPS52 --VPS51
## [25] BHLHE40--AES BHLHE40--PRKAA1 BHLHE40--CCNK BHLHE40--RBPMS
## [29] BHLHE40--COX5B BHLHE40--UBE2I BHLHE40--MAGED1 BHLHE40--PLEKHB2
## + ... omitted several edges
To apply the RWR on a monoplex network, we need to compute the adjacency matrix of the network and normalize it by column (Kohler et al. 2008), as follows:
AdjMatrix_PPI <- compute.adjacency.matrix(PPI_MultiplexObject)
AdjMatrixNorm_PPI <- normalize.multiplex.adjacency(AdjMatrix_PPI)
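To make the normalization step concrete, here is a minimal base-R sketch of column normalization on a small toy matrix. The matrix `A` and the helper `col_normalize` are hypothetical illustrations, not the package's internal implementation, which operates on the objects created above.

```r
# Toy undirected network with 4 nodes (hypothetical data, not from the package).
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Divide every column by its sum so that each column adds up to 1.
# Assumes there are no isolated nodes (a zero column sum would give NaN).
col_normalize <- function(adj) {
    sweep(adj, 2, colSums(adj), "/")
}

A_norm <- col_normalize(A)
colSums(A_norm)
```

After normalization, each column can be read as the transition probabilities out of the corresponding node.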
Then, we need to define the seed(s) before running the RWR algorithm on this PPI network. As mentioned above, we are focusing on the example of SHORT syndrome. Therefore, we take the PIK3R1 gene as seed and execute RWR.
SeedGene <- c("PIK3R1")
## We launch the algorithm with the default parameters (See details on manual)
RWR_PPI_Results <- Random.Walk.Restart.Multiplex(AdjMatrixNorm_PPI,
    PPI_MultiplexObject, SeedGene)
# We display the results
RWR_PPI_Results
## Top 10 ranked Nodes:
## NodeNames Score
## 1 GRB2 0.006845881
## 2 EGFR 0.006169129
## 3 CRK 0.005674261
## 4 ABL1 0.005617041
## 5 FYN 0.005611086
## 6 CDC42 0.005594680
## 7 SHC1 0.005577900
## 8 CRKL 0.005509182
## 9 KHDRBS1 0.005443541
## 10 TYRO3 0.005441887
##
## Seed Nodes used:
## [1] "PIK3R1"
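To illustrate how scores of this kind arise, the following base-R sketch runs a plain power-iteration RWR on a small toy network. It is an illustration under stated assumptions, not the package's implementation: the function `rwr_sketch`, the toy matrix, the restart value `r = 0.7`, the tolerance and the iteration cap are all chosen here for the example.

```r
# Toy undirected network with 4 nodes (hypothetical data), column-normalized.
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
A_norm <- sweep(A, 2, colSums(A), "/")

# Plain power-iteration RWR (illustrative sketch only).
rwr_sketch <- function(adj_norm, seeds, r = 0.7, tol = 1e-10, max_iter = 1000) {
    # Initial distribution: probability spread evenly over the seed nodes.
    p0 <- setNames(numeric(nrow(adj_norm)), rownames(adj_norm))
    p0[seeds] <- 1 / length(seeds)
    p <- p0
    for (i in seq_len(max_iter)) {
        # One step: walk along an edge with probability 1 - r, restart with r.
        p_next <- (1 - r) * as.vector(adj_norm %*% p) + r * p0
        converged <- sum(abs(p_next - p)) < tol
        p <- p_next
        if (converged) break
    }
    p
}

scores <- rwr_sketch(A_norm, seeds = "A")

# Rank the non-seed nodes by decreasing score.
ranking <- sort(scores[setdiff(names(scores), "A")], decreasing = TRUE)
ranking
```

In this toy example the node farthest from the seed, `D`, receives the lowest score, which mirrors the interpretation of the ranking above: higher scores indicate nodes closer to the seed.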
Finally, we can create a network (an igraph object) with the top-scored genes. Visualizing the top results within their interaction network is always a good idea when prioritizing genes, since it gives a global view of all the potential candidates. The results are presented in Figure 1.
## In this case we choose to induce a network with the top 15 genes.
TopResults_PPI <-
    create.multiplexNetwork.topResults(RWR_PPI_Results, PPI_MultiplexObject,
        k = 15)
## We plot that cluster with its interactions.
par(mar = c(0.1, 0.1, 0.1, 0.1))
plot(TopResults_PPI, vertex.label.color = "black",
    vertex.frame.color = "#ffffff", vertex.size = 20, edge.curved = .2,
    vertex.color = ifelse(igraph::V(TopResults_PPI)$name == "PIK3R1",
        "yellow", "#00CCFF"),
    edge.color = "blue", edge.width = 0.8)