The signet R package implements a method to detect selection in biological
pathways. The general idea is to search for gene subnetworks within biological
pathways that present unusual features, using a heuristic approach.
The approach is simple: we consider a list of genes with predefined scores (e.g. a differentiation measure such as FST) and we want to find gene networks whose score is higher than expected under the null hypothesis.
To do so, we use biological pathway databases converted into gene networks and search these graphs for high-scoring subnetworks.
Details about the algorithm can be found in Gouy et al. (2017).
Please cite this paper if you use signet for your project.
signet takes as input a data frame of gene scores. The first column must
contain the gene ID (e.g. Entrez) and the second column the gene
score (a single value per gene).
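As a minimal sketch of this layout, the base R snippet below builds a toy score table and runs a few sanity checks before the table is passed to signet. The column names `gene` and `score` mirror the example dataset; the values and the `check_scores()` helper are hypothetical and not part of signet.

```r
# Toy gene score table (hypothetical values): IDs first, scores second.
scores_toy <- data.frame(
  gene  = c("1", "10", "100", "1000"),
  score = c(0.92, 1.60, 1.69, 3.33),
  stringsAsFactors = FALSE
)

# Hypothetical helper: basic checks on the two-column layout.
check_scores <- function(df) {
  stopifnot(
    is.data.frame(df),
    ncol(df) >= 2,
    is.numeric(df[[2]]),             # one numeric score per gene
    anyDuplicated(df[[1]]) == 0,     # each gene ID appears only once
    !anyNA(df[[1]])                  # no missing gene IDs
  )
  invisible(TRUE)
}

check_scores(scores_toy)
```

The check stops with an error as soon as one condition fails, which is usually easier to debug than a failure inside the subnetwork search.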
The other input is a list of biological pathways (gene networks) in the
graphNEL format. We advise using the graphite package to get the pathway data:
library(graphite)
# pathwayDatabases() # to have a look at pathways and species available
# get the pathway list:
paths <- graphite::pathways("hsapiens", "kegg")
# convert the first 3 pathways to graphs:
kegg_human <- lapply(paths[1:3], graphite::pathwayGraph)
head(kegg_human)
Note that gene identifiers must be the same in the gene scores data frame
and in the pathway list (e.g. Entrez).
graphite provides a function to convert gene identifiers.
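A quick way to check that both inputs use the same identifiers is to measure their overlap. The sketch below uses two hypothetical character vectors as stand-ins for the gene column of the score table and for the node names of one pathway (for a graphNEL object, node names can be obtained with `graph::nodes()`).

```r
# Hypothetical stand-ins: IDs from the score table and node names of one pathway.
score_ids <- c("1", "10", "100", "1000", "10000")
node_ids  <- c("10", "100", "207", "1000", "5594")

# Genes present in both inputs; only these can be matched to a score.
shared <- intersect(node_ids, score_ids)

# Fraction of pathway nodes that have a score.
coverage <- length(shared) / length(node_ids)

shared
coverage
```

A coverage close to zero usually means that the two inputs use different identifier types (or differently formatted IDs) and that a conversion step is needed.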
An example dataset from Daub et al. (2013, MBE), as well as human KEGG pathways, is provided:
library(signet)
data(daub13)
head(scores) # gene scores
##    gene     score
## 1     1 0.9200665
## 2    10 1.5974385
## 3   100 1.6885589
## 4  1000 3.3314333
## 5 10000 1.6668512
## 6 10001 1.3529425
We first have to search for high-scoring subnetworks within the provided biological pathways, using simulated annealing:
# Run simulated annealing on the first 3 KEGG pathways:
HSS <- searchSubnet(kegg_human, scores)
This function returns, for each pathway, the highest-scoring subnetwork found, its score and other information about the simulated annealing run.
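To give an intuition of what a simulated annealing search does, here is a self-contained toy sketch in base R. It is not signet's implementation: the six-gene network, the scores, the scoring function (sum of node scores divided by the square root of the subnetwork size) and the cooling schedule are all assumptions made for illustration.

```r
set.seed(1)

# Toy undirected network with 6 genes (hypothetical data).
genes <- c("g1", "g2", "g3", "g4", "g5", "g6")
adj <- matrix(0, 6, 6, dimnames = list(genes, genes))
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6), c(2, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # make the adjacency matrix symmetric
score <- c(g1 = -1, g2 = 2, g3 = 1.5, g4 = -0.5, g5 = 2.5, g6 = -2)

# Assumed subnetwork score: sum of node scores scaled by sqrt(size).
subnet_score <- function(in_net) sum(score[in_net]) / sqrt(sum(in_net))

# Breadth-first search restricted to the selected nodes.
is_connected <- function(in_net) {
  idx <- which(in_net)
  if (length(idx) <= 1) return(TRUE)
  seen <- idx[1]
  frontier <- idx[1]
  while (length(frontier) > 0) {
    nb <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & in_net)
    frontier <- setdiff(nb, seen)
    seen <- c(seen, frontier)
  }
  length(seen) == length(idx)
}

anneal <- function(n_iter = 2000, temp = 1, cooling = 0.995) {
  in_net <- rep(FALSE, length(genes))
  in_net[sample(length(genes), 1)] <- TRUE   # start from one random gene
  best <- in_net
  for (i in seq_len(n_iter)) {
    cand <- in_net
    flip <- sample(length(genes), 1)         # propose adding or removing a gene
    cand[flip] <- !cand[flip]
    if (any(cand) && is_connected(cand)) {
      delta <- subnet_score(cand) - subnet_score(in_net)
      # Always accept improvements; accept worse moves with a probability
      # that shrinks as the temperature decreases.
      if (delta > 0 || runif(1) < exp(delta / temp)) in_net <- cand
      if (subnet_score(in_net) > subnet_score(best)) best <- in_net
    }
    temp <- temp * cooling
  }
  best
}

best <- anneal()
genes[best]
subnet_score(best)
```

On this toy network the highest-scoring connected subnetwork under the assumed score is made of g2, g3 and g5, which the search is expected to recover. Accepting some score-decreasing moves early on is what lets the search escape local optima.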
Then, to test the significance of the high-scoring subnetworks, we generate a null distribution of high-scores:
# Generate the empirical null distribution
null <- nullDist(kegg_human, scores, n = 1000)
Note that the null object is a simple vector of null high-scores (here, 1000 values).
You can therefore run additional iterations afterwards and concatenate their
output with the previous vector to compute more precise p-values.
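The effect of pooling runs can be sketched in base R. The two vectors below are random stand-ins for the output of two separate `nullDist()` runs, and `emp_pval()` is a hypothetical helper that uses a common empirical p-value definition (the proportion of null scores at least as high as the observed one); the exact formula used by signet is not stated here, so treat it as an assumption.

```r
set.seed(42)

# Stand-ins for two separate nullDist() runs (hypothetical values).
null_run1 <- rnorm(1000)
null_run2 <- rnorm(1000)

# Pool the two runs into a single null distribution.
null_all <- c(null_run1, null_run2)

# Assumed empirical p-value: share of null scores >= the observed score.
emp_pval <- function(observed, null) mean(null >= observed)

observed <- 2.5  # hypothetical subnetwork score
emp_pval(observed, null_run1)
emp_pval(observed, null_all)

# Smallest non-zero p-value attainable with each vector.
1 / length(null_run1)
1 / length(null_all)
```

With more null values, the resolution of the p-value improves: the smallest non-zero p-value goes from 1/1000 to 1/2000 in this example.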
This distribution is finally used to compute p-values and update the signet object:
HSS <- testSubnet(HSS, null)
When p-values have been computed, you can generate a summary table (one row per pathway):
# Results: generate a summary table
tab <- summary(HSS)
head(tab)
##                           pathway net.size subnet.size     subnet.score
## 1          Acute myeloid leukemia       53           9 2.75399072301884
## 2               Adherens junction       66           8 3.25243964604843
## 3 Adipocytokine signaling pathway       62           9 2.63809122980381
##   p.val                                    subnet.genes
## 1 0.105 572 2475 2885 3815 3845 5291 5295 6654 10000
## 2 0.038 87 1495 1496 4008 4301 7082 7414 29119
## 3 0.128 1374 2538 4852 5562 5563 6774 53632 57818 92579
# you can write the summary table as follows:
# write.table(tab,
#             file = "signet_output.tsv",
#             sep = "\t",
#             quote = FALSE,
#             row.names = FALSE)
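Once the table is available, it can be filtered with ordinary data frame operations. The sketch below rebuilds part of the summary table shown above by hand so that it runs on its own; the 0.05 threshold is a hypothetical choice, and the code assumes that `subnet.genes` is stored as a space-separated string, as it appears in the printed output.

```r
# Part of the summary table shown above, re-entered by hand.
tab <- data.frame(
  pathway = c("Acute myeloid leukemia", "Adherens junction",
              "Adipocytokine signaling pathway"),
  p.val = c(0.105, 0.038, 0.128),
  subnet.genes = c("572 2475 2885 3815 3845 5291 5295 6654 10000",
                   "87 1495 1496 4008 4301 7082 7414 29119",
                   "1374 2538 4852 5562 5563 6774 53632 57818 92579"),
  stringsAsFactors = FALSE
)

alpha <- 0.05  # hypothetical significance threshold
sig <- tab[tab$p.val < alpha, ]

# Split the space-separated gene IDs into one character vector per pathway.
sig_genes <- setNames(strsplit(sig$subnet.genes, " ", fixed = TRUE),
                      sig$pathway)
sig_genes
```

With these three pathways, only "Adherens junction" passes the 0.05 threshold.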
Note that searching for high-scoring subnetworks and generating the null distribution can take a few hours. However, these steps are easy to parallelize on a cluster as different iterations are independent from each other.
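Because iterations are independent, the work can be split into batches that run in separate processes and are pooled afterwards. The sketch below uses `mclapply()` from the base `parallel` package on a single machine; `null_batch()` is a hypothetical stand-in that draws random numbers, where a real run would wrap a call such as `nullDist(kegg_human, scores, n = n)`. The number of batches, seeds and cores are arbitrary.

```r
library(parallel)

# Hypothetical stand-in for one batch of null iterations; in practice this
# would wrap a call such as nullDist(kegg_human, scores, n = n).
null_batch <- function(seed, n = 250) {
  set.seed(seed)   # distinct seed per batch so that batches differ
  rnorm(n)         # placeholder for n null high-scores
}

seeds <- 1:4

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Run the independent batches in parallel and pool the results.
batches <- mclapply(seeds, null_batch, mc.cores = n_cores)
null_all <- unlist(batches)
length(null_all)
```

On a cluster, the same idea applies with one job per batch: each job writes its vector to disk and the vectors are concatenated once all jobs have finished.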
Cytoscape (www.cytoscape.org) is an external software dedicated to network
visualization. signet can generate an XGMML file that can be loaded into
Cytoscape (File > Import > Network > File…).
This file can be written to your working directory using the writeXGMML function.
If the input of the function is a single
signet object, the whole pathway will
be represented and nodes belonging to the highest-scoring subnetwork (HSS)
will be highlighted in red.
writeXGMML(HSS[[1]], filename = "cytoscape_input.xgmml")
If a list of pathways (signetList) is provided, all subnetworks with a p-value below a given threshold (default: 0.01) are merged and represented. Note that in this case, only the nodes belonging to HSS are kept for representation.
writeXGMML(HSS, filename = "cytoscape_input.xgmml", threshold = 0.01)
The representation can then be finely customised in Cytoscape.