The netboxr package provides a number of functions to retrieve and process genetic data from large-scale genomics projects (e.g. TCGA projects), including mutations, copy number alterations, gene expression and DNA methylation. The package implements the NetBox algorithm in R. NetBox integrates genetic alterations with literature-curated pathway knowledge to identify pathway modules in cancer. It uses (1) a global network null model and (2) a local network null model to assess the statistical significance of the discovered pathway modules.
BiocManager::install("netboxr")
Load netboxr package:
library(netboxr)
A list of all accessible vignettes and methods is available with the following command:
help(package="netboxr")
For help on any netboxr package functions, use one of the following command formats:
help(geneConnector)
?geneConnector
This example reproduces the network reported in Cerami et al. (2010).
The results presented here are comparable to those from Cerami et al. (2010), although the unadjusted p-values for linker genes are not the same. In Cerami et al. (2010), the unadjusted p-value of a linker gene was calculated as the probability of the observed data point, Pr(X = x). netboxr instead uses the probability of an outcome at least as extreme as the observed one, assuming the null hypothesis is true, Pr(X >= x | H). The final number of linker genes after FDR correction is the same in the netboxr result and in the original Cerami et al. (2010) analysis.
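The difference between the two definitions can be illustrated with a small sketch. It assumes a hypergeometric null model in which a candidate linker's neighbors are drawn at random from the network. The counts (274 altered genes among 9264 nodes, and a neighbor with 81 connections of which 11 are altered genes) are taken from the run shown later in this document, but the exact parameterization that netboxr uses internally is an assumption here, so the values need not match the reported p-values.

```r
# Illustrative only: assumes a hypergeometric null model; the parameterization
# used inside netboxr may differ.
networkSize  <- 9264  # nodes in the reduced network
alteredInNet <- 274   # altered genes that map to the network
globalDegree <- 81    # all neighbors of the candidate linker
localDegree  <- 11    # neighbors that are altered genes

# Pr(X = x): probability of exactly the observed count
pointProb <- dhyper(localDegree, alteredInNet,
                    networkSize - alteredInNet, globalDegree)

# Pr(X >= x | H): probability of the observed count or a more extreme one
tailProb <- phyper(localDegree - 1, alteredInNet,
                   networkSize - alteredInNet, globalDegree,
                   lower.tail = FALSE)

print(c(point = pointProb, tail = tailProb))
```

The tail probability includes the point probability as one of its terms, so it is never smaller. This is why the unadjusted p-values differ between the two analyses.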
Load the pre-defined HIN network and simplify it by removing loops and duplicated interactions. The network after reduction contains 9264 nodes and 68111 interactions.
data(netbox2010)
sifNetwork <- netbox2010$network
graphReduced <- networkSimplify(sifNetwork, directed = FALSE)
## Loading network of 9264 nodes and 157780 interactions
## Treated as undirected network
## Removing multiple interactions and loops
## Returning network of 9264 nodes and 68111 interactions
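To make the simplification step concrete, here is a minimal base R sketch of the same idea on a toy edge list. It is not the implementation of networkSimplify. The gene pairs and interaction types are hypothetical, and collapsing interactions of different types between the same pair of nodes into one edge is an assumption of this sketch.

```r
# Toy SIF-like edge list (hypothetical data): node A, interaction type, node B
toyEdges <- data.frame(
  A    = c("TP53", "MDM2", "TP53", "EGFR", "EGFR"),
  type = c("INTERACTS_WITH", "INTERACTS_WITH", "STATE_CHANGE",
           "INTERACTS_WITH", "INTERACTS_WITH"),
  B    = c("MDM2", "TP53", "MDM2", "EGFR", "GRB2"),
  stringsAsFactors = FALSE
)

# Remove loops (interactions that connect a node to itself)
noLoops <- toyEdges[toyEdges$A != toyEdges$B, ]

# Treat edges as undirected: sort the endpoints so A-B and B-A look identical
key <- data.frame(
  n1 = pmin(noLoops$A, noLoops$B),
  n2 = pmax(noLoops$A, noLoops$B),
  stringsAsFactors = FALSE
)

# Keep a single edge per unordered pair of nodes
simplified <- key[!duplicated(key), ]

cat("Before:", nrow(toyEdges), "interactions\n")
cat("After: ", nrow(simplified), "interactions\n")
```

In this toy input, the three TP53/MDM2 interactions collapse to one edge and the EGFR self-interaction is dropped, leaving two interactions.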
The altered gene list contains 517 candidates from mutations and copy number alterations.
geneList <- as.character(netbox2010$geneList)
length(geneList)
## [1] 517
The geneConnector function in the netboxr package takes the altered gene list as input and maps the genes onto the curated network to find the local processes represented by the gene list.
## Use the Benjamini-Hochberg method for multiple hypothesis correction of
## linker candidates.
## Use the edge-betweenness method to detect community structure in the network.
threshold <- 0.05
results <- geneConnector(geneList = geneList, networkGraph = graphReduced, directed = FALSE,
pValueAdj = "BH", pValueCutoff = threshold, communityMethod = "ebc", keepIsolatedNodes = FALSE)
## 274 / 517 candidate nodes match the name in the network of 9264
## nodes
## Only test neighbor nodes with local degree equals or exceeds 2
## Multiple hypothesis corrections for 892 neighbor nodes in the network
## For p-value 0.05 cut-off, 6 nodes were included as linker nodes
## Connecting 274 candidate nodes and 6 linker nodes
## Remove 208 isolated candidate nodes from the input
## Final network contains 72 nodes and 152 interactions
## Detecting modules using "edge betweeness" method
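The messages above refer to the local degree of neighbor nodes. As a rough sketch of that bookkeeping (not the geneConnector implementation), the base R code below counts, for each non-altered node in a toy network, how many neighbors it has overall (global degree) and how many of those are altered genes (local degree). It then keeps only nodes with a local degree of at least 2 as candidates for testing. All node names and edges are hypothetical.

```r
# Toy undirected network (hypothetical node names)
edges <- data.frame(
  from = c("A", "A", "B", "C", "L", "L", "L", "M"),
  to   = c("B", "C", "C", "D", "A", "B", "X", "A"),
  stringsAsFactors = FALSE
)
altered <- c("A", "B", "C")  # altered genes that map to the network

# List every edge in both directions so each node's neighbors appear in `to`
adj <- rbind(edges,
             data.frame(from = edges$to, to = edges$from,
                        stringsAsFactors = FALSE))

# Candidate linkers: non-altered nodes with at least one altered neighbor
candidates <- setdiff(unique(adj$from[adj$to %in% altered]), altered)

neighborData <- data.frame(
  name         = candidates,
  localDegree  = sapply(candidates,
                        function(v) sum(adj$to[adj$from == v] %in% altered)),
  globalDegree = sapply(candidates, function(v) sum(adj$from == v)),
  stringsAsFactors = FALSE
)

# Only neighbors connected to two or more altered genes are tested
tested <- neighborData[neighborData$localDegree >= 2, ]
print(tested)
```

Here only node L, which connects altered genes A and B, passes the filter. Nodes M and D each touch a single altered gene and are not tested.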
# Add edge annotations
library(RColorBrewer)
edges <- results$netboxOutput
interactionType <- unique(edges[, 2])
interactionTypeColor <- brewer.pal(length(interactionType), name = "Spectral")
edgeColors <- data.frame(interactionType, interactionTypeColor, stringsAsFactors = FALSE)
colnames(edgeColors) <- c("INTERACTION_TYPE", "COLOR")
netboxGraphAnnotated <- annotateGraph(netboxResults = results, edgeColors = edgeColors,
directed = FALSE, linker = TRUE)
# Check the p-value of the selected linker
linkerDF <- results$neighborData
linkerDF[linkerDF$pValueFDR < threshold, ]
## idx name localDegree globalDegree pValueRaw oddsRatio pValueFDR
## CRK 1712 CRK 11 81 2.392088e-05 1.708732 0.01866731
## IFNAR1 4546 IFNAR1 6 23 4.185496e-05 2.518726 0.01866731
## CBL 20 CBL 14 140 6.505470e-05 1.361057 0.01934293
## GAB1 500 GAB1 8 57 2.483197e-04 1.751122 0.04887827
## CDK6 414 CDK6 5 21 3.008515e-04 2.406906 0.04887827
## PTPN11 84 PTPN11 14 163 3.287776e-04 1.191405 0.04887827
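The pValueFDR column can be checked against the raw p-values with base R's p.adjust. The sketch below copies the six raw p-values from the table above and corrects them for the 892 neighbor nodes reported in the geneConnector messages. Passing only the six smallest p-values together with n = 892 assumes that none of the remaining 886 p-values would lower these adjusted values. That assumption appears to hold here, since the result agrees with the table.

```r
# Raw p-values of the six selected linkers, copied from the table above
pValueRaw <- c(CRK = 2.392088e-05, IFNAR1 = 4.185496e-05, CBL = 6.505470e-05,
               GAB1 = 2.483197e-04, CDK6 = 3.008515e-04, PTPN11 = 3.287776e-04)

# Benjamini-Hochberg correction for 892 tested neighbor nodes
pValueFDR <- p.adjust(pValueRaw, method = "BH", n = 892)
print(signif(pValueFDR, 7))

# All six remain below the 0.05 cut-off used in the analysis
print(all(pValueFDR < 0.05))
```

Note that CRK, although it has the smallest raw p-value, receives the same adjusted value as IFNAR1. The Benjamini-Hochberg procedure enforces monotonicity, so a lower-ranked p-value cannot have a larger adjusted value than the one ranked after it.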
# The geneConnector function returns a list of data frames.
names(results)
## [1] "netboxGraph" "netboxCommunity" "netboxOutput" "nodeType"
## [5] "moduleMembership" "neighborData"
# Plot the graph with the Fruchterman-Reingold layout algorithm. As an example,
# plot both the original and the annotated graphs. Save the layout for easier
# comparison.
graph_layout <- layout_with_fr(results$netboxGraph)
# plot the original graph
plot(results$netboxCommunity, results$netboxGraph, layout = graph_layout)