Before running BioNetStat, you must set the execution parameters available on the left sidebar. Below, we detail each differential network analysis parameter.
Column classes name
After the variable values data are loaded, BioNetStat chooses the first character (factor) column to classify the samples. If your dataset has more than one column labeling the sample classes, you can select which column of classes you want to use.
Classes (conditions) being compared
Select the classes you want to analyze with BioNetStat.
Gene sets size range
BioNetStat performs tests for each variable set of a collection of sets defined in the Variable set database. If no variable set file is provided, the program analyzes a single group containing all variables. To test only a subcollection of sets, you can filter the sets according to their sizes (number of variables) by setting the "Minimum gene set size" and "Maximum gene set size" parameters.
The minimum gene set size allowed is 5. However, we recommend testing groups with at least 15 variables.
Testing large gene sets can take a long time. In general, a maximum size of a few hundred up to 1000 variables is feasible, although this limit varies with the specifications of the user's machine.
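The size filter described above can be sketched as follows. This is only an illustration in Python, not BioNetStat's own code: the function name `filter_sets_by_size` and the dictionary layout (set name mapped to a list of variable identifiers) are assumptions made for the example.

```python
def filter_sets_by_size(variable_sets, min_size=15, max_size=1000):
    """Keep only the sets whose number of variables lies in [min_size, max_size]."""
    return {
        name: members
        for name, members in variable_sets.items()
        if min_size <= len(members) <= max_size
    }


# Toy collection (hypothetical): set name -> list of variable identifiers.
sets = {
    "small": ["g%d" % i for i in range(4)],
    "medium": ["g%d" % i for i in range(20)],
    "large": ["g%d" % i for i in range(1500)],
}

# Only the 20-variable set falls inside the 15..1000 range.
kept = filter_sets_by_size(sets, min_size=15, max_size=1000)
print(sorted(kept))
```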
Method for network construction
The network links are inferred according to a measure of association between the values of the variables. BioNetStat provides three classical association measures:
- Pearson: Pearson's correlation coefficient, which measures the linear dependence between two variables. For the statistical test, we use the Hmisc package.
- Spearman: Spearman's correlation coefficient, which measures the monotonic dependence between two variables. For the statistical test, we use the Hmisc package.
- Kendall: Kendall's tau coefficient, which also measures the monotonic dependence between two variables. For the statistical test, we use the psych package.
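To make the difference between the three measures concrete, the sketch below computes them for one pair of variables using plain Python. It is not the Hmisc or psych implementation: the helper names are hypothetical, no p-values are computed, and the Kendall version is the simple tau-a without a correction for ties.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


# A monotonic but nonlinear relationship: y = x squared.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

In this toy example the two rank-based measures reach 1 because the relationship is perfectly monotonic, while Pearson's coefficient stays below 1 because it is not linear.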
Network type
You can choose between unweighted and weighted networks:
- Unweighted: graphs in which every edge has weight one. You must choose a threshold for edge creation. Only edges connecting genes whose degree of association is higher than the threshold are created in the graph.
- Weighted: graphs in which each edge has a weight, defined in Section 2d: Set the criterion for network edges weights. The weight of an edge is the degree of association between the two gene products it connects.
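The two network types can be illustrated with a small association matrix. The sketch below is an assumption-laden example rather than BioNetStat's code: the function names are hypothetical, the comparison with the threshold is taken to be strict ("higher than"), and self-loops are excluded by zeroing the diagonal.

```python
def unweighted_adjacency(assoc, threshold):
    """Edge (weight 1) only where the association exceeds the threshold."""
    n = len(assoc)
    return [
        [1 if i != j and assoc[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def weighted_adjacency(assoc):
    """Every pair is connected; the edge weight is the association degree."""
    n = len(assoc)
    return [
        [assoc[i][j] if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric association matrix for three variables (values in [0, 1]).
assoc = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]

unweighted = unweighted_adjacency(assoc, threshold=0.5)
weighted = weighted_adjacency(assoc)
print(unweighted)
print(weighted)
```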
Statistic to link formation
The correlation coefficient or p-value obtained by one of the methods mentioned
above is used to set an association degree for each link of the network.
The following options are available to measure the association degrees:
- Absolute correlation: the absolute value of the correlation coefficient
- 1 - p-value: One minus the p-value of the test for dependence between
two gene products. If the p-value is small, the expression levels are
tightly associated.
- 1 - q-value: One minus the adjusted p-value of the test for dependence
between two gene products. The p-value is adjusted by
the False Discovery Rate (Benjamini and
Hochberg, 1995) method for multiple testing.
After choosing the association measure, the user must select the threshold value for link formation.
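The three association degrees listed above can be sketched as follows. The Benjamini-Hochberg adjustment is written out by hand here for illustration; the function names and the toy correlations and p-values are assumptions, not values or code from BioNetStat.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that the adjusted values stay monotone in the original p-values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical correlations and test p-values for four candidate links.
correlations = [-0.85, 0.60, -0.65, 0.30]
pvalues = [0.01, 0.04, 0.03, 0.20]
qvalues = bh_adjust(pvalues)

absolute_correlation = [abs(r) for r in correlations]
one_minus_p = [1 - p for p in pvalues]
one_minus_q = [1 - q for q in qvalues]

# Links kept in an unweighted network for a threshold of 0.95 on 1 - q-value.
threshold = 0.95
kept_links = [i for i, degree in enumerate(one_minus_q) if degree > threshold]
print(qvalues, kept_links)
```

Because the adjustment can only increase a p-value, the "1 - q-value" criterion is never less strict than "1 - p-value" for the same threshold.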
Links weights
If the user selected the weighted option, they must also choose which measure will be used as the weight of the links (Section 2d: Set the criterion for network edges weights).
Method for gene networks comparison
BioNetStat compares the correlation networks between the classes for each variable set.
Below, we describe the methods available for comparing unweighted networks:
- Spectral distribution test: The spectrum of an undirected graph is the set of eigenvalues of its adjacency matrix. The spectral distribution describes many topological properties of a graph, such as the number of walks, diameter, and cliques. The spectral distribution test is based on the Kullback-Leibler (KL) divergence between spectral distributions (Takahashi et al., 2012). This test can be used to verify whether two graphs were generated by the same model.
- Spectral entropy test: It uses the absolute difference between spectral entropies (Takahashi et al., 2012) to measure the difference in the complexity of the graphs' topological organization.
- Degree distribution test: The degree of a node is the number of edges that connect to it. The degree distribution test is based on the Kullback-Leibler (KL) divergence between the degree distributions. BioNetStat uses the igraph package implementation of the node degree.
- Degree centrality test: The degree centrality test is based on the Euclidean distance between the degree centralities of the two networks adjusted by the number of vertices.
- Betweenness centrality test: The betweenness centrality of a node is the number of shortest paths going through it (Freeman, 1979). The betweenness centrality test is based on the Euclidean distance between the betweenness centralities of the two networks adjusted by the number of vertices. BioNetStat uses the igraph package implementation.
- Closeness centrality test: The closeness centrality of a node is the inverse of the average length of the shortest paths between it and all the other vertices in the graph (Freeman, 1979). The closeness centrality test is based on the Euclidean distance between the closeness centralities of the two networks adjusted by the number of vertices. BioNetStat uses the igraph package implementation.
- Eigenvector centrality test: The eigenvector centrality of a node $v_i$ is the $i$-th entry of the first eigenvector of the graph adjacency matrix (Bonacich, 1987). The eigenvector centrality test is based on the Euclidean distance between the eigenvector centralities of the two networks adjusted by the number of vertices. BioNetStat uses the igraph package implementation.
- Clustering coefficient test: The local clustering coefficient of a node is the number of edges between the vertices within its neighborhood divided by the number of edges that could exist among them (Watts and Strogatz, 1998). The clustering coefficient test is based on the Euclidean distance between the local clustering coefficients of the two networks adjusted by the number of vertices. BioNetStat uses the igraph package implementation.
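As an illustration of the distance-based statistics above, the sketch below computes node degrees and local clustering coefficients for two small unweighted networks and takes the Euclidean distance between the resulting vectors. It is a simplified example, not the igraph or BioNetStat implementation: the function names are hypothetical, nodes with fewer than two neighbors are assigned a clustering coefficient of 0 by assumption, and the adjustment by the number of vertices is left out because the text does not specify its exact form.

```python
import math


def degrees(adj):
    """Number of edges attached to each node of a 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def local_clustering(adj):
    """Edges among a node's neighbors divided by the possible number of such edges."""
    n = len(adj)
    coefficients = []
    for v in range(n):
        neighbors = [u for u in range(n) if adj[v][u]]
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)  # assumption: undefined cases count as 0
            continue
        links = sum(
            adj[a][b] for i, a in enumerate(neighbors) for b in neighbors[i + 1:]
        )
        coefficients.append(links / (k * (k - 1) / 2))
    return coefficients


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Network A: a triangle (0, 1, 2) with node 3 attached to node 2.
net_a = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# Network B: a simple path 0 - 1 - 2 - 3.
net_b = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

degree_stat = euclidean_distance(degrees(net_a), degrees(net_b))
clustering_stat = euclidean_distance(local_clustering(net_a), local_clustering(net_b))
print(degree_stat, clustering_stat)
```

The same pattern applies to the other centrality tests: compute one value per node in each network, then compare the two vectors.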
BioNetStat includes generalizations of some of the statistics described above to
weighted undirected graphs. Let G be a weighted undirected graph.
We define the weighted adjacency matrix of G to be the
matrix $W = (w_{ij})$, such that $w_{ij}$ is the
weight of the edge that connects the vertices $v_i$ and $v_j$.
In this context, $0 \le w_{ij} \le 1$ and G is a complete graph.
Below, we describe the methods available for comparing weighted networks:
- Spectral distribution test: Replaces the usual adjacency matrix with the weighted adjacency matrix, and then performs the spectral distribution test for unweighted networks.
- Spectral entropy test: Replaces the usual adjacency matrix with the weighted adjacency matrix, and then performs the spectral entropy test for unweighted networks.
- Degree distribution test: BioNetStat generalizes the degree of a node to the sum of the weights of the edges that connect to it (Barrat, 2004). The software uses the igraph implementation of the node strength. It replaces the usual node degree with the weighted degree and then computes the degree distribution test for unweighted networks.
- Degree centrality test: Replaces the usual node degree with the weighted degree, and then computes the degree centrality test for unweighted networks.
- Betweenness centrality test: The betweenness centrality of a node is the number of shortest paths going through it (Freeman, 1979). The betweenness centrality test is based on the Euclidean distance between the betweenness centralities of the two networks adjusted by the number of vertices. BioNetStat uses the igraph package implementation.
- Eigenvector centrality test: Replaces the usual adjacency matrix with the weighted adjacency matrix, and then performs the eigenvector centrality test for unweighted networks (Newton, 2004).
- Clustering coefficient test: Replaces the local clustering coefficient of a node with the sum of the weights of the edges between the vertices within its neighborhood divided by the number of edges that could exist among them (Lopez-Fernandez et al., 2004). Then it performs the clustering coefficient test for unweighted networks.
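The weighted generalizations of the degree and of the local clustering coefficient can be sketched from the definitions given above. This is an illustrative reading in plain Python, not the igraph implementation: the function names are hypothetical, and a vertex is treated as a neighbor whenever the connecting weight is greater than zero, which is an assumption.

```python
def node_strength(w):
    """Weighted degree: sum of the weights of the edges attached to each node."""
    n = len(w)
    return [sum(w[i][j] for j in range(n) if j != i) for i in range(n)]


def weighted_local_clustering(w):
    """Sum of edge weights among a node's neighbors over the possible edge count."""
    n = len(w)
    coefficients = []
    for v in range(n):
        # Assumption: any vertex joined by a positive weight is a neighbor.
        neighbors = [u for u in range(n) if u != v and w[v][u] > 0]
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)  # assumption: undefined cases count as 0
            continue
        total = sum(
            w[a][b] for i, a in enumerate(neighbors) for b in neighbors[i + 1:]
        )
        coefficients.append(total / (k * (k - 1) / 2))
    return coefficients


# Toy weighted adjacency matrix with weights in [0, 1] and a zero diagonal.
w = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]

strengths = node_strength(w)
clustering = weighted_local_clustering(w)
print(strengths, clustering)
```

The resulting vectors can then be compared between networks exactly as in the unweighted tests.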
For the "Spectral distribution test," the "Spectral entropy test," and
the "Degree distribution test" methods, you must select a criterion to
define the bandwidth for the probability density function estimation. The
available methods for computing the bandwidth are:
- Sturges: the bandwidth is defined as $(\max(x) - \min(x))/n_{bins}$ (Sturges, 1926), where x is the graph spectrum (for the tests based on the spectral density) or the node degrees (for the degree distribution test), and $n_{bins} = \lceil \log_2(n_V) + 1 \rceil$, with $n_V$ denoting the number of genes.
- Silverman: the bandwidth is defined as $0.9 \min\{\mathrm{sd}(x), \mathrm{IQR}(x)/1.34\}\, n_V^{-0.2}$ (Silverman, 1986), unless the quartiles coincide, where $n_V$ is the number of genes, sd(x) is the standard deviation of x, and IQR(x) is the interquartile range of x, with x denoting the graph spectrum (for the tests based on the spectral density) or the node degrees (for the degree distribution test). If the graph is empty, it is defined as $0.9\, n_V^{-0.2}$.
BioNetStat uses the R 'density' function from the stats package for estimating the
probability density function.
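Both bandwidth rules can be written out directly from the formulas above. The sketch below is an illustration in Python, not BioNetStat's code: the function names are hypothetical, quartiles use linear interpolation (the default of R's `quantile`), and the fallback chain for coinciding quartiles is an assumption modeled on R's `bw.nrd0`.

```python
import math
import statistics


def sturges_bandwidth(x):
    """(max(x) - min(x)) / nbins, with nbins = ceil(log2(n) + 1)."""
    n_bins = math.ceil(math.log2(len(x)) + 1)
    return (max(x) - min(x)) / n_bins


def quantile(sorted_x, p):
    """Quantile by linear interpolation between order statistics."""
    h = (len(sorted_x) - 1) * p
    low = math.floor(h)
    high = min(low + 1, len(sorted_x) - 1)
    return sorted_x[low] + (h - low) * (sorted_x[high] - sorted_x[low])


def silverman_bandwidth(x):
    """0.9 * min(sd, IQR / 1.34) * n^(-0.2), with fallbacks when that minimum is 0."""
    n = len(x)
    sd = statistics.stdev(x)
    ordered = sorted(x)
    iqr = quantile(ordered, 0.75) - quantile(ordered, 0.25)
    spread = min(sd, iqr / 1.34)
    if spread == 0:
        # Assumed fallback chain (as in R's bw.nrd0): sd, then |x[0]|, then 1.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * n ** -0.2


# Toy degree sequence for a network with 10 nodes.
node_degrees = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
print(sturges_bandwidth(node_degrees), silverman_bandwidth(node_degrees))

# Empty graph with 8 nodes: all degrees are 0, so the rule reduces to 0.9 * n^(-0.2).
print(silverman_bandwidth([0] * 8))
```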
Permutation test settings
To compute a p-value for the differential network analysis, BioNetStat performs
a permutation-based test, which generates N random permutations of
the sample labels.
The minimum possible p-value is $1/(N + 1)$.
Therefore, the choice of N depends on the required significance level
of the test. You can set the N parameter on the
"Enter the number of label permutations" option.
To perform the same label permutations for all variable sets, you can set a seed to generate the random permutations on the "Enter a seed to generate random permutations" option.
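The permutation scheme can be sketched in a few lines. The example below is a generic illustration, not BioNetStat's implementation: the function names are hypothetical, the absolute difference in group means stands in for a network comparison statistic, and the p-value estimator (count of permuted statistics at least as large as the observed one, plus one, divided by N + 1) is an assumption chosen because it is consistent with the stated minimum of $1/(N + 1)$.

```python
import random


def permutation_p_value(values, labels, statistic, n_permutations=1000, seed=None):
    """Estimate a p-value by recomputing the statistic under shuffled labels."""
    rng = random.Random(seed)  # a fixed seed reproduces the same permutations
    observed = statistic(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            at_least_as_extreme += 1
    # The observed labeling counts as one arrangement, so p >= 1 / (N + 1).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def abs_mean_difference(values, labels):
    """Stand-in statistic: absolute difference between the two class means."""
    a = [v for v, label in zip(values, labels) if label == "A"]
    b = [v for v, label in zip(values, labels) if label == "B"]
    return abs(sum(a) / len(a) - sum(b) / len(b))


values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = ["A"] * 4 + ["B"] * 4

p_value = permutation_p_value(values, labels, abs_mean_difference,
                              n_permutations=99, seed=42)
print(p_value)
```

With N = 99 the smallest attainable p-value is 0.01, so testing at a stricter significance level requires a larger N.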
Running the analysis
After loading the dataset and setting the execution parameters, click the "Start
analysis" button. The message "The analysis is running..." is shown in the "Analysis results" section.
The results and other execution messages are also shown in the
"Analysis results" section.
Note: If an error occurs during the analysis, the page turns grey, as in the example below. In this case, reload the application and restart the analysis.
