Epistatic Nested Effects Models:

Inferring mixed epistasis from indirect measurements of knock-out screens

Inferring mixed epistasis from indirect measurements of knock-out screens

The method in this package is an extension of the classic Nested Effects Models provided in the package . Nested Effects Models is a pathway reconstruction method, which takes into account effects of downstream genes. Those effects are observed for every knock-out of an upstream pathway gene, and the nested structure of observed effects can then be used to reconstruct the pathway structure. However, classic Nested Effects Models do not account for double knock-outs. In this package , one additional layer of complexity is added. For every two genes, acting on one gene together, the relationship is evaluated and added to the model as a logical gate. Genetic relationships are represented by the logics OR (no relationship), AND (functional overlap), NOT (masking or inhibiting) and XOR (mutual prevention from acting on gene C). Please see the references for a more detailed description of NEMs and epiNEMs.

Simulation and application results are imported from pre-calculated data sets to shorten the runtime of this vignette.

`library(epiNEM)`

The data should be in the form of binary effects stemming from
knock-out data including single and double perturbations with
effect reporters, e.g. genes, as rows and perturbations as columns.
A one in row i and column j denotes an effect of reporter i for
perturbation j. These effects are usually derived from comparing
control/wild-type experiments with respective perturbation experiments
(e.g. differential expression with edgeR, Robinson *et al.*, 2010).
The binary data is the main input for the epiNEM function, which
contains the inference algorithm to estimate the underlying network
structure of the data. For up to four signalling genes (=different
perturbation targets) exhaustive search is available. For five or
more a greedy search is implemented as an alternative.

```
data <- matrix(sample(c(0,1), 100*4, replace = TRUE), 100, 4)
colnames(data) <- c("A", "A.B", "B", "C")
rownames(data) <- paste("E", 1:100, sep = "_")
print(head(data))
```

```
## A A.B B C
## E_1 0 0 0 1
## E_2 0 0 0 0
## E_3 1 1 1 0
## E_4 1 0 1 1
## E_5 1 1 0 0
## E_6 0 0 1 1
```

`res <- epiNEM(data, method = "exhaustive")`

`plot(res)`

The plot shows the inferred network with the signaling genes (bright red), the inferred logic (green) and attachment of the number of effect reporters (grey).

Alternatively the input can be a larger matrix with single and double knock-outs. EpiNEM can perform a systematic analysis to identify most likely modulators for a signaling gene pair of a double knock-out.

```
data <- matrix(sample(c(0,1), 100*9, replace = TRUE), 100, 9)
colnames(data) <- c("A.B", "A.C", "B.C", "A", "B", "C", "D", "E", "G")
rownames(data) <- paste("E", 1:100, sep = "_")
res <- epiScreen(data)
```

```
## [1] "A.B"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
## [1] "A.C"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
## [1] "B.C"
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
## [1] "E"
## [1] "G"
```

`plot(res)`

```
## Warning in names(legend)[1] <- space: number of items to replace is not a
## multiple of replacement length
```

```
## Warning in if (key$space == "right") {: the condition has length > 1 and only
## the first element will be used
```

`plot(res, global = FALSE, ind = 1)`

If the “global” parameter is set to FALSE, detailed results are plotted for each or a specific (“ind” parameter) pair of a double knock-out.

The results (logics) of the knock-out screens have been annotated according to the following legend of effects (black), where the x-axis denotes the gene and the y-axis denotes the type of knock-out. E.g. if C is regulated via the OR gate, it is effected by by both single and the double knock-out.

`epiAnno()`

We compare epiNEM to original NEMs (Markowetz *et al.*, 2005), Boolean Nested
Effects Models (B-NEM, link,
Pirkl *et al.*, 2016), Aracne (Margolin *et al.*, 2006) and the PC algorithm
(Kalisch *et al.*, 2007) for 100 four node networks and respective data
experiencing different levels of noise. The evaluation is done by accounting
for the accuracy of discovered edges and in the case of B-NEM and epiNEM also
the truth tables and logical gates.

We simulate code based on a ground truth and infer an optimal network with all aforementioned methods solely based on the data. The B-NEM package is necessary to run the full simulation: link. If the comparison to B-NEM is omitted, the B-NEM package is not necessary for epiNEM to function.

B-NEM is a method for general Boolean network inference based on multivariate read-out from combinatorial perturbation experiments. EpiNEM is specifically designed for knock-out screens including double and single knock-outs. Thus epiNEM is faster especially for large scale screens and more accurate, because it is designed for triples.

We boxplot running time, accuracy for the inferred edges, logical gates and expected data (truth table) for all five respectively two methods.

`data(sim)`

`plot(sim)`

The plot shows that as expected B-NEM is the slowest followed by epiNEM, while NEM, ARACNE and the PC algorithm are much faster. However, the last three can not derive any logics in the network structure. epiNEM and B-NEM show almost equally high accuracy of inferred edges and truth tables (expected data). However, epiNEM reaches a much higher accuracy for inferred logic gates, which is due to the much higher degree of equivalent network structures in the case of B-NEM.

In this section we analyse previously published yeast knock-out screens. The screens consist of gene expression data derived from double and single knock-out mutants. We use epiNEM on each double mutant combined with each single mutant.

The first knock-out screen is from van Wageningen et al., 2010 and was created to investigate relationships between phosphorylation based pathways.

We applied epiNEM to all triples and infer logics and calculated the log-likelihood.

The following plot shows the global distribution of the inferred logical gates for all pairs. The single modulators are on the y-axis and the double knock-out pairs are on the x-axis.

```
data(wagscreen)
doubles <- wagscreen$doubles
dataWag <- wagscreen$dataWag
## clean up the results:
if (length(grep("fus3|ptp2.ptc2", wagscreen$doubles)) > 0) {
wagscreen$doubles <- wagscreen$doubles[-grep("fus3|ptp2.ptc2",
wagscreen$doubles)]
wagscreen$dataWag <- wagscreen$dataWag[, -grep("fus3|ptp2.ptc2",
colnames(wagscreen$dataWag))]
wagscreen$ll <- wagscreen$ll[, -grep("fus3|ptp2.ptc2",
colnames(wagscreen$ll))]
wagscreen$logic <- wagscreen$logic[, -grep("fus3|ptp2.ptc2",
colnames(wagscreen$logic))]
}
plot(wagscreen, xrot = 45, borderwidth = 0)
```

```
## Warning in names(legend)[1] <- space: number of items to replace is not a
## multiple of replacement length
```

```
## Warning in if (key$space == "right") {: the condition has length > 1 and only
## the first element will be used
```