simona provides functions for generating random DAGs. A random tree is generated first, and more links can then be randomly added to form a more general DAG.
dag_random_tree() generates a random tree. By default, it generates a binary tree in which all leaf terms have depth 9.
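A minimal sketch of the call behind the summary shown next, assuming that simona is installed and that the result is stored as tree1 (the name used later in this section):

```r
library(simona)

# Default call; judging by the printed summary, this gives a binary tree
# with 1023 terms
tree1 = dag_random_tree()
tree1
```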
## An ontology_DAG object:
## Source: dag_random_tree
## 1023 terms / 1022 relations / a tree
## Root: 1
## Terms: 1, 10, 100, 1000, ...
## Max depth: 9
## Aspect ratio: 56.89:1
Strictly speaking, tree1 is not random, since the default call always yields the same complete binary tree. The tree grows from the root. dag_random_tree() has several arguments that control how a random tree is generated.
- n_children: Number of child terms. It can be a single value, in which case each term has the same number of child terms. It can also be a range, in which case the number of child terms is randomly picked from that range.
- p_stop: A branch can stop growing based on this probability. At a given step of the tree growing, denote the set of leaf terms as L. In the next round, floor(length(L)*p_stop) leaf terms stop growing, while the remaining leaf terms continue to grow. A leaf term that continues to grow is linked to n_children child terms if n_children is a single value, or to a number of child terms picked from the range [n_children[1], n_children[2]].

The tree growing stops when the number of total terms exceeds max.
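The growth rule can be illustrated in base R. This is only a sketch under stated assumptions, not simona's implementation: grow_tree_sketch() is a hypothetical helper, terms are simply numbered in order of creation, and the last term is truncated so that the total never goes beyond max.

```r
# Hypothetical helper, not part of simona: returns the parent/child links of
# a tree grown from the root (term 1).
grow_tree_sketch = function(n_children = 2, p_stop = 0, max = 2^10 - 1) {
    parent = integer(0)
    child = integer(0)
    leaves = 1L     # current set of leaf terms, L
    n_terms = 1L
    while(length(leaves) > 0 && n_terms < max) {
        # floor(length(L) * p_stop) leaf terms stop growing in this round
        n_stop = floor(length(leaves) * p_stop)
        growing = leaves
        if(n_stop > 0) {
            growing = leaves[-sample.int(length(leaves), n_stop)]
        }
        new_leaves = integer(0)
        for(term in growing) {
            # Fixed number of children, or a random number from the range
            k = n_children[1]
            if(length(n_children) == 2) {
                k = n_children[1] + sample.int(n_children[2] - n_children[1] + 1, 1) - 1
            }
            k = min(k, max - n_terms)   # assumption: never go beyond max
            if(k <= 0) break
            ids = n_terms + seq_len(k)
            parent = c(parent, rep(term, k))
            child = c(child, ids)
            new_leaves = c(new_leaves, ids)
            n_terms = n_terms + k
        }
        leaves = new_leaves
    }
    data.frame(parent = parent, child = child)
}

links = grow_tree_sketch()
nrow(links)   # 1022 links, i.e. 1023 terms in a complete binary tree
```

With a single n_children and p_stop = 0, nothing is left to chance, which is why the default tree is always the same.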
So the default call of dag_random_tree() is identical to:
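The explicit values are not listed in this section. Inferred from the summary of tree1 (binary, no branch stopping early, 1023 terms), they are assumed to be as follows:

```r
library(simona)

# Assumed defaults, inferred from the printed summary of tree1:
# two children per term, no branch ever stops, at most 2^10 - 1 = 1023 terms
tree1 = dag_random_tree(n_children = 2, p_stop = 0, max = 2^10 - 1)
```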
We can change these arguments to some other values, such as:
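A sketch of such a call with illustrative values. These are assumptions and not necessarily the values behind the summary shown next; the seed is arbitrary and only makes the result repeatable.

```r
library(simona)

set.seed(123)   # arbitrary seed
# Illustrative values: 2 to 6 children per term, half of the leaf terms stop
# in each round, and growth ends at around 2000 terms
tree2 = dag_random_tree(n_children = c(2, 6), p_stop = 0.5, max = 2000)
tree2
```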
## An ontology_DAG object:
## Source: dag_random_tree
## 1999 terms / 1998 relations / a tree
## Root: 1
## Terms: 1, 10, 100, 1000, ...
## Max depth: 7
## Aspect ratio: 105.71:1
A more general random DAG is generated based on the random tree. Taking tree1, which has already been generated, the function dag_add_random_children() adds more random children to terms in tree1.
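A minimal sketch of this step, assuming the default arguments and the object name dag1 (a name chosen here for illustration):

```r
library(simona)

tree1 = dag_random_tree()
set.seed(123)   # arbitrary seed, the added links are random
# Add random extra links; the set of terms itself is unchanged
dag1 = dag_add_random_children(tree1)
dag1
```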
## An ontology_DAG object:
## Source: dag_add_random_children
## 1023 terms / 1115 relations
## Root: 1
## Terms: 1, 10, 100, 1000, ...
## Max depth: 9
## Avg number of parents: 1.09
## Avg number of children: 1.03
## Aspect ratio: 56.89:1 (based on the longest distance from root)
## 52.78:1 (based on the shortest distance from root)
There are three arguments that control how new child terms are added. We first introduce two of them.
- p_add: For each term, the probability that it is selected to add new child terms.
- new_children: Once a term is selected, the number of new children it is linked to.

Let’s try to generate a denser DAG:
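A sketch with illustrative values. The numbers are assumptions and not necessarily the ones behind the summary shown next, and dag2 is a name chosen here.

```r
library(simona)

tree1 = dag_random_tree()
set.seed(123)   # arbitrary seed
# A higher selection probability and more new children per selected term
# give more links
dag2 = dag_add_random_children(tree1, p_add = 0.6, new_children = c(2, 8))
dag2
```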
## An ontology_DAG object:
## Source: dag_add_random_children
## 1023 terms / 2550 relations
## Root: 1
## Terms: 1, 10, 100, 1000, ...
## Max depth: 9
## Avg number of parents: 2.50
## Avg number of children: 1.59
## Aspect ratio: 56.89:1 (based on the longest distance from root)
## 32.22:1 (based on the shortest distance from root)
By default, once a term t is selected to add more child terms, new child terms are only chosen from the terms that are:
- located below t, i.e. with depths greater than t’s depth in the DAG;
- not child terms that t already has.

From this subset of candidate child terms, new child terms are randomly picked according to the numbers set in new_children.
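The candidate filter can be sketched in base R on plain vectors. This is an illustration under the assumption that eligible terms are those farther from the root than t, which keeps the graph acyclic. candidate_children_sketch() and the toy data are hypothetical and not part of simona.

```r
# Hypothetical helper: depth holds the depth of every term, children is a
# list with the current child indices of every term.
candidate_children_sketch = function(depth, children, i) {
    l = depth > depth[i]        # terms located below term i
    l[children[[i]]] = FALSE    # drop the children term i already has
    which(l)
}

# Toy tree with five terms: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 5
depth = c(0, 1, 1, 2, 2)
children = list(c(2, 3), 4, 5, integer(0), integer(0))

candidate_children_sketch(depth, children, 1)   # 4 5
candidate_children_sketch(depth, children, 2)   # 5
```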
The way new child terms are randomly picked can be customized with a self-defined function. This function accepts two arguments: the dag object and the integer index of the “current term”. In the following example, we implement a function that only picks new child terms from term t’s offspring terms.
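add_new_children_from_offspring = function(dag, i, new_children = c(1, 8)) {
    # Candidates: offspring of term i, excluding the children it already has
    l = rep(FALSE, dag_n_terms(dag))
    offspring = dag_offspring(dag, i, in_labels = FALSE)
    if(length(offspring)) {
        l[offspring] = TRUE
        l[dag_children(dag, i, in_labels = FALSE)] = FALSE
    }
    candidates = which(l)
    n_candidates = length(candidates)

    # No candidates, or fewer than the minimum requested: add nothing
    if(n_candidates == 0 || n_candidates < new_children[1]) {
        return(integer(0))
    }

    # Draw the number of new children from the range, capped by n_candidates
    k = new_children[1] + sample.int(new_children[2] - new_children[1] + 1, 1) - 1
    k = min(n_candidates, k)

    # Sample positions rather than values: sample(x, k) treats a single
    # number x as 1:x, which would return wrong indices for one candidate
    candidates[sample.int(n_candidates, k)]
}
dag3 = dag_add_random_children(tree1, p_add = 0.6,
    add_random_children_fun = add_new_children_from_offspring)
dag3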
## An ontology_DAG object:
## Source: dag_add_random_children
## 1023 terms / 1583 relations
## Root: 1
## Terms: 1, 10, 100, 1000, ...
## Max depth: 9
## Avg number of parents: 1.55
## Avg number of children: 1.25
## Aspect ratio: 56.89:1 (based on the longest distance from root)
## 32.22:1 (based on the shortest distance from root)
## R version 4.3.2 (2023-10-31)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ComplexHeatmap_2.18.0 org.Hs.eg.db_3.18.0 AnnotationDbi_1.64.1
## [4] IRanges_2.36.0 S4Vectors_0.40.2 Biobase_2.62.0
## [7] BiocGenerics_0.48.1 igraph_1.5.1 simona_1.0.10
## [10] knitr_1.45
##
## loaded via a namespace (and not attached):
## [1] blob_1.2.4 Biostrings_2.70.2 bitops_1.0-7
## [4] fastmap_1.1.1 RCurl_1.98-1.13 promises_1.2.1
## [7] digest_0.6.33 mime_0.12 lifecycle_1.0.3
## [10] cluster_2.1.4 ellipsis_0.3.2 KEGGREST_1.42.0
## [13] RSQLite_2.3.2 magrittr_2.0.3 compiler_4.3.2
## [16] rlang_1.1.1 sass_0.4.7 tools_4.3.2
## [19] yaml_2.3.7 htmlwidgets_1.6.2 bit_4.0.5
## [22] scatterplot3d_0.3-44 curl_5.1.0 xml2_1.3.5
## [25] RColorBrewer_1.1-3 xtable_1.8-4 colorspace_2.1-0
## [28] GO.db_3.18.0 iterators_1.0.14 cli_3.6.1
## [31] rmarkdown_2.25 DiagrammeR_1.0.10 crayon_1.5.2
## [34] ragg_1.2.6 rstudioapi_0.15.0 httr_1.4.7
## [37] rjson_0.2.21 visNetwork_2.1.2 DBI_1.1.3
## [40] cachem_1.0.8 zlibbioc_1.48.0 parallel_4.3.2
## [43] XVector_0.42.0 matrixStats_1.0.0 vctrs_0.6.4
## [46] jsonlite_1.8.7 GetoptLong_1.0.5 bit64_4.0.5
## [49] clue_0.3-65 systemfonts_1.0.5 foreach_1.5.2
## [52] jquerylib_0.1.4 glue_1.6.2 codetools_0.2-19
## [55] Polychrome_1.5.1 shape_1.4.6 later_1.3.1
## [58] GenomeInfoDb_1.38.5 htmltools_0.5.7 GenomeInfoDbData_1.2.11
## [61] circlize_0.4.15 R6_2.5.1 textshaping_0.3.7
## [64] doParallel_1.0.17 evaluate_0.23 shiny_1.7.5.1
## [67] highr_0.10 png_0.1-8 memoise_2.0.1
## [70] httpuv_1.6.12 bslib_0.5.1 Rcpp_1.0.11
## [73] xfun_0.41 pkgconfig_2.0.3 GlobalOptions_0.1.2