Basic Use of ceRNAnetsim

Selcen Ari

2020-04-28

1. Introduction

This vignette demonstrates how to analyse miRNA:Competing interactions via ceRNAnetsim package. The perturbations in the miRNA:target interactions are handled step by step in ceRNAnetsim. The package calculates and simulates regulation of miRNA:competing RNA interactions based on amounts of miRNA and the targets and interaction factors.

The ceRNAnetsim works by executing following steps:

The workflow of ceRNAnetsim are shown as following:

library(ceRNAnetsim)

2. Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ceRNAnetsim")

3. About the data

Below is the minimal data that can be used with ceRNAnetsim.

data("minsamp")
minsamp %>% 
  select(1:4)
#>   competing miRNA Competing_expression miRNA_expression
#> 1     Gene1  Mir1                10000             1000
#> 2     Gene2  Mir1                10000             1000
#> 3     Gene3  Mir1                 5000             1000
#> 4     Gene4  Mir1                10000             1000
#> 5     Gene4  Mir2                10000             2000
#> 6     Gene5  Mir2                 5000             2000
#> 7     Gene6  Mir2                10000             2000

The table is actually constructed by merging three different tables:

So, the basic_data table is constructed by merging following tables:

data("minsamp")
minsamp %>% 
  select(competing, Competing_expression) %>% 
  distinct() -> gene_expression
gene_expression
#>   competing Competing_expression
#> 1     Gene1                10000
#> 2     Gene2                10000
#> 3     Gene3                 5000
#> 4     Gene4                10000
#> 5     Gene5                 5000
#> 6     Gene6                10000
minsamp %>% 
  select(miRNA, miRNA_expression) %>% 
  distinct() -> mirna_expression
mirna_expression
#>   miRNA miRNA_expression
#> 1  Mir1             1000
#> 2  Mir2             2000

Third table should contain miRNA:gene interactions per row. The ceRNAnetsim will assume first column contains competing RNA names and second column to be miRNA names. If the order is different the user should indicate column names accordingly.

minsamp %>% 
  select(competing, miRNA) -> interaction_simple
interaction_simple
#>   competing miRNA
#> 1     Gene1  Mir1
#> 2     Gene2  Mir1
#> 3     Gene3  Mir1
#> 4     Gene4  Mir1
#> 5     Gene4  Mir2
#> 6     Gene5  Mir2
#> 7     Gene6  Mir2

The three tables can be joined in R (as shown below) or elsewhere to have interaction and expression data altogether in expected format.

interaction_simple %>%
  inner_join(gene_expression, by = "competing") %>%
  inner_join(mirna_expression, "miRNA") -> basic_data

basic_data
#>   competing miRNA Competing_expression miRNA_expression
#> 1     Gene1  Mir1                10000             1000
#> 2     Gene2  Mir1                10000             1000
#> 3     Gene3  Mir1                 5000             1000
#> 4     Gene4  Mir1                10000             1000
#> 5     Gene4  Mir2                10000             2000
#> 6     Gene5  Mir2                 5000             2000
#> 7     Gene6  Mir2                10000             2000

4. Simulation via expression values of miRNAs and genes in minsamp dataset

ceRNAnetsim processes your dataset as graph object and simulates competing behaviours of targets when steady-state is perturbed via expression level changes in one or more genes. Let’s go over three steps:

4.1. Handle basic dataset

In first step, the expression and interaction table is converted into graph/network. tidygraph is used importing the data thus both node and edge data are accessible as tables if needed. priming_graph generates many columns in edge/node table which are mostly for internal use.

4.2. Trigger a change

update_how function can be used to simulate a change in the network. (If multiple chnages are aimed to be used as trigger, update_variables() function should be used).

In the example below, expression level of “Gene2” is increased to two-fold.

You can see the current count of Gene2 node is 20000 and its change is denoted as “Up” in changes_variable column in node table data.

4.3. Simulate the changes in graph

Finally, with the help of simulate function, the effect of expression change (i.e. the trigger) on overall network. The example code advances only for 5 cycles.

count_current column indicate results after 5 cycle of calculations for each node. You can see the gene expression changes after this perturbation and simulation. This table can be obtained easily at following:

4.4. A special case: knockdown

ceRNAnetsim also provides the simulation of gene knockdown in the network. In normal conditions, when a gene is up or down regulated, it is considered that amounts of gene transcripts change depended on interactions. But, the transcripts of the gene are not observed in the system when it is knocked down. To achieve this case, you just need to define how argument to 0 (zero) in update_how function.

So, if Gene2 is knocked down, there will be more miRNA (Mir1 to be exact) available for Gene1, Gene3 and Gene4, thus lowering their transcript levels. Since Gene4 is has lower expression level, we can observe minute changes in Gene5 and Gene6 levels due to more miRNA (Mir2) being available for them. These changes can be observed in current_count column.

Briefly, ceRNAnetsim utilizes the change(s) as trigger and calculates regulation of targets according to miRNA:target and target:total target ratios.

5. Simulation via interaction factors in addition to expression values of miRNAs and genes in minsamp dataset

Minimal sample minsamp is processed with priming_graph() function in first step. This provides conversion of dataset from data frame to graph object. This step comprises of:

minsamp
#>   competing miRNA Competing_expression miRNA_expression seed_type region energy
#> 1     Gene1  Mir1                10000             1000      0.43   0.30    -20
#> 2     Gene2  Mir1                10000             1000      0.43   0.01    -15
#> 3     Gene3  Mir1                 5000             1000      0.32   0.40    -14
#> 4     Gene4  Mir1                10000             1000      0.23   0.50    -10
#> 5     Gene4  Mir2                10000             2000      0.35   0.90    -12
#> 6     Gene5  Mir2                 5000             2000      0.05   0.40    -11
#> 7     Gene6  Mir2                10000             2000      0.01   0.80    -25
priming_graph(minsamp, 
              competing_count = Competing_expression, 
              miRNA_count = miRNA_expression,
              aff_factor = c(energy, seed_type), 
              deg_factor = region)
#> Warning in priming_graph(minsamp, competing_count = Competing_expression, : First column is processed as competing and the second as miRNA.
#> # A tbl_graph: 8 nodes and 7 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 8 x 7 (active)
#>   name  type      node_id initial_count count_pre count_current changes_variable
#>   <chr> <chr>       <int>         <dbl>     <dbl>         <dbl> <chr>           
#> 1 Gene1 Competing       1         10000     10000         10000 Competing       
#> 2 Gene2 Competing       2         10000     10000         10000 Competing       
#> 3 Gene3 Competing       3          5000      5000          5000 Competing       
#> 4 Gene4 Competing       4         10000     10000         10000 Competing       
#> 5 Gene5 Competing       5          5000      5000          5000 Competing       
#> 6 Gene6 Competing       6         10000     10000         10000 Competing       
#> # … with 2 more rows
#> #
#> # Edge Data: 7 x 22
#>    from    to Competing_name miRNA_name Competing_expre… miRNA_expression energy
#>   <int> <int> <chr>          <chr>                 <dbl>            <dbl>  <dbl>
#> 1     1     7 Gene1          Mir1                  10000             1000    -20
#> 2     2     7 Gene2          Mir1                  10000             1000    -15
#> 3     3     7 Gene3          Mir1                   5000             1000    -14
#> # … with 4 more rows, and 15 more variables: seed_type <dbl>, region <dbl>,
#> #   dummy <dbl>, afff_factor <dbl>, degg_factor <dbl>, comp_count_list <list>,
#> #   comp_count_pre <dbl>, comp_count_current <dbl>, mirna_count_list <list>,
#> #   mirna_count_pre <dbl>, mirna_count_current <dbl>,
#> #   mirna_count_per_dep <dbl>, effect_current <dbl>, effect_pre <dbl>,
#> #   effect_list <list>

In the processed data, the values are carried as node variables and many more columns are initialized which are to be used in subsequent steps.

5.1. Change expression level of one or more nodes in the graph

In the steady-state, the miRNA degradation effect on gene expression is assumed to be stable (i.e. in equilibrium). But, if one or more nodes have altered expression level, the system tends to reach steady-state again.

The ceRNAnetsim package utilizes two methods to simulate change in expression level, update_how() and update_variables() functions provide unstable state from which calculations are triggered to reach steady-state.

If updating expression level of single node is desired then update_how() function should be used. In the example below, expression level of Gene4 is increased 2-fold.

The update_variables() function uses an external dataset which has number of rows equal to number of nodes in graph. The external dataset might include changed and unchanged expression values for each node.

Load the new_count dataset (provided with package sample data) in which expression level of Gene2 is increased 2 fold (from 10,000 to 20,000). Note that variables of the dataset included updated variables must be named as “Competing”, “miRNA”, “miRNA_count” and “Competing_count”.

update_variables() function replaces the existing expression values with new values. The function checks for updates in all rows after importing expression values, thus it’s possible to introduce multiple changes at once.

minsamp %>%
   priming_graph(competing_count = Competing_expression, 
                 miRNA_count = miRNA_expression,
                 aff_factor = c(energy, seed_type), 
                 deg_factor = region) %>%
   update_variables(current_counts = new_counts)
#> Warning in priming_graph(., competing_count = Competing_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tbl_graph: 8 nodes and 7 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 8 x 7 (active)
#>   name  type      node_id initial_count count_pre count_current changes_variable
#>   <chr> <chr>       <int>         <dbl>     <dbl>         <dbl> <chr>           
#> 1 Gene1 Competing       1         10000     10000         10000 Competing       
#> 2 Gene2 Competing       2         10000     10000         20000 Up              
#> 3 Gene3 Competing       3          5000      5000          5000 Competing       
#> 4 Gene4 Competing       4         10000     10000         10000 Competing       
#> 5 Gene5 Competing       5          5000      5000          5000 Competing       
#> 6 Gene6 Competing       6         10000     10000         10000 Competing       
#> # … with 2 more rows
#> #
#> # Edge Data: 7 x 22
#>    from    to Competing_name miRNA_name Competing_expre… miRNA_expression energy
#>   <int> <int> <chr>          <chr>                 <dbl>            <dbl>  <dbl>
#> 1     1     7 Gene1          Mir1                  10000             1000    -20
#> 2     2     7 Gene2          Mir1                  10000             1000    -15
#> 3     3     7 Gene3          Mir1                   5000             1000    -14
#> # … with 4 more rows, and 15 more variables: seed_type <dbl>, region <dbl>,
#> #   dummy <dbl>, afff_factor <dbl>, degg_factor <dbl>, comp_count_list <list>,
#> #   comp_count_pre <dbl>, comp_count_current <dbl>, mirna_count_list <list>,
#> #   mirna_count_pre <dbl>, mirna_count_current <dbl>,
#> #   mirna_count_per_dep <dbl>, effect_current <dbl>, effect_pre <dbl>,
#> #   effect_list <list>

5.2. Update the node variables with edge variables.

The functions update_variables() and update_how() updates edge data. In these functions, update_nodes() function is applied in order to reflect changes in edge data over to node data. In other words, if there’s a change in edge data, nodes can be updated accordingly with update_nodes() function.

minsamp %>%
  priming_graph(competing_count = Competing_expression, 
                miRNA_count = miRNA_expression,
                aff_factor = c(energy, seed_type), 
                deg_factor = region) %>%
  update_how("Gene4", how = 2) 
#> Warning in priming_graph(., competing_count = Competing_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tbl_graph: 8 nodes and 7 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 8 x 7 (active)
#>   name  type      node_id initial_count count_pre count_current changes_variable
#>   <chr> <chr>       <int>         <dbl>     <dbl>         <dbl> <chr>           
#> 1 Gene1 Competing       1         10000     10000         10000 Competing       
#> 2 Gene2 Competing       2         10000     10000         10000 Competing       
#> 3 Gene3 Competing       3          5000      5000          5000 Competing       
#> 4 Gene4 Competing       4         10000     10000         20000 Up              
#> 5 Gene5 Competing       5          5000      5000          5000 Competing       
#> 6 Gene6 Competing       6         10000     10000         10000 Competing       
#> # … with 2 more rows
#> #
#> # Edge Data: 7 x 22
#>    from    to Competing_name miRNA_name Competing_expre… miRNA_expression energy
#>   <int> <int> <chr>          <chr>                 <dbl>            <dbl>  <dbl>
#> 1     1     7 Gene1          Mir1                  10000             1000    -20
#> 2     2     7 Gene2          Mir1                  10000             1000    -15
#> 3     3     7 Gene3          Mir1                   5000             1000    -14
#> # … with 4 more rows, and 15 more variables: seed_type <dbl>, region <dbl>,
#> #   dummy <dbl>, afff_factor <dbl>, degg_factor <dbl>, comp_count_list <list>,
#> #   comp_count_pre <dbl>, comp_count_current <dbl>, mirna_count_list <list>,
#> #   mirna_count_pre <dbl>, mirna_count_current <dbl>,
#> #   mirna_count_per_dep <dbl>, effect_current <dbl>, effect_pre <dbl>,
#> #   effect_list <list>

# OR
# minsamp %>%
#   priming_graph(competing_count = Competing_expression, 
#                 miRNA_count = miRNA_expression,
#                 aff_factor = c(energy, seed_type), 
#                 deg_factor = region) %>%
#   update_variables(current_counts = new_counts) 

5.3. Simulate the model

Change in expression level of one or more nodes will trigger a perturbation in the system which will effect neighboring miRNA:target interactions. The effect will propagate and iterate over until it reaches steady-state.

With simulate() function the changes in the system, are calculated iteratively. For example, in the example below, simulation will proceed ten cycles only. In simulation of the regulation, the important argument is threshold which provides to specify absolute minimum amount of change required to be considered changed element as up or down.

minsamp %>%
  priming_graph(competing_count = Competing_expression, 
                miRNA_count = miRNA_expression,
                aff_factor = c(energy, seed_type), 
                deg_factor = region) %>%
  update_how("Gene4", how = 2) %>%
  simulate(cycle=10) #threshold with default 0.
#> Warning in priming_graph(., competing_count = Competing_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tbl_graph: 8 nodes and 7 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 8 x 7 (active)
#>   name  type      node_id initial_count count_pre count_current changes_variable
#>   <chr> <chr>       <int>         <dbl>     <dbl>         <dbl> <chr>           
#> 1 Gene1 Competing       1         10000    10027.        10027. Competing       
#> 2 Gene2 Competing       2         10000    10001.        10001. Competing       
#> 3 Gene3 Competing       3          5000     5009.         5009. Competing       
#> 4 Gene4 Competing       4         10000    19806.        19806. Competing       
#> 5 Gene5 Competing       5          5000     5024.         5024. Competing       
#> 6 Gene6 Competing       6         10000    10044.        10044. Competing       
#> # … with 2 more rows
#> #
#> # Edge Data: 7 x 23
#>    from    to Competing_name miRNA_name Competing_expre… miRNA_expression energy
#>   <int> <int> <chr>          <chr>                 <dbl>            <dbl>  <dbl>
#> 1     1     7 Gene1          Mir1                  10000             1000    -20
#> 2     2     7 Gene2          Mir1                  10000             1000    -15
#> 3     3     7 Gene3          Mir1                   5000             1000    -14
#> # … with 4 more rows, and 16 more variables: seed_type <dbl>, region <dbl>,
#> #   dummy <dbl>, afff_factor <dbl>, degg_factor <dbl>, comp_count_list <list>,
#> #   comp_count_pre <dbl>, comp_count_current <dbl>, mirna_count_list <list>,
#> #   mirna_count_pre <dbl>, mirna_count_current <dbl>,
#> #   mirna_count_per_dep <dbl>, effect_current <dbl>, effect_pre <dbl>,
#> #   effect_list <list>, mirna_count_per_comp <dbl>

simulate() saves the expression level of previous iterations in list columns in edge data. The changes in expression level throughout the simulate cycles are accessible with standard dplyr functions. For example:

Here, comp_count_list and mirna_count_list are list-columns which track changes in both competing RNA and miRNA levels. In the sample above, “Gene4” has initial expression level of 10000 (after trigger, it’s initial expression is 20000) and reached level of 19806 at 9th cycle (count_pre) and also stayed at 19806 in last cycle (count_current). The full history of expression level for Gene4 is as follows:

#>  [1] 10000 19803 19806 19806 19806 19806 19806 19806 19806 19806 19806

Actually, Gene4 seems like reached steady-state in iteration 4. But, this approach is sensitive to even small decimal numbers. So, threshold argument could be used to ignore very small decimal numbers. With a threshold value the system can reach steady-state early, like following.

minsamp %>%
  priming_graph(competing_count = Competing_expression, 
                miRNA_count = miRNA_expression,
                aff_factor = c(energy, seed_type), 
                deg_factor = region) %>%
  update_how("Gene4", how = 2) %>%
  simulate(cycle=3, threshold = 1)
#> Warning in priming_graph(., competing_count = Competing_expression, miRNA_count = miRNA_expression, : First column is processed as competing and the second as miRNA.
#> # A tbl_graph: 8 nodes and 7 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 8 x 7 (active)
#>   name  type      node_id initial_count count_pre count_current changes_variable
#>   <chr> <chr>       <int>         <dbl>     <dbl>         <dbl> <chr>           
#> 1 Gene1 Competing       1         10000    10027.        10027. Competing       
#> 2 Gene2 Competing       2         10000    10001.        10001. Competing       
#> 3 Gene3 Competing       3          5000     5009.         5009. Competing       
#> 4 Gene4 Competing       4         10000    19806.        19806. Competing       
#> 5 Gene5 Competing       5          5000     5024.         5024. Competing       
#> 6 Gene6 Competing       6         10000    10044.        10044. Competing       
#> # … with 2 more rows
#> #
#> # Edge Data: 7 x 23
#>    from    to Competing_name miRNA_name Competing_expre… miRNA_expression energy
#>   <int> <int> <chr>          <chr>                 <dbl>            <dbl>  <dbl>
#> 1     1     7 Gene1          Mir1                  10000             1000    -20
#> 2     2     7 Gene2          Mir1                  10000             1000    -15
#> 3     3     7 Gene3          Mir1                   5000             1000    -14
#> # … with 4 more rows, and 16 more variables: seed_type <dbl>, region <dbl>,
#> #   dummy <dbl>, afff_factor <dbl>, degg_factor <dbl>, comp_count_list <list>,
#> #   comp_count_pre <dbl>, comp_count_current <dbl>, mirna_count_list <list>,
#> #   mirna_count_pre <dbl>, mirna_count_current <dbl>,
#> #   mirna_count_per_dep <dbl>, effect_current <dbl>, effect_pre <dbl>,
#> #   effect_list <list>, mirna_count_per_comp <dbl>

5.4. Visualisation of the graph

The vis_graph() function is used for visualization of the graph object. The initial graph object (steady-state) is visualized as following:

Also, The graph can be visualized at any step of process, for example, after simulation of 3 cycles the graph will look like:

On the other hand, the network of each step can be plotted individually by using simulate_vis() function. simulate_vis() processes the given network just like simulate() function does while saving image of each step.

Minsamp with 3 iteration

Minsamp with 3 iteration

Note: Animated gif above was generated by online service. Actually, workflow gives the frames which include condition of each iteration. Note that you must use a terminal or online service, if you want to generate the animated gif.

6. Session Info

sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ceRNAnetsim_1.0.0 tidygraph_1.1.2   dplyr_0.8.5      
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.4.6       pillar_1.4.3       compiler_4.0.0     viridis_0.5.1     
#>  [5] tools_4.0.0        digest_0.6.25      viridisLite_0.3.0  evaluate_0.14     
#>  [9] lifecycle_0.2.0    tibble_3.0.1       gtable_0.3.0       png_0.1-7         
#> [13] pkgconfig_2.0.3    rlang_0.4.5        igraph_1.2.5       cli_2.0.2         
#> [17] ggrepel_0.8.2      yaml_2.2.1         parallel_4.0.0     xfun_0.13         
#> [21] gridExtra_2.3      furrr_0.1.0        stringr_1.4.0      knitr_1.28        
#> [25] graphlayouts_0.7.0 vctrs_0.2.4        globals_0.12.5     grid_4.0.0        
#> [29] tidyselect_1.0.0   glue_1.4.0         listenv_0.8.0      R6_2.4.1          
#> [33] ggraph_2.0.2       fansi_0.4.1        rmarkdown_2.1      polyclip_1.10-0   
#> [37] farver_2.0.3       tweenr_1.0.1       purrr_0.3.4        tidyr_1.0.2       
#> [41] ggplot2_3.3.0      magrittr_1.5       MASS_7.3-51.6      scales_1.1.0      
#> [45] codetools_0.2-16   ellipsis_0.3.0     htmltools_0.4.0    assertthat_0.2.1  
#> [49] ggforce_0.3.1      colorspace_1.4-1   future_1.17.0      labeling_0.3      
#> [53] utf8_1.1.4         stringi_1.4.6      munsell_0.5.0      crayon_1.3.4

See the other vignettes for more information.