Build functional gene modules with GNET2

Chen Chen

2020-04-28

1. Introduction

GNET2 is an R package for building regulator networks and clustering genes into functional groups with an E-M process. It iteratively performs TF assignment and gene assignment until the assignment of genes no longer changes or the maximum number of iterations is reached.
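The alternating structure of such a procedure can be illustrated with a toy sketch. This is not the GNET2 implementation: the group summary (a mean expression profile) and the reassignment rule (highest correlation) are simplifying assumptions chosen only to show the two alternating steps and the stopping conditions described above. All variable names are hypothetical.

```r
# Toy illustration only: NOT the GNET2 implementation.
set.seed(1)
expr <- matrix(rnorm(40 * 10), nrow = 40)   # 40 genes x 10 samples
n_groups <- 3
max_iter <- 10

# Start from a random assignment of genes to groups.
assignment <- sample(rep_len(seq_len(n_groups), nrow(expr)))

for (iter in seq_len(max_iter)) {
  # Step 1: summarize each group by its mean expression profile
  # (a simple stand-in for fitting regulators to a group).
  profiles <- t(sapply(seq_len(n_groups), function(g) {
    colMeans(expr[assignment == g, , drop = FALSE])
  }))

  # Step 2: reassign every gene to the group whose profile it
  # correlates with most strongly.
  gene_profile_cor <- cor(t(expr), t(profiles))
  new_assignment <- apply(gene_profile_cor, 1, which.max)

  # Stop as soon as no gene changes its group.
  if (all(new_assignment == assignment)) {
    break
  }
  assignment <- new_assignment
}

table(assignment)
```

The loop ends either when the assignment is stable or when `max_iter` is reached, mirroring the two stopping conditions named in the text.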

2. Installation

To install, open R and type:

install.packages("devtools")
library("devtools")
install_github("chrischen1/GNET2")

3. Build module networks

We first generate random expression data and a list of regulator gene names.

The input is typically a p by n matrix of expression data of p genes and n samples, for example log2 RPKM from RNA-Seq.

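library(GNET2)

set.seed(2)
init_group_num <- 8
init_method <- 'boosting'
# 600 genes x 12 samples of random expression values
exp_data <- matrix(rnorm(600*12),600,12)
# The first 20 rows are regulators (TFs); the remaining rows are downstream genes
reg_names <- paste0('TF',1:20)
rownames(exp_data) <- c(reg_names,paste0('gene',1:(nrow(exp_data)-length(reg_names))))
colnames(exp_data) <- paste0('condition_',1:ncol(exp_data))
# Wrap the expression matrix in a SummarizedExperiment object
se <- SummarizedExperiment::SummarizedExperiment(assays=list(counts=exp_data))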
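For real data, a log2 RPKM matrix of the kind mentioned above could be prepared roughly as follows. This is a minimal sketch in base R: the counts, the gene lengths, and the pseudocount of 1 are made-up assumptions for illustration and do not come from GNET2.

```r
# Hypothetical raw counts: 5 genes x 3 samples (values are made up).
counts <- matrix(c(10, 200, 0, 55, 1000,
                   12, 180, 3, 60,  900,
                    8, 220, 1, 40, 1100),
                 nrow = 5,
                 dimnames = list(paste0("gene", 1:5),
                                 paste0("condition_", 1:3)))
gene_length_bp <- c(1500, 2000, 800, 1200, 3000)  # assumed gene lengths

# RPKM = reads / (gene length in kb * library size in millions).
lib_size_millions <- colSums(counts) / 1e6
rpkm <- sweep(counts, 2, lib_size_millions, "/") / (gene_length_bp / 1e3)

# A pseudocount of 1 keeps zero counts finite after the log transform.
log2_rpkm <- log2(rpkm + 1)
```

The resulting p by n matrix keeps genes in rows and samples in columns, so it can be wrapped in a SummarizedExperiment in the same way as `exp_data`.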

The module construction process may take some time, depending on the size of the data and the maximum number of iterations allowed.

gnet_result <- gnet(se,reg_names,init_method,init_group_num,heuristic = TRUE)
#> Determining initial group number...
#> Building module networks...
#> Iteration 1
#> Iteration 2
#> Iteration 3
#> Converged.
#> Generating final network modules...
#> Done.

4. Plot the modules and trees

Plot the regulator module and a heatmap of the expression of the inferred downstream genes for each sample. The figure can be read in two parts: the bars at the top show how the samples are split by the regression tree, and the heatmap at the bottom shows how the downstream genes are regulated in each subgroup determined by the regulators.

plot_gene_group(gnet_result,group_idx = 1)
#> The "ward" method has been renamed to "ward.D"; note new "ward.D2"

Plot the tree of the first group:

plot_tree(gnet_result,group_idx = 1)
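To make the idea of a regression tree splitting the samples more concrete, here is a minimal sketch of a single split in base R. It is not the GNET2 implementation: the data are made up, and the criterion (minimizing the within-side sum of squared errors of one target profile) is an assumption that GNET2's actual criterion may not share.

```r
# Toy data (made up): one regulator and one downstream expression
# profile measured in 12 samples.
n_samples <- 12
regulator <- seq(-1.1, 1.1, length.out = n_samples)
# The target is high when the regulator is positive and low otherwise,
# plus a small deterministic perturbation.
target <- ifelse(regulator > 0, 2, -2) + 0.1 * sin(seq_len(n_samples))

# Sum of squared errors around the mean.
sse <- function(v) sum((v - mean(v))^2)

# Candidate thresholds: midpoints between consecutive regulator values.
sorted_reg <- sort(regulator)
thresholds <- (head(sorted_reg, -1) + tail(sorted_reg, -1)) / 2

# Cost of a split = SSE on the left side + SSE on the right side.
split_cost <- sapply(thresholds, function(th) {
  sse(target[regulator <= th]) + sse(target[regulator > th])
})

best_threshold <- thresholds[which.min(split_cost)]
sample_side <- ifelse(regulator <= best_threshold, "left", "right")
table(sample_side)
```

Applying such splits recursively, each time on one side of the previous split, yields a tree whose leaves partition the samples into subgroups.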

Plot the correlation of each group and the automatically detected knee point.

group_above_kn <- plot_group_correlation(gnet_result)

print(group_above_kn)
#> [1] 1 2 3 4 7

group_above_kn contains the indices of the groups whose correlation is higher than the knee point. You may consider using only these groups for further analysis.
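One common way to locate a knee point, shown here only as a sketch and not necessarily the method used by plot_group_correlation, is to sort the scores, rescale both axes to [0, 1], and pick the point farthest from the straight line joining the first and last points. The correlation values below are made up for illustration.

```r
# Hypothetical per-group correlation scores (made-up values).
group_cor <- c(0.82, 0.79, 0.75, 0.71, 0.30, 0.28, 0.66, 0.25)

# Sort from highest to lowest and rescale both axes to [0, 1].
y <- sort(group_cor, decreasing = TRUE)
n <- length(y)
x_scaled <- (seq_len(n) - 1) / (n - 1)
y_scaled <- (y - min(y)) / (max(y) - min(y))

# After rescaling, the chord runs from (0, 1) to (1, 0), so the distance
# of a point from the chord is proportional to |x + y - 1|.
dist_to_chord <- abs(x_scaled + y_scaled - 1)
knee_value <- y[which.max(dist_to_chord)]

# Indices of the groups whose correlation is higher than the knee point.
above_knee <- which(group_cor > knee_value)
print(above_knee)
```

The resulting index vector plays the same role as `group_above_kn`: it can be used to restrict later steps, such as plotting, to the better-supported groups.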

sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] GNET2_1.4.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.18.0 tidyselect_1.0.0           
#>  [3] xfun_0.13                   purrr_0.3.4                
#>  [5] reshape2_1.4.4              lattice_0.20-41            
#>  [7] colorspace_1.4-1            vctrs_0.2.4                
#>  [9] htmltools_0.4.0             stats4_4.0.0               
#> [11] yaml_2.2.1                  rlang_0.4.5                
#> [13] pillar_1.4.3                glue_1.4.0                 
#> [15] BiocGenerics_0.34.0         xgboost_1.0.0.2            
#> [17] RColorBrewer_1.1-2          matrixStats_0.56.0         
#> [19] GenomeInfoDbData_1.2.3      lifecycle_0.2.0            
#> [21] plyr_1.8.6                  stringr_1.4.0              
#> [23] zlibbioc_1.34.0             munsell_0.5.0              
#> [25] gtable_0.3.0                visNetwork_2.0.9           
#> [27] htmlwidgets_1.5.1           evaluate_0.14              
#> [29] labeling_0.3                Biobase_2.48.0             
#> [31] knitr_1.28                  IRanges_2.22.0             
#> [33] GenomeInfoDb_1.24.0         DiagrammeR_1.0.5           
#> [35] parallel_4.0.0              Rcpp_1.0.4.6               
#> [37] scales_1.1.0                DelayedArray_0.14.0        
#> [39] S4Vectors_0.26.0            jsonlite_1.6.1             
#> [41] XVector_0.28.0              farver_2.0.3               
#> [43] ggplot2_3.3.0               digest_0.6.25              
#> [45] stringi_1.4.6               dplyr_0.8.5                
#> [47] GenomicRanges_1.40.0        grid_4.0.0                 
#> [49] tools_4.0.0                 bitops_1.0-6               
#> [51] magrittr_1.5                RCurl_1.98-1.2             
#> [53] tibble_3.0.1                crayon_1.3.4               
#> [55] pkgconfig_2.0.3             ellipsis_0.3.0             
#> [57] Matrix_1.2-18               data.table_1.12.8          
#> [59] rstudioapi_0.11             assertthat_0.2.1           
#> [61] rmarkdown_2.1               R6_2.4.1                   
#> [63] igraph_1.2.5                compiler_4.0.0