An Introduction to the pandaR Package

Daniel Schlauch, Albert Young, Joseph N. Paulson

2018-04-30

Introduction

The fundamental concept behind the PANDA approach is to model the regulatory network as a bipartite network and to estimate edge weights based on the evidence that information from a particular transcription factor (TF) i is successfully being passed to a particular gene j. This evidence comes from the agreement between two measured quantities: first, the correlation in expression between gene j and other genes; and second, the strength of evidence for an edge between TF i and those same genes. This concordance is measured using Tanimoto similarity. A gene is said to be available if there is strong evidence of this type of agreement. Analogous to availability is the concept of responsibility, which likewise focuses on a TF-gene network edge but instead measures the concordance between the suspected protein-complex partners of TF i and the strength of evidence of a regulatory pathway between those TFs and gene j.
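
To make the notion of concordance concrete, the following is a minimal sketch of the textbook Tanimoto similarity for two real-valued vectors. The function name `tanimoto` and the vectors `a` and `b` are illustrative assumptions; the exact variant that pandaR applies internally may differ from this form.

```r
# Textbook Tanimoto similarity for real-valued vectors (illustrative only;
# not necessarily the exact form used inside pandaR).
tanimoto <- function(x, y) {
  dot <- sum(x * y)                      # inner product of the two vectors
  dot / (sum(x^2) + sum(y^2) - dot)      # normalise by the combined magnitude
}

# Hypothetical evidence vectors over the same four genes
a <- c(1, 0, 1, 1)
b <- c(1, 1, 0, 1)

selfSimilarity  <- tanimoto(a, a)  # identical vectors agree perfectly
crossSimilarity <- tanimoto(a, b)  # partial overlap gives a lower score
```

Under this definition, identical vectors score 1 and vectors with partial overlap score lower, which is the sense in which agreement between two sources of evidence is quantified.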

PANDA uses an iterative approach, updating the bipartite edge weights incrementally as evidence for new edges emerges and evidence for existing edges diminishes. This process continues until the algorithm converges, settling on a final score for the strength of information supporting a regulatory mechanism for every pairwise combination of TFs and genes. This package provides a straightforward tool for applying this established method. Beginning with data.frames or matrices representing a set of gene expression samples, motif priors, and optional protein-protein interaction data, users can generate an m by n matrix representing the bipartite network of m TFs regulating n genes. Additionally, pandaR reports the co-regulation and cooperative networks at convergence. These are reported as complete graphs representing the evidence for gene co-regulation and transcription factor cooperation.
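
As a rough illustration of the m by n representation described above, the sketch below converts a small motif edge list into a TF-by-gene matrix using base R only. The column names (`tf`, `gene`, `weight`) and the toy identifiers are assumptions made for this example and are not taken from the package.

```r
# Hypothetical motif prior as an edge list: one row per TF-gene pair
motif <- data.frame(
  tf     = c("TF1", "TF1", "TF2", "TF3"),
  gene   = c("geneA", "geneB", "geneB", "geneC"),
  weight = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

tfs   <- sort(unique(motif$tf))    # m transcription factors
genes <- sort(unique(motif$gene))  # n genes

# Start from an all-zero m x n matrix and fill in the known edges
prior <- matrix(0, nrow = length(tfs), ncol = length(genes),
                dimnames = list(tfs, genes))
prior[cbind(motif$tf, motif$gene)] <- motif$weight

dim(prior)  # rows are TFs, columns are genes
```

Pairs absent from the edge list remain 0, so the matrix holds an entry for every possible TF-gene combination.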

Example

An example dataset derived from a subset of stress-induced yeast is available by running:

library(pandaR)
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colMeans, colSums, colnames,
##     dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
##     intersect, is.unsorted, lapply, lengths, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
##     rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
data(pandaToyData)

pandaToyData is a list containing a regulatory structure derived from sequence motif analysis, protein-protein interaction data, and gene expression data. The primary function in pandaR is called with:

pandaResult <- panda(pandaToyData$motif, pandaToyData$expression, pandaToyData$ppi)
pandaResult
## PANDA network for1000genes and87transcription factors.
## 
## Slots:
## regNet   : Regulatory network of 87 transcription factors to 1000 genes.
## coregNet: Co-regulation network of 1000 genes.
## coopNet  : Cooperative network of 87 transcription factors.
## Regulatory graph contains 87000 edges.
## Regulatory graph is complete.

Here, pandaResult is a ‘panda’ object which contains data.frames describing the complete bipartite gene regulatory network as well as complete networks for gene co-regulation and transcription factor cooperation. Because the networks are complete, edge weights for the regulatory network are reported for all m × n possible TF-gene edges. The distribution of edge weights in these networks has approximate mean 0 and standard deviation 1, so the edges are best interpreted in a relative sense. Strongly positive values indicate relatively larger amounts of evidence in favor of a regulatory mechanism; conversely, smaller or negative values can be interpreted as lacking evidence of a shared biological role.

It is naturally of interest to select a subset of high-weight edges from the complete network and investigate it as a set of present/absent edges. This is easily done with the topedges function. A network containing the top 1000 edge scores as binary edges can be obtained by:

topNet <- topedges(pandaResult, 1000)
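
Conceptually, keeping the top-scoring edges as binary edges amounts to thresholding the weight matrix at the k-th largest value. The sketch below shows that idea in base R on simulated weights; it illustrates the concept under stated assumptions and is not the implementation used by topedges. The names `w`, `k`, `cutoff`, and `top` are hypothetical.

```r
set.seed(1)

# Simulated edge weights for 5 TFs and 8 genes (roughly mean 0, sd 1)
w <- matrix(rnorm(5 * 8), nrow = 5,
            dimnames = list(paste0("TF", 1:5), paste0("gene", 1:8)))

k <- 10                                           # number of edges to keep
cutoff <- sort(as.vector(w), decreasing = TRUE)[k]  # k-th largest weight

# Binary network: 1 where the weight reaches the cutoff, 0 elsewhere
top <- (w >= cutoff) * 1

sum(top)  # number of retained edges
```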

Users may then examine the genes targeted by a transcription factor of interest.

targetedGenes(topNet, c("AR"))
##  [1] "AKAP10"       "CNDP2"        "CRHR1"        "HNRNPD"      
##  [5] "KIAA0652"     "LOC100093631" "LOC100128811" "PRR15"       
##  [9] "TCF4"         "TCP11L2"      "TMPRSS11B"    "VCX3B"       
## [13] "WDR4"

The network can be further simplified by focusing only on transcription factors of interest and the genes that they are found to regulate. The subnetwork method serves this function:

topSubnet <- subnetwork(topNet, c("AR","ARID3A","ELK1"))
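
The idea behind restricting a binary network to a few transcription factors can be sketched with ordinary matrix subsetting. The example below uses a hypothetical 3 by 4 network with made-up identifiers; it shows the concept only and makes no claim about how subnetwork or targetedGenes are implemented.

```r
# Hypothetical binary TF-by-gene network (1 = edge present)
net <- matrix(c(1, 0, 1, 0,
                0, 0, 0, 0,
                0, 1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("TF1", "TF2", "TF3"),
                              c("geneA", "geneB", "geneC", "geneD")))

tfsOfInterest <- c("TF1", "TF3")

# Keep only the chosen TFs, then drop genes that none of them target
sub <- net[tfsOfInterest, , drop = FALSE]
sub <- sub[, colSums(sub) > 0, drop = FALSE]

# Genes targeted by a single TF within the reduced network
tf1Targets <- colnames(sub)[sub["TF1", ] == 1]
```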

Existing R packages, such as igraph, can be used to visualize the results:

plotGraph(topSubnet)

Comparing state-specific PANDA networks

We provide a number of useful plotting functions for the analysis of the networks.

We can compare how parameter choices affect the Z-score estimation between two PANDA runs. Additionally, we can compare two phenotypes:

panda.res1 <- with(pandaToyData, panda(motif, expression, ppi, hamming=1))
panda.res2 <- with(pandaToyData, panda(motif, expression + 
                   rnorm(prod(dim(expression)),sd=5), ppi, hamming=1))
plotZ(panda.res1, panda.res2,addLine=FALSE)
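
A scatter plot of two sets of Z-scores can be complemented by a single numerical summary of their agreement. The sketch below uses simulated stand-in vectors, not actual pandaR output, to show how added noise lowers the correlation between two sets of edge scores; `z1`, `z2`, and `agreement` are hypothetical names.

```r
set.seed(42)

# Stand-in edge scores for a first run (not real PANDA output)
z1 <- rnorm(200)

# Scores from a second, noisier run of the same edges
z2 <- z1 + rnorm(200, sd = 0.5)

# Pearson correlation as a simple measure of agreement between runs
agreement <- cor(z1, z2)
```

A value close to 1 would indicate that the two runs rank edges similarly, while larger perturbations push the correlation toward 0.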

Other comparisons are underway…

Other helpful functions

There are a host of other helpful functions, including testMotif, plotCommunityDetection, and multiplot. See the full help pages.

Session information

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] hexbin_1.27.2       pandaR_1.12.0       Biobase_2.40.0     
## [4] BiocGenerics_0.26.0
## 
## loaded via a namespace (and not attached):
##  [1] igraph_1.2.1       Rcpp_0.12.16       knitr_1.20        
##  [4] magrittr_1.5       munsell_0.4.3      colorspace_1.3-2  
##  [7] lattice_0.20-35    rlang_0.2.0        stringr_1.3.0     
## [10] plyr_1.8.4         tools_3.5.0        grid_3.5.0        
## [13] gtable_0.2.0       matrixStats_0.53.1 htmltools_0.3.6   
## [16] lazyeval_0.2.1     yaml_2.1.18        rprojroot_1.3-2   
## [19] digest_0.6.15      tibble_1.4.2       ggplot2_2.2.1     
## [22] RUnit_0.4.31       evaluate_0.10.1    rmarkdown_1.9     
## [25] labeling_0.3       stringi_1.1.7      pillar_1.2.2      
## [28] compiler_3.5.0     scales_0.5.0       backports_1.1.2   
## [31] reshape_0.8.7      pkgconfig_2.0.1

References

Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing Messages Between Biological Networks to Refine Predicted Interactions. PLoS One. 2013 May 31;8(5):e64832.

Glass K, Quackenbush J, Silverman EK, Celli B, Rennard S, Yuan GC, DeMeo DL. Sexually-dimorphic targeting of functionally-related genes in COPD. BMC Systems Biology. 2014 Nov 28;8:118.