RTN: reconstruction of transcriptional networks and analysis of master regulators.

Mauro AA Castro, Xin Wang, Michael NC Fletcher, Florian Markowetz and Kerstin B Meyer.

30 October 2017

Abstract

This package provides classes and methods for transcriptional network inference and analysis. Modulators of transcription factor activity are assessed by conditional mutual information, and master regulators are mapped to phenotypes using different strategies, e.g., gene set enrichment, shadow and synergy analyses. Additionally, new frameworks are provided in the derivative packages RTNduals and RTNsurvival.

Package

RTN 2.2.0

1 Overview
2 Quick Start
- 2.1 Transcriptional network inference
- 2.2 Transcriptional network analysis
3 Session information
References

1 Overview

The package RTN is designed for reconstruction and analysis of transcriptional networks (TN) using mutual information (Margolin et al. 2006). It is implemented with S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing transcriptional regulatory units, or regulons, e.g., MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), synergy and shadow (Lefebvre et al. 2010). The package computes mutual information (MI) between annotated transcription factors (TFs) and all potential targets using gene expression data. It is tuned to deal with large gene expression datasets in order to build genome-wide transcriptional networks centered on TFs and regulons. Using a robust statistical pipeline, RTN allows the user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel computing is available for the critical steps that demand high performance.

2 Quick Start

2.1 Transcriptional network inference

The dt4rtn dataset consists of a list with 6 objects used for demonstration purposes only. It was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012) and contains a named gene expression matrix (gexp), a data frame with gexp annotation (gexpIDs), a named numeric vector with differential gene expression data (pheno), a data frame with pheno annotation (phenoIDs), a character vector of differentially expressed genes (hits), and a named vector of transcription factors (tfs).

library(RTN)
data(dt4rtn)

Objects of class TNI provide a series of methods for transcriptional network inference from high-throughput gene expression data. In this first step, the tni.constructor function is used to create the TNI object and run several checks on the input data.

#Input 1: 'expData', a named gene expression matrix (samples in columns)
#Input 2: 'regulatoryElements', a named vector with TF ids
#Input 3: 'rowAnnotation', an optional data frame with gene annotation
tfs <- dt4rtn$tfs[c("PTTG1","E2F2","FOXM1","E2F3","RUNX2")]
rtni <- tni.constructor(expData=dt4rtn$gexp, regulatoryElements=tfs, rowAnnotation=dt4rtn$gexpIDs)

The tni.permutation function takes the pre-processed TNI object and returns a transcriptional network inferred by mutual information (with multiple hypothesis testing corrections).

rtni <- tni.permutation(rtni, verbose = FALSE)
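The idea behind a permutation test for mutual information can be sketched in base R. This is a toy illustration and not RTN's implementation: the function names (mi_binned, mi_perm_test), the equal-width binning and the simulated data are all assumptions made for the example.

```r
# Illustrative only: a base-R permutation test for mutual information (MI).
# Function names and the binning scheme are assumptions, not part of RTN.

# Plug-in MI estimate (in nats) between two numeric vectors after binning.
mi_binned <- function(x, y, nbins = 5) {
  bx <- cut(x, breaks = nbins, labels = FALSE)
  by <- cut(y, breaks = nbins, labels = FALSE)
  pxy <- unclass(table(bx, by)) / length(x)   # joint distribution
  px <- rowSums(pxy)                          # marginal of x
  py <- colSums(pxy)                          # marginal of y
  nz <- pxy > 0                               # skip empty cells
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

# Permutation p-value: shuffling the target breaks any TF-target dependence.
mi_perm_test <- function(tf, target, nperm = 200, nbins = 5) {
  obs <- mi_binned(tf, target, nbins)
  null <- replicate(nperm, mi_binned(tf, sample(target), nbins))
  list(mi = obs, p.value = (sum(null >= obs) + 1) / (nperm + 1))
}

set.seed(1)
tf <- rnorm(200)
dependent <- tf + rnorm(200, sd = 0.5)   # simulated target driven by the TF
unrelated <- rnorm(200)                  # simulated target independent of the TF

res_dep <- mi_perm_test(tf, dependent)
res_ind <- mi_perm_test(tf, unrelated)

# Multiple hypothesis testing correction across the tested pairs
padj <- p.adjust(c(dependent = res_dep$p.value, unrelated = res_ind$p.value),
                 method = "BH")
```

In this sketch only the simulated dependent pair is expected to survive the correction; the number of permutations bounds the smallest attainable p-value.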

In an additional step, unstable interactions can be removed by bootstrap analysis using the tni.bootstrap function, which creates a consensus bootstrap network (referred to here as refnet).

rtni <- tni.bootstrap(rtni)
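A consensus over bootstrap resamples can be illustrated with a small base-R sketch. It is not the tni.bootstrap implementation: absolute Pearson correlation stands in for mutual information, and the cutoff (0.5), the number of resamples and the consensus threshold (95%) are arbitrary assumptions.

```r
# Illustrative only: consensus of edges across bootstrap resamples of the samples.
# Absolute Pearson correlation stands in for MI; all thresholds are assumptions.
set.seed(2)
n <- 100
tf <- rnorm(n)
gexp <- rbind(TF1 = tf,
              geneA = tf + rnorm(n, sd = 0.5),   # stable TF1 target
              geneB = rnorm(n))                   # unrelated gene

# TRUE where a gene is associated with TF1 in the given expression matrix
edge_call <- function(mat, cutoff = 0.5) {
  abs(cor(t(mat))["TF1", c("geneA", "geneB")]) > cutoff
}

nboot <- 100
calls <- replicate(nboot, {
  idx <- sample(ncol(gexp), replace = TRUE)   # resample samples (columns)
  edge_call(gexp[, idx])
})

support <- rowMeans(calls)     # fraction of resamples supporting each edge
consensus <- support >= 0.95   # keep only edges that are consistently recovered
```

Edges that appear in only a few resamples depend on particular samples and are dropped from the consensus.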

In the TN each target can be linked to multiple TFs, and regulation can occur as a result of both direct (TF-target) and indirect (TF-TF-target) interactions. The Data Processing Inequality (DPI) algorithm (Meyer, Lafitte, and Bontempi 2008) is used to remove the weakest interaction in any triangle of two TFs and a common target gene, thus preserving the dominant TF-target pairs and resulting in the filtered transcriptional network (referred to here as tnet). The filtered TN is less complex and highlights the most significant interactions.

rtni <- tni.dpi.filter(rtni)
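The DPI rule described above can be sketched on a toy triangle in base R. The MI values are made up and the dpi_filter helper is hypothetical; the actual tni.dpi.filter method may differ in details such as tolerance settings.

```r
# Illustrative only: the DPI rule applied to a small symmetric MI matrix.
# MI values are invented; 'dpi_filter' is a hypothetical helper, not an RTN function.
ids <- c("TF1", "TF2", "gene")
mi <- matrix(0, 3, 3, dimnames = list(ids, ids))
mi["TF1", "TF2"] <- mi["TF2", "TF1"] <- 0.8
mi["TF1", "gene"] <- mi["gene", "TF1"] <- 0.6
mi["TF2", "gene"] <- mi["gene", "TF2"] <- 0.3   # weakest edge of the triangle

dpi_filter <- function(mi) {
  keep <- mi > 0
  n <- nrow(mi)
  for (i in seq_len(n)) for (j in seq_len(n)) for (k in seq_len(n)) {
    # Edge i-j is removed when it is the weakest edge of a closed triangle i-j-k.
    # All comparisons use the original MI values, not the partially filtered ones.
    if (i < j && k != i && k != j &&
        mi[i, j] > 0 && mi[i, k] > 0 && mi[j, k] > 0 &&
        mi[i, j] < min(mi[i, k], mi[j, k])) {
      keep[i, j] <- keep[j, i] <- FALSE
    }
  }
  mi * keep
}

tnet_toy <- dpi_filter(mi)
```

In this toy network the TF2-gene edge is treated as indirect (mediated by TF1) and is set to zero, while the two stronger edges are kept.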

All results available in the TNI object can be retrieved using the tni.get function:

tni.get(rtni, what="summary")
refnet <- tni.get(rtni, what="refnet")
tnet <- tni.get(rtni, what="tnet")

The inferred transcriptional network can also be retrieved as an igraph object (Csardi and Nepusz 2006) using the tni.graph function. The graph object includes some basic network attributes pre-formatted for visualization in the R package RedeR (Castro et al. 2012).

g <- tni.graph(rtni)

2.2 Transcriptional network analysis

Objects of class TNA provide a series of methods for enrichment analysis on transcriptional networks. In this first step, the generic function tni2tna.preprocess is used to convert the pre-processed TNI object to TNA, also running several checks on the input data.

#Input 1: 'object', a TNI object with a pre-processed transcriptional network
#Input 2: 'phenotype', a named numeric vector, usually with log2 differential expression values
#Input 3: 'hits', a character vector of gene ids considered as hits
#Input 4: 'phenoIDs', an optional data frame with annotation used to aggregate genes in the phenotype
rtna <- tni2tna.preprocess(object=rtni, 
                         phenotype=dt4rtn$pheno, 
                         hits=dt4rtn$hits, 
                         phenoIDs=dt4rtn$phenoIDs
                         )

The tna.mra function takes the TNA object and returns the results of the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons from a transcriptional network. The MRA computes the overlap between the transcriptional regulatory units (regulons) and the input signature genes using the hypergeometric distribution (with multiple hypothesis testing corrections).

rtna <- tna.mra(rtna)
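The hypergeometric overlap test at the core of MRA can be sketched in base R with stats::phyper. The gene universe, hits and regulons here are simulated, and mra_pvalue is a hypothetical helper rather than an RTN function.

```r
# Illustrative only: hypergeometric overlap test between regulons and hits.
# All gene ids are simulated; 'mra_pvalue' is a hypothetical helper.
universe <- paste0("g", 1:1000)
hits <- universe[1:100]
regulons <- list(
  TF_A = universe[c(1:30, 501:520)],    # 30 of 50 targets are hits
  TF_B = universe[c(96:100, 601:645)]   # 5 of 50 targets are hits (chance level)
)

mra_pvalue <- function(regulon, hits, universe) {
  k <- length(intersect(regulon, hits))
  # P(overlap >= k) when drawing length(regulon) genes from the universe
  phyper(k - 1, m = length(hits), n = length(universe) - length(hits),
         k = length(regulon), lower.tail = FALSE)
}

pvals <- sapply(regulons, mra_pvalue, hits = hits, universe = universe)
padj <- p.adjust(pvals, method = "BH")   # multiple hypothesis testing correction
```

With 100 hits in a universe of 1000 genes, a regulon of 50 targets is expected to contain about 5 hits by chance, so only TF_A stands out in this toy setting.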

A simple overlap among all regulons can also be tested using the tna.overlap function:

rtna <- tna.overlap(rtna)

Alternatively, gene set enrichment analysis (GSEA) can be used to assess whether a given transcriptional regulatory unit is enriched for genes that are differentially expressed between two classes of microarrays (i.e., a differentially expressed phenotype). GSEA uses a rank-based scoring metric to test the association between gene sets and the ranked phenotypic difference. Here regulons are treated as gene sets, an extension of the GSEA statistics as previously described (Subramanian et al. 2005).

rtna <- tna.gsea1(rtna, stepFilter=FALSE, nPermutations=100)
# Note: the default 'nPermutations' is 1000.
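The rank-based running-sum statistic can be sketched in base R as follows. This is a simplified reading of the GSEA enrichment score with a permutation p-value, not the tna.gsea1 code: gsea_es is a hypothetical helper, the phenotype is simulated, and the weighting exponent of 1 is an assumption.

```r
# Illustrative only: GSEA-style enrichment score (ES) with a permutation p-value.
# 'gsea_es' is a hypothetical helper; 'ranked' must be a named vector sorted
# in decreasing order.
gsea_es <- function(ranked, set, exponent = 1) {
  in_set <- names(ranked) %in% set
  w <- abs(ranked)^exponent
  p_hit <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])   # weighted hits so far
  p_miss <- cumsum(!in_set) / sum(!in_set)                 # misses so far
  running <- p_hit - p_miss
  running[[which.max(abs(running))]]   # maximum deviation from zero
}

set.seed(3)
pheno <- sort(setNames(abs(rnorm(500)), paste0("g", 1:500)), decreasing = TRUE)
regulon <- names(pheno)[1:25]   # toy regulon concentrated at the top of the ranking

es <- gsea_es(pheno, regulon)

# Null distribution: ES of random gene sets of the same size
null <- replicate(200, gsea_es(pheno, sample(names(pheno), length(regulon))))
pval <- (sum(null >= es) + 1) / (length(null) + 1)
```

Because the toy regulon occupies the top of the ranked list, its running sum peaks early and random sets of the same size rarely match it.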

The two-tailed GSEA tests whether positive or negative targets for a TF are enriched at each extreme of a particular response (e.g., differentially expressed genes). The pipeline splits the regulon into a group of activated and a group of repressed genes, based on the Pearson’s correlation, and then asks how the two sets are distributed in the ranked list of genes (please refer to Campbell et al. (2016) and Castro et al. (2016) for more details).

rtna <- tna.gsea2(rtna, tfs="PTTG1", nPermutations=100)
# Note: the default 'nPermutations' is 1000.
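The two-tailed idea (splitting a regulon by the sign of the correlation and scoring each half separately) can be sketched in base R. The simulated data, the gsea_es helper and the use of a differential score (positive minus negative ES) are assumptions made for illustration; see the cited papers for the method as implemented in tna.gsea2.

```r
# Illustrative only: two-tailed GSEA on a toy regulon split by correlation sign.
# 'gsea_es' is a hypothetical helper; all data are simulated.
gsea_es <- function(ranked, set, exponent = 1) {
  in_set <- names(ranked) %in% set
  w <- abs(ranked)^exponent
  running <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set]) -
    cumsum(!in_set) / sum(!in_set)
  running[[which.max(abs(running))]]
}

set.seed(4)
n_genes <- 400
n_samples <- 60
tf <- rnorm(n_samples)
gexp <- matrix(rnorm(n_genes * n_samples), n_genes,
               dimnames = list(paste0("g", 1:n_genes), NULL))
gexp[1:20, ] <- gexp[1:20, ] + rep(1, 20) %o% tf     # activated targets
gexp[21:40, ] <- gexp[21:40, ] - rep(1, 20) %o% tf   # repressed targets
regulon <- rownames(gexp)[1:40]

# Split the regulon by the sign of the Pearson correlation with the TF
r <- cor(t(gexp[regulon, ]), tf)[, 1]
pos <- regulon[r > 0]
neg <- regulon[r < 0]

# Signed phenotype in which activated targets go up and repressed targets go down
pheno <- setNames(rnorm(n_genes), rownames(gexp))
pheno[1:20] <- pheno[1:20] + 3
pheno[21:40] <- pheno[21:40] - 3
pheno <- sort(pheno, decreasing = TRUE)

es_pos <- gsea_es(pheno, pos)   # positive targets, expected near the top
es_neg <- gsea_es(pheno, neg)   # negative targets, expected near the bottom
dES <- es_pos - es_neg          # differential enrichment score
```

A large positive differential score indicates that positive and negative targets sit at opposite extremes of the ranked phenotype, which is the pattern the two-tailed test looks for.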

All results available in the TNA object can be retrieved using the tna.get function:

tna.get(rtna, what="summary")
tna.get(rtna, what="mra")
tna.get(rtna, what="overlap")
tna.get(rtna, what="gsea1")
tna.get(rtna, what="gsea2")

To visualize the GSEA distributions, the user can apply the tna.plot.gsea1 and tna.plot.gsea2 functions that plot the one-tailed and two-tailed GSEA results, respectively:

tna.plot.gsea1(rtna, file="tna_gsea1", labPheno="abs(log2) diff. expression") 
tna.plot.gsea2(rtna, file="tna_gsea2", labPheno="log2 diff. expression")

Figure 1. GSEA analysis showing genes in each regulon (as hits) ranked by their differential expression (as phenotype). This toy example illustrates the output from the TNA pipeline evaluated by the tna.gsea1 method.

Figure 2. Two-tailed GSEA analysis showing positive or negative targets for a TF (as hits) ranked by their differential expression (as phenotype). This toy example illustrates the output from the TNA pipeline evaluated by the tna.gsea2 method (for detailed interpretation of results from this method, please refer to Campbell et al. (2016) and Castro et al. (2016)).

3 Session information

## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RTN_2.2.0       BiocStyle_2.6.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13        compiler_3.4.2      nloptr_1.0.4       
##  [4] tools_3.4.2         minet_3.36.0        digest_0.6.12      
##  [7] lme4_1.1-14         evaluate_0.10.1     nlme_3.1-131       
## [10] lattice_0.20-35     mgcv_1.8-22         pkgconfig_2.0.1    
## [13] Matrix_1.2-11       igraph_1.1.2        yaml_2.1.14        
## [16] parallel_3.4.2      SparseM_1.77        stringr_1.2.0      
## [19] knitr_1.17          MatrixModels_0.4-1  S4Vectors_0.16.0   
## [22] IRanges_2.12.0      stats4_3.4.2        rprojroot_1.2      
## [25] nnet_7.3-12         grid_3.4.2          data.table_1.10.4-3
## [28] snow_0.4-2          rmarkdown_1.6       bookdown_0.5       
## [31] limma_3.34.0        minqa_1.2.4         RedeR_1.26.0       
## [34] car_2.1-5           magrittr_1.5        backports_1.1.1    
## [37] htmltools_0.3.6     BiocGenerics_0.24.0 MASS_7.3-47        
## [40] splines_3.4.2       pbkrtest_0.4-7      quantreg_5.34      
## [43] stringi_1.1.5

References

Campbell, Thomas, Mauro Castro, Ines de Santiago, Michael Fletcher, Silvia Halim, Radhika Prathalingam, Bruce Ponder, and Kerstin Meyer. 2016. “FGFR2 Risk SNPs Confer Breast Cancer Risk by Augmenting Oestrogen Responsiveness.” Carcinogenesis 37 (8): 741. doi:10.1093/carcin/bgw065.

Carro, Maria, Wei Lim, Mariano Alvarez, Robert Bollo, Xudong Zhao, Evan Snyder, Erik Sulman, et al. 2010. “The Transcriptional Network for Mesenchymal Transformation of Brain Tumours.” Nature 463 (7279): 318–25. doi:10.1038/nature08712.

Castro, Mauro, Ines de Santiago, Thomas Campbell, Courtney Vaughn, Theresa Hickey, Edith Ross, Wayne Tilley, Florian Markowetz, Bruce Ponder, and Kerstin Meyer. 2016. “Regulators of Genetic Risk of Breast Cancer Identified by Integrative Network Analysis.” Nature Genetics 48: 12–21. doi:10.1038/ng.3458.

Castro, Mauro, Xin Wang, Michael Fletcher, Kerstin Meyer, and Florian Markowetz. 2012. “RedeR: R/Bioconductor Package for Representing Modular Structures, Nested Networks and Multiple Levels of Hierarchical Associations.” Genome Biology 13 (4): R29. doi:10.1186/gb-2012-13-4-r29.

Csardi, Gabor, and Tamas Nepusz. 2006. “The igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.sf.net.

Curtis, Christina, Sohrab Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar Rueda, Mark Dunning, Doug Speed, et al. 2012. “The Genomic and Transcriptomic Architecture of 2,000 Breast Tumours Reveals Novel Subgroups.” Nature 486: 346–52. doi:10.1038/nature10983.

Fletcher, Michael, Mauro Castro, Suet-Feung Chin, Oscar Rueda, Xin Wang, Carlos Caldas, Bruce Ponder, Florian Markowetz, and Kerstin Meyer. 2013. “Master Regulators of FGFR2 Signalling and Breast Cancer Risk.” Nature Communications 4: 2464. doi:10.1038/ncomms3464.

Lefebvre, Celine, Presha Rajbhandari, Mariano J Alvarez, Pradeep Bandaru, Wei Keat Lim, Mai Sato, Kai Wang, et al. 2010. “A Human B-Cell Interactome Identifies MYB and FOXM1 as Master Regulators of Proliferation in Germinal Centers.” Mol Syst Biol 6 (377): 1–10. doi:10.1038/msb.2010.31.

Margolin, Adam, Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky, Riccardo Favera, and Andrea Califano. 2006. “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” BMC Bioinformatics 7 (Suppl 1): S7. doi:10.1186/1471-2105-7-S1-S7.

Meyer, Patrick, Frederic Lafitte, and Gianluca Bontempi. 2008. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9 (1): 461. doi:10.1186/1471-2105-9-461.

R Core Team. 2012. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

Subramanian, Aravind, Pablo Tamayo, Vamsi Mootha, Sayan Mukherjee, Benjamin Ebert, Michael Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences of the United States of America 102 (43): 15545–50. doi:10.1073/pnas.0506580102.