Contents

1 Overview

Transcriptional networks are important tools to visualize complex biological systems that involve large groups of genes and multiple regulators. In a previous study, we have implemented the RTN R/Bioconductor package to reconstruct transcriptional regulatory networks (Fletcher et al. 2013). This package reconstructs regulons, consisting of a regulator and its target genes. A regulon can be further used to investigate, for example, the association of its expression on survival probabilities.

RTNsurvival is a tool for integrating regulons generated by the RTN package with survival information for the same set of samples used in the reconstruction of the transcriptional regulatory network. There are two main survival analysis pipelines: a Cox Proportional Hazards approach used to model regulons as predictors of survival time (Figure 1), and a Kaplan-Meier analysis showing the stratification of a cohort based on the regulon activity (Figure 2). For a given regulon, the 2-tailed GSEA approach is used to estimate regulon activity (differential Enrichment Score - dES) for each individual sample (Castro et al. 2016), and the dES distribution of all samples is then used to assess the survival statistics for the cohort. The plots can be fine-tuned to the user’s specifications.

2 Quick Start

2.1 Load regulons and survival data

This Vignette uses precomputed regulons available in the RTN package:

library(RTN)
data(stni, package="RTN")

The stni is a TNI-class object created from a subset of the expression data available in the Fletcher2013b package. The stni object contains a minimal toy reconstruction of 5 regulons (PGR, MYB, E2F2, FOXM1 and PTTG1).

More information about the parameters used to build the toy regulons can be viewed by calling the summary of the stni object.

summary <- tni.get(stni, what = "summary")

The RTNsurvival package provides a survival table for the samples in the stni object, including clinical data from the METABRIC study (Curtis et al. 2012) where the expression data was originally obtained.

library(RTNsurvival)
data(survival.data)

2.2 Preprocess input data

In order to run the analysis pipelines, the input data must be evaluated by the tnsPreprocess method in order to build a TNS-class object. Note that the survival table must be provided with time and event columns, and key covariates can also be specified for subsequent use in the Cox analysis.

rtns <- tnsPreprocess(stni, survival.data, keycovar = c("Grade","Age"), time = 1, event = 2)

2.3 Compute regulon activity

The survival analysis pipeline depends on the 2-tailed GSEA approach, which estimates regulon activity (dES) for all samples in the cohort. The tnsGSEA2 function calls the tni.gsea2 method available in the RTN package.

rtns <- tnsGSEA2(rtns, verbose = FALSE)

2.4 Run the Cox analysis pipeline

Once the dES metric has been computed by tnsGSEA2 function, then it is possible to run the Cox analysis.

The tnsCox method runs a Cox multivariate regression analysis and shows the proportional hazards of each of the specified regulons and the provided key covariates, indicating the contribution of each variable to survival (Figure 1). The method uses the Bioconductor survival package to fit the Cox model.

tnsCox(rtns, sortregs = TRUE)
## NOTE: a 'PDF' file should be available at the working directory!

title Figure 1 - The plot shows the Hazard Ratio for all key covariates and regulons. Lines that are completely to the right of the grey line, shown in red, are positively associated with hazard. This means that samples with high expression of this regulon have poor prognosis. The further to the right or left of the grey line, the more significant is the association.

2.5 Run the Kaplan-Meier analysis pipeline

The tnsKM method generates a Kaplan-Meier plot, which consists of three panels put together: a ranked dES plot for the cohort, a status of key attributes plot (optional) and a Kaplan-Meier plot, showing survival curves for lower and higher dES samples (Figure 2).

tnsKM(rtns, regs="FOXM1", attribs = list(c("ER+","ER-"),c("PR+","PR-"),c("G1","G2","G3"),"HT"), 
endpoint=180)
## NOTE: a 'PDF' file should be available at the working directory!

title Figure 2 - The Kaplan-Meier plot for FOXM1 shows that samples with high regulon activity (red and pink) have poorer prognosis, as their survival probability is lower than the samples that have low regulon activity (light and dark blue).

3 Plot 2-tailed GSEA for individual samples

Individual sample differential enrichment analysis can be investigated using the tnsPlotGSEA2 function. This will generate a 2-tailed GSEA plot for the differential expression of both positive and negative targets of a regulon (Figure 3). This step takes a little longer because the GSEA is recomputed for a selected regulon, and because tnsPlotGSEA2 is a wrapper for the RTN function tna.plot.gsea2, which generates the GSEA plot.

tnsPlotGSEA2(rtns, "MB-5115", regs = "FOXM1", verbose = FALSE)

title Figure 3 - The 2-tailed GSEA plot for the MB-5116 sample. It shows that the positive targets of the FOXM1 regulator are positively enriched, while the negative targets are negatively enriched.

4 Plot regulon activity for all samples

An overview of regulon activity can be obtained by plotting a heatmap with all evaluated samples and regulons. First, we need to obtain the matrix of dES values from the TNS object. Then, we can plot the heatmap using the pheatmap function from the pheatmap package. In this example, we also illustrate how to incorporate sample features from the survival data.

library(pheatmap)
enrichmentScores <- tnsGet(rtns, "EScores")
survival.data <- tnsGet(rtns, "survivalData")
annotationBars <- survival.data[,c("ER+", "ER-")]
pheatmap(t(enrichmentScores$dif),
        annotation_col = annotationBars,
        main = "Differential Enrichment Scores (dES) for tumour samples",
        show_colnames = FALSE,
        annotation_legend = FALSE)

title Figure 4 - Regulon activity of individual tumour samples. This heatmap shows two main regulon clusters. The PGR and MYB regulons are repressed in the ER- samples and activated in ER+ samples. The PTTG1, E2F2 and FOXM1 regulons, on the other hand, are activated in ER- samples and repressed in ER+ samples.

5 Evaluating survival in dual regulons

Integrating data from different regulons may be of interest, especially if they come from different regulatory networks made for the same expression matrix. Concepts from the RTNduals package can be used to compute dual regulons, which are regulon pairs whose common targets are likely to be affected by both regulators.

The workflow of the RTNduals can be used to infer dual TF-TF regulons, using the precomputed regulons from this Vignette, for demonstration purposes only.

library(RTNduals)
smbr <- tni2mbrPreprocess(stni, stni, verbose = FALSE)
smbr <- mbrAssociation(smbr, prob = 0.75, verbose = FALSE)
## Warning: Only 10 regulon pair(s) is(are) being tested!
## Ideally, the search space should represent all possible
## combinations of a given class of regulators! For example,
## all nuclear receptors annotated for a given species.
smbr <- mbrDuals(smbr, verbose = FALSE)
mbrGet(smbr, "dualsInformation")
##           Regulon1 Size.Regulon1 Regulon2 Size.Regulon2 Jaccard.coefficient
## MYB~PGR        MYB           230      PGR            99          0.08580858
## FOXM1~MYB    FOXM1           269      MYB           230          0.06170213
##           Hypergeometric.Pvalue Hypergeometric.Adjusted.Pvalue        MI
## MYB~PGR            3.498670e-07                   6.997341e-07 0.1158941
## FOXM1~MYB          1.989813e-01                   1.989813e-01 0.1215824
##           MI.Adjusted.Pvalue          R Quantile
## MYB~PGR               <0.001  0.1966971      1.0
## FOXM1~MYB             <0.001 -0.1413777      0.8

The smbr is an object of MBR-class, containing the motifs between the regulons represented in the stni object. As prob was set to 75%, only the 25% most significant inferred duals are shown in the results.

Now, since the rtns object and smbr object were computed from the same expression matrix, we can use the survival information for that cohort in conjunction with the duals information.

duals <- mbrGet(smbr, what="dualRegulons")
dualSurvivalPanel(smbr, rtns, dual = duals[1], attribs = c("ER+", "ER-", "PR+", "PR-"))
The dualsSurvivalPanel method generated a directory containing six plots:

title Figure 5 - Dual Survival Plot for MYB~PGR dual regulon. It shows the regulon activity is positively correlated, but there is no additional hazard ratio information given by the Cox regression of the interaction. a) Sample ranking using differential Enrichment Score (dES) of regulon MYB b) dES sample ranking for regulon PGR and a sample scatter plot, showing the ranking of each sample in both regulons. In this case, MYB and PGR agree in sample stratification. c) Kaplan-Meier curves. The Interaction curve follows the individual regulons very closely. d) Cox regression plot. It doesn’t show any association between the activity of these regulons or interaction and hazard ratio.

6 Session information

## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] RTNduals_1.2.0    RTNsurvival_1.2.0 RTN_2.2.0         BiocStyle_2.6.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.13        RColorBrewer_1.1-2  compiler_3.4.2     
##  [4] nloptr_1.0.4        tools_3.4.2         minet_3.36.0       
##  [7] digest_0.6.12       lme4_1.1-14         evaluate_0.10.1    
## [10] nlme_3.1-131        lattice_0.20-35     mgcv_1.8-22        
## [13] png_0.1-7           pkgconfig_2.0.1     Matrix_1.2-11      
## [16] igraph_1.1.2        yaml_2.1.14         parallel_3.4.2     
## [19] SparseM_1.77        stringr_1.2.0       knitr_1.17         
## [22] MatrixModels_0.4-1  S4Vectors_0.16.0    IRanges_2.12.0     
## [25] stats4_3.4.2        rprojroot_1.2       nnet_7.3-12        
## [28] grid_3.4.2          data.table_1.10.4-3 snow_0.4-2         
## [31] survival_2.41-3     rmarkdown_1.6       bookdown_0.5       
## [34] limma_3.34.0        minqa_1.2.4         RedeR_1.26.0       
## [37] car_2.1-5           magrittr_1.5        backports_1.1.1    
## [40] htmltools_0.3.6     BiocGenerics_0.24.0 MASS_7.3-47        
## [43] splines_3.4.2       pbkrtest_0.4-7      quantreg_5.34      
## [46] stringi_1.1.5

References

Castro, Mauro, Ines de Santiago, Thomas Campbell, Courtney Vaughn, Theresa Hickey, Edith Ross, Wayne Tilley, Florian Markowetz, Bruce Ponder, and Kerstin Meyer. 2016. “Regulators of Genetic Risk of Breast Cancer Identified by Integrative Network Analysis.” Nature Genetics 48 (1): 12–21. doi:10.1038/ng.3458.

Curtis, Christina, Sohrab P. Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M. Rueda, Mark J. Dunning, Doug Speed, et al. 2012. “The Genomic and Transcriptomic Architecture of 2,000 Breast Tumours Reveals Novel Subgroups.” Nature 486: 346–52. doi:10.1038/nature10983.

Fletcher, Michael, Mauro Castro, Suet-Feung Chin, Oscar Rueda, Xin Wang, Carlos Caldas, Bruce Ponder, Florian Markowetz, and Kerstin Meyer. 2013. “Master Regulators of FGFR2 Signalling and Breast Cancer Risk.” Nature Communications 4: 2464. doi:10.1038/ncomms3464.