1 Overview

The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented by S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g. MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using either RNA-Seq or microarray transcriptome data. It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units that are centered on TFs. RTN allows a user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for steps that benefit from high-performance computing.

2 Quick Start

2.1 Transcriptional Network Inference (TNI)

The TNI pipeline starts with the generic function tni.constructor() and creates a TNI-class object, which provides methods for TRN inference from high-throughput gene expression data. The tni.constructor takes in a matrix of normalized gene expression and the corresponding gene and sample annotations, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotations are available in the tniData dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (tniData$expData), a data frame with gene annotations (tniData$rowAnnotation), and a data frame with sample annotations (tniData$colAnnotation). We will use this dataset to demonstrate the construction of a small TRN that has only 5 regulons. In general, though, we recommend building regulons for all TFs annotated for a given species; please see the case study sections for additional recommendations.

The tni.constructor method will check the consistency of all the given arguments. The TNI pipeline is then executed in three steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis; (ii) remove unstable interactions by bootstrapping; and (iii) apply the ARACNe algorithm. Additional comments are provided throughout the examples.

# Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols); 
# Input 2: 'regulatoryElements', a vector listing genes regarded as TFs
# Input 3: 'rowAnnotation', an optional data frame with gene annotation
# Input 4: 'colAnnotation', an optional data frame with sample annotation
tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1")
rtni <- tni.constructor(expData = tniData$expData, 
                        regulatoryElements = tfs, 
                        rowAnnotation = tniData$rowAnnotation, 
                        colAnnotation = tniData$colAnnotation)
# p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object

Then the tni.permutation() function takes the pre-processed TNI-class object and returns a TRN inferred by permutation analysis (with corrections for multiple hypothesis testing).

# Please set nPermutations >= 1000
rtni <- tni.permutation(rtni, nPermutations = 100)

Unstable interactions are subsequently removed by bootstrap analysis using the tni.bootstrap() function, which creates a consensus bootstrap network, referred here as refnet (reference network).

rtni <- tni.bootstrap(rtni)

At this stage each target in the TRN may be linked to multiple TFs. Because regulation can occur by both direct (TF-target) and indirect interactions (TF-TF-target), the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Meyer, Lafitte, and Bontempi 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred here as tnet (transcriptional network). For additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013). Briefly, consider three random variables, X, Y and Z that form a network triplet, with X interacting with Z only through Y (i.e., the interaction network is X->Y->Z), and no alternative path exists between X and Z). The DPI theorem states that the information transferred between Y and Z is always larger than the information transferred between X and Z. Based on this assumption, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value of each triplet, which is regarded as a redundant association.

rtni <- tni.dpi.filter(rtni)

For a summary of the resulting regulatory network we recommend using the tni.regulon.summary() function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.

## Regulatory network comprised of 5 regulons.
## -- DPI-filtered network:
## regulatoryElements            Targets              Edges 
##                  5               1218               1385 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      78     238     349     277     357     363
## -- Reference network:
## regulatoryElements            Targets              Edges 
##                  5               1218               2629 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      78     504     621     526     677     749 
## ---

The tni.regulon.summary() function also lets us get detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide in assessing the set of regulons. Usually small regulons (<15 targets) are not useful for most downstream methods, and highly unbalanced regulons (e.g. those with only positive targets) provide unstable activity readouts.

tni.regulon.summary(rtni, regulatoryElements = "FOXM1")
## The FOXM1 regulon has 238 targets, it's a large and balanced regulon.
## -- DPI filtered network targets:
##    Total Positive Negative 
##      238      134      104
## -- Reference network targets:
##    Total Positive Negative 
##      621      370      251
## -- Regulators with mutual information:
## E2F2, E2F3, PTTG1
## ---

All results available in the TNI-class object can be retrieved using the tni.get() accessory function. For example, setting what = 'regulons.and.mode' will return a list with regulons, including the weight assigned for each interaction.

regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL")
##   C1orf106        HBB       MAPT       HBA2    ZNF385B       FOSB 
##  0.1321503 -0.1378999 -0.2063462 -0.1029147 -0.1092636 -0.1271561

The absolute value of a weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons into an igraph-class object (Csardi 2019) using the tni.graph() function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).

g <- tni.graph(rtni, regulatoryElements = c("FOXM1","E2F2","PTTG1"))

The next chunk shows how to plot the igraph-class object using RedeR (Figure 1).

rdp <- RedPort()
addGraph(rdp, g, layout=NULL)
addLegend.color(rdp, g, type="edge")
addLegend.shape(rdp, g)
relax(rdp, ps = TRUE)

title Figure 1. Graph representation of three regulons inferred by the RTN package.

2.2 Transcriptional Network Analysis (TNA)

In the previous section we outlined the TNI pipeline, which calculates a TRN and makes information available on its regulons. Here, we describe the TNA pipeline, which provides methods for doing enrichment analysis over a list of regulons. The TNA pipeline starts with the function tni2tna.preprocess(), converting the TNI-class into a TNA-class object. Then regulons will be tested for association with a given gene expression signature, provided here in the tnaData dataset. Note that the dataset in this vignette should be used for demonstration purposes only. It consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, called here a ‘phenotype’ (tnaData$phenotype), a character vector listing the differentially expressed genes (tnaData$hits), and a data frame with gene annotations mapped to the phenotype (tnaData$phenoIDs).

# Input 1: 'object', a TNI object with regulons
# Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels
# Input 3: 'hits', a character vector, usually a set of differentially expressed genes
# Input 4: 'phenoIDs', an optional data frame with gene anottation mapped to the phenotype
rtna <- tni2tna.preprocess(object = rtni, 
                           phenotype = tnaData$phenotype, 
                           hits = tnaData$hits, 
                           phenoIDs = tnaData$phenoIDs)

The tna.mra() function takes the TNA-class object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with corrections for multiple hypothesis testing). The MRA assesses the overlap between each regulon and the genes listed in the hits argument.

# Run the MRA method
rtna <- tna.mra(rtna)

All results available in the TNA-class object can be retrieved using the tna.get() accessory function; setting what = 'mra' will return a data frame listing the total number of genes in the TRN (Universe.Size), the number of targets in each regulon (Regulon.Size), the number of genes listed in the hits argument (Total.Hits), the expected overlap between a given regulon and the ‘hits’ (Expected.Hits), the observed overlap between a given regulon and the ‘hits’ (Observed.Hits), the statistical significance of the observed overlap assessed by the phyper() function (Pvalue), and the adjusted P-value (Adjusted.Pvalue).

# Get MRA results;
#..setting 'ntop = -1' will return all results, regardless of a threshold
mra <- tna.get(rtna, what="mra", ntop = -1)
##      Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits Observed.Hits
## 1870    E2F2          5304          349        660         43.43            87
## 2305   FOXM1          5304          238        660         29.62            61
## 9232   PTTG1          5304          363        660         45.17            74
## 1871    E2F3          5304          357        660         44.42            61
## 860    RUNX2          5304           78        660          9.71             6
##       Pvalue Adjusted.Pvalue
## 1870 2.5e-11         1.3e-10
## 2305 1.0e-08         2.6e-08
## 9232 6.0e-06         1.0e-05
## 1871 5.0e-03         6.3e-03
## 860  9.4e-01         9.4e-01

As a complementary approach, the tna.gsea1() function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular ‘response’, which is represented by a ranked list of genes generated from a differential gene expression signature (i.e. the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s target genes are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).

# Run the GSEA method
# Please set nPermutations >= 1000
rtna <- tna.gsea1(rtna, nPermutations=100)

Setting what = 'gsea1' in the tna.get() accessory function will retrive a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the tna.plot.gsea1() function (Figure 2).

# Get GSEA results
gsea1 <- tna.get(rtna, what="gsea1", ntop = -1)
##      Regulon Regulon.Size Observed.Score     Pvalue Adjusted.Pvalue
## 1870    E2F2          349           0.68 4.2409e-26      1.5769e-25
## 9232   PTTG1          362           0.65 6.3075e-26      1.5769e-25
## 2305   FOXM1          238           0.68 1.9083e-22      3.1806e-22
## 1871    E2F3          357           0.59 2.6079e-18      3.2598e-18
## 860    RUNX2           78           0.45 1.6832e-01      1.6832e-01
# Plot GSEA results
tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)