1 Overview

The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented by S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g. MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using either RNA-Seq or microarray transcriptome data. It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units that are centered on TFs. RTN allows a user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for steps that benefit from high-performance computing.

2 Quick Start

2.1 Transcriptional Network Inference (TNI)

The TNI pipeline starts with the generic function tni.constructor and creates a TNI-class object, which provides methods for TRN inference from high-throughput gene expression data. The tni.constructor takes in a matrix of normalized gene expression and the corresponding gene and sample annotations, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotations are available in the tniData dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (tniData$expData), a data frame with gene annotations (tniData$rowAnnotation), and a data frame with sample annotations (tniData$colAnnotation). We will use this dataset to demonstrate the construction of a small TRN that has only 5 regulons. In general, though, we recommend building regulons for all TFs annotated for a given species; please see the case study sections for additional recommendations.

The tni.constructor method will check the consistency of all the given arguments. The TNI pipeline is then executed in three steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis; (ii) remove unstable interactions by bootstrapping; and (iii) apply the ARACNe algorithm. Additional comments are provided throughout the examples.

# Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols); 
# Input 2: 'regulatoryElements', a vector listing genes regarded as TFs
# Input 3: 'rowAnnotation', an optional data frame with gene annotation
# Input 4: 'colAnnotation', an optional data frame with sample annotation
tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1")
rtni <- tni.constructor(expData = tniData$expData, 
                        regulatoryElements = tfs, 
                        rowAnnotation = tniData$rowAnnotation, 
                        colAnnotation = tniData$colAnnotation)
# p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object

Then the tni.permutation function takes the pre-processed TNI-class object and returns a TRN inferred by permutation analysis (with corrections for multiple hypothesis testing).

# Please set nPermutations >= 1000
rtni <- tni.permutation(rtni, nPermutations = 100)

Unstable interactions are subsequently removed by bootstrap analysis using the tni.bootstrap function, which creates a consensus bootstrap network, referred here as refnet (reference network).

rtni <- tni.bootstrap(rtni)

At this stage each target in the TRN may be linked to multiple TFs. Because regulation can occur by both direct (TF-target) and indirect interactions (TF-TF-target), the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Lafitte, Bontempi, and Meyer 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred here as tnet (transcriptional network). For additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013). Briefly, consider three random variables, X, Y and Z that form a network triplet, with X interacting with Z only through Y (i.e., the interaction network is X->Y->Z), and no alternative path exists between X and Z). The DPI theorem states that the information transferred between Y and Z is always larger than the information transferred between X and Z. Based on this assumption, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value of each triplet, which is regarded as a redundant association.

rtni <- tni.dpi.filter(rtni)

For a summary of the resulting regulatory network we recommend using the tni.regulon.summary function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.

## This regulatory network comprised of 5 regulons.
## -- DPI-filtered network:
## regulatoryElements            Targets              Edges 
##                  5               1185               1263 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      87     221     297     253     310     348
## -- Reference network:
## regulatoryElements            Targets              Edges 
##                  5               1185               2489 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      87     491     609     498     638     664 
## ---

The tni.regulon.summary function also lets us get detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide in assessing the set of regulons. Usually small regulons (<15 targets) are not useful for most downstream methods, and highly unbalanced regulons (e.g. those with only positive targets) provide unstable activity readouts.

tni.regulon.summary(rtni, regulatoryElements = "FOXM1")
## The FOXM1 regulon has 221 targets, it's a large and balanced regulon.
## -- DPI filtered network targets:
##    Total Positive Negative 
##      221      140       81
## -- Reference network targets:
##    Total Positive Negative 
##      609      393      216
## -- Regulators with mutual information:
## E2F2, E2F3, PTTG1
## ---

All results available in the TNI-class object can be retrieved using the tni.get accessory function. For example, setting what = 'regulons.and.mode' will return a list with regulons, including the weight assigned for each interaction.

regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL")
##    C1orf106         HBB        MAPT        HBA2     ZNF385B        FOSB 
##  0.17508612 -0.10897826 -0.15494978 -0.08383799 -0.08849254 -0.10138658

The absolute value of a weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons into an igraph-class object (Csardi 2019) using the tni.graph function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).

g <- tni.graph(rtni, tfs = c("FOXM1","E2F2","PTTG1"))

The next chunk shows how to plot the igraph-class object using RedeR (Figure 1).

rdp <- RedPort()
addGraph(rdp, g, layout=NULL)
addLegend.color(rdp, g, type="edge")
addLegend.shape(rdp, g)
relax(rdp, ps = TRUE)

title Figure 1. Graph representation of three regulons inferred by the RTN package.

2.2 Transcriptional Network Analysis (TNA)

In the previous section we outlined the TNI pipeline, which calculates a TRN and makes information available on its regulons. Here, we describe the TNA pipeline, which provides methods for doing enrichment analysis over a list of regulons. The TNA pipeline starts with the function tni2tna.preprocess, converting the TNI-class into a TNA-class object. Then regulons will be tested for association with a given gene expression signature, provided here in the tnaData dataset. Note that the dataset in this vignette should be used for demonstration purposes only. It consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, called here a ‘phenotype’ (tnaData$phenotype), a character vector listing the differentially expressed genes (tnaData$hits), and a data frame with gene annotations mapped to the phenotype (tnaData$phenoIDs).

# Input 1: 'object', a TNI object with regulons
# Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels
# Input 3: 'hits', a character vector, usually a set of differentially expressed genes
# Input 4: 'phenoIDs', an optional data frame with gene anottation mapped to the phenotype
rtna <- tni2tna.preprocess(object = rtni, 
                           phenotype = tnaData$phenotype, 
                           hits = tnaData$hits, 
                           phenoIDs = tnaData$phenoIDs)

The tna.mra function takes the TNA-class object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with corrections for multiple hypothesis testing). The MRA assesses the overlap between each regulon and the genes listed in the hits argument.

# Run the MRA method
rtna <- tna.mra(rtna)

All results available in the TNA-class object can be retrieved using the tna.get accessory function; setting what = 'mra' will return a data frame listing the total number of genes in the TRN (Universe.Size), the number of targets in each regulon (Regulon.Size), the number of genes listed in the hits argument (Total.Hits), the expected overlap between a given regulon and the ‘hits’ (Expected.Hits), the observed overlap between a given regulon and the ‘hits’ (Observed.Hits), the statistical significance of the observed overlap assessed by the phyper function (Pvalue), and the adjusted P-value (Adjusted.Pvalue).

# Get MRA results;
#..setting 'ntop = -1' will return all results, regardless of a threshold
mra <- tna.get(rtna, what="mra", ntop = -1)
##      Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits
## 1870    E2F2          5304          297        660         36.96
## 2305   FOXM1          5304          221        660         27.50
## 9232   PTTG1          5304          348        660         43.30
## 1871    E2F3          5304          310        660         38.57
## 860    RUNX2          5304           87        660         10.83
##      Observed.Hits  Pvalue Adjusted.Pvalue
## 1870            78 2.0e-11         9.8e-11
## 2305            55 1.7e-07         4.2e-07
## 9232            72 4.7e-06         7.8e-06
## 1871            54 5.4e-03         6.8e-03
## 860              9 7.7e-01         7.7e-01

As a complementary approach, the tna.gsea1 function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular ‘response’, which is represented by a ranked list of genes generated from a differential gene expression signature (i.e. the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s target genes are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).

# Run the GSEA method
# Please set nPermutations >= 1000
rtna <- tna.gsea1(rtna, stepFilter=FALSE, nPermutations=100)

Setting what = 'gsea1' in the tna.get accessory function will retrive a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the tna.plot.gsea1 function (Figure 2).

# Get GSEA results
gsea1 <- tna.get(rtna, what="gsea1", ntop = -1)
##      Regulon Regulon.Size Observed.Score   Pvalue Adjusted.Pvalue
## 1870    E2F2          297           0.68 0.009901        0.012376
## 1871    E2F3          310           0.59 0.009901        0.012376
## 2305   FOXM1          221           0.68 0.009901        0.012376
## 9232   PTTG1          347           0.65 0.009901        0.012376
## 860    RUNX2           87           0.48 0.049505        0.049505
# Plot GSEA results
tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)