A transcriptional regulatory network (TRN) consists of a collection of regulated target genes and transcription factors (TFs). The TFs recognize specific DNA sequences and influence gene expression across the genome, either activating or repressing the expression of target genes. The set of genes controlled by a given TF forms a regulon. The RTN package provides classes and methods for the reconstruction of TRNs and analysis of regulons. New computational frameworks are also available from RTNduals and RTNsurvival packages.
RTN 2.20.0
The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented by S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g. MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using either RNA-Seq or microarray transcriptome data. It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units that are centered on TFs. RTN allows a user to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for steps that benefit from high-performance computing.
The TNI pipeline starts with the generic function tni.constructor()
and creates a TNI-class
object, which provides methods for TRN inference from high-throughput gene expression data. The tni.constructor
takes in a matrix of normalized gene expression and the corresponding gene and sample annotations, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotations are available in the tniData
dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (tniData$expData
), a data frame with gene annotations (tniData$rowAnnotation
), and a data frame with sample annotations (tniData$colAnnotation
). We will use this dataset to demonstrate the construction of a small TRN that has only 5 regulons. In general, though, we recommend building regulons for all TFs annotated for a given species; please see the case study sections for additional recommendations.
The tni.constructor
method will check the consistency of all the given arguments. The TNI pipeline is then executed in three steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis; (ii) remove unstable interactions by bootstrapping; and (iii) apply the ARACNe algorithm. Additional comments are provided throughout the examples.
library(RTN)
data(tniData)
# Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols);
# Input 2: 'regulatoryElements', a vector listing genes regarded as TFs
# Input 3: 'rowAnnotation', an optional data frame with gene annotation
# Input 4: 'colAnnotation', an optional data frame with sample annotation
tfs <- c("FOXM1","E2F2","E2F3","RUNX2","PTTG1")
rtni <- tni.constructor(expData = tniData$expData,
regulatoryElements = tfs,
rowAnnotation = tniData$rowAnnotation,
colAnnotation = tniData$colAnnotation)
# p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object
Then the tni.permutation()
function takes the pre-processed TNI-class
object and returns a TRN inferred by permutation analysis (with corrections for multiple hypothesis testing).
# Please set nPermutations >= 1000
rtni <- tni.permutation(rtni, nPermutations = 100)
Unstable interactions are subsequently removed by bootstrap analysis using the tni.bootstrap()
function, which creates a consensus bootstrap network, referred here as refnet
(reference network).
rtni <- tni.bootstrap(rtni)
At this stage each target in the TRN may be linked to multiple TFs. Because regulation can occur by both direct (TF-target) and indirect interactions (TF-TF-target), the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Meyer, Lafitte, and Bontempi 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred here as tnet
(transcriptional network). For additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013). Briefly, consider three random variables, X
, Y
and Z
that form a network triplet, with X
interacting with Z
only through Y
(i.e., the interaction network is X->Y->Z
), and no alternative path exists between X
and Z
). The DPI theorem states that the information transferred between Y
and Z
is always larger than the information transferred between X
and Z
. Based on this assumption, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value of each triplet, which is regarded as a redundant association.
rtni <- tni.dpi.filter(rtni)
For a summary of the resulting regulatory network we recommend using the tni.regulon.summary()
function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.
tni.regulon.summary(rtni)
## Regulatory network comprised of 5 regulons.
## -- DPI-filtered network:
## regulatoryElements Targets Edges
## 5 1161 1324
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75 233 333 265 341 342
## -- Reference network:
## regulatoryElements Targets Edges
## 5 1161 2468
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75 458 617 494 629 689
## ---
The tni.regulon.summary()
function also lets us get detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide in assessing the set of regulons. Usually small regulons (<15 targets) are not useful for most downstream methods, and highly unbalanced regulons (e.g. those with only positive targets) provide unstable activity readouts.
tni.regulon.summary(rtni, regulatoryElements = "FOXM1")
## The FOXM1 regulon has 233 targets, it's a large and balanced regulon.
## -- DPI filtered network targets:
## Total Positive Negative
## 233 136 97
## -- Reference network targets:
## Total Positive Negative
## 617 369 248
## -- Regulators with mutual information:
## E2F2, E2F3, PTTG1
##
## ---
All results available in the TNI-class
object can be retrieved using the tni.get()
accessory function. For example, setting what = 'regulons.and.mode'
will return a list with regulons, including the weight assigned for each interaction.
regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL")
head(regulons$FOXM1)
## C1orf106 HBB MAPT HBA2 ZNF385B FOSB
## 0.1321503 -0.1378999 -0.2063462 -0.1029147 -0.1092636 -0.1271561
The absolute value of a weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons into an igraph-class
object (Csardi 2019) using the tni.graph()
function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).
g <- tni.graph(rtni, regulatoryElements = c("FOXM1","E2F2","PTTG1"))
The next chunk shows how to plot the igraph-class
object using RedeR (Figure 1).
library(RedeR)
rdp <- RedPort()
calld(rdp)
addGraph(rdp, g, layout=NULL)
addLegend.color(rdp, g, type="edge")
addLegend.shape(rdp, g)
relax(rdp, ps = TRUE)
Figure 1. Graph representation of three regulons inferred by the RTN package.
In the previous section we outlined the TNI pipeline, which calculates a TRN and makes information available on its regulons. Here, we describe the TNA pipeline, which provides methods for doing enrichment analysis over a list of regulons. The TNA pipeline starts with the function tni2tna.preprocess()
, converting the TNI-class
into a TNA-class
object. Then regulons will be tested for association with a given gene expression signature, provided here in the tnaData
dataset. Note that the dataset in this vignette should be used for demonstration purposes only. It consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, called here a ‘phenotype’ (tnaData$phenotype
), a character vector listing the differentially expressed genes (tnaData$hits
), and a data frame with gene annotations mapped to the phenotype (tnaData$phenoIDs
).
# Input 1: 'object', a TNI object with regulons
# Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels
# Input 3: 'hits', a character vector, usually a set of differentially expressed genes
# Input 4: 'phenoIDs', an optional data frame with gene anottation mapped to the phenotype
data(tnaData)
rtna <- tni2tna.preprocess(object = rtni,
phenotype = tnaData$phenotype,
hits = tnaData$hits,
phenoIDs = tnaData$phenoIDs)
The tna.mra()
function takes the TNA-class
object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with corrections for multiple hypothesis testing). The MRA assesses the overlap between each regulon and the genes listed in the hits
argument.
# Run the MRA method
rtna <- tna.mra(rtna)
All results available in the TNA-class
object can be retrieved using the tna.get()
accessory function; setting what = 'mra'
will return a data frame listing the total number of genes in the TRN (Universe.Size
), the number of targets in each regulon (Regulon.Size
), the number of genes listed in the hits
argument (Total.Hits
), the expected overlap between a given regulon and the ‘hits’ (Expected.Hits
), the observed overlap between a given regulon and the ‘hits’ (Observed.Hits
), the statistical significance of the observed overlap assessed by the phyper()
function (Pvalue
), and the adjusted P-value (Adjusted.Pvalue
).
# Get MRA results;
#..setting 'ntop = -1' will return all results, regardless of a threshold
mra <- tna.get(rtna, what="mra", ntop = -1)
head(mra)
## Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits Observed.Hits
## 1870 E2F2 5304 333 660 41.44 85
## 2305 FOXM1 5304 233 660 28.99 60
## 9232 PTTG1 5304 341 660 42.43 71
## 1871 E2F3 5304 342 660 42.56 59
## 860 RUNX2 5304 75 660 9.33 6
## Pvalue Adjusted.Pvalue
## 1870 1.1e-11 5.7e-11
## 2305 1.1e-08 2.8e-08
## 9232 4.3e-06 7.2e-06
## 1871 4.6e-03 5.8e-03
## 860 9.2e-01 9.2e-01
As a complementary approach, the tna.gsea1()
function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular ‘response’, which is represented by a ranked list of genes generated from a differential gene expression signature (i.e. the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s target genes are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).
# Run the GSEA method
# Please set nPermutations >= 1000
rtna <- tna.gsea1(rtna, nPermutations=100)
Setting what = 'gsea1'
in the tna.get()
accessory function will retrive a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the tna.plot.gsea1()
function (Figure 2).
# Get GSEA results
gsea1 <- tna.get(rtna, what="gsea1", ntop = -1)
head(gsea1)
## Regulon Regulon.Size Observed.Score Pvalue Adjusted.Pvalue
## 1870 E2F2 333 0.68 0.009901 0.012376
## 2305 FOXM1 233 0.67 0.009901 0.012376
## 9232 PTTG1 340 0.65 0.009901 0.012376
## 1871 E2F3 342 0.61 0.009901 0.012376
## 860 RUNX2 75 0.42 0.247520 0.247520
# Plot GSEA results
tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)