A transcriptional regulatory network (TRN) consists of a collection of regulated target genes and transcription factors (TFs), which recognize specific DNA sequences and guide the expression of the genome, either activating or repressing the expression of the target genes. The set of genes controlled by the same TF forms a regulon. This package provides classes and methods for the reconstruction of TRNs and analysis of regulons. New computational frameworks are also available from the RTN’s derivative packages, RTNduals and RTNsurvival.
RTN 2.8.2
The RTN package is designed for the reconstruction of TRNs and analysis of regulons using mutual information (MI) (Fletcher et al. 2013). It is implemented using S4 classes in R (R Core Team 2012) and extends several methods previously validated for assessing regulons, e.g., MRA (Carro et al. 2010), GSEA (Subramanian et al. 2005), and EVSE (Castro et al. 2016). The package tests the association between a given TF and all potential targets using transcriptomes (either from RNA-Seq or microarray studies). It is tuned to deal with large gene expression datasets in order to build transcriptional regulatory units centered on TFs. RTN allows users to set the stringency of the analysis in a stepwise process, including a bootstrap routine designed to remove unstable associations. Parallel data processing is available for critical steps demanding high-performance computing.
The TNI pipeline starts with the generic function tni.constructor and creates a TNI-class object, which provides methods for TRN inference from high-throughput gene expression data. The tni.constructor takes in a matrix of gene expression and the corresponding gene and sample annotation, as well as a list of regulators to be evaluated. Here, the gene expression matrix and annotation are available in the tniData dataset, which was extracted, pre-processed and size-reduced from Fletcher et al. (2013) and Curtis et al. (2012). This dataset consists of a list with 3 objects: a named gene expression matrix (tniData$expData), a data frame with gene annotation (tniData$rowAnnotation), and a data frame with sample annotation (tniData$colAnnotation). We will use this dataset to demonstrate the construction of a small TRN, with 5 regulons (we recommend building regulons for all TFs annotated for a given species; please see the case study section for additional examples).
The tni.constructor method will check the consistency of all the given arguments. The TNI pipeline is then executed in three subsequent steps: (i) compute MI between a regulator and all potential targets, removing non-significant associations by permutation analysis, (ii) remove unstable interactions by bootstrapping, and (iii) apply the ARACNe algorithm (additional comments are provided throughout the examples).
library(RTN)
data(tniData)
#Input 1: 'expData', a named gene expression matrix (genes on rows, samples on cols);
#Input 2: 'regulatoryElements', a vector listing genes regarded as TFs
#Input 3: 'rowAnnotation', an optional data frame with gene annotation
#Input 4: 'colAnnotation', an optional data frame with sample annotation
tfs <- c("PTTG1","E2F2","FOXM1","E2F3","RUNX2")
rtni <- tni.constructor(expData = tniData$expData,
regulatoryElements = tfs,
rowAnnotation = tniData$rowAnnotation,
colAnnotation = tniData$colAnnotation)
#p.s. alternatively, 'expData' can be a 'SummarizedExperiment' object
Then the tni.permutation function takes the pre-processed TNI-class object and returns a TRN inferred by permutation analysis (with multiple hypothesis testing corrections).
#Please, set nPermutations >= 1000
rtni <- tni.permutation(rtni, nPermutations = 100)
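The permutation step can be illustrated with a small base-R sketch. It is not RTN's implementation: mi_binned() and perm_pvalue() are hypothetical helpers, MI is estimated here by simple equal-width binning (RTN provides its own estimators), and the null distribution is built by shuffling the target's expression across samples, which breaks any TF-target dependence while keeping both marginal distributions.

```r
# Sketch only: a binned mutual information estimate plus a permutation test.
# mi_binned() and perm_pvalue() are hypothetical helpers, not RTN functions.

# MI (in nats) between two numeric vectors after equal-width binning
mi_binned <- function(x, y, bins = 5) {
  xd <- cut(x, breaks = bins, labels = FALSE)
  yd <- cut(y, breaks = bins, labels = FALSE)
  pxy <- table(xd, yd) / length(x)   # joint probabilities
  px <- rowSums(pxy)                 # marginal of x
  py <- colSums(pxy)                 # marginal of y
  nz <- pxy > 0                      # skip empty cells (0 * log 0 = 0)
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

# Permutation p-value: shuffle the target to break any TF-target dependence
perm_pvalue <- function(tf, target, n_perm = 1000, bins = 5) {
  observed <- mi_binned(tf, target, bins)
  null <- replicate(n_perm, mi_binned(tf, sample(target), bins))
  (1 + sum(null >= observed)) / (1 + n_perm)
}

set.seed(42)
n <- 200
tf <- rnorm(n)
linked <- tf + rnorm(n, sd = 0.5)   # target that depends on the TF
unlinked <- rnorm(n)                # independent gene
p <- c(linked   = perm_pvalue(tf, linked, n_perm = 500),
       unlinked = perm_pvalue(tf, unlinked, n_perm = 500))
p.adjust(p, method = "BH")          # multiple hypothesis testing correction
```

As in the comment above the RTN call, the number of permutations bounds the smallest attainable p-value, which is why larger values are recommended for real analyses.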
Unstable interactions are subsequently removed by bootstrap analysis using the tni.bootstrap function, which creates a consensus bootstrap network, referred to here as refnet (reference network).
rtni <- tni.bootstrap(rtni)
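The idea behind a consensus bootstrap network can be sketched under loudly stated assumptions: boot_support() is a hypothetical helper, an edge counts as "recovered" in a resample when |Pearson r| exceeds 0.3 (a stand-in for RTN's MI-based significance test), and an edge enters the reference network when it is recovered in at least 95% of resamples (an assumed consensus level).

```r
# Sketch only: bootstrap support for candidate TF-target edges.
# boot_support() is hypothetical; |Pearson r| > threshold stands in for
# "edge is significant in this resample", which is NOT RTN's MI-based criterion.
boot_support <- function(tf, target, n_boot = 200, threshold = 0.3) {
  n <- length(tf)
  hits <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)     # resample samples with replacement
    abs(cor(tf[idx], target[idx])) > threshold  # is the edge recovered?
  })
  mean(hits)                                    # fraction of resamples supporting the edge
}

set.seed(1)
n <- 100
tf <- rnorm(n)
targets <- list(stable = tf + rnorm(n, sd = 0.7),  # strong, reproducible association
                weak   = 0.1 * tf + rnorm(n))      # association too weak to be stable
support <- sapply(targets, boot_support, tf = tf)
consensus_cutoff <- 0.95                           # assumed consensus level
refnet_edges <- names(support)[support >= consensus_cutoff]
refnet_edges
```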
At this stage each target in the TRN can be linked to multiple TFs. As regulation can occur through both direct (TF-target) and indirect (TF-TF-target) interactions, the pipeline next applies the ARACNe algorithm (Margolin, Nemenman, et al. 2006) to remove the weakest interaction in any triplet formed by two TFs and a common target gene, preserving the dominant TF-target pair (Lafitte, Bontempi, and Meyer 2008). The ARACNe algorithm uses the data processing inequality (DPI) theorem to enrich the regulons with direct TF-target interactions, creating a DPI-filtered TRN, referred to here as tnet (for additional details, please refer to Margolin, Wang, et al. (2006) and Fletcher et al. (2013)). Briefly, consider three random variables X, Y and Z that form a network triplet, with X interacting with Z only through Y (i.e., the interaction network is X->Y->Z, and no alternative path exists between X and Z). The DPI theorem states that the information transferred between Y and Z is never smaller than the information transferred between X and Z, i.e., I(X;Z) <= min[I(X;Y), I(Y;Z)]. Based on this property, the ARACNe algorithm scans all triplets formed by two regulators and one target and removes the edge with the smallest MI value in each triplet, which is regarded as a redundant (indirect) association.
rtni <- tni.dpi.filter(rtni)
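A toy version of the triplet scan can be sketched in base R. dpi_filter() is a hypothetical helper, not tni.dpi.filter(); it marks the weakest edge of every closed TF-TF-target triplet and removes all marked edges at the end. The eps tolerance follows the convention we understand minet's aracne() to use (an edge is dropped only if its MI is smaller than the other two by more than eps), so eps = 0 is the most stringent setting.

```r
# Sketch only: DPI filtering of TF-TF-target triplets with a tolerance eps.
# dpi_filter() is a hypothetical helper, not RTN's tni.dpi.filter().
# mi: symmetric MI matrix over all genes (0 = no edge); tfs: regulator names.
dpi_filter <- function(mi, tfs, eps = 0) {
  keep <- mi > 0                                  # edges present before filtering
  genes <- colnames(mi)
  for (a in seq_along(tfs)) for (b in seq_along(tfs)) {
    if (b <= a) next                              # visit each TF pair once
    t1 <- tfs[a]
    t2 <- tfs[b]
    for (g in setdiff(genes, c(t1, t2))) {
      w <- c(mi[t1, g], mi[t2, g], mi[t1, t2])    # the three edges of the triplet
      if (any(w <= 0)) next                       # not a closed triplet
      k <- which.min(w)
      if (w[k] < min(w[-k]) - eps) {              # weakest edge treated as indirect
        e <- list(c(t1, g), c(t2, g), c(t1, t2))[[k]]
        keep[e[1], e[2]] <- FALSE                 # mark now, remove at the end,
        keep[e[2], e[1]] <- FALSE                 # so triplet order does not matter
      }
    }
  }
  mi * keep
}

genes <- c("TF1", "TF2", "G1", "G2")
mi <- matrix(0, 4, 4, dimnames = list(genes, genes))
add_edge <- function(m, i, j, v) { m[i, j] <- v; m[j, i] <- v; m }
mi <- add_edge(mi, "TF1", "TF2", 0.50)
mi <- add_edge(mi, "TF1", "G1", 0.60)
mi <- add_edge(mi, "TF2", "G1", 0.20)   # weakest edge of triplet TF1-TF2-G1
mi <- add_edge(mi, "TF1", "G2", 0.40)   # weakest edge of triplet TF1-TF2-G2
mi <- add_edge(mi, "TF2", "G2", 0.45)
strict   <- dpi_filter(mi, c("TF1", "TF2"), eps = 0)
tolerant <- dpi_filter(mi, c("TF1", "TF2"), eps = 0.35)
```

With eps = 0 both weakest edges (TF2-G1 and TF1-G2) are removed; with eps = 0.35 neither falls far enough below its triplet partners, so the network is unchanged.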
For a summary of the resulting regulatory network we recommend using the tni.regulon.summary function. From the summary below, we can see the number of regulons, the number of inferred targets and the regulon size distribution.
tni.regulon.summary(rtni)
## This regulatory network comprised of 5 regulons.
## -- DPI-filtered network:
## regulatoryElements Targets Edges
## 5 1201 1285
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 93 231 299 257 306 356
## -- Reference network:
## regulatoryElements Targets Edges
## 5 1201 2526
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 93 479 630 505 645 679
## ---
The tni.regulon.summary function also provides detailed information about a specific regulon, including the number of positive and negative targets. Please use this information as an initial guide to describe the regulons. Small regulons (<15 targets) are usually not useful for most downstream methods, and highly unbalanced regulons (e.g., only positive targets) provide unstable activity readouts.
tni.regulon.summary(rtni, regulatoryElements = "PTTG1")
All results available in the TNI-class object can be retrieved using the tni.get accessory function. For example, setting what = 'regulons.and.mode' will return a list with regulons, including the weight assigned to each interaction.
regulons <- tni.get(rtni, what = "regulons.and.mode", idkey = "SYMBOL")
head(regulons$PTTG1)
## PIP ANKRD30A RTN1 CIDEC TMSB15A PLIN4
## -0.08358555 -0.14140215 -0.09671745 -0.10450404 0.13643477 -0.15031445
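To apply the guidance on small and unbalanced regulons to many regulons at once, a helper along the following lines could summarise signed weights such as those returned by tni.get. regulon_qc() is hypothetical; the 15-target minimum comes from the text above, while the 90% single-sign cutoff for "highly unbalanced" is our own assumption.

```r
# Sketch only: flag regulons that are small or highly unbalanced.
# regulon_qc() is a hypothetical helper; the 90% imbalance cutoff is an assumption.
regulon_qc <- function(regulons, min_size = 15, max_ratio = 0.9) {
  rows <- lapply(names(regulons), function(tf) {
    w <- regulons[[tf]]                  # signed weights: sign = mode, |w| = MI
    n_pos <- sum(w > 0)
    n_neg <- sum(w < 0)
    size <- length(w)
    data.frame(regulon = tf, size = size, positive = n_pos, negative = n_neg,
               small = size < min_size,
               unbalanced = size > 0 && max(n_pos, n_neg) / size > max_ratio,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Toy input shaped like the list returned by tni.get(..., what = "regulons.and.mode")
toy <- list(
  A = setNames(c(0.20, -0.10, 0.30), c("g1", "g2", "g3")),  # only 3 targets
  B = setNames(rep(c(0.10, -0.10), 10), paste0("g", 1:20)),  # 10 positive, 10 negative
  C = setNames(rep(0.15, 20), paste0("g", 1:20))             # positive targets only
)
qc <- regulon_qc(toy)
qc
```

With the object built above, regulon_qc(regulons) would return one row per regulon.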
The absolute weight represents the MI value, while the sign (+/-) indicates the predicted mode of action based on the Pearson’s correlation between the regulator and its targets. We can also retrieve the inferred regulons as an igraph-class object (Csardi 2019) using the tni.graph function, which sets some basic graph attributes for visualization in the RedeR R package (Castro et al. 2012).
g <- tni.graph(rtni, tfs = c("PTTG1","E2F2","FOXM1"))
The next chunk shows how to plot the igraph-class object using RedeR (Figure 1).
library(RedeR)
rdp <- RedPort()
calld(rdp)
addGraph(rdp, g, layout=NULL)
addLegend.color(rdp, g, type="edge")
addLegend.shape(rdp, g)
relax(rdp, ps = TRUE)
Figure 1. Graph representation of three regulons inferred by the RTN package.
The TNA pipeline starts with the generic function tni2tna.preprocess, which converts the TNI-class object into a TNA-class object. Objects of class TNA provide methods for enrichment analysis over a list of regulons, testing the association between regulons and a given gene expression signature, which is provided here in the tnaData dataset. This dataset should be used for demonstration purposes only and consists of a list with 3 objects: a named numeric vector with log2 fold changes from a differential gene expression analysis, referred to here as the phenotype (tnaData$phenotype), a character vector listing the differentially expressed genes (tnaData$hits), and a data frame with gene annotation mapped to the phenotype (tnaData$phenoIDs).
#Input 1: 'object', a TNI object with regulons
#Input 2: 'phenotype', a named numeric vector, usually log2 differential expression levels
#Input 3: 'hits', a character vector, usually a set of differentially expressed genes
#Input 4: 'phenoIDs', an optional data frame with gene annotation mapped to the phenotype
data(tnaData)
rtna <- tni2tna.preprocess(object = rtni,
phenotype = tnaData$phenotype,
hits = tnaData$hits,
phenoIDs = tnaData$phenoIDs)
The tna.mra function takes the TNA-class object and runs the Master Regulator Analysis (MRA) (Carro et al. 2010) over a list of regulons (with multiple hypothesis testing corrections). The MRA assesses the overlap between each regulon and the genes listed as ‘hits’.
#Run the MRA method
rtna <- tna.mra(rtna)
All results available in the TNA-class object can be retrieved using the tna.get accessory function; setting what = 'mra' will return a data frame listing the total number of genes in the TRN (Universe.Size), the number of targets in each regulon (Regulon.Size), the number of genes listed as ‘hits’ (Total.Hits), the expected overlap between a given regulon and the ‘hits’ (Expected.Hits), the observed overlap between a given regulon and the ‘hits’ (Observed.Hits), the statistical significance of the observed overlap assessed by the phyper function (Pvalue), and the adjusted P-value (Adjusted.Pvalue).
#Get MRA results;
#..setting 'ntop = -1' will return all results, regardless of a threshold
mra <- tna.get(rtna, what="mra", ntop = -1)
head(mra)
## Regulon Universe.Size Regulon.Size Total.Hits Expected.Hits
## 1870 E2F2 5304 299 660 37.21
## 2305 FOXM1 5304 231 660 28.74
## 9232 PTTG1 5304 356 660 44.30
## 1871 E2F3 5304 306 660 38.08
## 860 RUNX2 5304 93 660 11.57
## Observed.Hits Pvalue Adjusted.Pvalue
## 1870 77 7.6e-11 3.8e-10
## 2305 57 1.3e-07 3.4e-07
## 9232 74 2.8e-06 4.7e-06
## 1871 54 4.1e-03 5.1e-03
## 860 10 7.4e-01 7.4e-01
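The MRA columns can be re-derived from their definitions: Expected.Hits is Regulon.Size x Total.Hits / Universe.Size, and the overlap P-value is an upper-tail hypergeometric probability, P(X >= Observed.Hits), computed with phyper. The sketch below uses the numbers printed in the table above; it illustrates the test rather than RTN's internal code, and the Benjamini-Hochberg adjustment is an assumption (the printed Adjusted.Pvalue values are consistent with it).

```r
# Sketch only: re-deriving MRA columns from the printed table values.
mra_toy <- data.frame(
  Regulon       = c("E2F2", "FOXM1", "PTTG1", "E2F3", "RUNX2"),
  Regulon.Size  = c(299, 231, 356, 306, 93),
  Observed.Hits = c(77, 57, 74, 54, 10),
  stringsAsFactors = FALSE
)
universe_size <- 5304   # Universe.Size
total_hits    <- 660    # Total.Hits

# Expected overlap if regulon targets were drawn at random from the universe
mra_toy$Expected.Hits <- mra_toy$Regulon.Size * total_hits / universe_size

# P(overlap >= observed): upper tail of the hypergeometric distribution
mra_toy$Pvalue <- phyper(mra_toy$Observed.Hits - 1,
                         m = mra_toy$Regulon.Size,                  # targets in the regulon
                         n = universe_size - mra_toy$Regulon.Size,  # all other genes
                         k = total_hits,                            # genes drawn as 'hits'
                         lower.tail = FALSE)

# Multiple hypothesis testing correction (BH assumed)
mra_toy$Adjusted.Pvalue <- p.adjust(mra_toy$Pvalue, method = "BH")
mra_toy
```

The Expected.Hits values reproduce the printed column to two decimals; whether the P-values match RTN's output exactly depends on how RTN parameterises the test, which we have not verified.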
As a complementary approach, the tna.gsea1 function runs the one-tailed gene set enrichment analysis (GSEA-1T) to find regulons associated with a particular response, represented by a ranked list of genes generated from a differential gene expression signature (i.e., the ‘phenotype’ included in the TNI-to-TNA preprocessing step). Here the regulon’s targets are considered a gene set, which is evaluated against the phenotype. The GSEA-1T uses a rank-based scoring metric in order to test the association between the gene set and the phenotype (Subramanian et al. 2005).
#Run the GSEA method
#Please, set nPermutations >= 1000
rtna <- tna.gsea1(rtna, stepFilter=FALSE, nPermutations=100)
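A minimal sketch of the one-tailed enrichment score follows, assuming the weighted running sum of Subramanian et al. (2005) with weight exponent p = 1 and a null built from random gene sets of the same size. gsea_es() and gsea_pvalue() are hypothetical helpers, not RTN functions, and the toy phenotype is simulated.

```r
# Sketch only: one-tailed GSEA enrichment score and a gene-set permutation p-value.
# gsea_es() and gsea_pvalue() are hypothetical helpers, not RTN functions.
gsea_es <- function(stats, gene_set, p = 1) {
  stats <- sort(stats, decreasing = TRUE)             # ranked list (the phenotype)
  in_set <- names(stats) %in% gene_set
  hit_w <- abs(stats)^p * in_set                      # weights of the gene-set members
  step <- ifelse(in_set, hit_w / sum(hit_w), -1 / sum(!in_set))
  running <- cumsum(step)                             # running-sum walk down the list
  unname(running[which.max(abs(running))])            # maximum deviation from zero
}

gsea_pvalue <- function(stats, gene_set, n_perm = 1000) {
  observed <- gsea_es(stats, gene_set)
  size <- sum(names(stats) %in% gene_set)
  null <- replicate(n_perm, gsea_es(stats, sample(names(stats), size)))
  (1 + sum(null >= observed)) / (1 + n_perm)          # one-tailed
}

set.seed(7)
genes <- paste0("g", 1:1000)
phenotype <- setNames(rnorm(1000), genes)             # toy log2 fold changes
regulon <- genes[1:50]
phenotype[regulon] <- phenotype[regulon] + 2          # targets are more strongly changed
abs_pheno <- abs(phenotype)                           # GSEA-1T ranks by |log2 fold change|
es <- gsea_es(abs_pheno, regulon)
p  <- gsea_pvalue(abs_pheno, regulon, n_perm = 100)
c(ES = es, Pvalue = p)
```

With n_perm = 100, the smallest attainable value of (1 + count) / (1 + n_perm) is 1/101, about 0.009901, the same floor seen in the gsea1 table; we have not checked that RTN uses exactly this convention.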
Setting what = 'gsea1' in the tna.get accessory function will retrieve a data frame listing the GSEA statistics, and the corresponding GSEA plots can be generated using the tna.plot.gsea1 function (Figure 2).
#Get GSEA results
gsea1 <- tna.get(rtna, what="gsea1", ntop = -1)
head(gsea1)
## Regulon Regulon.Size Observed.Score Pvalue Adjusted.Pvalue
## 1870 E2F2 299 0.67 0.009901 0.009901
## 1871 E2F3 306 0.61 0.009901 0.009901
## 2305 FOXM1 231 0.67 0.009901 0.009901
## 9232 PTTG1 355 0.65 0.009901 0.009901
## 860 RUNX2 93 0.50 0.009901 0.009901
#Plot GSEA results
tna.plot.gsea1(rtna, labPheno="abs(log2 fold changes)", ntop = -1)
Figure 2. GSEA analysis showing genes in each regulon (vertical bars) ranked by differential gene expression analysis (phenotype). This toy example illustrates the output from the TNA pipeline evaluated by the tna.gsea1 function.
The GSEA-1T, however, does not indicate the direction of the association. Next the TNA pipeline uses the two-tailed GSEA (GSEA-2T) approach to test whether the regulon is positively or negatively associated with the phenotype. The tna.gsea2 function splits the regulon into positive and negative targets (based on Pearson’s correlation between the TF and its targets), and then assesses the distribution of the targets in the ranked list of genes.
#Run the GSEA-2T method
#Please, set nPermutations >= 1000
rtna <- tna.gsea2(rtna, stepFilter = FALSE, nPermutations = 100)
Setting what = 'gsea2' in the tna.get accessory function will retrieve a data frame listing the GSEA-2T statistics, and the corresponding GSEA plots can be generated using the tna.plot.gsea2 function.
#Get GSEA-2T results
gsea2 <- tna.get(rtna, what = "gsea2", ntop = -1)
head(gsea2$differential)
## Regulon Regulon.Size Observed.Score Pvalue Adjusted.Pvalue
## 1870 E2F2 299 1.07 0.009901 0.024752
## 9232 PTTG1 355 1.08 0.009901 0.024752
## 2305 FOXM1 231 0.39 0.188120 0.313530
## 860 RUNX2 92 -0.26 0.306930 0.383660
## 1871 E2F3 306 0.02 0.445540 0.445540
In GSEA-2T (Figure 3), a regulon’s positive and negative targets are considered separately as pos and neg gene sets, which are then evaluated against the phenotype. For each gene set (pos and neg), a walk down the ranked list is performed stepwise. When a gene in the gene set is found, its position is marked in the ranked list. A running sum, shown as the pink and blue lines (pos and neg gene sets, respectively), increases when the gene at that position belongs to the gene set and decreases when it does not. The maximum deviation of each running sum from the x-axis represents the enrichment score. GSEA-2T produces two per-phenotype enrichment scores (ES), whose difference (dES = ESpos - ESneg) represents the regulon activity. The goal is to assess whether the target genes are overrepresented among the genes that are more positively or negatively differentially expressed. A large positive dES indicates an induced (activated) regulon, while a large negative dES indicates a repressed regulon (please refer to Campbell et al. (2016) and Campbell et al. (2018) for cases illustrating the use of this approach; an extension of the GSEA-2T to single samples was implemented by Castro et al. (2016) and Groeneveld et al. (2019)).
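The dES computation can be sketched in a few lines of base R. running_es() and two_tailed_des() are hypothetical helpers; the running sum weights each hit by its absolute phenotype value (GSEA weight p = 1, an assumption), and the regulon is given as a named vector of signed weights like those returned by tni.get(..., what = "regulons.and.mode").

```r
# Sketch only: two-tailed GSEA (GSEA-2T) for one regulon.
# running_es() and two_tailed_des() are hypothetical helpers, not RTN functions.
running_es <- function(ranked_stats, gene_set) {
  in_set <- names(ranked_stats) %in% gene_set
  hit_w <- abs(ranked_stats) * in_set
  step <- ifelse(in_set, hit_w / sum(hit_w), -1 / sum(!in_set))
  running <- cumsum(step)
  running[which.max(abs(running))]              # signed maximum deviation
}

two_tailed_des <- function(phenotype, regulon) {
  ranked <- sort(phenotype, decreasing = TRUE)  # most up-regulated genes first
  pos <- names(regulon)[regulon > 0]            # positive targets (pos gene set)
  neg <- names(regulon)[regulon < 0]            # negative targets (neg gene set)
  es_pos <- running_es(ranked, pos)
  es_neg <- running_es(ranked, neg)
  c(ESpos = es_pos, ESneg = es_neg, dES = es_pos - es_neg)
}

set.seed(11)
genes <- paste0("g", 1:1000)
phenotype <- setNames(rnorm(1000), genes)       # toy log2 fold changes
regulon <- setNames(c(rep(0.2, 40), rep(-0.2, 40)), genes[1:80])
phenotype[1:40]  <- phenotype[1:40] + 2         # positive targets induced
phenotype[41:80] <- phenotype[41:80] - 2        # negative targets repressed
res <- two_tailed_des(phenotype, regulon)
res                                             # large positive dES: activated regulon
```

Swapping the two shifts (positive targets down, negative targets up) would instead produce a large negative dES, the pattern of a repressed regulon.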
#Plot GSEA-2T results
tna.plot.gsea2(rtna, labPheno="log2 fold changes", tfs="PTTG1")
Figure 3. Two-tailed GSEA analysis showing a regulon’s positive or negative targets (red/blue vertical bars) ranked by differential gene expression analysis (phenotype). This toy example illustrates the output from the TNA pipeline evaluated by the tna.gsea2 function (Campbell et al. (2016) and Campbell et al. (2018) provide examples on how to interpret results from this method).
Here we show how to prepare input data for the RTN package using publicly available mRNA-seq data, and clinical/molecular data for the TCGA-BRCA cohort. We show how to download harmonized GRCh38/hg38 data from the Genomic Data Commons (GDC) using the TCGAbiolinks package (Colaprico et al. 2016). The preprocessing will generate a SummarizedExperiment object that contains gene expression data, which we will then use to compute BRCA-specific regulons.
Please ensure you have installed all libraries before proceeding.
library(RTN)
library(TCGAbiolinks)
library(SummarizedExperiment)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(snow)
We’ll use the Bioconductor package TCGAbiolinks to query and download data from the GDC. We are looking for the harmonized, normalized RNA-seq data for the TCGA-BRCA cohort. TCGAbiolinks will create a directory called GDCdata in your working directory and save the files downloaded from the GDC into it. Each patient’s data are downloaded as a separate file, and we will use a subset of 500 cases for demonstration purposes only. As a large number of mRNA-seq text files will be downloaded, totaling 140 MB, this can take a while. Then, the GDCprepare function will compile the files into an R object of class RangedSummarizedExperiment. The RangedSummarizedExperiment has 6 slots. The most important are rowRanges (gene annotation), colData (sample annotation), and assays, which contains the gene expression matrix.
# Get gene expression aligned against hg38
query <- GDCquery(project = "TCGA-BRCA",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - FPKM-UQ",
sample.type = c("Primary solid Tumor"))
# Get a subset from TCGA-BRCA for demonstration (n=500)
cases <- getResults(query, cols = "cases")
cases <- sample(cases, size = 500)
query <- GDCquery(project = "TCGA-BRCA",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - FPKM-UQ",
sample.type = c("Primary solid Tumor"),
barcode = cases)
GDCdownload(query)
tcgaBRCA_mRNA_data <- GDCprepare(query)
The object downloaded from the GDC contains gene-level expression data that includes both coding and noncoding genes (e.g. lncRNAs). We will filter these, retaining only genes annotated in the UCSC hg38 known gene list (~30,000 genes).
#-- Subset by known gene locations
geneRanges <- genes(TxDb.Hsapiens.UCSC.hg38.knownGene)
tcgaBRCA_mRNA_data <- subsetByOverlaps(tcgaBRCA_mRNA_data, geneRanges)
nrow(rowData(tcgaBRCA_mRNA_data))
## [1] ~30,000
Finally, we’ll simplify the column names in the gene annotation for better summarization in the subsequent RTN functions. Once this step has been run, the tcgaBRCA_mRNA_data object is ready for the TNI pipeline. Please ensure the tcgaBRCA_mRNA_data object is saved in an appropriate folder.
#-- Change column names in gene annotation for better summarizations
colnames(rowData(tcgaBRCA_mRNA_data)) <- c("ENSEMBL", "SYMBOL", "OG_ENSEMBL")
#-- Save the preprocessed data for subsequent analyses
save(tcgaBRCA_mRNA_data, file = "tcgaBRCA_mRNA_data_preprocessed.RData")
Next we will generate regulons for the TCGA-BRCA cohort following the same steps described in the Quick Start section. For the regulon reconstruction, we will use a comprehensive list of regulators available in the tfsData dataset, comprising 1612 TFs compiled from Lambert et al. (2018).
#-- Load TF annotation
data("tfsData")
#-- Check TF annotation:
#-- Intersect TFs from Lambert et al. (2018) with gene annotation
#-- from the TCGA-BRCA cohort
geneannot <- rowData(tcgaBRCA_mRNA_data)
regulatoryElements <- intersect(tfsData$Lambert2018$SYMBOL, geneannot$SYMBOL)
#-- Run the TNI constructor
rtni_tcgaBRCA <- tni.constructor(expData = tcgaBRCA_mRNA_data,
regulatoryElements = regulatoryElements)
To compute a large regulatory network we recommend using a multithreaded mode with the snow package. As minimum computational resources, we suggest a processor with >= 4 cores and RAM >= 8 GB per core (specific routines should be adjusted for the available resources). The makeCluster function sets the number of nodes to create on the local machine, making a cluster object available for the TNI-class methods. For example, it should take ~2h to complete the reconstruction of 1000 regulons from a gene expression matrix with ~30,000 genes and 500 samples when running on a 2.9 GHz Core i9-8950H workstation with 32GB DDR4 RAM. Please note that running large datasets in parallel can quickly run the system out of memory. We recommend monitoring the parallelization to avoid working too close to the memory ceiling; the parChunks argument available in the tni.permutation and tni.bootstrap functions can be used to adjust the job size sent for parallelization. For RNA-seq data we recommend using the non-parametric estimator of mutual information (the default option of the tni.permutation function).
#-- Compute the reference regulatory network by permutation and bootstrap analyses.
#-- Please, set 'spec' according to the available resources in your hardware
options(cluster=snow::makeCluster(spec=5, "SOCK"))
rtni_tcgaBRCA <- tni.permutation(rtni_tcgaBRCA, pValueCutoff = 10^-6)
rtni_tcgaBRCA <- tni.bootstrap(rtni_tcgaBRCA, nBootstraps = 200)
stopCluster(getOption("cluster"))
Next we run the ARACNe algorithm with eps = 0, which sets the threshold for removing the edge with the smallest MI value in each triplet (see comments above and the aracne function documentation). Zero is the most stringent MI threshold. A less stringent approach sets eps = NA, which estimates an MI threshold from the empirical null distribution computed in the permutation and bootstrap steps.
#-- Compute the DPI-filtered regulatory network
rtni_tcgaBRCA <- tni.dpi.filter(rtni_tcgaBRCA, eps = 0)
#-- Save the TNI object for subsequent analyses
save(rtni_tcgaBRCA, file="rtni_tcgaBRCA.RData")
Note1: Some level of missing annotation is expected, as not all gene symbols listed in the regulatoryElements might be available in the TCGA-BRCA preprocessed data. Also, data that are inconsistent with the calculation may be removed in the tni.constructor preprocessing; for example, if a gene’s expression does not vary across a cohort, it is not possible to associate this gene’s expression with the expression of other genes in the cohort. As an extreme case, genes that exhibit no variability (e.g., genes that are not expressed in any sample) are excluded from the analysis. For a summary of the resulting regulatory network we recommend using the tni.regulon.summary function (see the Quick Start section).
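The zero-variability case in Note1 can be illustrated with a simple variance filter. drop_invariant_genes() is a hypothetical helper and the toy matrix is made up; we have not reproduced the exact criterion tni.constructor applies, only the extreme case described above.

```r
# Sketch only: drop genes whose expression does not vary across samples.
# drop_invariant_genes() is a hypothetical helper, not an RTN function.
drop_invariant_genes <- function(expr, min_var = 0) {
  v <- apply(expr, 1, var, na.rm = TRUE)        # per-gene variance across samples
  expr[!is.na(v) & v > min_var, , drop = FALSE]
}

expr <- rbind(
  FOXM1 = c(2.1, 3.4, 1.8, 4.0),
  GENE0 = c(0, 0, 0, 0),                        # not expressed in any sample
  ESR1  = c(5.2, 1.1, 6.3, 0.9)
)
filtered <- drop_invariant_genes(expr)
rownames(filtered)
```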
Note2: The MI metric relies on a gene’s expression varying across a cohort. Large cohorts of tumour samples usually contain multiple molecular subtypes and typically provide good expression variability for building regulons. In contrast, more homogeneous sample sets may be more challenging to explore with regulons, and this may be the case with sets of normal, non-cancerous samples. We do not recommend computing regulons for sample sets of low variability.
Fletcher et al. (2013) reconstructed regulons for 809 transcription factors (TFs) using microarray transcriptomic data from the METABRIC breast cancer cohort (Curtis et al. 2012). Castro et al. (2016) found that 36 of these TF regulons were associated with genetic risk of breast cancer. The risk TFs were in two distinct clusters. The “cluster 1” risk TFs were associated with estrogen receptor-positive (ER+) breast cancer risk and comprise TFs such as ESR1, FOXA1, and GATA3, whereas the “cluster 2” risk TFs were associated with estrogen receptor-negative (ER-), basal-like breast cancer. Our goal here is to demonstrate how to generate regulon activity profiles for individual tumour samples using the regulons reconstructed by Fletcher et al. (2013).
Please ensure you have installed all libraries before proceeding. The Fletcher2013b package is available from the R/Bioconductor repository. Installing and then loading the Fletcher2013b data package will make available all data required for this case study. The rtni1st dataset is a pre-processed TNI-class object that includes the regulons reconstructed by Fletcher et al. (2013) and a gene expression matrix for the METABRIC cohort 1.
library(RTN)
library(Fletcher2013b)
library(pheatmap)
#-- Load 'rtni1st' data object, which includes regulons and expression profiles
data("rtni1st")
#-- A list of transcription factors of interest (here 36 risk-associated TFs)
risk.tfs <- c("AFF3", "AR", "ARNT2", "BRD8", "CBFB", "CEBPB", "E2F2", "E2F3", "ENO1", "ESR1", "FOSL1", "FOXA1", "GATA3", "GATAD2A", "LZTFL1", "MTA2", "MYB", "MZF1", "NFIB", "PPARD", "RARA", "RB1", "RUNX3", "SNAPC2", "SOX10", "SPDEF", "TBX19", "TCEAL1", "TRIM29", "XBP1", "YBX1", "YPEL3", "ZNF24", "ZNF434", "ZNF552", "ZNF587")
Regulon activity profiles (RAPs) seek to characterize regulatory program similarities and differences between samples in a cohort. In order to assess a large number of samples, we implemented a function that computes the two-tailed GSEA for the entire cohort (additional details are provided in Groeneveld et al. (2019)). Briefly, for each regulon, the tni.gsea2 function estimates a regulon activity score for each sample available in the TNI-class object. For each gene in a sample, a differential gene expression value is calculated from its expression in the sample relative to its average expression in the cohort; the genes are then ordered as a ranked list representing a differential gene expression signature, which is used to run the GSEA-2T as explained for the tna.gsea2 method (see section 2.1).
#-- Compute regulon activity for individual samples
metabric_regact <- tni.gsea2(rtni1st, tfs = risk.tfs)
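The per-sample procedure can be sketched as follows: centre each gene on its cohort mean, rank the genes within each sample, and compute dES = ESpos - ESneg on that ranking. running_es() and sample_activity() are hypothetical helpers, the expression matrix is simulated, and RTN's own computation of the per-sample differential expression may differ in its details.

```r
# Sketch only: per-sample regulon activity in the spirit of tni.gsea2.
# running_es() and sample_activity() are hypothetical helpers, not RTN functions.
running_es <- function(ranked_stats, gene_set) {
  in_set <- names(ranked_stats) %in% gene_set
  hit_w <- abs(ranked_stats) * in_set
  step <- ifelse(in_set, hit_w / sum(hit_w), -1 / sum(!in_set))
  running <- cumsum(step)
  running[which.max(abs(running))]
}

sample_activity <- function(expr, regulon) {
  centred <- expr - rowMeans(expr)              # each gene relative to its cohort average
  pos <- names(regulon)[regulon > 0]
  neg <- names(regulon)[regulon < 0]
  apply(centred, 2, function(x) {               # one ranked signature per sample
    ranked <- sort(x, decreasing = TRUE)
    running_es(ranked, pos) - running_es(ranked, neg)   # dES for this sample
  })
}

set.seed(3)
genes <- paste0("g", 1:500)
expr <- matrix(rnorm(500 * 20), nrow = 500,
               dimnames = list(genes, paste0("s", 1:20)))
regulon <- setNames(c(rep(0.3, 30), rep(-0.3, 30)), genes[1:60])
active <- 1:10                                  # samples where the regulon is "on"
expr[1:30, active]   <- expr[1:30, active] + 1.5
expr[31:60, active]  <- expr[31:60, active] - 1.5
expr[1:30, -active]  <- expr[1:30, -active] - 1.5
expr[31:60, -active] <- expr[31:60, -active] + 1.5
activity <- sample_activity(expr, regulon)
round(activity, 2)
```

Samples s1 to s10, where the positive targets were raised and the negative targets lowered, receive positive activity, and the remaining samples receive negative activity; this is the kind of contrast that appears as blocks in a RAP heatmap.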
The tni.gsea2 function returns a list with the calculated enrichment scores: ESpos, ESneg and dES, the latter representing the regulon activity of individual samples. Next, the pheatmap function is used to generate a heatmap showing RAPs along with some sample attributes (Figure 4).
#-- Get sample attributes from the 'rtni1st' dataset
metabric_annot <- tni.get(rtni1st, "colAnnotation")
#-- Get ER+/- and PAM50 attributes for pheatmap
attribs <- c("LumA","LumB","Basal","Her2","Normal","ER+","ER-")
metabric_annot <- metabric_annot[,attribs]
#-- Plot regulon activity profiles
pheatmap(t(metabric_regact$dif),
main="METABRIC cohort 1 (n=977 samples)",
annotation_col = metabric_annot,
show_colnames = FALSE, annotation_legend = FALSE,
clustering_method = "ward.D2", fontsize_row = 6,
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation")
Figure 4. Regulon activity profiles of individual tumour samples from the METABRIC cohort 1 (for an example where this strategy has been used, please see Robertson et al. (2017)).
The tni.replace.samples function is the entry point to assess new samples with previously calculated regulons. This function requires an existing TNI-class object, replacing its gene expression matrix and sample annotation. The tni.replace.samples function checks all the given arguments, especially the consistency of gene annotation between datasets. Next we show how to generate RAPs for TCGA-BRCA samples using the regulons reconstructed by Fletcher et al. (2013).
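#-- Load the preprocessed TCGA-BRCA data saved in the previous case study
load("tcgaBRCA_mRNA_data_preprocessed.RData")
#-- Replace samples
rtni1st_tcgasamples <- tni.replace.samples(rtni1st, tcgaBRCA_mRNA_data)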
#-- Compute regulon activity for the new samples
tcga_regact <- tni.gsea2(rtni1st_tcgasamples, tfs = risk.tfs)
#-- Get sample attributes from the 'rtni1st_tcgasamples' dataset
tcga_annot <- tni.get(rtni1st_tcgasamples, "colAnnotation")
#-- Adjust PAM50 attributes for pheatmap
tcga_annot <- within(tcga_annot,{
'LumA' = ifelse(subtype_BRCA_Subtype_PAM50%in%c("LumA"),1,0)
'LumB' = ifelse(subtype_BRCA_Subtype_PAM50%in%c("LumB"),1,0)
'Basal' = ifelse(subtype_BRCA_Subtype_PAM50%in%"Basal",1,0)
'Her2' = ifelse(subtype_BRCA_Subtype_PAM50%in%c("Her2"),1,0)
'Normal' = ifelse(subtype_BRCA_Subtype_PAM50%in%c("Normal"),1,0)
})
attribs <- c("LumA","LumB","Basal","Her2","Normal")
tcga_annot <- tcga_annot[,attribs]
#-- Plot regulon activity profiles
pheatmap(t(tcga_regact$dif),
main="TCGA-BRCA cohort subset (n=500 samples)",
annotation_col = tcga_annot,
show_colnames = FALSE, annotation_legend = FALSE,
clustering_method = "ward.D2", fontsize_row = 6,
clustering_distance_rows = "correlation",
clustering_distance_cols = "correlation")
Figure 5. Regulon activity profiles of individual tumour samples for a subset (n=500) of the TCGA-BRCA cohort using regulons reconstructed by Fletcher et al. (2013) (for an example where this strategy has been used, please see Corces et al. (2018)).
When using RTN in publications, please cite:
Groeneveld CS, Chagas VS, Jones SJM, Robertson AG, Ponder BAJ, Meyer KB, Castro MAA. RTNsurvival: An R/Bioconductor package for regulatory network survival analysis. Bioinformatics, btz229, 2019. doi: 10.1093/bioinformatics/btz229
Chagas VS, Groeneveld CS, Oliveira KG, Trefflich S, de Almeida RC, Ponder BAJ, Meyer KB, Jones SJM, Robertson AG, Castro MAA. RTNduals: An R/Bioconductor package for analysis of co-regulation and inference of dual regulons. Bioinformatics, btz534, 2019. doi: 10.1093/bioinformatics/btz534
Castro MAA, de Santiago I, Campbell TM, Vaughn C, Hickey TE, Ross E, Tilley WD, Markowetz F, Ponder BAJ, Meyer KB. Regulators of genetic risk of breast cancer identified by integrative network analysis. Nature Genetics, 48(1):12-21, 2016. doi: 10.1038/ng.3458
Fletcher MNC, Castro MAA, Wang X, de Santiago I, O’Reilly M, Chin S, Rueda OM, Caldas C, Ponder BAJ, Markowetz F, Meyer KB. Master regulators of FGFR2 signalling and breast cancer risk. Nature Communications, 4:2464, 2013. doi: 10.1038/ncomms3464
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.9-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.9-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RTN_2.8.2 BiocStyle_2.12.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 XVector_0.24.0
## [3] GenomeInfoDb_1.20.0 compiler_3.6.1
## [5] BiocManager_1.30.4 zlibbioc_1.30.0
## [7] bitops_1.0-6 class_7.3-15
## [9] mixtools_1.1.0 tools_3.6.1
## [11] minet_3.42.0 digest_0.6.20
## [13] evaluate_0.14 lattice_0.20-38
## [15] pkgconfig_2.0.2 Matrix_1.2-17
## [17] igraph_1.2.4.1 DelayedArray_0.10.0
## [19] yaml_2.2.0 parallel_3.6.1
## [21] xfun_0.9 GenomeInfoDbData_1.2.1
## [23] e1071_1.7-2 stringr_1.4.0
## [25] knitr_1.24 S4Vectors_0.22.0
## [27] IRanges_2.18.2 stats4_3.6.1
## [29] segmented_1.0-0 grid_3.6.1
## [31] data.table_1.12.2 Biobase_2.44.0
## [33] snow_0.4-3 BiocParallel_1.18.1
## [35] survival_2.44-1.1 rmarkdown_1.15
## [37] bookdown_0.13 limma_3.40.6
## [39] RedeR_1.32.2 viper_1.18.1
## [41] magrittr_1.5 matrixStats_0.54.0
## [43] GenomicRanges_1.36.0 htmltools_0.3.6
## [45] BiocGenerics_0.30.0 MASS_7.3-51.4
## [47] splines_3.6.1 SummarizedExperiment_1.14.1
## [49] KernSmooth_2.23-15 stringi_1.4.3
## [51] RCurl_1.95-4.12
Campbell, Thomas M., Mauro A. A. Castro, Kelin Gonçalves de Oliveira, Bruce A. J. Ponder, and Kerstin B. Meyer. 2018. “ERα Binding by Transcription Factors Nfib and Ybx1 Enables Fgfr2 Signaling to Modulate Estrogen Responsiveness in Breast Cancer.” Cancer Research 78 (2):410–21. https://doi.org/10.1158/0008-5472.CAN-17-1153.
Campbell, Thomas, Mauro Castro, Ines de Santiago, Michael Fletcher, Silvia Halim, Radhika Prathalingam, Bruce Ponder, and Kerstin Meyer. 2016. “FGFR2 Risk Snps Confer Breast Cancer Risk by Augmenting Oestrogen Responsiveness.” Carcinogenesis 37 (8):741. https://doi.org/10.1093/carcin/bgw065.
Carro, Maria, Wei Lim, Mariano Alvarez, Robert Bollo, Xudong Zhao, Evan Snyder, Erik Sulman, et al. 2010. “The Transcriptional Network for Mesenchymal Transformation of Brain Tumours.” Nature 463 (7279):318–25. https://doi.org/10.1038/nature08712.
Castro, Mauro, Ines de Santiago, Thomas Campbell, Courtney Vaughn, Theresa Hickey, Edith Ross, Wayne Tilley, Florian Markowetz, Bruce Ponder, and Kerstin Meyer. 2016. “Regulators of Genetic Risk of Breast Cancer Identified by Integrative Network Analysis.” Nature Genetics 48:12–21. https://doi.org/10.1038/ng.3458.
Castro, Mauro, Xin Wang, Michael Fletcher, Kerstin Meyer, and Florian Markowetz. 2012. “RedeR: R/Bioconductor Package for Representing Modular Structures, Nested Networks and Multiple Levels of Hierarchical Associations.” Genome Biology 13 (4):R29. https://doi.org/10.1186/gb-2012-13-4-r29.
Colaprico, Antonio, Tiago C. Silva, Catharina Olsen, Luciano Garofano, Claudia Cava, Davide Garolini, Thais S. Sabedot, et al. 2016. “TCGAbiolinks: An R/Bioconductor Package for Integrative Analysis of Tcga Data.” Nucleic Acids Research 44 (8):e71. https://doi.org/10.1093/nar/gkv1507.
Corces, M. Ryan, Jeffrey M. Granja, Shadi Shams, Bryan H. Louie, Jose A. Seoane, Wanding Zhou, Tiago C. Silva, et al. 2018. “The Chromatin Accessibility Landscape of Primary Human Cancers.” Edited by Rehan Akbani, Christopher C. Benz, Evan A. Boyle, Bradley M. Broom, Andrew D. Cherniack, Brian Craft, John A. Demchok, et al. Science 362 (6413). https://doi.org/10.1126/science.aav1898.
Csardi, Gabor. 2019. Igraph: Network Analysis and Visualization. https://CRAN.R-project.org/package=igraph.
Curtis, Christina, Sohrab Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar Rueda, Mark Dunning, Doug Speed, et al. 2012. “The Genomic and Transcriptomic Architecture of 2,000 Breast Tumours Reveals Novel Subgroups.” Nature 486:346–52. https://doi.org/10.1038/nature10983.
Fletcher, Michael, Mauro Castro, Suet-Feung Chin, Oscar Rueda, Xin Wang, Carlos Caldas, Bruce Ponder, Florian Markowetz, and Kerstin Meyer. 2013. “Master Regulators of FGFR2 Signalling and Breast Cancer Risk.” Nature Communications 4:2464. https://doi.org/10.1038/ncomms3464.
Groeneveld, Clarice S, Vinicius S Chagas, Steven J M Jones, A Gordon Robertson, Bruce A J Ponder, Kerstin B Meyer, and Mauro A A Castro. 2019. “RTNsurvival: an R/Bioconductor package for regulatory network survival analysis.” Bioinformatics, March. https://doi.org/10.1093/bioinformatics/btz229.
Lafitte, Frederic, Gianluca Bontempi, and Patrick Meyer. 2008. “Minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information.” BMC Bioinformatics 9 (1):461. https://doi.org/10.1186/1471-2105-9-461.
Lambert, S A, A Jolma, L F Campitelli, P K Das, Y Yin, M Albu, X Chen, J Taipale, T R Hughes, and M T Weirauch. 2018. “The Human Transcription Factors.” Cell 172 (4):650–65. https://doi.org/10.1016/j.cell.2018.01.029.
Margolin, Adam, Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky, Riccardo Favera, and Andrea Califano. 2006. “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context.” BMC Bioinformatics 7 (Suppl 1):S7. https://doi.org/10.1186/1471-2105-7-S1-S7.
Margolin, Adam, Kai Wang, Wei Keat Lim, Manjunath Kustagi, Ilya Nemenman, and Andrea Califano. 2006. “Reverse Engineering Cellular Networks.” Nature Protocols 1 (2):662–71. https://doi.org/10.1038/nprot.2006.106.
R Core Team. 2012. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.
Robertson, A G, J Kim, H Al-Ahmadie, J Bellmunt, G Guo, A D Cherniack, T Hinoue, et al. 2017. “Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer.” Cell 171 (3):540–56. https://doi.org/10.1016/j.cell.2017.09.007.
Subramanian, Aravind, Pablo Tamayo, Vamsi Mootha, Sayan Mukherjee, Benjamin Ebert, Michael Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences of the United States of America 102 (43):15545–50. https://doi.org/10.1073/pnas.0506580102.