Time Incorporated miR-mRNA Generation of Networks (TimiRGeN) is aimed at researchers who wish to explore interactions in time series microRNA(miR/miRNA)-mRNA expression data. This package integrates, functionally analyses and generates small networks for hypothesis generation. To achieve data reduction without reducing biological signal, the TimiRGeN package utilises several published packages and employs their functions in a synergistic fashion for time series multi-omic analysis. The following packages have been built upon for several functions in the TimiRGeN package:
rWikiPathways [1], clusterProfiler [2], DOSE [3], biomaRt [4], RCy3 [5], Mfuzz [6], igraph [7].
TimiRGeN is selective about its interaction data: it only uses miR-mRNA interactions from databases curated within the last 4 years, namely TargetScan [8], miRDB [9] and miRTarBase [10].
TimiRGeN can generate networks in R, but the package is also open-ended: its output can easily be exported to Cytoscape [11] or PathVisio [12] for better visualisation options.
TimiRGeN solely uses WikiPathways for functional pathway analysis, and is the first tool to allow this for time series data. WikiPathways is a user-curated pathway database that contains thousands of mechanistic signalling pathways from multiple species [13]. Furthermore, WikiPathways works very well with PathVisio, which is our recommended tool for GRN (gene regulatory network) design. Please read the [Pathvisio_GRN_guide.pdf](https://github.com/Krutik6/TimiRGeN/issues/2) for a step-by-step tutorial on the GRN creation process.
Currently the package can analyse most vertebrate model organisms, e.g. human, mouse, rat and zebrafish. It can analyse miR and mRNA data combined or separately, and can use either Entrez or Ensembl gene IDs for pathway analysis, because most WikiPathways are annotated with one of these two ID types. The tool is best used after differential expression (DE) analysis, and has the potential to become a staple part of any miR-mRNA expression data study. Several longitudinal analysis methods are included in the package, including correlation analysis, cross-correlation and hierarchical clustering. There are also options to investigate miRNA-mRNA pairs using regression analysis, and even multi-miRNA/mRNA regression prediction over time.
TimiRGeN dependencies loaded with the package can be further investigated from these sources [1-7, 14-24].
Depending on the data to be analysed, please load the appropriate org.db package, e.g. `org.Mm.eg.db` for mouse data or `org.Hs.eg.db` for human data.
In this section the combined method will be used to analyse a mouse kidney fibrosis data set. The mRNA data was published in Craciun et al (2016) [25] and was downloaded from GSE65267. The associated miR data was published in Pellegrini et al (2016) [26] and was downloaded from GSE61328.
Notice the standard nomenclature used in the column names. Please follow this standard for your own input data. The time point should come first and is followed by a `.`. The time point should consist of alphabetical characters followed by numerical characters, e.g. D1, H6, TP3. After the `.`, the column name should state the specific result type from differential expression analysis.
Note. There should only be one `.` in each column name and no `_` characters. Having more than one `.` or any `_` characters will confuse some functions.
Note. There should be no NAs in your miR and mRNA data files.
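The two notes above can be checked before any TimiRGeN function is called. The following is a minimal base R sketch, not part of the package: `check_timirgen_input` is a hypothetical helper, the regular expression is one possible reading of the naming rules described above, and the data values are made up.

```r
# Hypothetical helper (not a TimiRGeN function): reports column names that
# break the naming convention and whether the data contains any NAs.
check_timirgen_input <- function(df) {
  cols <- colnames(df)
  # Letters, then digits, then exactly one ".", then a result type
  # that contains neither "." nor "_".
  pattern <- "^[A-Za-z]+[0-9]+\\.[^._]+$"
  list(
    bad_names = cols[!grepl(pattern, cols)],
    has_na    = anyNA(df)
  )
}

# Made-up example data; the third column is deliberately wrong.
mrna <- data.frame(
  D1.Log2FC  = c(1.2, -0.4),
  D1.adjPVal = c(0.01, 0.20),
  D3_Log2FC  = c(0.8, NA),   # "_" instead of "." and an NA value
  row.names  = c("Acta2", "Col1a1")
)

check <- check_timirgen_input(mrna)
print(check$bad_names)  # column names that need renaming
print(check$has_na)     # TRUE if any NA must be removed first
```

Running a check like this on both the miR and the mRNA tables catches naming problems early, before they surface as harder-to-read errors further down the pipeline.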
TimiRGeN uses `MultiAssayExperiment` (MAE) objects to contain information. Dataframes and matrices are stored as `assays`, S4 objects are stored as `Experiments` and lists are stored as `metadata`.
If you are unfamiliar with MultiAssayExperiment objects, please read through this vignette to see how data can be accessed, or go through the user guide on the MultiAssayExperiment Bioconductor page [24].
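As a rough illustration of how such a container is accessed, the sketch below builds a toy MAE object directly with the MultiAssayExperiment package and reads it back. The object name `toy`, the matrix values and the metadata entry are assumptions made for the example, and this is not necessarily how TimiRGeN creates its own objects.

```r
# Guarded so the sketch is skipped if MultiAssayExperiment is not installed.
if (requireNamespace("MultiAssayExperiment", quietly = TRUE)) {
  library(MultiAssayExperiment)

  # Toy matrices standing in for miR and mRNA DE results (made-up values).
  mir <- matrix(c(1.1, -0.7, 0.01, 0.30), nrow = 2,
                dimnames = list(c("mmu-miR-21a-5p", "mmu-miR-29b-3p"),
                                c("D1.Log2FC", "D1.adjPVal")))
  mrna <- matrix(c(1.5, -0.2, 0.02, 0.60), nrow = 2,
                 dimnames = list(c("Acta2", "Col1a1"),
                                 c("D1.Log2FC", "D1.adjPVal")))

  toy <- MultiAssayExperiment(
    experiments = list(miR = mir, mRNA = mrna),
    metadata    = list(note = "toy example")
  )

  print(assay(toy, 1))            # first assay, here the miR matrix
  print(names(experiments(toy)))  # names of the stored experiments
  print(metadata(toy)[[1]])       # first metadata entry
}
```

Positional access such as `assay(MAE, 1)` or `metadata(MAE)[[1]]` is what the function calls in this vignette rely on, so the order in which results are added to the object matters.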
MAE <- getIdsMir(MAE = MAE, assay(MAE, 1), orgDB = org.Mm.eg.db, miRPrefix = "mmu")
MAE <- getIdsMrna(MAE = MAE, assay(MAE, 2), mirror = "useast", species = "mmusculus", orgDB = org.Mm.eg.db)
## [1] "BiomaRt server connection was established."
The `getIds` functions produce dataframes containing entrezgene and Ensembl ID annotations for genes. This is useful for downstream analysis.
Due to the nature of miRs, many NAs may be found in the output of the `getIdsMir` function. Entrezgene IDs and Ensembl IDs do not distinguish between the -3p and -5p strands of a miR. Therefore, adjusted entrezgene IDs and Ensembl IDs are also created.
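To see why gene-level IDs cannot tell the two strands apart, consider the simplified base R sketch below. It only strips the strand suffix from mature miR names to show that both strands point back to the same precursor; it does not reproduce how TimiRGeN builds its adjusted IDs, and the miR names are just examples.

```r
# Example mature miR names (illustrative only).
mirs <- c("mmu-miR-21a-5p", "mmu-miR-21a-3p", "mmu-miR-29b-3p")

# Dropping the strand suffix leaves a precursor-like name.
precursor <- sub("-[35]p$", "", mirs)

# Flag mature miRs that share a precursor with another entry, and would
# therefore share the same gene-level ID.
shared <- duplicated(precursor) | duplicated(precursor, fromLast = TRUE)

print(data.frame(mature = mirs, precursor_like = precursor,
                 shares_gene_id = shared))
```

Here both strands of miR-21a collapse onto one precursor-like name, which is the situation the adjusted IDs are meant to handle.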
Note. For the `getIdsMrna` function, if a connection time-out error occurs or if downloads are very slow, try a different mirror through the `mirror` argument, e.g. `mirror = "useast"`. In case the biomaRt servers do not connect, `clusterProfiler` will be used instead. Generally, the former method (biomaRt) finds more annotations.
MAE <- combineGenes(MAE = MAE, miR_data = assay(MAE, 1), assay(MAE, 2))
MAE <- genesList(MAE = MAE, method = "c", genetic_data = assay(MAE, 9), timeString = "D")
MAE <- significantVals(MAE = MAE, method = "c", geneList = metadata(MAE)[[1]], maxVal = 0.05, stringVal = "adjPVal")
mRNA and miR data can be combined using the `combineGenes` function.
The `genesList` function will transform the large dataframe into multiple nested dataframes within a list. The data will be separated by the `timeString` parameter. In this example the data is separated by `D` (days), because this was the non-numeric character before the `.` in the column names.
Significantly differentially expressed genes can be retrieved from each nested dataframe using the `significantVals` function. In this example, only genes which had an adjusted P value of less than `0.05` remain in the list.
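The split-then-filter logic described above can be sketched in base R. This is an illustration of the idea under assumed toy data, not the package's implementation; `keep_significant` and its arguments are hypothetical names chosen to mirror `maxVal` and `stringVal`.

```r
# Made-up combined miR + mRNA table with two time points.
combined <- data.frame(
  D1.Log2FC  = c(1.5, -0.2, 0.9),
  D1.adjPVal = c(0.01, 0.60, 0.04),
  D3.Log2FC  = c(2.1, -1.3, 0.1),
  D3.adjPVal = c(0.20, 0.03, 0.90),
  row.names  = c("Acta2", "Col1a1", "mmu-miR-21a-5p")
)

# The time point is the text before the single "." in each column name.
time_point <- sub("\\..*$", "", colnames(combined))

# One nested data frame per time point.
by_time <- lapply(split(colnames(combined), time_point),
                  function(cols) combined[, cols, drop = FALSE])

# Hypothetical filter: keep rows whose P value column is below max_val.
keep_significant <- function(df, max_val = 0.05, string_val = "adjPVal") {
  p_col <- grep(string_val, colnames(df), fixed = TRUE)
  df[df[[p_col]] < max_val, , drop = FALSE]
}

significant <- lapply(by_time, keep_significant)
print(significant)
```

Because the filter is applied per time point, a gene can be retained at one time point and dropped at another, which is what allows enrichment to be run separately for each time point later on.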
MAE <- addIds(MAE = MAE, method = "c", filtered_genelist = metadata(MAE)[[2]], miR_IDs = assay(MAE, 3), mRNA_IDs = assay(MAE, 7))
MAE <- eNames(MAE = MAE, method = "c", gene_IDs = metadata(MAE)[[3]])
The entrezgene IDs or Ensembl IDs which were created before can now be integrated into the filtered dataframes of genes using `addIds`. In this example entrezgene IDs were added.
Lists of Entrez IDs or Ensembl IDs can then be extracted for further analysis using the `eNames` function.
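Conceptually, these two steps attach an ID column to each per-time-point dataframe and then pull that column out as a plain vector. The base R sketch below shows the idea with made-up data; the lookup table and the ID values (`"E0001"`, `"E0002"`) are placeholders, not real Entrez IDs, and the code is not the package's implementation.

```r
# Made-up filtered results, one data frame per time point.
filtered <- list(
  D1 = data.frame(D1.Log2FC = c(1.5, 0.9),
                  row.names = c("Acta2", "Col1a1")),
  D3 = data.frame(D3.Log2FC = -1.3, row.names = "Col1a1")
)

# Placeholder lookup table (IDs are NOT real Entrez IDs).
ids <- data.frame(GENENAME = c("Acta2", "Col1a1"),
                  ID = c("E0001", "E0002"),
                  stringsAsFactors = FALSE)

# Step 1: add the matching ID to every row of every data frame.
with_ids <- lapply(filtered, function(df) {
  df$ID <- ids$ID[match(rownames(df), ids$GENENAME)]
  df
})

# Step 2: extract one ID vector per time point, dropping unmatched genes.
id_list <- lapply(with_ids, function(df) df$ID[!is.na(df$ID)])
print(id_list)
```

The resulting list of ID vectors, one per time point, is the shape of input that a per-time-point enrichment step needs.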
Once we have a list of significant genes per time point we can put this through gene set enrichment analysis to find enriched pathways in each time point in the data. TimiRGeN uses WikiPathways [13] for overrepresentation analysis.
This is standard overrepresentation analysis. Here the `enrichWiki` function wraps around enrichment functions from DOSE and clusterProfiler [2,3], but applies these functions for time series analysis with WikiPathways.
Note. Making multiple separate MAE objects makes it easier to work with all the generated data files.
MAE2 <- enrichWiki(MAE = MAE2, method = "c", ID_list = metadata(MAE)[[4]], orgDB = org.Mm.eg.db, path_gene = assay(MAE2, 1), path_name = assay(MAE2, 2), ID = "ENTREZID", universe = assay(MAE2, 1)[[2]])
`path_gene` and `path_name` can be found as output from the `dloadGmt` function.
For an alternative test, a unique `universe` can be used, e.g. all probes found in a microarray or all known genes expressed in a cell type. This is done in section 3.
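The effect of the `universe` choice can be seen with a small hypergeometric test, which is the kind of test that overrepresentation analysis is usually based on. The base R sketch below is an illustration only: `ora_p` is a hypothetical helper, the gene names are synthetic, and real enrichment functions additionally apply multiple-testing correction.

```r
# Hypothetical helper: one-sided hypergeometric P value for the overlap
# between significant genes and a pathway, given a background universe.
ora_p <- function(sig_genes, pathway_genes, universe) {
  sig_genes     <- intersect(sig_genes, universe)
  pathway_genes <- intersect(pathway_genes, universe)
  k <- length(intersect(sig_genes, pathway_genes))  # observed overlap
  # P(overlap >= k) when drawing length(sig_genes) genes from the universe.
  phyper(k - 1,
         length(pathway_genes),
         length(universe) - length(pathway_genes),
         length(sig_genes),
         lower.tail = FALSE)
}

# Synthetic gene sets: 20 pathway genes, 20 significant genes, 5 shared.
pathway <- paste0("g", 1:20)
sig     <- paste0("g", c(1:5, 51:65))

universe_all   <- paste0("g", 1:1000)  # e.g. every annotated gene
universe_small <- paste0("g", 1:100)   # e.g. only genes on the array

p_all   <- ora_p(sig, pathway, universe_all)
p_small <- ora_p(sig, pathway, universe_small)
print(c(all = p_all, small = p_small))
```

With the larger background the same overlap looks far more surprising than with the restricted one, which is why a universe that matches the genes actually measured gives a more conservative test.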