MicroRNAs (miRNAs) are a class of small non-coding RNAs (21-25 nucleotides long) that have been associated with post-transcriptional gene silencing since their discovery in C. elegans as regulators of larval development (R. C. Lee, Feinbaum, & Ambros, 1993; Wightman, Ha, & Ruvkun, 1993). miRNAs play major roles in a variety of developmental and disease processes. However, one significant challenge in characterizing how miRNAs exert their post-transcriptional effects is identifying biologically relevant target mRNAs, because each miRNA has a one-to-many relationship with its putative target mRNAs. Most miRNA target prediction tools consider biophysical properties such as seed sequence complementarity and Gibbs free energy, but they do not take into account whether the miRNA is expressed in the tissue of interest.
Expression is an important parameter because, together with the physical properties discussed above, it determines whether a particular miRNA plays a functional role in the process of interest. Some tools do predict targets for a given miRNA from the statistical correlation between miRNA and mRNA expression values, but this approach has two caveats: a) far fewer miRNA expression datasets (microarray or RNA-seq) are available than mRNA expression datasets; b) outside human and mouse, miRNA expression data are scarce for model organisms such as D. melanogaster, which makes genome-wide prediction of miRNA function difficult.
One way to bypass this limitation and still use expression profiles to identify miRNA targets in a model organism such as Drosophila is to focus on intragenic miRNAs, which are located within protein-coding host genes. Intragenic miRNAs constitute approximately 60% of all miRNAs in Drosophila melanogaster (fruit fly), making them an important component of post-transcriptional gene regulation. Several reports have shown that the expression of intragenic miRNAs is highly correlated with the expression of their host gene mRNA (Baskerville & Bartel 2005; Karali et al. 2007; Kim & Kim 2007). Based on this correlation, host gene expression values can be used as a proxy for the expression of the intragenic miRNA (Tsang et al. 2007). Target prediction for intragenic miRNAs using host gene expression has been successfully implemented in humans with the HOCTAR algorithm (Gennarino et al. 2008; Gennarino et al. 2011). Because far larger datasets are available for mRNA expression than for miRNA expression, using the host gene as a proxy for the intragenic miRNA can substantially extend bioinformatic analyses and increase statistical power when predicting miRNA:mRNA target interactions. The resulting predictions are rooted not only in sequence-based target prediction algorithms but also in biologically relevant, inversely correlated expression patterns between a miRNA and its potential target mRNAs.
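To make the proxy argument concrete, the toy simulation below generates a host gene, an intragenic miRNA whose level tracks it, and a target mRNA that the miRNA represses. All values are simulated (hypothetical data, not taken from GEO) and the code only illustrates the reasoning; it is not part of IntramiRExploreR.

```r
# Toy simulation with hypothetical data: the miRNA is co-transcribed with its
# host gene and represses a target mRNA
set.seed(1)
n_samples <- 30
host      <- rnorm(n_samples, mean = 8, sd = 1)             # host gene mRNA (log2 scale)
mirna     <- host + rnorm(n_samples, sd = 0.3)              # intragenic miRNA tracks its host
target    <- 10 - 0.8 * mirna + rnorm(n_samples, sd = 0.3)  # target repressed by the miRNA
unrelated <- rnorm(n_samples, mean = 8, sd = 1)             # gene with no regulatory link

# In real data the miRNA profile is often missing, so the host gene stands in for it:
# a repressed target shows up as negatively correlated with the host gene
r <- c(host_vs_mirna     = cor(host, mirna),
       host_vs_target    = cor(host, target),
       host_vs_unrelated = cor(host, unrelated))
round(r, 2)
```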
IntramiRExploreR predicts targets of intragenic miRNAs in fruit flies using two correlation methods, Pearson correlation and distance correlation, applied to the Affymetrix microarray data available in the Gene Expression Omnibus (GEO) database. In addition to target prediction, the tool integrates Gene Ontology functional analysis through FGNet (Bioconductor), data from NCBI, and network visualisation through igraph.
IntramiRExploreR is currently available from its GitHub repository and can be installed as follows:
# install.packages("devtools")  # run once if devtools is not yet installed
library("devtools")
devtools::install_github("sbhattacharya3/IntramiRExploreR")
library("IntramiRExploreR")
IntramiRExploreR requires R (>= 3.1.2). To use the DAVID functionality for Gene Ontology functional classification (called from the GetGOS_ALL function), users must also install the RDAVIDWebService package, following the instructions at: http://stackoverflow.com/questions/31480579/r-david-webservice-sudden-transport-error-301-error-moved-permanently.
To build the intragenic miRNA target database, we used Affymetrix platform 1 and 2 microarray datasets for D. melanogaster from the GEO database (Barrett et al., 2013). To ensure sufficient statistical power, only experiments with at least 5 assays were considered. The experiments were normalized with Robust Multichip Average (RMA) (Bolstad, Irizarry, Astrand, & Speed, 2003), as implemented in the affy package (Gautier, Cope, Bolstad, & Irizarry, 2004) of the Bioconductor suite (Gentleman et al., 2004) in R. Correlation was then computed between each host gene and every other gene in an experiment, using Pearson correlation (Lee Rodgers & Nicewander, 1988; Pearson, 1895) and distance correlation (Szekely & Rizzo, 2009).
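The two correlation measures can be sketched in base R as follows. This is a minimal illustration on a simulated expression matrix (hypothetical gene and sample names), not the package's own code. The helper `dcor_base` is a hypothetical name; it computes the sample distance correlation of Szekely & Rizzo from double-centred pairwise distance matrices. Pearson p-values come from `cor.test`; p-values for distance correlation are usually obtained by a permutation test, which is omitted here.

```r
# Sample distance correlation (Szekely & Rizzo) in base R; hypothetical helper
dcor_base <- function(x, y) {
  # Double-centre a distance matrix: subtract row and column means, add grand mean
  double_center <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  A <- double_center(as.matrix(dist(x)))
  B <- double_center(as.matrix(dist(y)))
  dcov2_xy <- max(mean(A * B), 0)          # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B)) # product of distance standard deviations
  if (denom == 0) return(0)
  sqrt(dcov2_xy / denom)
}

# Simulated log2 expression: 50 genes x 12 samples (hypothetical)
set.seed(42)
expr <- matrix(rnorm(50 * 12), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:12)))
# Treat gene1 as the host gene and make gene2 anti-correlated with it
expr["gene2", ] <- -expr["gene1", ] + rnorm(12, sd = 0.2)

# Correlate the host gene with every other gene in the experiment
host <- expr["gene1", ]
others <- setdiff(rownames(expr), "gene1")
res <- t(sapply(others, function(g) {
  pt <- cor.test(host, expr[g, ], method = "pearson")
  c(r = unname(pt$estimate), p = pt$p.value, dcor = dcor_base(host, expr[g, ]))
}))
res <- as.data.frame(res)
# Most negatively correlated genes first
head(res[order(res$r), ], 3)
```

Note that distance correlation is always non-negative and also detects non-linear dependence, so the sign of the relationship has to come from another statistic such as Pearson's r.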
After the correlation analysis, the Benjamini-Hochberg (BH) false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) is applied to the p-values obtained for each miRNA-mRNA pair within a given experiment, using the p.adjust function in R. To identify statistically significant, anti-correlated mRNA targets (correlation coefficient < 0) for a particular miRNA, all mRNAs with a q-value (FDR-adjusted p-value) below 0.01 are selected across all experiments. The 25% most frequently occurring mRNAs from these analyses are then compared with the targets predicted for the miRNA by sequence-based target databases (TargetScan, PITA, and miRanda). A gene that appears both in the output of the statistical tests and in a target database is considered a putative target of that miRNA. To prioritise the most important putative targets, a scoring system has also been designed. The score is the sum of 3 parameters:
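The filtering steps described above (BH adjustment within each experiment, selection of anti-correlated genes with q < 0.01, counting across experiments, keeping the top 25%, and intersecting with a target database) can be sketched in base R. The per-experiment results, gene names and database list are simulated and hypothetical. "Top 25%" is interpreted here as genes whose selection frequency reaches the 75th percentile, which may differ from the package's exact rule, and the three-parameter score is not reproduced.

```r
set.seed(7)
genes <- sprintf("gene%03d", 1:200)
true_targets <- genes[1:20]  # simulated genes anti-correlated with the host gene
n_experiments <- 8

# Hypothetical per-experiment correlation results (r, p) for one host gene
experiments <- lapply(seq_len(n_experiments), function(i) {
  r <- runif(length(genes), -0.3, 0.3)
  p <- runif(length(genes))
  hit <- genes %in% true_targets
  r[hit] <- runif(sum(hit), -0.95, -0.7)
  p[hit] <- runif(sum(hit), 0, 1e-5)
  data.frame(gene = genes, r = r, p = p, stringsAsFactors = FALSE)
})

# 1. BH-adjust p-values within each experiment; keep anti-correlated genes with q < 0.01
selected <- lapply(experiments, function(e) {
  q <- p.adjust(e$p, method = "BH")
  e$gene[e$r < 0 & q < 0.01]
})

# 2. Count how often each gene is selected across experiments; keep the top 25%
freq <- table(unlist(selected))
counts <- as.numeric(freq)
frequent <- names(freq)[counts >= quantile(counts, 0.75)]

# 3. Intersect with targets from a sequence-based database (hypothetical list)
database_targets <- c(true_targets[1:15], "gene150", "gene151")
putative <- intersect(frequent, database_targets)
putative
```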
The statistically predicted targets for a given miRNA of interest can be obtained with the miRTargets_Stat function and visualised with the Visualisation function.
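miR <- "dme-miR-12"
# Statistically predicted targets (union of both correlation methods, Affymetrix platform 1)
a <- miRTargets_Stat(miR, method = c("Both"), Platform = c("Affy1"), Text = FALSE)
# First four targets, first five columns
a[1:4, 1:5]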
## miRNA Target_GeneSymbol Targets_FBID Targets_CGID Score
## 2648 dme-mir-12 ACT42A FBGN0000043 CG12051 2.1349
## 2649 dme-mir-12 ACT57B FBGN0000044 CG10067 4.2699
## 2650 dme-mir-12 ADE2 FBGN0000052 CG9127 1.2699
## 2651 dme-mir-12 AOP FBGN0000097 CG3166 2.2699
The inputs to the function are one or more miRNAs, the statistical method used to predict targets, and the platform. Here the method is “Both”, the union of the Pearson and distance correlation results, and the platform is Affy1 (Affymetrix platform 1). The output contains the statistically significant targets, the score of each target, the GEO accession IDs of the experiments in which the miRNA and the target are correlated, and the function of each target gene from FlyBase.
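Because each target carries a score, a natural next step is to rank and filter the returned table. The sketch below rebuilds the four rows printed above as a stand-in for the data frame returned by miRTargets_Stat (first five columns only), so it runs without the package; with IntramiRExploreR installed you would use `a` directly. The cutoff of 2 is an arbitrary example value, not a recommendation from the package.

```r
# Stand-in for the first five columns of miRTargets_Stat() output shown above
a <- data.frame(
  miRNA             = "dme-mir-12",
  Target_GeneSymbol = c("ACT42A", "ACT57B", "ADE2", "AOP"),
  Targets_FBID      = c("FBGN0000043", "FBGN0000044", "FBGN0000052", "FBGN0000097"),
  Targets_CGID      = c("CG12051", "CG10067", "CG9127", "CG3166"),
  Score             = c(2.1349, 4.2699, 1.2699, 2.2699),
  stringsAsFactors  = FALSE
)

# Rank targets by score (highest first) and keep those at or above the cutoff
ranked <- a[order(a$Score, decreasing = TRUE), ]
top <- ranked[ranked$Score >= 2, ]
top$Target_GeneSymbol  # returns c("ACT57B", "AOP", "ACT42A")
```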
Similarly, genes_Stat is used to obtain statistically relevant miRNAs that target a gene of interest.
gene <- "Ank2"
a <- genes_Stat(gene, geneIDType = "GeneSymbol", method = c("Both"), Platform = c("Affy1"))
a[1:4, 1:5]
## Gene miRNA Gene_FBID Genes_CGID Score
## 1 ANK2 dme-mir-7 FBGN0261788 CG42734 4.0333
## 2 ANK2 dme-mir-12 FBGN0261788 CG42734 4.0815
## 3 ANK2 dme-mir-274 FBGN0261788 CG42734 4.7385
## 4 ANK2 dme-mir-283 FBGN0261788 CG42734 14.2509
genes_Stat has the same output format as miRTargets_Stat; the only difference is that it reports the miRNA function from FlyBase instead of the target gene function.
The Visualisation function has three output formats: a) text: writes the miRNA target results obtained from miRTargets_Stat in text format; b) Cytoscape: writes the results as Cytoscape input files; c) igraph: displays the miRNA:target gene results as a network. If no output format is chosen, a data frame containing the results is returned to the user.
miR <- c("dme-miR-12", "dme-miR-283")
a <- Visualisation(miR, mRNA_type = c("GeneSymbol"), method = c("Both"), platform = c("Affy1"),
                   visualisation = c("console"), thresh = 10)
a[1:10, 1:5]
## miRNA Target_GeneSymbol Targets_FBID Targets_CGID Score
## 3412 dme-mir-12 VMAT FBGN0260964 CG33528 10.1080
## 4311 dme-mir-12 CG14330 FBGN0038512 CG14330 9.5455
## 3242 dme-mir-12 CG14330 FBGN0038512 CG14330 9.4318
## 4395 dme-mir-12 A2BP1 FBGN0052062 CG32062 8.1717
## 3343 dme-mir-12 A2BP1 FBGN0052062 CG32062 8.0900
## 3017 dme-mir-12 CDGAPR FBGN0032821 CG10538 7.3239
## 3293 dme-mir-12 ASATOR FBGN0039908 CG11533 7.3239
## 3900 dme-mir-12 EX FBGN0004583 CG4114 7.1939
## 3954 dme-mir-12 ARF51F FBGN0013750 CG8156 6.6545
## 4525 dme-mir-12 CG17646 FBGN0264494 CG17646 6.3182
The inputs and output columns are the same as for miRTargets_Stat: one or more miRNAs, the statistical method (here “Both”, the union of the Pearson and distance correlation results), and the platform (here Affy1). With mRNA_type = "GeneSymbol" the targets are reported by gene symbol, and the visualisation argument selects the output format.
The output can be visualised using igraph.
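For readers who want to build the network themselves from the returned table, the sketch below writes miRNA:target pairs (taken from the dme-miR-12 rows printed above) as a tab-separated Cytoscape SIF file and, if the igraph package is installed, builds a directed graph with `igraph::graph_from_data_frame`. The file name and the interaction label "targets" are hypothetical choices; the files produced by the Cytoscape option of Visualisation may use a different layout.

```r
# miRNA:target pairs from the dme-miR-12 results shown above
edges <- data.frame(
  miRNA  = "dme-mir-12",
  target = c("VMAT", "CG14330", "A2BP1", "EX"),
  score  = c(10.1080, 9.5455, 8.1717, 7.1939),
  stringsAsFactors = FALSE
)

# Cytoscape SIF: one "source <tab> interaction <tab> target" line per edge
sif_file <- file.path(tempdir(), "mir12_targets.sif")
writeLines(paste(edges$miRNA, "targets", edges$target, sep = "\t"), sif_file)
sif_lines <- readLines(sif_file)

# Optional: the same network in igraph (first two columns are the edge ends,
# remaining columns become edge attributes)
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  print(igraph::degree(g, mode = "out"))
}
```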
Similarly, Gene_Visualisation returns the statistically relevant miRNAs that target a gene of interest, that is, the output of the genes_Stat function.
mRNA <- "Syb"
a <- Gene_Visualisation(mRNA, mRNA_type = c("GeneSymbol"), method = c("Pearson"),
                        platform = c("Affy1"), visualisation = "console")
# Only seven miRNAs are returned here, so head() avoids padding the display with NA rows
head(a[, 1:5], 10)
## Gene miRNA Gene_FBID Genes_CGID Score
## 2 SYB dme-mir-289 FBGN0003660 CG12210 3.2923
## 1 SYB dme-mir-274 FBGN0003660 CG12210 2.8125
## 3 SYB dme-mir-960 FBGN0003660 CG12210 2.6067
## 4 SYB dme-mir-1013 FBGN0003660 CG12210 2.3043
## 6 SYB dme-mir-2492 FBGN0003660 CG12210 1.6728
## 5 SYB dme-mir-2280 FBGN0003660 CG12210 1.2800
## 7 SYB dme-mir-2494 FBGN0003660 CG12210 1.2273
As with the Visualisation function, the output can be visualised using igraph.
The GetGOS_ALL function outputs functional network clusters using FGNet. Two GO analysis methods are available: topGO and DAVID.
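GO enrichment analyses such as topGO's typically test whether a GO term is over-represented among the predicted targets. The sketch below shows that core test for a single term with hypothetical counts: a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 table. This corresponds to topGO's "classic" Fisher test; topGO's other algorithms also take the GO graph structure into account, and the FGNet clustering performed by GetGOS_ALL is not reproduced here.

```r
# Hypothetical counts for one GO term
n_universe <- 10000  # annotated genes in the genome
n_term     <- 200    # genes annotated to the GO term
n_list     <- 100    # predicted target genes
n_overlap  <- 12     # target genes annotated to the term

# P(X >= n_overlap) when n_list genes are drawn at random from the universe
p_hyper <- phyper(n_overlap - 1, n_term, n_universe - n_term, n_list, lower.tail = FALSE)

# The same test as a one-sided Fisher's exact test on the 2x2 contingency table
tab <- matrix(c(n_overlap, n_list - n_overlap,
                n_term - n_overlap, n_universe - n_term - n_list + n_overlap),
              nrow = 2,
              dimnames = list(c("in_term", "not_in_term"), c("in_list", "not_in_list")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

expected <- n_list * n_term / n_universe  # genes expected in the term by chance
c(expected = expected, observed = n_overlap, p_hyper = p_hyper, p_fisher = p_fisher)
```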
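miR <- "dme-miR-12"
a <- Visualisation(miR, mRNA_type = c("GeneSymbol"), method = c("Both"), platform = c("Affy1"),
                   thresh = 100, visualisation = "console")
# Gene symbols of the predicted targets
genes <- a$Target_GeneSymbol
# topGO analysis on GO biological process terms; "C://" assumes Windows, adjust path elsewhere
GetGOS_ALL(genes, GO = c("topGO"), term = c("GO_BP"), path = "C://", filename = "test")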