multiMiR 1.22.0
microRNAs (miRNAs) regulate expression by promoting degradation or repressing translation of target transcripts. miRNA target sites have been cataloged in databases based on experimental validation and computational prediction using a variety of algorithms. Several online resources provide collections of multiple databases but need to be imported into other software, such as R, for processing, tabulation, graphing and computation. Currently available miRNA target site packages in R are limited in the number of databases, types of databases and flexibility.
The R package multiMiR, with web server at http://multimir.org, is a comprehensive collection of predicted and validated miRNA-target interactions and their associations with diseases and drugs. multiMiR includes several novel features not available in existing R packages.
The multiMiR package enables retrieval of miRNA-target interactions from 14 external databases in R without the need to visit all these databases. Advanced users can also submit SQL queries to the web server to retrieve results. See the publication on PubMed for additional detail on the database and its creation. The database is now versioned, so previous database versions can be used from the current R package; by default, the package uses the most recent version.
Warning: There are issues with merging target IDs from older, unmaintained databases. Databases that have been updated more recently (within the last 1-2 years) use current versions of annotated IDs. In each update, the old target IDs are carried over because there is no reliable method to disambiguate the original IDs against current IDs. Please keep this in mind when interpreting results from older databases that have not been updated. We continue to look at methods to resolve these ambiguities and improve target agreement between databases. You can use the unique() R function to identify and then remove duplicated target genes if needed.
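As a rough illustration of the deduplication step mentioned above, the sketch below works on a small hypothetical data frame (the rows are made up; only the column names follow the multiMiR output shown later in this document). It keeps one row per miRNA-target pair and is one possible approach, not the package's own method.

```r
# Hypothetical records; values are illustrative, not real query results.
hits <- data.frame(
  database        = c("mirecords", "mirtarbase", "mirtarbase", "tarbase"),
  mature_mirna_id = rep("hsa-miR-18a-3p", 4),
  target_symbol   = c("KRAS", "KRAS", "ATM", "ATM"),
  stringsAsFactors = FALSE
)

# unique() on the pair columns lists the distinct miRNA-target pairs.
pairs <- unique(hits[, c("mature_mirna_id", "target_symbol")])

# duplicated() flags repeats, so negating it keeps the first row of each pair.
dedup <- hits[!duplicated(hits[, c("mature_mirna_id", "target_symbol")]), ]

nrow(pairs)
nrow(dedup)
```

Which columns define a duplicate (symbol, Entrez ID, or Ensembl ID) is a choice the analyst has to make; the pair of columns used here is only an example.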
The multiMiR web server http://multimir.org hosts a database containing miRNA-target interactions from external databases. The package multiMiR provides functions to communicate with the multiMiR web server and its database. The multiMiR database is now versioned. By default, multiMiR uses the most recent version each time the package is loaded. However, it is now possible to switch between database versions and to get information about them. multimir_dbInfoVersions() returns a data frame with the available versions.
library(multiMiR)
## Warning: replacing previous import 'utils::findMatches' by 'S4Vectors::findMatches' when loading
## 'AnnotationDbi'
## Welcome to multiMiR.
##
## multiMiR database URL has been set to the
## default value: http://multimir.org/
##
## Database Version: 2.3.0 Updated: 2020-04-15
db.ver = multimir_dbInfoVersions()
db.ver
## VERSION UPDATED RDA DBNAME SCHEMA PUBLIC
## 1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda multimir2_3 multiMiR_DB_schema.sql 1
## 2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda multimir2_2 multiMiR_DB_schema.sql 1
## 3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda multimir2_1 multiMiR_DB_schema.sql 1
## 4 2.0.0 2015-05-01 multimir_cutoffs.rda multimir multiMiR_DB_schema.sql 1
## TABLES
## 1 multiMiR_dbTables.txt
## 2 multiMiR_dbTables.txt
## 3 multiMiR_dbTables.txt
## 4 multiMiR_dbTables.txt
To switch between versions we can use multimir_switchDBVersion().
vers_table <- multimir_dbInfoVersions()
vers_table
## VERSION UPDATED RDA DBNAME SCHEMA PUBLIC
## 1 2.3.0 2020-04-15 multimir_cutoffs_2.3.rda multimir2_3 multiMiR_DB_schema.sql 1
## 2 2.2.0 2017-08-08 multimir_cutoffs_2.2.rda multimir2_2 multiMiR_DB_schema.sql 1
## 3 2.1.0 2016-12-22 multimir_cutoffs_2.1.rda multimir2_1 multiMiR_DB_schema.sql 1
## 4 2.0.0 2015-05-01 multimir_cutoffs.rda multimir multiMiR_DB_schema.sql 1
## TABLES
## 1 multiMiR_dbTables.txt
## 2 multiMiR_dbTables.txt
## 3 multiMiR_dbTables.txt
## 4 multiMiR_dbTables.txt
multimir_switchDBVersion(db_version = "2.0.0")
## Now using database version: 2.0.0
curr_vers <- vers_table[1, "VERSION"] # current version
multimir_switchDBVersion(db_version = curr_vers)
## Now using database version: 2.3.0
The remaining functions will query the selected version until the package is reloaded or until we switch to another version.
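The example above takes the current version from the first row of the versions table. A slightly more defensive sketch, shown here on a hypothetical copy of two columns from that table, picks the most recently updated version by date instead of by row position. The data frame below is typed in by hand for illustration and is not fetched from the server.

```r
# Hypothetical copy of the VERSION and UPDATED columns shown above.
vers_table <- data.frame(
  VERSION = c("2.3.0", "2.2.0", "2.1.0", "2.0.0"),
  UPDATED = c("2020-04-15", "2017-08-08", "2016-12-22", "2015-05-01"),
  stringsAsFactors = FALSE
)

# Pick the most recently updated version by date, not by row position.
update_dates <- as.numeric(as.Date(vers_table$UPDATED))
latest <- vers_table$VERSION[which.max(update_dates)]
latest
```

With the real table, the resulting value could then be passed to multimir_switchDBVersion(db_version = latest).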
Information from each external database is stored in a table in the multiMiR database.
To see a list of the tables, we can use the multimir_dbTables() function.
db.tables = multimir_dbTables()
db.tables
## [1] "diana_microt" "elmmo" "map_counts" "map_metadata" "microcosm" "mir2disease"
## [7] "miranda" "mirdb" "mirecords" "mirna" "mirtarbase" "pharmaco_mir"
## [13] "phenomir" "pictar" "pita" "tarbase" "target" "targetscan"
To display the database schema, we can use the multimir_dbSchema() function. Following is only a portion of the full output.
## --
## -- Table structure for table `mirna`
## --
##
## DROP TABLE IF EXISTS `mirna`;
## CREATE TABLE `mirna` (
## mature_mirna_uid INTEGER UNSIGNED AUTO_INCREMENT, -- mature miRNA unique ID
## org VARCHAR(4) NOT NULL, -- organism abbreviation
## mature_mirna_acc VARCHAR(20) default NULL, -- mature miRNA accession
## mature_mirna_id VARCHAR(20) default NULL, -- mature miRNA ID/name
## PRIMARY KEY (mature_mirna_uid),
## KEY org (org),
## KEY mature_mirna_acc (mature_mirna_acc),
## KEY mature_mirna_id (mature_mirna_id)
##
## );
## --
## -- Table structure for table `target`
## --
##
## DROP TABLE IF EXISTS `target`;
## CREATE TABLE `target` (
## target_uid INTEGER UNSIGNED AUTO_INCREMENT, -- target gene unique ID
## org VARCHAR(4) NOT NULL, -- organism abbreviation
## target_symbol VARCHAR(80) default NULL, -- target gene symbol
## target_entrez VARCHAR(10) default NULL, -- target gene Entrez gene ID
## target_ensembl VARCHAR(20) default NULL, -- target gene Ensembl gene ID
## PRIMARY KEY (target_uid),
## KEY org (org),
## KEY target_symbol (target_symbol),
## KEY target_entrez (target_entrez),
## KEY target_ensembl (target_ensembl)
##
## );
## --
## -- Table structure for table `mirecords`
## --
##
## DROP TABLE IF EXISTS `mirecords`;
## CREATE TABLE `mirecords` (
## mature_mirna_uid INTEGER UNSIGNED NOT NULL, -- mature miRNA unique ID
## target_uid INTEGER UNSIGNED NOT NULL, -- target gene unique ID
## target_site_number INT(10) default NULL, -- target site number
## target_site_position INT(10) default NULL, -- target site position
## experiment VARCHAR(160) default NULL, -- supporting experiment
## support_type VARCHAR(40) default NULL, -- type of supporting experiment
## pubmed_id VARCHAR(10) default NULL, -- PubMed ID
## FOREIGN KEY (mature_mirna_uid)
## REFERENCES mirna(mature_mirna_uid)
## ON UPDATE CASCADE ON DELETE RESTRICT,
## FOREIGN KEY (target_uid)
## REFERENCES target(target_uid)
## ON UPDATE CASCADE ON DELETE RESTRICT
##
## );
## ......
##
## (Please note that only three of the 19 tables are shown here for demonstration
## purpose.)
The function multimir_dbInfo() will display information about the external miRNA and miRNA-target databases in multiMiR, including version, release date, link to download the data, and the corresponding table in multiMiR.
db.info = multimir_dbInfo()
db.info
## map_name source_name source_version source_date
## 1 diana_microt DIANA-microT 5 Sept, 2013
## 2 elmmo EIMMo 5 Jan, 2011
## 3 microcosm MicroCosm 5 Sept, 2009
## 4 mir2disease miR2Disease Mar 14, 2011
## 5 miranda miRanda Aug, 2010
## 6 mirdb miRDB 6 June, 2019
## 7 mirecords miRecords 4 Apr 27, 2013
## 8 mirtarbase miRTarBase 7.0 Sept, 2017
## 9 pharmaco_mir Pharmaco-miR (Verified Sets)
## 10 phenomir PhenomiR 2 Feb 15, 2011
## 11 pictar PicTar 2 Dec 21, 2012
## 12 pita PITA 6 Aug 31, 2008
## 13 tarbase TarBase 8 2018
## 14 targetscan TargetScan 7.2 March, 2018
## source_url
## 1 http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=microT_CDS/index
## 2 http://www.mirz.unibas.ch/miRNAtargetPredictionBulk.php
## 3 http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl
## 4 http://www.mir2disease.org
## 5 http://www.microrna.org/microrna/getDownloads.do
## 6 http://mirdb.org
## 7 http://mirecords.biolead.org/download.php
## 8 http://mirtarbase.mbc.nctu.edu.tw/php/download.php
## 9 http://www.pharmaco-mir.org/home/download_VERSE_db
## 10 http://mips.helmholtz-muenchen.de/phenomir/
## 11 http://dorina.mdc-berlin.de
## 12 http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
## 13 http://carolina.imis.athena-innovation.gr/diana_tools/web/index.php?r=tarbasev8%2Findex
## 14 http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_61
Among the 14 external databases, eight contain predicted miRNA-target interactions (DIANA-microT-CDS, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA, and TargetScan), three have experimentally validated miRNA-target interactions (miRecords, miRTarBase, and TarBase) and the remaining three contain miRNA-drug/disease associations (miR2Disease, Pharmaco-miR, and PhenomiR). To check these categories and databases from within R, we have a set of four helper functions:
predicted_tables()
## [1] "diana_microt" "elmmo" "microcosm" "miranda" "mirdb" "pictar"
## [7] "pita" "targetscan"
validated_tables()
## [1] "mirecords" "mirtarbase" "tarbase"
diseasedrug_tables()
## [1] "mir2disease" "pharmaco_mir" "phenomir"
reverse_table_lookup("targetscan")
## [1] "predicted"
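To make the idea of a reverse lookup concrete, here is a minimal base R sketch. The helper lookup_category() is hypothetical and not part of multiMiR; the table names are copied from the outputs above, and the category label "disease.drug" is an assumption that follows the type values used elsewhere in this document.

```r
# Table names per category, copied from the helper outputs above.
categories <- list(
  predicted    = c("diana_microt", "elmmo", "microcosm", "miranda",
                   "mirdb", "pictar", "pita", "targetscan"),
  validated    = c("mirecords", "mirtarbase", "tarbase"),
  disease.drug = c("mir2disease", "pharmaco_mir", "phenomir")
)

# Hypothetical helper (not part of multiMiR): map table names to categories.
lookup_category <- function(tables) {
  # stack() turns the named list into a two-column data frame (values, ind).
  long <- stack(categories)
  as.character(long$ind[match(tables, long$values)])
}

lookup_category(c("targetscan", "tarbase", "phenomir"))
```

Table names that do not belong to any category, such as mirna, come back as NA in this sketch.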
To see how many records are in these 14 external databases we refer to the multimir_dbCount() function.
db.count = multimir_dbCount()
db.count
## map_name human_count mouse_count rat_count total_count
## 1 diana_microt 7664602 3747171 0 11411773
## 2 elmmo 3959112 1449133 547191 5955436
## 3 microcosm 762987 534735 353378 1651100
## 4 mir2disease 2875 0 0 2875
## 5 miranda 5429955 2379881 247368 8057204
## 6 mirdb 1990425 1091263 199250 3280938
## 7 mirecords 2425 449 171 3045
## 8 mirtarbase 544588 50673 652 595913
## 9 pharmaco_mir 308 5 0 313
## 10 phenomir 15138 491 0 15629
## 11 pictar 404066 302236 0 706302
## 12 pita 7710936 5163153 0 12874089
## 13 tarbase 433048 209831 1307 644186
## 14 targetscan 13906497 10442093 0 24348590
apply(db.count[,-1], 2, sum)
## human_count mouse_count rat_count total_count
## 42826962 25371114 1349317 69547393
The current version of multiMiR contains nearly 70 million records.
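The per-database totals can also be grouped by category. The sketch below copies the total_count column from the output above into a named vector by hand (so it does not contact the server) and sums it within the three categories described earlier; the category labels are an assumption that mirrors the type values used in this document.

```r
# total_count values copied from the multimir_dbCount() output above.
total_count <- c(diana_microt = 11411773, elmmo = 5955436, microcosm = 1651100,
                 mir2disease = 2875, miranda = 8057204, mirdb = 3280938,
                 mirecords = 3045, mirtarbase = 595913, pharmaco_mir = 313,
                 phenomir = 15629, pictar = 706302, pita = 12874089,
                 tarbase = 644186, targetscan = 24348590)

# Category of each table, following the grouping described earlier.
validated_tabs   <- c("mirecords", "mirtarbase", "tarbase")
diseasedrug_tabs <- c("mir2disease", "pharmaco_mir", "phenomir")
category <- ifelse(names(total_count) %in% validated_tabs, "validated",
            ifelse(names(total_count) %in% diseasedrug_tabs, "disease.drug",
                   "predicted"))

# Sum the record counts within each category.
by_category <- tapply(total_count, category, sum)
by_category
```

Predicted interactions account for the large majority of records, which is worth keeping in mind when choosing cutoffs for predicted tables.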
package:multiMiR - S3 and S4 classes
With the addition of multiMiR to Bioconductor, the return object has changed from an S3 (mmquery) to an S4 class (mmquery_bioc). The new S4 object has a similar structure to the prior version, except all returned datasets (validated, predicted, disease.drug) are now combined into a single dataset. To get only one type, filter on the type variable using the AnnotationDbi method discussed later or the base R approach (subset(myqry@data, myqry@data$type == "validated")).
For backwards compatibility, get_multimir() will return the old S3 object if the argument legacy.out = TRUE.
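A minimal sketch of working with the combined dataset, using a hypothetical data frame that stands in for the data slot of a query result (the rows are illustrative only). It shows filtering on a single type and splitting into one data frame per type, which is similar in spirit to the older three-dataset layout.

```r
# Hypothetical combined result; rows are illustrative only.
combined <- data.frame(
  database        = c("mirtarbase", "targetscan", "phenomir", "tarbase"),
  mature_mirna_id = c("hsa-miR-23b-3p", "hsa-miR-23b-3p",
                      "hsa-miR-23b-3p", "hsa-miR-30b-5p"),
  type            = c("validated", "predicted", "disease.drug", "validated"),
  stringsAsFactors = FALSE
)

# Keep a single type ...
validated_only <- subset(combined, type == "validated")

# ... or split into one data frame per type.
by_type <- split(combined, combined$type)
names(by_type)
```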
Features are now accessible using the S4 accessor operator @. Additionally, the AnnotationDbi accessor methods columns, keys, keytypes, and select all work for mmquery_bioc objects. See Section 7.6.
In addition to functions displaying database and table information, the
multiMiR package also provides the list_multimir()
function to list all the
unique miRNAs, target genes, drugs, and diseases in the multiMiR database. An
option for limiting the number of returned records has been added to help with
testing and exploration.
miRNAs   = list_multimir("mirna", limit = 10)
genes    = list_multimir("gene", limit = 10)
drugs    = list_multimir("drug", limit = 10)
diseases = list_multimir("disease", limit = 10)
# executes 2 separate queries, giving 20 results
head(miRNAs)
## mature_mirna_uid org mature_mirna_acc mature_mirna_id
## 1 7829 ath MIMAT0000184 ath-miR163
## 2 7833 ath MIMAT0000201 ath-miR170-3p
## 3 7831 ath MIMAT0000206 ath-miR173-5p
## 4 7828 ath MIMAT0001001 ath-miR400
## 5 7837 ath MIMAT0001004 ath-miR403-3p
## 6 7830 ath MIMAT0001011 ath-miR408-3p
head(genes)
## target_uid org target_symbol target_entrez target_ensembl
## 1 112854 ath AT1G02860
## 2 112855 ath AT1G06180
## 3 112856 ath AT1G06580
## 4 112857 ath AT1G15125
## 5 112868 ath At1g31280
## 6 112869 ath At1g50055
head(drugs)
## drug
## 1 3,3'-diindolylmethane
## 2 5-fluoroucil
## 3 abt-737
## 4 alitretinoin
## 5 arabinocytosine
## 6 arsenic trioxide
head(diseases)
## disease
## 1 ACTH-INDEPENDENT MACRONODULAR ADRENAL HYPERPLASIA; AIMAH
## 2 ACUTE LYMPHOBLASTIC LEUKEMIA (ALL)
## 3 ACUTE MYELOGENEOUS LEUKEMIA (AML)
## 4 ACUTE MYELOID LEUKEMIA (AML)
## 5 ACUTE PROMYELOCYTIC LEUKEMIA (APL)
## 6 ADENOMA
The current version of multiMiR has 5830 miRNAs and 97186 target genes from human, mouse, and rat, as well as 64 drugs and 223 disease terms. Depending on the speed of your Internet connection, it may take a few minutes to retrieve the large number of target genes.
get_multimir() to query the multiMiR database
get_multimir() is the main function in the package to retrieve predicted and validated miRNA-target interactions and their disease and drug associations from the multiMiR database.
To get familiar with the parameters in get_multimir(), you can type ?get_multimir or help(get_multimir) in R. In the next section, many examples illustrate the use of the parameters.
This example shows the use of multiMiR alongside the edgeR Bioconductor package. Here we take microRNA data from ISS and ILS mouse strains and conduct a differential expression analysis. The top differentially expressed microRNAs are then used to search the multiMiR database for validated target genes.
library(edgeR)
## Loading required package: limma
library(multiMiR)
# Load data
counts_file  <- system.file("extdata", "counts_table.Rds", package = "multiMiR")
strains_file <- system.file("extdata", "strains_factor.Rds", package = "multiMiR")
counts_table   <- readRDS(counts_file)
strains_factor <- readRDS(strains_file)
# Standard edgeR differential expression analysis
design <- model.matrix(~ strains_factor)
# Using trended dispersions
dge <- DGEList(counts = counts_table)
dge <- calcNormFactors(dge)
dge$samples$strains <- strains_factor
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTrendedDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)
# Fit GLM model for strain effect
fit <- glmFit(dge, design)
lrt <- glmLRT(fit)
# Table of unadjusted p-values (PValue) and FDR values
p_val_DE_edgeR <- topTags(lrt, adjust.method = 'BH', n = Inf)
# Getting top differentially expressed miRNAs
top_miRNAs <- rownames(p_val_DE_edgeR$table)[1:10]
# Plug miRNAs into multiMiR and get validated targets
multimir_results <- get_multimir(org = 'mmu',
                                 mirna = top_miRNAs,
                                 table = 'validated',
                                 summary = TRUE)
## Searching mirecords ...
## Searching mirtarbase ...
## Searching tarbase ...
head(multimir_results@data)
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 mirecords MIMAT0000233 mmu-miR-200b-3p Zeb2 24136 ENSMUSG00000026872
## 2 mirecords MIMAT0000233 mmu-miR-200b-3p Flt1 14254 ENSMUSG00000029648
## 3 mirecords MIMAT0000153 mmu-miR-141-3p Dlx5 13395 ENSMUSG00000029755
## 4 mirecords MIMAT0000519 mmu-miR-200a-3p Dlx5 13395 ENSMUSG00000029755
## 5 mirecords MIMAT0000541 mmu-miR-96-5p Aqp5 11830 ENSMUSG00000044217
## 6 mirecords MIMAT0000541 mmu-miR-96-5p Celsr2 53883 ENSMUSG00000068740
## experiment support_type pubmed_id type
## 1 Western blot 17585049 validated
## 2 21115742 validated
## 3 Western blot//Luciferase activity assay 19454767 validated
## 4 Western blot//Luciferase activity assay 19454767 validated
## 5 19363478 validated
## 6 19363478 validated
This section describes a variety of examples of how to query the multiMiR database.
In the first example, we ask what genes are validated targets of hsa-miR-18a-3p.
# The default is to search validated interactions in human
example1 <- get_multimir(mirna = 'hsa-miR-18a-3p', summary = TRUE)
## Searching mirecords ...
## Searching mirtarbase ...
## Searching tarbase ...
names(example1)
## NULL
# Check which types of associations were returned
table(example1@data$type)
##
## validated
## 803
# Detailed information of the validated miRNA-target interaction
head(example1@data)
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 mirecords MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703
## 2 mirtarbase MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703
## 3 mirtarbase MIMAT0002891 hsa-miR-18a-3p G3BP1 10146 ENSG00000145907
## 4 mirtarbase MIMAT0002891 hsa-miR-18a-3p DHCR24 1718 ENSG00000116133
## 5 mirtarbase MIMAT0002891 hsa-miR-18a-3p POLR3G 10622 ENSG00000113356
## 6 mirtarbase MIMAT0002891 hsa-miR-18a-3p NFIC 4782 ENSG00000141905
## experiment support_type pubmed_id type
## 1 Western blot//Luciferase activity assay 19372139 validated
## 2 Luciferase reporter assay//qRT-PCR//Western blot Functional MTI 19372139 validated
## 3 CLASH Functional MTI (Weak) 23622248 validated
## 4 CLASH Functional MTI (Weak) 23622248 validated
## 5 PAR-CLIP Functional MTI (Weak) 23446348 validated
## 6 CLASH Functional MTI (Weak) 23622248 validated
# Which interactions are supported by Luciferase assay?
example1@data[grep("Luciferase", example1@data[, "experiment"]), ]
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 mirecords MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703
## 2 mirtarbase MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703
## 19 mirtarbase MIMAT0002891 hsa-miR-18a-3p CBX7 23492 ENSG00000100307
## 35 mirtarbase MIMAT0002891 hsa-miR-18a-3p ATM 472 ENSG00000149311
## experiment support_type pubmed_id
## 1 Western blot//Luciferase activity assay 19372139
## 2 Luciferase reporter assay//qRT-PCR//Western blot Functional MTI 19372139
## 19 Luciferase reporter assay//Western blot Functional MTI 28123848
## 35 Immunofluorescence//Luciferase reporter assay//qRT-PCR//Western blot Functional MTI 25963391
## type
## 1 validated
## 2 validated
## 19 validated
## 35 validated
example1@summary[example1@summary[,"target_symbol"] == "KRAS",]
## mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl mirecords
## 17 MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703 1
## mirtarbase tarbase validated.sum all.sum
## 17 1 0 2 2
It turns out that KRAS, CBX7, and ATM are the targets validated by Luciferase assay. The interaction with KRAS was recorded in miRecords and miRTarBase and supported by the same literature, whose PubMed ID is in column pubmed_id. The summary (by setting summary = TRUE when calling get_multimir()) shows the number of records in each of the external databases and the total number of databases supporting the interaction.
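To clarify what such a summary contains, the sketch below derives per-database record counts and the number of supporting databases from a small hypothetical long-format table. This is a guess at how a summary of this kind can be computed in base R, not multiMiR's actual implementation, and the rows are made up for illustration.

```r
# Hypothetical long-format records (one row per database record).
records <- data.frame(
  mature_mirna_id = rep("hsa-miR-18a-3p", 3),
  target_symbol   = c("KRAS", "KRAS", "ATM"),
  database        = c("mirecords", "mirtarbase", "mirtarbase"),
  stringsAsFactors = FALSE
)

# Count records per miRNA-target pair and database.
pair <- paste(records$mature_mirna_id, records$target_symbol, sep = " | ")
db   <- factor(records$database,
               levels = c("mirecords", "mirtarbase", "tarbase"))
counts <- table(pair, db)

# Number of databases with at least one record for each pair.
n_databases <- rowSums(counts > 0)
n_databases
```

Using a factor with all three validated tables as levels keeps a column for databases with zero records, which matches the layout of the summary shown above.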
In this example we would like to know which miRNAs and their target genes are associated with Cisplatin, a chemotherapy drug used in several cancers.
example2 <- get_multimir(disease.drug = 'cisplatin', table = 'disease.drug')
## Searching mir2disease ...
## Searching pharmaco_mir ...
## Searching phenomir ...
names(example2)
## NULL
nrow(example2@data)
## [1] 45
table(example2@data$type)
##
## disease.drug
## 45
head(example2@data)
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 pharmaco_mir MIMAT0000772 hsa-miR-345-5p ABCC1 4363 ENSG00000103222
## 2 pharmaco_mir MIMAT0000720 hsa-miR-376c-3p ALK7
## 3 pharmaco_mir MIMAT0000423 hsa-miR-125b-5p BAK1 578 ENSG00000030110
## 4 pharmaco_mir hsa-miR-34 BCL2 596 ENSG00000171791
## 5 pharmaco_mir MIMAT0000318 hsa-miR-200b-3p BCL2 596 ENSG00000171791
## 6 pharmaco_mir MIMAT0000617 hsa-miR-200c-3p BCL2 596 ENSG00000171791
## disease_drug paper_pubmedID type
## 1 cisplatin 20099276 disease.drug
## 2 cisplatin 21224400 disease.drug
## 3 cisplatin 21823019 disease.drug
## 4 cisplatin 18803879 disease.drug
## 5 cisplatin 21993663 disease.drug
## 6 cisplatin 21993663 disease.drug
get_multimir() returns 45 miRNA-target records. For more information, we can always refer to the published papers with PubMed IDs in column paper_pubmedID.
get_multimir() also takes target gene(s) as input. In this example we retrieve miRNAs predicted to target Gnb1 in mouse. For predicted interactions, the default is to query the top 20% predictions within each external database, which is equivalent to setting parameters predicted.cutoff = 20 and predicted.cutoff.type = 'p' (for percentage cutoff). Here we search the top 35% among all conserved and nonconserved target sites.
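The idea of a percentage cutoff applied within each database can be sketched in base R as follows. This is only a conceptual illustration on hypothetical data: top_percent() is not a multiMiR function, the scores are made up, and the sketch assumes that a larger score is better in every database, which may not hold for the real sources. In the package, the cutoffs are applied when the database is queried.

```r
# Hypothetical predictions; scores are illustrative and assume larger is better.
pred <- data.frame(
  database = rep(c("db_a", "db_b"), each = 5),
  mirna    = paste0("miR-", 1:10),
  score    = c(0.9, 0.8, 0.7, 0.6, 0.5, 95, 90, 60, 40, 10),
  stringsAsFactors = FALSE
)

# Keep the top `pct` percent of rows within each database, ranked by score.
top_percent <- function(df, pct) {
  kept <- lapply(split(df, df$database), function(d) {
    d <- d[order(d$score, decreasing = TRUE), ]
    n_keep <- ceiling(nrow(d) * pct / 100)
    head(d, n_keep)
  })
  do.call(rbind, kept)
}

top20 <- top_percent(pred, 20)
top35 <- top_percent(pred, 35)
nrow(top20)
nrow(top35)
```

Because the cutoff is applied per database, differently scaled scores (0 to 1 in one source, 0 to 100 in another) do not need to be normalized first.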
example3 <- get_multimir(org = "mmu",
                         target = "Gnb1",
                         table = "predicted",
                         summary = TRUE,
                         predicted.cutoff = 35,
                         predicted.cutoff.type = "p",
                         predicted.site = "all")
## Searching diana_microt ...
## Searching elmmo ...
## Searching microcosm ...
## Searching miranda ...
## Searching mirdb ...
## Searching pictar ...
## Searching pita ...
## Searching targetscan ...
names(example3)
## NULL
table(example3@data$type)
##
## predicted
## 716
head(example3@data)
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 diana_microt MIMAT0000663 mmu-miR-218-5p Gnb1 14688 ENSMUSG00000029064
## 2 diana_microt MIMAT0017276 mmu-miR-493-5p Gnb1 14688 ENSMUSG00000029064
## 3 diana_microt MIMAT0000656 mmu-miR-139-5p Gnb1 14688 ENSMUSG00000029064
## 4 diana_microt MIMAT0014946 mmu-miR-3074-2-3p Gnb1 14688 ENSMUSG00000029064
## 5 diana_microt MIMAT0000144 mmu-miR-132-3p Gnb1 14688 ENSMUSG00000029064
## 6 diana_microt MIMAT0020608 mmu-miR-5101 Gnb1 14688 ENSMUSG00000029064
## score type
## 1 0.975 predicted
## 2 0.964 predicted
## 3 0.96 predicted
## 4 0.921 predicted
## 5 0.92 predicted
## 6 0.918 predicted
head(example3@summary)
## mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl diana_microt
## 1 MIMAT0000133 mmu-miR-101a-3p Gnb1 14688 ENSMUSG00000029064 1
## 2 MIMAT0000616 mmu-miR-101b-3p Gnb1 14688 ENSMUSG00000029064 1
## 3 MIMAT0000663 mmu-miR-218-5p Gnb1 14688 ENSMUSG00000029064 1
## 4 MIMAT0003476 mmu-miR-669b-5p Gnb1 14688 ENSMUSG00000029064 1
## 5 MIMAT0017276 mmu-miR-493-5p Gnb1 14688 ENSMUSG00000029064 1
## 6 MIMAT0000144 mmu-miR-132-3p Gnb1 14688 ENSMUSG00000029064 1
## elmmo microcosm miranda mirdb pictar pita targetscan predicted.sum all.sum
## 1 2 1 1 0 2 0 0 5 5
## 2 2 1 1 0 2 0 0 5 5
## 3 4 0 0 1 0 2 1 5 5
## 4 0 0 1 2 0 1 2 5 5
## 5 4 0 0 2 2 0 1 5 5
## 6 2 0 0 2 0 0 2 4 4
The records in example3@data are ordered by scores from best to worst within each external database. Once again, the summary option allows us to examine the number of target sites predicted by each external database and the total number of databases predicting the interaction.
Finally we examine how many predictions each of the databases has.
apply(example3@summary[, 6:13], 2, function(x) sum(x > 0))
## diana_microt elmmo microcosm miranda mirdb pictar pita
## 105 51 5 43 37 9 132
## targetscan
## 177
You may have a list of genes involved in a common biological process. It is interesting to check whether some, or all, of these genes are targeted by the same miRNA(s). Here we have four genes involved in chronic obstructive pulmonary disease (COPD) in human and want to know what miRNAs target these genes by searching the top 500,000 predictions in each external database.
example4 <- get_multimir(org = 'hsa',
                         target = c('AKT2', 'CERS6', 'S1PR3', 'SULF2'),
                         table = 'predicted',
                         summary = TRUE,
                         predicted.cutoff.type = 'n',
                         predicted.cutoff = 500000)
## Number predicted cutoff (predicted.cutoff) 500000 is larger than the total number of records in table pictar. All records will be queried.
## Number predicted cutoff (predicted.cutoff) 500000 is larger than the total number of records in table targetscan. All records will be queried.
## Searching diana_microt ...
## Searching elmmo ...
## Searching microcosm ...
## Searching miranda ...
## Searching mirdb ...
## Searching pictar ...
## Searching pita ...
## Searching targetscan ...
Then we count the number of target genes for each miRNA.
example4.counts <- addmargins(table(example4@summary[, 2:3]))
example4.counts <- example4.counts[-nrow(example4.counts), ]
example4.counts <- example4.counts[order(example4.counts[, 5], decreasing = TRUE), ]
head(example4.counts)
## target_symbol
## mature_mirna_id AKT2 CERS6 S1PR3 SULF2 Sum
## hsa-miR-129-5p 0 1 2 1 4
## hsa-miR-330-3p 0 1 2 1 4
## hsa-miR-144-3p 0 1 2 0 3
## hsa-miR-3180-5p 0 1 2 0 3
## hsa-miR-325-3p 1 1 0 1 3
## hsa-miR-34a-5p 0 1 2 0 3
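In the table above a cell can exceed 1 (for example S1PR3 for hsa-miR-129-5p), so the Sum column counts summary rows rather than distinct genes. A minimal sketch, on hypothetical rows that imitate the two relevant summary columns, of counting distinct target symbols per miRNA instead:

```r
# Hypothetical summary rows; a gene may appear more than once per miRNA.
smry <- data.frame(
  mature_mirna_id = c(rep("hsa-miR-129-5p", 4), rep("hsa-miR-144-3p", 2)),
  target_symbol   = c("CERS6", "S1PR3", "S1PR3", "SULF2", "CERS6", "S1PR3"),
  stringsAsFactors = FALSE
)

# Count distinct target genes per miRNA, ignoring repeated rows for a gene.
n_genes <- tapply(smry$target_symbol, smry$mature_mirna_id,
                  function(x) length(unique(x)))
sort(n_genes, decreasing = TRUE)
```

Whether repeated rows should count once or several times depends on the question being asked; this variant answers "how many of my genes does each miRNA target".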
In this example, we profiled miRNA and mRNA expression in poorly metastatic bladder cancer cell lines T24 and Luc, and their metastatic derivatives FL4 and Lul2, respectively. We identified differentially expressed miRNAs and genes between the metastatic and poorly metastatic cells. Let’s load the data.
load(url("http://multimir.org/bladder.rda"))
Variable DE.miRNA.up contains 9 up-regulated miRNAs and variable DE.entrez.dn has 47 down-regulated genes in the two metastatic cell lines. The hypothesis is that interactions between these miRNAs and genes whose expression changed in opposite directions may play a role in cancer metastasis. So we use multiMiR to check whether any of the nine miRNAs could target any of the 47 genes.
# search all tables & top 10% predictions
example5 <- get_multimir(org = "hsa",
                         mirna = DE.miRNA.up,
                         target = DE.entrez.dn,
                         table = "all",
                         summary = TRUE,
                         predicted.cutoff.type = "p",
                         predicted.cutoff = 10,
                         use.tibble = TRUE)
## Searching mirecords ...
## Warning: `as_data_frame()` was deprecated in tibble 2.0.0.
## ℹ Please use `as_tibble()` (with slightly different semantics) to convert to a tibble, or
## `as.data.frame()` to convert to a data frame.
## ℹ The deprecated feature was likely used in the multiMiR package.
## Please report the issue at <https://github.com/KechrisLab/multiMiR/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
## Searching mirtarbase ...
## Searching tarbase ...
## Searching diana_microt ...
## Searching elmmo ...
## Searching microcosm ...
## Searching miranda ...
## Searching mirdb ...
## Searching pictar ...
## Searching pita ...
## Searching targetscan ...
## Searching mir2disease ...
## Searching pharmaco_mir ...
## Searching phenomir ...
## Warning in matrix(info[, !is.na(p.m)], ncol = sum(!is.na(p.m))): data length [917] is not a
## sub-multiple or multiple of the number of rows [153]
## Warning in cbind(info, predicted.sum = p.sum): number of rows of result is not a multiple of vector
## length (arg 2)
## Joining with `by = join_by(database, mature_mirna_acc, mature_mirna_id, target_symbol,
## target_entrez, target_ensembl, type)`
## Joining with `by = join_by(database, mature_mirna_acc, mature_mirna_id, target_symbol,
## target_entrez, target_ensembl, type)`
table(example5@data$type)
##
## disease.drug predicted validated
## 442 160 98
result <- select(example5, keytype = "type", keys = "validated", columns = columns(example5))
unique_pairs <- result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ]
result
## # A tibble: 98 × 13
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl experiment
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 mirtarbase MIMAT0000418 hsa-miR-23b-3p SWAP70 23075 ENSG000001337… PAR-CLIP
## 2 mirtarbase MIMAT0000418 hsa-miR-23b-3p RRAS2 22800 ENSG000001338… Luciferas…
## 3 mirtarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230 ENSG000001377… Proteomics
## 4 mirtarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998 ENSG000000640… pSILAC//P…
## 5 mirtarbase MIMAT0000087 hsa-miR-30a-5p PEG10 23089 ENSG000002422… PAR-CLIP
## 6 mirtarbase MIMAT0000259 hsa-miR-182-5p CUL5 8065 ENSG000001662… qRT-PCR
## 7 mirtarbase MIMAT0000245 hsa-miR-30d-5p PEG10 23089 ENSG000002422… PAR-CLIP
## 8 mirtarbase MIMAT0000420 hsa-miR-30b-5p PEG10 23089 ENSG000002422… PAR-CLIP
## 9 tarbase MIMAT0000420 hsa-miR-30b-5p KRAS 3845 ENSG000001337… Degradome…
## 10 tarbase MIMAT0000259 hsa-miR-182-5p SWAP70 23075 ENSG000001337… Degradome…
## # ℹ 88 more rows
## # ℹ 6 more variables: support_type <chr>, pubmed_id <chr>, type <chr>, score <chr>,
## # disease_drug <chr>, paper_pubmedID <chr>
In the result, there are 85 unique miRNA-gene pairs that have been validated.
mykeytype <- "disease_drug"
mykeys <- keys(example5, keytype = mykeytype)
mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]
result <- select(example5, keytype = "disease_drug", keys = mykeys,
                 columns = columns(example5))
result
## # A tibble: 3 × 13
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl experiment
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 mir2disease MIMAT0000418 hsa-miR-23b-3p NA NA NA <NA>
## 2 phenomir MIMAT0000418 hsa-miR-23b-3p NA NA NA <NA>
## 3 phenomir MIMAT0000449 hsa-miR-146a-5p NA NA NA <NA>
## # ℹ 6 more variables: support_type <chr>, pubmed_id <chr>, type <chr>, score <chr>,
## # disease_drug <chr>, paper_pubmedID <chr>
Two miRNAs are associated with bladder cancer in miR2Disease and PhenomiR.
The predicted databases predict 58 miRNA-gene pairs between 8 of the 9 miRNAs and 26 of the 47 genes.
predicted <- select(example5, keytype = "type", keys = "predicted",
                    columns = columns(example5))
length(unique(predicted$mature_mirna_id))
## [1] 8
length(unique(predicted$target_entrez))
## [1] 26
unique.pairs <- unique(data.frame(miRNA.ID = as.character(predicted$mature_mirna_id),
target.Entrez = as.character(predicted$target_entrez)))
nrow(unique.pairs)
## [1] 58
head(unique.pairs)
## miRNA.ID target.Entrez
## 1 hsa-miR-182-5p 1112
## 3 hsa-miR-182-5p 2017
## 5 hsa-miR-30b-5p 22998
## 6 hsa-miR-30a-5p 22998
## 9 hsa-miR-30d-5p 22998
## 11 hsa-miR-182-5p 5962
Results from each of the predicted databases are already ordered by their scores from best to worst.
example5.split <- split(predicted, predicted$database)
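Once the predictions are split by database, the first row of each piece can be taken as the top-scoring hit, provided the rows are already ordered as stated above. The sketch below uses a hypothetical data frame in place of the real result; scores are stored as character to mirror the tibble shown earlier, so they are converted before any numeric use.

```r
# Hypothetical predicted records; scores stored as character.
predicted <- data.frame(
  database        = c("diana_microt", "diana_microt", "elmmo", "elmmo"),
  mature_mirna_id = c("hsa-miR-182-5p", "hsa-miR-30b-5p",
                      "hsa-miR-30a-5p", "hsa-miR-30d-5p"),
  score           = c("0.975", "0.96", "0.849", "0.8"),
  stringsAsFactors = FALSE
)

by_db <- split(predicted, predicted$database)

# First row of each piece; the best hit only if rows are already ordered.
first_rows <- do.call(rbind, lapply(by_db, head, n = 1))
first_rows$score_num <- as.numeric(first_rows$score)
first_rows[, c("database", "mature_mirna_id", "score_num")]
```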
AnnotationDbi accessor methods can be used to select and filter the data returned by get_multimir().
# On example4's result
columns(example4)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
head(keys(example4))
## [1] "hsa-miR-4795-3p" "hsa-miR-126-5p" "hsa-miR-545-3p" "hsa-miR-3944-5p" "hsa-miR-5688"
## [6] "hsa-miR-137-3p"
keytypes(example4)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
mykeys <- keys(example4)[1:4]
head(select(example4, keys = mykeys,
            columns = c("database", "target_entrez")))
## database mature_mirna_id target_entrez
## 1 diana_microt hsa-miR-4795-3p 1903
## 2 diana_microt hsa-miR-126-5p 1903
## 3 diana_microt hsa-miR-545-3p 253782
## 4 diana_microt hsa-miR-3944-5p 55959
## 63 diana_microt hsa-miR-4795-3p 253782
## 425 elmmo hsa-miR-545-3p 253782
# On example3's result
columns(example3)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
head(keys(example3))
## [1] "mmu-miR-218-5p" "mmu-miR-493-5p" "mmu-miR-139-5p" "mmu-miR-3074-2-3p"
## [5] "mmu-miR-132-3p" "mmu-miR-5101"
keytypes(example3)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
mykeys <- keys(example3)[1:4]
head(select(example3, keys = mykeys,
            columns = c("database", "target_entrez", "score")))
## database mature_mirna_id target_entrez score
## 1 diana_microt mmu-miR-218-5p 14688 0.975
## 2 diana_microt mmu-miR-493-5p 14688 0.964
## 3 diana_microt mmu-miR-139-5p 14688 0.96
## 4 diana_microt mmu-miR-3074-2-3p 14688 0.921
## 106 elmmo mmu-miR-218-5p 14688 0.849
## 107 elmmo mmu-miR-218-5p 14688 0.849
# Search by gene on example4's result
columns(example4)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
keytypes(example4)
## [1] "database" "mature_mirna_acc" "mature_mirna_id" "score" "target_ensembl"
## [6] "target_entrez" "target_symbol" "type"
head(keys(example4, keytype = "target_entrez"))
## [1] "1903" "253782" "55959" "208" "286223"
mykeys <- keys(example4, keytype = "target_entrez")[1]
head(select(example4, keys = mykeys, keytype = "target_entrez",
            columns = c("database", "target_entrez", "score")))
## database target_entrez score
## 1 diana_microt 1903 1
## 2 diana_microt 1903 0.998
## 5 diana_microt 1903 0.994
## 19 diana_microt 1903 0.98
## 32 diana_microt 1903 0.964
## 56 diana_microt 1903 0.934
As shown previously, get_multimir() is the main function for retrieving information from the multiMiR database, which is hosted at http://multimir.org. The function builds one SQL query for each external database to be searched, submits the queries to the web server, and then parses, combines, and summarizes the results. Advanced users can query the multiMiR database in a couple of ways without using the multiMiR package, but doing so requires familiarity with SQL. In general, users are still advised to use the get_multimir() function when querying multiple external databases in multiMiR.
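The "one SQL query per external database" step described above can be illustrated with a minimal sketch. This is not the package's actual implementation: `build_queries()` is a hypothetical helper, the table names `mirtarbase` and `tarbase` are assumed by analogy with the `mirecords` table, and only columns from the `mirna` and `target` tables are selected, because the columns of the individual database tables differ.

```r
# Hypothetical helper (not part of multiMiR): build one SQL string per
# external-database table for a single mature miRNA ID.
# NOTE: the inputs are pasted into the SQL unescaped, so this sketch is only
# suitable for trusted, hard-coded values.
build_queries <- function(mirna_id, tables) {
  template <- paste(
    "SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol,",
    "t.target_entrez, t.target_ensembl",
    "FROM mirna AS m INNER JOIN %s AS i INNER JOIN target AS t",
    "ON (m.mature_mirna_uid=i.mature_mirna_uid AND i.target_uid=t.target_uid)",
    "WHERE m.mature_mirna_id='%s'"
  )
  # vapply over a character vector names each result after its table.
  vapply(tables, function(tbl) sprintf(template, tbl, mirna_id), character(1))
}

# Table names other than "mirecords" are assumptions for illustration.
queries <- build_queries("hsa-miR-18a-3p",
                         c("mirecords", "mirtarbase", "tarbase"))
cat(queries[["mirecords"]], "\n")
```

Each element of `queries` could then be submitted separately and the results combined, which is the bookkeeping that get_multimir() takes care of for the user.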
The multiMiR package communicates with the multiMiR database via the script http://multimir.org/cgi-bin/multimir_univ.pl on the web server. Once again, data from each of the external databases is stored in a table in multiMiR. There are also tables for miRNAs (table mirna) and target genes (table target).
NOTE: While short queries can be run from a browser, the limit on a query typed into the address bar (8192 characters total) is quickly reached. Developers should instead submit an HTTP POST using their preferred method, which allows longer queries. The fields to include are query and dbName. query is the SQL query to submit. dbName is the value of the DBNAME column returned by multimir_dbInfoVersions(); if it is omitted, the current version is used.
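As a rough illustration of the POST approach, the sketch below assembles the two fields and submits them with the httr package. The helper names `build_post_fields()` and `post_multimir_query()` are hypothetical, the use of httr is just one possible choice, and the sketch assumes the server accepts a form-encoded body. The response is returned as raw text because its format is not described here.

```r
# URL of the CGI script named in the text above.
multimir_url <- "http://multimir.org/cgi-bin/multimir_univ.pl"

# Hypothetical helper: assemble the form fields. `dbName` is left out when
# db_name is NULL, so the server falls back to the current database version.
build_post_fields <- function(sql, db_name = NULL) {
  fields <- list(query = sql)
  if (!is.null(db_name)) {
    fields$dbName <- db_name
  }
  fields
}

# Hypothetical helper: submit the fields as a form-encoded HTTP POST.
# Needs the httr package and network access; returns the raw response text.
post_multimir_query <- function(sql, db_name = NULL, url = multimir_url) {
  if (!requireNamespace("httr", quietly = TRUE)) {
    stop("Package 'httr' is needed for this sketch.")
  }
  resp <- httr::POST(url, body = build_post_fields(sql, db_name),
                     encode = "form")
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Building the fields works offline.
fields <- build_post_fields("describe diana_microt")
str(fields)

# Not run here (needs network access):
# raw <- post_multimir_query("describe diana_microt")
```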
To learn about the structure of a table (e.g. DIANA-microT data in table diana_microt), users can use the URL
http://multimir.org/cgi-bin/multimir_univ.pl?query=describe diana_microt
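Because the SQL contains spaces, a URL like this is safer to build programmatically. The sketch below percent-encodes the query with base R's `utils::URLencode()` and checks the result against the 8192-character limit mentioned above; `build_query_url()` is a hypothetical helper, not part of multiMiR.

```r
# Base URL of the CGI script, as given in the text above.
base_url <- "http://multimir.org/cgi-bin/multimir_univ.pl"

# Hypothetical helper: percent-encode an SQL string and append it as the
# `query` parameter of the script URL.
build_query_url <- function(sql, base = base_url) {
  paste0(base, "?query=", utils::URLencode(sql, reserved = TRUE))
}

describe_url <- build_query_url("describe diana_microt")
print(describe_url)

# Queries typed into an address bar must stay under the stated limit.
nchar(describe_url) <= 8192
```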
Similar to Example 1, the following URL searches for validated target genes of hsa-miR-18a-3p in miRecords.
As you can see, the query is long and searches just one of the three validated tables in multiMiR. In Example 1, by contrast, a single R command using the get_multimir() function searches, combines, and summarizes results from all three validated external databases (miRecords, miRTarBase and TarBase).
The same direct queries shown above for the web server can also be run from R. This is the preferred method if you are unfamiliar with HTTP POST. If you wish to change database versions, be sure to set the version before calling search_multimir(), because it uses the currently set version.
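The ordering constraint (set the version first, then query) can be captured in a small wrapper. This is only a sketch: `search_with_version()` is a hypothetical helper built on the package functions multimir_switchDBVersion() and search_multimir(), and the `VERSION` column used in the commented usage is assumed to be present in the table returned by multimir_dbInfoVersions().

```r
# Hypothetical wrapper: optionally switch the database version, then run a
# direct SQL query. The switch persists for later calls in the session.
search_with_version <- function(sql, db_version = NULL) {
  if (!requireNamespace("multiMiR", quietly = TRUE)) {
    stop("Package 'multiMiR' is needed for this sketch.")
  }
  if (!is.null(db_version)) {
    multiMiR::multimir_switchDBVersion(db_version = db_version)
  }
  multiMiR::search_multimir(query = sql)
}

# Not run here (needs the multiMiR package and network access):
# vers <- multiMiR::multimir_dbInfoVersions()  # table of available versions
# res  <- search_with_version("describe diana_microt",
#                             db_version = vers$VERSION[1])
```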
To show the structure of table diana_microt:
direct2 <- search_multimir(query = "describe diana_microt")
direct2
## Field Type Null Key Default Extra
## 1 mature_mirna_uid int unsigned NO MUL
## 2 target_uid int unsigned NO MUL
## 3 miTG_score double NO MUL
## 4 UTR3_hit int unsigned NO
## 5 CDS_hit int unsigned NO
To search for validated target genes of hsa-miR-18a-3p in miRecords:
qry <- "SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol,
               t.target_entrez, t.target_ensembl, i.experiment, i.support_type,
               i.pubmed_id
        FROM mirna AS m INNER JOIN mirecords AS i INNER JOIN target AS t
        ON (m.mature_mirna_uid=i.mature_mirna_uid and
            i.target_uid=t.target_uid)
        WHERE m.mature_mirna_id='hsa-miR-18a-3p'"
direct3 <- search_multimir(query = qry)
direct3
## mature_mirna_acc mature_mirna_id target_symbol target_entrez target_ensembl
## 1 MIMAT0002891 hsa-miR-18a-3p KRAS 3845 ENSG00000133703
## experiment support_type pubmed_id
## 1 Western blot//Luciferase activity assay 19372139
sessionInfo()
## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_GB
## [4] LC_COLLATE=C LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] edgeR_3.42.0 limma_3.56.0 multiMiR_1.22.0 knitr_1.42 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.40.0 xfun_0.39 bslib_0.4.2 Biobase_2.60.0
## [5] lattice_0.21-8 vctrs_0.6.2 tools_4.3.0 bitops_1.0-7
## [9] generics_0.1.3 stats4_4.3.0 tibble_3.2.1 fansi_1.0.4
## [13] AnnotationDbi_1.62.0 RSQLite_2.3.1 blob_1.2.4 pkgconfig_2.0.3
## [17] S4Vectors_0.38.0 lifecycle_1.0.3 GenomeInfoDbData_1.2.10 compiler_4.3.0
## [21] Biostrings_2.68.0 GenomeInfoDb_1.36.0 htmltools_0.5.5 sass_0.4.5
## [25] RCurl_1.98-1.12 yaml_2.3.7 pillar_1.9.0 crayon_1.5.2
## [29] jquerylib_0.1.4 cachem_1.0.7 tidyselect_1.2.0 locfit_1.5-9.7
## [33] digest_0.6.31 dplyr_1.1.2 purrr_1.0.1 bookdown_0.33
## [37] splines_4.3.0 fastmap_1.1.1 grid_4.3.0 cli_3.6.1
## [41] magrittr_2.0.3 XML_3.99-0.14 utf8_1.2.3 bit64_4.0.5
## [45] rmarkdown_2.21 XVector_0.40.0 httr_1.4.5 bit_4.0.5
## [49] png_0.1-8 memoise_2.0.1 evaluate_0.20 IRanges_2.34.0
## [53] rlang_1.1.0 Rcpp_1.0.10 glue_1.6.2 DBI_1.1.3
## [57] BiocManager_1.30.20 BiocGenerics_0.46.0 jsonlite_1.8.4 R6_2.5.1
## [61] zlibbioc_1.46.0
warnings()