MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is a comprehensive life science vocabulary. It has 19 categories, 16 of which are contained in MeSH.db:
Abbreviation | Category |
---|---|
A | Anatomy |
B | Organisms |
C | Diseases |
D | Chemicals and Drugs |
E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
F | Psychiatry and Psychology |
G | Phenomena and Processes |
H | Disciplines and Occupations |
I | Anthropology, Education, Sociology and Social Phenomena |
J | Technology and Food and Beverages |
K | Humanities |
L | Information Science |
M | Persons |
N | Health Care |
V | Publication Type |
Z | Geographical Locations |
MeSH terms are associated with Entrez Gene IDs by three methods: `gendoo`, `gene2pubmed` and `RBBH` (Reciprocal BLAST Best Hit).
Method | How Entrez Gene IDs are mapped to MeSH IDs |
---|---|
Gendoo | Text mining |
gene2pubmed | Manual curation by NCBI teams |
RBBH | Sequence homology with BLASTP search (E-value < 1e-50) |
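To make the RBBH criterion concrete, here is a minimal base-R sketch of reciprocal best-hit pairing between the proteins of two species. It assumes the BLASTP results are already loaded as data frames with hypothetical columns `query`, `subject` and `evalue`, and it applies the E-value cutoff from the table above. It illustrates the idea only; it is not the pipeline used to build the MeSH annotation packages.

```r
# Best hit per query after applying the E-value cutoff.
# Columns `query`, `subject`, `evalue` are hypothetical BLASTP table fields.
best_hits <- function(hits, max_evalue = 1e-50) {
  hits <- hits[hits$evalue < max_evalue, ]
  hits <- hits[order(hits$query, hits$evalue), ]         # lowest E-value first
  hits[!duplicated(hits$query), c("query", "subject")]   # keep one hit per query
}

# Keep (a, b) only if b is a's best hit AND a is b's best hit.
reciprocal_best_hits <- function(a_vs_b, b_vs_a, max_evalue = 1e-50) {
  ab <- best_hits(a_vs_b, max_evalue)
  ba <- best_hits(b_vs_a, max_evalue)
  m <- merge(ab, ba, by.x = c("query", "subject"), by.y = c("subject", "query"))
  names(m) <- c("gene_a", "gene_b")
  m
}

# Toy example: a3 -> b2 is not reciprocal, a4 -> b1 fails the E-value cutoff.
a_vs_b <- data.frame(query   = c("a1", "a1", "a2", "a3", "a4"),
                     subject = c("b1", "b2", "b2", "b2", "b1"),
                     evalue  = c(1e-80, 1e-60, 1e-70, 1e-90, 1e-10),
                     stringsAsFactors = FALSE)
b_vs_a <- data.frame(query   = c("b1", "b2", "b2"),
                     subject = c("a1", "a1", "a2"),
                     evalue  = c(1e-80, 1e-65, 1e-70),
                     stringsAsFactors = FALSE)
rbh <- reciprocal_best_hits(a_vs_b, b_vs_a)
```

Only the pairs a1/b1 and a2/b2 survive: a3 points to b2, but b2's best hit back is a2.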
`meshes` supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of a gene list or a whole expression profile using MeSH annotation. All three data sources (`gendoo`, `gene2pubmed` and `RBBH`) are supported, and users can choose which of the 16 categories to test. The analysis supports more than 70 species, listed in the MeSHDb BiocView.
For algorithm details, please refer to the vignette of the DOSE package [1].
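Over-representation analysis in DOSE-style packages is usually described as a one-sided hypergeometric test. The sketch below applies that test with base R's `phyper` to the counts reported for the first term (D043171) in the `enrichMeSH` output of the next example: GeneRatio 16/96 and BgRatio 198/16528. Reading the GeneRatio denominator as "input genes with annotation" and the BgRatio denominator as "annotated background genes" is our assumption about how these columns are defined.

```r
# One-sided hypergeometric test for one MeSH term (counts from D043171 below).
k <- 16     # input genes annotated with the term
n <- 96     # input genes with any annotation in the tested category (assumed)
M <- 198    # background genes annotated with the term
N <- 16528  # annotated background genes (assumed)

# P(X >= k) where X ~ Hypergeometric(M annotated, N - M not annotated, n drawn)
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

gene_ratio <- paste0(k, "/", n)   # string form used in the GeneRatio column
bg_ratio   <- paste0(M, "/", N)   # string form used in the BgRatio column
fold       <- (k / n) / (M / N)   # enrichment over the background expectation
```

`p_value` can be compared with the `pvalue` column of `head(x)`; if meshes delegates to the same test, the two should agree.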
```r
library(meshes)
# geneList: example ranked gene list shipped with DOSE (names are Entrez Gene IDs)
data(geneList, package="DOSE")
# use the first 100 genes of the ranked list as the input gene set
de <- names(geneList)[1:100]
# over-representation test against category C (Diseases) using gendoo annotation
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C')
head(x)
## ID Description GeneRatio BgRatio pvalue
## D043171 D043171 Chromosomal Instability 16/96 198/16528 2.794765e-14
## D000782 D000782 Aneuploidy 17/96 320/16528 3.866830e-12
## D042822 D042822 Genomic Instability 16/96 312/16528 3.007419e-11
## D012595 D012595 Scleroderma, Systemic 11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms 11/96 314/16528 2.049315e-06
## D019698 D019698 Hepatitis C, Chronic 11/96 317/16528 2.246856e-06
## p.adjust qvalue
## D043171 2.434241e-11 1.794534e-11
## D000782 1.684004e-09 1.241456e-09
## D042822 8.731539e-09 6.436931e-09
## D012595 1.404343e-04 1.035288e-04
## D009303 3.261686e-04 2.404530e-04
## D019698 3.261686e-04 2.404530e-04
## geneID
## D043171 4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
## Count
## D043171 16
## D000782 17
## D042822 16
## D012595 11
## D009303 11
## D019698 11
```
In this over-representation analysis, we used the `gendoo` data source and the `C` (Diseases) category.
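The `p.adjust` column holds p-values corrected for testing many MeSH terms at once. Assuming the Benjamini-Hochberg method is used (the usual default in DOSE-style functions; check `?enrichMeSH` for the actual setting), the correction can be written out by hand as below. The p-values are toy numbers, not taken from the output above, and the function is checked against base R's `p.adjust`.

```r
# Benjamini-Hochberg adjustment written out by hand.
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  ranks <- n:1                              # ascending rank of each sorted p-value
  adj <- pmin(1, cummin(p[o] * n / ranks))  # p * n / rank, kept monotone from the top
  adj[order(o)]                             # restore the original order
}

p <- c(0.001, 0.04, 0.03, 0.2, 0.005)       # toy p-values
adj <- bh_adjust(p)
```

Each p-value is scaled by the number of tests divided by its rank, so a term's adjusted value grows with the total number of terms tested in the chosen category.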
In the following example, we use the `gene2pubmed` data source and test category `G` (Phenomena and Processes) using GSEA.
```r
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
## [1] "preparing geneSet collections..."
## [1] "GSEA analysis..."
## [1] "leading edge analysis..."
## [1] "done..."
head(y)
## ID Description setSize enrichmentScore
## D059647 D059647 Gene-Environment Interaction 456 -0.3543814
## D009929 D009929 Organ Size 451 -0.3449653
## D009043 D009043 Motor Activity 398 -0.3391521
## D050156 D050156 Adipogenesis 371 -0.3585031
## D004041 D004041 Dietary Fats 314 -0.3425194
## D049629 D049629 Waist-Hip Ratio 302 -0.3419178
## NES pvalue p.adjust qvalues rank
## D059647 -1.582649 0.001222494 0.03881504 0.02848429 2237
## D009929 -1.538603 0.001225490 0.03881504 0.02848429 2309
## D009043 -1.496853 0.001253133 0.03881504 0.02848429 1757
## D050156 -1.574589 0.001264223 0.03881504 0.02848429 2207
## D004041 -1.482771 0.001331558 0.03881504 0.02848429 1684
## D049629 -1.477004 0.001338688 0.03881504 0.02848429 2176
## leading_edge
## D059647 tags=26%, list=18%, signal=22%
## D009929 tags=26%, list=18%, signal=22%
## D009043 tags=21%, list=14%, signal=18%
## D050156 tags=26%, list=18%, signal=22%
## D004041 tags=21%, list=13%, signal=19%
## D049629 tags=26%, list=17%, signal=22%
## core_enrichment
## D059647 9497/118/8859/6532/23405/7424/2295/7157/8631/627/2774/22891/2908/4088/51151/11132/1387/860/268/7366/2104/4153/29119/3791/1543/3643/22841/1129/5624/3240/3174/3350/5590/55304/55213/1548/2169/196/8204/8863/5021/23284/9162/11005/4256/3426/84159/5334/629/1793/4208/4322/7048/6817/553/56172/3953/22795/2638/210/5243/5468/1393/1012/27136/51314/4023/5172/4319/4214/3952/5577/126/7832/79068/4313/2944/9369/3075/6720/7494/2099/857/57161/9223/4306/79750/4035/4915/10443/5744/5654/100126791/3551/2487/1746/185/2952/6935/4128/4059/4582/27324/9358/64084/7166/6505/9370/3708/3117/80129/125/5105/2018/2167/652/4137/1524/5241
## D009929 154/9846/3315/6716/9732/5139/7337/5530/4086/6532/1499/7157/627/2252/22891/2908/8654/4088/22846/4057/860/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/79789/1907/7048/1831/4060/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/3479/10451/9370/125/4857/1308/2167/652/57502/4137/8614/5241
## D009043 23621/3082/1291/2915/1543/7466/3240/3350/55304/181/2169/27306/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/1277/3953/4747/2247/6414/210/4744/5468/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D050156 5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/7474/6776/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/54795/5950/79365/2247/5468/50507/6469/8553/4023/594/7350/81029/3952/79068/5733/4313/10468/10628/6720/11213/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/54829/2625/79689/10974
## D004041 3554/4925/22841/7466/2181/3350/201134/181/2169/948/55911/324/4018/3426/3087/6785/2308/1581/56172/3953/1384/5950/2166/60481/5468/5166/50507/1012/27136/4023/7056/4214/9365/7350/3952/3778/79068/8864/2944/6720/5159/3991/2203/2819/9223/4035/32/213/165/347/2152/185/3487/5327/3667/54898/150/64084/3479/9370/5105/5174/2018/5346/7021/79689
## D049629 8609/9563/23405/10206/23314/4776/25970/627/2908/490/4057/268/3567/23429/283450/1543/3240/3174/81490/23047/55304/5099/54808/4179/2169/8082/4018/54465/4256/3087/5919/253461/26470/10903/1581/56172/3953/5950/2638/5468/1012/8835/4023/594/4214/7350/3952/79068/51232/2202/6444/9369/2099/3991/4016/57161/79750/4915/5125/5167/8639/11188/2487/2697/6935/3487/367/3667/4059/150/9358/3479/6424/9370/4629/652/5346/7021/4239
```
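The `enrichmentScore` column comes from a running-sum statistic computed over the ranked gene list. As a rough illustration only, the sketch below computes the classic weighted GSEA enrichment score (weight exponent 1) on a hypothetical six-gene list. It omits the permutation p-values and the normalization behind `NES`, and it is not the code `gseMeSH` actually runs.

```r
# Weighted running-sum enrichment score (GSEA style, weight exponent p = 1).
enrichment_score <- function(gene_stats, gene_set, p = 1) {
  gene_stats <- sort(gene_stats, decreasing = TRUE)  # rank genes, largest first
  hit <- names(gene_stats) %in% gene_set             # TRUE where a ranked gene is in the set
  w <- abs(gene_stats)^p                             # weights applied to hits
  p_hit <- cumsum(w * hit) / sum(w[hit])             # weighted fraction of hits seen so far
  p_miss <- cumsum(!hit) / sum(!hit)                 # fraction of non-hits seen so far
  running <- p_hit - p_miss
  unname(running[which.max(abs(running))])           # maximum deviation from zero
}

# Hypothetical ranked statistics (e.g. log fold changes) for six genes
gene_stats <- setNames(c(3, 2, 1, -1, -2, -3), paste0("g", 1:6))
es_top    <- enrichment_score(gene_stats, c("g1", "g2"))  # set at the top of the ranking
es_bottom <- enrichment_score(gene_stats, c("g5", "g6"))  # set at the bottom of the ranking
```

A negative score, as for every term in `head(y)`, means the set's genes concentrate toward the bottom of the ranked list.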
Users can visualize these enrichment results with the methods implemented in DOSE (`barplot`, `dotplot`, `cnetplot`, `enrichMap`, `upsetplot` and `gseaplot`), which make enriched results much easier to interpret.
```r
dotplot(x)
# y[1,1] is the ID and y[1,2] the Description of the top-ranked MeSH term
gseaplot(y, y[1,1], title=y[1,2])
```
`meshes` implements four information content (IC)-based methods (Resnik [2], Jiang [3], Lin [4] and Schlicker [5], the last selected with `measure = "Rel"`) and one graph-structure-based method (Wang [6]). For algorithm details, please refer to the vignette of the GOSemSim package [7].
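The IC-based measures share one ingredient: the information content IC(t) = -log p(t) of the most informative common ancestor (MICA) of two terms. The sketch below computes the textbook forms of the four measures on a hypothetical five-term DAG with made-up annotation counts. GOSemSim may normalize or transform these scores differently (the Jiang distance-to-similarity conversion here is our assumption), so the numbers are not on the same scale as `meshSim` output.

```r
# Hypothetical DAG: each term lists its parents.
parents <- list(root = character(0), A = "root", B = "root", A1 = "A", A2 = "A")
# Made-up annotation counts, each including annotations inherited from descendants.
counts <- c(root = 100, A = 40, B = 30, A1 = 10, A2 = 5)

ic <- -log(counts / counts[["root"]])  # IC(t) = -log p(t)

ancestors <- function(term) {
  # the term itself plus all of its ancestors
  unique(c(term, unlist(lapply(parents[[term]], ancestors))))
}

mica_ic <- function(t1, t2) {
  # IC of the most informative common ancestor
  max(ic[intersect(ancestors(t1), ancestors(t2))])
}

sim_resnik <- function(t1, t2) mica_ic(t1, t2)
sim_lin <- function(t1, t2) 2 * mica_ic(t1, t2) / (ic[[t1]] + ic[[t2]])
sim_jiang <- function(t1, t2) {
  dist <- ic[[t1]] + ic[[t2]] - 2 * mica_ic(t1, t2)  # Jiang-Conrath distance
  1 / (1 + dist)  # one common distance-to-similarity conversion (assumption)
}
sim_rel <- function(t1, t2) {
  # Schlicker's simRel: Lin's score weighted by 1 - p(MICA), where p = exp(-IC)
  sim_lin(t1, t2) * (1 - exp(-mica_ic(t1, t2)))
}
```

For `A1` and `B` the only common ancestor is `root`, whose IC is 0, so every IC-based score reduces to its minimum.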
The `meshSim` function measures semantic similarity between two vectors of MeSH terms.
```r
library(meshes)
# hsamd: semantic data for category A (Anatomy) from gendoo; it can be built with:
# hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=TRUE, database="gendoo")
data(hsamd)
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
## [1] 0.521396
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
## [1] 0.4914785
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
## [1] 0.5557103
meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang")
## D017629 D002890 D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
```
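Wang's graph-based measure does not use annotation counts. Each term's ancestors receive a semantic contribution that starts at 1 for the term itself and is multiplied by an edge weight at each step up the DAG; similarity is the shared contribution divided by the total. The sketch below runs this on a hypothetical DAG and assumes a single edge weight of 0.8 for every link, since MeSH trees do not distinguish relation types; the weight meshes actually uses may differ.

```r
# Hypothetical DAG: each term lists its parents.
parents <- list(root = character(0), A = "root", B = "root", A1 = "A", A2 = "A")

s_values <- function(term, w = 0.8) {
  # S-value of each ancestor: 1 for the term itself, times w per edge upward,
  # keeping the maximum over all paths (breadth-first propagation).
  s <- setNames(1, term)
  queue <- term
  while (length(queue) > 0) {
    child <- queue[1]
    queue <- queue[-1]
    for (p in parents[[child]]) {
      val <- w * s[[child]]
      if (is.na(s[p]) || val > s[[p]]) {
        s[p] <- val
        queue <- c(queue, p)
      }
    }
  }
  s
}

sim_wang <- function(t1, t2) {
  s1 <- s_values(t1)
  s2 <- s_values(t2)
  common <- intersect(names(s1), names(s2))
  sum(s1[common] + s2[common]) / (sum(s1) + sum(s2))
}
```

For `A1` and `A2`, each has S-values 1, 0.8 and 0.64 (itself, `A`, `root`); the shared part is 2 × (0.8 + 0.64) = 2.88 out of a total of 4.88, giving about 0.59.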
The `geneSim` function measures semantic similarity between two vectors of genes (given as Entrez Gene IDs).
```r
geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA")
## [1] 0.487
geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA")
## 835 5261 241 994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
```
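Gene-level similarity has to reduce a matrix of term-to-term scores (one row per MeSH term of the first gene, one column per term of the second) to a single number; `combine = "BMA"` selects the best-match average. The sketch below shows the combination rules as we understand the GOSemSim options (`max`, `avg`, `rcmax`, `BMA`), applied for illustration to the term matrix printed by the `meshSim(..., measure = "Wang")` call above. It is not the `geneSim` source code.

```r
# Term-by-term similarities copied from the meshSim(..., measure = "Wang") output above
m <- matrix(c(0.2886598, 0.1923711, 0.2193326,
              0.6521739, 0.2381925, 0.2809552),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("D001369", "D002462"),
                            c("D017629", "D002890", "D008928")))

combine_scores <- function(m, method = c("BMA", "max", "avg", "rcmax")) {
  method <- match.arg(method)
  row_best <- apply(m, 1, max)  # best match for each term in the first set
  col_best <- apply(m, 2, max)  # best match for each term in the second set
  switch(method,
    BMA   = (sum(row_best) + sum(col_best)) / (nrow(m) + ncol(m)),
    max   = max(m),
    avg   = mean(m),
    rcmax = max(mean(row_best), mean(col_best)))
}

bma <- combine_scores(m, "BMA")
```

Here the row maxima sum to 0.9408337 and the column maxima to 1.1713216, so the best-match average is 2.1121553 / 5, about 0.4224.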
Here is the output of `sessionInfo()` on the system on which this document was compiled:
```r
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] meshes_1.0.0 DOSE_3.0.0 MeSH.db_1.7.0
## [4] MeSH.Hsa.eg.db_1.7.0 MeSHDbi_1.10.0 BiocGenerics_0.20.0
## [7] BiocStyle_2.2.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 formatR_1.4 plyr_1.8.4
## [4] tools_3.3.1 digest_0.6.10 RSQLite_1.0.0
## [7] evaluate_0.10 tibble_1.2 gtable_0.2.0
## [10] igraph_1.0.1 fastmatch_1.0-4 DBI_0.5-1
## [13] yaml_2.1.13 fgsea_1.0.0 gridExtra_2.2.1
## [16] stringr_1.1.0 knitr_1.14 S4Vectors_0.12.0
## [19] IRanges_2.8.0 stats4_3.3.1 grid_3.3.1
## [22] qvalue_2.6.0 Biobase_2.34.0 data.table_1.9.6
## [25] AnnotationDbi_1.36.0 BiocParallel_1.8.0 GOSemSim_2.0.0
## [28] rmarkdown_1.1 reshape2_1.4.1 GO.db_3.4.0
## [31] DO.db_2.9 ggplot2_2.1.0 magrittr_1.5
## [34] splines_3.3.1 scales_0.4.0 htmltools_0.3.5
## [37] assertthat_0.1 colorspace_1.2-7 labeling_0.3
## [40] stringi_1.1.2 munsell_0.4.3 chron_2.3-47
```
1. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).
2. Resnik, P. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95–130 (1999).
3. Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the 10th International Conference on Research in Computational Linguistics (1997). at <http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008>
4. Lin, D. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning 296–304 (1998). CiteSeerX 10.1.1.55.1832.
5. Schlicker, A., Domingues, F. S., Rahnenführer, J. & Lengauer, T. A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 7, 302 (2006).
6. Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of GO terms. Bioinformatics (Oxford, England) 23, 1274–1281 (2007).
7. Yu, G. et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26, 976–978 (2010).
8. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).
9. Yu, G. & He, Q.-Y. ReactomePA: an R/Bioconductor package for Reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).