
1 Introduction

MeSH (Medical Subject Headings) is the controlled vocabulary of the NLM (U.S. National Library of Medicine), used to manually index articles for MEDLINE/PubMed. It is a comprehensive life science vocabulary. MeSH has 19 categories, 16 of which are included in MeSH.db:

Abbreviation  Category
A             Anatomy
B             Organisms
C             Diseases
D             Chemicals and Drugs
E             Analytical, Diagnostic and Therapeutic Techniques and Equipment
F             Psychiatry and Psychology
G             Phenomena and Processes
H             Disciplines and Occupations
I             Anthropology, Education, Sociology and Social Phenomena
J             Technology and Food and Beverages
K             Humanities
L             Information Science
M             Persons
N             Health Care
V             Publication Type
Z             Geographical Locations
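When scripting several analyses, the table above can be kept as a named lookup vector keyed by the one-letter codes that the category argument takes (as in category = 'C' in the enrichment example). This is a minimal sketch; category_name is a hypothetical helper, not part of meshes.

```r
# The 16 MeSH.db categories, keyed by the one-letter code used as `category`
mesh_categories <- c(
  A = "Anatomy", B = "Organisms", C = "Diseases", D = "Chemicals and Drugs",
  E = "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
  F = "Psychiatry and Psychology", G = "Phenomena and Processes",
  H = "Disciplines and Occupations",
  I = "Anthropology, Education, Sociology and Social Phenomena",
  J = "Technology and Food and Beverages", K = "Humanities",
  L = "Information Science", M = "Persons", N = "Health Care",
  V = "Publication Type", Z = "Geographical Locations"
)

# Hypothetical helper: validate category codes and return their names
category_name <- function(code) {
  code <- toupper(code)
  unknown <- setdiff(code, names(mesh_categories))
  if (length(unknown) > 0)
    stop("Not a MeSH.db category: ", paste(unknown, collapse = ", "))
  unname(mesh_categories[code])
}

category_name(c("C", "G"))  # "Diseases" "Phenomena and Processes"
```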

MeSH terms were associated with Entrez Gene IDs by three methods: gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).

Method        How Entrez Gene IDs are mapped to MeSH IDs
Gendoo        Text mining
gene2pubmed   Manual curation by NCBI teams
RBBH          Sequence homology, using BLASTP search (E-value < 10^-50)
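The RBBH row relies on the reciprocal best hit criterion: a protein pair counts as homologous only if each protein is the other's best BLASTP hit, here also requiring E-value < 10^-50. A minimal base R sketch of that rule, assuming BLASTP results are already loaded as data frames with hypothetical columns query, subject and evalue; it illustrates the criterion only and is not the pipeline used to build the MeSH annotation packages.

```r
# Best hit per query, keeping only hits below the E-value cutoff
best_hits <- function(hits, max_evalue = 1e-50) {
  hits <- hits[hits$evalue < max_evalue, , drop = FALSE]
  hits <- hits[order(hits$query, hits$evalue), , drop = FALSE]
  best <- hits[!duplicated(hits$query), , drop = FALSE]
  setNames(best$subject, best$query)
}

# Pairs (a, b) where b is a's best hit AND a is b's best hit
rbbh <- function(a_vs_b, b_vs_a, max_evalue = 1e-50) {
  ab <- best_hits(a_vs_b, max_evalue)
  ba <- best_hits(b_vs_a, max_evalue)
  back <- ba[ab]  # best hit of each a's best hit (NA if it has none)
  keep <- names(ab)[!is.na(back) & back == names(ab)]
  data.frame(a = keep, b = unname(ab[keep]), stringsAsFactors = FALSE)
}

# Hypothetical BLASTP hits between proteins of species A and species B
a_vs_b <- data.frame(query = c("a1", "a1", "a2", "a3"),
                     subject = c("b1", "b2", "b2", "b3"),
                     evalue = c(1e-80, 1e-60, 1e-70, 1e-20),
                     stringsAsFactors = FALSE)
b_vs_a <- data.frame(query = c("b1", "b2", "b3"),
                     subject = c("a1", "a1", "a3"),
                     evalue = c(1e-85, 1e-65, 1e-20),
                     stringsAsFactors = FALSE)

# a1 <-> b1 only: b2 (a2's best hit) prefers a1, and a3/b3 fail the cutoff
rbbh(a_vs_b, b_vs_a)
```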

2 Enrichment Analysis

meshes supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of a gene list or a whole expression profile using MeSH annotation. All three data sources (gendoo, gene2pubmed and RBBH) are supported, and users can select the category of interest to test; all 16 categories are supported. The analysis supports more than 70 species, listed in the MeSHDb BiocView.

For algorithm details, please refer to the vignette of the DOSE package [1].

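library(meshes)
# DOSE ships an example ranked gene list, named by Entrez Gene ID
data(geneList, package="DOSE")
# use the first 100 genes as the gene list of interest
de <- names(geneList)[1:100]
# over-representation test against category C (Diseases), gendoo mapping
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C')
head(x)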
##              ID              Description GeneRatio   BgRatio       pvalue
## D043171 D043171  Chromosomal Instability     16/96 198/16528 2.794765e-14
## D000782 D000782               Aneuploidy     17/96 320/16528 3.866830e-12
## D042822 D042822      Genomic Instability     16/96 312/16528 3.007419e-11
## D012595 D012595    Scleroderma, Systemic     11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms     11/96 314/16528 2.049315e-06
## D019698 D019698     Hepatitis C, Chronic     11/96 317/16528 2.246856e-06
##             p.adjust       qvalue
## D043171 2.459394e-11 1.815127e-11
## D000782 1.701405e-09 1.255702e-09
## D042822 8.821761e-09 6.510798e-09
## D012595 1.418854e-04 1.047168e-04
## D009303 3.295389e-04 2.432123e-04
## D019698 3.295389e-04 2.432123e-04
##                                                                                      geneID
## D043171    4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822     55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595                              4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303                                4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698                               4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
##         Count
## D043171    16
## D000782    17
## D042822    16
## D012595    11
## D009303    11
## D019698    11

In the over-representation analysis above, we used gendoo as the data source and tested the C (Diseases) category.
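The GeneRatio and BgRatio columns carry the inputs of the p-value. GeneRatio k/n means k of the n annotated input genes hit the term, and BgRatio M/N means M of the N background genes carry it; for D043171 that is 16/96 against 198/16528. A minimal sketch of the standard one-sided hypergeometric test on those numbers (ora_pvalue and parse_ratio are hypothetical helpers). The p.adjust and qvalue columns then come from multiple-testing correction across all tested terms (e.g. Benjamini-Hochberg via p.adjust), which is not reproduced here.

```r
# One-sided hypergeometric test: probability of k or more term-annotated genes
# among n input genes, when M of the N background genes carry the term
ora_pvalue <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Split "a/b" strings as printed in the GeneRatio and BgRatio columns
parse_ratio <- function(r) as.numeric(strsplit(r, "/", fixed = TRUE)[[1]])

gene_ratio <- parse_ratio("16/96")      # D043171, Chromosomal Instability
bg_ratio <- parse_ratio("198/16528")
p <- ora_pvalue(gene_ratio[1], gene_ratio[2], bg_ratio[1], bg_ratio[2])
p
```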

In the following example, we use gene2pubmed as the data source and test category G (Phenomena and Processes) using GSEA.

# GSEA of the full ranked list against category G, gene2pubmed mapping
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
head(y)
##              ID            Description setSize enrichmentScore       NES
## D009929 D009929             Organ Size     489      -0.3508778 -1.574653
## D006339 D006339             Heart Rate     321      -0.3697848 -1.600728
## D001846 D001846       Bone Development     314      -0.3751600 -1.619238
## D055105 D055105    Waist Circumference     252      -0.4073259 -1.721104
## D020257 D020257 Ventricular Remodeling     255      -0.3972260 -1.680688
## D009767 D009767        Obesity, Morbid     231      -0.3750171 -1.575273
##              pvalue   p.adjust    qvalues rank
## D009929 0.001225490 0.03100751 0.02288401 2309
## D006339 0.001300390 0.03100751 0.02288401 2405
## D001846 0.001305483 0.03100751 0.02288401 2100
## D055105 0.001345895 0.03100751 0.02288401 1625
## D020257 0.001349528 0.03100751 0.02288401 2202
## D009767 0.001353180 0.03100751 0.02288401 1625
##                           leading_edge
## D009929 tags=27%, list=18%, signal=23%
## D006339 tags=29%, list=19%, signal=24%
## D001846 tags=27%, list=17%, signal=23%
## D055105 tags=24%, list=13%, signal=21%
## D020257 tags=31%, list=18%, signal=26%
## D009767 tags=23%, list=13%, signal=20%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    core_enrichment
## D009929 154/9846/3315/6716/9732/5139/7337/5530/23001/4086/80114/6532/6416/1499/8945/7157/627/2252/22891/2908/8654/4088/27445/22846/4057/860/23286/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/2308/79789/1907/7048/1831/4060/23387/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/6678/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/64084/3479/10451/9370/10699/125/4857/1308/2167/652/57502/4137/8614/5241
## D006339                                                                                                                                                                                                       4985/7139/8929/3784/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/1901/2034/4179/4804/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/5138/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/9607/3572/150/3479/2006/55259/9370/125/652/55351
## D001846                                                                                                                                                                                                                          8945/7157/57798/79048/627/6500/8038/860/2752/4882/3371/2915/63971/54455/3791/819/57045/596/2034/54808/80781/1280/64388/2261/4054/11059/3483/9900/26234/4734/9452/4208/4322/253461/1278/7048/51280/10903/7869/1277/3953/10516/10411/8835/79776/11167/2317/3485/3952/5274/54681/4488/10486/1009/2202/91851/2099/5764/23327/3339/8817/83716/6678/4915/633/658/54361/5744/165/5654/10631/3487/367/4982/3667/79971/1634/3479/114899/9370/652/8614/4969
## D055105                                                                                                                                                                                                                                                                                                                                                                     7466/181/2078/2169/948/4887/3931/4018/4212/9208/6403/3087/84059/5919/253461/79789/5176/3953/5950/2166/6414/5243/5468/1012/4023/582/4214/3485/7350/3952/5577/126/585/79068/2202/4313/9369/2621/2099/2690/1363/4306/5167/23245/4886/185/9863/80206/3667/9607/1489/3479/9370/23704/2167/5346/7021/79689/9
## D020257                                                                                                                                                                                                                                                              1238/5595/5228/1499/7157/4776/25970/408/8654/4811/3910/3371/6548/3082/5914/1291/947/80781/5592/1490/1306/4314/80070/10272/4881/7042/3912/1511/2934/4060/283/1277/7078/5549/22795/1293/2247/50507/1281/5136/11167/4319/1513/6310/4313/2199/1294/2099/6387/3339/7079/1462/1292/775/3908/4306/4035/633/5167/5350/11188/10418/10631/9429/1805/2487/2697/185/7043/6863/3913/4982/1634/7060/3479/7373/9370/2167/652
## D009767                                                                                                                                                                                                                                                                                                                                                                                                       7474/4925/80781/2169/948/4887/4929/22982/4692/2246/63924/5919/56605/5176/23410/3953/5950/2166/5243/5468/5166/4023/90865/7350/3952/3778/79068/8864/901/9369/6720/2099/1471/3991/6678/2819/1363/4035/32/5167/8639/185/5327/3667/9607/150/563/3479/9370/2167/5346/79689
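Unlike the over-representation test, gseMeSH scores each term against the whole ranked list. The enrichmentScore column is the maximum deviation from zero of a running sum that steps up at genes in the set and down at genes outside it, and leading_edge summarizes the genes driving that peak: tags (share of the set in the leading edge), list (share of the ranked list) and signal = tags × (1 − list) × N / (N − Nh). A base R sketch of the textbook weighted statistic (exponent p = 1) on a hypothetical six-gene list; gsea_es is a hypothetical name, DOSE may differ in implementation details, and significance (pvalue, NES) is estimated by permutation in practice and not shown.

```r
# Running enrichment score of a gene set along a decreasingly ranked list,
# plus the leading-edge summary reported as tags / list / signal
gsea_es <- function(ranked_stats, gene_set, p = 1) {
  stopifnot(!is.unsorted(rev(ranked_stats)))  # must be sorted decreasingly
  N <- length(ranked_stats)
  hit <- names(ranked_stats) %in% gene_set
  Nh <- sum(hit)
  w <- abs(ranked_stats)^p
  # step up at hits (weighted by |statistic|^p), step down at misses
  p_hit <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])
  p_miss <- cumsum(!hit) / (N - Nh)
  running <- p_hit - p_miss
  peak <- which.max(abs(running))  # maximum deviation from zero
  es <- running[peak]
  # leading edge: genes up to the peak (ES >= 0) or from the peak on (ES < 0)
  le_pos <- if (es >= 0) seq_len(peak) else peak:N
  leading_edge <- names(ranked_stats)[le_pos][hit[le_pos]]
  tags <- length(leading_edge) / Nh
  list_frac <- length(le_pos) / N
  list(es = es, peak = peak, leading_edge = leading_edge,
       tags = tags, list = list_frac,
       signal = tags * (1 - list_frac) * N / (N - Nh))
}

# Hypothetical ranked list of six genes
stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
gsea_es(stats, c("g1", "g3"))$es  # 0.75: set concentrated at the top
gsea_es(stats, c("g5", "g6"))$es  # -1: set concentrated at the bottom
```

A negative score, as for all six terms above, means the term's genes cluster at the bottom of the ranked list.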

Users can apply the visualization methods implemented in DOSE (i.e. barplot, dotplot, cnetplot, enrichMap, upsetplot and gseaplot) to these enrichment results, which makes them much easier to interpret.

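# DOSE plotting methods work directly on the result objects
dotplot(x)
# running-score plot of the top term: y[1,1] is its ID, y[1,2] its description
gseaplot(y, y[1,1], title=y[1,2])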

3 Semantic Similarity

meshes implements four information content (IC)-based methods (Resnik [2], Jiang [3], Lin [4] and Schlicker [5]) and one graph-structure-based method (Wang [6]). For algorithm details, please refer to the vignette of the GOSemSim package [7].
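The IC-based measures share one idea: a term's information content is IC(t) = -log p(t), where p(t) is the fraction of annotations falling on t or any of its descendants, and two terms are compared through their most informative common ancestor (MICA). A base R sketch of the textbook forms on a hypothetical toy hierarchy with made-up annotation counts: Resnik is IC(MICA); Lin divides 2·IC(MICA) by IC(t1) + IC(t2); Jiang turns the distance IC(t1) + IC(t2) − 2·IC(MICA) into a similarity (one common conversion); Rel (Schlicker) weights Lin by 1 − p(MICA). meshes computes these via GOSemSim, which may normalize scores differently, so this sketch is not expected to reproduce the meshSim values of the examples.

```r
# Hypothetical toy hierarchy: each term lists its parent terms
parents <- list(root = character(0), A = "root", B = "root",
                C = "A", D = c("A", "B"), E = "B")
# Hypothetical number of genes annotated directly to each term
direct <- c(root = 0, A = 2, B = 2, C = 4, D = 2, E = 10)

# All ancestors of a term, including the term itself
ancestors <- function(term) {
  unique(c(term, unlist(lapply(parents[[term]], ancestors))))
}

# A term inherits the annotations of all its descendants
anc <- lapply(names(direct), ancestors)
names(anc) <- names(direct)
cum <- sapply(names(direct), function(t)
  sum(direct[vapply(anc, function(a) t %in% a, logical(1))]))
ic <- -log(cum / cum[["root"]])  # IC(t) = -log p(t)

ic_sim <- function(t1, t2) {
  common <- intersect(ancestors(t1), ancestors(t2))
  ic_mica <- max(ic[common])  # IC of the most informative common ancestor
  lin <- 2 * ic_mica / (ic[[t1]] + ic[[t2]])
  c(Resnik = ic_mica,
    Lin = lin,
    Jiang = 1 - min(1, ic[[t1]] + ic[[t2]] - 2 * ic_mica),
    Rel = lin * (1 - exp(-ic_mica)))  # exp(-IC) recovers p(MICA)
}

ic_sim("C", "D")  # MICA is A
```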

The meshSim function measures semantic similarity between two vectors of MeSH terms; when the vectors contain several terms, it returns a matrix of pairwise similarities.

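library(meshes)
# hsamd: precomputed semantic data (category A, gendoo mapping, with IC),
# loaded with data(); it can be regenerated with:
# hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=TRUE, database="gendoo")
data(hsamd)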
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
## [1] 0.521396
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
## [1] 0.4914785
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
## [1] 0.5557103
meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang")
##           D017629   D002890   D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
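The Wang measure is graph-based rather than IC-based. Every ancestor t of a term A gets an S-value: S_A(A) = 1, and S_A(t) is the maximum of w × S_A(c) over the children c of t within A's ancestry; then sim(A, B) = sum over shared ancestors of (S_A(t) + S_B(t)), divided by SV(A) + SV(B), where SV is the sum of a term's S-values. A base R sketch on a hypothetical toy hierarchy, assuming a single edge weight w = 0.8 (the is-a weight in Wang et al.'s GO formulation; the weight meshes uses for MeSH is an assumption here). pairwise_sim mimics the matrix shape meshSim returns for two term vectors; all function and term names are hypothetical.

```r
# Hypothetical toy hierarchy: each term lists its parent terms
parents <- list(root = character(0), A = "root", B = "root",
                C = "A", D = c("A", "B"), E = "B")

# S-values: 1 for the term itself, decaying by w per edge upward,
# keeping the maximum over alternative paths
wang_svalues <- function(term, parents, w = 0.8) {
  s <- setNames(1, term)
  frontier <- term
  while (length(frontier) > 0) {
    next_frontier <- character(0)
    for (child in frontier) {
      for (p in parents[[child]]) {
        cand <- w * s[[child]]
        if (is.na(s[p]) || cand > s[[p]]) {
          s[p] <- cand
          next_frontier <- c(next_frontier, p)
        }
      }
    }
    frontier <- unique(next_frontier)
  }
  s
}

wang_sim <- function(a, b, parents, w = 0.8) {
  sa <- wang_svalues(a, parents, w)
  sb <- wang_svalues(b, parents, w)
  common <- intersect(names(sa), names(sb))
  sum(sa[common] + sb[common]) / (sum(sa) + sum(sb))
}

# Compare two term vectors, returning a terms1 x terms2 matrix
pairwise_sim <- function(terms1, terms2, parents, w = 0.8) {
  f <- Vectorize(function(x, y) wang_sim(x, y, parents, w))
  m <- outer(terms1, terms2, f)
  dimnames(m) <- list(terms1, terms2)
  m
}

pairwise_sim(c("C", "E"), c("D", "A"), parents)
```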

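The geneSim function measures semantic similarity between two vectors of genes (Entrez Gene IDs). The combine argument sets how the similarities between the genes' MeSH terms are combined into one score per gene pair; BMA stands for best-match average.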

geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA")
## [1] 0.487
geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA")
##       835  5261   241   994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
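The best-match average takes, for every term of each gene, its best similarity among the other gene's terms, and averages all of these best matches. A minimal sketch of that combination step; bma is a hypothetical helper, and as a purely illustrative input it reuses the Wang matrix from the meshSim example, as if its rows and columns were the MeSH annotations of two hypothetical genes.

```r
# Best-match average: mean of the row maxima and column maxima together
bma <- function(scores) {
  (sum(apply(scores, 1, max)) + sum(apply(scores, 2, max))) / sum(dim(scores))
}

# Term-by-term Wang similarities copied from the meshSim output
scores <- matrix(c(0.2886598, 0.1923711, 0.2193326,
                   0.6521739, 0.2381925, 0.2809552),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("D001369", "D002462"),
                                 c("D017629", "D002890", "D008928")))
bma(scores)  # about 0.4224
```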

5 Session Information

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] meshes_1.2.0         DOSE_3.2.0           MeSH.db_1.8.0       
## [4] MeSH.Hsa.eg.db_1.8.0 MeSHDbi_1.12.0       BiocGenerics_0.22.0 
## [7] BiocStyle_2.4.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.10         plyr_1.8.4           compiler_3.4.0      
##  [4] tools_3.4.0          digest_0.6.12        tibble_1.3.0        
##  [7] RSQLite_1.1-2        evaluate_0.10        memoise_1.1.0       
## [10] gtable_0.2.0         igraph_1.0.1         fastmatch_1.1-0     
## [13] DBI_0.6-1            rvcheck_0.0.8        yaml_2.1.14         
## [16] gridExtra_2.2.1      fgsea_1.2.1          stringr_1.2.0       
## [19] knitr_1.15.1         S4Vectors_0.14.0     IRanges_2.10.0      
## [22] stats4_3.4.0         rprojroot_1.2        grid_3.4.0          
## [25] qvalue_2.8.0         Biobase_2.36.0       data.table_1.10.4   
## [28] AnnotationDbi_1.38.0 BiocParallel_1.10.0  GOSemSim_2.2.0      
## [31] rmarkdown_1.4        reshape2_1.4.2       GO.db_3.4.1         
## [34] DO.db_2.9            ggplot2_2.2.1        magrittr_1.5        
## [37] splines_3.4.0        backports_1.0.5      scales_0.4.1        
## [40] htmltools_0.3.5      colorspace_1.3-2     labeling_0.3        
## [43] stringi_1.1.5        lazyeval_0.2.0       munsell_0.4.3

References

1. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).

2. Resnik, P. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95–130 (1999).

3. Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the 10th International Conference on Research in Computational Linguistics (1997). at <http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008>

4. Lin, D. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning 296–304 (1998). CiteSeerX 10.1.1.55.1832

5. Schlicker, A., Domingues, F. S., Rahnenführer, J. & Lengauer, T. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics 7, 302 (2006).

6. Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of GO terms. Bioinformatics 23, 1274–1281 (2007).

7. Yu, G. et al. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26, 976–978 (2010).

8. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).

9. Yu, G. & He, Q.-Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).