HDO_vignette

Authors

Erqiang Hu

Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University.

Introduction

Disease Ontology (DO) was developed to provide a consistent description of gene products from a disease perspective, and it is essential for supporting functional genomics in a disease context. Accurate disease descriptions can reveal new relationships between genes and diseases, as well as new functions for previously uncharacterized genes and alleles. We developed the DOSE package for semantic similarity analysis and disease enrichment analysis, and DOSE imports the Bioconductor package ‘DO.db’ to obtain the relationships (such as parent and child) between DO terms. However, DO.db has not been updated for years, and a lot of semantic information is missing. We therefore developed the new package HDO.db for Human Disease Ontology annotation.

library(HDO.db)

Overview

library(AnnotationDbi)
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname

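The annotation data come from https://github.com/DiseaseOntology/HumanDiseaseOntology/tree/main/src/ontology. HDO.db provides the following objects, including the AnnDbBimap maps (HDOTERM, HDOALIAS, HDOSYNONYM, HDOANCESTOR, HDOPARENTS, HDOOFFSPRING, HDOCHILDREN) and database helper functions: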

ls("package:HDO.db")
#>  [1] "HDO"          "HDO.db"       "HDOALIAS"     "HDOANCESTOR"  "HDOCHILDREN" 
#>  [6] "HDOMAPCOUNTS" "HDOOFFSPRING" "HDOPARENTS"   "HDOSYNONYM"   "HDOTERM"     
#> [11] "HDO_dbInfo"   "HDO_dbconn"   "HDO_dbfile"   "HDO_dbschema" "HDOmetadata" 
#> [16] "columns"      "keys"         "keytypes"     "select"
packageVersion("HDO.db")
#> [1] '0.99.1'

Use the help() function to view the documentation of each object, e.g. help(HDOOFFSPRING).

toTable(HDOmetadata)
#>              name
#> 1        DBSCHEMA
#> 2 DBSCHEMAVERSION
#> 3   HDOSOURCENAME
#> 4     HDOSOURCURL
#> 5   HDOSOURCEDATE
#> 6         Db type
#>                                                                                        value
#> 1                                                                                     HDO_DB
#> 2                                                                                        1.0
#> 3                                                                           Disease Ontology
#> 4 https://github.com/DiseaseOntology/HumanDiseaseOntology/blob/main/src/ontology/HumanDO.obo
#> 5                                                                                   20220706
#> 6                                                                                      HDODb
HDOMAPCOUNTS
#>  HDOANCESTOR  HDOCHILDREN HDOOFFSPRING   HDOPARENTS      HDOTERM 
#>      "66768"      "11034"      "66768"      "11034"      "11003"

Fetch all DO terms

In HDO.db, HDOTERM maps every DO term ID to its name. Users can also get the aliases (alternative DO IDs) and synonyms of each term from HDOALIAS and HDOSYNONYM, respectively.

convert HDOTERM to table

doterm <- toTable(HDOTERM)
head(doterm)
#>           doid                     term
#> 1 DOID:0001816             angiosarcoma
#> 2 DOID:0002116                pterygium
#> 3 DOID:0014667    disease of metabolism
#> 4 DOID:0040001           shrimp allergy
#> 5 DOID:0040002          aspirin allergy
#> 6 DOID:0040003 benzylpenicillin allergy

convert HDOTERM to list

dotermlist <- as.list(HDOTERM)
head(dotermlist)
#> $`DOID:0001816`
#> [1] "angiosarcoma"
#> 
#> $`DOID:0002116`
#> [1] "pterygium"
#> 
#> $`DOID:0014667`
#> [1] "disease of metabolism"
#> 
#> $`DOID:0040001`
#> [1] "shrimp allergy"
#> 
#> $`DOID:0040002`
#> [1] "aspirin allergy"
#> 
#> $`DOID:0040003`
#> [1] "benzylpenicillin allergy"

get alias of DOID:0001816

doalias <- as.list(HDOALIAS)
doalias[['DOID:0001816']]
#> [1] "DOID:267"  "DOID:4508"
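HDOALIAS maps each current DO ID to its alternative IDs. When older annotations still use a retired ID, the reverse lookup (alias to current ID) is often the more useful direction. The sketch below inverts the list returned by as.list(HDOALIAS) into a named character vector. The name alias2doid is hypothetical and introduced only for this example, and the sketch assumes each alias belongs to a single current term (if not, `[[` returns the first match).

```r
library(HDO.db)
library(AnnotationDbi)

# Named list: current DOID -> character vector of alias (alternative) DOIDs
doalias <- as.list(HDOALIAS)

# Invert it into a named character vector: alias DOID -> current DOID.
# alias2doid is a hypothetical object built here, not part of HDO.db.
alias2doid <- setNames(
  rep(names(doalias), lengths(doalias)),
  unlist(doalias, use.names = FALSE)
)

# Map one of the aliases shown above back to its current ID
alias2doid[["DOID:4508"]]
```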

get synonym of DOID:0001816

dosynonym <- as.list(HDOSYNONYM)
dosynonym[['DOID:0001816']]
#> [1] "\"hemangiosarcoma\" EXACT []"
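The synonym values appear to be stored as raw OBO synonym strings: the quoted synonym text, a scope keyword such as EXACT, and a bracketed cross-reference list. If only the text and scope are needed, a small regular-expression helper can split them. parse_obo_synonym() is a hypothetical helper written for this example, not an HDO.db function; it assumes every string follows the pattern seen in the output above, and strings that do not match are returned unchanged in both columns.

```r
# Hypothetical helper (not part of HDO.db): split OBO-style synonym strings
# such as "\"hemangiosarcoma\" EXACT []" into the synonym text and its scope.
parse_obo_synonym <- function(x) {
  pattern <- '^"(.*)"[[:space:]]+(EXACT|BROAD|NARROW|RELATED).*$'
  data.frame(
    synonym = sub(pattern, "\\1", x),  # text between the quotes
    scope   = sub(pattern, "\\2", x),  # EXACT / BROAD / NARROW / RELATED
    stringsAsFactors = FALSE
  )
}

syn <- parse_obo_synonym("\"hemangiosarcoma\" EXACT []")
syn$synonym  # "hemangiosarcoma"
syn$scope    # "EXACT"
```

Because sub() is vectorized, the helper can be applied directly to a lookup result such as dosynonym[['DOID:0001816']].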

Fetch the relationship between DO terms

Similar to DO.db, we provide four Bimap objects to represent the relationships between DO terms: HDOANCESTOR, HDOPARENTS, HDOOFFSPRING, and HDOCHILDREN.

HDOANCESTOR

HDOANCESTOR describes the association between DO terms and their ancestor terms in the directed acyclic graph (DAG) defined by the Disease Ontology. The toTable() function from the AnnotationDbi package converts it to a two-column data.frame: the first column contains the DO term IDs, and the second column contains their ancestor terms.

get ancestor of “DOID:0001816”
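Following the same pattern used for HDOTERM above, a minimal sketch is to convert HDOANCESTOR either to a table or to a named list and then index the list by DO ID. The variable names anc_table and anc_list are arbitrary, and no output is shown because it depends on the installed HDO.db version.

```r
library(HDO.db)
library(AnnotationDbi)

# Two-column data.frame: DO term ID, then one of its ancestor term IDs
anc_table <- toTable(HDOANCESTOR)
head(anc_table)

# Named list: DO term ID -> character vector of all its ancestor IDs
anc_list <- as.list(HDOANCESTOR)
anc_list[["DOID:0001816"]]
```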

HDOPARENTS

HDOPARENTS describes the association between DO terms and their direct parent terms in the DAG. The toTable() function from the AnnotationDbi package converts it to a two-column data.frame: the first column contains the DO term IDs, and the second column contains their parent terms.

get parent term of “DOID:0001816”
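A minimal sketch, again mirroring the HDOTERM examples above. A DO term can have more than one direct parent, so the lookup returns a character vector rather than a single ID. The variable names are arbitrary.

```r
library(HDO.db)
library(AnnotationDbi)

# Two-column data.frame: DO term ID, then one of its direct parent IDs
parent_table <- toTable(HDOPARENTS)
head(parent_table)

# Direct parent(s) of DOID:0001816
parent_list <- as.list(HDOPARENTS)
parent_list[["DOID:0001816"]]
```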

HDOOFFSPRING

HDOOFFSPRING describes the association between DO terms and their offspring (descendant) terms in the DAG. It is the inverse of HDOANCESTOR and is used in the same way.

get offspring of “DOID:0001816”
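The sketch below looks up the offspring of DOID:0001816 and then checks the inverse relationship stated above: every offspring of a term should list that term among its ancestors. The check is an illustration written for this example, not an HDO.db function; if the term has no descendants, the lookup is empty and the check trivially returns TRUE.

```r
library(HDO.db)
library(AnnotationDbi)

# Two-column data.frame: DO term ID, then one of its offspring IDs
off_table <- toTable(HDOOFFSPRING)
head(off_table)

# All offspring of DOID:0001816 (as.character() turns a missing entry into
# character(0)); NA entries, if any, are dropped
off_list <- as.list(HDOOFFSPRING)
off_ids <- as.character(off_list[["DOID:0001816"]])
off_ids <- off_ids[!is.na(off_ids)]
off_ids

# Inverse check against HDOANCESTOR: each offspring should have
# DOID:0001816 among its ancestors
anc_list <- as.list(HDOANCESTOR)
all(vapply(off_ids, function(id) "DOID:0001816" %in% anc_list[[id]],
           logical(1)))
```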

HDOCHILDREN

HDOCHILDREN describes the association between DO terms and their direct child terms in the DAG. It is the inverse of HDOPARENTS and is used in the same way.

get children of “DOID:4”
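DOID:4 is the root term "disease" in DO, so its children are the top-level disease categories. A minimal sketch that looks them up and attaches their names by matching against the HDOTERM table (whose doid and term columns are shown above); the variable names are arbitrary.

```r
library(HDO.db)
library(AnnotationDbi)

# Direct children of DOID:4
child_list <- as.list(HDOCHILDREN)
child_ids <- child_list[["DOID:4"]]

# Attach term names by matching the IDs against the HDOTERM table
doterm <- toTable(HDOTERM)
data.frame(doid = child_ids,
           term = doterm$term[match(child_ids, doterm$doid)])
```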

HDO.db also supports the select(), keys(), keytypes(), and columns() interface.
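A hedged sketch of that interface. It assumes "doid" is one of the key types and "term" one of the columns (matching the column names of toTable(HDOTERM) above); check the keytypes() and columns() output of your installed version before relying on these names.

```r
library(HDO.db)
library(AnnotationDbi)

# List the supported key types and output columns
keytypes(HDO.db)
columns(HDO.db)

# A few DO IDs that can be used as keys (assumes the "doid" key type)
head(keys(HDO.db, keytype = "doid"))

# Look up the names of two terms used earlier in this vignette
res <- select(HDO.db,
              keys = c("DOID:0001816", "DOID:4"),
              columns = "term",
              keytype = "doid")
res
```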

Semantic similarity analysis

Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/ for the vignette.

Disease enrichment analysis

Please go to https://yulab-smu.top/biomedical-knowledge-mining-book/dose-enrichment.html for the vignette.

sessionInfo()
#> R version 4.2.0 Patched (2022-05-05 r82321)
#> Platform: x86_64-apple-darwin19.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Users/ka36530_ca/R-stuff/bin/R-4-2/lib/libRblas.dylib
#> LAPACK: /Users/ka36530_ca/R-stuff/bin/R-4-2/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] AnnotationDbi_1.59.1 IRanges_2.31.2       S4Vectors_0.35.4    
#> [4] Biobase_2.57.1       BiocGenerics_0.43.4  HDO.db_0.99.1       
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9             GenomeInfoDb_1.33.7    XVector_0.37.1        
#>  [4] bslib_0.4.0            compiler_4.2.0         jquerylib_0.1.4       
#>  [7] bitops_1.0-7           zlibbioc_1.43.0        tools_4.2.0           
#> [10] digest_0.6.29          bit_4.0.4              jsonlite_1.8.0        
#> [13] RSQLite_2.2.17         evaluate_0.16          memoise_2.0.1         
#> [16] pkgconfig_2.0.3        png_0.1-7              rlang_1.0.6           
#> [19] DBI_1.1.3              cli_3.4.1              yaml_2.3.5            
#> [22] xfun_0.33              fastmap_1.1.0          GenomeInfoDbData_1.2.9
#> [25] stringr_1.4.1          httr_1.4.4             knitr_1.40            
#> [28] Biostrings_2.65.6      sass_0.4.2             vctrs_0.4.2           
#> [31] bit64_4.0.5            R6_2.5.1               rmarkdown_2.16        
#> [34] blob_1.2.3             magrittr_2.0.3         htmltools_0.5.3       
#> [37] KEGGREST_1.37.3        stringi_1.7.8          RCurl_1.98-1.8        
#> [40] cachem_1.0.6           crayon_1.5.1