This article summarizes short case studies and solutions arising from user queries.
cellxgenedp 1.10.0
For each case study, ensure that cellxgenedp is installed (see the Bioconductor package landing page or the GitHub.io site; additional installation options are described at https://mtmorgan.github.io/cellxgenedp/).
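## install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos = "https://CRAN.R-project.org")
## install cellxgenedp from Bioconductor
BiocManager::install("cellxgenedp")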
Load the package.
library(cellxgenedp)
This case study was developed in response to the following Slack question:
CELLxGENE’s webpage is using different ontologies and displaying them in an easy-to-interrogate manner (choosing amongst 3 possible levels of coarseness for cell types, tissues and age). I was wondering if this simplified tree of the 3 subgroups for cell type, tissue and age categories was available somewhere?
As indicated in the question, CELLxGENE provides some access to ontologies through a hand-curated three-tiered classification of specific facets; the tiers can be retrieved from publicly available code, but one might want to develop a more flexible or principled approach.
CELLxGENE dataset facets like ‘disease’ and ‘cell type’ use terms from ontologies. Ontologies arrange terms in directed acyclic graphs, so the relationships between terms can be used to identify related datasets. For instance, one might be interested in cancer-related datasets in general (those annotated with the ‘carcinoma’ term or any of its descendants in the corresponding ontology), rather than in a specific term such as ‘B-cell non-Hodgkin lymphoma’.
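The idea of collecting every dataset annotated with ‘carcinoma’ or a more specific term can be sketched in base R. This is a minimal sketch, assuming a tiny hypothetical ontology and dataset table: the terms, edges, and the names `toy_edges`, `toy_datasets`, and `descendants()` are invented for illustration, and the code uses neither real CELLxGENE or ontology data nor the OLSr or cellxgenedp APIs.

```r
## Hypothetical toy ontology (not real ontology data): each row is an
## 'is_a' edge from a more specific child term to its parent term
toy_edges <- data.frame(
    child = c("carcinoma", "lung carcinoma", "lung adenocarcinoma",
              "breast carcinoma", "lymphoma", "B-cell lymphoma"),
    parent = c("cancer", "carcinoma", "lung carcinoma",
               "carcinoma", "cancer", "lymphoma")
)

## Return 'term' plus every term reachable from it by following
## parent -> child edges, i.e., the term and all of its descendants
descendants <- function(term, edges) {
    found <- term
    frontier <- term
    while (length(frontier)) {
        children <- edges$child[edges$parent %in% frontier]
        ## drop terms already seen; in a DAG a term can have several parents
        frontier <- setdiff(children, found)
        found <- c(found, frontier)
    }
    found
}

## Hypothetical datasets, each annotated with a single disease term
toy_datasets <- data.frame(
    dataset_id = c("d1", "d2", "d3", "d4"),
    disease = c("lung adenocarcinoma", "breast carcinoma",
                "B-cell lymphoma", "normal")
)

## Keep datasets whose disease is 'carcinoma' or one of its descendants
carcinoma_terms <- descendants("carcinoma", toy_edges)
carcinoma_datasets <- toy_datasets[toy_datasets$disease %in% carcinoma_terms, ]
carcinoma_datasets$dataset_id
```

Here `d1` and `d2` are selected, while the lymphoma dataset `d3` is not, because lymphoma sits under ‘cancer’ rather than under ‘carcinoma’ in the toy ontology. With real data, the edges would come from an ontology service such as the EMBL-EBI Ontology Lookup Service, and the disease terms from the CELLxGENE dataset metadata.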
In exploring this question in R, I found myself developing the OLSr package to query and process ontologies from the EMBL-EBI Ontology Lookup Service. See the ‘Case Study: CELLxGENE Ontologies’ article in the OLSr package for full details.
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] cellxgenedp_1.10.0 dplyr_1.1.4
#> [3] SingleCellExperiment_1.28.0 SummarizedExperiment_1.36.0
#> [5] Biobase_2.66.0 GenomicRanges_1.58.0
#> [7] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [9] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [11] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [13] zellkonverter_1.16.0 BiocStyle_2.34.0
#>
#> loaded via a namespace (and not attached):
#> [1] dir.expiry_1.14.0 xfun_0.48 bslib_0.8.0
#> [4] htmlwidgets_1.6.4 rhdf5_2.50.0 lattice_0.22-6
#> [7] rhdf5filters_1.18.0 rjsoncons_1.3.1 vctrs_0.6.5
#> [10] tools_4.4.1 generics_0.1.3 curl_5.2.3
#> [13] parallel_4.4.1 tibble_3.2.1 fansi_1.0.6
#> [16] pkgconfig_2.0.3 Matrix_1.7-1 lifecycle_1.0.4
#> [19] GenomeInfoDbData_1.2.13 compiler_4.4.1 httpuv_1.6.15
#> [22] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10
#> [25] tidyr_1.3.1 later_1.3.2 pillar_1.9.0
#> [28] crayon_1.5.3 jquerylib_0.1.4 DT_0.33
#> [31] DelayedArray_0.32.0 cachem_1.1.0 abind_1.4-8
#> [34] mime_0.12 basilisk_1.18.0 tidyselect_1.2.1
#> [37] digest_0.6.37 purrr_1.0.2 bookdown_0.41
#> [40] fastmap_1.2.0 grid_4.4.1 cli_3.6.3
#> [43] SparseArray_1.6.0 magrittr_2.0.3 S4Arrays_1.6.0
#> [46] utf8_1.2.4 withr_3.0.2 promises_1.3.0
#> [49] filelock_1.0.3 UCSC.utils_1.2.0 rmarkdown_2.28
#> [52] XVector_0.46.0 httr_1.4.7 reticulate_1.39.0
#> [55] png_0.1-8 HDF5Array_1.34.0 shiny_1.9.1
#> [58] evaluate_1.0.1 knitr_1.48 basilisk.utils_1.18.0
#> [61] rlang_1.1.4 Rcpp_1.0.13 xtable_1.8-4
#> [64] glue_1.8.0 BiocManager_1.30.25 jsonlite_1.8.9
#> [67] Rhdf5lib_1.28.0 R6_2.5.1 zlibbioc_1.52.0