Enterotyping

This workflow identifies enterotypes from the relative abundance of gut microbiota, following the approach reported by Arumugam[^2].

library(mbOmic)
library(data.table)

First, the microbiota relative abundance dataset was retrieved from the enterotypes website. Missing values were imputed with k-nearest neighbours (KNN) using the impute package.

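# Download the MetaHIT reference abundance table
dat <- read.delim('http://enterotypes.org/ref_samples_abundance_MetaHIT.txt')
# Impute missing values with k-nearest neighbours (k = 100)
dat <- impute::impute.knn(as.matrix(dat), k = 100)
# Add a small pseudocount to the imputed values
dat <- as.data.frame(dat$data + 0.001)
# Convert to a data.table, keeping the row names as a column
setDT(dat, keep.rownames = TRUE)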
dat
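
Before imputing, it can be useful to see how much of the table is missing. The following is a minimal base-R sketch on a made-up matrix (the object `toy` and its values are hypothetical, not the MetaHIT data). It also illustrates one plausible reason for adding a small pseudocount such as 0.001: zeros would otherwise give infinite values as soon as a logarithm is taken.

```r
# Hypothetical toy abundance matrix (values invented for illustration)
toy <- matrix(c(0.50, NA,   0.20,
                0.30, 0.60, 0.00,
                0.20, 0.40, 0.80),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# Count missing values per column and per row
na_per_sample <- colSums(is.na(toy))
na_per_genus  <- rowSums(is.na(toy))

# log(0) is -Inf; adding a pseudocount keeps every logarithm finite
has_inf_raw    <- any(is.infinite(log(toy)))
has_inf_pseudo <- any(is.infinite(log(toy + 0.001)))
```

Here `has_inf_raw` flags the zero entry, while `has_inf_pseudo` shows that the shifted matrix no longer contains one.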

Construct a bSet object and then estimate the appropriate number of clusters with the estimate_k function. estimate_k clusters the samples using the Jensen-Shannon divergence, and the number of clusters is optimized with the Calinski-Harabasz (CH) Index and the Silhouette Coefficient.

estimate_k returns a verCHI object, an S3 class containing the optimal clustering result, the optimal number of clusters, the maximum CHI, the Silhouette value, and the Jensen-Shannon divergence matrix.

dat <- bSet(b = dat)
res <- estimate_k(dat)
res
#> optimal number of cluster: 4
#> Max CHI: 164.642158008611
#> Silhouette: 0.181445495999067

The estimated optimal number of clusters is 4.
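
To make the two ingredients named above more concrete, here is a rough base-R sketch of a Jensen-Shannon distance matrix and a CH index computed from it. It is not the `estimate_k` implementation: the toy matrix, the helper names `jsd` and `ch_index`, and the use of `hclust` as the clustering step are all assumptions made for illustration only.

```r
# Hypothetical toy data: 3 genera (rows) x 6 samples (columns), columns sum to 1
toy <- cbind(s1 = c(0.70, 0.20, 0.10), s2 = c(0.65, 0.25, 0.10),
             s3 = c(0.60, 0.25, 0.15), s4 = c(0.10, 0.20, 0.70),
             s5 = c(0.15, 0.20, 0.65), s6 = c(0.10, 0.30, 0.60))

# Jensen-Shannon divergence between two probability vectors (natural log)
jsd <- function(p, q) {
  m  <- (p + q) / 2
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Pairwise distance: square root of the divergence, which is a metric
n <- ncol(toy)
d <- matrix(0, n, n, dimnames = list(colnames(toy), colnames(toy)))
for (i in seq_len(n)) for (j in seq_len(n)) {
  d[i, j] <- sqrt(max(jsd(toy[, i], toy[, j]), 0))
}

# Calinski-Harabasz index from a distance matrix and a cluster assignment
ch_index <- function(d, cl) {
  n <- nrow(d)
  k <- length(unique(cl))
  total  <- sum(d^2) / (2 * n)              # total sum of squares
  within <- sum(sapply(unique(cl), function(g) {
    idx <- which(cl == g)
    sum(d[idx, idx, drop = FALSE]^2) / (2 * length(idx))
  }))
  ((total - within) / (k - 1)) / (within / (n - k))
}

# Stand-in clustering step (assumption): average-linkage hierarchical clustering
hc <- hclust(as.dist(d), method = "average")
ks <- 2:4
ch <- sapply(ks, function(k) ch_index(d, cutree(hc, k)))
best_k <- ks[which.max(ch)]
```

The candidate with the largest CH value is kept as `best_k`, which mirrors the idea of picking the cluster number that maximizes the index.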

Next, the enterotyping function was used to identify the enterotype of each cluster. It returns a list of length 3, containing two enterotype tables (enterotypes and data) and a vector of unidentified samples (UnIdentifiedSamples). Clusters 2, 3, and 4 were assigned to the Bacteroides, Prevotella, and Ruminococcus enterotypes, respectively.

ret <- enterotyping(dat, res$verOptCluster)
ret
#> $enterotypes
#>      Enterotype        max which   cluster
#> 1:  Bacteroides 0.36724946     2 cluster 2
#> 2:   Prevotella 0.29692944     3 cluster 3
#> 3: Ruminococcus 0.02416713     4 cluster 4
#> 
#> $data
#>      Samples   Enterotype   cluster
#>   1:  MH0087  Bacteroides cluster 2
#>   2:  MH0156  Bacteroides cluster 2
#>   3:  MH0444  Bacteroides cluster 2
#>   4:  MH0333  Bacteroides cluster 2
#>   5:  MH0233  Bacteroides cluster 2
#>  ---                               
#> 234:  MH0012 Ruminococcus cluster 4
#> 235:  MH0415 Ruminococcus cluster 4
#> 236:  MH0457 Ruminococcus cluster 4
#> 237:  MH0442 Ruminococcus cluster 4
#> 238:  MH0448 Ruminococcus cluster 4
#> 
#> $UnIdentifiedSamples
#>  [1] "MH0277" "MH0161" "MH0046" "MH0175" "MH0152" "MH0104" "MH0151" "MH0189"
#>  [9] "MH0030" "MH0157" "MH0063" "MH0075" "MH0141" "MH0169" "MH0050" "MH0286"
#> [17] "MH0096" "MH0053" "MH0217" "MH0098" "MH0009" "MH0197" "MH0065" "MH0173"
#> [25] "MH0168" "MH0070" "MH0077" "MH0288" "MH0200" "MH0031" "MH0183" "MH0132"
#> [33] "MH0144" "MH0124" "MH0430" "MH0276" "MH0407" "MH0428" "MH0126" "MH0447"
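
The `max` and `which` columns in the output above suggest that each enterotype genus is matched to the cluster in which its abundance is highest. That reading is an assumption, not a description of the package internals. Under that assumption, the labelling step could be sketched in base R as follows, with invented cluster means (`cluster_means` is hypothetical).

```r
# Hypothetical mean relative abundance of three genera (rows) per cluster (columns)
cluster_means <- rbind(
  Bacteroides  = c(0.100, 0.370, 0.050, 0.120),
  Prevotella   = c(0.080, 0.010, 0.300, 0.020),
  Ruminococcus = c(0.010, 0.008, 0.005, 0.024)
)
colnames(cluster_means) <- paste("cluster", 1:4)

# For each genus, record its highest mean and the cluster where it occurs
labels <- data.frame(
  Enterotype = rownames(cluster_means),
  max        = apply(cluster_means, 1, max),
  which      = apply(cluster_means, 1, which.max),
  row.names  = NULL
)
labels$cluster <- colnames(cluster_means)[labels$which]

# Clusters that no genus claims stay unlabelled
unlabelled <- setdiff(colnames(cluster_means), labels$cluster)
```

In this sketch cluster 1 ends up unlabelled, which is consistent with the output above: the 40 unidentified samples match the size of the one cluster that received no enterotype.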

Finally, this result was compared against the enterotype assignments provided by the enterotype website.

enterotypes <- read.table(system.file('extdata', 'enterotype.txt', package = 'mbOmic'))
enterotypes <- enterotypes[samples(dat),]
table(res$verOptCluster, enterotypes$ET)
#>    
#>     ET_B ET_F ET_P
#>   1    0   21   19
#>   2   67    5    0
#>   3    0    0   40
#>   4    3  123    0
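
One way to summarize this cross-tabulation is the share of each cluster that carries its most common reference label. The sketch below re-enters the counts printed above by hand. It assumes that ET_B, ET_P, and ET_F are the reference labels for the Bacteroides, Prevotella, and Ruminococcus enterotypes, which the output itself does not state.

```r
# Contingency table copied from the output above
# rows: clusters from estimate_k; columns: reference enterotype labels
tab <- matrix(c( 0,  21, 19,
                67,   5,  0,
                 0,   0, 40,
                 3, 123,  0),
              nrow = 4, byrow = TRUE,
              dimnames = list(cluster = as.character(1:4),
                              ET = c("ET_B", "ET_F", "ET_P")))

# Share of each cluster taken by its most common reference label
purity <- apply(tab, 1, max) / rowSums(tab)

# Agreement over the three labelled clusters only (cluster 1 received no label)
labelled  <- tab[c("2", "3", "4"), ]
agreement <- sum(apply(labelled, 1, max)) / sum(labelled)
```

Clusters 2, 3, and 4 are each dominated by a single reference label, whereas cluster 1 is split between ET_F and ET_P.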

SessionInfo

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 RC (2022-04-19 r82224)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-04-27
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version  date (UTC) lib source
#>  ade4               1.7-19   2022-04-19 [2] CRAN (R 4.2.0)
#>  AnnotationDbi      1.58.0   2022-04-27 [2] Bioconductor
#>  assertthat         0.2.1    2019-03-21 [2] CRAN (R 4.2.0)
#>  backports          1.4.1    2021-12-13 [2] CRAN (R 4.2.0)
#>  base64enc          0.1-3    2015-07-28 [2] CRAN (R 4.2.0)
#>  Biobase            2.56.0   2022-04-27 [2] Bioconductor
#>  BiocGenerics       0.42.0   2022-04-27 [2] Bioconductor
#>  Biostrings         2.64.0   2022-04-27 [2] Bioconductor
#>  bit                4.0.4    2020-08-04 [2] CRAN (R 4.2.0)
#>  bit64              4.0.5    2020-08-30 [2] CRAN (R 4.2.0)
#>  bitops             1.0-7    2021-04-24 [2] CRAN (R 4.2.0)
#>  blob               1.2.3    2022-04-10 [2] CRAN (R 4.2.0)
#>  brio               1.1.3    2021-11-30 [2] CRAN (R 4.2.0)
#>  bslib              0.3.1    2021-10-06 [2] CRAN (R 4.2.0)
#>  cachem             1.0.6    2021-08-19 [2] CRAN (R 4.2.0)
#>  callr              3.7.0    2021-04-20 [2] CRAN (R 4.2.0)
#>  checkmate          2.1.0    2022-04-21 [2] CRAN (R 4.2.0)
#>  class              7.3-20   2022-01-16 [2] CRAN (R 4.2.0)
#>  cli                3.3.0    2022-04-25 [2] CRAN (R 4.2.0)
#>  cluster            2.1.3    2022-03-28 [2] CRAN (R 4.2.0)
#>  clusterSim         0.49-2   2021-01-06 [2] CRAN (R 4.2.0)
#>  codetools          0.2-18   2020-11-04 [2] CRAN (R 4.2.0)
#>  colorspace         2.0-3    2022-02-21 [2] CRAN (R 4.2.0)
#>  crayon             1.5.1    2022-03-26 [2] CRAN (R 4.2.0)
#>  data.table       * 1.14.2   2021-09-27 [2] CRAN (R 4.2.0)
#>  DBI                1.1.2    2021-12-20 [2] CRAN (R 4.2.0)
#>  desc               1.4.1    2022-03-06 [2] CRAN (R 4.2.0)
#>  devtools           2.4.3    2021-11-30 [2] CRAN (R 4.2.0)
#>  digest             0.6.29   2021-12-01 [2] CRAN (R 4.2.0)
#>  doParallel         1.0.17   2022-02-07 [2] CRAN (R 4.2.0)
#>  dplyr              1.0.8    2022-02-08 [2] CRAN (R 4.2.0)
#>  dynamicTreeCut     1.63-1   2016-03-11 [2] CRAN (R 4.2.0)
#>  e1071              1.7-9    2021-09-16 [2] CRAN (R 4.2.0)
#>  ellipsis           0.3.2    2021-04-29 [2] CRAN (R 4.2.0)
#>  evaluate           0.15     2022-02-18 [2] CRAN (R 4.2.0)
#>  extrafont          0.18     2022-04-12 [2] CRAN (R 4.2.0)
#>  extrafontdb        1.0      2012-06-11 [2] CRAN (R 4.2.0)
#>  fansi              1.0.3    2022-03-24 [2] CRAN (R 4.2.0)
#>  fastcluster        1.2.3    2021-05-24 [2] CRAN (R 4.2.0)
#>  fastmap            1.1.0    2021-01-25 [2] CRAN (R 4.2.0)
#>  foreach            1.5.2    2022-02-02 [2] CRAN (R 4.2.0)
#>  foreign            0.8-82   2022-01-16 [2] CRAN (R 4.2.0)
#>  Formula            1.2-4    2020-10-16 [2] CRAN (R 4.2.0)
#>  fs                 1.5.2    2021-12-08 [2] CRAN (R 4.2.0)
#>  generics           0.1.2    2022-01-31 [2] CRAN (R 4.2.0)
#>  GenomeInfoDb       1.32.0   2022-04-27 [2] Bioconductor
#>  GenomeInfoDbData   1.2.8    2022-04-21 [2] Bioconductor
#>  ggplot2            3.3.5    2021-06-25 [2] CRAN (R 4.2.0)
#>  glue               1.6.2    2022-02-24 [2] CRAN (R 4.2.0)
#>  GO.db              3.15.0   2022-04-21 [2] Bioconductor
#>  gridExtra          2.3      2017-09-09 [2] CRAN (R 4.2.0)
#>  gtable             0.3.0    2019-03-25 [2] CRAN (R 4.2.0)
#>  highr              0.9      2021-04-16 [2] CRAN (R 4.2.0)
#>  Hmisc              4.7-0    2022-04-19 [2] CRAN (R 4.2.0)
#>  htmlTable          2.4.0    2022-01-04 [2] CRAN (R 4.2.0)
#>  htmltools          0.5.2    2021-08-25 [2] CRAN (R 4.2.0)
#>  htmlwidgets        1.5.4    2021-09-08 [2] CRAN (R 4.2.0)
#>  httr               1.4.2    2020-07-20 [2] CRAN (R 4.2.0)
#>  igraph             1.3.1    2022-04-20 [2] CRAN (R 4.2.0)
#>  impute             1.70.0   2022-04-27 [2] Bioconductor
#>  IRanges            2.30.0   2022-04-27 [2] Bioconductor
#>  iterators          1.0.14   2022-02-05 [2] CRAN (R 4.2.0)
#>  jpeg               0.1-9    2021-07-24 [2] CRAN (R 4.2.0)
#>  jquerylib          0.1.4    2021-04-26 [2] CRAN (R 4.2.0)
#>  jsonlite           1.8.0    2022-02-22 [2] CRAN (R 4.2.0)
#>  KEGGREST           1.36.0   2022-04-27 [2] Bioconductor
#>  knitr              1.39     2022-04-26 [2] CRAN (R 4.2.0)
#>  lattice            0.20-45  2021-09-22 [2] CRAN (R 4.2.0)
#>  latticeExtra       0.6-29   2019-12-19 [2] CRAN (R 4.2.0)
#>  lifecycle          1.0.1    2021-09-24 [2] CRAN (R 4.2.0)
#>  magrittr           2.0.3    2022-03-30 [2] CRAN (R 4.2.0)
#>  MASS               7.3-57   2022-04-22 [2] CRAN (R 4.2.0)
#>  Matrix             1.4-1    2022-03-23 [2] CRAN (R 4.2.0)
#>  matrixStats        0.62.0   2022-04-19 [2] CRAN (R 4.2.0)
#>  mbOmic           * 1.0.0    2022-04-27 [1] Bioconductor
#>  memoise            2.0.1    2021-11-26 [2] CRAN (R 4.2.0)
#>  mnormt             2.0.2    2020-09-01 [2] CRAN (R 4.2.0)
#>  munsell            0.5.0    2018-06-12 [2] CRAN (R 4.2.0)
#>  nlme               3.1-157  2022-03-25 [2] CRAN (R 4.2.0)
#>  nnet               7.3-17   2022-01-16 [2] CRAN (R 4.2.0)
#>  pillar             1.7.0    2022-02-01 [2] CRAN (R 4.2.0)
#>  pkgbuild           1.3.1    2021-12-20 [2] CRAN (R 4.2.0)
#>  pkgconfig          2.0.3    2019-09-22 [2] CRAN (R 4.2.0)
#>  pkgdown            2.0.3    2022-04-24 [2] CRAN (R 4.2.0)
#>  pkgload            1.2.4    2021-11-30 [2] CRAN (R 4.2.0)
#>  png                0.1-7    2013-12-03 [2] CRAN (R 4.2.0)
#>  preprocessCore     1.58.0   2022-04-27 [2] Bioconductor
#>  prettyunits        1.1.1    2020-01-24 [2] CRAN (R 4.2.0)
#>  processx           3.5.3    2022-03-25 [2] CRAN (R 4.2.0)
#>  proxy              0.4-26   2021-06-07 [2] CRAN (R 4.2.0)
#>  ps                 1.7.0    2022-04-23 [2] CRAN (R 4.2.0)
#>  psych              2.2.3    2022-03-19 [2] CRAN (R 4.2.0)
#>  purrr              0.3.4    2020-04-17 [2] CRAN (R 4.2.0)
#>  R2HTML             2.3.2    2016-06-23 [2] CRAN (R 4.2.0)
#>  R6                 2.5.1    2021-08-19 [2] CRAN (R 4.2.0)
#>  RColorBrewer       1.1-3    2022-04-03 [2] CRAN (R 4.2.0)
#>  Rcpp               1.0.8.3  2022-03-17 [2] CRAN (R 4.2.0)
#>  RCurl              1.98-1.6 2022-02-08 [2] CRAN (R 4.2.0)
#>  remotes            2.4.2    2021-11-30 [2] CRAN (R 4.2.0)
#>  rgl                0.108.3  2021-11-21 [2] CRAN (R 4.2.0)
#>  rlang              1.0.2    2022-03-04 [2] CRAN (R 4.2.0)
#>  rmarkdown          2.14     2022-04-25 [2] CRAN (R 4.2.0)
#>  rpart              4.1.16   2022-01-24 [2] CRAN (R 4.2.0)
#>  rprojroot          2.0.3    2022-04-02 [2] CRAN (R 4.2.0)
#>  RSQLite            2.2.12   2022-04-02 [2] CRAN (R 4.2.0)
#>  rstudioapi         0.13     2020-11-12 [2] CRAN (R 4.2.0)
#>  Rttf2pt1           1.3.10   2022-02-07 [2] CRAN (R 4.2.0)
#>  S4Vectors          0.34.0   2022-04-27 [2] Bioconductor
#>  sass               0.4.1    2022-03-23 [2] CRAN (R 4.2.0)
#>  scales             1.2.0    2022-04-13 [2] CRAN (R 4.2.0)
#>  sessioninfo        1.2.2    2021-12-06 [2] CRAN (R 4.2.0)
#>  stringi            1.7.6    2021-11-29 [2] CRAN (R 4.2.0)
#>  stringr            1.4.0    2019-02-10 [2] CRAN (R 4.2.0)
#>  survival           3.3-1    2022-03-03 [2] CRAN (R 4.2.0)
#>  testthat           3.1.4    2022-04-26 [2] CRAN (R 4.2.0)
#>  tibble             3.1.6    2021-11-07 [2] CRAN (R 4.2.0)
#>  tidyselect         1.1.2    2022-02-21 [2] CRAN (R 4.2.0)
#>  tmvnsim            1.0-2    2016-12-15 [2] CRAN (R 4.2.0)
#>  usethis            2.1.5    2021-12-09 [2] CRAN (R 4.2.0)
#>  utf8               1.2.2    2021-07-24 [2] CRAN (R 4.2.0)
#>  vctrs              0.4.1    2022-04-13 [2] CRAN (R 4.2.0)
#>  visNetwork         2.1.0    2021-09-29 [2] CRAN (R 4.2.0)
#>  WGCNA              1.71     2022-04-22 [2] CRAN (R 4.2.0)
#>  withr              2.5.0    2022-03-03 [2] CRAN (R 4.2.0)
#>  xfun               0.30     2022-03-02 [2] CRAN (R 4.2.0)
#>  XVector            0.36.0   2022-04-27 [2] Bioconductor
#>  yaml               2.3.5    2022-02-21 [2] CRAN (R 4.2.0)
#>  zlibbioc           1.42.0   2022-04-27 [2] Bioconductor
#> 
#>  [1] /tmp/RtmpvToPOC/Rinst2c681c42aa7dd8
#>  [2] /home/biocbuild/bbs-3.15-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────