Exploring a MgDb Object

Nathan D. Olson

2018-04-30

The MgDb class in the metagenomeFeatures package stores the sequence and taxonomic information, along with a phylogenetic tree, for a 16S rRNA database. This vignette demonstrates the class methods for exploring and subsetting MgDb-class objects, using gg85, the Greengenes 13.8 85% OTU database included in the metagenomeFeatures package. MgDb-class objects for full databases are provided in separate packages, such as greengenes13.5MgDb.

Demonstration MgDb-class Object

library(metagenomeFeatures)
## Loading required package: Biobase
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colMeans, colSums, colnames,
##     dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
##     intersect, is.unsorted, lapply, lengths, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
##     rowMeans, rowSums, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which, which.max, which.min
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Warning: replacing previous import 'lazyeval::is_formula' by
## 'purrr::is_formula' when loading 'metagenomeFeatures'
## Warning: replacing previous import 'lazyeval::is_atomic' by
## 'purrr::is_atomic' when loading 'metagenomeFeatures'
gg85 <- get_gg13.8_85MgDb()
gg85
## MgDb object:[1] "Metadata"
## |ACCESSION_DATE: Mon Apr  2 13:30:09 2018
## |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
## |DB_TYPE_NAME: GreenGenes
## |DB_VERSION: 13.8 85% OTUS
## |DB_TYPE_VALUE: MgDb
## |DB_SCHEMA_VERSION: 2.0
## [1] "Sequence Data:"
## [1] "DECIPHER formatted seqDB"
## [1] "Taxonomy Data:"
## # Source:   table<Seqs> [?? x 11]
## # Database: sqlite 3.22.0
## #   [/tmp/Rtmp2N7cJP/Rinst4b4c48229f79/metagenomeFeatures/extdata/gg13.8_85.sqlite]
##    row_names identifier description Keys   Kingdom  Phylum  Class   Ord   
##        <int> <chr>      <chr>       <chr>  <chr>    <chr>   <chr>   <chr> 
##  1         1 MgDb       1111561     11115… k__Bact… p__Pro… c__Gam… o__Le…
##  2         2 MgDb       1111421     11114… k__Bact… p__Pro… c__Alp… o__Rh…
##  3         3 MgDb       1111090     11110… k__Bact… p__Act… c__Nit… o__Ni…
##  4         4 MgDb       1110893     11108… k__Bact… p__Bac… c__[Sa… o__[S…
##  5         5 MgDb       1110814     11108… k__Bact… p__BRC1 c__     o__   
##  6         6 MgDb       1110088     11100… k__Bact… p__Pro… c__Gam… o__   
##  7         7 MgDb       1109993     11099… k__Bact… p__Chl… c__Deh… o__   
##  8         8 MgDb       1109948     11099… k__Bact… p__Pla… c__[Br… o__Br…
##  9         9 MgDb       1109493     11094… k__Bact… p__Pla… c__vad… o__   
## 10        10 MgDb       1109328     11093… k__Bact… p__Chl… c__Ana… o__S0…
## # ... with more rows, and 3 more variables: Family <chr>, Genus <chr>,
## #   Species <chr>
## [1] "Tree Data:"
## 
## Phylogenetic tree with 5088 tips and 5087 internal nodes.
## 
## Tip labels:
##  4479984, 540377, 811993, 823988, 4397176, 4446470, ...
## 
## Rooted; includes branch lengths.

MgDb Methods

taxa_keytypes, taxa_columns, and taxa_keys

taxa_keytypes(gg85)
##  [1] "row_names"   "identifier"  "description" "Keys"        "Kingdom"    
##  [6] "Phylum"      "Class"       "Ord"         "Family"      "Genus"      
## [11] "Species"
taxa_columns(gg85)
## [1] "Keys"    "Kingdom" "Phylum"  "Class"   "Ord"     "Family"  "Genus"  
## [8] "Species"
head(taxa_keys(gg85, keytype = c("Kingdom")))
## # A tibble: 6 x 1
##   Kingdom    
##   <chr>      
## 1 k__Bacteria
## 2 k__Bacteria
## 3 k__Bacteria
## 4 k__Bacteria
## 5 k__Bacteria
## 6 k__Bacteria

Select Methods

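Select methods retrieve database entries for a specified taxonomic group or list of sequence IDs. They can return taxonomic information, sequence information, or all of the stored data (taxonomy, sequences, and the phylogenetic tree).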

Selecting taxonomic information
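
The call that generated the table below is not shown in the rendered output. A minimal sketch of such a call follows, assuming the package's `mgDb_select` method with `type`, `keys`, and `keytype` arguments. The two family names are also an assumption, inferred from the truncated `f__Vi…` and `f__En…` values in the output, and the object name `family_taxa` is illustrative.

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumption: keys are given without the Greengenes rank prefix
# ("Vibrionaceae" rather than "f__Vibrionaceae"); adjust if no rows return.
family_taxa <- mgDb_select(gg85, type = "taxa",
                           keys = c("Vibrionaceae", "Enterobacteriaceae"),
                           keytype = "Family")
family_taxa
```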

## # A tibble: 27 x 8
##    Keys    Kingdom     Phylum            Class  Ord   Family Genus Species
##    <chr>   <chr>       <chr>             <chr>  <chr> <chr>  <chr> <chr>  
##  1 1047956 k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  2 818108  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  3 651366  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  4 592303  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__P… s__    
##  5 575794  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  6 559954  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  7 368586  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__   s__    
##  8 289174  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__P… s__shi…
##  9 268585  k__Bacteria p__Proteobacteria c__Ga… o__E… f__En… g__C… s__    
## 10 232927  k__Bacteria p__Proteobacteria c__Ga… o__V… f__Vi… g__   s__    
## # ... with 17 more rows

Selecting sequence information
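
Sequences for the same taxonomic groups would be requested by changing `type` to `"seq"`. This is a sketch under the same assumptions about `mgDb_select`, its arguments, and the family names. The result is expected to be a Biostrings `DNAStringSet`, as in the output below.

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumption: same (unprefixed) family names as for the taxonomy query
family_seq <- mgDb_select(gg85, type = "seq",
                          keys = c("Vibrionaceae", "Enterobacteriaceae"),
                          keytype = "Family")
family_seq
```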

##   A DNAStringSet instance of length 27
##      width seq                                         names               
##  [1]  1366 ATTGAACGCTGGCGGCAGGC...GTGAATACGTTCCCGGGCCT 1047956
##  [2]  1410 ACGGTACACAGAGAGCTTGC...TTCGGGAGGGCGCTTACCAC 818108
##  [3]  1421 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 651366
##  [4]  1453 AGTCGAGCGGTAACAGTGGG...CATGACTGGGGGAAGTCGTA 592303
##  [5]  1419 ATTGAACGCTGGCGGCAAGC...GCCCGTCACACCATGGGAGT 575794
##  ...   ... ...
## [23]  1383 TGGGAAACTGCCTGATGGAG...AACCTTCGGGAGGGCGGTTT 4336809
## [24]  1443 GGGTGAGTAATGTCTGGGAA...GGTTGCAAAAGAAGTAGGTA 656881
## [25]  1563 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4371215
## [26]  1392 GCGGCGGACGGGTGAGTAAT...TGGGTAGTTTAACCTTCGGG 4375861
## [27]  1389 TCGTGCGGTAATAGAGGAAC...AGCAAGTAGTTTAACCTAAA 4443068

Selecting all
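
With `type = "all"`, the result is a list holding the taxonomy, sequences, and tree for the matching entries. The sketch below assumes the same `mgDb_select` interface; the genus key `"Vibrio"` is a guess based on the truncated `g__V…` values in the output, not something the rendered vignette states.

```r
library(metagenomeFeatures)

# Demonstration database shipped with the package
gg85 <- get_gg13.8_85MgDb()

# Assumption: the genus queried is Vibrio, given without the "g__" prefix
vibrio_all <- mgDb_select(gg85, type = "all",
                          keys = "Vibrio",
                          keytype = "Genus")

# Each component is accessible by name
names(vibrio_all)
vibrio_all$taxa
vibrio_all$seq
vibrio_all$tree
```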

## $taxa
## # A tibble: 2 x 8
##   Keys    Kingdom     Phylum            Class  Ord    Family Genus Species
##   <chr>   <chr>       <chr>             <chr>  <chr>  <chr>  <chr> <chr>  
## 1 661785  k__Bacteria p__Proteobacteria c__Ga… o__Vi… f__Vi… g__V… s__    
## 2 4375861 k__Bacteria p__Proteobacteria c__Ga… o__Vi… f__Vi… g__V… s__    
## 
## $seq
##   A DNAStringSet instance of length 2
##     width seq                                          names               
## [1]  1420 AGAGTTTGATCATGGCTCAGA...TTCATGACTGGGGTGAAGTC 661785
## [2]  1392 GCGGCGGACGGGTGAGTAATG...TGGGTAGTTTAACCTTCGGG 4375861
## 
## $tree
## 
## Phylogenetic tree with 2 tips and 1 internal nodes.
## 
## Tip labels:
## [1] "661785"  "4375861"
## 
## Rooted; includes branch lengths.

Session Information

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] metagenomeFeatures_2.0.0 Biobase_2.40.0          
## [3] BiocGenerics_0.26.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16      XVector_0.20.0    compiler_3.5.0   
##  [4] pillar_1.2.2      dbplyr_1.2.1      bindr_0.1.1      
##  [7] zlibbioc_1.26.0   tools_3.5.0       digest_0.6.15    
## [10] bit_1.1-12        nlme_3.1-137      RSQLite_2.1.0    
## [13] evaluate_0.10.1   memoise_1.1.0     tibble_1.4.2     
## [16] lattice_0.20-35   pkgconfig_2.0.1   rlang_0.2.0      
## [19] cli_1.0.0         DBI_0.8           yaml_2.1.18      
## [22] bindrcpp_0.2.2    stringr_1.3.0     dplyr_0.7.4      
## [25] knitr_1.20        Biostrings_2.48.0 S4Vectors_0.18.0 
## [28] IRanges_2.14.0    tidyselect_0.2.4  stats4_3.5.0     
## [31] rprojroot_1.3-2   bit64_0.9-7       grid_3.5.0       
## [34] glue_1.2.0        R6_2.2.2          rmarkdown_1.9    
## [37] DECIPHER_2.8.0    purrr_0.2.4       blob_1.1.1       
## [40] magrittr_1.5      backports_1.1.2   htmltools_0.3.6  
## [43] assertthat_0.2.0  ape_5.1           utf8_1.1.3       
## [46] stringi_1.1.7     lazyeval_0.2.1    crayon_1.3.4