The metagenomeFeatures package and its associated annotation packages can be used to obtain phylogenetic trees and representative sequences for 16S rRNA marker-gene data when closed-reference clustering is used. This vignette uses metagenomeFeatures and the Greengenes 16S rRNA database version 13.8, clustered at 85% similarity (85% OTUs), to obtain a phylogenetic tree and representative sequences for a soil microbiome study obtained from QIITA. The Greengenes 13.8 85% OTU database is provided as part of the metagenomeFeatures package.
gg85 is an MgDb class object containing the taxonomic hierarchy, sequence data, and phylogeny for the Greengenes database clustered at the 0.85 similarity threshold.
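The gg85 object is printed below without being created in these lines. A minimal sketch for loading it, assuming the loader get_gg13.8_85MgDb() is exported by your installed version of metagenomeFeatures (check the package reference manual if it is not):

```r
library(metagenomeFeatures)

# Load the Greengenes 13.8 85% OTU MgDb object bundled with metagenomeFeatures
# (assumption: get_gg13.8_85MgDb() is available in the installed version)
gg85 <- get_gg13.8_85MgDb()

# The object bundles taxonomy, sequences, and the phylogenetic tree
class(gg85)
```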
gg85
#> MgDb object:[1] "Metadata"
#> |ACCESSION_DATE: Mon Apr 2 13:30:09 2018
#> |URL: ftp://greengenes.microbio.me/greengenes_release/gg_13_8_otus
#> |DB_TYPE_NAME: GreenGenes
#> |DB_VERSION: 13.8 85% OTUS
#> |DB_TYPE_VALUE: MgDb
#> |DB_SCHEMA_VERSION: 2.0
#> [1] "Sequence Data:"
#> [1] "DECIPHER formatted seqDB"
#> [1] "Taxonomy Data:"
#> # Source: table<Seqs> [?? x 11]
#> # Database: sqlite 3.22.0
#> # [/tmp/Rtmp2N7cJP/Rinst4b4c48229f79/metagenomeFeatures/extdata/gg13.8_85.sqlite]
#> row_names identifier description Keys Kingdom Phylum Class Ord
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 MgDb 1111561 11115… k__Bact… p__Pro… c__Gam… o__Le…
#> 2 2 MgDb 1111421 11114… k__Bact… p__Pro… c__Alp… o__Rh…
#> 3 3 MgDb 1111090 11110… k__Bact… p__Act… c__Nit… o__Ni…
#> 4 4 MgDb 1110893 11108… k__Bact… p__Bac… c__[Sa… o__[S…
#> 5 5 MgDb 1110814 11108… k__Bact… p__BRC1 c__ o__
#> 6 6 MgDb 1110088 11100… k__Bact… p__Pro… c__Gam… o__
#> 7 7 MgDb 1109993 11099… k__Bact… p__Chl… c__Deh… o__
#> 8 8 MgDb 1109948 11099… k__Bact… p__Pla… c__[Br… o__Br…
#> 9 9 MgDb 1109493 11094… k__Bact… p__Pla… c__vad… o__
#> 10 10 MgDb 1109328 11093… k__Bact… p__Chl… c__Ana… o__S0…
#> # ... with more rows, and 3 more variables: Family <chr>, Genus <chr>,
#> # Species <chr>
#> [1] "Tree Data:"
#>
#> Phylogenetic tree with 5088 tips and 5087 internal nodes.
#>
#> Tip labels:
#> 4479984, 540377, 811993, 823988, 4397176, 4446470, ...
#>
#> Rooted; includes branch lengths.
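The tree summary reports 5088 tips and 5087 internal nodes. A rooted tree in which every internal node has exactly two children has n - 1 internal nodes for n tips, so these counts are consistent with a fully bifurcating rooted tree. The sketch below illustrates the relationship on a simulated tree from the ape package (assumptions: ape is installed; the tree is random, not the Greengenes tree):

```r
library(ape)

set.seed(1)
# Simulate a random rooted, fully bifurcating tree with the same number of tips
sim_tree <- rtree(5088)

Ntip(sim_tree)      # number of tips
Nnode(sim_tree)     # internal nodes: Ntip - 1 for a rooted binary tree
is.rooted(sim_tree) # rtree() returns a rooted tree by default
```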
For this vignette we use 16S rRNA data from Rousk et al. 2010, a soil microbiome study (https://qiita.ucsd.edu/study/description/94). A BIOM file and a QIIME mapping file for the study can be obtained from QIITA. A vector of Greengenes OTU IDs for the study's cluster centers is included in this package for use in this vignette.
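To apply the same workflow to your own QIITA download, the OTU IDs can be pulled from the BIOM table. The sketch below assumes the Bioconductor package biomformat is installed and, so that it runs as-is, reads the small example table shipped with biomformat; replace biom_path with the path to the study's BIOM file. The helper name biom_otu_ids is hypothetical.

```r
library(biomformat)

# Hypothetical helper: return the observation (OTU) IDs stored in a BIOM file
biom_otu_ids <- function(biom_path) {
  biom_obj <- read_biom(biom_path)
  # Rows of the BIOM count matrix are observations (OTUs); columns are samples
  rownames(biom_data(biom_obj))
}

# Example table distributed with biomformat; swap in the BIOM file from QIITA
biom_path <- system.file("extdata", "min_dense_otu_table.biom",
                         package = "biomformat")
otu_ids <- biom_otu_ids(biom_path)
otu_ids

# With a closed-reference Greengenes BIOM, the IDs could then be annotated:
# my_mgF <- annotateFeatures(gg85, otu_ids)
```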
soil_mgF <- annotateFeatures(gg85, soil_gg_ids)
#> Found more than one class "phylo" in cache; using the first, from namespace 'metagenomeFeatures'
#> Also defined by 'tidytree'
The resulting mgFeatures class object contains the taxonomic hierarchy, phylogeny, and sequence data for the study OTUs.
soil_mgF
#> mgFeatures with 61 rows and 8 columns
#> Keys Kingdom Phylum Class
#> <character> <character> <character> <character>
#> 1 1107824 k__Bacteria p__Proteobacteria c__Gammaproteobacteria
#> 2 824826 k__Bacteria p__Proteobacteria c__Alphaproteobacteria
#> 3 694268 k__Bacteria p__WPS-2 c__
#> 4 579266 k__Bacteria p__Proteobacteria c__Betaproteobacteria
#> 5 558862 k__Bacteria p__Chlorobi c__SJA-28
#> ... ... ... ... ...
#> 57 4389227 k__Bacteria p__Acidobacteria c__iii1-8
#> 58 4391683 k__Bacteria p__Proteobacteria c__Alphaproteobacteria
#> 59 4421369 k__Bacteria p__Acidobacteria c__[Chloracidobacteria]
#> 60 4477112 k__Bacteria p__WS2 c__SHA-109
#> 61 4479102 k__Bacteria p__OP11 c__OP11-4
#> Ord Family Genus Species
#> <character> <character> <character> <character>
#> 1 o__Legionellales f__Coxiellaceae g__ s__
#> 2 o__ f__ g__ s__
#> 3 o__ f__ g__ s__
#> 4 o__Burkholderiales f__ g__ s__
#> 5 o__ f__ g__ s__
#> ... ... ... ... ...
#> 57 o__32-20 f__ g__ s__
#> 58 o__Sphingomonadales f__Sphingomonadaceae g__ s__
#> 59 o__RB41 f__Ellin6075 g__ s__
#> 60 o__ f__ g__ s__
#> 61 o__ f__ g__ s__
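In the taxonomy table, ranks that Greengenes leaves unassigned appear as a bare prefix such as o__ or g__. If you prefer to treat those as missing values, a small base-R sketch follows; strip_gg_prefix() is a hypothetical helper, not part of metagenomeFeatures.

```r
# Hypothetical helper (not part of metagenomeFeatures): remove Greengenes rank
# prefixes (k__, p__, c__, o__, f__, g__, s__) and turn empty ranks into NA
strip_gg_prefix <- function(x) {
  x <- sub("^[kpcofgs]__", "", x)
  x[x == ""] <- NA_character_
  x
}

strip_gg_prefix(c("k__Bacteria", "o__Legionellales", "g__",
                  "c__[Chloracidobacteria]"))
# returns "Bacteria" "Legionellales" NA "[Chloracidobacteria]"

# Applied to a taxonomy column of the mgFeatures object, e.g.:
# strip_gg_prefix(soil_mgF$Ord)
```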
Sequence data
mgF_seq(soil_mgF)
#> A DNAStringSet instance of length 61
#> width seq names
#> [1] 1502 TAGAGTTTGATCCTGGCTCA...AGTCGTAACAAGGTAGCCGT 1107824
#> [2] 1396 AGAGTTTGATCATGGCTCAG...GCCTTGTACACACCGCCCGT 824826
#> [3] 1424 AACGCTGGCGGCGTGCCTAA...GGTAAGGGGGACGAAGTCGT 694268
#> [4] 1498 AGAGTTTGATCCTGGCTCAG...GTCGTAACAAGGTAGCCGTA 579266
#> [5] 1432 AGAGTTTGATCATGGCTCAG...CAGAAGTAGTTAGCCTAACC 558862
#> ... ... ...
#> [57] 1378 TGCTTAACACATGCAAGTCG...TTGCACACACCGCCCGTCAC 4389227
#> [58] 1606 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4391683
#> [59] 1496 AGAGTTTGATCCTGGCTCAG...GAAGTCGTAACAAGGTAACC 4421369
#> [60] 1403 GAACGCTGGCGGTACGTCTG...CGGCCGAAGGTGGAGTCAGT 4477112
#> [61] 1336 GATGAACGCTGGCGGCGTGC...CAAAGTTGGGGGCGCCCGAA 4479102
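The representative sequences are returned as a Biostrings DNAStringSet named by Greengenes ID, so they can be written to FASTA for use with other tools. A minimal sketch, assuming Biostrings is installed and that soil_gg_ids can be loaded as a package dataset with data(); the output file name is illustrative.

```r
library(metagenomeFeatures)
library(Biostrings)

# Rebuild the objects used above
# (assumption: soil_gg_ids is shipped as a package dataset)
gg85 <- get_gg13.8_85MgDb()
data("soil_gg_ids", package = "metagenomeFeatures")
soil_mgF <- annotateFeatures(gg85, soil_gg_ids)

# Extract the representative sequences
soil_seqs <- mgF_seq(soil_mgF)

# Write them to a FASTA file in a temporary directory
fasta_path <- file.path(tempdir(), "soil_rep_seqs.fasta")
writeXStringSet(soil_seqs, fasta_path)
```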
Tree data