The metagenomeFeatures package and associated annotation packages (e.g. greengene13.5MgDb) can be used to and taxonomic annotations to MRexperiment objects. The metagenomeSeq package includes the MRexperiment-class, and has a number of methods for analyzing marker-gene metagenome datasets. This vignette demonstrate how to add taxonomic annotation to a MRexperiment-class object using metagenomeFeatures.
To generate a MRexperiment object with taxonomic information you need;
For MRexperiment objects, the assayData and phenoData need to have the sample sample order similarly assayData and featureData need to have the sample OTU order.
Required Packages
library(metagenomeFeatures)
library(metagenomeSeq)
A file with count data from the msd16s study is included in the metagenomeFeatures package.
dataDirectory <- system.file("extdata", package = "metagenomeFeatures")
assay_data <- load_meta(file.path(dataDirectory, "msd16s_counts.csv"),sep = ",")
pheno_data <- AnnotatedDataFrame(read.csv(file.path(dataDirectory,
"msd16s_sample_data.csv"),
row.names = 1, stringsAsFactors = F))
The msd16s_query_df
is a dataframe with greengenes taxonomic ids for the msd16s data.
data("msd16s_query_df")
head(msd16s_query_df)
## Keys OTU
## 1 228054 54
## 2 228057 94
## 3 228050 113
## 4 228052 117
## 5 73399 145
## 6 11545 202
A subset of the greengenes database, msd16s_MgDb
is used to generate a MgDb-object for annotating the msd16s data. This database is only used here for demonstration purposes, the gg13.5MgDb
database in the greengenes13.5MgDb package can also be used to generate the metagenomeAnnotation object.
msd16s_feature_data <- annotate(msd16s_MgDb, query_df = msd16s_query_df)
## Joining by: "Keys"
## Using the greengenes13.5MgDb annotation package
# library(greengenes13.5MgDb)
# msd16s_feature_data <- annotate(gg13.5MgDb, query_df = msd16s_query_df)
Prepare the metagneomeAnnotation object for use in a MRexperiment by defining featureName and ordering the OTUs to that they agree with the assayData. Note that the OTU column in msd16s_feature_data
are factors and are first converted to numeric before ordering.
OTU_numbers <- as.numeric(as.character(msd16s_feature_data$OTU))
featureNames(msd16s_feature_data) <- OTU_numbers
ord <- sort(OTU_numbers)
msd16s_feature_data <- msd16s_feature_data[order(OTU_numbers),]
obj = newMRexperiment(assay_data$counts,
phenoData=pheno_data,
featureData = msd16s_feature_data)
obj
## MRexperiment (storageMode: environment)
## assayData: 13530 features, 992 samples
## element names: counts
## protocolData: none
## phenoData
## sampleNames: 100259 100262 ... 602385 (992 total)
## varLabels: Type Country ... Dysentery (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: 54 94 ... 275111 (13530 total)
## fvarLabels: Keys OTU ... Species (9 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
sessionInfo()
## R Under development (unstable) (2015-09-09 r69333)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] Biostrings_2.39.0 XVector_0.11.0
## [3] IRanges_2.5.0 S4Vectors_0.9.0
## [5] metagenomeSeq_1.13.0 RColorBrewer_1.1-2
## [7] glmnet_2.0-2 foreach_1.4.3
## [9] Matrix_1.2-2 limma_3.27.0
## [11] metagenomeFeatures_1.1.0 Biobase_2.31.0
## [13] BiocGenerics_0.17.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.1 formatR_1.2.1
## [3] futile.logger_1.4.1 GenomeInfoDb_1.7.0
## [5] bitops_1.0-6 futile.options_1.0.0
## [7] iterators_1.0.8 tools_3.3.0
## [9] zlibbioc_1.17.0 digest_0.6.8
## [11] evaluate_0.8 RSQLite_1.0.0
## [13] lattice_0.20-33 DBI_0.3.1
## [15] yaml_2.1.13 dplyr_0.4.3
## [17] stringr_1.0.0 hwriter_1.3.2
## [19] knitr_1.11 caTools_1.17.1
## [21] gtools_3.5.0 grid_3.3.0
## [23] R6_2.1.1 BiocParallel_1.5.0
## [25] rmarkdown_0.8.1 gdata_2.17.0
## [27] latticeExtra_0.6-26 lambda.r_1.1.7
## [29] magrittr_1.5 gplots_2.17.0
## [31] matrixStats_0.14.2 Rsamtools_1.23.0
## [33] codetools_0.2-14 htmltools_0.2.6
## [35] GenomicRanges_1.23.0 GenomicAlignments_1.7.0
## [37] assertthat_0.1 ShortRead_1.29.0
## [39] SummarizedExperiment_1.1.0 KernSmooth_2.23-15
## [41] stringi_0.5-5 lazyeval_0.1.10