The microbiome R package facilitates exploration and analysis of microbiome profiling data, in particular 16S taxonomic profiling.
This vignette provides a brief overview with example data sets from published microbiome profiling studies (Lahti et al. 2013; Lahti et al. 2014; O’Keefe et al. 2015). A more comprehensive tutorial is available online.
Tools are provided for the manipulation, statistical analysis, and visualization of taxonomic profiling data. In addition to targeted case-control studies, the package facilitates scalable exploration of large population cohorts (Lahti et al. 2014). Although sample collections are rapidly accumulating for the human body and other environments, few general-purpose tools for targeted microbiome analysis are available in R. This package supports the independent phyloseq data format and expands the available toolkit in order to facilitate the standardization of analyses and the development of best practices. See also the related PathoStat pipeline, mare pipeline, phylofactor, and structSSI for additional 16S rRNA amplicon analysis tools in R. The aim is to complement the other available packages, but in some cases alternative solutions have been necessary in order to streamline the tools and to improve complementarity.
We welcome feedback, bug reports, and suggestions for new features from the user community via the issue tracker and pull requests. See the GitHub site for source code and other details. These R tools have been utilized in recent publications and in introductory courses (Salonen et al. 2014; Faust et al. 2015; Shetty et al. 2017), and they are released under the Two-clause FreeBSD license.
Load the package in R after installing it from Bioconductor.
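A minimal sketch of that step, assuming installation goes through the BiocManager helper package from CRAN (the usual route for Bioconductor packages; the vignette itself does not prescribe one):

```r
# Install BiocManager once if it is not yet available (assumption: CRAN access)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install microbiome from Bioconductor only if it is missing
if (!requireNamespace("microbiome", quietly = TRUE)) {
  BiocManager::install("microbiome")
}

# Attach the package for the current session
library(microbiome)
```

The `requireNamespace()` guards only avoid reinstalling on every run; in an interactive session the final `library(microbiome)` call is all that is needed once the package is installed.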
The microbiome package relies on the independent phyloseq data format. A phyloseq object contains an OTU table (taxa abundances), sample metadata (age, BMI, sex, …), a taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and a phylogenetic tree (relations between the taxa).
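To make these components concrete, the following sketch assembles a small phyloseq object by hand. All abundance values, sample names, and metadata are invented for illustration, and the phylogenetic tree is left out because phyloseq treats it as optional.

```r
library(phyloseq)

# Toy abundance matrix: 3 taxa (rows) x 2 samples (columns); values are made up
otu <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 3,
              dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2")))

# Sample metadata; row names must match the sample names of the OTU table
meta_df <- data.frame(age = c(34, 51), sex = c("female", "male"),
                      row.names = c("S1", "S2"))

# Taxonomy table: one row per OTU, one column per taxonomic rank
tax <- matrix(c("Firmicutes", "Firmicutes", "Bacteroidetes",
                "Dialister", "Roseburia", "Prevotella"), nrow = 3,
              dimnames = list(c("OTU1", "OTU2", "OTU3"), c("Phylum", "Genus")))

# Combine the components into a single phyloseq object
toy <- phyloseq(otu_table(otu, taxa_are_rows = TRUE),
                sample_data(meta_df),
                tax_table(tax))

ntaxa(toy)       # number of taxa
nsamples(toy)    # number of samples
rank_names(toy)  # taxonomic ranks stored in the taxonomy table
```

Keeping the three tables in one object means that subsetting samples or taxa stays consistent across abundances, metadata, and taxonomy.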
Example data sets are provided to facilitate reproducible examples and further methods development.
The HITChip Atlas data set (Lahti et al. 2014, Nature Communications 5:4344) contains 130 genus-level taxonomic groups across 1006 western adults. The data object lists 1151 samples, since some subjects were sampled more than once. Load the example data in R with:
# Data from
# http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html
data(atlas1006)
atlas1006
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1151 samples ]
## sample_data() Sample Data:       [ 1151 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 3 taxonomic ranks ]
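The individual components of the loaded object can be inspected with standard accessors. The sketch below uses `ntaxa()`, `nsamples()`, `sample_variables()`, and `tax_table()` from phyloseq together with `abundances()` and `meta()` from microbiome. The metadata column name `subject` is an assumption about this data set and should be checked against the `sample_variables()` output.

```r
library(microbiome)
library(phyloseq)

data(atlas1006)

# Overall dimensions of the data set
ntaxa(atlas1006)
nsamples(atlas1006)

# Names of the sample metadata variables
sample_variables(atlas1006)

# Abundance matrix (taxa x samples) and sample metadata as a data.frame
abund <- abundances(atlas1006)
md <- meta(atlas1006)
dim(abund)
head(md)

# First rows of the taxonomy table
head(tax_table(atlas1006))

# Number of distinct subjects; assumes a metadata column called "subject"
if ("subject" %in% sample_variables(atlas1006)) {
  length(unique(md$subject))
}
```

Working with `abundances()` and `meta()` is convenient when the data is passed on to functions that expect a plain matrix or data frame rather than a phyloseq object.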
The online tutorial provides many additional tools and examples, with more thorough descriptions. This package and its tutorials are a work in progress. We welcome feedback, for instance via the issue tracker, pull requests, or Gitter.
Thanks to all contributors. Financial support has been provided by the Academy of Finland (grants 256950 and 295741) and the University of Turku, Department of Mathematics and Statistics. In addition, the work has been supported by the Laboratory of Microbiology, Wageningen University, The Netherlands. This work relies on the independent phyloseq package and data structures for R-based microbiome analysis developed by Paul McMurdie and Susan Holmes. This work also utilizes a number of independent R extensions, including dplyr (Wickham and Francois 2016), ggplot2 (Wickham 2009), phyloseq (McMurdie and Holmes 2013), and vegan (Oksanen et al. 2015).
Faust, Karoline, Leo Lahti, Didier Gonze, Willem M de Vos, and Jeroen Raes. 2015. “Metagenomics Meets Time Series Analysis: Unraveling Microbial Community Dynamics.” Current Opinion in Microbiology 25 (June):56–66. https://doi.org/10.1016/j.mib.2015.04.004.
Lahti, Leo, Jarkko Salojärvi, Anne Salonen, Marten Scheffer, and Willem M. de Vos. 2014. “Tipping Elements in the Human Intestinal Ecosystem.” Nature Communications 5 (July):4344. https://doi.org/10.1038/ncomms5344.
Lahti, Leo, Anne Salonen, Riina A. Kekkonen, Jarkko Salojärvi, Jonna Jalanka-Tuovinen, Airi Palva, Matej Orešič, and Willem M. de Vos. 2013. “Associations Between the Human Intestinal Microbiota, Lactobacillus Rhamnosus GG and Serum Lipids Indicated by Integrated Analysis of High-Throughput Profiling Data.” PeerJ 1:e32. https://doi.org/10.7717/peerj.32.
McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4):e61217. http://dx.plos.org/10.1371/journal.pone.0061217.
Oksanen, Jari, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, and Helene Wagner. 2015. Vegan: Community Ecology Package. http://CRAN.R-project.org/package=vegan.
O’Keefe, SJD, JV Li, L Lahti, J Ou, F Carbonero, K Mohammed, JM Posma, et al. 2015. “Fat, Fiber and Cancer Risk in African Americans and Rural Africans.” Nature Communications 6 (April):6342. https://doi.org/10.1038/ncomms7342.
Salonen, Anne, Leo Lahti, Jarkko Salojärvi, Grietje Holtrop, Katri Korpela, Sylvia Duncan, Priya Date, et al. 2014. “Impact of Diet and Individual Variation on Intestinal Microbiota Composition and Fermentation Products in Obese Men.” ISME Journal 8:2218–30. https://doi.org/10.1038/ismej.2014.63.
Shetty, SA, F Hugenholtz, L Lahti, H Smidt, WM de Vos, and A Danchin. 2017. “Intestinal Microbiome Landscaping: Insight in Community Assemblage and Implications for Microbial Modulation Strategies.” FEMS Microbiology Reviews, fuw045. https://doi.org/10.1093/femsre/fuw045.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer New York. http://had.co.nz/ggplot2/book.
Wickham, Hadley, and Romain Francois. 2016. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.