Contents

1 Introduction

The microbiome R package facilitates exploration and analysis of microbiome profiling data, in particular 16S taxonomic profiling.

This vignette provides a brief overview with example data sets from published microbiome profiling studies (Lahti et al. 2014, Lahti et al. (2013), O’Keefe et al. (2015)). A more comprehensive tutorial is available on-line.

Tools are provided for the manipulation, statistical analysis, and visualization of taxonomic profiling data. In addition to targeted case-control studies, the package facilitates scalable exploration of large population cohorts (Lahti et al. 2014). Whereas sample collections are rapidly accumulating for the human body and other environments, few general-purpose tools for targeted microbiome analysis are available in R. This package supports the independent phyloseq data format and expands the available toolkit in order to facilitate the standardization of the analyses and the development of best practices. See also the related PathoStat pipeline mare pipeline, phylofactor, and structSSI for additional 16S rRNA amplicon analysis tools in R. The aim is to complement the other available packages, but in some cases alternative solutions have been necessary in order to streamline the tools and to improve complementarity.

We welcome feedback, bug reports, and suggestions for new features from the user community via the issue tracker and pull requests. See the Github site for source code and other details. These R tools have been utilized in recent publications and in introductory courses (Salonen et al. 2014, Faust et al. (2015), Shetty et al. (2017)), and they are released under the Two-clause FreeBSD license.

Kindly cite the work as follows: “Leo Lahti et al. (Bioconductor, 2017). Tools for microbiome analysis in R. Microbiome package version . URL: (http://microbiome.github.io/microbiome)

2 Installation

To install microbiome package in R (Bioconductor development version), use

library(BiocManager)
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "devel")
BiocManager::install("microbiome")

Then load the package in R

library(microbiome)  

3 Data

The microbiome package relies on the independent phyloseq data format. This contains an OTU table (taxa abundances), sample metadata (age, BMI, sex, …), taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and a phylogenetic tree (relations between the taxa).

3.1 Example data sets

Example data sets are provided to facilitate reproducible examples and further methods development.

The HITChip Atlas data set Lahti et al. Nat. Comm. 5:4344, 2014 contains 130 genus-level taxonomic groups across 1006 western adults. Load the example data in R with

# Data from 
# http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html
data(atlas1006) 
atlas1006
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 1172 samples ]
## sample_data() Sample Data:       [ 1172 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 2 taxonomic ranks ]

The two-week diet swap study between western (USA) and traditional (rural Africa) diets, reported in O’Keefe et al. Nat. Comm. 6:6342, 2015

data(dietswap) # Data from http://dx.doi.org/10.1038/ncomms7342
dietswap
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 130 taxa and 222 samples ]
## sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
## tax_table()   Taxonomy Table:    [ 130 taxa by 2 taxonomic ranks ]

A parallel profiling of gut microbiota versus blood metabolites from Lahti et al. PeerJ 1:e32, 2013 to characterize associations between human intestinal microbiota and blood serum lipids

data(peerj32) # Data from https://peerj.com/articles/32/

3.2 Data import

You can import 16S profiling data from standard formats (Mother, BIOM, CSV, etc.). See the tutorial for details.

3.3 Data manipulation

A phyloseq object can be subsetted, filtered, aggregated, transformed, and otherwise manipulated. For a comprehensive list of tools, see the online tutorial.

To convert absolute counts to compositional (relative) abundances, for instance, use

# dietswap is a phyloseq object; see above
dietswap.compositional <- transform(dietswap, "compositional")

4 Ecosystem indices

4.1 Alpha diversity, richness, evenness, dominance, and rarity

Commonly used ecosystem state variables include various indices to quantify alpha diversities, richness, evenness, dominance, and rarity (see functions with similar names). We provide a comprehensive set of such indices via a standardized interface.

The function global calls these indicators with default parameters. For further options, see tutorial.

g <- global(atlas1006, index = "gini")

Visually-Weighted Regression curve with smoothed error bars is based on the can be used to visualize sample variables (1), here the relation between age and diversity. This function operates on standard data frames.

# Estimate Shannon diversity and add it to the phyloseq object
sample_data(atlas1006)$diversity <- global(atlas1006, index = "shannon")[,1]

# Compare age and microbiome diversity
plot_regression(diversity ~ age, meta(atlas1006))

5 Core microbiota analysis

Population frequencies, or prevalence, of the taxonomic groups exceeding a given detection threshold can be calculated with

p <- prevalence(dietswap, detection = 0, sort = TRUE)

The core microbiota refers to the set of taxa that are detected in a remarkable fraction of the population above a given abundance threshold (see e.g. Jalanka-Tuovinen et al. 2011, Salonen et al. (2012)). The core subset can be sliced from a phyloseq object as follows.

# Taxa with over 50% prevance at .2% relative abundance
dietswap.core <- core(dietswap.compositional, 
                    detection = .2/100, prevalence = 50/100)
dietswap.core
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 39 taxa and 222 samples ]
## sample_data() Sample Data:       [ 222 samples by 8 sample variables ]
## tax_table()   Taxonomy Table:    [ 39 taxa by 2 taxonomic ranks ]

Similar functions are available also for rare and variable taxa, see the microbiome tutorial for a full description.

To visualize the core as in (Shetty et al. 2017), use

library(ggplot2, quiet = TRUE)
p <- plot_core(transform(dietswap.core, "compositional"), 
    plot.type = "heatmap", 
    colours = gray(seq(0,1,length=5)),
    prevalences = seq(.05, 1, .05), 
    detections = 10^seq(log10(1e-3), log10(.2), length = 10), 
    horizontal = TRUE) +
    xlab("Detection Threshold (Relative Abundance (%))") 
print(p)    

6 Microbiome composition

Composition heatmap: Z-transformed taxon abundances

tmp <- plot_composition(dietswap.core, plot.type = "heatmap", transform = "Z", 
            mar = c(6, 13, 1, 1), sample.sort = "nationality")

Composition barplot

plot_composition(transform(dietswap.core, "compositional"), 
    plot.type = "barplot", sample.sort = "neatmap")

Visualize sample similarities, or the microbiome landscape (Shetty et al. 2017) on an ordination map based on various ordination methods (PCoA, NMDS, RDA etc) that are available through various other R packages.

# Project the samples with the given method and dissimilarity measure. 
# Ordinate the data; note that some ordinations are sensitive to random seed
# "quiet" is used to suppress intermediate outputs
set.seed(423542)

# Get ordination
x <- dietswap.core
quiet(x.ord <- ordinate(x, method = "NMDS", distance = "bray"))
# Pick the projected data (first two columns + metadata)
quiet(proj <- phyloseq::plot_ordination(x, x.ord, justDF=TRUE))
# Rename the projection axes
names(proj)[1:2] <- paste("Comp", 1:2, sep=".")

p <- plot_landscape(proj[, 1:2], col = proj$nationality, legend = TRUE)
print(p)

7 Association heatmaps

Let us cross-correlate example data containing microbiome profiling and blood serum lipids study (Lahti et al. 2013).

# Define data sets to cross-correlate
data(peerj32)
x <- log10(peerj32$microbes)   # OTU Log10 (44 samples x 130 genera)
y <- as.matrix(peerj32$lipids) # Lipids (44 samples x 389 lipids)

# Cross correlate data sets and return a table
correlation.table <- associate(x, y, 
    method = "spearman", mode = "table", p.adj.threshold = 0.05, n.signif = 1)

kable(head(correlation.table))
X1 X2 Correlation p.adj
1648 Ruminococcus gnavus et rel. TG(54:5).2 0.7164958 0.0022842
384 Moraxellaceae PC(40:3e) -0.6941863 0.0029225
1829 Uncultured Bacteroidetes TG(56:2).1 -0.6987375 0.0029225
349 Lactobacillus plantarum et rel. PC(40:3) -0.6877976 0.0031520
1198 Ruminococcus gnavus et rel. TG(52:5) 0.6806216 0.0037518
264 Moraxellaceae PC(38:4).1 -0.6700504 0.0038414

Let us rearrange the data and plot the heatmap and highlight significant correlations with stars as in (Lahti et al. 2013).

heat(correlation.table, "X1", "X2", fill = "Correlation", 
    star = "p.adj", p.adj.threshold = 0.05) 

8 Bistability and tipping elements

The microbiome package provides tools to quantify stability and bimodality in abundance data, following (Lahti et al. 2014). Let use Dialister as an example. The tipping point is here set manually but it can also be estimated from the data.

# Use relative abundances and plot subject variation 
# during the follow-up period
pseq <- transform(atlas1006, "compositional")
plot_tipping(pseq, "Dialister", tipping.point = 0.004)

# Show the bimodal population distribution
hotplot(pseq, "Dialister", tipping.point = 0.004)

9 Further reading

The on-line tutorial provides many additional tools and examples, with more thorough descriptions. This package and tutorials are work in progress. We welcome feedback, for instance via issue Tracker, pull requests, or via Gitter.

10 Acknowledgements

Thanks to all contributors. Financial support has been provided by Academy of Finland (grants 256950 and 295741), University of Turku, Department of Mathematics and Statistics. In addition, the work has been supported by VIB lab for Bioinformatics and (eco-)systems biology, VIB/KULeuven, Belgium, Molecular Ecology group, Laboratory of Microbiology, Wageningen University, Netherlands, and Department of Veterinary Bioscience, University of Helsinki, Finland. This work relies on the independent phyloseq package and data structures for R-based microbiome analysis developed by Paul McMurdie and Susan Holmes. This work also utilizes a number of independent R extensions, including dplyr (Wickham and Francois 2016), ggplot2 (Wickham 2009), phyloseq (McMurdie and Holmes 2013), and vegan (Oksanen et al. 2015).

References

Faust, Karoline, Leo Lahti, Didier Gonze, Willem M de Vos, and Jeroen Raes. 2015. “Metagenomics Meets Time Series Analysis: Unraveling Microbial Community Dynamics.” Current Opinion in Microbiology 25 (June):56–66. https://doi.org/10.1016/j.mib.2015.04.004.

Jalanka-Tuovinen, Jonna, Anne Salonen, Janne Nikkilä, Outi Immonen, Riina Kekkonen, Leo Lahti, Airi Palva, and Willem M. de Vos. 2011. “Intestinal Microbiota in Healthy Adults: Temporal Analysis Reveals Individual and Common Core and Relation to Intestinal Symptoms.” PLoS One 6 (7):e23035. https://doi.org/10.7717/peerj.32.

Lahti, Leo, Jarkko Salojarvi, Anne Salonen, Marten Scheffer, and Willem M. de Vos. 2014. “Tipping Elements in the Human Intestinal Ecosystem.” Nature Communications 5 (July):4344. https://doi.org/10.1038/ncomms5344.

Lahti, Leo, Anne Salonen, Riina A. Kekkonen, Jarkko Salojarvi, Jonna Jalanka-Tuovinen, Airi Palva, Matej Orešič, and Willem M. de Vos. 2013. “Associations between the human intestinal microbiota, Lactobacillus rhamnosus GG and serum lipids indicated by integrated analysis of high-throughput profiling data.” PeerJ 1:e32. https://doi.org/10.7717/peerj.32.

McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4):e61217. http://dx.plos.org/10.1371/journal.pone.0061217.

Oksanen, Jari, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter R. Minchin, R. B. O’Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, and Helene Wagner. 2015. Vegan: Community Ecology Package. http://CRAN.R-project.org/package=vegan.

O’Keefe, SJD, JV Li, L Lahti, J Ou, F Carbonero, K Mohammed, JM Posma, et al. 2015. “Fat, Fiber and Cancer Risk in African, Americans and Rural Africans.” Nature Communications 6 (April):6342. https://doi.org/10.1038/ncomms7342.

Salonen, Anne, Leo Lahti, Jarkko Salojärvi, Grietje Holtrop, Katri Korpela, Sylvia Duncan, Priya Date, et al. 2014. “Impact of Diet and Individual Variation on Intestinal Microbiota Composition and Fermentation Products in Obese Men.” ISME Journal 8:2218–30. https://doi.org/10.1038/ismej.2014.63.

Salonen, Anne, Jarkko Salojärvi, Leo Lahti, and and Willem M. de Vos. 2012. “The Adult Intestinal Core Microbiota Is Determined by Analysis Depth and Health Status.” Clinical Microbiology and Infection 18 (Suppl. 4):16–20. https://doi.org/10.1111/j.1469-0691.2012.03855.x.

Shetty, SA, F Hugenholtz, L Lahti, H Smidt, WM de Vos, and A Danchin. 2017. “Intestinal Microbiome Landscaping: Insight in Community Assemblage and Implications for Microbial Modulation Strategies.” FEMS Microbiology Reviews, fuw045. https://doi.org/10.1093/femsre/fuw045.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer New York. http://had.co.nz/ggplot2/book.

Wickham, Hadley, and Romain Francois. 2016. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.