
library(microbiomeDataSets)

1 Microbiome example data sets

The data sets are primarily named by the first author of the associated publication, together with a descriptive suffix. Aliases are provided for some of the data sets.

The availableDataSets() function returns a table of the data sets provided by the package.

availableDataSets()
#>             Dataset
#> 1       LahtiMLData
#> 2        LahtiMData
#> 3       LahtiWAData
#> 4      OKeefeDSData
#> 5 SilvermanAGutData
#> 6        SongQAData
#> 7   SprockettTHData

Each data set is loaded as a TreeSummarizedExperiment or, if the study collected other data types alongside the microbiome profiles, as a MultiAssayExperiment. For more information on working with these objects, please refer to the vignettes of the TreeSummarizedExperiment and MultiAssayExperiment packages.
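As a minimal sketch, assuming microbiomeDataSets and its Bioconductor dependencies are installed and the data can be fetched from ExperimentHub, the parts of a TreeSummarizedExperiment can be pulled out with the standard SummarizedExperiment accessors assay(), rowData() and colData(). The variable names `tse` and `counts` are illustrative only.

```r
# Attaches SummarizedExperiment and TreeSummarizedExperiment as dependencies
library(microbiomeDataSets)

# Load one of the HITChip data sets as a TreeSummarizedExperiment
tse <- LahtiWAData()

# Abundance table: taxa in rows, samples in columns
counts <- assay(tse, "counts")
dim(counts)

# Taxonomic annotation of the rows (Phylum, Family, Genus)
head(rowData(tse))

# Sample-level metadata (age, sex, ...)
head(colData(tse))
```

For data sets returned as a MultiAssayExperiment, the same accessors apply to each experiment once it has been extracted from the container.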

1.1 HITChip Data

The following data sets are based on the Human Intestinal Tract Chip (HITChip), a phylogenetic 16S rRNA microarray (Rajilić-Stojanović et al. 2009). This profiling technology differs from the more widely used 16S rRNA amplicon sequencing.

Because these data sets are also available in phyloseq format through the microbiome R package, where they are usually referred to by their aliases, they are described here in more detail.

1.1.1 Intestinal microbiota profiling of 1006 Western adults

This data set, from Lahti et al. Nat. Comm. 5:4344, 2014, covers 130 genus-like taxonomic groups across 1006 western adults with no reported health complications. Some subjects also have short time series, so the object holds 1151 samples from these 1006 subjects.

LahtiWAData()
#> class: TreeSummarizedExperiment 
#> dim: 130 1151 
#> metadata(0):
#> assays(1): counts
#> rownames(130): Actinomycetaceae Aerococcus ... Xanthomonadaceae
#>   Yersinia et rel.
#> rowData names(3): Phylum Family Genus
#> colnames(1151): Sample-1 Sample-2 ... Sample-1171 Sample-1172
#> colData names(10): age sex ... time sample
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

# Alias
# atlas1006()
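One way to locate the short time series is to count how many samples each subject contributed. This sketch assumes that colData() contains a `subject` column identifying individuals, in addition to the `time` column shown in the printed object; check names(colData(tse)) before relying on it.

```r
library(microbiomeDataSets)

# Load the atlas data and its sample metadata
tse <- LahtiWAData()
cd <- colData(tse)

# Assumption: `subject` identifies the individual behind each sample
samples_per_subject <- table(cd$subject)

# Subjects with more than one sample form the short time series
repeated <- samples_per_subject[samples_per_subject > 1]
length(repeated)

# Samples of the first repeatedly sampled subject, ordered by time
first_subject <- names(repeated)[1]
one_subject <- cd[which(cd$subject == first_subject), c("subject", "time")]
one_subject[order(one_subject$time), ]
```

Using which() drops samples with a missing subject instead of producing NA row indices.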

1.1.2 Diet swap between Rural and Western populations

This data set comes from a two-week diet swap study between western (USA) and traditional (rural African) diets, reported in O’Keefe et al. Nat. Comm. 6:6342, 2015.

OKeefeDSData()
#> snapshotDate(): 2021-05-18
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> class: TreeSummarizedExperiment 
#> dim: 130 222 
#> metadata(0):
#> assays(1): counts
#> rownames(130): Actinomycetaceae Aerococcus ... Xanthomonadaceae
#>   Yersinia et rel.
#> rowData names(3): Phylum Family Genus
#> colnames(222): Sample-1 Sample-2 ... Sample-221 Sample-222
#> colData names(8): subject sex ... timepoint.within.group bmi_group
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: NULL
#> rowTree: NULL
#> colLinks: NULL
#> colTree: NULL

# Alias
# dietswap()
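To get an overview of the study design, the sample metadata can be tabulated. The `subject`, `sex`, `timepoint.within.group` and `bmi_group` columns appear in the printed object; the `nationality` column used in the last step is an assumption, so the code checks for it before using it.

```r
library(microbiomeDataSets)

# Load the diet swap data and its sample metadata
tse <- OKeefeDSData()
cd <- colData(tse)

# All sample-level variables recorded for the study
names(cd)

# Number of samples at each time point within a diet group
table(cd$timepoint.within.group)

# Number of distinct subjects in the study
length(unique(cd$subject))

# Cross-tabulate by population; `nationality` is an assumed column name
if ("nationality" %in% names(cd)) {
  table(cd$nationality, cd$timepoint.within.group)
}
```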

1.1.3 Intestinal microbiota versus blood metabolites

This data set, from Lahti et al. PeerJ 1:e32, 2013, characterizes associations between human intestinal microbiota and blood serum lipids. Because it contains an additional assay of lipid species (389 species measured in the same 44 samples), it is provided as a MultiAssayExperiment object.

LahtiMLData()
#> snapshotDate(): 2021-05-18
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> see ?microbiomeDataSets and browseVignettes('microbiomeDataSets') for documentation
#> loading from cache
#> A MultiAssayExperiment object of 2 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 2:
#>  [1] microbiome: TreeSummarizedExperiment with 130 rows and 44 columns
#>  [2] lipids: SummarizedExperiment with 389 rows and 44 columns
#> Functionality:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DataFrame
#>  sampleMap() - the sample coordination DataFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DataFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
#>  exportClass() - save all data to files

# Alias
# peerj32()
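A minimal sketch of working with the MultiAssayExperiment, using the experiment names `microbiome` and `lipids` shown in the printed object. `[[` extracts a single experiment, and sampleMap() shows how the columns of each experiment relate to the shared sample metadata, which is worth checking before pairing taxa with lipids. The variable names are illustrative.

```r
library(microbiomeDataSets)

# Load the microbiome and lipid data as a MultiAssayExperiment
mae <- LahtiMLData()

# Names of the stored experiments
names(experiments(mae))

# Extract each experiment with `[[`
microbes <- mae[["microbiome"]]
lipids <- mae[["lipids"]]

# Plain matrices: taxa x samples and lipid species x samples
microbe_mat <- assay(microbes)
lipid_mat <- assay(lipids)
dim(microbe_mat)
dim(lipid_mat)

# How experiment columns map to the shared sample metadata
head(sampleMap(mae))

# Shared sample-level (phenotype) variables
head(colData(mae))
```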