miaViz implements plotting functions that work with TreeSummarizedExperiment
and related objects in the context of microbiome analysis. For more general
plotting functions on SummarizedExperiment objects, the scater package offers
several options, such as plotColData, plotExpression and plotRowData.
To install miaViz, first install BiocManager if it is not already installed.
Then use the install function from BiocManager and load miaViz.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("miaViz")
library(miaViz)
data(GlobalPatterns, package = "mia")
In contrast to other sequencing-based fields of research, in which the
expression of genes is usually studied, microbiome research uses the more
general term Abundance to describe the numeric data measured and analyzed.
Technically, especially in the context of SummarizedExperiment objects, there is
no difference. Therefore plotExpression can be used to plot Abundance data.
plotAbundance can be used as well; as long as rank is set to NULL, it
behaves like plotExpression.
plotAbundance(GlobalPatterns, rank = NULL,
features = "549322", assay_name = "counts")
However, if rank is not NULL, a bar plot is returned. In that case the
features argument can be left at NULL (the default).
GlobalPatterns <- transformCounts(GlobalPatterns, method = "relabundance")
plotAbundance(GlobalPatterns, rank = "Kingdom", assay_name = "relabundance")
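To make the idea of relative abundance concrete, here is a minimal base R sketch on a made-up count matrix: each count is divided by the total of its sample (column), so that every sample sums to 1. The object names counts and relabundance are hypothetical, and this illustrates the concept only; it is not necessarily how transformCounts is implemented internally.

```r
# Toy count matrix: 3 taxa (rows) x 2 samples (columns); values are made up.
# matrix() fills column by column, so sample1 = (10, 30, 60).
counts <- matrix(c(10, 30, 60,
                    5,  5, 90),
                 nrow = 3,
                 dimnames = list(c("taxonA", "taxonB", "taxonC"),
                                 c("sample1", "sample2")))

# Divide every column by its total so that each sample sums to 1.
relabundance <- sweep(counts, 2, colSums(counts), "/")

relabundance
```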
The plot can be fine-tuned by subsetting to selected features.
prev_phylum <- getPrevalentTaxa(GlobalPatterns, rank = "Phylum",
detection = 0.01)
plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
rank = "Phylum",
assay_name = "relabundance")
The features argument is reused for plotting data along the different samples.
In the next example the SampleType is plotted along the samples. In this case
the result is a list, which can be combined using external tools, for example
patchwork.
library(patchwork)
plots <- plotAbundance(GlobalPatterns[rowData(GlobalPatterns)$Phylum %in% prev_phylum],
features = "SampleType",
rank = "Phylum",
assay_name = "relabundance")
plots$abundance / plots$SampleType +
plot_layout(heights = c(9, 1))
Further examples of composition barplots can be found in Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).
To visualize prevalence within the dataset, three functions are available:
plotTaxaPrevalence, plotPrevalentAbundance and plotPrevalence.
plotTaxaPrevalence produces a so-called landscape plot, which
visualizes the prevalence of samples across abundance thresholds.
plotTaxaPrevalence(GlobalPatterns, rank = "Phylum",
detections = c(0, 0.001, 0.01, 0.1, 0.2))
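The notion of prevalence across abundance thresholds can be sketched in base R. Here prevalence is taken to be the fraction of samples in which a taxon's relative abundance exceeds a detection threshold; whether the comparison is strict is an assumption of this sketch, and the toy values and object names (relab, detections, prevalence) are hypothetical.

```r
# Toy relative abundances: 2 taxa (rows) x 4 samples (columns); made-up values.
# matrix() fills column by column, so sample s1 = (0.50, 0.002).
relab <- matrix(c(0.50, 0.002,
                  0.20, 0.000,
                  0.05, 0.020,
                  0.00, 0.150),
                nrow = 2,
                dimnames = list(c("taxonA", "taxonB"), paste0("s", 1:4)))

# Detection thresholds to scan over.
detections <- c(0, 0.001, 0.01, 0.1)

# For each threshold, compute the fraction of samples above it per taxon.
prevalence <- sapply(detections, function(d) rowMeans(relab > d))
colnames(prevalence) <- detections

prevalence
```

Each row of the result is one taxon and each column one threshold, which is the kind of grid a landscape plot colours in.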
plotPrevalentAbundance plots the prevalence against the mean relative
abundance at the chosen taxonomic level.
plotPrevalentAbundance(GlobalPatterns, rank = "Family",
colour_by = "Phylum") +
scale_x_log10()
plotPrevalence plots the number of samples and their prevalence across
different abundance thresholds. Abundance steps can be adjusted using the
detections argument, whereas the analyzed prevalence steps are set using the
prevalences argument.
plotPrevalence(GlobalPatterns,
rank = "Phylum",
detections = c(0.01, 0.1, 1, 2, 5, 10, 20)/100,
prevalences = seq(0.1, 1, 0.1))
The information stored in the rowTree can be plotted directly. However, the
size of the stored tree has to be kept in mind, and plotting large trees
rarely makes sense.
For this example we limit the information plotted to the top 100 taxa as judged by mean abundance on the genus level.
library(scater)
library(mia)
altExp(GlobalPatterns,"Genus") <- agglomerateByRank(GlobalPatterns,"Genus")
altExp(GlobalPatterns,"Genus") <- addPerFeatureQC(altExp(GlobalPatterns,"Genus"))
rowData(altExp(GlobalPatterns,"Genus"))$log_mean <-
log(rowData(altExp(GlobalPatterns,"Genus"))$mean)
rowData(altExp(GlobalPatterns,"Genus"))$detected <-
rowData(altExp(GlobalPatterns,"Genus"))$detected / 100
top_taxa <- getTopTaxa(altExp(GlobalPatterns,"Genus"),
method="mean",
top=100L,
assay_name="counts")
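The per-feature quantities prepared above can be illustrated on a toy matrix in base R. The sketch assumes that mean is the average count of a feature across samples and that detected is the percentage of samples with a non-zero count, as documented for the per-feature QC metrics in scuttle; the data and object names (counts, mean_count, top2) are made up for illustration.

```r
# Toy genus-level counts: 4 genera (rows) x 3 samples (columns); made-up values.
counts <- matrix(c(0, 4, 8,
                   2, 2, 2,
                   0, 0, 9,
                   1, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("genus", 1:4), paste0("s", 1:3)))

# Mean count per genus and its natural log (used for colouring tips).
mean_count <- rowMeans(counts)
log_mean   <- log(mean_count)

# Percentage of samples with a non-zero count, rescaled to a 0-1 fraction.
detected <- rowMeans(counts > 0) * 100
detected_fraction <- detected / 100

# The two genera with the highest mean count.
top2 <- names(sort(mean_count, decreasing = TRUE))[1:2]
top2
```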
The colour, size and shape of tree tips and nodes can be decorated based on data
present in the SE object or by providing additional information via the
other_fields argument. Note that currently information for nodes has to be
provided via the other_fields argument.
Data will be matched via the node or label argument, depending on which was
provided. label takes precedence.
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected")
Tip and node labels can be shown as well. Setting show_label = TRUE shows the
tip labels only …
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected",
show_label = TRUE)
… whereas node labels can be selectively shown by providing a named logical
vector to show_label.
Please note that currently ggtree can only plot node labels in a rectangular
layout.
labels <- c("Genus:Providencia", "Genus:Morganella", "0.961.60")
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
tip_colour_by = "log_mean",
tip_size_by = "detected",
show_label = labels,
layout="rectangular")
Information can also be visualized on the edges of the tree plot.
plotRowTree(altExp(GlobalPatterns,"Genus")[top_taxa,],
edge_colour_by = "Phylum",
tip_colour_by = "log_mean")
Similar to tree data, graph data can also be plotted in conjunction with
SummarizedExperiment objects. Since the graph data itself cannot be stored
in a specialized slot, a graph object can be provided separately or as an
element of the metadata.
Here we load an example graph. As graph data, all object types accepted by
as_tbl_graph from the tidygraph package are supported.
data(col_graph)
In the following examples, the weight data is automatically generated from the
graph data. The SummarizedExperiment provided is required to have rownames
that overlap with the node names of the graph. Using this link, the graph plot
can incorporate data from the SummarizedExperiment.
plotColGraph(col_graph,
altExp(GlobalPatterns,"Genus"),
colour_by = "SampleType",
edge_colour_by = "weight",
edge_width_by = "weight",
show_label = TRUE)
As mentioned, the graph data can be provided from the metadata of the
SummarizedExperiment.
metadata(altExp(GlobalPatterns,"Genus"))$graph <- col_graph
This produces the same plot as shown above.
# Load data from miaTime package
library("miaTime")
data("SilvermanAGutData")
silverman <- SilvermanAGutData
silverman <- transformCounts(silverman, method = "relabundance")
taxa <- getTopTaxa(silverman, 2)
Data from samples collected over time can be visualized using plotSeries.
The x argument references a column of the colData that is used as the
descriptor for ordering the data. The y argument selects the features to show.
Since plotting many features is not advised, a maximum of 20 features can be
plotted at the same time.
plotSeries(silverman,
x = "DAY_ORDER",
y = taxa,
colour_by = "Family")
If replicated data is present, it is automatically used to calculate the
mean and sd, which are plotted as a range. Data from different assays can be
used for plotting via the assay_name argument.
plotSeries(silverman[taxa,],
x = "DAY_ORDER",
colour_by = "Family",
linetype_by = "Phylum",
assay_name = "relabundance")
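The handling of replicates can be illustrated with a small base R sketch: replicate values at each time point are reduced to a mean, and the standard deviation gives the width of the range around it. The data frame obs and all column names are hypothetical, and plotting the range as mean minus and plus one sd is an assumption of this sketch rather than a statement about plotSeries internals.

```r
# Toy replicated measurements: two replicates per day; made-up values.
obs <- data.frame(
  day   = c(1, 1, 2, 2, 3, 3),
  value = c(0.10, 0.14, 0.20, 0.30, 0.25, 0.35)
)

# Mean and standard deviation of the replicates at each time point.
means <- tapply(obs$value, obs$day, mean)
sds   <- tapply(obs$value, obs$day, sd)

# Assemble the centre line and the range around it.
ranges <- data.frame(
  day   = as.numeric(names(means)),
  mean  = as.vector(means),
  lower = as.vector(means - sds),
  upper = as.vector(means + sds)
)

ranges
```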
Additional variables can be used to modify line type aesthetics.
plotSeries(silverman,
x = "DAY_ORDER",
y = getTopTaxa(silverman, 5),
colour_by = "Family",
linetype_by = "Phylum",
assay_name = "counts")
To visualize the relative relations between two groupings among the factor data,
two functions are available: plotColTile and plotRowTile.
data(GlobalPatterns)
se <- GlobalPatterns
plotColTile(se,"SampleType","Primer") +
theme(axis.text.x.top = element_text(angle = 45, hjust = 0))
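One way to inspect the relation between two groupings numerically is a cross-tabulation with row-wise proportions. The base R sketch below uses made-up sample annotations; the vectors sample_type and primer are hypothetical stand-ins for two colData columns, and the sketch is not meant to reproduce what plotColTile computes internally.

```r
# Toy sample annotations: two factors describing five samples; made-up values.
sample_type <- c("Soil", "Soil", "Feces", "Feces", "Feces")
primer      <- c("A", "B", "A", "A", "B")

# Cross-tabulate the two groupings.
tab <- table(sample_type, primer)

# Proportion of each primer within each sample type (rows sum to 1).
props <- prop.table(tab, margin = 1)

props
```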
Groups of samples that are similar to each other can be searched for with
Dirichlet Multinomial Mixtures (Holmes, Harris, and Quince 2012).
After running runDMN from the mia package with several values of k (the number
of clusters), the fits can be compared to find the best one (see also getDMN
and getBestDMNFit).
To visualize the fit using e.g. “laplace” as a measure of goodness of fit:
data(dmn_se, package = "mia")
names(metadata(dmn_se))
#> [1] "DMN"
# plot the fit
plotDMNFit(dmn_se, type = "laplace")
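As an illustration of how a best fit might be read off such a comparison, the sketch below picks the number of clusters with the lowest Laplace value, which is the convention followed in the DirichletMultinomial package. The numbers are made up and the names laplace and best_k are hypothetical; with real data the values would come from the fitted models.

```r
# Made-up Laplace approximations for fits with k = 1, 2 and 3 clusters.
laplace <- c(k1 = 7800, k2 = 7650, k3 = 7700)

# A lower Laplace value indicates a better fit, so take the minimum.
best_k <- which.min(laplace)

names(best_k)
```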
More examples and materials are available at Orchestrating Microbiome Analysis (Lahti, Shetty, and Ernst 2021).
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scater_1.26.0 scuttle_1.8.0
#> [3] patchwork_1.1.2 miaViz_1.6.0
#> [5] ggraph_2.1.0 ggplot2_3.3.6
#> [7] mia_1.6.0 MultiAssayExperiment_1.24.0
#> [9] TreeSummarizedExperiment_2.6.0 Biostrings_2.66.0
#> [11] XVector_0.38.0 SingleCellExperiment_1.20.0
#> [13] SummarizedExperiment_1.28.0 Biobase_2.58.0
#> [15] GenomicRanges_1.50.0 GenomeInfoDb_1.34.0
#> [17] IRanges_2.32.0 S4Vectors_0.36.0
#> [19] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
#> [21] matrixStats_0.62.0 BiocStyle_2.26.0
#>
#> loaded via a namespace (and not attached):
#> [1] ggtree_3.6.0 ggnewscale_0.4.8
#> [3] ggbeeswarm_0.6.0 colorspace_2.0-3
#> [5] ellipsis_0.3.2 BiocNeighbors_1.16.0
#> [7] aplot_0.1.8 farver_2.1.1
#> [9] graphlayouts_0.8.3 ggrepel_0.9.1
#> [11] bit64_4.0.5 fansi_1.0.3
#> [13] decontam_1.18.0 codetools_0.2-18
#> [15] splines_4.2.1 sparseMatrixStats_1.10.0
#> [17] cachem_1.0.6 knitr_1.40
#> [19] polyclip_1.10-4 jsonlite_1.8.3
#> [21] cluster_2.1.4 ggforce_0.4.1
#> [23] BiocManager_1.30.19 compiler_4.2.1
#> [25] assertthat_0.2.1 Matrix_1.5-1
#> [27] fastmap_1.1.0 lazyeval_0.2.2
#> [29] cli_3.4.1 tweenr_2.0.2
#> [31] BiocSingular_1.14.0 htmltools_0.5.3
#> [33] tools_4.2.1 rsvd_1.0.5
#> [35] igraph_1.3.5 gtable_0.3.1
#> [37] glue_1.6.2 GenomeInfoDbData_1.2.9
#> [39] reshape2_1.4.4 dplyr_1.0.10
#> [41] Rcpp_1.0.9 jquerylib_0.1.4
#> [43] vctrs_0.5.0 ape_5.6-2
#> [45] nlme_3.1-160 DECIPHER_2.26.0
#> [47] DelayedMatrixStats_1.20.0 xfun_0.34
#> [49] stringr_1.4.1 beachmat_2.14.0
#> [51] lifecycle_1.0.3 irlba_2.3.5.1
#> [53] zlibbioc_1.44.0 MASS_7.3-58.1
#> [55] scales_1.2.1 tidygraph_1.2.2
#> [57] parallel_4.2.1 RColorBrewer_1.1-3
#> [59] yaml_2.3.6 memoise_2.0.1
#> [61] gridExtra_2.3 ggfun_0.0.7
#> [63] yulab.utils_0.0.5 sass_0.4.2
#> [65] stringi_1.7.8 RSQLite_2.2.18
#> [67] highr_0.9 ScaledMatrix_1.6.0
#> [69] tidytree_0.4.1 permute_0.9-7
#> [71] BiocParallel_1.32.0 rlang_1.0.6
#> [73] pkgconfig_2.0.3 bitops_1.0-7
#> [75] evaluate_0.17 lattice_0.20-45
#> [77] purrr_0.3.5 labeling_0.4.2
#> [79] treeio_1.22.0 cowplot_1.1.1
#> [81] bit_4.0.4 tidyselect_1.2.0
#> [83] plyr_1.8.7 magrittr_2.0.3
#> [85] bookdown_0.29 R6_2.5.1
#> [87] magick_2.7.3 generics_0.1.3
#> [89] DelayedArray_0.24.0 DBI_1.1.3
#> [91] pillar_1.8.1 withr_2.5.0
#> [93] mgcv_1.8-41 RCurl_1.98-1.9
#> [95] tibble_3.1.8 crayon_1.5.2
#> [97] utf8_1.2.2 rmarkdown_2.17
#> [99] viridis_0.6.2 grid_4.2.1
#> [101] blob_1.2.3 vegan_2.6-4
#> [103] digest_0.6.30 tidyr_1.2.1
#> [105] gridGraphics_0.5-1 munsell_0.5.0
#> [107] DirichletMultinomial_1.40.0 ggplotify_0.1.0
#> [109] beeswarm_0.4.0 viridisLite_0.4.1
#> [111] vipor_0.4.5 bslib_0.4.0
Holmes, Ian, Keith Harris, and Christopher Quince. 2012. “Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics.” PLOS ONE. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0030126.
Lahti, Leo, Sudarshan Shetty, and Felix GM Ernst. 2021. “Orchestrating Microbiome Analysis.” https://microbiome.github.io/OMA/.