You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution.
— Richard Dawkins
Citation
If you use ggtree in published research, please cite:
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.
Introduction
This project arose from our need to annotate nucleotide substitutions on a phylogenetic tree, and we found that no tree visualization software could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values, it is hard or impossible to display other information on the tree. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. This displayed the information successfully, but we believe this indirect approach is inefficient.
Previously, phylogenetic trees were much smaller, and annotating them was less necessary than it is now that much more data is becoming available. We want to associate our experimental data, for instance antigenic change, with evolutionary relationships. Visualizing these associations on a phylogenetic tree can help us identify evolutionary patterns. We believe a next-generation tree viewer is needed that is programmable and extensible: it should display a phylogenetic tree as easily as classical software does and support adding annotation data as layers above the tree. This is the objective of developing ggtree (Yu et al. 2017). Common tasks of annotating a phylogenetic tree should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.
ggtree is designed by extending the ggplot2 (Wickham 2009) package. It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages implement tree viewers using ggplot2, including OutbreakTools, phyloseq (McMurdie and Holmes 2013) and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate diverse user inputs that are related to nodes (taxa). ggtree differs by interpreting a tree as a collection of taxa, which gives general flexibility for annotating a phylogenetic tree with diverse types of user input.
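The "collection of taxa" view can be seen by inspecting the data frame that ggtree builds behind a plot. The sketch below assumes ape and ggtree are installed and uses a small hypothetical four-tip Newick tree; p$data is the ggplot object's underlying data, which holds one row per node (tips and internal nodes alike), so node-level annotation can be attached by node number or label.

```r
library(ggtree)

# Small hypothetical tree: four tips (A-D) and three internal nodes
tr <- ape::read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")

p <- ggtree(tr)

# One row per node, each with its parent, label, tip flag
# and plotting coordinates
head(p$data[, c("node", "parent", "label", "isTip", "x", "y")])
```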
Getting data into R
Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while the output files of many evolutionary analysis programs contain supporting evidence that is ready to be used for annotating a phylogenetic tree. The treeio package supports several file formats and software outputs. It brings analysis findings to R users for further analysis (e.g. summarization, visualization, comparison and testing). It also allows external data to be mapped onto the phylogeny. Please refer to the treeio vignette for more details.
Users can use the following command to open the vignette:
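The exact vignette name is not given here, so a safe option, assuming treeio is installed, is base R's browseVignettes(), which lists every vignette a package ships and, in an interactive session, opens an HTML index of them.

```r
# List the vignettes installed with treeio (assumes treeio is installed)
v <- utils::browseVignettes("treeio")

# In an interactive session, printing the result opens an HTML index
# of those vignettes in the web browser
if (interactive()) print(v)
```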
All the data parsed or integrated by the treeio package can be used to visualize or annotate phylogenetic trees in ggtree (Yu et al. 2017).
Tree Visualization and Annotation
Visualizing a tree in ggtree is easy and takes one line of code, ggtree(tree_object). It supports several layouts, including rectangular, slanted, circular and fan for phylograms and cladograms, equal_angle and daylight for unrooted trees, as well as time-scaled and two-dimensional phylogenies. The Tree Visualization vignette describes these features in detail.
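A minimal sketch of the layouts listed above, assuming ape and ggtree are installed; a random tree from ape::rtree() stands in for a real tree_object. Each call only builds a ggplot object, and print() draws it.

```r
library(ggtree)

set.seed(2017)           # make the random tree reproducible
tr <- ape::rtree(20)     # random rooted tree with 20 tips

# Rooted layouts named in the text; each call returns a ggplot object
layouts <- c("rectangular", "slanted", "circular", "fan")
plots <- lapply(layouts, function(l) ggtree(tr, layout = l))
names(plots) <- layouts

# Cladogram: ignore branch lengths
clado <- ggtree(tr, branch.length = "none")

# Unrooted layout
unrooted <- ggtree(tr, layout = "equal_angle")

# Draw any of them with print(), e.g. print(plots$circular)
```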
We implement several functions to manipulate a phylogenetic tree visually, including viewing a selected clade to explore a large tree, clustering taxa, rotating a clade or the whole tree, zooming in or out, and collapsing clades.
Function | Description |
---|---|
collapse | collapse a selected clade |
expand | expand a collapsed clade |
flip | exchange the positions of two clades that share a parent node |
groupClade | group clades |
groupOTU | group OTUs by tracing back to their most recent common ancestor |
identify | interactive tree manipulation |
rotate | rotate a selected clade by 180 degrees |
rotate_tree | rotate a circular layout tree by a specific angle |
scaleClade | zoom in or zoom out of a selected clade |
open_tree | convert a tree to a fan layout with a specific open angle |
Details and examples can be found in Tree Manipulation vignette.
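A minimal sketch of three of these functions on a small hypothetical four-tip tree, assuming ape and ggtree are installed. Node numbers are looked up with ape::getMRCA() rather than hard-coded, and the functions are called with the ggtree:: prefix because ape and dplyr export functions named rotate and collapse that could otherwise mask them.

```r
library(ggtree)

tr <- ape::read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")
p <- ggtree(tr) + geom_tiplab()

# Internal node numbers of the two clades, found from their tips
ab <- ape::getMRCA(tr, c("A", "B"))
cd <- ape::getMRCA(tr, c("C", "D"))

# Swap the positions of two sibling clades (both children of the root)
p_flip <- ggtree::flip(p, ab, cd)

# Collapse the (A, B) clade
p_collapsed <- ggtree::collapse(p, ab)

# Rotate the (C, D) clade by 180 degrees
p_rot <- ggtree::rotate(p, cd)
```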
Most phylogenetic trees are scaled by evolutionary distance (substitutions per site); in ggtree, a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time or dN/dS). Numerical and categorical variables can be used to color a phylogenetic tree.
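A minimal sketch of coloring by a numerical and a categorical variable, assuming ape, ggplot2 and ggtree are installed; the tree and the tip grouping are hypothetical. Re-scaling the tree itself by a variable such as dN goes through ggtree's branch.length argument on a treedata object parsed by treeio, which needs real analysis output and is not shown here.

```r
library(ggplot2)
library(ggtree)

tr <- ape::read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")

# Numerical variable: color each branch by its length
p_num <- ggtree(tr, aes(color = branch.length)) +
  scale_color_continuous(low = "darkgreen", high = "red")

# Categorical variable: assign tips to hypothetical groups,
# then color by the resulting 'group' column
tr_grp <- groupOTU(tr, list(AB = c("A", "B"), CD = c("C", "D")))
p_cat <- ggtree(tr_grp, aes(color = group))
```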
The ggtree package provides several layers to annotate a phylogenetic tree. These layers are building blocks that can be freely combined to create complex tree visualizations.
Layer | Description |
---|---|
geom_balance | highlights the two direct descendant clades of an internal node |
geom_cladelabel | annotate a clade with bar and text label |
geom_cladelabel2 | annotate a clade with bar and text label for unrooted layout |
geom_hilight | highlight a clade with a rectangle |
geom_hilight_encircle | highlight a clade with an xspline for unrooted layouts |
geom_label2 | modified version of geom_label, with subsetting supported |
geom_nodelab | layer for node labels, which can be text or image |
geom_nodepoint | annotate internal nodes with symbolic points |
geom_point2 | modified version of geom_point, with subsetting supported |
geom_range | bar layer to present uncertainty of evolutionary inference |
geom_rootpoint | annotate root node with symbolic point |
geom_segment2 | modified version of geom_segment, with subsetting supported |
geom_strip | annotate associated taxa with bar and (optional) text label |
geom_taxalink | associate two related taxa by linking them with a curve |
geom_text2 | modified version of geom_text, with subsetting supported |
geom_tiplab | layer of tip labels, which can be text or image |
geom_tiplab2 | layer of tip labels for circular layout |
geom_tippoint | annotate external nodes with symbolic points |
geom_tree | tree structure layer, with multiple layouts supported |
geom_treescale | tree branch scale legend |
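A minimal sketch combining several of these layers on a small hypothetical tree, assuming ape and ggtree are installed; the colors, offset and clade label are arbitrary illustrative choices.

```r
library(ggtree)

tr <- ape::read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")
ab <- ape::getMRCA(tr, c("A", "B"))
cd <- ape::getMRCA(tr, c("C", "D"))

p <- ggtree(tr) +
  geom_tiplab() +                                     # tip labels
  geom_tippoint(color = "steelblue") +                # points on tips
  geom_nodepoint(color = "darkorange", size = 3) +    # points on internal nodes
  geom_hilight(node = ab, fill = "gold", alpha = 0.4) +          # shade a clade
  geom_cladelabel(node = cd, label = "Clade CD", offset = 0.5) + # bar and text
  geom_treescale()                                    # branch-length scale bar

# print(p) draws the annotated tree
```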
ggtree supports creating phylomoji using emoji fonts; please refer to the Phylomoji vignette.
ggtree integrates the PhyloPic database, so silhouette images of organisms can be downloaded and used to annotate phylogenetic trees directly. ggtree also supports using local or remote images to annotate a phylogenetic tree. For details, please refer to the ggimage package vignette, which can be opened via the following command:
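Assuming ggimage is installed, base R's browseVignettes() lists the package's vignettes without needing the exact vignette name, and opens them in the browser when run interactively.

```r
# List the vignettes installed with ggimage (assumes ggimage is installed)
v <- utils::browseVignettes("ggimage")

# In an interactive session, printing the result opens an HTML index
if (interactive()) print(v)
```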
ggtree also supports visualizing an annotated phylogenetic tree together with a numerical matrix (e.g. a genotype table), a multiple sequence alignment, or subplots. Examples of annotating phylogenetic trees can be found in the Tree Annotation vignette.
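A minimal sketch of attaching a matrix to a tree with ggtree's gheatmap() function, assuming ape and ggtree are installed; the genotype table is hypothetical, and its row names must match the tip labels.

```r
library(ggtree)

tr <- ape::read.tree(text = "((A:1,B:2):1,(C:1.5,D:0.5):2);")

# Hypothetical genotype table; row names match the tip labels
geno <- data.frame(
  gene1 = c("wt", "mut", "wt", "mut"),
  gene2 = c("mut", "mut", "wt", "wt"),
  row.names = c("A", "B", "C", "D")
)

p <- ggtree(tr) + geom_tiplab()

# Draw the table as a heatmap to the right of the tips:
# offset shifts it right, width sets its size relative to the tree
hm <- gheatmap(p, geno, offset = 0.5, width = 0.5)
```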
Vignette Entry
- Tree Data Import
- Tree Visualization
- Tree Manipulation
- Tree Annotation
- Phylomoji
- Annotating phylogenetic tree with images
- Annotate a phylogenetic tree with insets
ggtree homepage: https://guangchuangyu.github.io/software/ggtree (contains more information about the package, more documentation, a gallery of beautiful published images and links to related resources).
Need help?
If you have questions or issues, please visit the ggtree homepage first; most problems are already documented there. If you think you have found a bug, please follow the guide and provide a reproducible example to be posted on the GitHub issue tracker. For questions, please post to the Google group. We highly recommend that users subscribe to the mailing list.
Chinese users can also follow me on WeChat (微信).
Session info
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.5.1 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
## [5] htmltools_0.3.6 tools_3.5.1 prettydoc_0.2.1 yaml_2.2.0
## [9] Rcpp_0.12.18 stringi_1.2.4 rmarkdown_1.10 highr_0.7
## [13] knitr_1.20 stringr_1.3.1 digest_0.6.15 evaluate_0.11
References
McMurdie, Paul J., and Susan Holmes. 2013. “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4):e61217. https://doi.org/10.1371/journal.pone.0061217.
Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. 1st ed. Springer.
Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.