You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution.

— Richard Dawkins

Citation

If you use ggtree in published research, please cite:

G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi: 10.1111/2041-210X.12628.

Introduction

This project arose from our need to annotate nucleotide substitutions on a phylogenetic tree; we found that no tree visualization software could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values on the tree, it is hard or impossible to display other information. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. This displayed the information successfully, but we believe such an indirect approach is inefficient.

Previously, phylogenetic trees were much smaller, and annotating them was less necessary than it is now that much more data is becoming available. We want to associate our experimental data, for instance antigenic change, with evolutionary relationships. Visualizing these associations on a phylogenetic tree can help us identify evolutionary patterns. We believe we need a next-generation tree viewer that is programmable and extensible: it should display a phylogenetic tree as easily as classical software does and support adding annotation data in layers above the tree. This is the objective of developing ggtree (Yu et al. 2017). Common annotation tasks should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.

ggtree is designed as an extension of the ggplot2 (Wickham 2009) package. It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages implement tree viewers using ggplot2, including OutbreakTools, phyloseq (McMurdie and Holmes 2013) and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate the tree with diverse user input related to nodes (taxa). ggtree differs from them by interpreting a tree as a collection of taxa, which gives general flexibility for annotating a phylogenetic tree with diverse types of user input.

Getting data into R

Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while other evolutionary analysis software produces file formats that contain supporting evidence ready for annotating a phylogenetic tree. The treeio package supports several file formats and software outputs. It brings analysis findings to R users for further analysis (e.g. summarization, visualization, comparison and testing). It also allows external data to be mapped onto the phylogeny. Please refer to the treeio vignette for more details.

Users can use the following command to open the vignette:
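A minimal sketch, assuming treeio is installed. vignette() is a standard R utility; the vignette name "Importing" is an assumption and may differ between treeio versions, so the sketch lists the available names first.

```r
# List the vignettes shipped with the installed treeio version.
vigs <- vignette(package = "treeio")$results[, "Item"]
print(vigs)

# Open one by name in an interactive session.
# "Importing" is an assumed name; substitute an entry from `vigs` if it differs.
if (interactive()) {
  print(vignette("Importing", package = "treeio"))
}
```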

All the data parsed/integrated by the treeio package can be used to visualize or annotate a phylogenetic tree in ggtree (Yu et al. 2017).
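As a rough illustration of that workflow, the sketch below writes a toy Newick string to a temporary file, parses it with treeio and hands the result to ggtree. The tree itself is made up, and the sketch assumes a treeio version that provides read.newick().

```r
library(treeio)
library(ggtree)

# Write a small Newick tree to a temporary file (toy data for illustration).
nwk_file <- tempfile(fileext = ".nwk")
writeLines("((A:1,B:1):1,(C:1,D:1):1);", nwk_file)

# Parse the file with treeio.
tree <- read.newick(nwk_file)

# Pass the parsed tree straight to ggtree and add tip labels.
p <- ggtree(tree) + geom_tiplab()
```

Printing p (or calling print(p)) draws the tree on the active graphics device.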

Tree Visualization and Annotation

Tree visualization in ggtree is easy: a single command, ggtree(tree_object), draws a tree. ggtree supports several layouts, including rectangular, slanted, circular and fan for phylograms and cladograms, equal_angle and daylight for unrooted layouts, as well as time-scaled and two-dimensional phylogenies. The Tree Visualization vignette describes these features in detail.
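The layouts named above can be tried on a random tree. This is a sketch on made-up data (a 20-tip tree from ape::rtree()), and it assumes the layout and open.angle arguments behave the same way in the installed ggtree version.

```r
library(ggtree)

# A reproducible random tree with 20 tips (toy data for illustration).
set.seed(42)
tree <- ape::rtree(20)

# Default rectangular phylogram.
p_rect <- ggtree(tree)

# Other layouts are selected with the `layout` argument.
p_slanted  <- ggtree(tree, layout = "slanted")
p_circular <- ggtree(tree, layout = "circular")
p_fan      <- ggtree(tree, layout = "fan", open.angle = 120)
p_unrooted <- ggtree(tree, layout = "equal_angle")

# Ignoring branch lengths gives a cladogram.
p_clado <- ggtree(tree, branch.length = "none")
```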

We implement several functions to manipulate a phylogenetic tree visually, including viewing a selected clade to explore a large tree, clustering taxa, rotating a clade or the whole tree, and zooming out of or collapsing clades.

Tree manipulation functions.
Function     Description
collapse     collapse a selected clade
expand       expand a collapsed clade
flip         exchange the positions of two clades that share a parent node
groupClade   group clades
groupOTU     group OTUs by tracing back to their most recent common ancestor
identify     interactive tree manipulation
rotate       rotate a selected clade by 180 degrees
rotate_tree  rotate a circular-layout tree by a specified angle
scaleClade   zoom in or out on a selected clade
open_tree    convert a tree to fan layout with a specified open angle

Details and examples can be found in the Tree Manipulation vignette.
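A few of the functions listed above, sketched on a small made-up tree. Node numbers are looked up with ape::getMRCA() rather than hard-coded; the tree and tip names are illustrative only.

```r
library(ggtree)

# A small fixed tree so that clades can be addressed by tip names.
tree <- ape::read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")

# Look up internal node numbers from the tips they subtend.
node_ab <- ape::getMRCA(tree, c("A", "B"))
node_cd <- ape::getMRCA(tree, c("C", "D"))
node_ef <- ape::getMRCA(tree, c("E", "F"))

p <- ggtree(tree) + geom_tiplab()

# Collapse the (E, F) clade, then expand it again.
p_collapsed <- collapse(p, node = node_ef)
p_expanded  <- expand(p_collapsed, node = node_ef)

# Swap the (A, B) and (C, D) clades, which share a parent node.
p_flipped <- flip(p, node_ab, node_cd)

# Rotate the (A, B) clade by 180 degrees.
p_rotated <- rotate(p, node_ab)

# Compress the (C, D) clade to half its vertical extent.
p_scaled <- scaleClade(p, node_cd, scale = 0.5)
```

Each call returns a new tree view, so the original p stays unchanged and the variants can be compared side by side.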

Most phylogenetic trees are scaled by evolutionary distance (substitutions per site). In ggtree, a phylogenetic tree can be re-scaled by any numerical variable inferred by evolutionary analysis (e.g. species divergence time, dN/dS). Both numerical and categorical variables can be used to color a phylogenetic tree.
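A sketch of both ideas, assuming the example BEAST file bundled with treeio is available at the path used below and contains a numerical variable named rate. Both the path and the variable name are assumptions about the installed treeio version.

```r
library(treeio)
library(ggtree)
library(ggplot2)

# Example BEAST output bundled with treeio (path is an assumption).
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast_tree <- read.beast(beast_file)

# Color branches by an inferred numerical variable
# (`rate` is assumed to be present in this example file).
p_colored <- ggtree(beast_tree, aes(color = rate)) +
  scale_color_gradient(low = "darkgreen", high = "red")

# Re-scale branch lengths by the same variable before plotting.
rate_tree  <- rescale_tree(beast_tree, branch.length = "rate")
p_rescaled <- ggtree(rate_tree)
```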

The ggtree package provides several layers to annotate a phylogenetic tree. These layers are building blocks that can be freely combined to create complex tree visualizations.

Geom layers defined in ggtree.
Layer                  Description
geom_balance           highlights the two direct descendant clades of an internal node
geom_cladelabel        annotate a clade with bar and text label
geom_cladelabel2       annotate a clade with bar and text label for unrooted layout
geom_hilight           highlight a clade with rectangle
geom_hilight_encircle  highlight a clade with xspline for unrooted layout
geom_label2            modified version of geom_label, with subsetting supported
geom_nodelab           layer for node labels, which can be text or image
geom_nodepoint         annotate internal nodes with symbolic points
geom_point2            modified version of geom_point, with subsetting supported
geom_range             bar layer to present uncertainty of evolutionary inference
geom_rootpoint         annotate root node with symbolic point
geom_segment2          modified version of geom_segment, with subsetting supported
geom_strip             annotate associated taxa with bar and (optional) text label
geom_taxalink          associate two related taxa by linking them with a curve
geom_text2             modified version of geom_text, with subsetting supported
geom_tiplab            layer of tip labels, which can be text or image
geom_tiplab2           layer of tip labels for circular layout
geom_tippoint          annotate external nodes with symbolic points
geom_tree              tree structure layer, with multiple layouts supported
geom_treescale         tree branch scale legend
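Layers from the table are combined with +, as in ggplot2. The sketch below stacks several of them on a made-up six-tip tree; the colors, label text and offset are arbitrary choices for illustration.

```r
library(ggtree)

# Toy tree; clades are addressed through the MRCA of named tips.
tree <- ape::read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")
node_ab <- ape::getMRCA(tree, c("A", "B"))
node_cd <- ape::getMRCA(tree, c("C", "D"))

p <- ggtree(tree) +
  geom_tiplab() +                                            # tip labels
  geom_tippoint(color = "steelblue") +                       # points on tips
  geom_nodepoint(color = "grey40") +                         # points on internal nodes
  geom_hilight(node = node_ab, fill = "gold", alpha = 0.3) + # highlight clade (A, B)
  geom_cladelabel(node = node_cd, label = "clade CD", offset = 0.3) +
  geom_treescale()                                           # branch scale legend
```

Because each layer is independent, any of the lines can be removed or reordered without affecting the others.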

ggtree supports creating phylomoji using emoji fonts; please refer to the Phylomoji vignette.

ggtree integrates the PhyloPic database, so silhouette images of organisms can be downloaded and used to annotate a phylogenetic tree directly. ggtree also supports using local or remote images to annotate a phylogenetic tree. For details, please refer to the ggimage package vignette, which can be opened via the following command:
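A minimal sketch, assuming ggimage is installed. The vignette name "ggimage" is an assumption; browseVignettes("ggimage") lists the names actually available.

```r
# Check that ggimage is available before trying to open its documentation.
has_ggimage <- requireNamespace("ggimage", quietly = TRUE)

if (has_ggimage && interactive()) {
  # "ggimage" as the vignette name is an assumption.
  print(vignette("ggimage", package = "ggimage"))
}
```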

ggtree also supports visualizing an annotated phylogenetic tree together with a numerical matrix (e.g. a genotype table), a multiple sequence alignment or subplots. Examples of annotating phylogenetic trees can be found in the Tree Annotation vignette.
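For the matrix case, a sketch using gheatmap() on a made-up genotype table. The tree, locus names and values are hypothetical; the only requirement illustrated is that the row names of the table match the tip labels.

```r
library(ggtree)

tree <- ape::read.tree(text = "(((A:1,B:1):1,(C:1,D:1):1):1,(E:1,F:1):2);")

# Toy genotype table: one row per tip, row names matching the tip labels.
genotype <- data.frame(
  locus1 = c("a", "a", "b", "b", "a", "b"),
  locus2 = c("x", "y", "x", "y", "y", "x"),
  row.names = tree$tip.label,
  stringsAsFactors = FALSE
)

p <- ggtree(tree) + geom_tiplab()

# Draw the table as a heatmap to the right of the tree.
p_heat <- gheatmap(p, genotype, offset = 0.5, width = 0.5, colnames = TRUE)
```

The offset and width arguments control the gap between tree and heatmap and the heatmap's width relative to the tree; the values here are arbitrary.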

Vignette Entry

ggtree homepage: https://guangchuangyu.github.io/software/ggtree (contains more information about the package, more documentation, a gallery of beautiful published images and links to related resources).

Need help?

If you have questions or issues, please visit the ggtree homepage first; most common problems are documented there. If you think you have found a bug, please follow the guide and provide a reproducible example to post on the GitHub issue tracker. For questions, please post to the Google group. Users are strongly encouraged to subscribe to the mailing list.

Chinese users can follow me on WeChat (微信).

Session info

Here is the output of sessionInfo() on the system on which this document was compiled:

## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] htmltools_0.3.6 tools_3.5.1     prettydoc_0.2.1 yaml_2.2.0     
##  [9] Rcpp_0.12.18    stringi_1.2.4   rmarkdown_1.10  highr_0.7      
## [13] knitr_1.20      stringr_1.3.1   digest_0.6.15   evaluate_0.11

References

McMurdie, Paul J., and Susan Holmes. 2013. “phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4):e61217. https://doi.org/10.1371/journal.pone.0061217.

Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. 1st ed. Springer.

Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.