Annotate clades

ggtree (Yu et al. 2017) implements the geom_cladelabel layer, which annotates a selected clade with a bar and a corresponding label.

The geom_cladelabel layer takes the number of the selected internal node. To find internal node numbers, please refer to the Tree Manipulation vignette.
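As a minimal sketch of the layer described above: the tree, its tip names and the label strings below are made up for illustration, and ape::getMRCA is used here as one possible way to look up an internal node number from the tips it subtends.

```r
library(ggplot2)
library(ggtree)

# Hypothetical 8-tip tree; names and branch lengths are illustrative only.
tree <- ape::read.tree(
  text = "(((A:1,B:2):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:3):1):1);"
)

# Internal node numbers, looked up from the tips that define each clade.
node_ab <- ape::getMRCA(tree, c("A", "B"))
node_eh <- ape::getMRCA(tree, c("E", "H"))

# Leave room on the right for the bars and labels.
p <- ggtree(tree) + geom_tiplab() + xlim(NA, 7)

p_labelled <- p +
  geom_cladelabel(node = node_ab, label = "clade AB", offset = 0.4) +
  geom_cladelabel(node = node_eh, label = "clade E-H", offset = 0.4)
```

Printing p_labelled draws the tree with one bar and label per selected clade.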

Users can set align = TRUE to align the clade labels, and use the offset parameter to adjust their position.

Users can change the color of the clade label via the parameter color.

Users can change the angle of the clade label text via the parameter angle, and the position of the text relative to the bar via the parameter offset.text.

The size of the bar and text can be changed via the parameters barsize and fontsize respectively.
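The parameters discussed above (align, offset, color, angle, offset.text, barsize and fontsize) can be combined in a single call. The sketch below uses a made-up tree and arbitrary parameter values; it is meant only to show where each argument goes.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree with unequal branch lengths so that alignment is visible.
tree <- ape::read.tree(
  text = "(((A:1,B:2):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:3):1):1);"
)
node_ab <- ape::getMRCA(tree, c("A", "B"))
node_eh <- ape::getMRCA(tree, c("E", "H"))

p <- ggtree(tree) + geom_tiplab() + xlim(NA, 8)

p_styled <- p +
  # Aligned, red, rotated label with a thicker bar.
  geom_cladelabel(
    node = node_ab, label = "clade AB",
    align = TRUE, offset = 0.4,
    color = "red",
    angle = 270, hjust = 0.5, offset.text = 0.3,
    barsize = 1.5
  ) +
  # Aligned label with larger text.
  geom_cladelabel(
    node = node_eh, label = "clade E-H",
    align = TRUE, offset = 0.4,
    fontsize = 6
  )
```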

Users can also draw the label text with geom_label by setting geom = "label".
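A short sketch of the boxed-label variant, again on a made-up tree. The fill colour is an arbitrary choice.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree; tip names are illustrative only.
tree <- ape::read.tree(
  text = "(((A:1,B:2):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:3):1):1);"
)
node_eh <- ape::getMRCA(tree, c("E", "H"))

# geom = "label" draws the clade label text inside a filled box.
p_boxed <- ggtree(tree) + geom_tiplab() + xlim(NA, 8) +
  geom_cladelabel(
    node = node_eh, label = "clade E-H",
    align = TRUE, offset = 0.4,
    geom = "label", fill = "lightblue"
  )
```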

Labelling associated taxa (Monophyletic, Polyphyletic or Paraphyletic)

geom_cladelabel is designed for labelling monophyletic groups (clades), but some related taxa do not form a clade. ggtree provides geom_strip to add a strip/bar that indicates such an association, with an optional label (see the issue).
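For illustration, the sketch below marks tips B, C and D of a made-up tree. Together they exclude A and therefore do not form a clade. The strip is specified by the two tips at its ends; the colours and offsets are arbitrary.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree: (A,B) and (C,D) are sister clades.
tree <- ape::read.tree(
  text = "(((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);"
)

# B, C and D are adjacent in the plot but are not monophyletic,
# so a strip between the two end tips is used instead of a clade label.
p_strip <- ggtree(tree) + geom_tiplab() + xlim(NA, 6) +
  geom_strip(
    taxa1 = "B", taxa2 = "D",
    label = "associated taxa",
    barsize = 2, color = "red",
    offset = 0.4, offset.text = 0.1
  )
```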

Highlight clades

ggtree implements the geom_hilight layer, which accepts an internal node number and adds a rectangle layer to highlight the selected clade.

Another way to highlight selected clades is to draw them with different colors and/or line types, as demonstrated in the Tree Manipulation vignette.
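A minimal sketch of clade highlighting on a made-up tree. The node numbers are looked up with ape::getMRCA, and the fill colours and transparency are arbitrary.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree; tip names are illustrative only.
tree <- ape::read.tree(
  text = "(((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);"
)
node_abcd <- ape::getMRCA(tree, c("A", "D"))
node_gh <- ape::getMRCA(tree, c("G", "H"))

# Each geom_hilight call adds one rectangle behind the selected clade.
p_hilight <- ggtree(tree) + geom_tiplab() +
  geom_hilight(node = node_abcd, fill = "steelblue", alpha = 0.6) +
  geom_hilight(node = node_gh, fill = "darkgreen", alpha = 0.6)
```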

Highlight balances

In addition to geom_hilight, ggtree also implements geom_balance, which is designed to highlight the neighboring subclades of a given internal node.
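As a sketch, the internal node chosen below has two child subclades, (A,B) and (C,D), in a made-up tree, and geom_balance highlights both. The colours and the extend value are arbitrary.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree; the node joining (A,B) and (C,D) is the balance node.
tree <- ape::read.tree(
  text = "(((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);"
)
node_abcd <- ape::getMRCA(tree, c("A", "D"))

# Highlights the two neighboring subclades below node_abcd.
p_balance <- ggtree(tree) + geom_tiplab() +
  geom_balance(
    node = node_abcd,
    fill = "steelblue", color = "white",
    alpha = 0.6, extend = 1
  )
```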

Highlight clades for unrooted tree

ggtree provides geom_hilight_encircle to support highlighting clades in trees drawn with unrooted layouts.

Taxa connection

Some evolutionary events (e.g. reassortment, horizontal gene transfer) cannot be modeled by a simple tree. ggtree provides the geom_taxalink layer, which draws straight or curved lines between any two nodes in the tree, allowing it to represent such events by connecting taxa.
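The sketch below links two pairs of tips in a made-up tree. Which taxa are connected, and the colour and line type, are purely illustrative and do not represent real events.

```r
library(ggplot2)
library(ggtree)

# Hypothetical tree; tip names are illustrative only.
tree <- ape::read.tree(
  text = "(((A:1,B:1):1,(C:1,D:1):1):1,((E:1,F:1):1,(G:1,H:1):1):1);"
)

# Each geom_taxalink call draws one line between the two named taxa.
p_link <- ggtree(tree) + geom_tiplab() + xlim(NA, 5) +
  geom_taxalink(taxa1 = "A", taxa2 = "E") +
  geom_taxalink(taxa1 = "D", taxa2 = "G", color = "red", linetype = "dashed")
```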

Tree annotation with output from evolution software

The treeio package implements several parser functions for the output of commonly used software in evolutionary biology.

Here, we use BEAST (Bouckaert et al. 2014) output as an example. For details, please refer to the Importer vignette.
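A sketch of this workflow, assuming the example BEAST file shipped with treeio (extdata/BEAST/beast_mcc.tree) and the rate and posterior statistics it contains. Which statistics to display is an arbitrary choice here.

```r
library(ggplot2)
library(treeio)
library(ggtree)

# Example BEAST MCC tree bundled with treeio (assumed to be installed).
file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(file)

# Statistics parsed from the BEAST file become variables usable in aes().
p_beast <- ggtree(beast, aes(color = rate)) +
  geom_nodelab(aes(x = branch, label = round(posterior, 2)),
               vjust = -0.5, size = 3) +
  scale_color_continuous(low = "darkgreen", high = "red") +
  theme(legend.position = "right")
```

Branches are coloured by the inferred rate, and internal branches are labelled with the posterior clade probability.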

Tree annotation with user specified annotation

Integrating user data to annotate a phylogenetic tree can be done at different levels. The treeio package implements full_join methods to combine tree data with a phylogenetic tree object. The tidytree package supports linking tree data to a phylogeny using tidyverse verbs. ggtree supports mapping external data to a phylogeny for visualization and annotation on the fly.

The %<+% operator

Suppose we have the following data associated with the tree and would like to attach it to the tree.

taxa place value
D GZ 78.4
K CZ 72.7
C GZ 83.0
H HK 102.6
E GZ 75.3
M NA 67.1
J CZ 70.4
A GZ 51.5
B GZ 56.6
L CZ 79.6
F HK 55.9
I CZ 68.0
G HK 86.1

We can imagine that the place column stores the location where each species was isolated and the value column stores numerical values (e.g. bootstrap values).

We have demonstrated using the operator, %<%, to update a tree view with a new tree. Here, we will introduce another operator, %<+%, that attaches annotation data to a tree view. The only requirement of the input data is that its first column matches the node/tip labels of the tree.

After attaching the annotation data to the tree by %<+%, all the columns in the data are visible to ggtree. As an example, here we attach the above annotation data to the tree view, p, and add a layer that shows the tip labels, colored by the isolation site stored in the place column.

Once the data is attached, it stays attached, so we can easily add other layers that display this information.
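The steps above can be sketched as follows. The annotation table is the one listed earlier, while the tree itself is a random 13-tip tree generated only so that the example is self-contained; the original tree is not reproduced here.

```r
library(ggplot2)
library(ggtree)

# Stand-in tree with tips A..M (random topology, for illustration only).
set.seed(123)
tree <- ape::rtree(13, tip.label = LETTERS[1:13])

# Annotation data from the table above; the first column holds tip labels.
dd <- data.frame(
  taxa  = c("D", "K", "C", "H", "E", "M", "J", "A", "B", "L", "F", "I", "G"),
  place = c("GZ", "CZ", "GZ", "HK", "GZ", NA, "CZ", "GZ", "GZ", "CZ",
            "HK", "CZ", "HK"),
  value = c(78.4, 72.7, 83.0, 102.6, 75.3, 67.1, 70.4, 51.5, 56.6, 79.6,
            55.9, 68.0, 86.1),
  stringsAsFactors = FALSE
)

p <- ggtree(tree)

# Attach the data, then map its columns in later layers.
p <- p %<+% dd +
  geom_tiplab(aes(color = place)) +
  geom_tippoint(aes(size = value), alpha = 0.25)

# The data stays attached, so further layers can use the same columns.
# Internal nodes have no place value, so ggplot2 may warn about dropped rows.
p2 <- p +
  geom_text(aes(color = place, label = place),
            hjust = 1, vjust = -0.4, size = 3)
```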

Visualize tree with associated matrix

The gheatmap function is designed to visualize a phylogenetic tree with a heatmap of an associated matrix.

In the following example, we visualize a tree of H3 influenza viruses with their associated genotypes.

The width parameter controls the width of the heatmap. Another parameter, offset, controls the distance between the tree and the heatmap, for instance to allocate space for tip labels.

For a time-scaled tree, as in this example, it is more common to show the x axis by using theme_tree2. With this solution, however, the heatmap is just another layer and changes the x axis. To overcome this issue, we implemented scale_x_ggtree to set the x axis more sensibly.
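A sketch of such a call, assuming the H3 influenza example files shipped with ggtree (examples/MCC_FluA_H3.tree and examples/Genotype.txt). The most recent sampling date passed as mrsd and the offset, width and text settings are assumptions chosen for illustration.

```r
library(ggplot2)
library(treeio)
library(ggtree)

# Example files bundled with ggtree (assumed to be installed).
beast_file <- system.file("examples/MCC_FluA_H3.tree", package = "ggtree")
genotype_file <- system.file("examples/Genotype.txt", package = "ggtree")

beast_tree <- read.beast(beast_file)

# Row names of the genotype table are expected to match the tip labels.
genotype <- read.table(genotype_file, sep = "\t", stringsAsFactors = FALSE)
# Tidy column names by dropping a trailing dot, if any.
colnames(genotype) <- sub("\\.$", "", colnames(genotype))

# mrsd (most recent sampling date) is an assumed value for this data set.
p <- ggtree(beast_tree, mrsd = "2013-01-01") + geom_tiplab(size = 2)

# offset leaves room for the tip labels; width is relative to the tree width.
hm <- gheatmap(p, genotype, offset = 5, width = 0.5,
               font.size = 3, colnames_angle = -45, hjust = 0)
```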

Visualize tree with multiple sequence alignment

With the msaplot function, users can visualize a multiple sequence alignment alongside a phylogenetic tree.

A specific slice of the alignment can also be displayed by specifying the window parameter.
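As a sketch, assuming the H3 influenza tree and amino-acid alignment shipped with ggtree (examples/MCC_FluA_H3.tree and examples/FluA_H3_AA.fas), the window bounds below are arbitrary alignment positions.

```r
library(ggplot2)
library(treeio)
library(ggtree)

# Example files bundled with ggtree (assumed to be installed).
beast_file <- system.file("examples/MCC_FluA_H3.tree", package = "ggtree")
fasta <- system.file("examples/FluA_H3_AA.fas", package = "ggtree")

beast_tree <- read.beast(beast_file)
p <- ggtree(beast_tree)

# Full alignment next to the tree.
msa_full <- msaplot(p, fasta)

# Only alignment positions 150 to 200.
msa_slice <- msaplot(p, fasta, window = c(150, 200))
```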

Plot tree with associated data

To associate a phylogenetic tree with different types of plots produced from user data, ggtree provides the facet_plot function, which accepts an input data.frame and a geom function to draw the input data. The data are displayed in an additional panel of the plot.

Plot tree with images and subplots
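A minimal sketch with simulated data: the tree is random, the values are random numbers, and the panel names are arbitrary. As with %<+%, the first column of each data.frame is assumed to hold the tip labels.

```r
library(ggplot2)
library(ggtree)

set.seed(2017)
tr <- ape::rtree(30)

# Simulated per-tip values; the first column matches the tip labels.
d1 <- data.frame(id = tr$tip.label, val = rnorm(30, sd = 3))
d2 <- data.frame(id = tr$tip.label, value = abs(rnorm(30, mean = 100, sd = 50)))

p <- ggtree(tr)

# First extra panel: one point per tip.
p2 <- facet_plot(p, panel = "dot", data = d1, geom = geom_point,
                 mapping = aes(x = val), color = "firebrick")

# Second extra panel: horizontal bars drawn as segments from 0 to value;
# y is supplied by the tree so each bar lines up with its tip.
p3 <- facet_plot(p2, panel = "bar", data = d2, geom = geom_segment,
                 mapping = aes(x = 0, xend = value, y = y, yend = y),
                 color = "steelblue") +
  theme_tree2()
```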

Please refer to the following vignettes:

References

Bouckaert, Remco, Joseph Heled, Denise Kühnert, Tim Vaughan, Chieh-Hsi Wu, Dong Xie, Marc A. Suchard, Andrew Rambaut, and Alexei J. Drummond. 2014. “BEAST 2: A Software Platform for Bayesian Evolutionary Analysis.” PLoS Comput Biol 10 (4):e1003537. https://doi.org/10.1371/journal.pcbi.1003537.

Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.