You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins

Citation

If you use ggtree in published research, please cite:

G Yu, D Smith, H Zhu, Y Guan, TTY Lam,
ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data.
submitted.

Introduction

This project arose from our need to annotate nucleotide substitutions on a phylogenetic tree; we found no tree visualization software that could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values on the tree, it is hard or impossible to display other information. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. We displayed the information on the tree successfully, but we believe this indirect approach is inefficient.

In the old days, phylogenetic trees were often small. At that time, there was almost no need to annotate a tree; displaying the evolutionary relationships was mostly enough. Nowadays, we can obtain a lot of data from different experiments, and we want to associate our data, for instance antigenic change, with the evolutionary relationships. Visualizing these associations on the phylogenetic tree can help us identify evolutionary patterns. We believe we need a next-generation tree viewer that is programmable and extensible. It should display a phylogenetic tree as easily as classical software does and support adding annotation data in layers above the tree. This is the objective of developing ggtree. Common annotation tasks should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.

ggtree is designed by extending the ggplot2 [1] package. It is based on the grammar of graphics and takes all the good parts of ggplot2. Other R packages implement tree viewers using ggplot2, including OutbreakTools, phyloseq [2] and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate the tree with diverse user input related to nodes (taxa). ggtree differs by interpreting a tree as a collection of taxa, which gives general flexibility in annotating a phylogenetic tree with diverse types of user input.
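To make the "collection of taxa" idea concrete, here is a minimal sketch that uses only ape: the tree becomes a table with one row per node, so any per-node annotation can be joined by node number. The column names (node, parent, branch.length, label, isTip) are chosen for illustration and are an assumption, not necessarily the layout ggtree uses internally.

```r
library(ape)

# A small example tree (hypothetical, for illustration only)
tree <- read.tree(text = "((A:1,B:2):1,C:3);")

n_tip  <- Ntip(tree)
n_node <- n_tip + Nnode(tree)

# Each row of tree$edge is (parent, child); index both vectors by child
parent <- rep(NA_integer_, n_node)
parent[tree$edge[, 2]] <- tree$edge[, 1]
branch_length <- rep(NA_real_, n_node)
branch_length[tree$edge[, 2]] <- tree$edge.length

# One row per node: tips come first, internal nodes after
node_table <- data.frame(
  node          = seq_len(n_node),
  parent        = parent,           # NA for the root
  branch.length = branch_length,    # NA for the root
  label         = c(tree$tip.label, rep(NA_character_, Nnode(tree))),
  isTip         = seq_len(n_node) <= n_tip
)
node_table
```

With the tree in this shape, attaching user data is an ordinary merge on the node (or label) column, which is harder when the tree is stored only as line segments.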

Tree visualization

viewing tree with ggtree

ggtree extends ggplot2 to support viewing phylogenetic trees. It implements a geom_tree layer for displaying phylogenetic trees, as shown below:

nwk <- system.file("extdata", "sample.nwk", package="ggtree")
x <- readLines(nwk)
cat(substring(x, 1, 56), "\n", substring(x, 57), "\n")
## (((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14, 
##  H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);
library("ggplot2")
library("ggtree")

tree <- read.tree(nwk)
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree() + xlab("") + ylab("")

This example tree was obtained from Chapter 34 of Inferring Phylogenies [3]. The function ggtree is implemented as a shortcut for visualizing a tree, and it works exactly the same as the code shown above.

ggtree inherits all the advantages of ggplot2. For example, we can change the color, size and type of the lines as we do with ggplot2.

ggtree(tree, color="steelblue", size=0.5, linetype="dotted")

By default, the tree is displayed in ladderized form; the user can set the parameter ladderize = FALSE to disable this.

ggtree(tree, ladderize=FALSE)
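For readers unfamiliar with the term: ladderizing reorders the children of each internal node by clade size, leaving the topology and branch lengths unchanged. ape exposes this as ladderize(); whether ggtree calls that function internally is an assumption here, so the sketch below only illustrates the concept on a small hypothetical tree.

```r
library(ape)

# Hypothetical tree with clades of different sizes
tree <- read.tree(text = "(A:1,((B:1,C:1):1,D:2):1,E:3);")

# Reorder clades by size; right = FALSE puts smaller clades on the other side
lad <- ladderize(tree, right = FALSE)

# Same tips and same total branch length: only the drawing order changed
setequal(lad$tip.label, tree$tip.label)
sum(lad$edge.length) == sum(tree$edge.length)
```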

The branch.length parameter is used to scale the edges. The user can set branch.length = "none" to view only the tree topology (cladogram), or set it to another numerical variable (e.g. dN/dS) to scale the tree.

ggtree(tree, branch.length="none")
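As a rough illustration of what "scaling the edges" means, the sketch below uses ape to compute, for a small hypothetical tree, the horizontal position a phylogram gives each node (its distance from the root along the branches), and then makes a topology-only copy of the tree. It is a sketch of the concept, not of how ggtree computes its coordinates.

```r
library(ape)

tree <- read.tree(text = "((A:1,B:2):1,C:3);")

# Phylogram: x position of every node = summed branch length from the root
# (tips A, B, C first, then the root, then the internal node above A and B)
x_phylogram <- node.depth.edgelength(tree)
x_phylogram

# Cladogram input: same topology with the branch lengths discarded
topology <- tree
topology$edge.length <- NULL
```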

support multiple phylogenetic classes

ggtree defines several S4 classes to store phylogenetic objects and their associated annotation, including:

In addition, it also supports phylo (defined by ape [4]) and phylo4 (defined by phylobase).

The user can call ggtree(object) to view the phylogenetic tree directly, and the annotation data stored in these objects can be added as demonstrated in the Tree annotation with output from evolution software section.

layout

Currently, ggtree supports several layouts, including:

rectangular (by default), slanted and circular, each drawn as a Phylogram (by default) or as a Cladogram if the user explicitly sets branch.length='none'.

And an unrooted layout.

The unrooted layout is implemented with the equal-angle algorithm described in Inferring Phylogenies [3].
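The equal-angle idea can be sketched in a few lines: every tip receives the same share of the full circle, a subtree receives a wedge proportional to its number of tips, and each node is placed at its branch length from its parent along the middle of its wedge. The function below is a minimal sketch built on ape; the name equal_angle and its return format are hypothetical, and this is not ggtree's own implementation.

```r
library(ape)

# Minimal equal-angle layout (illustrative sketch)
equal_angle <- function(tree) {
  n_tip <- Ntip(tree)
  tips_below <- node.depth(tree)   # tips descended from each node (a tip counts as 1)
  x <- y <- numeric(length(tips_below))

  # Place all children of `node`, whose wedge begins at angle `start`
  place <- function(node, start) {
    for (e in which(tree$edge[, 1] == node)) {
      child <- tree$edge[e, 2]
      width <- 2 * pi * tips_below[child] / n_tip   # wedge proportional to tip count
      mid <- start + width / 2                      # draw along the wedge's middle
      x[child] <<- x[node] + tree$edge.length[e] * cos(mid)
      y[child] <<- y[node] + tree$edge.length[e] * sin(mid)
      place(child, start)                           # child subdivides its own wedge
      start <- start + width
    }
  }
  place(n_tip + 1L, 0)   # ape numbers the root as Ntip + 1; it stays at the origin
  data.frame(node = seq_along(x), x = x, y = y)
}

tr <- read.tree(text = "((A:1,B:2):1,C:3);")
xy <- equal_angle(tr)
xy
```

By construction, every node sits exactly one branch length away from its parent, so distances along the drawing remain proportional to the tree's branch lengths.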

library("gridExtra")
grid.arrange(ggtree(tree) + ggtitle("(Phylogram) rectangular layout"),
             ggtree(tree, branch.length='none') + ggtitle("(Cladogram) rectangular layout"),
             ggtree(tree, layout="slanted") + ggtitle("(Phylogram) slanted layout"),
             ggtree(tree, layout="slanted", branch.length='none') + ggtitle("(Cladogram) slanted layout"),
             ggtree(tree, layout="circular") + ggtitle("(Phylogram) circular layout"),
             ggtree(tree, layout="circular", branch.length="none") + ggtitle("(Cladogram) circular layout"),
             ggtree(tree, layout="unrooted") + ggtitle("unrooted layout"),
             ncol=2)

two dimensional tree

ggtree implements a two-dimensional tree. It accepts the parameter yscale to scale the y-axis based on a selected tree attribute. The attribute should be a numerical variable. If it is a character/categorical variable, the user should provide a named vector that maps the variable to numeric values by passing it to the parameter yscale_mapping.

tree2d <- read.beast(system.file("extdata", "twoD.tree", package="ggtree"))
ggtree(tree2d, mrsd = "2014-05-01",
       yscale="NGS", yscale_mapping=c(N2=2, N3=3, N4=4, N5=5, N6=6, N7=7)) +
    theme_classic() +
    theme(panel.grid.major=element_line(color="grey20", linetype="dotted", size=.3),
          panel.grid.major.y=element_blank()) +
    scale_y_continuous(labels=paste0("N", 2:7))

In this example, the figure demonstrates that the quantity on the y-axis increases along the trunk. The user can highlight the trunk with a different line size or color using the functions described below.
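The yscale_mapping argument used above is an ordinary named vector, and mapping a categorical attribute to numbers amounts to a lookup by name. The base R sketch below shows that lookup in isolation; the attribute values in ngs are hypothetical, and how ggtree itself treats a category that is missing from the mapping is not specified here.

```r
# Named vector: category name -> numeric y value
yscale_mapping <- c(N2 = 2, N3 = 3, N4 = 4, N5 = 5, N6 = 6, N7 = 7)

# Hypothetical attribute values; "N9" has no entry in the mapping
ngs <- c("N3", "N7", "N2", "N9")

# Lookup by name; names without an entry yield NA in base R
y <- unname(yscale_mapping[ngs])
y
```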

display evolution distance

To show evolutionary distance, the user can use the add_legend function.

ggtree(tree) %>% add_legend()

We can also use theme_tree2() or ggtree(showDistance=TRUE).

ggtree(tree) + theme_tree2()

display nodes/tips

Showing all the internal nodes and tips in the tree can be done by adding a layer of points using geom_nodepoint, geom_tippoint or geom_point.

ggtree(tree)+geom_point(aes(shape=isTip, color=isTip), size=3)

p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)

display labels

Users can use geom_text to display the node/tip labels:

p + geom_text(aes(label=label), size=3, color="purple", hjust=-0.3)

For circular and unrooted layouts, ggtree supports rotating node labels according to the angles of the branches.

ggtree(tree, layout="circular") + geom_text(aes(label=label, angle=angle), size=3, color="purple", vjust=-0.3)

By default, the positions are based on the node positions; we can change them to be based on the middle of the branch/edge.

p + geom_text(aes(x=branch, label=label), size=3, color="purple", vjust=-0.3)

Positioning labels at the middle of branches is very useful when annotating the transition from a parent node to a child node.
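One plausible reading of "the middle of the branch" is the midpoint between a node's x position and its parent's x position. The sketch below computes that midpoint with ape for a small hypothetical tree; that ggtree derives its branch variable in exactly this way is an assumption.

```r
library(ape)

tree <- read.tree(text = "((A:1,B:2):1,C:3);")

# x position of every node: distance from the root along the branches
x <- node.depth.edgelength(tree)

# Parent of every node (NA for the root, which has no incoming branch)
parent <- rep(NA_integer_, length(x))
parent[tree$edge[, 2]] <- tree$edge[, 1]

# Midpoint of the branch leading to each node
branch <- (x[parent] + x) / 2
branch
```

A label drawn at this x coordinate sits halfway along the edge, which is where a parent-to-child change (for example, a substitution) is most naturally read.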

theme

theme_tree() defines a totally blank canvas, while theme_tree2() adds a phylogenetic distance legend. Both themes accept a bgcolor parameter that defines the background color.

grid.arrange(
    ggtree(rtree(30), color="red") + theme_tree("steelblue"),
    ggtree(rtree(20), color="white") + theme_tree("black"),
    ncol=2)

update tree viewing with a new tree

In the display nodes/tips section, we created a p object that stores a tree view of 13 tips, with internal nodes highlighted by large colored dots. If you want to apply this pattern (we can imagine a more complex one) to a new tree, you don’t need to build the tree view step by step. ggtree provides an operator, %<%, for applying the visualization pattern to a new tree.

For example, the pattern in the p object will be applied to a new tree with 50 tips as shown below:

p %<% rtree(50)

Another example can be found in the CODEML section.

Tree annotation

zoom on a portion of tree

ggtree provides the gzoom function, which is similar to the zoom function provided in ape. This function simultaneously plots a whole phylogenetic tree and a portion of it. It is aimed at exploring very large trees.

library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
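The "portion" in such a zoom is simply the subtree spanned by the selected tips. As a small sketch of that selection step, the code below picks tips by name pattern and extracts the corresponding subtree with ape's drop.tip(). The tree and its tip labels are hypothetical, and this does not reproduce gzoom's plotting.

```r
library(ape)

# Hypothetical tree; labels are made up for illustration
tree <- read.tree(
  text = "(((Plecotus_a:1,Plecotus_b:1):1,Myotis_x:2):1,Pteropus_y:3);"
)

# Indices of the tips to focus on, selected by pattern as in the gzoom call
focus <- grep("Plecotus", tree$tip.label)

# Keep only the focus tips by dropping all the others
portion <- drop.tip(tree, setdiff(seq_len(Ntip(tree)), focus))
portion$tip.label
```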

color tree

In ggtree, coloring a phylogenetic tree is easy: use aes(color=VAR) to map the color of the tree to a specific variable (both numeric and categorical variables are supported).

ggtree(tree, aes(color=branch.length)) +
    scale_color_continuous(low="green", high="red") +
    theme(legend.position="bottom")

The user can use any feature, including clade posterior, dN/dS, etc., to scale the color of the tree.

annotate clade

ggtree implements the annotation_clade and annotation_clade2 functions to annotate a selected clade with a bar indicating the clade and a corresponding label.

The annotation_clade function accepts a selected internal node number and annotates that clade, while the annotation_clade2 function accepts two tip labels (the upper one and the lower one) to annotate the clade.

The user can use geom_text to display all the node numbers and select a clade of interest to annotate.

ggtree(tree) + geom_text(aes(label=node))

p <- ggtree(tree) + geom_tiplab()
annotation_clade(p, node=17, "selected clade", offset.text=2)

annotation_clade2(p, "B", "E", "Clade X", offset.text=2) %>%
    annotation_clade2("G", "H", "Clade Y", bar.size=4, font.size=8, offset=5, offset.text=4, color="steelblue")

The parameter bar.size controls the width of the bar, and font.size controls the font size of the clade label. The parameter offset controls the distance from the annotation to the tree, while offset.text controls the distance from the clade label to the bar.
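Besides reading node numbers off a plot, the internal node that defines a clade can be looked up programmatically as the most recent common ancestor of two of its tips. The sketch below does this with ape's getMRCA() on a small hypothetical tree. It assumes that the node numbers shown by ggtree for a phylo object match ape's numbering, which is worth confirming against the geom_text(aes(label=node)) plot before relying on it.

```r
library(ape)

tree <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")

# Internal node that is the most recent common ancestor of tips A and C
node_ac <- getMRCA(tree, c("A", "C"))

# The clade below that node contains exactly the tips it would annotate
clade <- extract.clade(tree, node_ac)
sort(clade$tip.label)
```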

highlight clades

ggtree implements the hilight function, which accepts a tree view and an internal node number and adds a rectangle layer to highlight the selected clade.

ggtree(tree) %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
    hilight(node=17, fill="darkgreen", alpha=.6)

ggtree(tree, layout="fan") %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
     hilight(node=23, fill="darkgreen", alpha=.6)

Another way to highlight selected clades is to draw them with different colors and/or line types, as demonstrated in the group clades section.

collapse clade

With the collapse function, the user can collapse a selected clade.

cp <- ggtree(tree) %>% collapse(node=21)
cp + geom_point(subset=.(node == 21), size=5, shape=23, fill="steelblue")

expand collapsed clade

A collapsed clade can be expanded via the expand function.

cp %>% expand(node=21)

flip clades

The positions of two selected branches can be flipped using the flip function.

set.seed(2015-06-30)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p2 <- flip(p1, node1=45, node2=33)
p3 <- flip(p2, 32, 58)
grid.arrange(p1, p2, p3, ncol=3)

rotate clade

A selected clade can be rotated by 180 degrees using the rotate function.

set.seed(2015-07-01)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p1 <- hilight(p1, 33)
p2 <- rotate(p1, 33)
grid.arrange(p1, p2, ncol=2)

group OTUs

ggtree provides the groupOTU function to group tips and all their related ancestors.

tree <- groupOTU(tree, focus=c("A", "B", "C", "D", "E"))
ggtree(tree, aes(color=group)) + geom_tiplab()
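One way to picture "tips and all their related ancestors" is as the set of nodes on the paths from the selected tips up to their most recent common ancestor. The sketch below collects that set with ape; the function name group_nodes is hypothetical, and treating the MRCA as the upper bound of the group is an assumption about the idea, not a description of groupOTU's implementation.

```r
library(ape)

# Nodes on the paths from the focus tips up to their common ancestor
group_nodes <- function(tree, focus) {
  tips <- match(focus, tree$tip.label)

  # Parent of every node (0 for the root)
  parent <- integer(Ntip(tree) + Nnode(tree))
  parent[tree$edge[, 2]] <- tree$edge[, 1]

  # A single tip is its own group; otherwise stop at the common ancestor
  top <- if (length(tips) > 1) getMRCA(tree, tips) else tips

  members <- top
  for (node in tips) {
    while (node != top) {        # walk upward until the common ancestor
      members <- c(members, node)
      node <- parent[node]
    }
  }
  unique(members)
}

tr <- read.tree(text = "(((A:1,B:1):1,C:2):1,(D:1,E:1):2);")
grp <- group_nodes(tr, c("A", "C"))
grp
```

Marking every node as inside or outside this set gives a per-node grouping variable of the kind that aes(color=group) maps to color.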

groupOTU also accepts a list of tip groups.

cls <- list(c1=c("A", "B", "C", "D", "E"),
            c2=c("F", "G", "H"),
            c3=c("L", "K", "I", "J"),
            c4="M")

tree <- groupOTU(tree, cls)
library("colorspace")
ggtree(tree, aes(color=group, linetype=group)) + geom_text(aes(label=label),  hjust=-.25) +
     scale_color_manual(values=c("black", rainbow_hcl(4))) + theme(legend.position="right")