You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins
If you use ggtree in published research, please cite:
G Yu, D Smith, H Zhu, Y Guan, TTY Lam,
ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data.
submitted.
This project arose from our need to annotate nucleotide substitutions in a phylogenetic tree; we found no tree visualization software that could do this easily. Existing tree viewers are designed for displaying phylogenetic trees, not for annotating them. Although some tree viewers can display bootstrap values in the tree, it is hard or impossible to display other information. Our first solution for displaying nucleotide substitutions was to add this information to the node/tip names and use a traditional tree viewer to show it. We displayed the information in the tree successfully, but we believe this indirect approach is inefficient.
In the old days, phylogenetic trees were often small. At that time there was almost no need to annotate a tree; displaying the evolutionary relationships was mostly enough. Nowadays, we can obtain a lot of data from different experiments, and we want to associate our data, for instance antigenic change, with the evolutionary relationships. Visualizing these associations in the phylogenetic tree can help us identify evolutionary patterns. We believe we need a next-generation tree viewer that is programmable and extensible. It should view a phylogenetic tree as easily as classical software does and support adding annotation data in a layer above the tree. This is the objective of developing ggtree. Common tasks of annotating a phylogenetic tree should be easy, and complicated tasks should be achievable by adding multiple layers of annotation.
ggtree is designed by extending the ggplot2 [1] package. It is based on the grammar of graphics and takes all the good parts of ggplot2. There are other R packages that implement tree viewers using ggplot2, including OutbreakTools, phyloseq [2] and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic tree as a collection of lines, which makes it hard to annotate diverse user input related to nodes (taxa). ggtree differs from them by interpreting a tree as a collection of taxa and allowing general flexibility in annotating a phylogenetic tree with diverse types of user input.
ggtree extends ggplot2 to support viewing phylogenetic trees. It implements a geom_tree layer for displaying phylogenetic trees, as shown below:
nwk <- system.file("extdata", "sample.nwk", package="ggtree")
x <- readLines(nwk)
cat(substring(x, 1, 56), "\n", substring(x, 57), "\n")
## (((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,
## H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);
library("ggplot2")
library("ggtree")
tree <- read.tree(nwk)
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree() + xlab("") + ylab("")
This example tree was obtained from Chapter 34 of Inferring Phylogenies [3]. The function ggtree was implemented as a shortcut to visualize a tree, and it works exactly the same as shown above.
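The Newick text of the sample tree can also be inspected before plotting. The following is a minimal sketch, assuming the ape package is installed; it parses the same string with read.tree(text = ...) and reports the tip and internal node counts, which is a quick sanity check before handing a tree to a viewer. The variable names are illustrative only.

```r
library(ape)

# Newick string copied from the sample.nwk output shown earlier
nwk_text <- paste0(
  "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,",
  "H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
)

# Parse the string instead of reading it from a file
sample_tree <- read.tree(text = nwk_text)

n_tips    <- Ntip(sample_tree)      # number of tips (leaves)
n_nodes   <- Nnode(sample_tree)     # number of internal nodes
tip_names <- sample_tree$tip.label  # tip labels in file order

cat("tips:", n_tips, " internal nodes:", n_nodes, "\n")
```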
ggtree takes all the advantages of ggplot2. For example, we can change the color, size and type of the lines as we do with ggplot2.
ggtree(tree, color="steelblue", size=0.5, linetype="dotted")
By default, the tree is viewed in ladderized form; users can set the parameter ladderize = FALSE to disable it.
ggtree(tree, ladderize=FALSE)
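To see what ladderizing does to a tree, the sketch below uses ladderize() from the ape package. Whether the ladderize parameter of ggtree corresponds exactly to this function is an assumption here, and plot_order is a hypothetical helper written for this illustration. Ladderizing only reorders the branches; the tips and the topology stay the same.

```r
library(ape)

nwk_text <- paste0(
  "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,",
  "H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
)
sample_tree <- read.tree(text = nwk_text)

# Hypothetical helper: tip labels in the order they appear in the edge
# matrix, which is the order ape uses when it draws the tree
plot_order <- function(phy) {
  is_tip <- phy$edge[, 2] <= Ntip(phy)
  phy$tip.label[phy$edge[is_tip, 2]]
}

# Reorder the branches so that smaller clades are grouped on one side
laddered <- ladderize(sample_tree)

print(plot_order(sample_tree))
print(plot_order(laddered))
```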
The branch.length parameter is used to scale the edges; users can set branch.length = "none" to view only the tree topology (cladogram), or specify another numerical variable to scale the tree (e.g. dN/dS).
ggtree(tree, branch.length="none")
ggtree defines several S4 classes to store phylogenetic objects and their associated annotation, including:
beast
codeml_mlc
codeml
hyphy
jplace
paml_rst
raxml
r8s
In addition, it also supports phylo (defined by ape [4]) and phylo4 (defined by phylobase).
Users can use the ggtree(object) command to view the phylogenetic tree directly, and annotation data stored in these objects can be added as demonstrated in the Tree annotation with output from evolution software section.
Currently, ggtree supports several layouts, including:
rectangular (by default)
slanted
fan or circular
for Phylogram (by default) and Cladogram if the user explicitly sets branch.length='none'. An unrooted layout is also supported.
The unrooted layout was implemented by the equal-angle algorithm described in Inferring Phylogenies [3].
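The general idea of the equal-angle algorithm can be sketched in a few lines of R. This is an illustrative sketch of the textbook idea, not the ggtree implementation: each subtree receives an angular wedge proportional to its number of tips, and each branch is drawn along the middle of its wedge with a length equal to its branch length. The function name equal_angle_layout and its return format are hypothetical, and the ape package is assumed to be installed.

```r
library(ape)

# Hypothetical helper: equal-angle coordinates for a 'phylo' tree
equal_angle_layout <- function(phy) {
  n_tip <- Ntip(phy)
  root  <- n_tip + 1L
  children <- function(node) phy$edge[phy$edge[, 1] == node, 2]

  # Number of tips descending from a node (a tip counts as 1)
  count_tips <- function(node) {
    kids <- children(node)
    if (length(kids) == 0) return(1L)
    sum(vapply(kids, count_tips, integer(1)))
  }

  # One row per node; the root stays at the origin
  pos <- data.frame(node = seq_len(n_tip + Nnode(phy)), x = 0, y = 0)

  # Split the wedge [start, end] among the children of 'node'
  place <- function(node, start, end) {
    kids  <- children(node)
    total <- count_tips(node)
    width <- end - start
    for (kid in kids) {
      share <- width * count_tips(kid) / total  # wedge for this child
      mid   <- start + share / 2                # draw along the wedge middle
      len   <- phy$edge.length[phy$edge[, 2] == kid]
      pos$x[kid] <<- pos$x[node] + len * cos(mid)
      pos$y[kid] <<- pos$y[node] + len * sin(mid)
      place(kid, start, start + share)
      start <- start + share
    }
  }

  place(root, 0, 2 * pi)
  pos
}

tr <- read.tree(text = paste0(
  "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,",
  "H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
))
coords <- equal_angle_layout(tr)
head(coords)
```

Because every child is placed at exactly its branch length from its parent, the straight-line distance between connected nodes in coords reproduces the edge lengths of the tree.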
library("gridExtra")
grid.arrange(ggtree(tree) + ggtitle("(Phylogram) rectangular layout"),
ggtree(tree, branch.length='none') + ggtitle("(Cladogram) rectangular layout"),
ggtree(tree, layout="slanted") + ggtitle("(Phylogram) slanted layout"),
ggtree(tree, layout="slanted", branch.length='none') + ggtitle("(Cladogram) slanted layout"),
ggtree(tree, layout="circular") + ggtitle("(Phylogram) circular layout"),
ggtree(tree, layout="circular", branch.length="none") + ggtitle("(Cladogram) circular layout"),
ggtree(tree, layout="unrooted") + ggtitle("unrooted layout"),
ncol=2)
ggtree implements a two-dimensional tree. It accepts the parameter yscale to scale the y-axis based on the selected tree attribute. The attribute should be a numerical variable. If it is a character/categorical variable, the user should provide a named vector that maps the variable to numeric values by passing it to the parameter yscale_mapping.
tree2d <- read.beast(system.file("extdata", "twoD.tree", package="ggtree"))
ggtree(tree2d, mrsd = "2014-05-01",
yscale="NGS", yscale_mapping=c(N2=2, N3=3, N4=4, N5=5, N6=6, N7=7)) +
theme_classic() +
theme(panel.grid.major=element_line(color="grey20", linetype="dotted", size=.3),
panel.grid.major.y=element_blank()) +
scale_y_continuous(labels=paste0("N", 2:7))
In this example, the figure shows the quantity on the y-axis increasing along the trunk. Users can highlight the trunk with a different line size or color using the functions described below.
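The named vector passed to yscale_mapping is an ordinary R named numeric vector. The base R sketch below shows how such a vector converts a categorical attribute to numbers by name lookup; the attribute values in ngs_values are made up for illustration and are not taken from twoD.tree, and how ggtree applies the mapping internally is not shown here.

```r
# Named numeric vector, as in the yscale_mapping argument above
ngs_mapping <- c(N2 = 2, N3 = 3, N4 = 4, N5 = 5, N6 = 6, N7 = 7)

# Hypothetical attribute values for five nodes (not taken from twoD.tree)
ngs_values <- c("N2", "N2", "N3", "N5", "N7")

# Indexing by name converts each category to its numeric y value
y_values <- unname(ngs_mapping[ngs_values])
print(y_values)

# Categories missing from the mapping would come back as NA, so list them
unmapped <- setdiff(ngs_values, names(ngs_mapping))
```

Checking for unmapped categories before plotting avoids nodes silently ending up with a missing y position.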
To show evolutionary distance, users can use the add_legend function.
ggtree(tree) %>% add_legend()
We can also use theme_tree2() or ggtree(showDistance=TRUE):
ggtree(tree) + theme_tree2()
Showing all the internal nodes and tips in the tree can be done by adding a layer of points using geom_nodepoint, geom_tippoint or geom_point.
ggtree(tree)+geom_point(aes(shape=isTip, color=isTip), size=3)
p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
Users can use geom_text to display the node/tip labels:
p + geom_text(aes(label=label), size=3, color="purple", hjust=-0.3)
For circular and unrooted layouts, ggtree supports rotating node labels according to the angles of the branches.
ggtree(tree, layout="circular") + geom_text(aes(label=label, angle=angle), size=3, color="purple", vjust=-0.3)
By default, the positions are based on the node positions; we can change them to be based on the middle of the branch/edge.
p + geom_text(aes(x=branch, label=label), size=3, color="purple", vjust=-0.3)
Positioning labels at the middle of branches is very useful when annotating transitions from parent node to child node.
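The middle of a branch can be computed from the root-to-node distances of its two ends. The sketch below does this with the ape package, assuming that the branch variable used above is the midpoint between the x position of a node and that of its parent; this is an assumption about ggtree, and the data frame built here is only an illustration.

```r
library(ape)

nwk_text <- paste0(
  "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,",
  "H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
)
sample_tree <- read.tree(text = nwk_text)

# Distance from the root to every node (tips first, then internal nodes)
depth <- node.depth.edgelength(sample_tree)

parent <- sample_tree$edge[, 1]
child  <- sample_tree$edge[, 2]

# One row per branch: x of the child node and x of the branch midpoint
branch_mid <- data.frame(
  node   = child,
  x      = depth[child],
  branch = (depth[parent] + depth[child]) / 2
)

# Attach tip labels where the child is a tip (internal nodes get NA)
branch_mid$label <- sample_tree$tip.label[child]

print(branch_mid[!is.na(branch_mid$label), ])
```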
theme_tree() defines a totally blank canvas, while theme_tree2() adds a phylogenetic distance legend. Both themes accept a bgcolor parameter that defines the background color.
grid.arrange(
ggtree(rtree(30), color="red") + theme_tree("steelblue"),
ggtree(rtree(20), color="white") + theme_tree("black"),
ncol=2)
In the display nodes/tips section, we have a p object that stores the tree view of 13 tips, with internal nodes highlighted by large colored dots. If you want to apply this pattern (we can imagine a more complex one) to a new tree, you don’t need to build the tree step by step. ggtree provides an operator, %<%, for applying the visualization pattern to a new tree.
For example, the pattern in the p object will be applied to a new tree with 50 tips as shown below:
p %<% rtree(50)
Another example can be found in the CODEML section.
ggtree provides the gzoom function, which is similar to the zoom function provided in ape. This function simultaneously plots a whole phylogenetic tree and a portion of it. It aims at exploring very large trees.
library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
In ggtree, coloring a phylogenetic tree is easy: use aes(color=VAR) to map the color of the tree to a specific variable (both numeric and categorical variables are supported).
ggtree(tree, aes(color=branch.length)) +
scale_color_continuous(low="green", high="red") +
theme(legend.position="bottom")
Users can use any feature, including clade posterior and dN/dS, to scale the color of the tree.
ggtree implements the annotation_clade and annotation_clade2 functions to annotate a selected clade with a bar indicating that clade and a corresponding label.
The annotation_clade function accepts a selected internal node number and annotates that clade, while the annotation_clade2 function accepts two tip labels (the upper one and the lower one) to annotate the clade.
Users can use geom_text to display all the node numbers and select an interesting clade to annotate.
ggtree(tree) + geom_text(aes(label=node))
p <- ggtree(tree) + geom_tiplab()
annotation_clade(p, node=17, "selected clade", offset.text=2)
annotation_clade2(p, "B", "E", "Clade X", offset.text=2) %>%
annotation_clade2("G", "H", "Clade Y", bar.size=4, font.size=8, offset=5, offset.text=4, color="steelblue")
The parameter bar.size controls the width of the bar, and the font.size parameter controls the font size of the clade label. The parameter offset controls the distance from the annotation to the tree, while offset.text controls the distance from the clade label to the bar.
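An internal node number can also be looked up from tip labels rather than read off a labeled plot. The sketch below uses getMRCA() and extract.clade() from the ape package to find the most recent common ancestor of two bounding tips and to list the tips in that clade. It assumes that the node numbers shown by ggtree match the ape numbering of the same phylo object, which is not confirmed here.

```r
library(ape)

nwk_text <- paste0(
  "(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,",
  "H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);"
)
sample_tree <- read.tree(text = nwk_text)

# Convert the two bounding tip labels to tip numbers
bounding_tips <- match(c("A", "E"), sample_tree$tip.label)

# Most recent common ancestor of the two tips (an internal node number)
clade_node <- getMRCA(sample_tree, bounding_tips)

# Tips that belong to the clade rooted at that node
clade_tips <- extract.clade(sample_tree, clade_node)$tip.label

print(clade_node)
print(clade_tips)
```

Listing the clade members before annotating is a cheap way to confirm that the chosen node covers the intended tips.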
ggtree implements the hilight function, which accepts a tree view and an internal node number and adds a rectangle layer to highlight the selected clade.
ggtree(tree) %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
hilight(node=17, fill="darkgreen", alpha=.6)
ggtree(tree, layout="fan") %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
hilight(node=23, fill="darkgreen", alpha=.6)
Another way to highlight selected clades is to set the clades with different colors and/or line types, as demonstrated in the group clades section.
With the collapse function, users can collapse a selected clade.
cp <- ggtree(tree) %>% collapse(node=21)
cp + geom_point(subset=.(node == 21), size=5, shape=23, fill="steelblue")
The collapsed clade can be expanded via the expand function.
cp %>% expand(node=21)
The positions of two selected branches can be flipped using the flip function.
set.seed(2015-06-30)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p2 <- flip(p1, node1=45, node2=33)
p3 <- flip(p2, 32, 58)
grid.arrange(p1, p2, p3, ncol=3)
A selected clade can be rotated by 180 degrees using the rotate function.
set.seed(2015-07-01)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p1 <- hilight(p1, 33)
p2 <- rotate(p1, 33)
grid.arrange(p1, p2, ncol=2)
ggtree provides the groupOTU function to group tips and all their related ancestors.
tree <- groupOTU(tree, focus=c("A", "B", "C", "D", "E"))
ggtree(tree, aes(color=group)) + geom_tiplab()
groupOTU can also accept a list of tip groups.
cls <- list(c1=c("A", "B", "C", "D", "E"),
c2=c("F", "G", "H"),
c3=c("L", "K", "I", "J"),
c4="M")
tree <- groupOTU(tree, cls)
library("colorspace")
ggtree(tree, aes(color=group, linetype=group)) + geom_text(aes(label=label), hjust=-.25) +
scale_color_manual(values=c("black", rainbow_hcl(4))) + theme(legend.position="right")
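Before passing a list of tip groups to groupOTU, it can help to check that the list covers the tree as intended. The base R sketch below flattens the cls list from the example above into a lookup table and reports tips that appear in more than one group or in none. The table and its column names are illustrative and are not part of ggtree; the tip set A to M is taken from the 13-tip sample tree.

```r
cls <- list(c1 = c("A", "B", "C", "D", "E"),
            c2 = c("F", "G", "H"),
            c3 = c("L", "K", "I", "J"),
            c4 = "M")

# Flatten the list into a two-column lookup table: tip label and group name
tip_groups <- stack(cls)
names(tip_groups) <- c("label", "group")
tip_groups$label <- as.character(tip_groups$label)
tip_groups$group <- as.character(tip_groups$group)

# Tips assigned to more than one group
duplicated_tips <- tip_groups$label[duplicated(tip_groups$label)]

# Tips of the 13-tip sample tree (A to M) left without a group
all_tips <- LETTERS[1:13]
ungrouped_tips <- setdiff(all_tips, tip_groups$label)

print(tip_groups)
```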