You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins
If you use ggtree in published research, please cite:
G Yu, D Smith, H Zhu, Y Guan, TTY Lam,
ggtree: an R package for visualization and annotation of phylogenetic tree with different types of meta-data.
submitted.
This project arose from our needs to annotate nucleotide substitutions in the phylogenetic tree, and we found that there is no tree visualization software can do this easily. Existing tree viewers are designed for displaying phylogenetic tree, but not annotating it. Although some tree viewers can displaying bootstrap values in the tree, it is hard/impossible to display other information in the tree. Our first solution for displaying nucleotide substituitions in the tree is to add this information in the node/tip names and use traditional tree viewer to show it. We displayed the information in the tree successfully, but We believe this indirect approach is inefficient.
In the old day, phylogenetic tree is often small. At that time, as we almost didn’t have a need to annotate a tree; displaying the evolution relationships is mostly enough. Nowadays, we can obtain a lot of data from different experiments, and we want to associate our data, for instance antigenic change, with the evolution relationship. Visualizing these associations in the phylogenetic tree can help us to identify evolution patterns. We believe we need a next generation tree viewer that should be programmable and extensible. It can view a phylogenetic tree easily as we did with classical software and support adding annotation data in a layer above the tree. This is the objective of developing the ggtree. Common tasks of annotating a phylogenetic tree should be easy and complicated tasks can be possible to achieve by adding multiple layers of annotation.
The ggtree is designed by extending the ggplot21 package. It is based on the grammar of graphics and takes all the good parts of ggplot2. There are other R packages that implement tree viewer using ggplot2, including OutbreakTools, phyloseq2 and ggphylo; they mostly create complex tree view functions for their specific needs. Internally, these packages interpret a phylogenetic as a collection of lines
, which makes it hard to annotate diverse user input that are related to node (taxa). The ggtree is different to them by interpreting a tree as a collection of taxa
and allowing general flexibilities of annotating phylogenetic tree with diverse types of user inputs.
ggtree
ggtree extend ggplot
to support viewing phylogenetic tree. It implements geom_tree
layer for displaying phylogenetic trees, as shown below:
nwk <- system.file("extdata", "sample.nwk", package="ggtree")
x <- readLines(nwk)
cat(substring(x, 1, 56), "\n", substring(x, 57), "\n")
## (((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,
## H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);
library("ggplot2")
library("ggtree")
tree <- read.tree(nwk)
ggplot(tree, aes(x, y)) + geom_tree() + theme_tree() + xlab("") + ylab("")
This example tree was obtained from Chapter 34 of Inferring Phylogenies3. The function, ggtree
, was implemented as a short cut to visualize a tree, and it works exactly the same as shown above.
ggtree
takes all the advantages of ggplot2
. For example, we can change the color, size and type of the lines as we do with ggplot2
.
ggtree(tree, color="steelblue", size=0.5, linetype="dotted")
By default, the tree is viewing in ladderize form, user can set the parameter ladderize = FALSE
to disable it.
ggtree(tree, ladderize=FALSE)
The branch.length
is used to scale the edge, user can set the parameter branch.length = "none"
to only viewing the tree topology (cladogram) or other numerical variable to scale the tree (e.g. dN/dS).
ggtree(tree, branch.length="none")
ggtree defined several S4 classes to store phylogenetic object and its associated annotation, including:
beast
codeml_mlc
codeml
hyphy
jplace
paml_rst
raxml
r8s
In addition, it also supports phylo
(defined by ape4), and phylo4
(defined by phylobase)
User can use ggtree(object)
command to view the phylogenetic tree directly, and annotation data stored in these objects can be added as demonstrated in Tree annotation with output from evolution software
session.
Currently, ggtree
supports several layout, including:
rectangular
(by default)slanted
fan
or circular
for Phylogram
(by default) and Cladogram
if user explicitly setting branch.length='none'
.
And unrooted
layout.
Unrooted layout was implemented by the equal-angle algorithm
that described in Inferring Phylogenies3.
library("gridExtra")
grid.arrange(ggtree(tree) + ggtitle("(Phylogram) rectangular layout"),
ggtree(tree, branch.length='none') + ggtitle("(Cladogram) rectangular layout"),
ggtree(tree, layout="slanted") + ggtitle("(Phylogram) slanted layout"),
ggtree(tree, layout="slanted", branch.length='none') + ggtitle("(Cladogram) slanted layout"),
ggtree(tree, layout="circular") + ggtitle("(Phylogram) circular layout"),
ggtree(tree, layout="circular", branch.length="none") + ggtitle("(Cladogram) circular layout"),
ggtree(tree, layout="unrooted") + ggtitle("unrooted layout"),
ncol=2)
ggtree implemented 2 dimensional tree. It accepts parameter yscale
to scale the y-axis based on the selected tree attribute. The attribute should be numerical variable. If it is character/category variable, user should provides a name vector of mapping the variable to numeric by passing it to parameter yscale_mapping
.
tree2d <- read.beast(system.file("extdata", "twoD.tree", package="ggtree"))
ggtree(tree2d, mrsd = "2014-05-01",
yscale="NGS", yscale_mapping=c(N2=2, N3=3, N4=4, N5=5, N6=6, N7=7)) +
theme_classic() +
theme(panel.grid.major=element_line(color="grey20", linetype="dotted", size=.3),
panel.grid.major.y=element_blank()) +
scale_y_continuous(labels=paste0("N", 2:7))
In this example, the figure demonstrates the quantity of y increase along the trunk. User can highlight the trunk with different line size or color using the functions we described below.
To show evolution distance, user can use add_legend
function.
ggtree(tree) %>% add_legend()
We can also use theme_tree2()
or ggtree(showDistance=TRUE)
ggtree(tree) + theme_tree2()
Show all the internal nodes and tips in the tree can be done by adding a layer of points using geom_nodepoint
, geom_tippoint
or geom_point
.
ggtree(tree)+geom_point(aes(shape=isTip, color=isTip), size=3)
p <- ggtree(tree) + geom_nodepoint(color="#b5e521", alpha=1/4, size=10)
p + geom_tippoint(color="#FDAC4F", shape=8, size=3)
Users can use geom_text
to display the node/tip labels:
p + geom_text(aes(label=label), size=3, color="purple", hjust=-0.3)
For circular
and unrooted
layout, ggtree supports rotating node labels according to the angles of the branches.
ggtree(tree, layout="circular") + geom_text(aes(label=label, angle=angle), size=3, color="purple", vjust=-0.3)
By default, the positions are based on the node positions, we can change them to based on the middle of the branch/edge.
p + geom_text(aes(x=branch, label=label), size=3, color="purple", vjust=-0.3)
Based on the middle of branches is very useful when annotating transition from parent node to child node.
theme_tree()
defined a totally blank canvas, while theme_tree2()
add phylogenetic distance legend. These two themes all accept a parameter of bgcolor
that defined the background color.
grid.arrange(
ggtree(rtree(30), color="red") + theme_tree("steelblue"),
ggtree(rtree(20), color="white") + theme_tree("black"),
ncol=2)
In the display nodes/tips section, we have a p
object that stored the tree viewing of 13 tips and internal nodes highlighted with specific colored big dots. If you want to applied this pattern (we can imaging a more complex one) to a new tree, you don’t need to build the tree step by step. ggtree provides an operator, %<%
, for applying the visualization pattern to a new tree.
For example, the pattern in the p
object will be applied to a new tree with 50 tips as shown below:
p %<% rtree(50)
Another example can be found in CODEML
session.
ggtree provides gzoom
function that similar to zoom
function provided in ape. This function plots simultaneously a whole phylogenetic tree and a portion of it. It aims at exploring very large trees.
library("ape")
data(chiroptera)
library("ggtree")
gzoom(chiroptera, grep("Plecotus", chiroptera$tip.label))
In ggtree, coloring phylogenetic tree is easy, by using aes(color=VAR)
to map the color of tree based on a specific variable (numeric and category are both supported).
ggtree(tree, aes(color=branch.length)) +
scale_color_continuous(low="green", high="red") +
theme(legend.position="bottom")
User can use any feature, including clade posterior and dN/dS etc., to scale the color of the tree.
ggtree implements annotation_clade
and annotation_clade2
functions to annotate a selected clade with a bar indicating that clade with a corresponding label.
The annotation_clade
function accepts a selected internal node number and annotates that selected clade, while annotation_clade2
functions accepts two tip labels (upper one and lower one) to annotate the clade.
User can use geom_text
to display all the node numbers, and select interesting clade to annotate.
ggtree(tree) + geom_text(aes(label=node))
p <- ggtree(tree) + geom_tiplab()
annotation_clade(p, node=17, "selected clade", offset.text=2)
annotation_clade2(p, "B", "E", "Clade X", offset.text=2) %>%
annotation_clade2("G", "H", "Clade Y", bar.size=4, font.size=8, offset=5, offset.text=4, color="steelblue")
The parameter bar.size
is used to control the width of the bar and the font.size
parameter is to control the font size of the clade lable. The parameter offset
is used to control the distance from the annotation to the tree, while offset.text
to control the distance from clade label to bar.
ggtree implements hilight
function, that accepts tree view and internal node number and add a layer of rectangle to highlight the selected clade.
ggtree(tree) %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
hilight(node=17, fill="darkgreen", alpha=.6)
ggtree(tree, layout="fan") %>% hilight(node=21, fill="steelblue", alpha=.6) %>%
hilight(node=23, fill="darkgreen", alpha=.6)
Another way to highlight selected clades is setting the clades with different colors and/or line types as demonstrated in group clades
section.
With collapse
function, user can collapse a selected clade.
cp <- ggtree(tree) %>% collapse(node=21)
cp + geom_point(subset=.(node == 21), size=5, shape=23, fill="steelblue")
The collapsed clade can be expanded via expand
function.
cp %>% expand(node=21)
The positions of two selected branches can be flip over using flip function.
set.seed(2015-06-30)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p2 <- flip(p1, node1=45, node2=33)
p3 <- flip(p2, 32, 58)
grid.arrange(p1, p2, p3, ncol=3)
A selected clade can be rotated by 180 degree using rotate function.
set.seed(2015-07-01)
p1 <- ggtree(rtree(30)) + geom_text(aes(label=node))
p1 <- hilight(p1, 33)
p2 <- rotate(p1, 33)
grid.arrange(p1, p2, ncol=2)
ggtree provides groupOTU
function to group tips and all their related ancestors.
tree <- groupOTU(tree, focus=c("A", "B", "C", "D", "E"))
ggtree(tree, aes(color=group)) + geom_tiplab()
groupOTU
can also input a list of tip groups.
cls <- list(c1=c("A", "B", "C", "D", "E"),
c2=c("F", "G", "H"),
c3=c("L", "K", "I", "J"),
c4="M")
tree <- groupOTU(tree, cls)
library("colorspace")
ggtree(tree, aes(color=group, linetype=group)) + geom_text(aes(label=label), hjust=-.25) +
scale_color_manual(values=c("black", rainbow_hcl(4))) + theme(legend.position="right")
groupOTU
also work with tree view (ggplot2 object).
p <- ggtree(tree)
groupOTU(p, LETTERS[1:5]) + aes(color=group) + geom_tiplab() + scale_color_manual(values=c("black", "firebrick"))
In this example, we first build a tree based on the iris data, then grouping the tree based on different spacies.
data(iris)
rn <- paste0(iris[,5], "_", 1:150)
rownames(iris) <- rn
d_iris <- dist(iris[,-5], method="man")
tree_iris <- bionj(d_iris)
tree_iris <- groupOTU(tree_iris, list(setosa = rn[1:50],
versicolor = rn[51:100],
virginica_145 = rn[101:150]))
cols <- rainbow_hcl(4)
ggtree(tree_iris, aes(color=group)) +
geom_text(aes(label=label), hjust=-.1) +
scale_color_manual(values=cols, breaks=1:3,
labels=c("Setosa", "Versicolor", "Virginica")) +
theme(legend.position="right")
This example demonstrates how the separation of the bionj
is very good with the setosa
species, but misses in labeling several versicolor
and virginica
species.
As demonstrated above, groupOTU
is used for clustering related OTUs. Related OTUs are not necessarily within a clade, they can be distantly related as demonstrated in iris example
. groupOTU
works fine for monophyletic (clade), polyphyletic and paraphyletic. If user wants to hilight a specific clade, we provides a more friendly function groupClade
that accept an internal node or a vector of internal nodes to cluster clade/clades that works similar to groupOTU
. User can also use hilight
function demonstrated in hilight clades
section for highlighting selected clades.
tree <- groupClade(tree, node=21)
ggtree(tree, aes(color=group, linetype=group))
tree <- groupClade(tree, node=c(21, 17))
ggtree(tree, aes(color=group, linetype=group))
It also works with tree view, just like groupOTU
.
With groupOTU
and groupClade
, it’s easy to highlight selected taxa and easy to selecte taxa to display related feature.
ggtree(tree, aes(color=group, linetype=group)) +
geom_text(aes(label=label), subset=.(group==2), hjust = -.5) +
geom_text(aes(label=label), subset=.(group==1), hjust = -.5, color="blue")
In collapse clade
, we have illustrated how to collapse selected clades. Another approach is to zoom out clade to a small scale.
grid.arrange(ggtree(tree) %>% hilight(21, "steelblue"),
ggtree(tree) %>% scaleClade(21, scale=0.3) %>% hilight(21, "steelblue"),
ncol=2)
Of course, scaleClade
can accept scale
larger than 1 and zoom in the selected portion.
grid.arrange(ggtree(tree) %>% hilight(17, fill="steelblue") %>%
hilight(21, fill="darkgreen"),
ggtree(tree) %>% scaleClade(17, scale=2) %>% scaleClade(21, scale=0.3) %>%
hilight(17, "steelblue") %>% hilight(21, fill="darkgreen"),
ncol=2)
PhyloPic is a database that stores reusable silhouette images of organisms. ggtree supports downloading images from PhyloPic and annotating phylogenetic tree with the downloaded images.
pp <- ggtree(tree) %>% phylopic("79ad5f09-cf21-4c89-8e7d-0c82a00ce728", color="steelblue", alpha = .3)
print(pp)
pp %>% phylopic("67382184-5135-4faa-8e98-eadff02c3e8a", color="#86B875", alpha=.8, node=4) %>%
phylopic("d3563b54-780f-4711-a49a-7ea051e9dacc", color="darkcyan", alpha=.8, node=17, width=.2)
In ggtree, we implemented several functions to parse the output from PAML5, HYPHY6, EPA7, PPLACER8 and BEAST9 and defined several classes to store phylogenetic object and associated annotation.
Classes include:
paml_rst
for rst
file obtained by PAML5, including BASEML
and CODEML
.codeml_mlc
for mlc
file obtained by CODEML
.codeml
for interpreting rst
and mlc
files obtained by CODEML
.hyphy
for HYPHY6 output.jplace
for EPA7 and PPLACER8 output.beast
for BEAST9jplace
class is also designed to store user specific annotation data, and serves as a standard format for tree annotation within the ggtree package. Please refer to the jplace file format
session.
For each classes, we defined read.className
to parse input file and output a corresponding object, get.fields
method to get the annotation features available in the object, access methods to get these features, and plot
methods for quickly viewing these annotation features.
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="ggtree")
beast <- read.beast(file)
beast
## 'beast' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/BEAST/beast_mcc.tree'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 14 internal nodes.
##
## Tip labels:
## A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
Since %
is not a valid character in names
, all the feature names that contain x%
will convert to 0.x
. For example, length_95%_HPD
will be changed to length_0.95_HPD
.
plot(beast, annotation="length_0.95_HPD", branch.length="none") + theme_tree()
User can round the digits by setting the parameter ndigits
. The default value is 2.
plot(beast, annotation="height", ndigits=3, annotation.color="red")
rst
filerst
file from baseml
is similar to codeml
output. The only difference is the space in the sequences. For baseml
, each ten bases are separated by one space, while for codeml
, each three bases (triplet) are separated by one space. We defined a read.paml_rst
to parse rst
file. It supports baseml
and codeml
output. The information will be stored in paml_rst
S4 object.
rstfile <- system.file("extdata/PAML_Baseml", "rst", package="ggtree")
rst <- read.paml_rst(rstfile)
rst
## 'paml_rst' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/PAML_Baseml/rst'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
## Node labels:
## 16, 17, 18, 19, 20, 21, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
## 'marginal_subs', 'joint_subs', 'marginal_AA_subs', 'joint_AA_subs'.
The function read.paml_rst
can accept only one parameter, rstfile
, and the output can be used to view the phylogeny and the substitution annotation.
p <- plot(rst, annotation="marginal_AA_subs", annotation.color="steelblue")
print(p)
rst
file from CODEML
is similar to BASEML
, and also parsed by read.paml_rst
function. The plot
method works also in the same way.
If you remember the %<%
operator introduced in update tree viewing with a new tree
session, you can use it to update a tree view with a new object.
In last session, we use rstfile
of BASEML
to build a tree view with amino acid substitution annotated. The following example use another rstfile
from CODEML
to update the tree view.
rstfile <- system.file("extdata/PAML_Codeml", "rst", package="ggtree")
rst <- read.paml_rst(rstfile)
p %<% rst
You can found that these two figures have different evolution distances, and substitutions inferred from BASEML
and CODEML
are slightly different.
mlcfile
contains dN/dS
estimation.
mlcfile <- system.file("extdata/PAML_Codeml", "mlc", package="ggtree")
mlc <- read.codeml_mlc(mlcfile)
mlc
## 'codeml_mlc' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/PAML_Codeml/mlc'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
## Node labels:
## 16, 17, 18, 19, 20, 21, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
## 't', 'N', 'S', 'dN_vs_dS', 'dN', 'dS', 'N_x_dN', 'S_x_dS'.
Please aware that /
and *
are not valid characters in names
, they were changed to _vs_
and _x_
respectively.
So dN_vs_dS
is dN/dS
, N_x_dN
is N*dN
, and S_x_dS
is S*dS
.
plot(mlc, branch.length="branch.length", annotation="dN_vs_dS", annotation.color="blue", ndigits=3)
The paramter branch.length
can be one of available annotations:
get.fields(mlc)
## [1] "t" "N" "S" "dN_vs_dS" "dN" "dS"
## [7] "N_x_dN" "S_x_dS"
For example, if we set branch.length
to dN_vs_dS
, it will plot the \(\omega\) (dN/dS
) tree:
plot(mlc, branch.length="dN_vs_dS", annotation="dN_vs_dS", ndigits=3)
We can also plot the dN
or dS
tree and others. The parameter annotation
can also be one of the available annotations.
CODEML
output: rst and mlc filesWe annotate the tree with information presented in rstfile
and mlcfile
separately as demonstrated in previous sessions.
We can also use both of them and it’s highly recommended. All the features in both files are available for annotation.
ml <- read.codeml(rstfile, mlcfile)
ml
## 'codeml' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/PAML_Codeml/rst' and
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/PAML_Codeml/mlc'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
## Node labels:
## 16, 17, 18, 19, 20, 21, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
## 'marginal_subs', 'joint_subs', 'marginal_AA_subs', 'joint_AA_subs',
## 't', 'N', 'S', 'dN_vs_dS',
## 'dN', 'dS', 'N_x_dN', 'S_x_dS',
## .
So we can annotate dN/dS
with the tree in rstfile
and amino acid substitutions with the tree in mlcfile
.
plot(ml, branch.length="rst.branch.length", annotation="dN_vs_dS")
plot(ml, branch.length="mlc.branch.length", annotation="marginal_AA_subs")
plot(ml, branch.length="dN", annotation="joint_AA_subs", annotation.color="darkgreen")
nwk <- system.file("extdata/HYPHY", "labelledtree.tree", package="ggtree")
ancseq <- system.file("extdata/HYPHY", "ancseq.nex", package="ggtree")
tipfas <- system.file("extdata", "pa.fas", package="ggtree")
hy <- read.hyphy(nwk, ancseq, tipfas)
hy
## 'hyphy' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/HYPHY/labelledtree.tree',
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/HYPHY/ancseq.nex' and
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/pa.fas'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
## K, N, D, L, J, G, ...
## Node labels:
## Node1, Node2, Node3, Node4, Node5, Node12, ...
##
## Unrooted; includes branch lengths.
##
## with the following features available:
## 'subs', 'AA_subs'.
plot(hy, annotation="AA_subs")
raxml_file <- system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3", package="ggtree")
raxml <- read.raxml(raxml_file)
plot(raxml)
r8s output contains 3 trees, namely TREE
, RATO
and PHYLO
for time tree, rate tree and absolute substitution tree respectively.
r8s <- read.r8s(system.file("extdata/r8s", "H3_r8s_output.log", package="ggtree"))
ggtree(r8s, branch.length="TREE", mrsd="2014-01-01") + theme_tree2()
branch.length
should be one of TREE
, RATO
or PHYLO
for selecting the corresponding tree.
User can also view 3 trees simultaneously.
ggtree(get.tree(r8s)) + facet_wrap(~.id, scales="free_x") + theme_tree2()
EPA7 and PPLACER8 have common output file format, jplace
.
jpf <- system.file("extdata/sample.jplace", package="ggtree")
jp <- read.jplace(jpf)
print(jp)
## 'jplace' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/sample.jplace'.
##
## ...@ tree:
## Phylogenetic tree with 13 tips and 12 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
##
## with the following features availables:
## 'edge_num', 'likelihood', 'like_weight_ratio', 'distal_length', 'pendant_length'.
In ggtree, we provide get.placements
method to access the placement.
## get only best hit
get.placements(jp, by="best")
## name edge_num likelihood like_weight_ratio distal_length pendant_length
## 1 AA 24 -61371.30 0.333344 3e-06 0.003887
## 2 BB 1 -61312.21 0.333335 1e-06 0.000003
## 3 BB 1 -61312.21 0.333335 1e-06 0.000003
## 4 BB 2 -61312.21 0.333322 3e-06 0.000003
## 5 CC 8 -61312.23 0.200011 1e-06 0.000003
## 6 CC 8 -61312.23 0.200011 1e-06 0.000003
## 7 CC 9 -61312.23 0.200000 3e-06 0.000003
## get all placement
get.placements(jp, by="all")
## name edge_num likelihood like_weight_ratio distal_length pendant_length
## 1 AA 24 -61371.30 0.333344 0.000003 0.003887
## 2 BB 1 -61312.21 0.333335 0.000001 0.000003
## 3 BB 2 -61312.21 0.333322 0.000003 0.000003
## 4 BB 550 -61312.21 0.333322 0.000961 0.000003
## 5 CC 8 -61312.23 0.200011 0.000001 0.000003
## 6 CC 9 -61312.23 0.200000 0.000003 0.000003
## 7 CC 10 -61312.23 0.199992 0.000003 0.000003
This is only a tiny sample file. In reality, EPA and PPLACER may place thousands of short reads on a reference tree.
We may, for example, count the number of placement and annotate this information in the tree. We do not provide a plot
method for jplace
object, since we use this file format as a standard annotation format in ggtree package and have no assumption of information it may stored. Please refer to jplace file format session.
In ggtree, objects can be merged and evidences inferred from different phylogenetic analyses can be combined or compared and visualized.
User can use the command like:
tree <- merge_tree(tree_object_1, tree_object_2)
## it's chainable, and serveral tree objects can be merged into one tree object
tree <- merge_tree(tree_object_1, tree_object_2) %>% merge_tree(tree_object_3) %>% merge_tree(tree_object_4)
Here we use a tree analyzed by BEAST and CodeML and merge them into one.
beast_file <- system.file("examples/MCC_FluA_H3.tree", package="ggtree")
beast_tree <- read.beast(beast_file)
rst_file <- system.file("examples/rst", package="ggtree")
mlc_file <- system.file("examples/mlc", package="ggtree")
codeml_tree <- read.codeml(rst_file, mlc_file)
merged_tree <- merge_tree(beast_tree, codeml_tree)
merged_tree
## 'beast' S4 object that stored information of
## '/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/examples/MCC_FluA_H3.tree'.
##
## ...@ tree:
## Phylogenetic tree with 76 tips and 75 internal nodes.
##
## Tip labels:
## A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range', 't', 'N',
## 'S', 'dN_vs_dS', 'dN', 'dS', 'N_x_dN',
## 'S_x_dS', 'marginal_subs', 'joint_subs', 'marginal_AA_subs', 'joint_AA_subs',
## .
After merging, all evidences inferred from different tools can be used to annotate the tree simultaneously. In this example, we used ‘dN’ inferred by CodeML to color the tree and annotate the tree with ‘posterior’ inferred by BEAST.
ggtree(merged_tree, aes(color=dN), mrsd="2013-01-01", ndigits = 3) +
geom_text(aes(label=posterior), vjust=.1, hjust=-.1, size=5) + theme_tree2() +
scale_color_continuous(low="green", high="red") + theme(legend.position="right")
ggplot2
layersWe implemented several plot
methods for easily viewing annotation data. Users are not restricted to plot
methods provided. They can use geom_text
to add annotation layer. All annotation data are visible to ggplot2
.
In the following example, we use the codeml
object to visualize the \(\omega\) (dN/dS
) tree, and annotate the tree with dN
and dS
.
ggtree(ml, branch.length="dN_vs_dS") +
geom_text(aes(x=branch, label=dN),
size=3, vjust=-0.5, color="red") +
geom_text(aes(x=branch, label=dS),
size=3, vjust=1.2, color="darkgreen")
%<+%
operatorWe provides several functions to parse and store information from common software output, and corresponding plot
methods for visualizing annotation in the tree.
Here, we would like to demonstrate how to inject user specific annotation data in a tree.
Suppose we have the following data that associated with the tree and would like to attach the data in the tree.
nwk <- system.file("extdata", "sample.nwk", package="ggtree")
tree <- read.tree(nwk)
p <- ggtree(tree)
dd <- data.frame(taxa = LETTERS[1:13],
place = c(rep("GZ", 5), rep("HK", 3), rep("CZ", 4), NA),
value = round(abs(rnorm(13, mean=70, sd=10)), digits=1))
## you don't need to order the data
## data was reshuffled just for demonstration
dd <- dd[sample(1:13, 13), ]
row.names(dd) <- NULL
print(dd)
taxa | place | value |
---|---|---|
D | GZ | 76.8 |
J | CZ | 86.6 |
F | HK | 53.8 |
L | CZ | 81.4 |
G | HK | 66.5 |
H | HK | 67.8 |
B | GZ | 96.9 |
C | GZ | 60.5 |
K | CZ | 49.2 |
I | CZ | 55.4 |
M | NA | 67.6 |
A | GZ | 63.8 |
E | GZ | 64.2 |
We can imaging that the place
column is the place we isolated the species and value
column stored numerical values for example bootstrap values.
We have shown using the operator, %<%
, to update a tree view with a new tree. Here, we will introduce another operator, %<+%
, that attaches annotation data to a tree view. The only requirement of the input data is that its first column should be matched with the node/tip labels of the tree.
After attaching the annotation data to the tree by %<+%
, all the columns in the data are visible to ggplot2
. As an example, here we attach the above annotation data to the tree view, p
, and add a layer that showing the tip labels and colored them by the isolation site stored in place
column.
p <- p %<+% dd + geom_text(aes(color=place, label=label), hjust=-0.5) +
geom_tippoint(aes(size=value, shape=place, color=place), alpha=0.25)
p+theme(legend.position="right")
Once the data was attached, it is always attached. So we can add another layer to display the isolation sites easily.
p <- p + geom_text(aes(color=place, label=place), hjust=1, vjust=-0.4, size=3)
print(p)
And another layer showing numerical values:
p <- p + geom_text(aes(color=place, label=value), hjust=1, vjust=1.4, size=3)
print(p)
jplace
file formatThe jplace
file format was defined by Masten10 for phylogenetic placements. We employed this file format to store phylogenetic tree and user specific annotation data. Suppose we have a tree, and the associated data as shown below:
tree <- system.file("extdata", "pa.nwk", package="ggtree")
data <- read.csv(system.file("extdata", "pa_subs.csv", package="ggtree"), stringsAsFactor=FALSE)
print(tree)
## [1] "/tmp/RtmpDhRuy5/Rinst4d99721ed4e4/ggtree/extdata/pa.nwk"
head(data)
## label subs gc
## 1 A R262K/K603R 0.444
## 2 B R262K 0.442
## 3 C N272S/I354T 0.439
## 4 D K104R/F105L/E382D/F480S/I596T 0.449
## 5 E T208S/V379I/K609R 0.443
## 6 F S277P/L549I 0.444
The data
contains amino acid substitutions from parent node to child node and GC contents of each node. We can annotate the tree as demonstrated in user specific annotation session.
ggtree provides a function, write.jplace
, to combine a tree and an associated data and store them to a single jplace
file.
outfile <- tempfile()
write.jplace(tree, data, outfile)
Then read.jplace
function was designed to read the jplace
file and store the information to a jplace
object.
jp <- read.jplace(outfile)
print(jp)
## 'jplace' S4 object that stored information of
## '/tmp/Rtmp2yKAEM/file528b7c1c83fe'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 13 internal nodes.
##
## Tip labels:
## K, N, D, L, J, G, ...
##
## Unrooted; includes branch lengths.
##
## with the following features availables:
## 'label', 'subs', 'gc'.
Now we know the jp
object stored the tree and the associated amino acid substitution and GC content information, we can view the tree and display the associated annotation data on it directly by ggtree
.
ggtree(jp, showDistance=TRUE) +
geom_text(aes(x=branch, label=subs), color="purple", vjust=-1, size=3) +
geom_text(aes(label=gc), color="steelblue", hjust=-.6, size=3) +
geom_text(aes(label=label), hjust=-.5)
At first we implemented gplot
function to visualize tree with heatmap but it has an issue that it can’t always guarantee the heatmap aligning to the tree properly, since the line up is between two figures and it’s currently not supported internally by ggplot2. I have implemented another function gheatmap
that can do the line up properly by creating a new layer above the tree.
In the following example, we visualized a tree of H3 influenza viruses with their associated genotype.
genotype_file <- system.file("examples/Genotype.txt", package="ggtree")
genotype <- read.table(genotype_file, sep="\t", stringsAsFactor=F)
p <- ggtree(beast_tree, mrsd="2013-01-01") %>% add_legend(x=2008, y=5)
p <- p + geom_tiplab(size=3)
gheatmap(p, genotype, offset = 2, width=0.5)
The width parameter is to control the width of the heatmap. It supports another parameter offset for controling the distance between the tree and the heatmap, for instance left space for tip labels.
For time scaled tree, as in this example, it’s more often to use x axis by using theme_tree2
. But with this solution, the heatmap is just another layer and will change the x
axis. To overcome this issue, we implemented scale_x_ggtree
to set the x axis more reasonable. User can also use gplot
and tweak the positions of two plot to align properly.
p <- ggtree(beast_tree, mrsd="2013-01-01") + geom_tiplab(size=3, align=TRUE) + theme_tree2()
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
pp <- (p + scale_y_continuous(expand=c(0, 0.3))) %>%
gheatmap(genotype, offset=4, width=0.5, colnames=FALSE) %>%
scale_x_ggtree()
pp + theme(legend.position="right")
With msaplot
function, user can visualizes multiple sequence alignment with phylogenetic tree, as demonstrated below:
fasta <- system.file("examples/FluA_H3_AA.fas", package="ggtree")
msaplot(ggtree(beast_tree), fasta)
A specific slice of the alignment can also be displayed by specific window parameter.
If you have any, let me know. Thx!
Here is the output of sessionInfo()
on the system on which this document was compiled:
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggtree_1.0.21 gridExtra_2.0.0 ggplot2_1.0.1
## [4] ape_3.3 Biostrings_2.36.4 XVector_0.8.0
## [7] IRanges_2.2.9 S4Vectors_0.6.6 BiocGenerics_0.14.0
## [10] colorspace_1.2-6 BiocStyle_1.6.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.1 formatR_1.2.1 plyr_1.8.3 highr_0.5.1
## [5] tools_3.2.2 zlibbioc_1.14.0 digest_0.6.8 jsonlite_0.9.17
## [9] evaluate_0.8 gtable_0.1.2 nlme_3.1-122 lattice_0.20-33
## [13] png_0.1-7 yaml_2.1.13 proto_0.3-10 stringr_1.0.0
## [17] knitr_1.11 fftwtools_0.9-7 locfit_1.5-9.1 grid_3.2.2
## [21] jpeg_0.1-8 rmarkdown_0.8 reshape2_1.4.1 magrittr_1.5
## [25] scales_0.3.0 htmltools_0.2.6 MASS_7.3-44 abind_1.4-3
## [29] EBImage_4.10.1 tiff_0.1-5 labeling_0.3 stringi_0.5-5
## [33] munsell_0.4.2
1.Wickham, H. Ggplot2: Elegant graphics for data analysis. (Springer, 2009).
2.McMurdie, P. J. & Holmes, S. Phyloseq: An r package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8, e61217 (2013).
3.Felsenstein, J. Inferring phylogenies. (Sinauer Associates, 2003).
4.Paradis, E., Claude, J. & Strimmer, K. APE: Analyses of phylogenetics and evolution in r language. Bioinformatics 20, 289–290 (2004).
5.Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24, 1586–1591 (2007).
6.Pond, S. L. K., Frost, S. D. W. & Muse, S. V. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).
7.Berger, S. A., Krompass, D. & Stamatakis, A. Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology 60, 291–302 (2011).
8.Matsen, F. A., Kodner, R. B. & Armbrust, E. V. Pplacer: Linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
9.Bouckaert, R. et al. BEAST 2: A software platform for bayesian evolutionary analysis. PLoS Comput Biol 10, e1003537 (2014).
10.Matsen, F. A., Hoffman, N. G., Gallagher, A. & Stamatakis, A. A format for phylogenetic placements. PLoS ONE 7, e31009 (2012).