The ggtree package should not be viewed solely as standalone software. While it is useful for viewing, annotating and manipulating phylogenetic trees, it is also an infrastructure that brings evolutionary evidence inferred by commonly used software packages in the field into R. Examples include dN/dS values or ancestral sequences inferred by CODEML [1], clade support values (posterior) inferred by BEAST [2] and short read placements by EPA [3] and pplacer [4]. This evidence can be used not only to annotate phylogenetic trees in ggtree but can also be analyzed further in R.
Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while other file formats produced by evolutionary analysis software contain supporting evidence within the file that is ready for annotating a phylogenetic tree. The ggtree package defines several parser functions and S4 classes to store statistical evidence inferred by commonly used software packages. It supports several file formats, including:
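- Newick (via ape)
- Nexus (via ape)
- NHX
- jplace
and software output from BEAST, EPA, HYPHY, PAML (BASEML and CODEML), PHYLDOG, pplacer, r8s, RAxML and RevBayes.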
The ggtree package implements several parser functions, including:
- read.beast for parsing output of BEAST
- read.codeml for parsing output of CODEML (rst and mlc files)
- read.codeml_mlc for parsing mlc file (output of CODEML)
- read.hyphy for parsing output of HYPHY
- read.jplace for parsing jplace file, including output from EPA and pplacer
- read.nhx for parsing NHX file, including output from PHYLDOG and RevBayes
- read.paml_rst for parsing rst file (output of BASEML and CODEML)
- read.r8s for parsing output of r8s
- read.raxml for parsing output of RAxML
Correspondingly, ggtree defines several S4 classes to store evolutionary evidence inferred by these software packages, including:
- apeBootstrap for bootstrap analysis of ape::boot.phylo() [10], output of apeBoot() defined in ggtree
- beast for storing output of read.beast()
- codeml for storing output of read.codeml()
- codeml_mlc for storing output of read.codeml_mlc()
- hyphy for storing output of read.hyphy()
- jplace for storing output of read.jplace()
- nhx for storing output of read.nhx()
- paml_rst for rst file obtained by PAML, including BASEML and CODEML
- phangorn for storing ancestral sequences inferred by R package phangorn [11], output of phyPML() defined in ggtree
- r8s for storing output of read.r8s()
- raxml for storing output of read.raxml()
The jplace class is also designed to store user-specified annotation data.
Here is an overview of these S4 classes:
In addition, ggtree also supports phylo and multiPhylo (defined by ape [10]), phylo4 and phylo4d (defined by phylobase), obkData (defined in OutbreakTools) and phyloseq (defined in phyloseq).
In ggtree, tree objects can be merged, and evidence inferred from different phylogenetic analyses can be combined or compared and visualized.
Viewing a phylogenetic tree in ggtree is as easy as running the command ggtree(tree_object), and annotating a phylogenetic tree is a matter of adding graphic layers using the grammar of graphics.
For each class, we defined a get.fields method that returns the annotation features available in the object; these features can be used to annotate a phylogenetic tree directly in ggtree. A get.tree method converts a tree object to a phylo (or multiPhylo for r8s) object, which is widely supported by other R packages.
The groupOTU method is used for clustering related OTUs (from tips to their most recent common ancestor). Related OTUs are not necessarily within a clade; they can be distantly related. groupOTU works for monophyletic (clade), polyphyletic and paraphyletic groups, while groupClade only works for clades (monophyletic groups). These methods are useful for clustering related OTUs or clades.
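As a minimal sketch of the grouping described above, the following builds a random tree with ape::rtree() and groups three of its tips. The random tree, the seed and the chosen tip labels are illustrative assumptions, not part of the original text. The selection is passed positionally because the name of that argument may differ between ggtree versions.

```r
library(ape)      # rtree() builds a random example tree (illustrative only)
library(ggtree)

set.seed(42)
tree <- rtree(8)  # tips are labelled t1 ... t8 in random order

# Group a set of tips that need not form a clade.
tree <- groupOTU(tree, c("t1", "t2", "t3"))

# The grouping is stored as a "group" attribute on the tree, which can be
# mapped to an aesthetic such as color when plotting.
p <- ggtree(tree, aes(color = group)) + geom_tiplab()
print(p)
```

Branches leading from the selected tips back to their most recent common ancestor receive one group, and the rest of the tree receives another.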
The fortify method converts a tree object to a data.frame, which is familiar to R users and easy to manipulate. The output data.frame contains the tree information and all evolutionary evidence available in the object (e.g. dN/dS in a codeml object).
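The conversion can be tried on a plain phylo object, as in the sketch below. It assumes a random tree from ape::rtree() (not from the original text) and calls the generic through ggplot2::fortify so that it does not depend on which packages export it. A plain phylo tree carries no extra evidence, so only the tree information appears in the result.

```r
library(ape)
library(ggtree)   # registers fortify() methods for tree objects

set.seed(42)
tree <- rtree(5)  # random example tree, for illustration only

# Convert the tree to a data.frame with one row per node (tips and internal).
df <- ggplot2::fortify(tree)
head(df)

# Ordinary data.frame operations then apply, e.g. keeping only the tips.
tips <- df[df$isTip, ]
tips$label
```

Objects that carry additional evidence, such as a codeml object, would be expected to contribute further columns alongside these.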
Detailed descriptions of the slots defined in each class are documented in the class man pages. Users can use class?className (e.g. class?beast) to access the man page of a class.
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="ggtree")
beast <- read.beast(file)
beast
## 'beast' S4 object that stored information of
## '/tmp/Rtmpmw360M/Rinst2ce3898b1a3/ggtree/extdata/BEAST/beast_mcc.tree'.
##
## ...@ tree:
## Phylogenetic tree with 15 tips and 14 internal nodes.
##
## Tip labels:
## A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
##
## Rooted; includes branch lengths.
##
## with the following features available:
## 'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
Since % is not a valid character in names, all feature names that contain x% are converted to 0.x. For example, length_95%_HPD is changed to length_0.95_HPD.
The get.fields method returns all available features that can be used for annotation.
get.fields(beast)
## [1] "height" "height_0.95_HPD" "height_median"
## [4] "height_range" "length" "length_0.95_HPD"
## [7] "length_median" "length_range" "posterior"
## [10] "rate" "rate_0.95_HPD" "rate_median"
## [13] "rate_range"
Users can use ggtree(beast) to visualize the tree and add layers to annotate it.
ggtree(beast, ndigits=2, branch.length = 'none') + geom_text(aes(x=branch, label=length_0.95_HPD), vjust=-.5, color='firebrick')