The ggtree package should not be viewed solely as a standalone piece of software. While it is useful for viewing, annotating and manipulating phylogenetic trees, it is also an infrastructure that makes evolutionary evidence inferred by commonly used software packages in the field available in R. Examples include dN/dS values or ancestral sequences inferred by CODEML [1], clade support values (posterior) inferred by BEAST [2], and short read placements by EPA [3] and pplacer [4]. This evidence can be used not only to annotate phylogenetic trees in ggtree but also for further analysis in R.

Supported File Formats

Most tree viewer software (including R packages) focuses on the Newick and Nexus file formats, while the output files of many evolutionary analysis programs contain supporting evidence that is ready for annotating a phylogenetic tree. The ggtree package defines several parser functions and S4 classes to store the statistical evidence inferred by commonly used software packages. It supports several file formats, including:

and software output from:

Parser functions

The ggtree package implements several parser functions, including:

S4 classes

Correspondingly, ggtree defines several S4 classes to store the evolutionary evidence inferred by these software packages, including:

The jplace class is also designed to store user-specified annotation data.

Here is an overview of these S4 classes:

In addition, ggtree supports phylo and multiPhylo (defined by ape [10]), phylo4 and phylo4d (defined by phylobase), obkData (defined in OutbreakTools) and phyloseq (defined in phyloseq).

In ggtree, tree objects can be merged, and evidence inferred from different phylogenetic analyses can be combined, compared and visualized.

Viewing a phylogenetic tree in ggtree is as easy as calling ggtree(tree_object), and annotating it is simply a matter of adding graphic layers using the grammar of graphics.
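As a minimal sketch of that workflow, the example below draws a random 10-tip tree generated with ape's rtree() (an assumption for illustration; any phylo object would do) and adds two layers: tip labels and points at every node. It assumes ggtree and ape are installed.

```r
library(ggplot2)
library(ggtree)
library(ape)

set.seed(2016)       # make the random tree reproducible
tr <- rtree(10)      # random 10-tip 'phylo' tree from ape

# ggtree() draws the tree; annotation layers are added with '+'
p <- ggtree(tr) +
  geom_tiplab() +                    # label the tips
  geom_point(color = "steelblue")    # mark every node (tips and internal)
p
```

Each layer inherits the x/y node coordinates that ggtree computes, so standard ggplot2 geoms can be combined with ggtree-specific ones such as geom_tiplab().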

For each class, we define a get.fields method that returns the annotation features available in the object; these features can be used to annotate a phylogenetic tree directly in ggtree. A get.tree method converts a tree object to a phylo object (or a multiPhylo object for r8s), which is widely supported by other R packages.
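A short sketch of get.tree on the BEAST example file that ships with ggtree (the same file parsed in the next section). It assumes a ggtree version that exports read.beast() and get.tree(), as described in this document; newer releases may provide these functions through the treeio package instead.

```r
library(ggtree)

# Example BEAST MCC tree shipped with ggtree
file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "ggtree")
beast <- read.beast(file)

# Drop the BEAST annotations and keep only topology and branch lengths
# as an ape 'phylo' object that other R packages understand
tr <- get.tree(beast)
class(tr)
```

The resulting phylo object can be passed to any function that accepts ape trees, while the original beast object keeps the annotations for plotting.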

The groupOTU method clusters related OTUs by grouping the paths from the selected tips to their most recent common ancestor. Related OTUs are not necessarily within a clade; they can be distantly related. groupOTU works for monophyletic (clade), polyphyletic and paraphyletic groups, whereas groupClade only works for clades (monophyletic groups). Both methods are useful for clustering related OTUs or clades.
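To make the difference between the two groupings concrete, here is a conceptual sketch written with ape only. It is not ggtree's implementation: otu_path_nodes() is a hypothetical helper that collects every node on the path from each focus tip up to their MRCA (the groupOTU idea), and the clade grouping is approximated by taking all tips below that MRCA (the groupClade idea). The random tree and the chosen tips are assumptions for illustration.

```r
library(ape)

# Hypothetical helper: nodes on the paths from each focus tip to their MRCA
otu_path_nodes <- function(phy, focus) {
  tips <- match(focus, phy$tip.label)   # tip labels -> node numbers
  mrca <- getMRCA(phy, tips)            # most recent common ancestor
  paths <- lapply(tips, function(tip) nodepath(phy, from = mrca, to = tip))
  sort(unique(unlist(paths)))
}

set.seed(1)
tr <- rtree(10)
focus <- tr$tip.label[c(1, 4, 7)]

# OTU-style group: only the focus tips and the paths connecting them
grp <- otu_path_nodes(tr, focus)

# Clade-style group: every tip descending from the MRCA, which can include
# tips outside 'focus' when the focus tips are not monophyletic
clade_tips <- extract.clade(tr, getMRCA(tr, focus))$tip.label
is.monophyletic(tr, focus)
setdiff(clade_tips, focus)
```

If setdiff() returns extra tips, a clade-based grouping would colour OTUs that were never selected, which is why groupOTU is the appropriate choice for polyphyletic or paraphyletic sets.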

The fortify method converts a tree object to a data.frame, a structure that is familiar to R users and easy to manipulate. The output data.frame contains the tree information and all the evolutionary evidence (if available, e.g. dN/dS in a codeml object).
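A minimal sketch of fortify on a plain phylo object, assuming ggtree is installed so that its fortify method for phylo trees is available. The random ape tree is an assumption for illustration. The result has one row per node (tips plus internal nodes) with columns such as node, parent, x, y and isTip, plus the branch column (the x position of each branch's midpoint) used as x= in the BEAST example later in this document.

```r
library(ggplot2)
library(ggtree)
library(ape)

set.seed(2016)
tr <- rtree(10)

# One row per node: tips and internal nodes, with plotting coordinates
df <- fortify(tr)
head(df)
```

Because the result is an ordinary data.frame, it can be filtered, merged with external data or summarised with base R or dplyr before plotting.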

Detailed descriptions of the slots defined in each class are documented in the class man pages. Users can type class?className (e.g. class?beast) to access the man page of a class.

Getting Tree Data into R

Parsing BEAST output

library(ggtree)   # provides read.beast() and the example data
file <- system.file("extdata/BEAST", "beast_mcc.tree", package="ggtree")
beast <- read.beast(file)
beast
## 'beast' S4 object that stored information of
##   '/tmp/Rtmpmw360M/Rinst2ce3898b1a3/ggtree/extdata/BEAST/beast_mcc.tree'.
## 
## ...@ tree: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   'height',  'height_0.95_HPD',  'height_median',    'height_range', 'length',
##   'length_0.95_HPD', 'length_median',    'length_range', 'posterior',    'rate',
##   'rate_0.95_HPD',   'rate_median',  'rate_range'.

Since % is not a valid character in names, all feature names containing x% are converted to 0.x. For example, length_95%_HPD is changed to length_0.95_HPD.
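The base R function make.names() shows what would happen to such a name by default: the invalid % is replaced by a dot. The second part is a hypothetical helper, percent_to_decimal(), that mirrors the x% to 0.x rule literally; it is an illustration, not ggtree's own code, and it only gives the intended value for two-digit percentages such as 95% (a general version would divide by 100).

```r
# Default R name repair: the invalid '%' becomes '.'
make.names("length_95%_HPD")   # "length_95._HPD"

# Hypothetical helper applying the x% -> 0.x rule described above
percent_to_decimal <- function(x) gsub("([0-9]+)%", "0.\\1", x)

percent_to_decimal(c("length_95%_HPD", "height_95%_HPD", "rate_median"))
# "length_0.95_HPD" "height_0.95_HPD" "rate_median"
```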

The get.fields method returns all available features that can be used for annotation.

get.fields(beast)
##  [1] "height"          "height_0.95_HPD" "height_median"  
##  [4] "height_range"    "length"          "length_0.95_HPD"
##  [7] "length_median"   "length_range"    "posterior"      
## [10] "rate"            "rate_0.95_HPD"   "rate_median"    
## [13] "rate_range"

Users can use ggtree(beast) to visualize the tree and add layers to annotate it.

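# ndigits=2 rounds the numeric annotations; branch.length='none' ignores branch lengths (cladogram)
# geom_text places each 95% HPD interval at the midpoint of its branch (x=branch)
ggtree(beast, ndigits=2, branch.length='none') +
    geom_text(aes(x=branch, label=length_0.95_HPD), vjust=-.5, color='firebrick')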