Introduction

Phylogenetic trees are commonly used to present the evolutionary relationships of species. Information associated with taxa (species or strains) may be further analyzed in the context of the evolutionary history depicted by the phylogenetic tree. For example, host information of the influenza virus strains in a tree can be studied to understand the host range of a virus lineage. Moreover, such metadata (e.g., isolation host, time and location) directly associated with taxon strains are often subjected to further evolutionary or comparative phylogenetic models and analyses, to infer their dynamics associated with the evolutionary or transmission processes of the virus. All these metadata, along with other phenotypic or experimental data, are stored as annotation data associated with the nodes or branches of the tree, and are often produced in inconsistent formats by different analysis programs.

Support for getting trees into R is still limited. Newick and Nexus can be imported by several packages, including ape and phylobase. The NeXML format can be parsed by RNeXML. However, analysis results from widely used software packages in this field are not well supported. SIMMAP output can be parsed by phyext2 and phytools. Although PHYLOCH can import BEAST and MrBayes output, only internal node attributes are parsed and tip attributes are ignored. For many other software outputs, importing the tree with its associated data requires programming expertise. Linking external data, including experimental and clinical data, to a phylogeny is another obstacle for evolutionary biologists.

The treeio package defines base classes and functions for phylogenetic tree input and output. It is an infrastructure that enables evolutionary evidence inferred by commonly used software packages to be used in R: for instance, dN/dS values or ancestral sequences inferred by CODEML (Yang 2007), clade support values (posterior) inferred by BEAST (Bouckaert et al. 2014) and short read placements by EPA (Berger, Krompass, and Stamatakis 2011) and pplacer (Matsen, Kodner, and Armbrust 2010). This evidence can be further analyzed in R and used to annotate phylogenetic trees using ggtree (Yu et al. 2017). The growing number of analysis tools and models makes it challenging to integrate different kinds of data and analysis results from different sources for an integrated analysis on the same phylogenetic tree. The treeio package provides a merge_tree function for combining tree data obtained from different sources. In addition, treeio enables external data to be linked to the phylogenetic tree structure.

Getting tree data from evolutionary analysis results

To fill the gap that most tree formats and software outputs cannot be easily parsed within the same software or platform, treeio implements several functions for parsing various tree file formats and the outputs of common evolutionary analysis software. Not only the tree structure but also the associated data and evolutionary inferences can be parsed, including NHX annotations, clock rate inferences (from BEAST or r8s (Sanderson 2003)), synonymous and non-synonymous substitutions (from CodeML), and ancestral sequence reconstruction (from HyPhy, BaseML or CodeML).

Currently, treeio is able to read the following file formats: Newick, Nexus, New Hampshire eXtended format (NHX), jplace and Phylip as well as the data outputs from the following analysis programs: BEAST, EPA, HyPhy, MrBayes, PAML, PHYLDOG, pplacer, r8s, RAxML and RevBayes.

Parser functions defined in treeio

Parser function    Description
read.beast         parsing output of BEAST
read.codeml        parsing output of CodeML (rst and mlc files)
read.codeml_mlc    parsing mlc file (output of CodeML)
read.hyphy         parsing output of HYPHY
read.jplace        parsing jplace file including output of EPA and pplacer
read.mrbayes       parsing output of MrBayes
read.newick        parsing newick string, with ability to parse node label as support values
read.nhx           parsing NHX file including output of PHYLDOG and RevBayes
read.paml_rst      parsing rst file (output of BaseML or CodeML)
read.phylip        parsing phylip file (phylip alignment + newick string)
read.r8s           parsing output of r8s
read.raxml         parsing output of RAxML

After parsing, the tree structure and its associated data are stored in an S4 class, treedata, defined in the treeio package. The parsed data are mapped to the tree branches and nodes inside the treedata object, so that they can be efficiently used to visually annotate the tree using the ggtree package (Yu et al. 2017). treeio also provides functions to merge these phylogeny-associated data for comparison and further analysis.

Parsing BEAST output

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/BEAST/beast_mcc.tree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range'.

Since % is not a valid character in names, all feature names that contain x% are converted to 0.x. For example, length_95%_HPD is changed to length_0.95_HPD.

The get.fields method returns all available features that can be used for annotation.
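The BEAST summary at the start of this section and the field listing could be produced along the following lines. This is a minimal sketch: the file location is an assumption taken from the path printed in the summary, and the object name beast is only illustrative.

```r
library(treeio)

# Example MCC tree assumed to ship with treeio (path taken from the printed summary)
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")

# Parse the tree together with the statistics BEAST attached to each node
beast <- read.beast(beast_file)
beast

# Character vector of the feature names available for annotation
fields <- get.fields(beast)
fields
```

Printing the object gives the summary; get.fields gives only the feature names, which is convenient when choosing a variable to map onto the tree.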

##  [1] "height"          "height_0.95_HPD" "height_median"  
##  [4] "height_range"    "length"          "length_0.95_HPD"
##  [7] "length_median"   "length_range"    "posterior"      
## [10] "rate"            "rate_0.95_HPD"   "rate_median"    
## [13] "rate_range"

Parsing MrBayes output

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/MrBayes/Gq_nxs.tre'.
## 
## ...@ phylo: 
## Phylogenetic tree with 12 tips and 10 internal nodes.
## 
## Tip labels:
##  B_h, B_s, G_d, G_k, G_q, G_s, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'length_0.95HPD',   'length_mean',  'length_median',    'prob', 'prob_percent',
##  'prob+-sd', 'prob_range',   'prob_stddev'.
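A summary like the one above could be obtained with read.mrbayes. The sketch below assumes the example file sits where the printed path suggests; the object name is illustrative.

```r
library(treeio)

# Example MrBayes consensus tree assumed to ship with treeio
mrbayes_file <- system.file("extdata/MrBayes", "Gq_nxs.tre", package = "treeio")

# Parse the tree and the per-node statistics written by MrBayes
mrbayes <- read.mrbayes(mrbayes_file)
mrbayes
```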

Parsing PAML output

The read.paml_rst function can parse the rst file from BASEML and CODEML. The only difference between the two is the spacing in the sequences. For BASEML, every ten bases are separated by one space, while for CODEML, every three bases (a triplet) are separated by one space.
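As a sketch of how this might look in practice, the same call can be applied to either program's rst file. The file locations are assumptions based on the paths printed in the summaries of this section, and the object names are illustrative.

```r
library(treeio)

# rst files assumed to ship with treeio, one per program
baseml_rst <- system.file("extdata/PAML_Baseml", "rst", package = "treeio")
codeml_rst <- system.file("extdata/PAML_Codeml", "rst", package = "treeio")

# The same parser handles both spacing conventions
brst <- read.paml_rst(baseml_rst)
crst <- read.paml_rst(codeml_rst)

brst
crst
```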

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/PAML_Baseml/rst'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'subs', 'AA_subs'.

Similarly, we can parse the rst file from CODEML.

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/PAML_Codeml/rst'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'subs', 'AA_subs'.

Ancestral sequences inferred by BASEML or CODEML via marginal or joint ML reconstruction methods are stored in the S4 object and mapped to tree nodes. treeio automatically determines the substitutions between the sequences at both ends of each branch. Amino acid substitutions are also determined, by translating the nucleotide sequences to amino acid sequences. These computed substitutions are also stored in the S4 object.

CODEML infers selection pressure and estimates dN/dS, dN and dS. This information is stored in the mlc output file, which can be parsed by the read.codeml_mlc function.
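A minimal sketch of parsing the mlc file, assuming the example file is located where the printed path in this section indicates:

```r
library(treeio)

# mlc file assumed to ship with treeio
mlc_file <- system.file("extdata/PAML_Codeml", "mlc", package = "treeio")

# Parse branch-wise estimates such as dN/dS (stored as dN_vs_dS), dN and dS
mlc <- read.codeml_mlc(mlc_file)
mlc
```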

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/PAML_Codeml/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  't',    'N',    'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',   'S_x_dS'.

In the previous section, we parsed the rst and mlc files separately. However, they can also be parsed together using the read.codeml function.
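The combined parse might look like the following sketch, which assumes the same example files as in the printed paths and passes the rst file first and the mlc file second:

```r
library(treeio)

# Both CODEML output files, assumed to ship with treeio
rst_file <- system.file("extdata/PAML_Codeml", "rst", package = "treeio")
mlc_file <- system.file("extdata/PAML_Codeml", "mlc", package = "treeio")

# One object holding the features from both files
ml <- read.codeml(rst_file, mlc_file)
ml
```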

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/PAML_Codeml/rst',
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/PAML_Codeml/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## Node labels:
##  16, 17, 18, 19, 20, 21, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'subs', 'AA_subs',  't',    'N',    'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',
##  'S_x_dS'.

All the features in both rst and mlc files were imported into a single S4 object and hence are available for further annotation and visualization. For example, we can annotate and display both dN/dS (from mlc file) and amino acid substitutions (derived from rst file) on the same phylogenetic tree.

Parsing HyPhy output

Ancestral sequences inferred by HyPhy are stored in the Nexus output file, which contains the tree topology and ancestral sequences. To parse this data file, users can use the read.hyphy.seq function.
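A sketch of reading the ancestral sequences follows. The file name ancseq.nex is an assumption about the example data shipped with treeio; it does not appear elsewhere in this document.

```r
library(treeio)

# Nexus file with HyPhy ancestral sequences (assumed example file name)
ancseq <- system.file("extdata/HYPHY", "ancseq.nex", package = "treeio")

# Returns the ancestral sequences only, without mapping them to a tree
anc <- read.hyphy.seq(ancseq)
anc
```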

## 13 DNA sequences in binary format stored in a list.
## 
## All sequences of same length: 2148 
## 
## Labels:
## Node1
## Node2
## Node3
## Node4
## Node5
## Node12
## ...
## 
## Base composition:
##     a     c     g     t 
## 0.335 0.208 0.237 0.220

To map the sequences onto the tree, users also need to provide a tree with labelled internal nodes. To determine substitutions, they also need to provide the tip sequences.
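A sketch of the full HyPhy import, combining the labelled tree, the ancestral sequences and the tip sequences. The tree path follows the printed summary in this section; the names ancseq.nex and pa.fas are assumptions about the bundled example data.

```r
library(treeio)

# Tree with labelled internal nodes (path taken from the printed summary)
nwk <- system.file("extdata/HYPHY", "labelledtree.tree", package = "treeio")
# Ancestral sequences and tip sequences (assumed example file names)
ancseq <- system.file("extdata/HYPHY", "ancseq.nex", package = "treeio")
tipfas <- system.file("extdata", "pa.fas", package = "treeio")

# With tip sequences supplied, substitutions can be determined on every branch
hy <- read.hyphy(nwk, ancseq, tipfas)
hy
```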

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/HYPHY/labelledtree.tree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  K, N, D, L, J, G, ...
## Node labels:
##  Node1, Node2, Node3, Node4, Node5, Node12, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'subs', 'AA_subs'.

Parsing r8s output

r8s uses parametric, semiparametric and nonparametric methods to relax the molecular clock, allowing better estimation of divergence times and evolutionary rates (Sanderson 2003). It outputs three trees in its log file, namely TREE, RATO and PHYLO, for the time tree, rate tree and absolute substitution tree respectively.

The time tree is scaled by divergence time, the rate tree is scaled by substitution rate and the absolute substitution tree is scaled by the absolute number of substitutions. After parsing the file, all three trees are stored in a multiPhylo object.
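A sketch of reading an r8s log file. The file name H3_r8s_output.log is an assumption about the example data shipped with treeio.

```r
library(treeio)

# r8s log file (assumed example file name)
r8s_file <- system.file("extdata/r8s", "H3_r8s_output.log", package = "treeio")

# Returns a multiPhylo object holding the three trees
r8s <- read.r8s(r8s_file)
r8s

# Inspect which trees were found in the log
names(r8s)
```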

## 3 phylogenetic trees

Parsing output of RAxML bootstrapping analysis

The RAxML bootstrapping analysis outputs a non-standard Newick tree text, which stores bootstrap values inside square brackets after branch lengths. Such a file usually cannot be parsed by a traditional Newick parser such as ape::read.tree. The read.raxml function can read such a file and stores the bootstrap values as an additional feature, which can be displayed on the tree or used to color tree branches.
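A minimal sketch, assuming the example file is located where the printed path in this section indicates:

```r
library(treeio)

# RAxML output with bootstrap values in square brackets
raxml_file <- system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3",
                          package = "treeio")

# Bootstrap values end up in the 'bootstrap' feature
raxml <- read.raxml(raxml_file)
raxml
```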

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/RAxML/RAxML_bipartitionsBranchLabels.H3'.
## 
## ...@ phylo: 
## Phylogenetic tree with 64 tips and 62 internal nodes.
## 
## Tip labels:
##  A_Hokkaido_M1_2014_H3N2_2014, A_Czech_Republic_1_2014_H3N2_2014, FJ532080_A_California_09_2008_H3N2_2008, EU199359_A_Pennsylvania_05_2007_H3N2_2007, EU857080_A_Hong_Kong_CUHK69904_2006_H3N2_2006, EU857082_A_Hong_Kong_CUHK7047_2005_H3N2_2005, ...
## 
## Unrooted; includes branch lengths.
## 
## with the following features available:
##  'bootstrap'.

Parsing NHX tree

The NHX (New Hampshire eXtended) format extends Newick by introducing NHX tags. NHX is commonly used in phylogenetics software, including PHYLDOG (Boussau et al. 2013) and RevBayes (Höhna et al. 2014), for storing statistical inferences. Below, an NHX tree with associated data inferred by PHYLDOG is imported.
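The import could be written as in this sketch, which assumes the example file is located where the printed path indicates:

```r
library(treeio)

# NHX file produced by PHYLDOG, assumed to ship with treeio
nhx_file <- system.file("extdata/NHX", "phyldog.nhx", package = "treeio")

# Each NHX tag becomes a feature of the resulting object
nhx <- read.nhx(nhx_file)
nhx
```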

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/NHX/phyldog.nhx'.
## 
## ...@ phylo: 
## Phylogenetic tree with 16 tips and 15 internal nodes.
## 
## Tip labels:
##  Prayidae_D27SS7@2825365, Kephyes_ovata@2606431, Chuniphyes_multidentata@1277217, Apolemia_sp_@1353964, Bargmannia_amoena@263997, Bargmannia_elongata@946788, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'Ev',   'S',    'ND'.

Parsing Phylip tree

The Phylip format contains a multiple sequence alignment of taxa in Phylip sequence format, together with the corresponding Newick tree text built from the taxon sequences. The sequence alignment can be sorted based on the tree structure and displayed on the right-hand side of the tree using ggtree (Yu et al. 2017).
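A minimal sketch of reading such a file, assuming the example file is located where the printed path indicates:

```r
library(treeio)

# Phylip file holding both the alignment and the Newick tree text
phy_file <- system.file("extdata", "sample.phy", package = "treeio")

phylip <- read.phylip(phy_file)
phylip

# The tree itself can be extracted as a phylo object
phylip_tree <- as.phylo(phylip)
```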

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/sample.phy'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 13 internal nodes.
## 
## Tip labels:
##  K, N, D, L, J, G, ...
## 
## Unrooted; no branch lengths.

Parsing EPA and pplacer output

EPA (Berger, Krompass, and Stamatakis 2011) and pplacer (Matsen, Kodner, and Armbrust 2010) share a common output file format, jplace, which can be parsed by the read.jplace() function.
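A minimal sketch, assuming the example file is located where the printed path in this section indicates:

```r
library(treeio)

# jplace file produced by EPA, assumed to ship with treeio
jp_file <- system.file("extdata", "EPA.jplace", package = "treeio")

# Parse the reference tree together with the placements
jp <- read.jplace(jp_file)
jp
```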

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/EPA.jplace'.
## 
## ...@ phylo: 
## Phylogenetic tree with 493 tips and 492 internal nodes.
## 
## Tip labels:
##  CIR000447A, CIR000479, CIR000078, CIR000083, CIR000070, CIR000060, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'nplace'.

The number of evolutionary placements on each branch is calculated and stored as the nplace feature, which can be mapped to line size and/or color using ggtree (Yu et al. 2017).

Parsing jtree format

jtree is a JSON-based format defined in the treeio package to support tree data interchange. A phylogenetic tree with associated data can be exported to a single jtree file using the write.jtree function. A jtree file can be easily parsed using any JSON parser. The jtree format contains three keys: tree, data and metadata. The tree value contains tree text extended from the Newick format by putting the edge number in curly braces after the branch length. The data value contains node/branch-specific data, while the metadata value contains additional meta information.
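A sketch of a round trip through the jtree format: a BEAST tree is parsed, exported to a temporary jtree file, inspected with a generic JSON parser and read back. The BEAST example file and the use of jsonlite as the JSON parser are assumptions made for illustration.

```r
library(treeio)

# Start from a parsed BEAST tree (example file assumed to ship with treeio)
beast <- read.beast(system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio"))

# Export the tree and its associated data to a single jtree file
jtree_file <- tempfile(fileext = ".jtree")
write.jtree(beast, file = jtree_file)

# Any JSON parser can read the file; list its top-level keys
keys <- names(jsonlite::fromJSON(jtree_file))
keys

# Read it back into a treedata object
jtree <- read.jtree(file = jtree_file)
jtree
```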

## 'treedata' S4 object that stored information of
##  '/tmp/RtmpLnnP56/file22a839743f34.jtree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  K_2013, N_2010, D_1987, L_1980, J_1983, G_1992, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_range', 'length',   'length_0.95_HPD',
##  'length_median',    'length_range', 'rate', 'rate_0.95_HPD',    'rate_median',
##  'rate_range',   'height_median',    'posterior'.

Linking external data to phylogeny

In addition to analysis findings that are associated with the tree as shown above, there is a wide range of heterogeneous data, including phenotypic, experimental and clinical data, that need to be integrated and linked to a phylogeny. For example, in the study of viral evolution, tree nodes may be associated with epidemiological information, such as location, age and subtype. Functional annotations may need to be mapped onto gene trees for comparative genomics studies. To facilitate data integration, treeio provides a full_join method that links external data to a phylogeny and stores the result in a treedata object.

Here are examples of linking external data to a phylogenetic tree. After that, an exporter can be used to combine the tree and the data into a single tree file. The data mapped onto the phylogenetic tree can also be used to visualize or annotate the tree using ggtree (Yu et al. 2017).
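The two summaries in this section could be produced along the following lines. This is a sketch with randomly generated values: the first data frame is joined by tip label and the second by node number. The random traits and the variable names are purely illustrative.

```r
library(treeio)
library(tibble)

beast <- read.beast(system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio"))
phylo <- as.phylo(beast)

set.seed(2017)  # only to make the made-up values reproducible

# External data keyed by tip label
n_tip <- length(phylo$tip.label)
x <- tibble(label = phylo$tip.label, trait = rnorm(n_tip))
beast_x <- full_join(beast, x, by = "label")

# External data keyed by node number (tips plus internal nodes)
n_node <- n_tip + phylo$Nnode
y <- tibble(node = seq_len(n_node),
            fake_trait = rnorm(n_node),
            another_trait = runif(n_node))
beast_y <- full_join(beast, y, by = "node")

beast_x
beast_y
```

Joining by label is convenient for data collected per sample, while joining by node also covers internal nodes.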

## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/BEAST/beast_mcc.tree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'trait'.
## 'treedata' S4 object that stored information of
##  '/tmp/Rtmpc0Tpwl/Rinst21857df23c40/treeio/extdata/BEAST/beast_mcc.tree'.
## 
## ...@ phylo: 
## Phylogenetic tree with 15 tips and 14 internal nodes.
## 
## Tip labels:
##  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'fake_trait',
##  'another_trait'.

Combining tree data

The treeio package serves as an infrastructure that enables various types of phylogenetic data inferred by common analysis programs to be imported and used in R. For instance, dN/dS or ancestral sequences estimated by CODEML and clade support values (posterior) inferred by BEAST/MrBayes can be imported. In addition, the treeio package supports linking external data to a phylogeny. It brings these external phylogenetic data (either from software output or external sources) to the R community and makes them available for further analysis in R. Furthermore, treeio can combine multiple phylogenetic trees into one, together with their node/branch-specific attribute data. As a result, one such attribute (e.g., substitution rate) can be mapped to another attribute (e.g., dN/dS) of the same node/branch for comparison and further computation.

A previously published data set, seventy-six H3 hemagglutinin gene sequences of a lineage containing swine and human influenza A viruses (Liang et al. 2014), is used here to demonstrate the utility of comparing evolutionary statistics inferred by different software. The data set was re-analyzed by BEAST for timescale estimation and by CODEML for synonymous and non-synonymous substitution estimation. In this example, we first parse the outputs from BEAST using read.beast and from CODEML using read.codeml into two treedata objects. The two objects, containing separate sets of node/branch-specific data, are then merged via the merge_tree function.
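These steps can be sketched as follows. The file locations are assumptions taken from the paths printed in the summary of this section, which point to example files in the ggtree package, so ggtree needs to be installed.

```r
library(treeio)

# Example files assumed to ship with ggtree (paths taken from the printed summary)
beast_file <- system.file("examples/MCC_FluA_H3.tree", package = "ggtree")
rst_file   <- system.file("examples/rst", package = "ggtree")
mlc_file   <- system.file("examples/mlc", package = "ggtree")

# Parse each program's output into its own treedata object
beast_tree  <- read.beast(beast_file)
codeml_tree <- read.codeml(rst_file, mlc_file)

# Combine the node/branch-specific data of both objects
merged_tree <- merge_tree(beast_tree, codeml_tree)
merged_tree
```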

## 'treedata' S4 object that stored information of
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/MCC_FluA_H3.tree',
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/rst',
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'subs', 'AA_subs',  't',    'N',
##  'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',   'S_x_dS'.

After merging the beast_tree and codeml_tree objects, all node/branch-specific data imported from the BEAST and CODEML output files are available in the merged_tree object. The tree object was converted to a tidy data frame using the tidytree package and visualized as hexbin scatterplots of dN/dS, dN and dS inferred by CODEML versus rate (substitution rate in units of substitutions/site/year) inferred by BEAST on the same branches.
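One way such a plot might be built is sketched here. The dN/dS filter range and the base R reshaping are illustrative choices rather than a record of how the original figure was made, and rendering geom_hex additionally requires the hexbin package.

```r
library(treeio)
library(tidytree)
library(ggplot2)

# Example files assumed to ship with ggtree
beast_tree  <- read.beast(system.file("examples/MCC_FluA_H3.tree", package = "ggtree"))
codeml_tree <- read.codeml(system.file("examples/rst", package = "ggtree"),
                           system.file("examples/mlc", package = "ggtree"))
merged_tree <- merge_tree(beast_tree, codeml_tree)

# One row per node; keep the BEAST rate and the CODEML estimates
tbl <- as_tibble(merged_tree)
stats <- data.frame(rate = tbl$rate, dN_vs_dS = tbl$dN_vs_dS,
                    dN = tbl$dN, dS = tbl$dS)

# Illustrative filter: drop missing and extreme dN/dS estimates
keep <- !is.na(stats$dN_vs_dS) & stats$dN_vs_dS >= 0 & stats$dN_vs_dS <= 1.5
stats <- stats[keep, ]

# Reshape to long form: one row per (branch, statistic) pair
types <- c("dN_vs_dS", "dN", "dS")
long <- data.frame(
  rate  = rep(stats$rate, times = length(types)),
  type  = factor(rep(types, each = nrow(stats)), levels = types),
  value = c(stats$dN_vs_dS, stats$dN, stats$dS)
)

# Hexbin scatterplots of each CODEML statistic against the BEAST rate
p <- ggplot(long, aes(rate, value)) +
  geom_hex() +
  facet_wrap(~ type, scales = "free_y") +
  ylab(NULL)
```

Printing p draws the three panels.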

Using merge_tree, we can compare analysis results obtained with an identical model from different software packages, or with different models from the same or different software. It also allows users to integrate analysis findings from different software packages. Merging tree data is not restricted to software findings; external data can also be associated with analysis findings. The merge_tree function is chainable and allows several tree objects to be merged into one.
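As a sketch of chaining, external data can be wrapped in a treedata object of its own and merged with an already merged tree. The random traits are made up for illustration, and the example files are assumed to ship with ggtree.

```r
library(treeio)
library(tibble)

beast_tree  <- read.beast(system.file("examples/MCC_FluA_H3.tree", package = "ggtree"))
codeml_tree <- read.codeml(system.file("examples/rst", package = "ggtree"),
                           system.file("examples/mlc", package = "ggtree"))
merged_tree <- merge_tree(beast_tree, codeml_tree)

# Build a third tree object carrying made-up node data on the same topology
phylo <- as.phylo(beast_tree)
n_node <- length(phylo$tip.label) + phylo$Nnode
set.seed(2017)
d <- tibble(node = seq_len(n_node),
            fake_trait = rnorm(n_node),
            another_trait = runif(n_node))
fake_tree <- treedata(phylo = phylo, data = d)

# Merge the result of the first merge with the third object
triple_tree <- merge_tree(merged_tree, fake_tree)
triple_tree
```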

## 'treedata' S4 object that stored information of
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/MCC_FluA_H3.tree',
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/rst',
##  '/home/biocbuild/bbs-3.7-bioc/R/library/ggtree/examples/mlc'.
## 
## ...@ phylo: 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##  'height',   'height_0.95_HPD',  'height_median',    'height_range', 'length',
##  'length_0.95_HPD',  'length_median',    'length_range', 'posterior',    'rate',
##  'rate_0.95_HPD',    'rate_median',  'rate_range',   'subs', 'AA_subs',  't',    'N',
##  'S',    'dN_vs_dS', 'dN',   'dS',   'N_x_dN',   'S_x_dS',   'fake_trait',
##  'another_trait'.

The triple_tree object shown above contains analysis results obtained from BEAST and CODEML, and evolutionary traits from external sources. All of this information can be used to annotate the tree using ggtree (Yu et al. 2017).

Getting information from treedata object

After the tree has been imported, users may want to extract information stored in the treedata object. treeio provides several accessor methods to extract the tree structure, the features/attributes stored in the object and their corresponding values.

The get.tree or as.phylo method converts the treedata object to a phylo object, which is the fundamental tree object in the R community and is supported by many packages.
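A minimal sketch of both accessors, assuming the BEAST example file that ships with ggtree:

```r
library(treeio)

beast_tree <- read.beast(system.file("examples/MCC_FluA_H3.tree", package = "ggtree"))

# Either call drops the associated data and returns the bare tree
phylo  <- as.phylo(beast_tree)
phylo2 <- get.tree(beast_tree)
phylo
```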

## 
## Phylogenetic tree with 76 tips and 75 internal nodes.
## 
## Tip labels:
##  A/Hokkaido/30-1-a/2013, A/New_York/334/2004, A/New_York/463/2005, A/New_York/452/1999, A/New_York/238/2005, A/New_York/523/1998, ...
## 
## Rooted; includes branch lengths.

The get.fields method returns a vector of the features/attributes that are stored in the object and associated with the phylogeny.

##  [1] "height"          "height_0.95_HPD" "height_median"  
##  [4] "height_range"    "length"          "length_0.95_HPD"
##  [7] "length_median"   "length_range"    "posterior"      
## [10] "rate"            "rate_0.95_HPD"   "rate_median"    
## [13] "rate_range"

The get.data method returns a tibble of all the associated data.

## # A tibble: 151 x 14
##    height height_0.95_HPD height_median height_range length length_0.95_HPD
##     <dbl> <list>                  <dbl> <list>        <dbl> <list>         
##  1   19   <dbl [2]>                19   <dbl [2]>     2.34  <dbl [2]>      
##  2   17   <dbl [2]>                17   <dbl [2]>     1.18  <dbl [2]>      
##  3   14   <dbl [2]>                14   <dbl [2]>     0.966 <dbl [2]>      
##  4   12   <dbl [2]>                12   <dbl [2]>     1.87  <dbl [2]>      
##  5    9   <dbl [2]>                 9   <dbl [2]>     2.93  <dbl [2]>      
##  6   10   <dbl [2]>                10   <dbl [2]>     0.827 <dbl [2]>      
##  7   10   <dbl [2]>                10   <dbl [2]>     0.834 <dbl [2]>      
##  8   10.8 <dbl [2]>                10.8 <dbl [2]>     0.233 <dbl [2]>      
##  9    9   <dbl [2]>                 9   <dbl [2]>     1.28  <dbl [2]>      
## 10    9   <dbl [2]>                 9   <dbl [2]>     0.414 <dbl [2]>      
## # ... with 141 more rows, and 8 more variables: length_median <dbl>,
## #   length_range <list>, posterior <dbl>, rate <dbl>,
## #   rate_0.95_HPD <list>, rate_median <dbl>, rate_range <list>, node <int>

If users are only interested in a subset of the features/attributes returned by get.fields, they can extract the information from the output of get.data or directly subset the data with [ or [[.
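These accessors might be used as in the following sketch, which assumes the BEAST example file that ships with ggtree and uses illustrative variable names.

```r
library(treeio)

beast_tree <- read.beast(system.file("examples/MCC_FluA_H3.tree", package = "ggtree"))

# All associated data as a tibble, one row per node
all_data <- get.data(beast_tree)

# Selected columns via [
node_height <- beast_tree[, c("node", "height")]
node_height

# A single feature as a plain vector via [[
height_median <- beast_tree[["height_median"]]
head(height_median)
```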

## # A tibble: 151 x 2
##     node height
##    <int>  <dbl>
##  1    10   19  
##  2     9   17  
##  3    36   14  
##  4    31   12  
##  5    29    9  
##  6    28   10  
##  7    39   10  
##  8    90   10.8
##  9    16    9  
## 10     2    9  
## # ... with 141 more rows
## [1] 19 17 14 12  9 10

Manipulating tree data using tidytree

All the tree data parsed or merged by treeio can be converted to a tidy data frame using the tidytree package. The tidytree package provides tidy interfaces for manipulating a tree with its associated data. For instance, external data can be linked to a phylogeny, or evolutionary data obtained from different sources can be merged, using tidyverse verbs. After the tree data has been manipulated, it can be converted back to a treedata object and exported to a single tree file, further analyzed in R or visualized using ggtree (Yu et al. 2017).
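A sketch of such a round trip: the tree is converted to a tidy data frame, a made-up trait is joined by tip label and the result is converted back to a treedata object. The example file and the random trait are assumptions made for illustration.

```r
library(treeio)
library(tidytree)
library(tibble)

beast <- read.beast(system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio"))

# Tidy representation: one row per node, including parent and node columns
tbl <- as_tibble(beast)

# Made-up external data keyed by tip label
set.seed(2017)
tips <- as.phylo(beast)$tip.label
trait <- tibble(label = tips, trait = rnorm(length(tips)))

# Join with a tidyverse verb, then convert back to a treedata object
tbl2 <- full_join(tbl, trait, by = "label")
beast2 <- as.treedata(tbl2)
beast2
```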

For more details, please refer to the tidytree package vignette.

Visualizing tree data with ggtree

treeio is seamlessly integrated into the ggtree (Yu et al. 2017) package, and all the information, whether directly imported or linked from external sources, can be used to visualize and annotate the tree.

See the ggtree package vignettes for more details.

Need help?

If you have questions or issues, please visit the treeio homepage first; most common problems are documented there. If you think you have found a bug, please follow the guide and provide a reproducible example to be posted on the GitHub issue tracker. For questions, please post to the Bioconductor support site and tag your post with treeio.

Chinese users can also follow me on WeChat (微信).

References

Berger, Simon A., Denis Krompass, and Alexandros Stamatakis. 2011. “Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads Under Maximum Likelihood.” Systematic Biology 60 (3):291–302. https://doi.org/10.1093/sysbio/syr010.

Bouckaert, Remco, Joseph Heled, Denise Kühnert, Tim Vaughan, Chieh-Hsi Wu, Dong Xie, Marc A. Suchard, Andrew Rambaut, and Alexei J. Drummond. 2014. “BEAST 2: A Software Platform for Bayesian Evolutionary Analysis.” PLoS Comput Biol 10 (4):e1003537. https://doi.org/10.1371/journal.pcbi.1003537.

Boussau, Bastien, Gergely J. Szöllősi, Laurent Duret, Manolo Gouy, Eric Tannier, and Vincent Daubin. 2013. “Genome-Scale Coestimation of Species and Gene Trees.” Genome Research 23 (2):323–30. https://doi.org/10.1101/gr.141978.112.

Höhna, Sebastian, Tracy A. Heath, Bastien Boussau, Michael J. Landis, Fredrik Ronquist, and John P. Huelsenbeck. 2014. “Probabilistic Graphical Model Representation in Phylogenetics.” Systematic Biology 63 (5):753–71. https://doi.org/10.1093/sysbio/syu039.

Liang, Huyi, Tommy Tsan-Yuk Lam, Xiaohui Fan, Xinchun Chen, Yu Zeng, Ji Zhou, Lian Duan, et al. 2014. “Expansion of Genotypic Diversity and Establishment of 2009 H1N1 Pandemic-Origin Internal Genes in Pigs in China.” Journal of Virology, July, JVI.01327–14. https://doi.org/10.1128/JVI.01327-14.

Matsen, Frederick A, Robin B Kodner, and E Virginia Armbrust. 2010. “Pplacer: Linear Time Maximum-Likelihood and Bayesian Phylogenetic Placement of Sequences onto a Fixed Reference Tree.” BMC Bioinformatics 11 (1):538. https://doi.org/10.1186/1471-2105-11-538.

Sanderson, Michael J. 2003. “R8s: Inferring Absolute Rates of Molecular Evolution and Divergence Times in the Absence of a Molecular Clock.” Bioinformatics 19 (2):301–2. https://doi.org/10.1093/bioinformatics/19.2.301.

Yang, Ziheng. 2007. “PAML 4: Phylogenetic Analysis by Maximum Likelihood.” Molecular Biology and Evolution 24 (8):1586–91. https://doi.org/10.1093/molbev/msm088.

Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1):28–36. https://doi.org/10.1111/2041-210X.12628.