1 Introduction

The sitePath package does hierarchical search for fixation events given multiple sequence alignment and phylogenetic tree. These fixation events can be specific to a phylogenetic lineages or shared by multiple lineages. This is achieved by three major steps:

  1. Import tree and sequence alignment
  2. Resolve phylogenetic lineages
  3. Hierarchical search for fixation and parallel mutations

2 Import data

There’re various R packages for parsing phylogenetic tree and multiple sequence alignment files. For now, sitepath accepts phylo object and alignment object. Functions from ggtree and seqinr are able to handle most file formats.

2.1 Parse phylogenetic tree

The S3 phylo class is a common data structure for phylogenetic analysis in R. The CRAN package ape provides basic parsing function for reading tree files. The Bioconductor package ggtree provides more comprehensive parsing utilities.


tree_file <- system.file("extdata", "ZIKV.newick", package = "sitePath")
tree <- read.tree(tree_file)

It is highly recommended that the file stores a rooted tree as R would consider the tree is rooted by default and re-rooting the tree in R is difficult.

Also, we expect the tree to have no super long branches. A bad example is shown below:

2.2 Parse and match sequence alignment

Most multiple sequence alignment format can be parsed by seqinr. There is a wrapper function for parsing and adding the sequence alignment.

alignment_file <- system.file("extdata", "ZIKV.fasta", package = "sitePath")
tree <- addMSA(tree, alignment_file, "fasta")

3 Phylogenetic lineages

The names in tree and alignment must be matched. We exploit polymorphism of each site to find the major branches. Before finding putative phylogenetic lineages, there involves a few more steps to evaluate the impact of threshold on result.

3.1 The impact of threshold on resolving lineages

In the current version, the resolving function only takes sequence similarity as one single threshold. The impact of threshold depends on the tree topology hence there is no universal choice. The function sneakPeak samples thresholds and calculates the resulting number of paths. The use of this function can be of great help in choosing the threshold.

preassessment <- sneakPeek(tree, makePlot = TRUE)

3.2 Choose a threshold

Use the return of the function lineagePath for downstream analysis. The choice of the threshold really depends. You can use the result from sneakPeak as a reference for threshold choosing. Here 0.05 is used as an example.

paths <- lineagePath(preassessment, similarity = 0.05)
#> This is a 'lineagePath' object.
#> 4 lineage paths using 18.1 as "major SNP" threshold

You can visualize the result.


4 Mutation detection

Now you’re ready to find fixation and parallel mutations.

4.1 Entropy minimization

The sitesMinEntropy function perform entropy minimization on every site for each lineage. The fixation and parallel mutations can be derived from the function’s return value.

minEntropy <- sitesMinEntropy(paths)

4.2 Fixation mutations

The hierarchical search is done by fixationSites function. The function detects the site with fixation mutation.

fixations <- fixationSites(minEntropy)
#> This is a 'fixationSites' object.
#> Result for 4 paths:
#> 109 139 894 2074 2086 2634 3045 3144 988 1143 2842 3398 107 1118 3353 
#> No reference sequence specified. Using alignment numbering

To get the position of all the resulting sites, allSitesName can be used on the return of fixationSites and also other functions like SNPsites and parallelSites.

allSites <- allSitesName(fixations)
#>  [1] "109"  "139"  "894"  "2074" "2086" "2634" "3045" "3144" "988"  "1143"
#> [11] "2842" "3398" "107"  "1118" "3353"

If you want to retrieve the result of a single site, you can pass the result of fixationSites and the site index to extractSite function. The output is a sitePath object which stores the tip names.

sp <- extractSite(fixations, 139)

It is also possible to retrieve the tips involved in the fixation of the site.

extractTips(fixations, 139)
#> [[1]]
#>  [1] "ANK57896" "AMD61711" "AQS26698" "APG56458" "AUI42289" "AMR39834"
#>  [7] "AWH65848" "APO08504" "AMX81917" "AVZ47169" "AMX81916" "AMD61710"
#> [13] "AMK49492" "AMX81915" "AOC50652" "APH11611" "BBC70847" "AUF35022"
#> [19] "ATL14618" "AUF35021" "AVV62004" "BAX00477"
#> attr(,"AA")
#> [1] "S"
#> [[2]]
#>   [1] "BAV89190"     "AOI20067"     "AMM43325"     "AMM43326"     "AUI42329"    
#>   [6] "AUI42330"     "ANC90425"     "AMT75536"     "ANF16414"     "AMR68932"    
#>  [11] "ANA12599"     "AMM39806"     "AMR39830"     "AMV49165"     "AMO03410"    
#>  [16] "ANO46307"     "AVG19275"     "ANN44857"     "ANO46306"     "ANO46309"    
#>  [21] "ANO46305"     "ANO46303"     "ARB08102"     "ANO46302"     "AHZ13508"    
#>  [26] "ANO46304"     "ANO46301"     "ANO46308"     "AOG18296"     "AOO19564"    
#>  [31] "AUI42194"     "APC60215"     "AMQ48986"     "ATG29307"     "ART29828"    
#>  [36] "AWF93617"     "ATG29284"     "ATG29287"     "ATG29303"     "AWF93619"    
#>  [41] "AWF93618"     "AQM74762"     "AUD54964"     "AQM74761"     "ATG29306"    
#>  [46] "ASL68974"     "ATG29267"     "ASL68978"     "AQX32985"     "ATG29315"    
#>  [51] "AQZ41956"     "ARI68105"     "ASU55505"     "AQZ41949"     "ASL68979"    
#>  [56] "ATG29299"     "ATI21641"     "ATG29270"     "ATG29291"     "AOY08536"    
#>  [61] "ANO46297"     "ANO46298"     "AQZ41950"     "AQZ41951"     "ARU07183"    
#>  [66] "ANG09399"     "AQZ41954"     "AOY08533"     "AQZ41947"     "AQZ41948"    
#>  [71] "ATG29292"     "ATG29295"     "AOW32303"     "AVZ25033"     "AOC50654"    
#>  [76] "AQZ41953"     "ATG29301"     "ATG29276"     "APO08503"     "AMC13913"    
#>  [81] "AMC13912"     "APO39243"     "APO39229"     "AQZ41952"     "AQZ41955"    
#>  [86] "AMK49165"     "ARB07976"     "APB03018"     "AMC13911"     "APB03019"    
#>  [91] "ASU55416"     "ANK57897"     "AWH65849"     "AMZ03556"     "ASU55417"    
#>  [96] "ANW07476"     "APY24199"     "AMA12086"     "AMH87239"     "APY24198"    
#> [101] "APO36913"     "ALX35659"     "AOG18295"     "ANQ92019"     "AML81028"    
#> [106] "APY24200"     "AMD16557"     "ARU07074"     "AOX49264"     "AOX49265"    
#> [111] "AOY08518"     "ARB07962"     "AMX81919"     "AMM39805"     "ARX97119"    
#> [116] "AMB37295"     "AMK79468"     "AML82110"     "AMR39831"     "AMX81918"    
#> [121] "ANC90426"     "ALU33341"     "ASB32509"     "AMA12085"     "AMU04506"    
#> [126] "AMA12087"     "AMA12084"     "AQU12485"     "AMS00611"     "AMQ48981"    
#> [131] "AOY08538"     "APH11492"     "AOY08517"     "AOY08541"     "AOO54270"    
#> [136] "AND01116"     "ARU07076"     "AMK49164"     "APG56457"     "AOR82892"    
#> [141] "ATB53752"     "ANH10698"     "AOR82893"     "ARU07075"     "AMB18850"    
#> [146] "YP_009428568" "AMQ48982"     "ART29823"     "APW84876"     "ASK51714"    
#> [151] "ARB07953"     "APW84872"     "AOY08525"     "APW84873"     "AOY08535"    
#> [156] "AVZ25035"     "ARB07932"     "AOY08523"     "AOY08542"     "ASW34087"    
#> [161] "AOY08537"     "APB03020"     "ART29826"     "ART29825"     "AOS90220"    
#> [166] "AMN14620"     "APW84874"     "APW84875"     "BAV82373"     "AOS90221"    
#> [171] "AOS90224"     "APB03021"     "APO39232"     "AOS90223"     "APO39237"    
#> [176] "ANH22038"     "APW84877"     "APO39236"     "AOY08546"     "AOY08516"    
#> [181] "APO39233"     "AOS90222"     "AOO53981"     "AOY08521"     "AOO85388"    
#> [186] "APO39228"     "ARB07967"     "ANF04752"     "AOE22997"     "APQ41782"    
#> [191] "APQ41786"     "ASU55393"     "ASU55404"     "ASU55423"     "ANB66182"    
#> [196] "ASU55425"     "ASU55420"     "AQX32986"     "ASU55422"     "APQ41784"    
#> [201] "ANC90428"     "ASU55415"     "ASU55418"     "ARM59239"     "ASU55408"    
#> [206] "ASU55424"     "ASU55390"     "ASU55419"     "ASU55391"     "AMM39804"    
#> [211] "ASU55411"     "ANB66183"     "ASU55421"     "AMZ03557"     "ASU55392"    
#> [216] "AQX32987"     "ASU55403"     "ASU55399"     "APQ41783"     "ANS60026"    
#> [221] "ANB66184"     "ASU55426"     "ASU55412"     "ASU55413"     "ASU55410"    
#> [226] "ASU55397"     "ASU55400"     "ASU55409"     "APB03017"     "ASU55395"    
#> [231] "ASU55396"     "AOY08524"     "ASU55394"     "ASU55414"     "ASU55405"    
#> [236] "AMC33116"     "ASU55406"     "ASU55398"     "ASU55407"     "AMQ34003"    
#> [241] "AMQ34004"     "ASU55401"     "ASU55402"    
#> attr(,"AA")
#> [1] "N"

Use plot on a sitePath object to visualize the fixation mutation of a single site. Alternatively, use plotSingleSite on an fixationSites object with the site specified.


plotSingleSite(fixations, 139)
#> Warning: `mutate_()` was deprecated in dplyr 0.7.0.
#> Please use `mutate()` instead.
#> See vignette('programming') for more help

To have an overall view of the transition of fixation mutation, use plot on an fixationSites object.


4.3 Parallel mutation

Parallel mutation can be found by the parallelSites function. There are four ways of defining the parallel mutation: all, exact, pre and post. Here exact is used as an example.

paraSites <- parallelSites(minEntropy, minSNP = 1, mutMode = "exact")
#> This is a 'parallelSites' object.
#> Result for 4 paths:
#> 105 1226 1264 1717 988 2611 2787 2749 3162 1857 1016 1171 1327 3046 3328 2634 3076 106 2357 573 940 1180 1404 
#> No reference sequence specified. Using alignment numbering

The result of a single site can be visualized by plotSingleSite function.

plotSingleSite(paraSites, 105)
#> Warning: `filter_()` was deprecated in dplyr 0.7.0.
#> Please use `filter()` instead.
#> See vignette('programming') for more help