Advanced User Guide - SangerAlignment (AB1)¶
SangerAlignment is the highest class level in sangeranalyseR, as shown in Figure 1. It contains a list of SangerContig instances and the contigs alignment result. Users can access the SangerContig and SangerRead instances inside a SangerAlignment instance. In this section, we go through the detailed sangeranalyseR data analysis steps at the SangerAlignment level, starting from AB1 file input.

Figure 1. Class hierarchy in sangeranalyseR, SangerAlignment level.¶
Preparing SangerAlignment AB1 input¶
The main input file format for creating a SangerAlignment instance is AB1. Before starting the analysis, users need to prepare a directory that contains all the AB1 files. The filenames must follow these rules:
Note
All input files must have .ab1 as their file extension.
Reads that belong to the same contig must have the same contig name in their filenames.
Forward or reverse direction also has to be specified in the filename.
There are three parameters, parentDirectory, suffixForwardRegExp, and suffixReverseRegExp, that users need to provide so that the program can automatically group all AB1 files.
Note
parentDirectory: The root directory that contains all the AB1 files. It can be an absolute or a relative path. We suggest putting only the target AB1 files inside this directory, without other unrelated files.
suffixForwardRegExp: A regular expression that matches all filenames in the forward direction. The grepl function in R is used to select forward reads from all AB1 files.
suffixReverseRegExp: A regular expression that matches all filenames in the reverse direction. The grepl function in R is used to select reverse reads from all AB1 files.
For a basic example of preparing input files, please go to the Beginners Guide. Here, we walk through a more complicated example.

Figure 2. Input ab1 files inside the parent directory, ./tmp/.¶
Figure 2 shows the file naming regulation and directory hierarchy. In this example, the parent directory is extdata and the directories in the first layer are Allolobophora_chlorotica and Drosophila_melanogaster. All target AB1 files need to be inside the parent directory, but they do not need to be at the same level. sangeranalyseR recursively searches for all files with the .ab1 file extension and automatically groups reads with the same contig name. Within each contig, the direction of each read is determined by matching suffixForwardRegExp and suffixReverseRegExp against the filenames. It is therefore important to choose suffixForwardRegExp and suffixReverseRegExp carefully. Inconsistent file naming or a wrong regular expression might accidentally put reverse reads into the forward read list or vice versa, which makes the program generate completely wrong results, so users should follow a consistent naming strategy. In this example, "_[0-9]+_F" and "_[0-9]+_R" are highly recommended for matching forward and reverse reads, and they are used as the defaults. It is a good habit to index the reads in the same contig group, because a contig might contain more than one read in the forward or reverse direction.

Figure 3. Suggested AB1 file naming regulation - SangerAlignment.¶
Figure 3 shows the suggested AB1 file naming regulation. Users are strongly recommended to follow it and to use the default suffixForwardRegExp: "_[0-9]+_F" and suffixReverseRegExp: "_[0-9]+_R" to reduce the chance of error.
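The grouping described above can be tried out in base R before running the real constructor. The following is only a minimal sketch: the directory layout and filenames are hypothetical, and deriving the contig name by stripping the direction suffix is an assumption made for illustration, not necessarily what sangeranalyseR does internally. Only list.files, grepl, sub, and split from base R are used.

```r
# Build a throwaway directory tree; every name below is hypothetical.
parentDirectory <- file.path(tempdir(), "ab1_demo")
unlink(parentDirectory, recursive = TRUE)
subDir <- file.path(parentDirectory, "Allolobophora_chlorotica", "ACHLO")
dir.create(subDir, recursive = TRUE)
demoFiles <- c("Achl_ACHLO006-09_1_F.ab1", "Achl_ACHLO006-09_2_R.ab1",
               "Achl_ACHLO007-09_1_F.ab1", "Achl_ACHLO007-09_2_R.ab1")
invisible(file.create(file.path(subDir, demoFiles)))

suffixForwardRegExp <- "_[0-9]+_F"
suffixReverseRegExp <- "_[0-9]+_R"

# Recursively collect every file with the .ab1 extension.
ab1Files <- basename(list.files(parentDirectory, pattern = "\\.ab1$",
                                recursive = TRUE))

# Select forward and reverse reads with grepl, as the guide describes.
isForward <- grepl(suffixForwardRegExp, ab1Files)
isReverse <- grepl(suffixReverseRegExp, ab1Files)

# Files matching both patterns or neither would be grouped wrongly.
ambiguous <- ab1Files[isForward == isReverse]

# Assumption for illustration: the contig name is the filename with the
# index, direction suffix, and extension removed.
contigName <- sub("_[0-9]+_[FR]\\.ab1$", "", ab1Files)
readGroups <- split(ab1Files, contigName)
print(readGroups)
```

If ambiguous is not empty, the regular expressions or the filenames should be revised before creating the SangerAlignment instance.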
Creating SangerAlignment instance from AB1¶
After preparing the input directory, we can create the SangerAlignment S4 instance by running the SangerAlignment constructor function or the new method. The constructor function is a wrapper around the new method and makes instance creation more intuitive. Most parameters in the constructor have default values. The constructor call below lists the important parameters. For a simpler command, please go to the Quick Start Guide.
sangerAlignment <- SangerAlignment(inputSource          = "ABIF",
                                   parentDirectory      = "./tmp/",
                                   suffixForwardRegExp  = "_[0-9]+_F",
                                   suffixReverseRegExp  = "_[0-9]+_R",
                                   refAminoAcidSeq      = "SRQWLFSTNHKDIGTLYFIFGAWAGMVGTSLSILIRAELGHPGALIGDDQIYNVIVTAHAFIMIFFMVMPIMIGGFGNWLVPLMLGAPDMAFPRMNNMSFWLLPPALSLLLVSSMVENGAGTGWTVYPPLSAGIAHGGASVDLAIFSLHLAGISSILGAVNFITTVINMRSTGISLDRMPLFVWSVVITALLLLLSLPVLAGAITMLLTDRNLNTSFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIISQESGKKETFGSLGMIYAMLAIGLLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAVPTGIKIFSWLATLHGTQLSYSPAILWALGFVFLFTVGGLTGVVLANSSVDIILHDTYYVVAHFHYVLSMGAVFAIMAGFIHWYPLFTGLTLNNKWLKSHFIIMFIGVNLTFFPQHFLGLAGMPRRYSDYPDAYTTWNIVSTIGSTISLLGILFFFFIIWESLVSQRQVIYPIQLNSSIEWYQNTPPAEHSYSELPLLTN",
                                   TrimmingMethod       = "M1",
                                   M1TrimmingCutoff     = 0.0001,
                                   M2CutoffQualityScore = NULL,
                                   M2SlidingWindowSize  = NULL,
                                   baseNumPerRow        = 100,
                                   heightPerRow         = 200,
                                   signalRatioCutoff    = 0.33,
                                   showTrimmed          = TRUE,
                                   minReadsNum          = 2,
                                   minReadLength        = 20,
                                   minFractionCall      = 0.5,
                                   maxFractionLost      = 0.5,
                                   geneticCode          = GENETIC_CODE,
                                   acceptStopCodons     = TRUE,
                                   readingFrame         = 1,
                                   minFractionCallSA    = 0.5,
                                   maxFractionLostSA    = 0.5,
                                   processorsNum        = NULL)
The inputs of the SangerAlignment constructor function and the new method are the same. For more details about SangerAlignment inputs and slot definitions, please refer to the sangeranalyseR reference manual.
Updating SangerAlignment quality trimming parameters¶
In the previous Creating SangerAlignment instance from AB1 part, the constructor function applies the quality trimming parameters to all reads. After creating the SangerAlignment S4 instance, users can change the trimming parameters by running the updateQualityParam function, which updates all reads with the new trimming parameters and redoes the reads alignment in each SangerContig and the contigs alignment in the SangerAlignment. Users who want to do quality trimming read by read instead of all at once should read Launching SangerAlignment Shiny app.
newSangerAlignment <- updateQualityParam(sangerAlignment,
                                         TrimmingMethod       = "M2",
                                         M1TrimmingCutoff     = NULL,
                                         M2CutoffQualityScore = 29,
                                         M2SlidingWindowSize  = 15)
Launching SangerAlignment Shiny app¶
We created an interactive local Shiny app that lets users explore each SangerRead and SangerContig in a SangerAlignment instance. Users only need to run one function with the previously created instance as input, and the SangerAlignment Shiny app will pop up. Here, we go through the pages at the three levels.
launchApp(sangerAlignment)
SangerAlignment page (SA app)¶
Figure 4 is the initial page and the top layer of the SangerAlignment app. It provides the basic parameters of the SangerAlignment instance, the contigs alignment result, the phylogenetic tree, etc. Before checking the results, users need to click the “Re-calculate Contigs Alignment” button to redo the contigs alignment and get the updated results. The left-hand side panel shows the hierarchy of the SangerAlignment S4 instance and gives access to all reads and contigs in it.

Figure 4. SangerAlignment Shiny app initial page - SangerAlignment Page.¶
Scrolling down a bit, users can see the contigs alignment result, which is generated by the DECIPHER R package and embedded in the SangerAlignment page. Figure 5 shows the contigs alignment result.

Figure 5. SangerAlignment Page - contigs alignment result.¶
The SangerAlignment page also provides the phylogenetic tree result (Figure 6). The tree is generated by the ape R package, which uses the neighbor-joining algorithm.

Figure 6. SangerAlignment Page - phylogenetic tree result.¶
SangerContig page (SA app)¶
Now, let's go to the page at the next level, the SangerContig page. Users can click into each contig and check its results. Figure 7 shows the overview page of Contig 1. Notice that there is a red “Re-calculate Contig” button. After changing the quality trimming parameters, users need to click this button before checking the results below in order to get the updated information.

Figure 7. SangerAlignment Shiny app - SangerContig page.¶
The information provided on this page includes: “input parameters”, “genetic code table”, “reference amino acid sequence”, “reads alignment”, “difference data frame”, “dendrogram”, “sample distance heatmap”, “indels data frame”, and “stop codons data frame”. Figure 8 and Figure 9 show part of the results on the SangerContig page. The results change dynamically with the trimming parameters that users enter.

Figure 8. SangerContig page - contig-related parameters, genetic code and reference amino acid sequence.¶

Figure 9. SangerContig page - reads alignment and difference data frame.¶
SangerRead page (SA app)¶
Now, let's go to the page at the lowest level, the SangerRead page. The SangerRead page contains all details of a read, including its trimming and chromatogram inputs and results. Every read is in either the "forward" or the "reverse" direction. Under the "Contig Overview" tab (SangerContig page), there are two expandable tabs, “Forward Reads” and “Reverse Reads”, which hold the corresponding reads in the left-hand side navigation panel in Figure 10. In this example, there is one read in each tab, and Figure 10 shows the “1 - 1 Forward Read” page. It provides basic information, quality trimming inputs, chromatogram plotting inputs, etc. The primary/secondary sequences in this figure change dynamically with the signalRatioCutoff value used for base calling, and their lengths are always the same. Note also that, after trimming, the primary/secondary sequences always match the sequences in the chromatogram in Figure 15 below, and they use the same color coding for A/T/C/G.

Figure 10. SangerAlignment Shiny app - SangerRead page.¶
In the quality trimming step, we remove the low-quality fragments at both ends of each sequencing read. This is important because trimmed reads improve alignment results. Figure 11 shows the UI for Trimming Method 1 (M1): ‘Modified Mott Trimming’. This method is implemented in Phred. Users can change the cutoff score and click the “Apply Trimming Parameters” button to update the UI. The input value must be between 0 and 1. If the input is invalid, the cutoff score is set to the default, 0.0001.

Figure 11. SangerRead page - Trimming Method 1 (M1): ‘Modified Mott Trimming’ UI.¶
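To make the role of the M1 cutoff concrete, here is a minimal base R sketch of the general idea behind Mott-style trimming: each Phred score is converted to an error probability, each base is scored as the cutoff minus that probability, and the highest-scoring contiguous segment is kept. The function name mottTrimSketch and the toy quality vector are hypothetical, and the exact variant used by sangeranalyseR or Phred may differ in details such as how the start position is chosen.

```r
# Sketch of Mott-style trimming as a maximal-scoring-segment search.
# Not the package's implementation; names and data are illustrative.
mottTrimSketch <- function(phred, cutoff = 0.0001) {
  errorProb <- 10^(-phred / 10)   # Phred score -> error probability
  score <- cutoff - errorProb     # positive only for bases better than cutoff
  running <- 0
  best <- 0
  segStart <- 1L
  bestStart <- NA_integer_
  bestEnd <- NA_integer_
  for (i in seq_along(score)) {
    running <- running + score[i]
    if (running <= 0) {
      # Segment no longer pays off: reset and start after this base.
      running <- 0
      segStart <- i + 1L
    } else if (running > best) {
      best <- running
      bestStart <- segStart
      bestEnd <- i
    }
  }
  c(start = bestStart, end = bestEnd)
}

# Toy read: poor quality at both ends, good quality in the middle.
phred <- c(5, 8, 30, 35, 40, 38, 32, 9, 6)
# A cutoff of 0.01 (about Q20) is used here so the toy data has a kept region.
trimmedM1 <- mottTrimSketch(phred, cutoff = 0.01)
print(trimmedM1)
```

With the default cutoff of 0.0001, only bases with an error probability below 0.0001 (Phred score above 40) score positively in this sketch, which shows why a smaller cutoff means more stringent trimming.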
Figure 12 shows another quality trimming method for users to choose from, Trimming Method 2 (M2): ‘Trimmomatics Sliding Window Trimming’. This method is implemented in Trimmomatic. Users can change the cutoff quality score as well as the sliding window size and click the “Apply Trimming Parameters” button to update the UI. The cutoff quality score must be between 0 and 60 (default 20); the sliding window size must be between 0 and 40 (default 10). If the inputs are invalid, they are set to their default values.

Figure 12. SangerRead page - Trimming Method 2 (M2): ‘Trimmomatics Sliding Window Trimming’ UI.¶
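The interaction between the cutoff quality score and the sliding window size can be illustrated with a small base R sketch. It assumes a simple reading of sliding window trimming: the kept region starts at the first window whose mean quality reaches the cutoff and ends with the last window of that uninterrupted run. The function name slidingWindowTrimSketch and the toy data are hypothetical, and the rule used by sangeranalyseR or Trimmomatic may differ in how the boundaries are placed.

```r
# Sketch of sliding-window quality trimming; illustrative only.
slidingWindowTrimSketch <- function(phred, cutoffQualityScore = 20,
                                    slidingWindowSize = 10) {
  stopifnot(slidingWindowSize >= 1)
  n <- length(phred)
  noRegion <- c(start = NA_integer_, end = NA_integer_)
  if (n < slidingWindowSize) return(noRegion)
  windowStarts <- seq_len(n - slidingWindowSize + 1L)
  # Mean Phred score of every window of slidingWindowSize bases.
  windowMean <- vapply(windowStarts,
                       function(s) mean(phred[s:(s + slidingWindowSize - 1L)]),
                       numeric(1))
  good <- windowMean >= cutoffQualityScore
  if (!any(good)) return(noRegion)
  firstGood <- which(good)[1]
  laterBad <- which(!good & windowStarts > firstGood)
  lastGood <- if (length(laterBad) == 0L) {
    length(windowStarts)
  } else {
    laterBad[1] - 1L
  }
  c(start = firstGood, end = lastGood + slidingWindowSize - 1L)
}

# Toy read: 5 poor bases, 20 good bases, 5 poor bases.
phred <- c(rep(5, 5), rep(40, 20), rep(5, 5))
trimmedM2 <- slidingWindowTrimSketch(phred, cutoffQualityScore = 20,
                                     slidingWindowSize = 5)
print(trimmedM2)
```

Because the decision is made on window means, a few low-quality bases can remain at each edge of the kept region; a larger window smooths over short dips in quality, while a higher cutoff shortens the kept region.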
Figure 13 shows the quality report before and after trimming. After clicking the “Apply Trimming Parameters” button, the values of these information boxes will be updated to the latest values.

Figure 13. SangerRead page - read quality report before / after trimming.¶
In Figure 14, the x-axis is the index of the base pairs and the y-axis is the Phred quality score. The green horizontal bar at the top of the plot is the raw read region and the orange horizontal bar represents the trimmed read region. Both the trimming plot in Figure 14 and the chromatogram in Figure 15 are updated once users change the quality trimming parameters and click the “Apply Trimming Parameters” button in Figure 15.

Figure 14. SangerRead page - quality trimming plot.¶
If we only look at the primary and secondary sequences in the table, we will lose some variations. The chromatogram is very helpful for checking the peak resolution. Figure 15 shows the chromatogram plotting panel. Users can change four parameters: Base Number Per Row, Height Per Row, Signal Ratio Cutoff, and Show Trimmed Region. Among them, Signal Ratio Cutoff is the key parameter. With the default value of 0.33, the lower peak must be at least one third as high as the higher peak for it to count as a secondary peak.

Figure 15. SangerRead page - chromatogram panel.¶
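The effect of Signal Ratio Cutoff at a single position can be sketched in a few lines of base R. The function callBasesSketch and the peak heights are hypothetical, and the sketch assumes that the secondary call falls back to the primary base when no second peak passes the ratio, which would be consistent with the primary and secondary sequences always having the same length; the actual base caller may handle ties and noise differently.

```r
# Sketch of primary/secondary base calling at one chromatogram position.
# peakHeights is a named numeric vector with one entry per channel.
callBasesSketch <- function(peakHeights, signalRatioCutoff = 0.33) {
  ranked <- sort(peakHeights, decreasing = TRUE)
  if (ranked[[1]] <= 0) {
    # No usable signal at this position.
    return(c(primary = NA_character_, secondary = NA_character_))
  }
  primary <- names(ranked)[1]
  # The second-highest peak counts only if it is tall enough
  # relative to the highest peak.
  hasSecondary <- ranked[[2]] / ranked[[1]] >= signalRatioCutoff
  secondary <- if (hasSecondary) names(ranked)[2] else primary
  c(primary = primary, secondary = secondary)
}

# Second peak is 40% of the first: reported as a secondary base.
mixedCall <- callBasesSketch(c(A = 1000, C = 400, G = 50, T = 10))
# Second peak is 20% of the first: below the 0.33 cutoff.
cleanCall <- callBasesSketch(c(A = 1000, C = 200, G = 50, T = 10))
print(mixedCall)
print(cleanCall)
```

Lowering the cutoff makes the call more sensitive to small secondary peaks (and to noise), while raising it reports only strong secondary signals.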
Here is an example of applying new chromatogram parameters. We click “Show Trimmed Region” to set its value from FALSE to TRUE. Figure 16 shows the loading notification popup during base calling and chromatogram plotting.

Figure 16. SangerRead page - loading notification popup during replotting chromatogram.¶
After replotting the chromatogram, the trimmed region is shown as a red striped region. Figure 17 shows part of the chromatogram (1 bp ~ 240 bp). Moreover, the chromatogram is replotted whenever the trimmed positions or chromatogram parameters are updated.

Figure 17. SangerRead page - chromatogram with trimmed region shown.¶
To let users browse the trimmed primary/secondary sequences without finding “Trimming Start Point” and “Trimming End Point” by themselves, we provide the final trimmed primary/secondary sequences that will be used for reads alignment in table format with quality scores in Figure 18. Frameshift amino acid sequences are also provided.

Figure 18. SangerRead page - trimmed primary/secondary sequences and Phred quality score in table format.¶
We have updated the trimming and chromatogram parameters for each read. Now, we need to click the “Re-calculate contig” button to redo the alignment. Last but not least, we can save all data into a new ‘SangerAlignment’ S4 instance by clicking the “Save S4 instance” button. The new S4 instance is saved in Rda format. Users can run the readRDS function to load it into the current R environment. Figure 19 shows some hints in the save notification popup.

Figure 19. SangerRead page - saving notification popup.¶
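Loading the saved instance back is a one-line call. The sketch below uses a stand-in list and a hypothetical file name instead of a real S4 instance, and it assumes the app serialises the object with saveRDS, as the readRDS instruction above suggests. In base R, readRDS reads files written by saveRDS regardless of the file extension, whereas files written by save would need load instead.

```r
# Stand-in object; in practice this would be the S4 instance saved by the app.
demoObject <- list(name = "demo", values = 1:3)

# Hypothetical file name with an .Rda extension, written with saveRDS.
savedPath <- file.path(tempdir(), "demo_instance.Rda")
saveRDS(demoObject, file = savedPath)

# readRDS returns the object, so it must be assigned to a variable.
restored <- readRDS(savedPath)
print(identical(restored, demoObject))
```

The restored instance can then be passed to the same functions as the original one.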
Writing SangerAlignment FASTA files (AB1)¶
Users can write the SangerAlignment instance to FASTA files. The selection parameter offers four options:
contigs_unalignment: Writing contigs into a single FASTA file.
contigs_alignment: Writing contigs alignment and contigs consensus read to a single FASTA file.
all_reads: Writing all reads to a single FASTA file.
all: Writing contigs, contigs alignment, and all reads into three different files.
Below is the one-line function that users need to run. This function mainly depends on the writeXStringSet function in the Biostrings R package. Users can set the compression level through the writeFasta function.
writeFasta(sangerAlignment,
           outputDir         = tempdir(),
           compress          = FALSE,
           compression_level = NA,
           selection         = "all")
Users can download the output FASTA file of this example through the following three links:
Generating SangerAlignment report (AB1)¶
Last but not least, users can save the SangerAlignment instance into a report after the analysis. The report is generated in HTML by knitting Rmd files. Two parameters, includeSangerContig and includeSangerRead, let users decide how deep the SangerAlignment report goes. Moreover, after the reports are generated, users can easily navigate between the reports at different levels within the HTML file.
includeSangerContig: Whether users want to generate the report of each SangerContig in SangerAlignment.
includeSangerRead: If includeSangerContig is TRUE, then users can set this value to decide whether they want to include SangerRead reports in each SangerContig.
Note that if there are many reads, writing out all the reports takes quite a long time. Users who only want the contigs alignment report should set includeSangerContig and includeSangerRead to FALSE in order to save time.
generateReport(sangerAlignment,
               outputDir           = tempdir(),
               includeSangerContig = TRUE,
               includeSangerRead   = TRUE)
Users can access the 'Basic Information', 'Contigs Consensus', 'Contigs Alignment' and 'Contigs Tree' sections inside the generated SangerAlignment HTML report of this example. Furthermore, users can also navigate from this report to the HTML reports of all forward and reverse SangerRead instances.