AdjustAlignment {DECIPHER} | R Documentation |
Makes small adjustments by shifting groups of gaps left and right to find their optimal positioning in a multiple sequence alignment.
AdjustAlignment(myXStringSet, perfectMatch = 5, misMatch = 0, gapLetter = -3, gapOpening = -0.1, gapExtension = 0, substitutionMatrix = NULL, shiftPenalty = -0.2, threshold = 0.1, weight = 1, processors = 1)
myXStringSet |
An |
perfectMatch |
Numeric giving the reward for aligning two matching nucleotides in the alignment. Only used for |
misMatch |
Numeric giving the cost for aligning two mismatched nucleotides in the alignment. Only used for |
gapLetter |
Numeric giving the cost for aligning gaps to letters. A lower value (more negative) encourages the overlapping of gaps across different sequences in the alignment. |
gapOpening |
Numeric giving the cost for opening or closing a gap in the alignment. |
gapExtension |
Numeric giving the cost for extending an open gap in the alignment. |
substitutionMatrix |
Either a substitution matrix representing the substitution scores for an alignment or the name of the amino acid substitution matrix to use in alignment. The latter may be one of the following: “BLOSUM45”, “BLOSUM50”, “BLOSUM62”, “BLOSUM80”, “BLOSUM100”, “PAM30”, “PAM40”, “PAM70”, “PAM120”, “PAM250”, or “MIQS”. The default (NULL) will use the |
shiftPenalty |
Numeric giving the cost for every additional position that a group of gaps is shifted. |
threshold |
Numeric specifying the improvement in score required to permanently apply an adjustment to the alignment. |
weight |
A numeric vector of weights for each sequence, or a single number implying equal weights. |
processors |
The number of processors to use, or |
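As an illustration of how the arguments above fit together, the following sketch calls AdjustAlignment with several non-default settings on the two-sequence toy alignment from the examples. The specific values are arbitrary and chosen only for demonstration. The weights are picked to average to 1 as a precaution; whether this is required is an assumption, not something stated on this page. RemoveGaps is another DECIPHER function, used here only to check that the underlying sequences are unchanged.

```r
library(DECIPHER)

# Toy input alignment (same sequences as the trivial example on this page)
aa <- AAStringSet(c("ARN-PK", "ARRP-K"))

# Illustrative, non-default settings (values are arbitrary):
adjusted <- AdjustAlignment(aa,
    substitutionMatrix = "BLOSUM62", # named amino acid matrix
    shiftPenalty = -0.5,             # stronger cost per position shifted
    threshold = 0.5,                 # require a larger score improvement
    weight = c(1.5, 0.5),            # per-sequence weights (mean of 1, by assumption)
    processors = 1)

# Adjustment should only move gaps, so the ungapped sequences should match
unaligned_before <- as.character(RemoveGaps(aa))
unaligned_after <- as.character(RemoveGaps(adjusted))
identical(unaligned_before, unaligned_after)
```

A stricter threshold and shiftPenalty should make the function more conservative about moving gaps, which may be preferable when the input alignment is already trusted.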
The process of multiple sequence alignment often results in the integration of small imperfections into the final alignment. Some of these errors are obvious by eye, which encourages manual refinement of automatically generated alignments. However, the manual refinement process is inherently subjective and time consuming. AdjustAlignment refines an existing alignment in a process similar to that which might be applied manually, but in a repeatable and much faster fashion. This function shifts all of the gaps in an alignment to the left and right to find their optimal positioning. The optimal position is defined as the position that maximizes the alignment “score”, which is determined by the input parameters. The resulting alignment will be similar to the input alignment but with many imperfections eliminated. Note that the affine gap penalties here are different from the more flexible penalties used in AlignProfiles, and have been optimized independently.
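The idea of shifting a gap and keeping the move only when the score improves can be sketched in base R. This is a toy illustration under stated assumptions, not DECIPHER's implementation: it scores columns by sum of pairs using the default perfectMatch, misMatch, and gapLetter values, treats gap-to-gap pairs as scoring 0 (an assumption), ignores gapOpening, gapExtension, substitution matrices, and weights, and moves a single gap character rather than a group of gaps. The function names score_alignment, shift_gap, and best_shift are hypothetical.

```r
# Sum-of-pairs score over all columns (gap-gap pairs score 0 by assumption)
score_alignment <- function(seqs, match = 5, mismatch = 0, gap_letter = -3) {
  m <- do.call(rbind, strsplit(seqs, ""))
  total <- 0
  for (col in seq_len(ncol(m))) {
    for (i in seq_len(nrow(m) - 1)) {
      for (j in (i + 1):nrow(m)) {
        a <- m[i, col]
        b <- m[j, col]
        if (a == "-" && b == "-") next
        pair <- if (a == "-" || b == "-") gap_letter else if (a == b) match else mismatch
        total <- total + pair
      }
    }
  }
  total
}

# Move the gap at position `from` of one sequence to position `to`
shift_gap <- function(s, from, to) {
  chars <- strsplit(s, "")[[1]]
  stopifnot(chars[from] == "-")
  paste(append(chars[-from], "-", after = to - 1), collapse = "")
}

# Try every target position; keep the best move whose gain meets the threshold
best_shift <- function(seqs, row, from, shift_penalty = -0.2, threshold = 0.1) {
  base <- score_alignment(seqs)
  best <- list(seqs = seqs, gain = 0)
  for (to in seq_len(nchar(seqs[row]))) {
    if (to == from) next
    candidate <- seqs
    candidate[row] <- shift_gap(seqs[row], from, to)
    gain <- score_alignment(candidate) + shift_penalty * abs(to - from) - base
    if (gain >= threshold && gain > best$gain) {
      best <- list(seqs = candidate, gain = gain)
    }
  }
  best
}

seqs <- c("ARN-PK", "ARRP-K")
result <- best_shift(seqs, row = 1, from = 4)
result$seqs  # first sequence becomes "ARNP-K"
```

In this toy case the input scores 9 and the shifted alignment scores 20, so moving the gap one position to the right (at a shift cost of 0.2) clears the threshold and is kept, because the two gaps now overlap instead of each pairing with a letter.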
An XStringSet of aligned sequences.
Erik Wright eswright@pitt.edu
Wright, E. S. (2015). DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics, 16, 322. http://doi.org/10.1186/s12859-015-0749-z
Wright, E. S. (2020). RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA, 26, 531-540.
AlignSeqs, AlignTranslation, PFASUM, StaggerAlignment
# a trivial example
aa <- AAStringSet(c("ARN-PK", "ARRP-K"))
aa # input alignment
AdjustAlignment(aa) # output alignment

# specifying an alternative substitution matrix
AdjustAlignment(aa, substitutionMatrix="BLOSUM62")

# a real example
fas <- system.file("extdata", "Streptomyces_ITS_aligned.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
dna # input alignment
adjustedDNA <- AdjustAlignment(dna) # output alignment
BrowseSeqs(adjustedDNA, highlight=1)
adjustedDNA==dna # most sequences were adjusted (those marked FALSE)