odseq_unaligned {odseq} | R Documentation |
Given a similarity matrix (such as one computed with the string kernels in kebabs), computes a score for each sequence and bootstraps the scores to estimate their distribution, which is then used to flag outlier sequences.
odseq_unaligned(distance_matrix, B = 100, threshold = 0.025, type = "similarity")
distance_matrix
A numeric matrix of pairwise similarities or distances among unaligned sequences. The kebabs package may be useful for computing it.
B
Integer giving the number of bootstrap replicates to run. More replicates should make the detection more robust.
threshold
Numeric value giving the probability mass left in the right tail of the bootstrap score distribution when computing outliers. This parameter may need tuning for each specific problem.
type
A string indicating the type of distance metric used. Either |
Returns a logical vector, where TRUE indicates an outlier.
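Because the result is a plain logical vector, it can be used directly to index the sequences. The snippet below is a minimal sketch in base R; `is_outlier` and `seq_names` are hypothetical stand-ins for the function's output and the names of the input sequences, not objects provided by the package.

```r
# Hypothetical stand-in for the logical vector returned by odseq_unaligned().
is_outlier <- c(FALSE, FALSE, TRUE, FALSE)

# Illustrative sequence names, in the same order as the rows of the matrix.
seq_names <- c("seqA", "seqB", "seqC", "seqD")

# Names of the sequences flagged as outliers.
outlier_names <- seq_names[is_outlier]

# Number of flagged sequences (TRUE counts as 1 when summed).
n_outliers <- sum(is_outlier)

# Sequences retained after dropping the outliers.
kept <- seq_names[!is_outlier]
```

The same indexing works on the sequence object itself, provided its order matches the rows of the matrix passed to the function.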
José Jiménez <jose@jimenezluna.com>
[1] OD-seq: outlier detection in multiple sequence alignments. Peter Jehl, Fabian Sievers and Desmond G. Higgins. BMC Bioinformatics. 2015.
library(kebabs)
data(seqs)
sp <- spectrumKernel(k = 3)
mat <- getKernelMatrix(sp, seqs)
odseq_unaligned(mat, B = 1000, threshold = 0.025, type = "similarity")
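To give some intuition for how B and threshold interact, the following base-R sketch scores the rows of a toy similarity matrix and derives a cutoff from a bootstrap distribution. It is an illustration under stated assumptions, not the package's implementation: the scoring rule (negated mean similarity to the other sequences) and the use of bootstrap means are choices made for this example only.

```r
# Illustration only; this is not the odseq implementation.
set.seed(1)

# Toy similarity matrix for 6 sequences: five are mutually similar,
# the sixth is dissimilar to all the others.
n <- 6
sim <- matrix(0.9, n, n)
sim[n, ] <- 0.1
sim[, n] <- 0.1
diag(sim) <- 1

# Assumed scoring rule: mean similarity to the other sequences,
# negated so that larger scores mean "more unusual".
scores <- -(rowSums(sim) - diag(sim)) / (n - 1)

# Bootstrap: resample the scores with replacement B times, keeping each mean.
B <- 1000
boot_means <- replicate(B, mean(sample(scores, replace = TRUE)))

# Cutoff that leaves `threshold` probability in the right tail.
threshold <- 0.025
cutoff <- quantile(boot_means, probs = 1 - threshold, names = FALSE)

# Logical vector: TRUE marks a sequence whose score exceeds the cutoff.
outliers <- scores > cutoff
```

In this toy setting a smaller threshold pushes the cutoff further into the right tail, so fewer sequences are flagged, while a larger B gives a steadier estimate of the cutoff.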