fastqPairedFilter {dada2} | R Documentation |
fastqPairedFilter filters pairs of input fastq files (which can be compressed) based on several
user-definable criteria, and outputs those read pairs that pass the filter in both directions
to two new fastq files (which can also be compressed). Several functions in the ShortRead
package are leveraged to do this filtering. The filtered forward/reverse reads
remain identically ordered.
fastqPairedFilter(fn, fout, maxN = c(0, 0), truncQ = c(2, 2),
  truncLen = c(0, 0), maxLen = c(Inf, Inf), minLen = c(20, 20),
  trimLeft = c(0, 0), minQ = c(0, 0), maxEE = c(Inf, Inf),
  rm.phix = c(TRUE, TRUE), matchIDs = FALSE, primer.fwd = NULL,
  id.sep = "\\s", id.field = NULL, n = 1e+06, OMP = TRUE,
  compress = TRUE, verbose = FALSE, ...)
fn |
(Required). A |
fout |
(Required). A
FILTERING AND TRIMMING ARGUMENTS
If a length 1 vector is provided, the same parameter value is used for the forward and reverse reads. If a length 2 vector is provided, the first value is used for the forward reads, and the second for the reverse reads. |
maxN |
(Optional). Default 0.
After truncation, sequences with more than |
truncQ |
(Optional). Default 2.
Truncate reads at the first instance of a quality score less than or equal to |
truncLen |
(Optional). Default 0 (no truncation).
Truncate reads after |
maxLen |
(Optional). Default Inf (no maximum). Remove reads with length greater than maxLen. maxLen is enforced on the raw reads. |
minLen |
(Optional). Default 20. Remove reads with length less than minLen. minLen is enforced after all other trimming and truncation. |
trimLeft |
(Optional). Default 0.
The number of nucleotides to remove from the start of each read. If both |
minQ |
(Optional). Default 0. After truncation, reads that contain a quality score below minQ will be discarded. |
maxEE |
(Optional). Default |
rm.phix |
(Optional). Default TRUE.
If TRUE, discard reads that match against the phiX genome, as determined by
|
matchIDs |
(Optional). Default FALSE.
Whether to enforce matching between the id-line sequence identifiers of the forward and reverse fastq files.
If TRUE, only paired reads that share id fields (see below) are output.
If FALSE, no read ID checking is done.
Note: |
primer.fwd |
(Optional). Default NULL. A character string defining the forward primer. Only unambiguous nucleotides are allowed. The primer is compared to the first len(primer.fwd) nucleotides at the start of the forward and reverse reads. If there is not an exact match, the read pair is filtered out. If the primer is detected on the reverse read, the fwd/rev reads are swapped.
ID MATCHING ARGUMENTS
The following optional arguments enforce matching between the sequence identification strings in the forward and reverse reads, and can automatically detect and match ID fields in Illumina format, e.g.: EAS139:136:FC706VJ:2:2104:15343:197393. ID matching is not required when using standard Illumina output fastq files. |
id.sep |
(Optional). Default "\s" (white-space).
The separator between fields in the id-line of the input fastq files. Passed to the |
id.field |
(Optional). Default NULL (automatic detection). The field of the id-line containing the sequence identifier. If NULL (the default) and matchIDs is TRUE, the function attempts to automatically detect the sequence identifier field under the assumption of Illumina formatted output. |
n |
(Optional). The number of records (reads) to read in and filter at any one time.
This controls the peak memory requirement so that very large fastq files are supported.
Default is |
OMP |
(Optional). Default TRUE.
Whether or not to use OMP multithreading when calling |
compress |
(Optional). Default TRUE. Whether the output fastq files should be gzip compressed. |
verbose |
(Optional). Default FALSE. Whether to output status messages. |
... |
(Optional). Arguments passed on to |
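The ID matching arguments above (matchIDs, id.sep, id.field) can be illustrated with a small base R sketch. This is not the package's implementation: the id lines are made-up examples in Illumina style, the helper extract_id is hypothetical, and the use of strsplit and a fixed field number of 1 are assumptions made for illustration.

```r
# Hypothetical id lines, as they might appear in forward and reverse fastq files.
ids_fwd <- c("EAS139:136:FC706VJ:2:2104:15343:197393 1:N:0:ATCACG",
             "EAS139:136:FC706VJ:2:2104:15344:197394 1:N:0:ATCACG",
             "EAS139:136:FC706VJ:2:2104:15345:197395 1:N:0:ATCACG")
ids_rev <- c("EAS139:136:FC706VJ:2:2104:15343:197393 2:N:0:ATCACG",
             "EAS139:136:FC706VJ:2:2104:15345:197395 2:N:0:ATCACG")

# Hypothetical helper: split each id line on `sep` (a regular expression,
# like the id.sep default) and keep field number `field`.
extract_id <- function(id_lines, sep = "\\s", field = 1) {
  vapply(strsplit(id_lines, sep), function(parts) parts[field], character(1))
}

key_fwd <- extract_id(ids_fwd)
key_rev <- extract_id(ids_rev)

# Only reads whose identifier occurs in both files would be kept.
shared   <- intersect(key_fwd, key_rev)
keep_fwd <- key_fwd %in% shared
keep_rev <- key_rev %in% shared
```

Here the second forward read has no partner in the reverse file, so it is the only one flagged for removal. With matchIDs = FALSE no such check is made, and the files are assumed to be in matching order already.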
integer(2). The number of reads read in, and the number of reads that passed the filter and were output.
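As a rough sketch of how the two returned counts might be used, the snippet below computes the fraction of read pairs retained. The counts are hypothetical placeholder values, not output from a real run, and the variable names are chosen only for illustration.

```r
# Hypothetical counts standing in for the integer(2) return value:
# reads read in, then reads that passed the filter and were output.
counts <- c(10000L, 8423L)

reads_in  <- counts[1]
reads_out <- counts[2]

# Fraction of read pairs that survived filtering.
frac_kept <- reads_out / reads_in
message(sprintf("Kept %d of %d read pairs (%.1f%%)",
                reads_out, reads_in, 100 * frac_kept))
```

A very low retained fraction is often a sign that a threshold such as truncLen or maxEE is too strict for the data at hand.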
fastqFilter
FastqStreamer
trimTails
testFastqF <- system.file("extdata", "sam1F.fastq.gz", package="dada2")
testFastqR <- system.file("extdata", "sam1R.fastq.gz", package="dada2")
filtFastqF <- tempfile(fileext=".fastq.gz")
filtFastqR <- tempfile(fileext=".fastq.gz")
fastqPairedFilter(c(testFastqF, testFastqR), c(filtFastqF, filtFastqR),
                  maxN=0, maxEE=2)
fastqPairedFilter(c(testFastqF, testFastqR), c(filtFastqF, filtFastqR),
                  trimLeft=c(10, 20), truncLen=c(240, 200), maxEE=2,
                  rm.phix=TRUE, verbose=TRUE)
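The examples set maxEE=2. As a minimal sketch of what such a threshold means, the base R snippet below computes "expected errors" from the nominal Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10). It assumes that maxEE is applied to this sum and that reads above the threshold are discarded. The quality scores and the helper expected_errors are hypothetical.

```r
# Hypothetical Phred quality scores for one read after truncation.
quals <- c(38, 38, 37, 30, 25, 20, 12, 10)

# Expected errors: the sum of the per-base error probabilities 10^(-Q/10).
expected_errors <- function(q) sum(10^(-q / 10))

ee <- expected_errors(quals)

# Under the assumption above, the read passes when its expected errors
# do not exceed the threshold (2, mirroring maxEE = 2).
passes <- ee <= 2
```

Because low-quality bases dominate the sum, truncating the low-quality tail of a read (truncLen, truncQ) can bring it under a given maxEE.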