enrich_motifs {universalmotif} | R Documentation |
Enrich for input motifs in a set of sequences.
Description
Given a set of target and background sequences, test if the input motifs
are significantly enriched in the targets sequences relative to the
background sequences. See the "Sequence manipulation and scanning" vignette.
Usage
enrich_motifs(motifs, sequences, bkg.sequences, max.p = 1e-05,
max.q = 1e-05, max.e = 0.001, qval.method = "fdr", threshold = 1e-04,
threshold.type = "pvalue", verbose = 0, RC = FALSE, use.freq = 1,
shuffle.k = 2, shuffle.method = "euler", return.scan.results = FALSE,
nthreads = 1, rng.seed = sample.int(10000, 1), motif_pvalue.k = 8,
use.gaps = TRUE, allow.nonfinite = FALSE, warn.NA = TRUE,
no.overlaps = FALSE, no.overlaps.by.strand = FALSE,
no.overlaps.strat = "score", respect.strand = FALSE,
motif_pvalue.method = c("dynamic", "exhaustive"),
scan_sequences.qvals.method = c("BH", "fdr", "bonferroni"))
Arguments
motifs |
See convert_motifs() for acceptable motif formats.
|
sequences |
XStringSet Sequences to scan. Alphabet
should match motif.
|
bkg.sequences |
XStringSet Optional. If missing,
shuffle_sequences() is used to create background sequences from
the input sequences.
|
max.p |
numeric(1) P-value threshold.
|
max.q |
numeric(1) Adjusted P-value threshold. This is only useful
if multiple motifs are being enriched for.
|
max.e |
numeric(1) . The E-value is calculated by multiplying the adjusted
P-value with the number of input motifs times two
(McLeay and Bailey 2010).
|
qval.method |
character(1) See stats::p.adjust() .
|
threshold |
numeric(1) See details.
|
threshold.type |
character(1) One of c('pvalue', 'qvalue', 'logodds', 'logodds.abs') . See details.
|
verbose |
numeric(1) 0 for no output, 4 for max verbosity.
|
RC |
logical(1) If TRUE , check reverse complement of the input
sequences. Only available for DNA/RNA.
|
use.freq |
numeric(1) The default, 1, uses the motif matrix (from
the motif['motif'] slot) to search for sequences. If a higher
number is used, then the matching k-let matrix from the
motif['multifreq'] slot is used. See add_multifreq() .
|
shuffle.k |
numeric(1) The k-let size to use when shuffling input
sequences. Only used if no background sequences are input. See
shuffle_sequences() .
|
shuffle.method |
character(1) One of c('euler', 'markov', 'linear') .
See shuffle_sequences() .
|
return.scan.results |
logical(1) Return output from
scan_sequences() . For large jobs, leaving this as
FALSE can save a small amount time by preventing construction of the complete
results data.frame from scan_sequences() .
|
nthreads |
numeric(1) Run scan_sequences() in parallel with nthreads
threads. nthreads = 0 uses all available threads.
Note that no speed up will occur for jobs with only a single motif and
sequence.
|
rng.seed |
numeric(1) Set random number generator seed. Since shuffling
can occur simultaneously in multiple threads using C++, it cannot communicate
with the regular R random number generator state and thus requires an
independent seed. Each individual sequence in an XStringSet object will be
given the following seed: rng.seed * index . See shuffle_sequences() .
|
motif_pvalue.k |
numeric(1) Control motif_pvalue() approximation.
See motif_pvalue() .
|
use.gaps |
logical(1) Set this to FALSE to ignore motif gaps, if
present.
|
allow.nonfinite |
logical(1) If FALSE , then apply a pseudocount if
non-finite values are found in the PWM. Note that if the motif has a
pseudocount greater than zero and the motif is not currently of type PWM,
then this parameter has no effect as the pseudocount will be
applied automatically when the motif is converted to a PWM internally. This
value is set to FALSE by default in order to stay consistent with
pre-version 1.8.0 behaviour. A message will be printed if a pseudocount
is applied. To disable this, set options(pseudocount.warning=FALSE) .
|
warn.NA |
logical(1) Whether to warn about the presence of non-standard
letters in the input sequence, such as those in masked sequences.
|
no.overlaps |
logical(1) Remove overlapping hits from the same motifs.
Overlapping hits from different motifs are preserved. Please note that the
current implementation of this feature can add significantly to the run
time for large inputs.
|
no.overlaps.by.strand |
logical(1) Whether to discard overlapping hits
from the opposite strand, or to only discard overlapping hits on the same
strand.
|
no.overlaps.strat |
character(1) One of c("score", "order") .
The former option keeps the highest scoring overlapping hit (and the first
of these within ties), and the latter simply keeps the first overlapping hit.
|
respect.strand |
logical(1) If motifs are DNA/RNA,
then setting this option to TRUE will make scan_sequences() only
scan the strands of the input sequences as indicated in the motif
strand slot.
|
motif_pvalue.method |
character(1) One of c("dynamic", "exhaustive") .
Algorithm used for calculating P-values. The "exhaustive" method
involves finding all possible motif matches at or above the specified
score using a branch-and-bound algorithm, which can be computationally
intensive (Hartman et al., 2013). Additionally, the computation
must be repeated for each hit. The "dynamic" method calculates the
distribution of possible motif scores using a much faster dynamic
programming algorithm, and can be recycled for multiple
scores (Grant et al., 2011). The only
disadvantage is the inability to use allow.nonfinite = TRUE .
See motif_pvalue() for details.
|
scan_sequences.qvals.method |
character(1) One of
c("fdr", "BH", "bonferroni") . The method for calculating adjusted
P-values for individual motif hits. These are described in depth in the
Sequence Searches vignette.
|
Details
To find enriched motifs, scan_sequences()
is run on both
target and background sequences.
stats::fisher.test()
is run to test for enrichment.
See scan_sequences()
for more info on scanning parameters.
Value
DataFrame
Enrichment results in a DataFrame
. Function args and
(optionally) scan results are stored in the metadata
slot.
Author(s)
Benjamin Jean-Marie Tremblay benjamin.tremblay@uwaterloo.ca
References
McLeay R, Bailey TL (2010). “Motif Enrichment Analysis: A unified
framework and method evaluation.” BMC Bioinformatics, 11.
See Also
scan_sequences()
, shuffle_sequences()
,
add_multifreq()
, motif_pvalue()
Examples
data(ArabidopsisPromoters)
data(ArabidopsisMotif)
if (R.Version()$arch != "i386") {
enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters, threshold = 0.01)
}
[Package
universalmotif version 1.12.0
Index]