anchor_pair_enrich {spatzie} | R Documentation |
Determine whether motifs between paired bed regions have a statistically significant relationship. Options for significance are motif score correlation, motif count correlation, or hypergeometric motif co-occurrence.
anchor_pair_enrich(interaction_data, method = c("count", "score", "match"))
interaction_data |
an interactionData object of paired genomic regions | ||||||
method |
method for co-occurrence, valid options include:
|
an interactionData object where obj$pair_motif_enrich
contains
the p-values for significance of seeing a higher co-occurrence than
what we get by chance.
We assume motif scores follow a normal distribution and are independent between enhancers and promoters. We can therefore compute how correlated scores of any two transcription factor motifs are between enhancer and promoter regions using Pearson's product-moment correlation coefficient:
r = \frac{∑ (x^{\prime}_i - \bar{x}^{\prime})(y^{\prime}_i - \bar{y}^{\prime})}{√{∑(x^{\prime}_i - \bar{x}^{\prime})^2∑(y^{\prime}_i - \bar{y}^{\prime})^2}}
, where the input vectors \boldsymbol{x} and \boldsymbol{y} from above are transformed to vectors \boldsymbol{x^{\prime}} and \boldsymbol{y^{\prime}} by replacing the set of scores with the maximum score for each region:
x^{\prime}_i = \max x_i
x^{\prime}_i is then the maximum motif score of motif a in the promoter region of interaction i, y^{\prime}_i is the maximum motif score of motif b in the enhancer region of interaction i, and \bar{x}^{\prime} and \bar{y}^{\prime} are the sample means.
Significance is then computed by transforming the correlation coefficient r to test statistic t, which is Student t-distributed with n - 2 degrees of freedom.
t = \frac{r√{n-2}}{√{1-r^2}}
All p-values are calculated as one-tailed p-values of the probability that scores are greater than or equal to r.
Instead of calculating the correlation of motif scores directly, the count-based correlation metric first tallies the number of instances of a given motif within an enhancer or a promoter region, which are defined as all positions in those regions with motif score p-values of less than 5 * 10^{-5}. Formally, the input vectors \boldsymbol{x} and \boldsymbol{y} are transformed to vectors \boldsymbol{x^{\prime\prime}} and \boldsymbol{y^{\prime\prime}} by replacing the set of scores with the cardinality of the set:
x^{\prime\prime}_i = |x_i|
And analogous for y^{\prime\prime}_i. Finally, the correlation coefficient r between \boldsymbol{x^{\prime\prime}} and \boldsymbol{y^{\prime\prime}} and its associated significance are calculated as described above.
Instance co-occurrence uses the presence or absence of a motif within an enhancer or promoter to determine a statistically significant association, thus \boldsymbol{x^{\prime\prime\prime}} and \boldsymbol{y^{\prime\prime\prime}} are defined by:
x^{\prime\prime\prime}_i = \boldsymbol{1}_{x^{\prime\prime}_i > 0}
Instance co-occurrence is computed using the hypergeometric test:
p = ∑_{k=I_{ab}}^{P_a} \frac{binom(P_a, k) binom(n - P_a, E_b - k)}{binom(n, E_b)},
where I_{ab} is the number of interactions that contain a match for motif a in the promoter and motif b in the enhancer, P_a is the number of promoters that contain motif a (P_a = ∑^n_i x^{\prime\prime\prime}_i), E_b is the number of enhancers that contain motif b (E_b = ∑^n_i y^{\prime\prime\prime}_i), and n is the total number of interactions, which is equal to the number of promoters and to the number of enhancers.
Jennifer Hammelman
Konstantin Krismer
## Not run: genome_id <- "BSgenome.Mmusculus.UCSC.mm9" if (!(genome_id %in% rownames(utils::installed.packages()))) { BiocManager::install(genome_id, update = FALSE, ask = FALSE) } genome <- BSgenome::getBSgenome(genome_id) motifs_file <- system.file("extdata/motifs_subset.txt.gz", package = "spatzie") motifs <- TFBSTools::readJASPARMatrix(motifs_file, matrixClass = "PFM") yy1_pd_interaction <- scan_motifs(spatzie::interactions_yy1, motifs, genome) yy1_pd_interaction <- filter_motifs(yy1_pd_interaction, 0.4) yy1_pd_count_corr <- anchor_pair_enrich(yy1_pd_interaction, method = "count") ## End(Not run) res <- anchor_pair_enrich(spatzie::scan_interactions_example_filtered, method = "score")