metapod 1.10.1
The metapod package provides utilities for meta-analyses on precomputed \(p\)-values, typicall generated from differential analyses of some sort. This enables users to consolidate multiple related tests into a single metric for easier interpretation and prioritization of interesting events. The functionality here was originally developed as part of the csaw package, to combine inferences from adjacent windows inside a genomic region; as well as from the scran package, to merge statistics from multiple batches or from multiple pairwise comparisons during marker gene detection. Most functions in this package involve combining multiple (independent or not) \(p\)-values into a single combined \(p\)-value. Additional functions are also provided to summarize the direction of any effect.
metapod provides two families of functions to combine a group of \(p\)-values into a single statistic. The first family considers parallel vectors of \(p\)-values where each group of \(p\)-values is defined from corresponding elements across vectors. This is useful for combining different tests on the same set of features, e.g., DE analyses with the same gene annotation. We illustrate this below by using Simes’ method to combine two parallel vectors:
library(metapod)
p1 <- rbeta(10000, 0.5, 1)
p2 <- rbeta(10000, 0.5, 1)
par.combined <- parallelSimes(list(p1, p2))
str(par.combined)
## List of 3
## $ p.value : num [1:10000] 0.2597 0.0568 0.7838 0.023 0.844 ...
## $ representative: int [1:10000] 1 1 1 2 1 2 2 1 2 2 ...
## $ influential :List of 2
## ..$ : logi [1:10000] TRUE TRUE TRUE FALSE TRUE FALSE ...
## ..$ : logi [1:10000] TRUE FALSE TRUE TRUE TRUE TRUE ...
The other approach is to consider groups of \(p\)-values within a single numeric vector. This is useful when the \(p\)-values to be combined are generated as part of a single analysis. The motivating example is that of adjacent windows in the same genomic region for ChIP-seq, but one can also imagine the same situation for other features, e.g., exons in RNA-seq. In this situation, we need a grouping factor to define the groups within our vector:
p <- rbeta(10000, 0.5, 1)
g <- sample(100, length(p), replace=TRUE)
gr.combined <- groupedSimes(p, g)
str(gr.combined)
## List of 3
## $ p.value : Named num [1:100] 0.005838 0.000657 0.00651 0.002942 0.003326 ...
## ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
## $ representative: Named int [1:100] 2866 9589 7014 9593 2004 6392 6788 3595 9109 1681 ...
## ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
## $ influential : logi [1:10000] FALSE FALSE FALSE FALSE FALSE FALSE ...
Both families of functions return the combined \(p\)-value along with some other helpful statistics.
The representative
vector specifies the index of the representative test for each group;
this is a test that is particularly important to the calculation of the combined \(p\)-value and can be used to, e.g., obtain a representative effect size.
Similarly, the influential
vector indicates which of the individual tests in each group are “influential” in computing the combined \(p\)-value.
This is defined as the subset of tests that, if their \(p\)-values were set to unity, would alter the combined \(p\)-value.
The following methods yield a significant result for groups where any of the individual tests are significant:
parallelSimes()
and groupedSimes()
) is robust to dependencies between tests and supports weighting of individual tests.parallelFisher()
and groupedFisher()
) assumes that tests are independent and does not support weighting.parallelPearson()
and groupedPearson()
) assumes that tests are independent and does not support weighting.parallelStouffer()
and groupedStouffer()
) assumes that tests are independent and does support weighting.Fisher’s method is most sensitive to the smallest \(p\)-value while Pearson’s method is most sensitive to the largest \(p\)-value. Stouffer’s method serves as a compromise between these two extremes. Simes’ method is more conservative than all three but is still functional in the presence of dependencies.
The following methods yield a significant result for groups where some of the individual tests are significant (i.e., a minimum number or proportion of individual tests reject the null hypothesis):
parallelWilkinson()
and groupedWilkinson()
) assumes that tests are independent and does not support weighting.parallelHolmMin()
and groupedHolmMin()
) does not require independence and supports weighting.Technically, Wilkinson’s method can reject if any of the individual nulls are significant. However, it has a dramatic increase in detection power when the specified minimum is attained, hence its inclusion in this category.
Finally, Berger’s intersection union test will yield a significant result for groups where all of the individual tests are significant. This does not require independence but does not support weighting.
The summarizeParallelDirection()
and summarizeGroupedDirection()
functions will summarize the direction of effect of all influential tests for each group.
For example, if all influential tests have positive log-fold changes, the group would be reported as having a direction of "up"
.
Alternatively, if influential tests have both positive and negative log-fold changes, the reported direction would be "mixed"
.
By only considering the influential tests, we avoid noise from “less significant” tests that do not contribute to the final \(p\)-value.
logfc1 <- rnorm(10000)
logfc2 <- rnorm(10000)
par.dir <- summarizeParallelDirection(list(logfc1, logfc2),
influential = par.combined$influential)
table(par.dir)
## par.dir
## down mixed up
## 4268 1497 4235
countParallelDirection()
and countGroupedDirection
will just count the number of significant tests changing in each direction.
This is done after correcting for the number of tests in each group with the Benjamini-Hochberg or Holm methods.
These counts are somewhat simpler to interpret than the summarized direction but have their own caveats - see ?countParallelDirection
for details.
par.dir2 <- countParallelDirection(list(p1, p2), list(logfc1, logfc2))
str(par.dir2)
## List of 2
## $ down: int [1:10000] 0 0 0 1 0 2 0 0 0 0 ...
## $ up : int [1:10000] 0 0 0 0 0 0 0 0 2 0 ...
Another approach is to just use the direction of effect of the representative test. This is usually the best approach if a single direction and effect size is required, e.g., for visualization purposes.
All functions support log.p=TRUE
to accept log-transformed \(p\)-values as input and to return log-transformed output.
This is helpful in situations where underflow would cause generation of zero \(p\)-values.
Several functions support weighting of the individual tests within each group. This is helpful if some of the tests are a priori known to be more important than others.
The combineParallelPValues()
and combineGroupedPValues()
wrapper functions will dispatch to the requested method
,
making it easier to switch between methods inside other packages:
gr.combined2 <- combineGroupedPValues(p, g, method="holm-min")
str(gr.combined2)
## List of 3
## $ p.value : Named num [1:100] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
## $ representative: Named int [1:100] 1617 9880 3892 7357 3998 9914 9559 2543 2157 971 ...
## ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
## $ influential : logi [1:10000] FALSE TRUE FALSE TRUE FALSE FALSE ...
sessionInfo()
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] metapod_1.10.1 knitr_1.45 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.33 R6_2.5.1 bookdown_0.37
## [4] fastmap_1.1.1 xfun_0.41 cachem_1.0.8
## [7] htmltools_0.5.7 rmarkdown_2.25 lifecycle_1.0.4
## [10] cli_3.6.2 sass_0.4.8 jquerylib_0.1.4
## [13] compiler_4.3.2 tools_4.3.2 evaluate_0.23
## [16] bslib_0.6.1 Rcpp_1.0.11 yaml_2.3.8
## [19] BiocManager_1.30.22 jsonlite_1.8.8 rlang_1.1.2