Contents

Package: bacon
Authors: Maarten van Iterson [aut, cre], Erik van Zwet [ctb]
Modified: Fri Mar 25 11:54:05 2016
Compiled: Mon Oct 17 20:13:48 2016

1 Introduction

bacon can be used to remove inflation and bias often observed in epigenome- and transcriptome-wide association studies(Iterson et al. 2016).

To this end bacon constructs an empirical null distribution using a Gibbs Sampling algorithm by fitting a three-component normal mixture on z-scores. One component is forced, using prior knowledge, to represent the null distribution with mean and standard deviation representing the bias and inflation. The other two components are necessary to capture the amount of true associations present in the data, which we assume unknown but small.

bacon provides functionality to inspect the output of the Gibbs Sampling algorithm, i.e., traces, posterior distributions and the mixture fit, are provided. Furthermore, inflation- and bias-corrected test-statistics or P-values are extracted easily. In addition, functionality for performing fixed-effect meta-analysis are provided.

The function bacon requires a vector or a matrix of z-scores, e.g., those extracted from association analyses using a linear regression approach. For fixed-effect meta-analysis a matrix of effect-sizes and standard errors is required.

The vignette illustrates the use of bacon using simulated z-scores, effect-sizes and standard errors to avoid long runtimes. If multiple sets of test-statisics or effect-sizes and standard errors are provided, the Gibbs Sampler Algorithm can be executed on multiple nodes to reduce computation time using functionality provide by BiocParallel-package.

2 One set of test-statistics

A vector containing 2000 z-scores is generated from a normal mixture distribution 90% of the z-scores were drawn from $$\mathcal{N}(0,1)$$ and the remaining z-scores from $$\mathcal{N}(\mu,1)$$, where $$\mu \sim \mathcal{N}(4,1)$$.

The function bacon executes the Gibbs Sampler Algorithm and stores all input and output in a Bacon-object. Several accessor-functions are available to access data contained in the Bacon-object.

set.seed(12345)
y <- rnormmix(2000, c(0.9, 0, 1, 0, 4, 1))
bc <- bacon(y)
bc
## Bacon-object containing 1 set(s) of 2000 test-statistics.
## ...estimated bias: -0.0086.
## ...estimated inflation: 0.97.
##
## Emprical null estimates are based on 5000 iterations with a burnin-period of 2000.
estimates(bc)
##            p.0        p.1        p.2         mu.0     mu.1      mu.2
## [1,] 0.9085228 0.04003456 0.05144263 -0.008585455 3.022833 -3.034213
##        sigma.0  sigma.1  sigma.2
## [1,] 0.9729677 2.438307 2.789957
inflation(bc)
##   sigma.0
## 0.9729677
bias(bc)
##         mu.0
## -0.008585455

Several methods are provided to inspect the output of the Gibbs Sampler Algorithm, such as traces of all estimates, the posterior distributions, provide as a scatter plot between two parameters, the actual fit of the three component mixture to the histogram of z-scores.

traces(bc, burnin=FALSE)

posteriors(bc)

fit(bc, n=100)

There is also a generic plot function that can generate two types of plots. A histogram of the z-scores with on top the standard normal distribution and the bacon estimated empirical null distribution. Or a quantile-quantile plot of the $$-log_{10}$$ transformed P-values.

plot(bc, type="hist")
## stat_bin() using bins = 30. Pick better value with binwidth.
## Warning: Removed 40 rows containing non-finite values (stat_bin).
## Warning: Removed 1 rows containing missing values (geom_bar).
## Warning: Removed 40 rows containing missing values (geom_path).

## Warning: Removed 40 rows containing missing values (geom_path).

plot(bc, type="qq")

3 Multiple sets of test-statistics

Matrices containing $$2000\times6$$ effect-sizes and standard errors are generated to simulated data for a fixed-effect meta-analyses.

By default the function bacon detects the number of cores/nodes registered, as described in the BiocParallel, to perform bacon in parallel. To run the vignette in general we set it here for convenience to 1 node.

set.seed(12345)
es <- replicate(6, rnormmix(2000, c(0.9, 0, 1, 0, 4, 1)))
se <- replicate(6, 0.8*sqrt(4/rchisq(2000,df=4)))
colnames(es) <- colnames(se) <- LETTERS[1:ncol(se)]
rownames(es) <- rownames(se) <- 1:2000
head(rownames(es))
## [1] "1" "2" "3" "4" "5" "6"
head(colnames(es))
## [1] "A" "B" "C" "D" "E" "F"
library(BiocParallel)
register(MulticoreParam(1, log=TRUE))
bc <- bacon(NULL, es, se)
## Did you registered a biocparallel back-end?
##  Continuing serial!
bc
## Bacon-object containing 6 set(s) of 2000 test-statistics.
## ...estimated bias: -0.0072,-0.02,0.00073,0.027,-0.017,-0.012.
## ...estimated inflation: 1.1,1.1,1.1,1.1,1.1,1.
##
## Emprical null estimates are based on 5000 iterations with a burnin-period of 2000.
estimates(bc)
##         p.0        p.1        p.2          mu.0     mu.1      mu.2
## A 0.8798367 0.04596230 0.07420096 -0.0071923032 2.798940 -2.721537
## B 0.8748018 0.03514827 0.09004994 -0.0195838309 2.950528 -2.346152
## C 0.8764848 0.05558323 0.06793199  0.0007300818 2.945500 -2.924680
## D 0.8881519 0.08040985 0.03143825  0.0265941513 1.782361 -2.994063
## E 0.8626044 0.06886578 0.06852984 -0.0165292415 2.726810 -2.704635
## F 0.8556689 0.08026559 0.06406549 -0.0119121793 2.732789 -2.770299
##    sigma.0  sigma.1   sigma.2
## A 1.112915 3.314581 4.3679065
## B 1.086432 1.678806 4.6548036
## C 1.071829 3.214286 3.1438970
## D 1.123699 4.807509 0.7085753
## E 1.062347 3.584511 3.4328507
## F 1.037343 3.763897 3.2223009
inflation(bc)
##        A        B        C        D        E        F
## 1.112915 1.086432 1.071829 1.123699 1.062347 1.037343
bias(bc)
##             A             B             C             D             E
## -0.0071923032 -0.0195838309  0.0007300818  0.0265941513 -0.0165292415
##             F
## -0.0119121793
head(tstat(bc))
##            A          B           C           D           E           F
## 1  1.6655396 -1.0920842 -1.27039727 -3.54419344  0.22730396 -0.04754207
## 2 -0.9763604 -1.3866084  0.26764794  0.25283954  0.32951747  0.35145701
## 3 -2.3990357 -3.6003210  0.06117898 -0.88181612 -0.14455838  0.05857065
## 4 -0.6691733 -0.5607974  0.53520168  0.03226887 -0.43766649  1.02497293
## 5  0.2316802  1.2621708 -0.90449418  1.01593285 -0.03631256 -0.08534002
## 6  0.9341088 -2.1999929  0.98707127  1.06925040  0.36631468  1.86003051
head(pval(bc))
##            A            B         C            D         E          F
## 1 0.09580515 0.2747960882 0.2039432 0.0003938162 0.8201874 0.96208121
## 2 0.32888589 0.1655612093 0.7889703 0.8003922129 0.7417646 0.72524552
## 3 0.01643831 0.0003178246 0.9512167 0.3778762551 0.8850596 0.95329409
## 4 0.50338494 0.5749356410 0.5925104 0.9742576369 0.6616281 0.30537596
## 5 0.81678638 0.2068873436 0.3657334 0.3096613667 0.9710331 0.93199108
## 6 0.35024777 0.0278074002 0.3236077 0.2849568533 0.7141303 0.06288121
head(se(bc))
##           A         B         C         D         E         F
## 1 0.9405472 1.3831921 1.2506331 0.8857125 0.9575235 0.5686023
## 2 0.5689726 0.6145793 0.9507878 2.2527801 1.3183756 1.3418700
## 3 2.3020580 0.5050518 0.9296272 1.2309362 0.7592934 0.8475108
## 4 0.6104344 0.7547990 1.2170681 1.4928418 1.2937036 0.5585942
## 5 0.7090617 0.6803576 0.6539196 1.3638215 0.9466390 0.7315781
## 6 0.6987288 0.6218310 0.8613305 1.9490522 1.0951570 0.6209050
head(es(bc))
##            A          B           C           D           E           F
## 1  1.5665186 -1.5105622 -1.58880094 -3.13913641  0.21764888 -0.02703253
## 2 -0.5555224 -0.8521808  0.25447639  0.56959188  0.43442779  0.47160962
## 3 -5.5227195 -1.8183488  0.05687364 -1.08545937 -0.10976221  0.04963926
## 4 -0.4084864 -0.4232893  0.65137691  0.04817231 -0.56621074  0.57254397
## 5  0.1642756  0.8587275 -0.59146649  1.38555106 -0.03437488 -0.06243289
## 6  0.6526887 -1.3680237  0.85019464  2.08402484  0.40117207  1.15490232

The accessor-function return as expected matrices of estimates. For the plotting functions an additional index of the ith study or z-score is required.

traces(bc, burnin=FALSE, index=3)

posteriors(bc, index=3)

fit(bc, index=3, n=100)

plot(bc, type="hist")
## stat_bin() using bins = 30. Pick better value with binwidth.
## Warning: Removed 240 rows containing non-finite values (stat_bin).
## Warning: Removed 6 rows containing missing values (geom_bar).
## Warning: Removed 43 rows containing missing values (geom_path).

## Warning: Removed 43 rows containing missing values (geom_path).

plot(bc, type="qq")

4 Fixed-effect meta-analysis

The following code chunk shows how to perform fixed-effect meta-analysis and the inspection of results.

bcm <- meta(bc)
head(pval(bcm))
##            A            B         C            D         E          F
## 1 0.09580515 0.2747960882 0.2039432 0.0003938162 0.8201874 0.96208121
## 2 0.32888589 0.1655612093 0.7889703 0.8003922129 0.7417646 0.72524552
## 3 0.01643831 0.0003178246 0.9512167 0.3778762551 0.8850596 0.95329409
## 4 0.50338494 0.5749356410 0.5925104 0.9742576369 0.6616281 0.30537596
## 5 0.81678638 0.2068873436 0.3657334 0.3096613667 0.9710331 0.93199108
## 6 0.35024777 0.0278074002 0.3236077 0.2849568533 0.7141303 0.06288121
##         meta
## 1 0.16317526
## 2 0.28334439
## 3 0.00295737
## 4 0.96904081
## 5 0.66073276
## 6 0.36393828
plot(bcm, type="qq")

(topTable(bcm))
##      eff.size.meta std.err.meta pval.adj.meta pval.org.meta tstat.meta
## 348      -5.359910    0.2956297  3.662746e-70  1.831373e-73 -18.130485
## 846       3.443643    0.2706334  8.657539e-34  4.328770e-37  12.724384
## 416      -4.076889    0.3214932  1.505245e-33  7.526223e-37 -12.681106
## 1512     -3.237878    0.2701565  8.497234e-30  4.248617e-33 -11.985191
## 251      -4.034903    0.3604693  8.777951e-26  4.388976e-29 -11.193473
## 298      -3.050613    0.2883957  7.548352e-23  3.774176e-26 -10.577876
## 1691     -3.187339    0.3267872  3.562149e-19  1.781074e-22  -9.753562
## 1890     -3.154267    0.3275223  1.186562e-18  5.932810e-22  -9.630693
## 1471      2.877210    0.3014186  2.707361e-18  1.353680e-21   9.545565
## 1382      3.395700    0.3585879  5.615946e-18  2.807973e-21   9.469645
##       eff.size.A std.err.A        pval.A     tstat.A   eff.size.B
## 348  -11.0819895 0.4273486 2.904884e-148 -25.9319668  0.009827814
## 846    5.0360519 0.3662658  5.112747e-43  13.7497194 -0.279218974
## 416   -1.4893601 0.8176432  6.852660e-02  -1.8215282 -0.701289287
## 1512  -7.2342777 0.4196684  1.375239e-66 -17.2380822  0.328109828
## 251   -0.4097908 0.9190774  6.556898e-01  -0.4458719  1.415715375
## 298    0.2144601 0.8010804  7.889198e-01   0.2677136 -6.838817043
## 1691  -9.5485255 0.5661299  7.960802e-64 -16.8663168 -0.665607810
## 1890   0.5355212 0.6390141  4.020066e-01   0.8380429 -2.612984158
## 1471   0.9195156 1.1131974  4.087967e-01   0.8260131  1.406687416
## 1382   0.9907901 1.0123812  3.277416e-01   0.9786729 -1.187498360
##      std.err.B       pval.B      tstat.B  eff.size.C std.err.C
## 348  0.6754494 9.883912e-01   0.01455004 -0.09318322 1.1269342
## 846  0.6867667 6.843236e-01  -0.40657035  7.82688732 0.9516473
## 416  0.6596779 2.877464e-01  -1.06307833 -1.49413438 1.1755563
## 1512 1.2327725 7.901190e-01   0.26615603 -0.43447332 1.7572107
## 251  1.1667886 2.249985e-01   1.21334349 -8.67406883 0.5214465
## 298  0.4993113 1.065372e-42 -13.69650002 -1.21327952 1.4545906
## 1691 1.0759732 5.361733e-01  -0.61861000  0.34180012 1.0144727
## 1890 1.4875895 7.899926e-02  -1.75652233 -0.64663322 0.8153950
## 1471 0.6672896 3.502567e-02   2.10806149 -0.11834295 0.8780139
## 1382 0.6744892 7.830797e-02  -1.76058916 -0.65633019 1.3486064
##            pval.C      tstat.C  eff.size.D std.err.D    pval.D     tstat.D
## 348  9.341001e-01  -0.08268736  0.56495604 0.8650103 0.5136785  0.65312060
## 846  1.958967e-16   8.22456732  0.00945921 0.9002275 0.9916163  0.01050758
## 416  2.037280e-01  -1.27100194 -1.34595157 1.0146027 0.1846477 -1.32657992
## 1512 8.047134e-01  -0.24725169 -0.43736440 0.5269464 0.4065400 -0.82999787
## 251  3.911763e-62 -16.63462822  0.32427895 0.9080442 0.7210035  0.35711801
## 298  4.042225e-01  -0.83410377  0.52299443 0.9310383 0.5742983  0.56173244
## 1691 7.361742e-01   0.33692394  0.40238817 1.0391597 0.6985900  0.38722457
## 1890 4.277600e-01  -0.79303064 -0.19677363 0.9246571 0.8314774 -0.21280714
## 1471 8.927820e-01  -0.13478483 -0.22184170 0.7867669 0.7779694 -0.28196623
## 1382 6.264901e-01  -0.48667291 -0.34737853 1.2235018 0.7764705 -0.28392154
##      eff.size.E std.err.E       pval.E     tstat.E eff.size.F std.err.F
## 348  -1.6784879 0.9295754 7.097300e-02  -1.8056502  1.1997208 1.4236445
## 846   2.1639119 1.2950915 9.475005e-02   1.6708564  0.1088002 0.9347817
## 416  -9.1035873 0.5226627 6.055326e-68 -17.4177094  0.2064136 1.3791910
## 1512  0.3692394 0.9728864 7.042944e-01   0.3795299 -0.9138793 0.6476757
## 251   0.8105348 1.3559705 5.500051e-01   0.5977525 -1.5748380 1.7070150
## 298  -2.2284019 1.2658880 7.834902e-02  -1.7603468 -1.9556960 0.4877381
## 1691 -0.5049402 0.5972012 3.978256e-01  -0.8455110  1.8143550 1.2138166
## 1890 -1.6419829 1.3027522 2.075268e-01  -1.2603954 -8.6640267 0.5626792
## 1471  7.0621512 0.5213878 8.491713e-42  13.5449107  1.6884523 0.8566220
## 1382 -1.1279886 2.1709814 6.033595e-01  -0.5195754  9.1027282 0.5613806
##            pval.F     tstat.F
## 348  3.993901e-01   0.8427110
## 846  9.073427e-01   0.1163910
## 416  8.810307e-01   0.1496628
## 1512 1.582406e-01  -1.4110135
## 251  3.562322e-01  -0.9225684
## 298  6.078940e-05  -4.0097254
## 1691 1.349791e-01   1.4947521
## 1890 1.693113e-53 -15.3978083
## 1471 4.871714e-02   1.9710589
## 1382 3.957530e-59  16.2148951

5 Session Info

## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] bacon_1.2.0        ellipse_0.3-8      BiocParallel_1.8.0
## [4] ggplot2_2.1.0      BiocStyle_2.2.0
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7      digest_0.6.10    assertthat_0.1   plyr_1.8.4
##  [5] grid_3.3.1       gtable_0.2.0     formatR_1.4      magrittr_1.5
##  [9] scales_0.4.0     evaluate_0.10    stringi_1.1.2    rmarkdown_1.1
## [13] labeling_0.3     tools_3.3.1      stringr_1.1.0    munsell_0.4.3
## [17] parallel_3.3.1   yaml_2.1.13      colorspace_1.2-7 htmltools_0.3.5
## [21] knitr_1.14       tibble_1.2

References

Iterson, Maarten M. van, Erik W. van ZwetP. Eline Slagboom, and Bastiaan T. Heijmans. 2016. “Controlling Bias and Inflation in Epigenome- and Transcriptome-Wide Association Studies Using the Empirical Null Distribution.” BioRxiv. Cold Spring Harbor Labs Journals. doi:10.1101/055772.