An overview of topconfects

Paul Harrison

2019-05-02

TOP results by CONfident efFECT Size. Topconfects is an R package intended for RNA-seq or microarray Differntial Expression analysis and similar, where we are interested in placing confidence bounds on many effect sizes—one per gene—from few samples, and ranking genes by these confident effect sizes.

Topconfects builds on TREAT p-values offered by the limma and edgeR packages, or the “greaterAbs” test p-values offered by DESeq2. It tries a range of fold changes, and uses this to rank genes by effect size while maintaining a given FDR. This also produces confidence bounds on the fold changes, with adjustment for multiple testing.

A principled way to avoid using p-values as a proxy for effect size. The difference between a p-value of 1e-6 and 1e-9 has no practical meaning in terms of significance, however tiny p-values are often used as a proxy for effect size. This is a misuse, as they might simply reflect greater quality of evidence (for example RNA-seq average read count or microarray average spot intensity). It is better to reject a broader set of hypotheses, while maintaining a sensible significance level.
No need to guess the best fold change cutoff. TREAT requires a fold change cutoff to be specified. Topconfects instead asks you specify a False Discovery Rate appropriate to your purpose. You can then read down the resulting ranked list of genes as far as you wish. The “confect” value given in the last row that you use is the fold change cutoff required for TREAT to produce that set of genes at the given FDR.

The method is described in:

Harrison PF, Pattison AD, Powell DR, Beilharz TH. 2018. Topconfects: a package for confident effect sizes in differential expression analysis provides improved usability ranking genes of interest. bioRxiv. doi:10.1101/343145

If you want to find top confident differentially expressed genes

Use limma_confects, edger_confects, or deseq2_confects as an alternative final step in your limma, edgeR, or DESeq2 analysis. The limma method is currently much faster than other methods.

For examples, see the vignette “Confident fold change”.

If you have a collection of effect sizes with standard errors

If you have a collection of effect sizes of some sort, with associated standard errors, and possibly associated degrees of freedom, use normal_confects. Errors are assumed to be normally distributed, or t-distributed if degrees of freedom are given.

This is a re-implementation of limma’s TREAT method, which is then supplied to nest_confects (described next). (Alternatively, if the effect sizes are all positive, there is an option to use a one-sided t-test as the underlying hypothesis test.)

If you can calculate p-values for a collection of interval hypotheses

The core algorithm of topconfects is implemented in the function nest_confects. You may supply any function that can calculate p-values for the null hypothesis that an effect size is no more than a specified amount. Testing is performed for n items, and the function should be able to perform this calculation for a subset of these n items and a given amount.

Visualizing results

Use confects_plot to plot confident effect sizes of top genes. The estimated effect size (eg log fold change) is shown as a dot, and the confidence bound is shown as a line.

Use confects_plot_me to gain a global overview. Similar to an MD or MA plot, the x axis is average expression. The y axis is effect size. Estimated effect sizes are shown in grey and confident effect sizes in black (ie a gene with a non-NA confident effect size is shown with both a grey and black dot).

Use rank_rank_plot to compare two rankings.

For examples, see the vignette “Confident fold change”.