1 Introduction to ALDEx2

This provides an overview of the R package ALDEx version 2 (ALDEx2) for differential (relative) abundance analysis of high throughput sequencing count compositional data1 all high throughput sequencing data are compositional (Gloor et al. 2017; Gloor 2023) because of constraints imposed by the instruments. However, Nixon et al.2 not accounting for scale leads to both false positive and false negative inference and is also a principled method of accounting for asymmetry in the underlying environment Nixon et al. (2023) have found that the scale of the data can be modelled and ALDEx2 now includes scale modelling by default.

The package was developed and used initially for multiple-organism RNA-Seq data generated by high-throughput sequencing platforms (meta-RNA-Seq)3 Macklaim et al. (2013), but testing showed that it performed very well with traditional RNA-Seq datasets4 Quinn, Crowley, and Richardson (2018), 16S rRNA gene variable region sequencing5 Bian et al. (n.d.) and selective growth-type (SELEX) experiments6 McMurrough et al. (2014);Wolfs et al. (2016), and that the underlying principles generalize to single-cell sequencing7 Gloor (2023). The analysis method should be applicable to any type of data that is generated by high-throughput8 Fernandes et al. (2014).

Most recently, Nixon et al.9 Nixon et al. (2023) showed that the use of the CLR10 and all other normalizations which are based on ratios — which is to say all of them made strong assumptions about the scale of the underlying system. ALDEx2 now includes methods to include scale uncertainty and so incorporate both compositional and scale uncertainty together in the analysis. See the scaleSim vignette for more information and a complete walk-through.

2 Installation

There are two ways to download and install the most current of ALDEx2. The most recent version of the package will be found at github.com/ggloor/ALDEx_bioc. The package will run with only the base R packages, the standard tidyverse packages and a minimal Bioconductor install, and is capable of running several functions with the parallel package if installed. It is recommended that the package be run on the most up-to-date R and Bioconductor versions.

  • Install stable version from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ALDEx2")

Here all dependencies should be installed automatically.

  • Install development version from github:
install.packages("devtools")
devtools::install_github("ggloor/ALDEx_bioc")

In this case, you may need to install additional packages manually.

## |------------(25%)----------(50%)----------(75%)----------|

3 What ALDEx2 does differently

ALDEx2 is a different tool than others used for differential abundance testing, but that does not mean it is complicated. Instead, it is incredibly simple once you understand four concepts.

  • First, the data you collect is a single point-estimate of the data. That means, if you sequenced the same samples, or even the same library, you would get a slightly different answer. We know this because there are a number of early technical replication papers that show exactly this11 Marioni et al. (2008).

  • Second, that the data collected are probabilistic; that the count data is a representation of the probability of sampling the gene fragment from the mixture of gene fragments in the library. Many other tools assume the data are counts from a multivariate distribution modelled by a Negative Binomial distribution.

  • Third, that we can pretty accurately estimate this probabilisitic technical variation. For this, we assume the data collected are from a multivariate Poisson distribution constrained to have the total number of counts that the instrument can deliver12 Fernandes et al. (2014);Gregory B Gloor et al. (2016). This can be estimated by sampling from the appropriate distribution, in this case a multinomial Dirichlet distribution.

  • Fourth, that the use of the CLR (or any other normalization) makes strong assumptions about the scale of the sampled environment. Nixon et al.13 Nixon et al. (2023) demonstrated how to incorporate scale uncertainty so that the effect of scale on the analysis can be modelled and accounted for.

More formally, the ALDEx2 package estimates per-feature technical variation within each sample using Monte-Carlo instances drawn from the Dirichlet distribution. Sampling from this distribution returns a posterior probability distribution of the observed data under a repeated sampling model. This distribution is converted to a log-ratio that linearizes the differences between features. This log-ratio can be modified to include uncertainty regarding the scale of the data when the log-ratio is calculated. Thus, the values returned by ALDEx2 are posterior estimates of test statistics calculated on log-ratio transformed distributions. ALDEx2 has been altered to ensure that the elements of the posterior statistical tests agree on their sign with the outcome that some edge cases with low predicted p-values are no longer counted as significant at high false discovery rates. Combining two-sample tests into a posterior estimate is somewhat intricate14 Van Zwet and Oosterhoff (1967), but in practice the outcomes should be very similar to those seen before, but edge cases with expected false discovery rates greater than 0.05 have become more rigorous.

ALDEx2 uses by default the centred log-ratio (clr) transformation which is scale invariant15 Aitchison (1986). While this is a convenient assumption for many cases, this can lead to Type 1 and Type 2 errors16 Nixon et al. (2023) and we now suggest that modest amounts of scale be built into the analysis by default.

The sub-compositional coherence property ensures that the answers obtained are consistent when parts of the dataset are removed (e.g., removal of rRNA reads from RNA-seq studies or rare OTU species from 16S rRNA gene amplicon studies). In the absence of scale, all feature abundance values are expressed relative to the geometric mean abundance of other features in a sample. This is conceptually similar to a quantitative PCR where abundances are expressed relative to a standard: in the case of the clr transformation, the standard is the per-sample geometric mean abundance. See Aitchison (1986) for a complete description. If the question at hand is relative change, then ALDEx2 can do this. If the question at hand is scale dependent, then ALDEx2 can do that too!

4 Quick Start: aldex with 2 groups:

The ALDEx2 package in Bioconductor is suitable for the comparison of many different experimental designs.

The test dataset is a single gene library with 1600 sequence variants17 McMurrough et al. (2014) cloned into an expression vector at approximately equimolar amounts. The wild-type version of the gene conferred resistance to a topoisomerase toxin. Seven independent growths of the gene library were conducted under selective and non-selective conditions and the resulting abundances of each variant was read out by sequencing a pooled, barcoded library on an Illumina MiSeq. The data table is accessed by data(selex). In this data table, there are 1600 features and 14 samples. The analysis takes approximately 2 minutes and memory usage tops out at less than 1Gb of RAM on a mobile i7 class processor when we use 128 Dirichlet Monte-Carlo Instances (DMC). On an M2 Pro chip, this takes approximately 6 seconds. For speed concerns we use only the last 400 features and perform only 16 DMC. The command used for ALDEx2 is presented below:

library(ALDEx2)
#subset only the last 400 features for efficiency
data(selex)
selex.sub <- selex[1200:1600,]
conds <- c(rep("NS", 7), rep("S", 7))
x.aldex <- aldex(selex.sub, conds, mc.samples=16, test="t", effect=TRUE, 
  include.sample.summary=FALSE, denom="all", verbose=FALSE, paired.test=FALSE, gamma=NULL)
# note default is FDR of 0.05
par(mfrow=c(1,3))
aldex.plot(x.aldex, type="MA", test="welch", xlab="Log-ratio abundance",
    ylab="Difference", main='Bland-Altman plot')
aldex.plot(x.aldex, type="MW", test="welch", xlab="Dispersion",
    ylab="Difference", main='Effect plot')
aldex.plot(x.aldex, type="volcano", test="welch", xlab="Difference",
    ylab="-1(log10(q))", main='Volcano plot') 
MA, Effect and Volcano plots of ALDEx2 output. The left panel is an Bland-Altman or MA plot that shows the relationship between (relative) Abundance and Difference. The middle panel is an effect plot that shows the relationship between Difference and Dispersion; the lines are equal difference and dispersion. The right hand plot is a volcano plot; the lines represent a posterior predictive p-value of 0.001 and 1.5 fold difference. In all plots features that are not significant are in grey or black. Features that are statistically significant are in red. The log-ratio abundance axis is the clr value for the feature.

Figure 1: MA, Effect and Volcano plots of ALDEx2 output
The left panel is an Bland-Altman or MA plot that shows the relationship between (relative) Abundance and Difference. The middle panel is an effect plot that shows the relationship between Difference and Dispersion; the lines are equal difference and dispersion. The right hand plot is a volcano plot; the lines represent a posterior predictive p-value of 0.001 and 1.5 fold difference. In all plots features that are not significant are in grey or black. Features that are statistically significant are in red. The log-ratio abundance axis is the clr value for the feature.

Figure 1 shows the result of a two sample t-test and effect size calculation using aldex(). We set the test to ‘t’, and effect to TRUE. The ‘t’ option evaluates the data as a two-factor experiment using both the Welch’s t and the Wilcoxon rank tests. The t-test includes an expected value of the false discovery rate correction. The data can be plotted onto Bland-Altmann18 Altman and Bland (1983) (MA) or effect (MW)19 Gregory B Gloor, Macklaim, and Fernandes (2016) or volcano20 Cui and Churchill (2003) plots for for two-way tests using the aldex.plot() function. See the end of the modular section for examples of the plots.

The simple approach outlined above just calls aldex.clr, aldex.ttest, aldex.effect in turn and then merges the data into one dataframe called x.all. We will show these modules in turn, and then examine additional modules.

There is a lot more information that can be obtained than with these three plots. One of the most powerful aspect of ALDEx2 is that everything is calculated on posterior distributions. The underlying posterior distribution can be viewed if you use the individual modules explained next.

5 Modular ALDEx2 is way more informative

For many users, the aldex function may be sufficient. However, for power users there is a lot of value to the modular approach. There are several built-in tools that allow a much more powerful exploration of the data. In addition, all the intermediate values are exposed for the underlying entre log-ratio transformed Dirichlet Monte-Carlo replicate values. This makes it possible for anyone to add the specific R code for their experimental design — a guide to these values is outlined below.

We will use the same example and generate the same plots but include more information that can be useful. First we load the library and the included selex dataset. Then we set the comparison groups. This must be a vector of conditions in the same order as the samples in the input counts table. The aldex command is calling several other functions in the background, and each of them returns diagnostics. These diagnostics are suppressed for this vignette. The code block below shows the full set of command, but are not run. Each is run in an individual code block

data(selex)
#subset only the last 400 features for efficiency
selex.sub <- selex[1200:1600,]
conds <- c(rep("NS", 7), rep("S", 7))
x <- aldex.clr(selex.sub, conds, mc.samples=16, denom="all", verbose=F, gamma=1e-3)
x.tt <- aldex.ttest(x, hist.plot=F, paired.test=FALSE, verbose=FALSE,)
x.effect <- aldex.effect(x, CI=T, verbose=F, include.sample.summary=F, 
  paired.test=FALSE, glm.conds=NULL, useMC=F)
x.all <- data.frame(x.tt,x.effect)
par(mfrow=c(1,2))
aldex.plot(x.all, type="MA", test="welch")
aldex.plot(x.all, type="MW", test="welch")

5.1 The aldex.clr module

The workflow for the modular approach first generates random instances of the centred log-ratio transformed values. There are three inputs: counts table, a vector of conditions, and the number of Monte-Carlo (DMC) instances; and two main parameters: the level of verbosity (TRUE or FALSE) and the scale parameter. We recommend 128 or more mc.samples for the t-test, 1000 for a rigorous effect size calculation, and at least 16 for ANOVA.21 in fact we recommend that the number of samples in the smallest group multiplied by the number of DMC be equal at least 1000 in order to generate a reasonably stable estimate of the posterior distribution Additionally, while in this example, gamma is set to 1e-3 (i.e., almost nothing) we recommend including gamma in the range of 0.25 to 1 for most datasets. See below, and the the scale_sim vignette for more detail.

This operation is fast.

x <- aldex.clr(selex.sub, conds, mc.samples=16, verbose=F, gamma=1e-3)

The output in x contains a lot of information, and you should see the ALDEx2 R documentation for a complete explanation. The impatient can see the commands to access all the available data with methods(class='aldex.clr') and the individual methods can be accessed via help(getFeatures) substituting getFeatures for the appropriate method name.

5.2 The aldex.ttest module

The next operation performs the Welch’s t and Wilcoxon rank test for the situation when there are only two conditions. There is only one input: the aldex object from aldex.clr. Several parameters modify this test: whether or not to perform a paired test paired-test; whether or not to display the histogram of p-values for the first MC instance hist.plot; whether or not to give verbose progress verbose. The significance tests are posterior p-values and are calculated as one-tailed tests with the test statistic doubled to approximate the two-tailed test22 Van Zwet and Oosterhoff (1967). This gets rid of a small number of false positive p-values that arose because of the sign of the test statistic being unstable in the underlying posterior distribution.

This operation is reasonably fast thanks to Thom Quinn and Michelle Nixon.

x.tt <- aldex.ttest(x, hist.plot=F, paired.test=FALSE, verbose=FALSE)

5.3 The aldex.effect module

We estimate and report effect sizes based on the within and between condition values in the case of two conditions. This step is required for plotting; in our lab we base our conclusions primarily on the output of this function23 Macklaim et al. (2013);McMurrough et al. (2014);Bian et al. (n.d.). There is one input: the aldex object from aldex.clr; and several useful parameters. It is possible to include the 95% confidence interval information for the effect size estimate with the boolean flag CI=TRUE. This can be helpful when deciding whether to include or exclude specific features from consideration. We find that a large effect that is driven by an outlier can be because of false positives. New is the option to calculate effect sizes for paired t-tests or repeated samples with the boolean paired.test=TRUE option. In this case the difference between and difference within values are calculated as the median paired difference and the median absolute deviation of that difference. The standardized effect size is their ratio and is usually slightly larger than the unpaired effect size. The supplied selex dataset is actually a paired dataset and can be used to explore this option. Other parameters are include.sample.summary which gives the expected clr values for each feature; useMC for multicore-testing shows this is usually slower than single core; glm.conds when calculating effects for the glm contrasts; and finally verbose if you want to more closely follow progress.

The function has been heavily optimized for speed and includes a lot of vectorization under the hood.

x.effect <- aldex.effect(x, CI=T, verbose=F, include.sample.summary=F, 
  paired.test=FALSE)

5.4 The aldex.plot module

We merge the data into one object for plotting. We see that the plotted data in Figure 1 and 2 are essentially the same.

Note that plotting for the aldex.glm function is different and is shown later.

# merge into one output for convenience
x.all <- data.frame(x.tt,x.effect)

par(mfrow=c(1,3))
aldex.plot(x.all, type="MA", test="welch", main='MA plot')
aldex.plot(x.all, type="MW", test="welch", main='effect plot')
aldex.plot(x.all, type="volcano", test="welch", main='volcano plot')
Output from aldex.plot function. The left panel is the MA plot, the right is the MW (effect) plot. In both plots red represents features called as differentially abundant with q $<0.05$; grey are abundant, but not differentially abundant; black are rare, but not differentially abundant. This function uses the combined output from the aldex.ttest and aldex.effect functions above

Figure 2: Output from aldex.plot function
The left panel is the MA plot, the right is the MW (effect) plot. In both plots red represents features called as differentially abundant with q \(<0.05\); grey are abundant, but not differentially abundant; black are rare, but not differentially abundant. This function uses the combined output from the aldex.ttest and aldex.effect functions above

The aldex.plot() function gives you a lot of control. Figure 2 shows the three plot types; effect (type=‘MW’), MA or Bland-Altmann (type=‘MA’) and volcano (type=‘volcano). They are nearly identical to those in 1, differences are due to the random Monte-Carlo sampling. You can choose to use statistical tests or only effect sizes (type=’welch’ or ‘effect’). You can control the title, axis labels and scales as well. See ?aldex.plot for all this can do!

ALDEx2 also allows you to plot and explore the underlying posterior distributions for each part. This is with the aldex.plotFeature() function and the output is shown in Figure 3. This is a great way to get intuition around the underlying distributions. We see that the posterior distributions are skewed, that there is considerable uncertainty in the difference and effect sizes, and that the tails are very long.

# merge into one output for convenience
# we start with the ouput from aldex.clr
# we provide the name of the row we wish to explore

aldex.plotFeature(x, 'S:E:G:D')
Output from aldex.plotFeature function. This plots the posterior distribution, the difference between and within distributions (dispersion) and the effect size distributions. The values reported are the medians of those distributions, but exploration shows that for all but the most separated parts, there is essentially always some overlap or uncertainty in the posteriors.

Figure 3: Output from aldex.plotFeature function
This plots the posterior distribution, the difference between and within distributions (dispersion) and the effect size distributions. The values reported are the medians of those distributions, but exploration shows that for all but the most separated parts, there is essentially always some overlap or uncertainty in the posteriors.

5.4.1 The effect confidence interval

As noted above, the ALDEx2 package generates a posterior distribution of the probability of observing the count given the data collected. Here we show the importance of this approach by examining the 95% CI of the effect size. Throughout, we use a standardized effect size, similar to the Cohen’s d metric, although our effect size is more robust and more conservative (being approximately 0.7 Cohen’s d when the data are Normally distributed)24 Fernandes, Gloor unpublished.

Examining the outputs of the effect, MA and the posterior distribution plots shows a very important point: there is a tremendous amount of latent variation in sequencing data . We observe that there are a few features that have an expected q value that is statistically significantly different, which are both relatively rare and which have a relatively small difference. Even when identifying features based on the expected effect size can be misleading. We find that the safest approach is to identify those features that where the 95% CI of the effect size does not cross 0.

Examining the figures we see that the features that are the rarest are least likely to reproduce the effect size with simple random sampling. The behaviour of the 95% CI metric is exactly in line with our intuition: the precision of estimation of rare features is poor—if you want more confidence in rare features you must spend more money to sequence more deeply. Figure 4 shows that using both an effect size (or if you wish a p-value cutoff) along with the effect CI cutoff can remove some features that are not truly separable between groups.

With this approach we are accepting the biological variation in the data as received; i.e. we are not inferring any additional biological variation: i.e., the experimental design is always as given. We are identifying those features where simple random sampling of the library would be expected to give the same result every time. This is the approach that was used in (Macklaim et al. 2013), the results of which were independently validated and found to be very robust (Nelson et al. 2015).

This approach requires that the CI=T parameter be set when calculating effect sizes.

Comparing the mean effect with the 95\% CI.  The left panel is MA plot, the right is the MW (effect) plot. In both plots red points indicate features that have an expected effect value of $>2$; those with a blue circle have the 95\% CI that does not overlap 0; and grey are features that are not of interest. Both the raw effect size and the CI test exclude different features from being different. This plot uses only the output of the aldex.effect function. The grey lines in the effect plot show the effect=2 isopleth.

Figure 4: Comparing the mean effect with the 95% CI
The left panel is MA plot, the right is the MW (effect) plot. In both plots red points indicate features that have an expected effect value of \(>2\); those with a blue circle have the 95% CI that does not overlap 0; and grey are features that are not of interest. Both the raw effect size and the CI test exclude different features from being different. This plot uses only the output of the aldex.effect function. The grey lines in the effect plot show the effect=2 isopleth.

5.5 Complex study designs and the aldex.glm module

The aldex.glm module is included so that the probabilistic compositional approach can be used for complex study designs. This module is substantially slower than the two-comparison tests above, but we think it is worth it if you have complex study designs.

Essentially, the approach is the modular approach above but using a model matrix and covariates supplied to the glm function in R. The values returned are the expected values of the glm function given the inputs. In the example below, we are measuring the predictive value of variables A and B independently. See the documentation for the R formula function, or http://faculty.chicagobooth.edu/richard.hahn/teaching/formulanotation.pdf for more information.

Validation of features that are differential under any of the variables identified by the aldex.glm function should be performed by plotting and for this the aldex.glm.effect function as a post-hoc test.

Note that aldex.clr will accept the denom="all" or a user-defined vector of denominator offsets when a model matrix is supplied. Therefore, when it is intended that the downstream analysis is the aldex.glm function only those two denominator options are available. This will be addressed in a future release, and will likely be deprecated because the aldex.clr and all downstream functions are fully compatible with the scale modelling which is both more efficient and more easily understood.

data(selex)
selex.sub <- selex[1200:1600, ]
covariates <- data.frame("A" = sample(0:1, 14, replace = TRUE),
   "B" = c(rep(0, 7), rep(1, 7)),
   "Z" = sample(c(1,2,3), 14, replace=TRUE))
mm <- model.matrix(~ A + Z + B, covariates)
x.glm <- aldex.clr(selex.sub, mm, mc.samples=4, denom="all", verbose=T)
glm.test <- aldex.glm(x.glm, mm, fdr.method='holm')
glm.eff<- aldex.glm.effect(x.glm)

The aldex.glm.effect function will calculate the effect size for all models in the matrix that are binary. The effect size calculations for each binary predictor are output to a named list. These effect sizes, and outputs from aldex.glm can be plotted and displayed as in Figure 5 using the example code below. The aldex.glm.plot function, which mirrors the adlex.plot function for glm and glm.effect outputs.

# NEW plot the glm.test and glm.eff data for particular contrasts
aldex.glm.plot(glm.test, eff=glm.eff, contrast='B', type='MW', test='fdr')
The glm output can be plotted with the aldex.glm.plot function. The values should be nearly identical to the t-test in the same dataset. Here we show an effect (MW) plot of the with significant expected p-values for variables output from the glm function with model B (actual study design groups) highlighted in red. This toy model shows how to use a generalized linear model within theALDEx2 framework. All values are the expected value of the test statistic.

Figure 5: The glm output can be plotted with the aldex.glm.plot function
The values should be nearly identical to the t-test in the same dataset. Here we show an effect (MW) plot of the with significant expected p-values for variables output from the glm function with model B (actual study design groups) highlighted in red. This toy model shows how to use a generalized linear model within theALDEx2 framework. All values are the expected value of the test statistic.

5.6 The aldex.kw module

Alternatively, the user can perform the Kruskal Wallace tests for one-way ANOVA of two or more conditions. Here there are only two inputs: the aldex object from aldex.clr, and the vector of conditions. Note that this is s.l.o.w! and the aldex.glm() function should be used instead. The user is on their own for plotting as this function should be viewed as unsupported.

x.kw <- aldex.kw(x)

6 Adding scale to ALDEx2

While the actual size or scale of the underlying data is lost during the process of sequencing, we can determine if any of the underlying results are dependent on scale. This theory behind this new functionality is outlined in Nixon et al.25 Nixon et al. (2023) and McGovern et al.26 McGovern, Nixon, and Silverman (2023). Essentially, we can think of the aldex.clr function being able to add both measurement error through Dirichlet Monte-Carlos sampling of the actual measurements and scale uncertainty through random sampling of the geometric mean of those measurements. Indeed, as shown below, we can infer the actual scale difference between groups. Scale uncertainty increase the robustness of the results, and can be used to properly centre the data. We recommend using scale instead of using the denominator adjustment approach given in Wu et al.27 Wu et al. (2021). The example below is only a small sample of the scale functionality, see the scale_sim vignette for full description.

We will demonstrate the ALDEx2 scale function in two ways; with essentially no scale (gamma=1e-3) and by making a full scale model. If necessary scale can be ignored completely by setting gamma=NULL, in which case previous default behavior is replicated exactly. The minimally scaled version is simply the outputs we have been examining until now of the in vitro selection dataset. Here we have a standard of truth, and a theoretical grounding that the difference in scale is real. We have a situation where only a minority of the parts change in both relative and absolute abundance. Prior analyses(McMurrough et al. 2014) suffered from a false positive bias because of this, and were not amenable to analysis with any other tool.

set.seed(11)
data(selex)
selex.sub <- selex[1201:1600,]
conds <- c(rep("NS", 7), rep("S", 7))
# unscaled
x.u <- aldex.clr(selex.sub, conds, mc.samples=16, 
    gamma=1e-3, verbose=FALSE)
x.u.e <- aldex.effect(x.u, CI=T)

# scaled
# this represents and 8-fold difference in scale between groups
mu.mod <- c(rep(1,7), rep(8,7))
scale.model <- aldex.makeScaleMatrix(gamma=0.5, mu=mu.mod, conditions=conds, mc.samples=16, log=FALSE)
x.ss <- aldex.clr(selex.sub, conds, mc.samples=16, 
    gamma=scale.model, verbose=FALSE)
x.ss.e <- aldex.effect(x.ss, CI=T)
x.ss.tt <- aldex.ttest(x.ss)
x.ss.all <- cbind(x.ss.e, x.ss.tt)

First, an explanation of the scale model that was made. The aldex.makeScaleMatrix() function makes a full scale model of the data that depends on the ratios between the scales of the two groups. In the case of the SELEX dataset, we are explicitly modelling an 8-fold difference in scale between the two datasets.

Now let’s look at the unscaled posterior distributions in Figure 6,

aldex.plotFeature(x.u, 'S:D:S:D', pooledOnly=T)
The unscaled posterior distribution plot

Figure 6: The unscaled posterior distribution plot

and compare this to Figure 7. The unscaled posteriors are very different, mainly because the control posterior is very narrow. The scaled posterior distributions also very different but are both more dispersed because of the additional uncertainty introduced by the scale model (note the difference in x-axis scales). In addition, the clr values are different because we are supplying the denominator values directly (log2(1) and log2(8) rather than calculating the geometric means of the samples. We stress here that the actual values of the denominator used is irrelevant, instead it is the ratio between the denominators of the two groups that is key.

aldex.plotFeature(x.ss, 'S:D:S:D', pooledOnly=T)
The scaled posterior distribution plot. Note the wider dispersion on the x axis

Figure 7: The scaled posterior distribution plot
Note the wider dispersion on the x axis

We can see the effect of scale if we plot the aldex.effect outputs from scaled and unscaled data. We see that the difference within groups is larger, the effect size is smaller but the difference between groups is unchanged when scale is added as shown in Figure 8.

par(mfrow=c(1,3))
plot(x.u.e$diff.btw, x.ss.e$diff.btw, main='diff.btw', 
 xlab='unscaled diff', ylab='scaled diff', pch=19, col='red')
abline(0,1)
plot(x.u.e$diff.win, x.ss.e$diff.win, main='diff.win', 
 xlab='unscaled dispersion', ylab='scaled dispersion', pch=19, col='red')
abline(0,1)
plot(x.u.e$effect, x.ss.e$effect, main='effect size', 
 xlab='unscaled effect', ylab='scaled effect', pch=19, col='red')
abline(0,1)
Comparing unscaled and scaled difference, dispersion and effect size.

Figure 8: Comparing unscaled and scaled difference, dispersion and effect size

We can also examine the effect plots, showing those parts where the 95% CI of the effect does not overlap 0 for the unscaled and scaled outputs as an example as in Figure 9. We see that adding scale uncertainty to the posterior model results in an increase in the dispersion of the data, but changes the difference between measurement very little.

In this dataset we have a standard of truth in an external growth assay for several of the parts displayed here. The scaled data is more similar to what we expect and the S:E:G:D part is the only one displayed on this plot that has appreciable growth in a single growth assay(McMurrough et al. 2014). Thus, adding scale uncertainty removes a large number of false positive identifications, without affecting the identification of the one true positive. Indeed, the true positive stands out more with scale than with no scale uncertainty added.

par(mfrow=c(1,3))
sgn.u <- sign(x.u.e$effect.low) == sign(x.u.e$effect.high)
sgn.s <- sign(x.ss.e$effect.low) == sign(x.ss.e$effect.high)

plot(x.u.e$diff.win, x.u.e$diff.btw, pch=19, cex=0.3,
   col="grey",xlab="Dispersion", ylab="Difference", 
  main="unscaled-CI", xlim=c(0.2,5), ylim=c(-2.5,10))
points(x.u.e$diff.win[abs(x.u.e$effect) >=2], x.u.e$diff.btw[abs(x.u.e$effect) >=2], pch=19, cex=0.5, 
   col="red")
points(x.u.e$diff.win[sgn.u], x.u.e$diff.btw[sgn.u], 
   cex=0.7, col="blue")
text(x.u.e['S:E:G:D', 'diff.win'], 
   x.u.e['S:E:G:D', 'diff.btw']-0.6, labels='S:E:G:D')
abline(0,2, lty=2, col="grey")
abline(0,-2, lty=2, col="grey")

plot(x.ss.e$diff.win, x.ss.e$diff.btw, pch=19, cex=0.3, col="grey",xlab="Dispersion", ylab="Difference", main="scaled-CI",
   xlim=c(0.2,5), ylim=c(-2.5,10))
points(x.ss.e$diff.win[abs(x.ss.e$effect) >=2], 
  x.ss.e$diff.btw[abs(x.ss.e$effect) >=2], pch=19, cex=0.5, col="red")
points(x.ss.e$diff.win[sgn.u], x.ss.e$diff.btw[sgn.u], cex=0.7, 
   col="blue")
text(x.ss.e['S:E:G:D', 'diff.win'], 
   x.ss.e['S:E:G:D', 'diff.btw']-0.6, labels='S:E:G:D')
abline(0,2, lty=2, col="grey")
abline(0,-2, lty=2, col="grey")

aldex.plot(x.ss.all, test='welch', main="scaled-BH", xlim=c(0.2,6))
abline(0,2, lty=2, col="grey")
abline(0,-2, lty=2, col="grey")
text(x.ss.all['S:E:G:D', 'diff.win'], 
   x.ss.all['S:E:G:D', 'diff.btw']-0.6, labels='S:E:G:D')
Compare the unscaled and scaled plots. Here we see that adding scale uncertainty increases the dispersion with little effect on the difference. As a result, we increase our confidence in the reproducibility of the S:E:G:D feature, which has been shown to have in vitro enzymatic activity. In the third plot, which shows features with a BH valued less than 0.05, the S:E:G:D feature is the clear outlier.

Figure 9: Compare the unscaled and scaled plots
Here we see that adding scale uncertainty increases the dispersion with little effect on the difference. As a result, we increase our confidence in the reproducibility of the S:E:G:D feature, which has been shown to have in vitro enzymatic activity. In the third plot, which shows features with a BH valued less than 0.05, the S:E:G:D feature is the clear outlier.

7 ALDEx2 outputs

7.1 Expected values

ALDEx2 returns expected values for summary statistics. It is important to note that ALDEx uses Bayesian sampling from a Dirichlet distribution to estimate the underlying technical variation. This is controlled by the number of mc.samples, in practice we find that setting this to 16 or 128 is sufficient for most cases as ALDEx2 is estimating the expected value of the distributions28 Fernandes et al. (2013);Fernandes et al. (2014);Gregory B Gloor et al. (2016).

In practical terms, ALDEx2 takes the biological observations as given, but infers technical variation (sequencing the same sample again) multiple times using the aldex.clr function. Thus, the expected values returned are those that would likely have been observed . The user is cautioned that the number of features called as differential will vary somewhat between runs because of this sampling procedure. However, only features with values close to the chosen significance cutoff will vary between runs. The addition of scale allows the model to include potential biological variation that is lost during the sequencing library and data acquisition process.

Several papers have suggested that ALDEx2 is unable to properly control for the false discovery rate since the p-values returned do not follow a random uniform distribution, but rather tend to cluster near a value of 0.529 Hawinkel et al. (2018);Thorsen et al. (2016). This is expected behaviour.

These studies also show that point estimate approaches are very sensitive to particular experimental designs and differences in sparsity and read depth. However, ALDEx2 is not sensitive to these characteristics of the data, but seem to under-report the true FDR. The criticisms misses the mark on ALDEx2 because ALDEx2 reports the p-value across the Dirichlet Monte-Carlo replicates. Features that are differential simply because of the vagaries of random sampling will indeed have a random uniform p-value as point estimates, but will have an expected p-value after repeated random sampling of 0.5. In contrast, features that are differential because of true biological variation are robust to repeated random sampling. Thus, ALDEx2 identifies as differential only those features where simple random sampling (the minimal NULL hypothesis) cannot explain the difference.

In our experience, we observe that ALDEx2 returns a set of features that is very similar to the set returned as the intersect of multiple independent tools—a common recommendation when examining HTS datasets30 Soneson and Delorenzi (2013)

7.2 Explaining the outputs

7.2.1 aldex.effect(x)

(rowname) rab.all rab.win.NS rab.win.S diff.btw diff.win effect overlap
A:D:A:D 1.42494 1.30886 2.45384 1.12261 1.72910 0.471043 0.267260701
A:D:A:E 1.71230 1.49767 4.23315 2.73090 2.38134 1.034873 0.135857781
A:E:A:D 3.97484 1.41163 11.02154 9.64287 2.85008 3.429068 0.000156632

7.2.2 aldex.ttest(x)

(rowname) we.ep we.eBH wi.ep wi.eBH
A:D:A:D 4.030e-01 0.63080 0.239383 0.43732
A:D:A:E 1.154e-01 0.34744 0.040901 0.15725
A:E:A:D 8.987e-05 0.00329 0.000582 0.00820

In the list below, the aldex.ttest function returns the values highlighted with \(\ast\) and the aldex.effect function returns the values highlighted with \(\diamondsuit\). Note that when scale is applied that the clr is only used with the default scale model, and the log-ratio for the full scale model is calculated with a user-defined denominator instead of the geometric mean of each sample. In practice, the ratio between the denominators is more important than the actual way the denominator is calculated.

  1. \(\diamondsuit\) rab.all - median clr value for all samples in the feature
  2. \(\diamondsuit\) rab.win.NS - median clr value for the NS group of samples
  3. \(\diamondsuit\) rab.win.S - median clr value for the S group of samples
  4. \(\diamondsuit\) rab.X1_BNS.q50 - median expression value of features in sample X1_BNS if include.item.summary=TRUE
  5. \(\diamondsuit\) dif.btw - median difference in clr values between S and NS groups
  6. \(\diamondsuit\) dif.win - median of the largest difference in clr values within S and NS groups
  7. \(\diamondsuit\) effect - median effect size: diff.btw / max(diff.win) for all instances
  8. \(\diamondsuit\) overlap - proportion of effect size distribution that overlaps 0 (i.e. no effect)
  9. \(\ast\) we.ep - Expected p-value of Welch’s t-test, a posterior predictive p-value
  10. \(\ast\) we.eBH - Expected Benjamini-Hochberg corrected p-value of Welch’s t test
  11. \(\ast\) wi.ep - Expected p-value of Wilcoxon rank test
  12. \(\ast\) wi.eBH - Expected Benjamini-Hochberg corrected p-value of Wilcoxon test

7.2.3 aldex.glm(x)

The output from the aldex.glm function is somewhat different, although it still returns the expected values of the test statistics.

For example, using the selex dataset and the model matrix from the aldex.glm help information:

conds <- data.frame("A" = sample(0:1, 14, replace = TRUE), 
      "B" = c(rep(0, 7), rep(1, 7)))
mm <- model.matrix(~ A + B, conds) 
x <- aldex.clr(selex, mm, mc.samples=4)
glm.test <- aldex.glm(x)
glm.eff <- aldex.glm.effect(x)

In this example the column names from head(glm.test):

  1. (rowname) - empty header, but rownames displayed
  2. Intercept::Est - the expected intercept term
  3. Intercept::SE - the expected intercept standard error
  4. Intercept::t.val - the expected intersept t statistic
  5. Intercept::pval - the expected intersept p-value
  6. A:Est - the expected A term
  7. A:SE - the expected A standard error
  8. A:t.val - the expected A term t statistic
  9. A:pval - the expected A term p-value term
  10. B:Est - the expected B term
  11. B:SE - the expected B standard error
  12. B:t.val - the expected B term t statistic
  13. B:pval - the expected B term p-value term
  14. Intercept::pval.holm - the expected intersept adjusted p-value
  15. A:pval.holm - the expected A term adjusted p-value
  16. B:pval.holm - the expected B term adjusted p-value

Note that we have generated both a random model A'' and the correct modelB’’. We anticipate that model A will give no statistically significant outputs, and that B will give similar results to a t-test, and indeed this is the case.

The outputs are expected values of the intercept and its coefficients, and the estimates for each model and their coefficients. Also calculated is the Family Wide Error Rate (FWER) for each p-value (by default reported in the model.x Pr(>|t|).holm output. This is a Holm-Bonferroni FWER correction that is more powerful than the Bonferroni correction, and is a step-down correction that is more stringent than a false discovery rate correction such as Benjamini-Hochberg. Note that for a generalized linear model, Tukey’s HSD is not applicable to control the FWER. This is because Tukey’s HSD depends on an ANOVA in the background in R and packages that calculate Tukey’s HSD essentially all do an aov, thus the FWER is calculated on a different test statistic than is calculated by the glm.

The expected value of the Holm-Bonferroni test can be used as the post-hoc standin for Tukey’s HSD. Note, that this is true only when there are a relatively small number of contrasts (up to 10) in the model31 Kim (2015). For convenience the user can also select the Benjamini-Hochberg correction.

7.3 A word about effect size and overlap

The effect size metric used by ALDEx2 is a standardized distributional effect size metric developed specifically for this package. The measure is somewhat robust, allowing up to 20% of the samples to be outliers before the value is affected, returns an effect size that is 71% the size of Cohen’s d on a Normal distribution, and requires at worst twice the number of samples as fully parametric methods (which are not robust) would to estimate values with the same precision. The metric is equally valid for Normal, random uniform and Cauchy distributions^(Fernandes et al. (2019) submitted).

We prefer to use the effect size whenever possible rather than statistical significance since an effect size tells the scientist what they want to know—“what is reproducibly different between groups”; this is not something that p-values deliver. We find that using the effect size returns a consistent set of true positive features irregardless of sample size, unlike p-value based methods. Furthermore, over half of the the false positive features that are observed at low sample sizes have and effect size \(> 0.5 \times \mathrm{E}\) the chosen effect size cutoff \(\mathrm{E}\). This is true regardless of the source of the dataset (Fernandes et al. (2019) submitted).

We suggest that an effect size cutoff of 1 or greater be used when analyzing HTS datasets. If preferred the user can also set a fold-change cutoff as is commonly done with p-value based methods, and volcano plots are now included as an option to help guide the user.

7.3.1 Alternative plotting of outputs

The built-in aldex.plot function described above will usually be sufficient, but for more user control the example in Figure 10 shows a plot that shows which features are found by the Welchs’ or Wilcoxon test individually (blue) or by both (red).

# identify which values are significant in both the t-test and glm tests
found.by.all <- which(x.all$we.eBH < 0.05 & x.all$wi.eBH < 0.05)

# identify which values are significant in fewer than all tests
found.by.one <- which(x.all$we.eBH < 0.05 | x.all$wi.eBH < 0.05)

# plot the within and between variation of the data
plot(x.all$diff.win, x.all$diff.btw, pch=19, cex=0.3, col=rgb(0,0,0,0.3),
 xlab="Dispersion", ylab="Difference")
points(x.all$diff.win[found.by.one], x.all$diff.btw[found.by.one], pch=19,
 cex=0.7, col=rgb(0,0,1,0.5))
points(x.all$diff.win[found.by.all], x.all$diff.btw[found.by.all], pch=19,
 cex=0.7, col=rgb(1,0,0,1))
abline(0,1,lty=2)
abline(0,-1,lty=2)
Differential abundance in the selex dataset using the Welch's t-test or Wilcoxon rank test. Features identified by both tests shown in red. Features identified by only one test are shown in blue dots.  Non-significant features represent rare features if black and abundant features if grey dots.

Figure 10: Differential abundance in the selex dataset using the Welch’s t-test or Wilcoxon rank test
Features identified by both tests shown in red. Features identified by only one test are shown in blue dots. Non-significant features represent rare features if black and abundant features if grey dots.

8 Troubleshooting datasets

In some cases we observe that the data can be asymmetric. This occurs when the data are extremely asymmetric, such as when one group is largely composed of features that are absent in the other group or when the direction of change in the experiment is not symmetric. In this case the geometric mean will not accurately represent the appropriate basis of comparison for each group. An asymmetry can arise for many reasons: in RNA-seq it could arise because samples in one group contain a plasmid and the samples in the other group do not; in metagenomics or 16S rRNA gene sequencing it can arise when the samples in the two groups are taken from different environments; in a selex experiment it can arise because the two groups are under different selective constraints. The asymmetry can manifest either as a difference in sparsity (i.e., one group contains more 0 value features than the other) or as a systematic difference in abundance. When this occurs the geometric mean of each sample and group can be markedly different, and thus an inherent skew in the dataset can occur that leads to false positive and false negative feature calls. Asymmetry generally shows as a the centre of mass of the histogram for the x.all$diff.btw or x.all$effect being not centred around zero32 Wu et al. (2021). We recommend that all datasets be examined for asymmetry.

8.1 Using scale to correct for asymmetric datasets

The preferred approach is to build a full scale model of the asymmetry and to use that as an input to aldex.clr. Fundamentally, asymmetry will often arise because of a difference in size of the underlying environment. For example, one cell type may grow faster than another, or may have a different total number of mRNA molecules than another, or one environment may have a different microbial carrying capacity than another. This is the situation that is best taken care of by a scale model33 Nixon et al. (2023);McGovern, Nixon, and Silverman (2023).

The example dataset below is the simulated synth2 dataset from from Wu et al(Wu et al. 2021). In this example there are 1000 features with 20 (2%) of those being asymmetrically sparse; i.e., 20 features are 0 in one group and non-zero in the other. A further 50 have an approximate 2-fold change (FC is normally distributed). The asymmetry has the effect of changing the geometric mean of the two groups. In addition, the data have an unrealistically low dispersion.

The three plots show the behaviour of unmodified and scale corrected ALDEx2.

The base output shows that some features are differentially abundant because of the inappropriate centre of the data. The approximate centre is the median difference of the points and is noted.

Adding in uncertainty about the scale of the data has a minimal effect on the median difference, but does remove all false positives. However, now there is an asymmetry in the false negatives, in that those in the ‘S’ category are more likely to be called negatives than those in the ‘N’ category.

Finally, adding in uncertainty about scale, and information on the fraction of the features that are asymmetric results in the desired output.

We recommend the scale approach whenever possible, especially if there is some underlying biology that could drive the asymmetry. See the scale_sim vignette for more information on these new capabilities.

The default ALDEx2 scale correction includes only uncertainty and not asymmetry.

set.seed(11)
data(synth2)
# basic ALDEx2
blocks <- c(rep("N", 10),rep("S", 10))
x <- aldex.clr(synth2, blocks, gamma=NULL)
x.e <- aldex.effect(x)

# Add in asymmetry and scale variance
# gamma can be smaller for the same output dispersion
# in the full scale model built here
# make a base scale for group 1 and 2
mu.vec = c(log2(rep(1,10)), log2(rep(1.02,10)))
scale_samples <- aldex.makeScaleMatrix(gamma=0.25, mu=mu.vec, 
  conditions=blocks) 
  
xs <- aldex.clr(synth2, blocks, gamma=scale_samples)
xs.e <- aldex.effect(xs)

# Add in only scale variance with sd=1
xg <- aldex.clr(synth2, blocks, gamma=0.5)
xg.e <- aldex.effect(xg)


par(mfrow=c(1,3))
aldex.plot(x.e, test='effect', main='base', xlim=c(0.1,3))
text(1, 4, labels=paste('median diff =',round(median(x.e$diff.btw),3)))
aldex.plot(xg.e, test='effect', main='ratio=1, sd=0.5', xlim=c(0.1,3))
text(1.6, 4, labels=paste('median diff =',round(median(xg.e$diff.btw),3)))
aldex.plot(xs.e, test='effect', main='ratio=1.02:1, sd=0.5', xlim=c(0.1,3))
text(1.6, 4, labels=paste('median diff =',round(median(xs.e$diff.btw),3)))
Differential abundance in a synthetic dataset using different scale parameters for the clr calculation. In this data 2\% of the features are modelled to be sparse in one group but not the other and a further 5\% of the features are differentially abundant. Features determined to be different between two groups are shown in red. Features that are non-significant are in black. Lines of constant effect are drawn at 0, and  $\pm$ 1.

Figure 11: Differential abundance in a synthetic dataset using different scale parameters for the clr calculation
In this data 2% of the features are modelled to be sparse in one group but not the other and a further 5% of the features are differentially abundant. Features determined to be different between two groups are shown in red. Features that are non-significant are in black. Lines of constant effect are drawn at 0, and \(\pm\) 1.

8.2 Subsetting features to correct for asymmetric datasets

The other approach that can be taken by ALDEx2 is to identify those features that are relatively invariant in all features in the entire dataset even though many features could be asymmetric between the groups. Fundamentally, the log-ratio approach requires that the denominator across all samples be comparable. The output of aldex.clr contains the offset of the features used for the denominator in the @denom slot.

These methods should be considered deprecated and may be removed in future releases since their outcomes can be completely reproduced using scale models which are much more easily explained and which provide insights into the underlying environment from which the data were collected.

8.2.1 Parameter values for denom

The aldex.clr function incorporates multiple approaches to select the denominator that can best deal with asymmetric datasets:

IMPORTANT: no rows should contain all 0 values as they will be removed by the aldex.clr function

  1. all: The default is to calculate the geometric mean of all features using the centred log-ratio of Aitchison34 Aitchison:1986. This is the default method for the compositional data analysis approach.

  2. iqlr: The iqlr method identifies those features that exhibit reproducible variance in the entire dataset. This is called the inter-quartile log-ratio or iqlr approach. For this, a uniform prior of 0.5 is applied to the dataset, the clr transformation is applied, and the variance of each feature is calculated. Those features that have variance values that fall between the first and third quartiles of variance for all features in all groups in the dataset are retained. When aldex.clr is called, the geometric mean abundance of only the retained features is calculated and used as the denominator for log-ratio calculations. Modelling shows that this approach is effective in dealing with datasets with up to 25% of the features being asymmetric. The approach has the advantage it has little or no effect on symmetric datasets and so is a safe approach if the user is unsure if the data is mildly asymmetric.

  3. lvha: This method identifies those features that in the bottom quartile for variance in each group and the top quartile for relative abundance for each sample and across the entire dataset. This method is appropriate when the groups are very asymmetric, but there are some features that are expected to be relatively constant. The basic idea here is to identify those features that are relatively constant across all samples, similar to the features that would be chosen as internal standards in qPCR. Experience suggests that meta-genomic and meta-transcriptomic datasets can benefit from this method of choosing the denominator. This method does not work with the selex dataset, since no features fit the criteria.

  4. zero: This approach identifies and uses only those features that are non-zero in each group. In this approach the per-group non-zero features are used when aldex.clr calculates the geometric mean in the clr transformation. This method is appropriate when the groups are very asymmetric, but the experimentalist must ask whether the comparison is valid in these extreme cases.

  5. user: The last new approach is to let the user define the set of `invariant’ features. In the case of meta-rna-seq, it could be argued that the levels of housekeeping genes should be standard for all samples. In this case the user could define the row indices that correspond to the particular set of housekeeping genes to use as the standard.

  6. iterate: This method identifies those features that are not statistically significantly different between groups using the statistical test of choice. It may be combined with other methods.

8.3 Very large datasets

On occasion ALDEx2 may overflow the memory available. This is seen when very large datasets containing thousands of samples and tens of thousands of features are used.

The workaround is simple. Run the 'aldex function with MC.samples=2 multiple times, and then take the average of the values that you want to use for inference. We recommend taking the mean of at least 16 such replicates. Testing shows that this recapitulates the expected result, and is not (too far off) what ALDEx2 is doing under the hood.

9 Contributors

I am grateful that ALDEx2 has taken on a life of its own.

  • Andrew Fernandes wrote the original ALDEx code, and designed ALDEx2. He developed the effect size metric and the concept of reporting posterior outcomes in all cases.
  • Jean Macklaim found and squished many bugs, performed unit testing, did much of the original validation. Jean also made seminal contributions to simplifying and explaining the output of ALDEx2.
  • Michelle Pistner Nixon and Justin Silverman implemented the scale idea and code and the scale_sim vignette. More scale corrections will be forthcoming.
  • Michelle Pistner Nixon and Justin Silverman implemented the rigorous posterior p-value idea and code
  • Matt Links incorporated several ALDEx2 functions into a multicore environment.
  • Adrianne Albert wrote the correlation and the one-way ANOVA modules in aldex.kw.
  • Ruth Grace Wong added function definitions and made the parallel code functional with BioConducter.
  • Jia Rong Wu developed and implemented the alternate denominator method to correct for asymmetric datasets.
  • Andrew Fernandes, Jean Macklaim and Ruth Grace Wong contributed to the Sum-FunctionsAitchison.R code.
  • Thom Quinn rewrote the t-test and Wilcoxon functions to make them substantially faster. He also wrote much of the current aldex.glm function. Tom’s propr35 Quinn et al. (2017) R package is able to integrate with the output from aldex.clr.
  • Vladimir Mikryukov and everyone named above have contributed bug fixes
  • Greg Gloor currently maintains ALDEx2 and had and has roles in documentation, design, testing and implementation.

10 Version information

Version 1.04 of ALDEx was the version used for the analysis in Macklaim et al.36 Macklaim et al. (2013);Fernandes et al. (2013). This version was suitable only for two-sample two-group comparisons, and provided only effect size estimates of difference between groups. ALDEx v1.0.4 is available at:

. No further changes are expected for that version since it can be replicated completely within ALDEx2 by using only the aldex.clr and aldex.effect commands.

Versions 2.0 to 2.05 were development versions that enabled p-value calculations. Version 2.06 of ALDEx2 was the version used for the analysis in37 Fernandes et al. (2014). This version enabled large sample comparisons by calculating effect size from a random sample of the data rather than from an exhaustive comparison.

Version 2.07 of ALDEx2 was the initial the modular version that exposed the intermediate calculations so that investigators could write functions to analyze different experimental designs. As an example, this version contains an example one-way ANOVA module. This is identical to the version submitted to Bioconductor as 0.99.1.

Future releases of ALDEx2 now use the Bioconductor versioning numbering.

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ALDEx2_1.36.0         latticeExtra_0.6-30   lattice_0.22-6       
## [4] zCompositions_1.5.0-3 truncnorm_1.0-9       NADA_1.6-1.1         
## [7] survival_3.6-4        MASS_7.3-60.2         BiocStyle_2.32.0     
## 
## loaded via a namespace (and not attached):
##  [1] SummarizedExperiment_1.34.0 xfun_0.43                  
##  [3] bslib_0.7.0                 Biobase_2.64.0             
##  [5] quadprog_1.5-8              tools_4.4.0                
##  [7] stats4_4.4.0                parallel_4.4.0             
##  [9] highr_0.10                  Matrix_1.7-0               
## [11] RColorBrewer_1.1-3          S4Vectors_0.42.0           
## [13] RcppParallel_5.1.7          lifecycle_1.0.4            
## [15] GenomeInfoDbData_1.2.12     compiler_4.4.0             
## [17] deldir_2.0-4                tinytex_0.50               
## [19] codetools_0.2-20            GenomeInfoDb_1.40.0        
## [21] htmltools_0.5.8.1           sass_0.4.9                 
## [23] yaml_2.3.8                  crayon_1.5.2               
## [25] jquerylib_0.1.4             BiocParallel_1.38.0        
## [27] cachem_1.0.8                DelayedArray_0.30.0        
## [29] magick_2.8.3                abind_1.4-5                
## [31] digest_0.6.35               bookdown_0.39              
## [33] splines_4.4.0               fastmap_1.1.1              
## [35] grid_4.4.0                  cli_3.6.2                  
## [37] SparseArray_1.4.0           magrittr_2.0.3             
## [39] S4Arrays_1.4.0              Rfast_2.1.0                
## [41] UCSC.utils_1.0.0            RcppZiggurat_0.1.6         
## [43] rmarkdown_2.26              XVector_0.44.0             
## [45] httr_1.4.7                  matrixStats_1.3.0          
## [47] jpeg_0.1-10                 multtest_2.60.0            
## [49] interp_1.1-6                png_0.1-8                  
## [51] evaluate_0.23               knitr_1.46                 
## [53] GenomicRanges_1.56.0        IRanges_2.38.0             
## [55] rlang_1.1.3                 Rcpp_1.0.12                
## [57] BiocManager_1.30.22         directlabels_2024.1.21     
## [59] BiocGenerics_0.50.0         jsonlite_1.8.8             
## [61] R6_2.5.1                    MatrixGenerics_1.16.0      
## [63] zlibbioc_1.50.0

Aitchison, J. 1986. The Statistical Analysis of Compositional Data. London, England: Chapman & Hall.

Altman, D. G., and J. M. Bland. 1983. “Measurement in Medicine: The Analysis of Method Comparison Studies.” Journal of the Royal Statistical Society. Series D (the Statistician) 32 (3): pp. 307–17. http://www.jstor.org/stable/2987937.

Bian, Gaorui, Gregory B Gloor, Aihua Gong, Changsheng Jia, Wei Zhang, Jun Hu, Hong Zhang, et al. n.d. “The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young.” mSphere 2 (5): e00327–17. https://doi.org/10.1128/mSphere.00327-17.

Cui, Xiangqin, and Gary A Churchill. 2003. “Statistical Tests for Differential Expression in cDNA Microarray Experiments.” Genome Biol 4 (4): 210.1–210.10.

Fernandes, Andrew D, Jean M Macklaim, Thomas G Linn, Gregor Reid, and Gregory B Gloor. 2013. “ANOVA-Like Differential Expression (Aldex) Analysis for Mixed Population Rna-Seq.” PLoS One 8 (7): e67019. https://doi.org/10.1371/journal.pone.0067019.

Fernandes, Andrew D, Jennifer Ns Reid, Jean M Macklaim, Thomas A McMurrough, David R Edgell, and Gregory B Gloor. 2014. “Unifying the Analysis of High-Throughput Sequencing Datasets: Characterizing RNA-Seq, 16S RRNA Gene Sequencing and Selective Growth Experiments by Compositional Data Analysis.” Microbiome 2: 15.1–15.13. https://doi.org/10.1186/2049-2618-2-15.

Fernandes, Andrew D., Michael T. H. Q. Vu, Lisa-Monique Edward, Jean M. Macklaim, and Gregory B. Gloor. 2019. “A Reproducible Effect Size Is More Useful Than an Irreproducible Hypothesis Test to Analyze High Throughput Sequencing Datasets.” ArVix Preprint: 1809.02623.

Gloor, Gregory B. 2023. “AmIcompositional: Simple Tests for Compositional Behaviour of High Throughput Data with Common Transformations.” Austrian Journal of Statistics 52 (4): 180–97.

Gloor, Gregory B, Jean M. Macklaim, and Andrew D. Fernandes. 2016. “Displaying Variation in Large Datasets: Plotting a Visual Summary of Effect Sizes.” Journal of Computational and Graphical Statistics 25 (3C): 971–79. https://doi.org/10.1080/10618600.2015.1131161.

Gloor, Gregory B., Jean M. Macklaim, Vera Pawlowsky-Glahn, and Juan J. Egozcue. 2017. “Microbiome Datasets Are Compositional: And This Is Not Optional.” Frontiers in Microbiology 8: 2224. https://doi.org/10.3389/fmicb.2017.02224.

Gloor, Gregory B, Jean M Macklaim, Michael Vu, and Andrew D Fernandes. 2016. “Compositional Uncertainty Should Not Be Ignored in High-Throughput Sequencing Data Analysis.” Austrian Journal of Statistics 45: 73–87. https://doi.org/doi:10.17713/ajs.v45i4.122.

Hawinkel, Stijn, Federico Mattiello, Luc Bijnens, and Olivier Thas. 2018. “A Broken Promise : Microbiome Differential Abundance Methods Do Not Control the False Discovery Rate.” BRIEFINGS IN BIOINFORMATICS. http://dx.doi.org/10.1093/bib/bbx104.

Kim, Hae-Young. 2015. “Statistical Notes for Clinical Researchers: Post-Hoc Multiple Comparisons.” Restorative Dentistry & Endodontics 40 (2): 172–76.

Macklaim, M Jean, D Andrew Fernandes, M Julia Di Bella, Jo-Anne Hammond, Gregor Reid, and Gregory B Gloor. 2013. “Comparative Meta-RNA-Seq of the Vaginal Microbiota and Differential Expression by Lactobacillus Iners in Health and Dysbiosis.” Microbiome 1: 15. https://doi.org/doi: 10.1186/2049-2618-1-12.

Marioni, John C, Christopher E Mason, Shrikant M Mane, Matthew Stephens, and Yoav Gilad. 2008. “RNA-seq: An Assessment of Technical Reproducibility and Comparison with Gene Expression Arrays.” Genome Res 18 (9): 1509–17. https://doi.org/10.1101/gr.079558.108.

McGovern, Kyle C., Michelle Pistner Nixon, and Justin D. Silverman. 2023. “Addressing Erroneous Scale Assumptions in Microbe and Gene Set Enrichment Analysis.” bioRxiv. https://doi.org/10.1101/2023.03.10.532120.

McMurrough, Thomas A, Russell J Dickson, Stephanie M F Thibert, Gregory B Gloor, and David R Edgell. 2014. “Control of Catalytic Efficiency by a Coevolving Network of Catalytic and Noncatalytic Residues.” Proc Natl Acad Sci U S A 111 (23): E2376–83. https://doi.org/10.1073/pnas.1322352111.

Nelson, Tiffanie M, Joanna-Lynn C Borgogna, Rebecca M Brotman, Jacques Ravel, Seth T Walk, and Carl J Yeoman. 2015. “Vaginal Biogenic Amines: Biomarkers of Bacterial Vaginosis or Precursors to Vaginal Dysbiosis?” Frontiers in Physiology 6.

Nixon, Michelle Pistner, Jeffrey Letourneau, Lawrence A. David, Nicole A. Lazar, Sayan Mukherjee, and Justin D. Silverman. 2023. “Scale Reliant Inference.” http://arxiv.org/abs/2201.03616.

Quinn, Thomas P, Tamsyn M Crowley, and Mark F Richardson. 2018. “Benchmarking Differential Expression Analysis Tools for Rna-Seq: Normalization-Based Vs. Log-Ratio Transformation-Based Methods.” BMC Bioinformatics 19 (1): 274. https://doi.org/10.1186/s12859-018-2261-8.

Quinn, Thomas, Mark F Richardson, David Lovell, and Tamsyn Crowley. 2017. “Propr: An R-Package for Identifying Proportionally Abundant Features Using Compositional Data Analysis.” bioRxiv. https://doi.org/10.1101/104935.

Soneson, Charlotte, and Mauro Delorenzi. 2013. “A Comparison of Methods for Differential Expression Analysis of RNA-seq Data.” BMC Bioinformatics 14: 91. https://doi.org/10.1186/1471-2105-14-91.

Thorsen, Jonathan, Asker Brejnrod, Martin Mortensen, Morten A Rasmussen, Jakob Stokholm, Waleed Abu Al-Soud, Søren Sørensen, Hans Bisgaard, and Johannes Waage. 2016. “Large-Scale Benchmarking Reveals False Discoveries and Count Transformation Sensitivity in 16S RRNA Gene Amplicon Data Analysis Methods Used in Microbiome Studies.” Microbiome 4 (1): 62. https://doi.org/10.1186/s40168-016-0208-8.

Van Zwet, WR, and J Oosterhoff. 1967. “On the Combination of Independent Test Statistics.” The Annals of Mathematical Statistics 38 (3): 659–80.

Wolfs, Jason M, Thomas A Hamilton, Jeremy T Lant, Marcon Laforet, Jenny Zhang, Louisa M Salemi, Gregory B Gloor, Caroline Schild-Poulter, and David R Edgell. 2016. “Biasing Genome-Editing Events Toward Precise Length Deletions with an Rna-Guided Tevcas9 Dual Nuclease.” Proc Natl Acad Sci U S A, December. https://doi.org/10.1073/pnas.1616343114.

Wu, Jia R., Jean M. Macklaim, Briana L. Genge, and Gregory B. Gloor. 2021. “Finding the Centre: Compositional Asymmetry in High-Throughput Sequencing Datasets.” In Advances in Compositional Data Analysis: Festschrift in Honour of Vera Pawlowsky-Glahn, edited by Peter Filzmoser, Karel Hron, Josep Antoni Martìn-Fernàndez, and Javier Palarea-Albaladejo, 329–46. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-71175-7_17.