This document clarifies the differences between MAGeCK [1] and gCrisprTools that may be relevant to target detection in CRISPR screens.
The following analyses were performed using a two-condition contrast comparing gRNA abundances estimated from a set of early-timepoint reference samples and a paired set of untreated late-timepoint expansion samples. This comparison was selected because it contains meaningful signals under minimal selective pressure and library distortion, and should consequently be well suited for analysis by the MAGeCK algorithm. In this experiment, triplicate Cas9-expressing cell cultures were infected with a lentiviral library containing 16,902 distinct cassettes expressing gRNAs that target 2128 transcribed loci (‘targets’). After sequencing, reads were aligned to the library annotation using the Genentech internal pipeline to produce a count matrix of gRNA cassette observations for each sample. Low-abundance gRNAs were discarded from this matrix, and then sample medians were equalized and library sizes scaled to include roughly equivalent total read counts. These count data were then processed either by gCrisprTools in a manner mimicking MAGeCK’s default behavior (i.e., using only gRNA-level P-values for ranking and \(\alpha\) disqualification), or by MAGeCK 0.5.3 using the following command:
mageck test -k ControlSamp_Count_data_norm.txt -t 4,5,3 -c 1,2,0 -n normalized --norm-method none --variance-from-all-samples
Overall, MAGeCK and gCrisprTools perform similarly when selecting top candidates and disqualifying poor ones, but MAGeCK tends to assign somewhat better scores to targets for which the associated gRNA counts are extremely low in some samples (and possibly poorly estimated), or where weak signals are present in all associated gRNAs. This is due to differences between the gRNA-level P-values estimated by MAGeCK, which uses a standard Negative Binomial test, and those estimated by more complex modeling frameworks such as Voom/Limma or DESeq2.
MAGeCK and gCrisprTools identify targets whose abundance changes across experimental conditions in similar ways. First, each algorithm performs some form of normalization and then estimates the significance of the observed difference in each gRNA’s abundance to generate one-tailed P-values. Both methods then aggregate these gRNA signals to the target level using Robust Rank Alpha Aggregation (RRA\(\alpha\)), a modification of the RRA algorithm [2].
Differences in gRNA P-value estimation form the core difference between MAGeCK and gCrisprTools. gCrisprTools estimates gRNA-level P-values from the t-statistics derived from the relevant coefficient estimate of a linear model fit in the Voom/Limma framework [3], whereas MAGeCK assigns significance based on the position of the treatment condition mean relative to the estimated Cumulative Distribution Function of a gRNA-specific Negative Binomial distribution. The P-values assigned by the two methods follow a roughly monotonic relationship but differ meaningfully in their exact values and rankings.
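To make the MAGeCK side of this contrast concrete, the following Python sketch computes one-sided Negative Binomial tail probabilities for a treatment-condition mean, given a control-derived mean and variance. It is a minimal illustration under stated assumptions, not MAGeCK’s implementation: the function names are hypothetical, the variance is simply passed in (MAGeCK estimates it from a mean-variance model fitted across gRNAs), and the Voom/Limma moderated t-test used by gCrisprTools is not reproduced.

```python
import math


def nb_params(mean: float, var: float) -> tuple[float, float]:
    """Convert a mean/variance pair into negative binomial (r, p) parameters.

    Uses mean = r(1 - p)/p and var = r(1 - p)/p**2, which requires
    var > mean > 0 (overdispersion relative to the Poisson).
    """
    if not 0 < mean < var:
        raise ValueError("negative binomial requires 0 < mean < var")
    p = mean / var
    r = mean * mean / (var - mean)
    return r, p


def nb_logpmf(k: int, r: float, p: float) -> float:
    """Log probability of observing exactly k reads under NB(r, p)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_lower_tail(x: float, mean: float, var: float) -> float:
    """P(X <= x): a one-sided depletion P-value for a gRNA (hypothetical helper)."""
    r, p = nb_params(mean, var)
    upper = math.floor(x)
    if upper < 0:
        return 0.0
    total = sum(math.exp(nb_logpmf(k, r, p)) for k in range(upper + 1))
    return min(1.0, total)


def nb_upper_tail(x: float, mean: float, var: float) -> float:
    """P(X >= x): a one-sided enrichment P-value (hypothetical helper).

    Computed as a complement, so very small values lose precision.
    """
    k = math.ceil(x)
    if k <= 0:
        return 1.0
    return max(0.0, 1.0 - nb_lower_tail(k - 1, mean, var))


# Hypothetical gRNA: control mean 200 reads, variance 800; treatment mean 80.
print(nb_lower_tail(80, 200.0, 800.0))  # small depletion P-value
print(nb_upper_tail(80, 200.0, 800.0))  # close to 1
```

Note that the P-value depends only on where the treatment mean falls relative to the control-derived distribution; nothing in this calculation accounts for how precisely that treatment mean was itself estimated.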
In the RRA algorithm, the normalized gRNA signal ranks (i.e., gRNA rank/N, where N is the number of gRNAs in the screen) associated with each target are summarized by a statistic, \(\rho\), based on their distribution on the unit interval. Specifically, the normalized ranks of the \(n\) gRNAs associated with a target are sorted, and the \(k\)-th smallest is scored by comparing it to a \(\mathrm{Beta}(k, n-k+1)\) distribution, which is the distribution of the \(k\)-th smallest of \(n\) independent uniform draws. The smallest of these per-gRNA scores is reported as the target-level \(\rho\) score, which is assigned a significance level via permutation of the gRNA labels.
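The \(\rho\) computation described above can be sketched in Python. The sketch relies on the identity that the \(\mathrm{Beta}(k, n-k+1)\) CDF at \(x\) equals the probability that a Binomial(\(n\), \(x\)) variable is at least \(k\). The permutation step draws random sets of ranks from the N gRNAs in the screen, which is our assumed reading of "permutation of the gRNA labels"; the function names and permutation count are illustrative.

```python
import math
import random


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(U_(k) <= x) for the k-th smallest of n Uniform(0, 1) draws.

    This is the Beta(k, n - k + 1) CDF at x, evaluated as the binomial
    tail P(Binomial(n, x) >= k).
    """
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j)
               for j in range(k, n + 1))


def rra_rho(norm_ranks: list[float]) -> float:
    """RRA rho for one target: the smallest order-statistic score.

    Small values mean the target's gRNAs are concentrated near rank 0.
    """
    ranks = sorted(norm_ranks)
    n = len(ranks)
    return min(order_stat_cdf(x, k, n) for k, x in enumerate(ranks, start=1))


def rra_permutation_pvalue(norm_ranks: list[float], n_total: int,
                           n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation significance of rho (hypothetical helper).

    Each permutation draws as many distinct ranks from 1..n_total as the
    target has gRNAs, as if the gRNA labels had been shuffled.
    """
    rng = random.Random(seed)
    observed = rra_rho(norm_ranks)
    n = len(norm_ranks)
    hits = 0
    for _ in range(n_perm):
        drawn = rng.sample(range(1, n_total + 1), n)
        if rra_rho([r / n_total for r in drawn]) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# A target whose two gRNAs sit at normalized ranks 0.1 and 0.2.
print(rra_rho([0.1, 0.2]))  # ≈ 0.04, i.e. min(0.19, 0.04)
```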
Intuitively, the \(\rho\) score quantifies the extent to which the ranks of the gRNAs associated with a target are skewed toward the low end of the overall rank distribution. However, gRNA ranks can also deviate from a uniform distribution in ways that are not of interest in a screening experiment, and the modified RRA\(\alpha\) algorithm introduces a significance cutoff parameter (\(\alpha\)) to mitigate this concern. RRA\(\alpha\) proceeds identically to RRA but, for the purposes of target-level scoring, assigns \(\rho\) statistics only to the gRNAs whose signals are “significant” by an external criterion.
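A minimal sketch of the RRA\(\alpha\) modification follows, assuming the external criterion has already been applied and is supplied as a boolean mask. Order statistics are still computed across all of a target’s gRNAs, but only passing gRNAs contribute scores. The `fallback` argument is a hypothetical switch for the case in which no gRNA passes: `"one"` returns 1, as gCrisprTools does by construction, while `"min_rank"` is our assumed approximation of MAGeCK scoring such a target from its lowest normalized rank.

```python
import math


def order_stat_cdf(x: float, k: int, n: int) -> float:
    """Beta(k, n - k + 1) CDF at x, via P(Binomial(n, x) >= k)."""
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j)
               for j in range(k, n + 1))


def rra_alpha_rho(norm_ranks: list[float], passes: list[bool],
                  fallback: str = "one") -> float:
    """RRA-alpha rho for one target (illustrative sketch).

    norm_ranks: normalized ranks (rank / N) of the target's gRNAs.
    passes:     True where a gRNA meets the external alpha criterion.
    fallback:   behaviour when no gRNA passes; "one" returns 1.0,
                "min_rank" scores the best-ranked gRNA as the first
                order statistic (an assumed approximation).
    """
    n = len(norm_ranks)
    # Order statistics are taken over all n gRNAs of the target ...
    ordered = sorted(zip(norm_ranks, passes))
    # ... but only passing gRNAs are assigned a score.
    scores = [order_stat_cdf(x, k, n)
              for k, (x, ok) in enumerate(ordered, start=1) if ok]
    if scores:
        return min(scores)
    if fallback == "one":
        return 1.0
    if fallback == "min_rank":
        return order_stat_cdf(min(norm_ranks), 1, n)
    raise ValueError(f"unknown fallback: {fallback!r}")


print(rra_alpha_rho([0.1, 0.2], [True, False]))   # ≈ 0.19: only the first gRNA is scored
print(rra_alpha_rho([0.1, 0.2], [False, False]))  # 1.0
```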
gCrisprTools and MAGeCK differ as follows:
Normalization may be handled differently by the two algorithms. By default, MAGeCK median-scales the libraries unless many gRNAs are absent from one or more samples, so that a median cannot be meaningfully estimated; in that case, it normalizes the libraries by equalizing total read counts. gCrisprTools requires the user to normalize the data explicitly, and likewise recommends median scaling.
By default, gCrisprTools removes gRNAs with low abundance in the control samples from consideration prior to analysis while MAGeCK does not.
gRNA P-values are calculated differently, as described below. Briefly, gCrisprTools assigns significance on the basis of the t-statistics returned by Voom/Limma’s linear modelling framework, while MAGeCK assigns significance by comparing the observed gRNA counts to the Cumulative Distribution Function (CDF) of the Negative Binomial distribution estimated for that gRNA.
The default nominal \(\alpha\) cutoff differs between the two approaches. This is a trivial difference, because the specific choice of \(\alpha\) does not strongly affect the RRA\(\alpha\) results within a reasonable range, and individual P-value estimates are not directly comparable across methods in any case.
In the RRA\(\alpha\) step, MAGeCK uses the nominal gRNA P-values both to rank the gRNAs and to check against \(\alpha\) to disqualify invariant gRNAs. In practice we find that ranking gRNAs by their P-value point estimates can be suboptimal: the exact P-value assigned to a gRNA can be heavily influenced by modelling assumptions, so the gRNA rankings may be unstable when the sample distributions are not well estimated or when differences in abundance are large. By default, gCrisprTools ranks gRNAs on the basis of their fold change and disqualifies gRNAs on the basis of their one-sided P-values.
When none of the signals associated with a target surpass the \(\alpha\) cutoff, MAGeCK assigns the target a \(\rho\) score on the basis of the lowest normalized rank observed among the associated gRNAs. gCrisprTools assigns such targets a P-value of 1 by construction.
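The normalization and filtering steps listed above can be sketched as follows. The median-of-ratios size factors are one common form of median scaling (the DESeq style) and are shown as an assumption about what "median scaling" means here, not as a copy of either tool’s code; the total-count factors correspond to the fallback of equalizing total read counts. All function names and the `min_mean` threshold are hypothetical, not gCrisprTools defaults.

```python
import math
import statistics


def median_ratio_size_factors(counts: dict[str, list[float]]) -> dict[str, float]:
    """DESeq-style median-of-ratios size factors.

    counts maps sample name -> gRNA counts (same gRNA order in every sample).
    Only gRNAs observed (count > 0) in every sample enter the calculation.
    """
    samples = list(counts)
    n_grna = len(counts[samples[0]])
    usable = [g for g in range(n_grna) if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gRNA observed in every sample; try total-count scaling")
    # Per-gRNA log geometric mean across samples.
    log_geo = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    return {s: statistics.median([math.exp(math.log(counts[s][g]) - log_geo[g])
                                  for g in usable])
            for s in samples}


def total_count_size_factors(counts: dict[str, list[float]]) -> dict[str, float]:
    """Factors that equalize total read counts across samples."""
    totals = {s: sum(v) for s, v in counts.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {s: t / mean_total for s, t in totals.items()}


def apply_size_factors(counts: dict[str, list[float]],
                       factors: dict[str, float]) -> dict[str, list[float]]:
    """Divide each sample's counts by its size factor."""
    return {s: [c / factors[s] for c in v] for s, v in counts.items()}


def keep_abundant(counts: dict[str, list[float]], control_samples: list[str],
                  min_mean: float) -> list[int]:
    """Indices of gRNAs whose mean count in the control samples is >= min_mean."""
    n_grna = len(next(iter(counts.values())))
    return [g for g in range(n_grna)
            if sum(counts[s][g] for s in control_samples) / len(control_samples)
            >= min_mean]


counts = {"ctrl": [10, 20, 30, 0], "trt": [20, 40, 60, 5]}
normalized = apply_size_factors(counts, median_ratio_size_factors(counts))
print(keep_abundant(counts, ["ctrl"], min_mean=5))  # [0, 1, 2]
```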
At the target level, gCrisprTools (via Voom/Limma and RRA\(\alpha\)) largely returns P-values similar to those estimated by MAGeCK, although the point estimates for individual targets vary somewhat. The P-values that MAGeCK assigns to targets with weak signals are also systematically smaller than those returned by gCrisprTools.
This trend is clearer when we plot the ranked P-values against each other: the ranking of individual targets is similar across methods, but P-values are somewhat smaller in the MAGeCK implementation than in gCrisprTools, especially in the middle of the distribution. This effect is driven by differences in the underlying gRNA P-value estimates provided by gCrisprTools and MAGeCK (see below). Notably, MAGeCK identifies a number of targets with weak to moderate signals that gCrisprTools eliminates from consideration completely; this is due to differences in the treatment of targets with no associated gRNAs passing the \(\alpha\) threshold (see above).
Nevertheless, the high-priority target rankings are similar. There is a set of targets that both methods identify as equivalently optimal hits (dot in the lower left corner corresponding to lowest-ranked targets), and high-priority targets identified in one method are always among the best candidates of the other.
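The observation that target rankings agree while point estimates differ can be quantified with two simple summaries: a Spearman rank correlation between the two methods’ target-level P-values, and the median log10 ratio of the paired P-values. This Python sketch is illustrative only; the function names are hypothetical, it assumes every P-value is strictly positive, and the example values are invented rather than taken from the analysis above.

```python
import math
import statistics


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def median_log10_ratio(p_a: list[float], p_b: list[float]) -> float:
    """Median of log10(p_a / p_b); negative when p_a tends to be smaller."""
    return statistics.median([math.log10(a / b) for a, b in zip(p_a, p_b)])


# Invented target-level P-values for two methods, in the same target order.
p_method_b = [0.001, 0.01, 0.1, 0.5]
p_method_a = [0.0001, 0.002, 0.05, 0.3]
print(spearman(p_method_a, p_method_b))            # identical ordering
print(median_log10_ratio(p_method_a, p_method_b))  # negative: method A is smaller
```

A correlation near 1 combined with a clearly negative median log ratio corresponds to the pattern described above: similar ordering, systematically smaller P-values in one method.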
The differences in target prioritization between MAGeCK and gCrisprTools are driven by differences in the gRNA P-value estimation step. Generally speaking, the standard Negative Binomial model employed by MAGeCK produces relatively smaller P-values for gRNAs with very large or very small effects.
This is a consequence of the manner in which each algorithm calculates one-sided statistics; gCrisprTools uses gRNA-level P-values calculated from the voom/limma linear modeling framework (from the t-statistics derived from the relevant coefficient estimate of the linear model), whereas MAGeCK assigns significance to a gRNA by comparing its mean abundance in the test condition to the gRNA-specific Negative Binomial distribution estimated from the control samples. This introduces systematic differences between gRNA P-values estimated by MAGeCK and those computed by linear modeling frameworks.
Notably, these effects are larger than the differences among implementations of the various linear modelling frameworks, and are insensitive to the variance shrinkage techniques employed by voom/limma and DESeq2.
This difference has two consequences that directly impact signal aggregation by RRA\(\alpha\). The first is that some gRNA signals are assigned more significance by MAGeCK than by Voom/limma; these gRNAs usually have small probability weights in the test condition, indicating unusually low abundance.
Intuitively, Voom/Limma and DESeq2 downweight gRNAs that deplete to abundances at which the mean is unlikely to be accurately estimated, whereas MAGeCK assigns a P-value strictly on the basis of the difference in sample means relative to the gRNA null distribution. In practice this has a small effect; in the example data, the strongest of these effects was observed in a gene ranked eighth overall by MAGeCK and fifteenth by gCrisprTools using MAGeCK’s aggregation framework. At the same time, this illustrates the potential problem with ranking gRNAs by their P-value point estimates: when many gRNAs are strongly depleted, the significance of each signal depends on multiple parameters that are unlikely to be well estimated experimentally. Consequently, the relative ranks of those signals may be less meaningful than a direct measure of gRNA behavior, such as fold change, whose caveats are more obvious.
MAGeCK’s P-value estimates also indirectly influence RRA\(\alpha\) aggregation by assigning ambiguous ranks to many gRNA signals. In the analyzed contrast, about 10% of the gRNAs were assigned P-values that were identical to at least one other gRNA, which has multiple modest effects on the internal \(\rho\) statistics computed for each target. Specifically, this redundancy degrades the informativeness of the gRNA rankings throughout the distribution and introduces numerical effects that influence the apparent significance of the corresponding target-level \(\rho\) scores during the permutation step (because tied gRNAs are assigned the same numerical rank).
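The effect of tied P-values on gRNA ranks can be illustrated with a small Python sketch. The text states only that tied gRNAs receive the same numerical rank; here we assume "competition" (minimum) ranking, in which every member of a tie receives the smallest rank of its tied block. The helper for discarding tied gRNAs mirrors the omission described in the next paragraph. All names and values are hypothetical.

```python
from collections import Counter


def competition_ranks(values: list[float]) -> list[int]:
    """1-based ranks; every member of a tie gets the smallest rank of its block."""
    first_pos: dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
    return [first_pos[v] for v in values]


def tied_fraction(values: list[float]) -> float:
    """Fraction of entries whose value is shared with at least one other entry."""
    counts = Counter(values)
    return sum(1 for v in values if counts[v] > 1) / len(values)


def drop_tied(ids: list[str], values: list[float]) -> list[str]:
    """Keep only ids whose value is unique, i.e. whose rank is unambiguous."""
    counts = Counter(values)
    return [i for i, v in zip(ids, values) if counts[v] == 1]


pvals = [0.01, 0.5, 0.5, 0.2, 1.0, 1.0, 1.0]
print(competition_ranks(pvals))  # [1, 3, 3, 2, 5, 5, 5]
print(tied_fraction(pvals))      # 5/7 ≈ 0.714
```

Because the three gRNAs tied at 1.0 all receive rank 5, no gRNA is assigned rank 6 or 7, so the normalized ranks of tied gRNAs are shifted relative to what an untied ordering would give.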
When the redundantly ranked gRNAs are omitted, the target-level \(\rho\) scores computed by MAGeCK and gCrisprTools become larger and are more consistently estimated. Increased \(\rho\) scores indicate greater concentration of signals within the RRA framework, and the corresponding targets are more likely to be assigned statistical significance. This is an expected consequence of omitting many gRNAs whose ranks are not informative, and it strengthens signals in both scoring frameworks. The improved fidelity of the \(\rho\) estimates likewise reflects more consistent ordering of the remaining gRNAs across the two methods, which is presumably driven by the absence of ambiguous ranks in the MAGeCK gRNA scores.
In practice, this means that MAGeCK favors certain patterns of gRNA behavior more than gCrisprTools does, and vice versa. Note that the following examples are largely edge cases, and the difference in target P-value ranks (as opposed to point estimates) is typically slight.
MAGeCK tends to prefer targets with weak signals that are present for all associated gRNAs.
gCrisprTools somewhat upweights targets associated with a few strong signals that might have elevated variances or weaker signals in some of the associated gRNAs (note the difference in Y-axis scale relative to the hits above).
As described above, both MAGeCK and gCrisprTools calculate target-level scores by generating a \(\rho\) statistic from the set of gRNA signals that are ‘significantly’ altered in the screen. MAGeCK uses gRNA P-values for both ranking and significance testing, but by default gCrisprTools uses a ‘Combined Scoring’ approach, in which gRNAs are ranked by their fold change estimates and the associated P-values are used only to disqualify gRNAs during the \(\rho\) statistic calculation.
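The two ranking schemes can be contrasted in a short Python sketch. Under combined scoring, normalized ranks come from log fold changes and the P-values set only the pass mask; under P-value scoring, as MAGeCK does, P-values drive both. The function names, the default `alpha`, and tie-breaking by input order are assumptions made for illustration; the outputs would feed an RRA\(\alpha\) step such as the one described earlier.

```python
def normalized_ranks(keys: list[float]) -> list[float]:
    """rank / N, with rank 1 for the smallest key; ties broken by input order."""
    n = len(keys)
    order = sorted(range(n), key=lambda i: keys[i])
    ranks = [0.0] * n
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos / n
    return ranks


def combined_scoring(lfc: list[float], pvals: list[float], alpha: float = 0.05,
                     direction: str = "down") -> tuple[list[float], list[bool]]:
    """Rank by log fold change; use one-sided P-values only for the pass mask.

    direction="down" puts the most negative fold change first (depletion);
    direction="up" puts the most positive first (enrichment).
    """
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    sign = 1.0 if direction == "down" else -1.0
    ranks = normalized_ranks([sign * v for v in lfc])
    passes = [p < alpha for p in pvals]
    return ranks, passes


def pvalue_scoring(pvals: list[float],
                   alpha: float = 0.05) -> tuple[list[float], list[bool]]:
    """P-values drive both the ranking and the pass mask."""
    return normalized_ranks(pvals), [p < alpha for p in pvals]


lfc = [-3.0, -0.4, -1.0, -2.0]
pvals = [0.02, 0.001, 0.03, 0.2]
print(combined_scoring(lfc, pvals))  # ([0.25, 1.0, 0.75, 0.5], [True, True, True, False])
print(pvalue_scoring(pvals))         # ([0.5, 0.25, 0.75, 1.0], [True, True, True, False])
```

In this toy example the second gRNA has a weak fold change but the smallest P-value: P-value scoring ranks it first, whereas combined scoring ranks it last while still allowing it to pass the \(\alpha\) filter.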
Empirically, this approach produces a more stable ranking of the gRNAs within the contrast, which is evident when we directly compare the target-level significance estimates produced by each method. As expected, the target-level P-values associated with each \(\rho\) score are generally smaller with combined scoring, reflecting greater consistency of the signal ranks of the gRNAs associated with the highest-priority targets.
With combined scoring, gCrisprTools prioritizes a number of targets with weaker signals and/or larger variances; these are generally not strong candidates in the MAGeCK framework.
The above examples are both known to play roles in cellular proliferation, and are consequently likely to be true hits in this example.
1. Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, Liu XS. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15(12):554. PMID:25476604
2. Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics. 2012;28(4):573-80. PMID:22247279
3. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15(2):R29. doi:10.1186/gb-2014-15-2-r29