The waddR package


waddR is an R package that provides a 2-Wasserstein distance based statistical test for detecting and describing differential distributions in one-dimensional data. It provides functions for Wasserstein distance calculation, differential distribution testing, and a specialized test for differential expression in scRNA-seq data.

The package waddR provides three sets of utilities to cover distinct use cases, each described in a separate vignette: Wasserstein distance functions, two-sample testing, and a single-cell test.

These are bundled into the same package because they depend on each other internally: the procedure for detecting differential distributions in single-cell data is a refinement of the general two-sample test, which itself uses the 2-Wasserstein distance to compare two distributions.

Wasserstein Distance Functions

The 2-Wasserstein distance is a metric that describes the distance between two distributions, here representing two different conditions A and B. This package specifically considers the squared 2-Wasserstein distance d := W^2, which can be decomposed into location, size, and shape terms.
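To make the decomposition concrete, the following base R sketch approximates the squared 2-Wasserstein distance from empirical quantiles and splits it into location, size, and shape terms. It is an illustration only, not waddR's implementation: the function name squared_wass_sketch, the number of quantiles n_q, and the particular form of the decomposition (squared difference of means, squared difference of standard deviations, and a correlation-based remainder) are assumptions made for this example.

```r
# Illustrative sketch (not waddR code): quantile-based approximation of the
# squared 2-Wasserstein distance and one possible location/size/shape split.
squared_wass_sketch <- function(a, b, n_q = 1000) {
  # Equidistant probability levels at which both samples are compared
  probs <- (seq_len(n_q) - 0.5) / n_q
  qa <- quantile(a, probs, names = FALSE, type = 1)
  qb <- quantile(b, probs, names = FALSE, type = 1)

  # Population standard deviation (divides by n), so the terms add up exactly
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

  location <- (mean(qa) - mean(qb))^2
  size     <- (pop_sd(qa) - pop_sd(qb))^2
  shape    <- 2 * pop_sd(qa) * pop_sd(qb) * (1 - cor(qa, qb))

  c(distance = mean((qa - qb)^2),
    location = location, size = size, shape = shape)
}

set.seed(42)
a <- rnorm(200, mean = 0, sd = 1)  # hypothetical condition A
b <- rnorm(200, mean = 1, sd = 2)  # hypothetical condition B
res <- squared_wass_sketch(a, b)
print(res)
```

In this sketch the three terms sum to the approximated distance, so their relative sizes indicate whether two samples differ mainly in their means, their spread, or the form of their distributions.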

The package waddR offers three functions to calculate the 2-Wasserstein distance, all of which are implemented in C++ and exported to R with Rcpp for better performance. The function wasserstein_metric is a C++ reimplementation of the function wasserstein1d from the package transport and offers the most exact results. The functions squared_wass_approx and squared_wass_decomp compute approximations of the squared 2-Wasserstein distance, with squared_wass_decomp also returning the decomposition terms for location, size, and shape. See ?wasserstein_metric, ?squared_wass_approx, and ?squared_wass_decomp.

Two-Sample Testing

This package provides two testing procedures that use the 2-Wasserstein distance to test whether two distributions F_A and F_B, given in the form of samples, are different, by specifically testing the null hypothesis H0: F_A = F_B against the alternative hypothesis H1: F_A != F_B.

The first, semi-parametric (SP) procedure uses a permutation-based test combined with a generalized Pareto distribution approximation to estimate small p-values accurately.

The second procedure (ASY) uses a test based on asymptotic theory, which is valid only if the samples can be assumed to come from continuous distributions.

See ?wasserstein.test for more details.
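The permutation idea behind the semi-parametric procedure can be sketched in base R as follows. This is a simplified illustration and not waddR's implementation: it omits the generalized Pareto approximation for small p-values, and the helper names sq_wass and perm_pvalue, the number of quantiles, and the number of permutations are assumptions chosen for this example.

```r
# Illustrative sketch (not waddR code): plain permutation test with a
# quantile-based squared 2-Wasserstein distance as the test statistic.
sq_wass <- function(a, b, n_q = 200) {
  probs <- (seq_len(n_q) - 0.5) / n_q
  qa <- quantile(a, probs, names = FALSE, type = 1)
  qb <- quantile(b, probs, names = FALSE, type = 1)
  mean((qa - qb)^2)
}

perm_pvalue <- function(a, b, n_perm = 999) {
  observed <- sq_wass(a, b)
  pooled <- c(a, b)
  n_a <- length(a)
  # Under H0: F_A = F_B the condition labels are exchangeable, so the
  # statistic is recomputed on randomly relabelled samples
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_a)
    sq_wass(pooled[idx], pooled[-idx])
  })
  # Add-one estimator keeps the p-value strictly positive
  (sum(perm_stats >= observed) + 1) / (n_perm + 1)
}

set.seed(1)
x <- rnorm(60)
p_same  <- perm_pvalue(x, rnorm(60))              # same distribution
p_shift <- perm_pvalue(x, rnorm(60, mean = 1.5))  # shifted distribution
print(c(same = p_same, shift = p_shift))
```

With a plain permutation test the smallest attainable p-value is 1 / (n_perm + 1), which is why a tail approximation such as the generalized Pareto fit mentioned above is useful when very small p-values are needed.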

Single Cell Test

The waddR package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance that is specifically tailored to identify differential distributions in single-cell RNA-sequencing (scRNA-seq) data. In particular, a two-stage (TS) approach has been implemented that accounts for the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

See the documentation of the single cell procedure ? and the test for zero expression levels ?testZeroes for more details.
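The two-stage idea can be illustrated for a single gene with base R alone. The sketch below is not waddR's implementation: it uses an ordinary glm logistic regression with a likelihood ratio test for the zero proportions and a plain permutation test (without the generalized Pareto approximation) for the non-zero values. The simulated expression vectors and all helper names are hypothetical.

```r
# Illustrative sketch (not waddR code): two-stage test for one gene.
set.seed(1)
# Hypothetical expression values of one gene under two conditions
expr_a <- c(rep(0, 30), rexp(70, rate = 1))
expr_b <- c(rep(0, 60), rexp(40, rate = 0.5))

# Stage 1: do the proportions of zero expression differ between conditions?
is_zero   <- as.integer(c(expr_a, expr_b) == 0)
condition <- factor(rep(c("A", "B"), c(length(expr_a), length(expr_b))))
fit_full  <- glm(is_zero ~ condition, family = binomial())
fit_null  <- glm(is_zero ~ 1, family = binomial())
# Likelihood ratio test with one degree of freedom
p_zero <- pchisq(fit_null$deviance - fit_full$deviance, df = 1,
                 lower.tail = FALSE)

# Stage 2: do the non-zero expression values differ in distribution?
sq_wass <- function(a, b, n_q = 200) {
  probs <- (seq_len(n_q) - 0.5) / n_q
  mean((quantile(a, probs, names = FALSE, type = 1) -
        quantile(b, probs, names = FALSE, type = 1))^2)
}
nz_a <- expr_a[expr_a > 0]
nz_b <- expr_b[expr_b > 0]
observed <- sq_wass(nz_a, nz_b)
pooled <- c(nz_a, nz_b)
perm_stats <- replicate(999, {
  idx <- sample(length(pooled), length(nz_a))
  sq_wass(pooled[idx], pooled[-idx])
})
p_nonzero <- (sum(perm_stats >= observed) + 1) / (length(perm_stats) + 1)

print(c(zero = p_zero, nonzero = p_nonzero))
```

Testing the zeros separately keeps the large point mass at zero from dominating the distance-based comparison of the expressed cells.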


To install waddR from Bioconductor, use BiocManager with the following commands:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("waddR")

Using BiocManager, the package can also be installed directly from GitHub:


The package waddR can then be used in R:

library(waddR)


Session Info

#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/
#> LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> other attached packages:
#> [1] waddR_1.2.0
#> loaded via a namespace (and not attached):
#>  [1] SummarizedExperiment_1.18.0 statmod_1.4.34             
#>  [3] tidyselect_1.0.0            xfun_0.13                  
#>  [5] purrr_0.3.4                 splines_4.0.0              
#>  [7] lattice_0.20-41             vctrs_0.2.4                
#>  [9] htmltools_0.4.0             stats4_4.0.0               
#> [11] BiocFileCache_1.12.0        yaml_2.2.1                 
#> [13] blob_1.2.1                  rlang_0.4.5                
#> [15] nloptr_1.2.2.1              pillar_1.4.3               
#> [17] glue_1.4.0                  DBI_1.1.0                  
#> [19] BiocParallel_1.22.0         rappdirs_0.3.1             
#> [21] SingleCellExperiment_1.10.0 BiocGenerics_0.34.0        
#> [23] bit64_0.9-7                 dbplyr_1.4.3               
#> [25] matrixStats_0.56.0          GenomeInfoDbData_1.2.3     
#> [27] lifecycle_0.2.0             stringr_1.4.0              
#> [29] zlibbioc_1.34.0             coda_0.19-3                
#> [31] memoise_1.1.0               evaluate_0.14              
#> [33] Biobase_2.48.0              knitr_1.28                 
#> [35] IRanges_2.22.0              GenomeInfoDb_1.24.0        
#> [37] parallel_4.0.0              curl_4.3                   
#> [39] Rcpp_1.0.4.6                arm_1.11-1                 
#> [41] DelayedArray_0.14.0         S4Vectors_0.26.0           
#> [43] XVector_0.28.0              abind_1.4-5                
#> [45] lme4_1.1-23                 bit_1.1-15.2               
#> [47] digest_0.6.25               stringi_1.4.6              
#> [49] dplyr_0.8.5                 GenomicRanges_1.40.0       
#> [51] grid_4.0.0                  tools_4.0.0                
#> [53] bitops_1.0-6                magrittr_1.5               
#> [55] RCurl_1.98-1.2              tibble_3.0.1               
#> [57] RSQLite_2.2.0               crayon_1.3.4               
#> [59] pkgconfig_2.0.3             ellipsis_0.3.0             
#> [61] MASS_7.3-51.6               Matrix_1.2-18              
#> [63] minqa_1.2.4                 assertthat_0.2.1           
#> [65] rmarkdown_2.1               httr_1.4.1                 
#> [67] boot_1.3-25                 R6_2.4.1                   
#> [69] nlme_3.1-147                compiler_4.0.0