This vignette is intended for users who are already familiar with the basic condiments workflow described in the first vignette. Here, we show how to modify the default parameters for the first two steps of the workflow.

# For analysis
library(condiments)
library(slingshot)
set.seed(21)

Toy dataset

We rely on the same toy dataset as in the first vignette.

data("toy_dataset", package = "condiments")
df <- toy_dataset$sd
rd <- as.matrix(df[, c("Dim1", "Dim2")])
sds <- slingshot(rd, df$cl)

The topologyTest function

By default, the topologyTest function requires only two inputs: the sds object and the condition labels. To limit the run time of this vignette, we also set the rep argument, which controls the number of permutations used to generate trajectories under the null, to 10 instead of the default 100. As a result, the test statistics might be more variable.

top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10)

## Generating permuted trajectories

## Running KS-mean test

knitr::kable(top_res)

method	thresh	statistic	p.value
KS_mean	0.01	0	1
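
The default method reported in the table above is KS_mean. As a rough illustration of the two-sample Kolmogorov-Smirnov idea behind it, the sketch below uses only base R on two simulated samples. The vectors x and y are hypothetical and are not taken from the toy dataset, and this is not a reproduction of what topologyTest computes internally.

```r
# Two hypothetical samples (simulated, not from the toy dataset)
set.seed(21)
x <- rnorm(200, mean = 0)
y <- rnorm(200, mean = 0.5)

# Two-sample Kolmogorov-Smirnov test from the stats package
ks_res <- ks.test(x, y)

# The statistic is the largest vertical gap between the two empirical CDFs,
# which can be checked by hand on the pooled sample
grid <- sort(c(x, y))
d_manual <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))

ks_res$statistic
d_manual
ks_res$p.value
```

The statistic lies between 0 and 1, with values close to 0 meaning that the two empirical distributions are nearly indistinguishable.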

Changing the method or the threshold

The topologyTest function can be relatively slow on large datasets. Moreover, whichever method is used to test the null hypothesis that a common trajectory should be fitted, the first step, generating rep permuted trajectories under the null, is identical. Therefore, we allow users to specify more than one method and more than one threshold value in a single call. Here, we will use both the Kolmogorov-Smirnov test (Smirnov 1939) and the classifier test (Lopez-Paz and Oquab 2016).

top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10,
                        methods = c("KS_mean", "Classifier"),
                        threshs = c(0, .01, .05, .1))

## Generating permuted trajectories

## Running KS-mean test

## Running Classifier test

knitr::kable(top_res)

method	thresh	statistic	p.value
KS_mean	0	0.0070000	1.0000000
KS_mean	0.01	0.0000000	1.0000000
KS_mean	0.05	0.0000000	1.0000000
KS_mean	0.1	0.0000000	1.0000000
Classifier	0	0.4150000	0.9999821
Classifier	0.01	0.3800000	1.0000000
Classifier	0.05	0.3333333	1.0000000
Classifier	0.1	0.2833333	1.0000000
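
With several methods and thresholds, the result has one row per combination. The sketch below rebuilds the printed table as a plain data.frame (values copied from the output above) and shows two ways to read it with base R. It assumes that the object returned by topologyTest behaves like a data frame with these four columns; the names top_res_tab, classifier_res and p_wide are only illustrative.

```r
# Rebuild the printed table as a data.frame (values copied from the output)
top_res_tab <- data.frame(
  method    = rep(c("KS_mean", "Classifier"), each = 4),
  thresh    = rep(c(0, 0.01, 0.05, 0.1), times = 2),
  statistic = c(0.007, 0, 0, 0, 0.415, 0.38, 0.3333333, 0.2833333),
  p.value   = c(1, 1, 1, 1, 0.9999821, 1, 1, 1)
)

# Keep only the rows for the classifier test
classifier_res <- top_res_tab[top_res_tab$method == "Classifier", ]

# One row per threshold and one p-value column per method
p_wide <- reshape(top_res_tab[, c("method", "thresh", "p.value")],
                  idvar = "thresh", timevar = "method", direction = "wide")
p_wide
```

The wide layout makes it easier to compare how the two methods respond to the same threshold.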

To see all available methods, use ?topologyTest and look at the methods argument.

Passing arguments to the test method

For all methods but the KS test, additional parameters can be specified using a dedicated argument: args_classifier, args_wass or args_mmd. See the help file of the corresponding test for more information on those parameters. For example, since the default test based on the Wasserstein distance and a permutation test is quite slow, we can pass the fast argument.

top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 10,
                        methods = "wasserstein_permutation",
                        args_wass = list(fast = TRUE, S = 100, iterations  = 10^2))

## Generating permuted trajectories

## Running wassertsein permutation test

knitr::kable(top_res)

method	thresh	statistic	p.value
wasserstein_permutation	NA	1.356887	0.85

Using parallelisation

For now, the first part of topologyTest has been designed for parallelisation using the BiocParallel package. For example, to run with 4 cores, you can use the following commands:

library(BiocParallel)
BPPARAM <- bpparam()
BPPARAM$progressbar <- TRUE
BPPARAM$workers <- 4
top_res <- topologyTest(sds = sds, conditions = df$conditions, rep = 100, 
                        parallel = TRUE, BPPARAM = BPPARAM)
knitr::kable(top_res)
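
The value 4 above is hard-coded. As a small sketch, assuming you would rather not request more workers than the machine offers, the number of workers can be derived with the parallel package that ships with R. The variable names n_cores and n_workers are illustrative.

```r
# Number of cores reported by the machine; detectCores() can return NA
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Request at most 4 workers and leave one core free when possible
n_workers <- max(1L, min(4L, n_cores - 1L))
n_workers
```

The resulting value can then be assigned to BPPARAM$workers in place of the fixed 4 used above.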

Differential progression and fate selection

The tests for the second step are much less compute-intensive, so they are not parallelised. However, the other changes introduced in the previous section are still possible.

Default

prog_res <- progressionTest(sds, conditions = df$conditions)
knitr::kable(prog_res)

lineage	statistic	p.value
All	5.506366	0

dif_res <- fateSelectionTest(sds, conditions = df$conditions)

## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
## 
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

knitr::kable(dif_res)

pair	statistic	p.value
1vs2	0.6611712	8e-07

Changing the method and / or threshold

prog_res <- progressionTest(sds, conditions = df$conditions, method = "Classifier")
knitr::kable(prog_res)

lineage	statistic	p.value
All	0.6026126	0.0012246

dif_res <- fateSelectionTest(sds, conditions = df$conditions, thresh = .05)

## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
## 
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

knitr::kable(dif_res)

pair	statistic	p.value
1vs2	0.6301802	6.01e-05

Passing more parameters to the test methods

prog_res <- progressionTest(sds, conditions = df$conditions, method = "Classifier",
                            args_classifier = list(method = "rf"))

## Warning in randomForest.default(x, y, mtry = param$mtry, ...): invalid mtry:
## reset to within valid range

## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

knitr::kable(prog_res)

lineage	statistic	p.value
All	0.5890991	0.0043539

dif_res <- fateSelectionTest(sds, conditions = df$conditions)

## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
## 
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .

knitr::kable(dif_res)

pair	statistic	p.value
1vs2	0.6341441	3.41e-05

Conclusion

For all of the above procedures, it is important to note that we are making multiple comparisons. The p-values we obtain from these tests should be corrected for multiple testing, especially for trajectories with a large number of lineages.
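
As a minimal sketch of such a correction, base R provides p.adjust. The p-values below are hypothetical, one per lineage, and are not results from the toy dataset; the choice between the two adjustment methods shown is left to the analyst.

```r
# Hypothetical raw p-values, one per lineage (not from the toy dataset)
p_raw <- c(Lineage1 = 0.001, Lineage2 = 0.012, Lineage3 = 0.030, Lineage4 = 0.200)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni adjustment, more conservative, controls the family-wise error rate
p_bonf <- p.adjust(p_raw, method = "bonferroni")

data.frame(raw = p_raw, BH = p_bh, bonferroni = p_bonf)
```

Adjusted values are never smaller than the raw ones, and the gap grows with the number of lineages tested.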

That said, trajectory inference is often one of the last computational methods in a very long analysis pipeline (generally including gene-level quantification, gene filtering / feature selection, and dimensionality reduction). Hence, we strongly discourage the reader from putting too much faith in any p-value that comes out of this analysis. Such values may be useful suggestions, indicating particular features or cells for follow-up study, but should generally not be treated as meaningful statistical quantities.

If some commands or parameters are still unclear after going through this vignette, do not hesitate to open an issue on the condiments GitHub repository.

Session Info

sessionInfo()

## R version 4.3.0 RC (2023-04-13 r84269)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] caret_6.0-94                lattice_0.21-8             
##  [3] viridis_0.6.2               viridisLite_0.4.1          
##  [5] RColorBrewer_1.1-3          ggplot2_3.4.2              
##  [7] tidyr_1.3.0                 dplyr_1.1.2                
##  [9] slingshot_2.8.0             TrajectoryUtils_1.8.0      
## [11] SingleCellExperiment_1.22.0 SummarizedExperiment_1.30.0
## [13] Biobase_2.60.0              GenomicRanges_1.52.0       
## [15] GenomeInfoDb_1.36.0         IRanges_2.34.0             
## [17] S4Vectors_0.38.0            BiocGenerics_0.46.0        
## [19] MatrixGenerics_1.12.0       matrixStats_0.63.0         
## [21] princurve_2.1.6             condiments_1.8.0           
## [23] knitr_1.42                 
## 
## loaded via a namespace (and not attached):
##   [1] jsonlite_1.8.4            magrittr_2.0.3           
##   [3] ggbeeswarm_0.7.1          spatstat.utils_3.0-2     
##   [5] farver_2.1.1              rmarkdown_2.21           
##   [7] zlibbioc_1.46.0           vctrs_0.6.2              
##   [9] spatstat.explore_3.1-0    DelayedMatrixStats_1.22.0
##  [11] RCurl_1.98-1.12           htmltools_0.5.5          
##  [13] BiocNeighbors_1.18.0      pROC_1.18.0              
##  [15] sass_0.4.5                parallelly_1.35.0        
##  [17] bslib_0.4.2               plyr_1.8.8               
##  [19] lubridate_1.9.2           cachem_1.0.7             
##  [21] igraph_1.4.2              lifecycle_1.0.3          
##  [23] iterators_1.0.14          pkgconfig_2.0.3          
##  [25] rsvd_1.0.5                Matrix_1.5-4             
##  [27] R6_2.5.1                  fastmap_1.1.1            
##  [29] GenomeInfoDbData_1.2.10   future_1.32.0            
##  [31] digest_0.6.31             colorspace_2.1-0         
##  [33] tensor_1.5                scater_1.28.0            
##  [35] irlba_2.3.5.1             beachmat_2.16.0          
##  [37] spatstat.linnet_3.1-0     labeling_0.4.2           
##  [39] randomForest_4.7-1.1      fansi_1.0.4              
##  [41] spatstat.sparse_3.0-1     timechange_0.2.0         
##  [43] polyclip_1.10-4           abind_1.4-5              
##  [45] mgcv_1.8-42               compiler_4.3.0           
##  [47] rngtools_1.5.2            proxy_0.4-27             
##  [49] withr_2.5.0               doParallel_1.0.17        
##  [51] BiocParallel_1.34.0       spatstat.model_3.2-3     
##  [53] highr_0.10                MASS_7.3-59              
##  [55] lava_1.7.2.1              DelayedArray_0.26.0      
##  [57] ModelMetrics_1.2.2.2      tools_4.3.0              
##  [59] vipor_0.4.5               beeswarm_0.4.0           
##  [61] future.apply_1.10.0       nnet_7.3-18              
##  [63] goftest_1.2-3             glue_1.6.2               
##  [65] nlme_3.1-162              grid_4.3.0               
##  [67] reshape2_1.4.4            generics_0.1.3           
##  [69] recipes_1.0.6             gtable_0.3.3             
##  [71] spatstat.data_3.0-1       class_7.3-21             
##  [73] data.table_1.14.8         ScaledMatrix_1.8.0       
##  [75] BiocSingular_1.16.0       utf8_1.2.3               
##  [77] XVector_0.40.0            spatstat.geom_3.1-0      
##  [79] ggrepel_0.9.3             RANN_2.6.1               
##  [81] foreach_1.5.2             pillar_1.9.0             
##  [83] stringr_1.5.0             limma_3.56.0             
##  [85] Ecume_0.9.1               splines_4.3.0            
##  [87] survival_3.5-5            deldir_1.0-6             
##  [89] tidyselect_1.2.0          scuttle_1.10.0           
##  [91] pbapply_1.7-0             transport_0.13-0         
##  [93] gridExtra_2.3             xfun_0.39                
##  [95] hardhat_1.3.0             distinct_1.12.0          
##  [97] timeDate_4022.108         stringi_1.7.12           
##  [99] yaml_2.3.7                evaluate_0.20            
## [101] codetools_0.2-19          kernlab_0.9-32           
## [103] spatstat_3.0-5            tibble_3.2.1             
## [105] cli_3.6.1                 rpart_4.1.19             
## [107] munsell_0.5.0             jquerylib_0.1.4          
## [109] Rcpp_1.0.10               globals_0.16.2           
## [111] spatstat.random_3.1-4     parallel_4.3.0           
## [113] gower_1.0.1               doRNG_1.8.6              
## [115] sparseMatrixStats_1.12.0  bitops_1.0-7             
## [117] listenv_0.9.0             ipred_0.9-14             
## [119] scales_1.2.1              prodlim_2023.03.31       
## [121] e1071_1.7-13              crayon_1.5.2             
## [123] purrr_1.0.1               rlang_1.1.0

References

Lopez-Paz, David, and Maxime Oquab. 2016. “Revisiting Classifier Two-Sample Tests.” arXiv, October, 1–15. http://arxiv.org/abs/1610.06545.

Smirnov, Nikolai V. 1939. “On the Estimation of the Discrepancy Between Empirical Curves of Distribution for Two Independent Samples.” Bull. Math. Univ. Moscou 2 (2): 3–14.

More controls for the tests used in the condiments workflow

Hector Roux de Bézieux

25 April, 2023