This document shows a typical analysis using the groupProbPlot function. The intention of this function is to display differences between groups, tissues, stimulations or similar, at single-cell resolution. The idea is that a cell from a cell type that is specific to one of the two investigated groups will be surrounded exclusively by Euclidean nearest neighbors from the same group. This is the basis for the analysis: in the standard case, each cell is given a number between -1 and 1 that reflects the fraction of its 100 closest neighbors, in the Euclidean space spanned by all the input markers, that come from one group (-1) or the other (1). The scale is set so that the midpoint, 0, corresponds to a perfect mix with 50% of the neighbors from each group. For an introduction to the package and a description of the example data, see the general DepecheR package vignette.
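The scoring idea described above can be illustrated with a minimal base-R sketch. It is not the DepecheR implementation: the simulated data, the variable names, the choice to exclude each cell from its own neighbor list, and the smaller number of neighbors are all assumptions made for illustration.

```r
# Sketch of the neighbor-based score; not the DepecheR implementation.
set.seed(1)
n <- 200
# Simulated stand-in data: two well-separated groups measured on three markers.
markers <- rbind(matrix(rnorm(n * 3, mean = 0), ncol = 3),
                 matrix(rnorm(n * 3, mean = 6), ncol = 3))
group <- rep(c("Group_1", "Group_2"), each = n)

# The text describes 100 neighbors; fewer are used here since the toy data is small.
k <- 20

# Full Euclidean distance matrix (fine for toy data, too costly for large data).
d <- as.matrix(dist(markers))
diag(d) <- Inf  # assumption: a cell is not counted as its own neighbor

# For each cell, the fraction of its k nearest neighbors that belong to Group_2.
fracGroup2 <- apply(d, 1, function(x) {
    mean(group[order(x)[seq_len(k)]] == "Group_2")
})

# Rescale from [0, 1] to [-1, 1]:
# -1 = only Group_1 neighbors, 0 = even mix, 1 = only Group_2 neighbors.
groupProbSketch <- 2 * fracGroup2 - 1
range(groupProbSketch)
```

With groups this well separated, the scores pile up near the two extremes, which mirrors the behavior of the example data used later in this document.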
This is how to install the package, if that has not already been done:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DepecheR")
For visualization purposes, a 2-dimensional representation of the data is necessary. This could simply be two of the variables used to construct the probability vector, but it is more informative to include information from all variables, using e.g. tSNE or UMAP. In this case, we will display the data with tSNE.
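If no 2D embedding is at hand, any method that yields one x and one y coordinate per cell can serve as a starting point. The sketch below uses PCA from base R as a dependency-free stand-in; it is not the tSNE used in this document, and the simulated matrix and the names toyMarkers and xYSketch are hypothetical.

```r
set.seed(2)
# Simulated stand-in for a cells-by-markers matrix (500 cells, 8 markers).
toyMarkers <- matrix(rnorm(500 * 8), ncol = 8,
                     dimnames = list(NULL, paste0("marker", 1:8)))

# PCA on centered and scaled markers; the first two components become x and y.
pca <- prcomp(toyMarkers, center = TRUE, scale. = TRUE)
xYSketch <- pca$x[, 1:2]

# One row per cell, two columns: the shape a 2D display needs.
dim(xYSketch)
```

PCA is linear and usually separates cell populations less clearly than tSNE or UMAP, so it is best seen as a quick first look.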
library(DepecheR)
data("testData")
data("testDataSNE")
This function differs from the other group differentiation functions in the DepecheR package in that it needs no clustering output, neither from the depeche function nor from any other clustering algorithm. Instead, the input consists of all the data from which the Euclidean nearest neighbors should be identified, together with a group identity vector and the 2D data used for display. Optionally, the resulting group probability vector can be returned, which will be the case in this example.
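Since the three inputs must describe the same cells in the same order, a quick consistency check before the call can save time. The helper below is hypothetical and not part of DepecheR; the requirement of exactly two groups is an assumption based on the two group names that the function takes.

```r
# Hypothetical helper, not part of DepecheR.
checkGroupProbInput <- function(xYData, groupVector, dataTrans) {
    stopifnot(
        ncol(xYData) == 2,                       # 2D display coordinates
        nrow(xYData) == nrow(dataTrans),         # one coordinate pair per cell
        length(groupVector) == nrow(dataTrans),  # one group label per cell
        length(unique(groupVector)) == 2         # assumption: exactly two groups
    )
    invisible(TRUE)
}

# Toy stand-ins for the three inputs (10 cells, 4 markers).
toyTrans <- matrix(rnorm(40), ncol = 4)
toyXY <- matrix(rnorm(20), ncol = 2)
toyGroups <- rep(c("Group_1", "Group_2"), each = 5)
checkGroupProbInput(toyXY, toyGroups, toyTrans)
```

The check stops with an error naming the first condition that fails, for example when the group vector is one element short.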
dataTrans <-
    testData[, c("SYK", "CD16", "CD57", "EAT.2", "CD8", "NKG2C", "CD2", "CD56")]
testData$groupProb <- groupProbPlot(xYData = testDataSNE$Y,
                                    groupVector = testData$label,
                                    groupName1 = "Group_1",
                                    groupName2 = "Group_2",
                                    dataTrans = dataTrans)
## [1] "Done with k-means"
## [1] "Now the first bit is done, and the iterative part takes off"
## [1] "Clusters 1 to 7 smoothed in 2.9159369468689 . Now, 13 clusters are left."
## [1] "Clusters 8 to 14 smoothed in 0.925199031829834 . Now, 6 clusters are left."
## [1] "Clusters 15 to 20 smoothed in 0.905373096466064 . Now, 0 clusters are left."
When running this function, the output is a high-resolution plot saved to disk. A low-resolution variant of the result (made small because of Bioconductor size constraints) is shown here. In this case, the groups are so well separated that almost all cells show a 100% probability of belonging to one group or the other. This is unusual with real data, where the white fields, which mark cells with mixed neighborhoods, are generally larger.
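How strongly the scores pile up at the extremes can also be checked numerically. The sketch below bins a made-up score vector that stands in for a returned probability vector; the values and the cutoffs of -0.9 and 0.9 are arbitrary and chosen only for illustration.

```r
# Hypothetical score vector standing in for a returned probability vector.
groupProbToy <- c(-1, -1, -0.98, -0.4, 0, 0.1, 0.96, 1, 1, 1)

# Bin the scores into strongly group 1, mixed, and strongly group 2.
bins <- cut(groupProbToy,
            breaks = c(-1, -0.9, 0.9, 1),
            labels = c("mostly group 1", "mixed", "mostly group 2"),
            include.lowest = TRUE)

# Count the cells in each bin.
table(bins)
```

A large "mixed" bin corresponds to large white fields in the plot, while a nearly empty one matches the clean separation seen in the example data.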
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DepecheR_1.14.0 knitr_1.40 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] ggrepel_0.9.1 Rcpp_1.0.9 lattice_0.20-45
## [4] tidyr_1.2.1 FNN_1.1.3.1 corpcor_1.6.10
## [7] snow_0.4-4 gtools_3.9.3 assertthat_0.2.1
## [10] digest_0.6.30 foreach_1.5.2 utf8_1.2.2
## [13] RSpectra_0.16-1 plyr_1.8.7 R6_2.5.1
## [16] ellipse_0.4.3 evaluate_0.17 ggplot2_3.3.6
## [19] pillar_1.8.1 gplots_3.1.3 rlang_1.0.6
## [22] gdata_2.18.0.1 jquerylib_0.1.4 gmodels_2.18.1.1
## [25] Matrix_1.5-1 rmarkdown_2.17 moments_0.14.1
## [28] rARPACK_0.11-0 BiocParallel_1.32.0 stringr_1.4.1
## [31] igraph_1.3.5 munsell_0.5.0 compiler_4.2.1
## [34] xfun_0.34 pkgconfig_2.0.3 htmltools_0.5.3
## [37] doSNOW_1.0.20 tidyselect_1.2.0 tibble_3.1.8
## [40] gridExtra_2.3 bookdown_0.29 codetools_0.2-18
## [43] matrixStats_0.62.0 viridisLite_0.4.1 fansi_1.0.3
## [46] dplyr_1.0.10 MASS_7.3-58.1 bitops_1.0-7
## [49] grid_4.2.1 jsonlite_1.8.3 gtable_0.3.1
## [52] lifecycle_1.0.3 DBI_1.1.3 magrittr_2.0.3
## [55] scales_1.2.1 KernSmooth_2.23-20 cli_3.4.1
## [58] stringi_1.7.8 cachem_1.0.6 reshape2_1.4.4
## [61] viridis_0.6.2 robustbase_0.95-0 bslib_0.4.0
## [64] generics_0.1.3 vctrs_0.5.0 RColorBrewer_1.1-3
## [67] iterators_1.0.14 mixOmics_6.22.0 tools_4.2.1
## [70] glue_1.6.2 DEoptimR_1.0-11 purrr_0.3.5
## [73] parallel_4.2.1 fastmap_1.1.0 yaml_2.3.6
## [76] colorspace_2.0-3 BiocManager_1.30.19 caTools_1.18.2
## [79] beanplot_1.3.1 sass_0.4.2