Contents

1 Introduction

In this vignette, we provide an overview of the basic functionality and usage of the scds package, which interfaces with SingleCellExperiment objects.

2 Installation

Install the scds package using Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("scds", version = "3.9")

Or from github:

library(devtools)
devtools::install_github('kostkalab/scds')

3 Quick start

scds takes as input a SingleCellExperiment object (see here SingleCellExperiment), where raw counts are stored in a counts assay, i.e. assay(sce,"counts"). An example dataset created by sub-sampling the cell-hashing cell-lines data set (see https://satijalab.org/seurat/hashing_vignette.html) is included with the package and accessible via data("sce").Note that scds is designed to workd with larger datasets, but for the purposes of this vignette, we work with a smaller example dataset. We apply scds to this data and compare/visualize reasults:

3.1 Example data set

Get example data set provided with the package.

library(scds)
library(scater)
library(rsvd)
library(Rtsne)
library(cowplot)
set.seed(30519)
data("sce_chcl")
sce = sce_chcl #- less typing
dim(sce)
## [1] 2000 2000

We see it contains 2,000 genes and 2,000 cells, 216 of which are identified as doublets:

table(sce$hto_classification_global)
## 
##  Doublet Negative  Singlet 
##      216       83     1701

We can visualize cells/doublets after projecting into two dimensions:

logcounts(sce) = log1p(counts(sce))
vrs            = apply(logcounts(sce),1,var)
pc             = rpca(t(logcounts(sce)[order(vrs,decreasing=TRUE)[1:100],]))
ts             = Rtsne(pc$x[,1:10],verb=FALSE)

reducedDim(sce,"tsne") = ts$Y; rm(ts,vrs,pc)
plotReducedDim(sce,"tsne",color_by="hto_classification_global")

3.2 Computational doublet annotation

We now run the scds doublet annotation approaches. Briefly, we identify doublets in two complementary ways: cxds is based on co-expression of gene pairs and works with absence/presence calls only, while bcds uses the full count information and a binary classification approach using artificially generated doublets. cxds_bcds_hybrid combines both approaches, for more details please consult (this manuscript). Each of the three methods returns a doublet score, with higher scores indicating more “doublet-like” barcodes.

#- Annotate doublet using co-expression based doublet scoring:
sce = cxds(sce,retRes = TRUE)
sce = bcds(sce,retRes = TRUE,verb=TRUE)
sce = cxds_bcds_hybrid(sce)
par(mfcol=c(1,3))
boxplot(sce$cxds_score   ~ sce$doublet_true_labels, main="cxds")
boxplot(sce$bcds_score   ~ sce$doublet_true_labels, main="bcds")
boxplot(sce$hybrid_score ~ sce$doublet_true_labels, main="hybrid")

3.3 Visualizing gene pairs

For cxds we can identify and visualize gene pairs driving doublet annoataions, with the expectation that the two genes in a pair might mark different types of cells (see manuscript). In the following we look at the top three pairs, each gene pair is a row in the plot below:

scds =
top3 = metadata(sce)$cxds$topPairs[1:3,]
rs   = rownames(sce)
hb   = rowData(sce)$cxds_hvg_bool
ho   = rowData(sce)$cxds_hvg_ordr[hb]
hgs  = rs[ho]

l1 =  ggdraw() + draw_text("Pair 1", x = 0.5, y = 0.5)
p1 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,1]])
p2 = plotReducedDim(sce,"tsne",color_by=hgs[top3[1,2]])

l2 =  ggdraw() + draw_text("Pair 2", x = 0.5, y = 0.5)
p3 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,1]])
p4 = plotReducedDim(sce,"tsne",color_by=hgs[top3[2,2]])

l3 = ggdraw() + draw_text("Pair 3", x = 0.5, y = 0.5)
p5 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,1]])
p6 = plotReducedDim(sce,"tsne",color_by=hgs[top3[3,2]])

plot_grid(l1,p1,p2,l2,p3,p4,l3,p5,p6,ncol=3, rel_widths = c(1,2,2))

4 Session Info

sessionInfo()
## R Under development (unstable) (2024-11-20 r87352)
## Platform: x86_64-apple-darwin20
## Running under: macOS Monterey 12.7.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] cowplot_1.1.3               Rtsne_0.17                 
##  [3] rsvd_1.0.5                  scater_1.35.0              
##  [5] ggplot2_3.5.1               scuttle_1.17.0             
##  [7] SingleCellExperiment_1.29.1 SummarizedExperiment_1.37.0
##  [9] Biobase_2.67.0              GenomicRanges_1.59.1       
## [11] GenomeInfoDb_1.43.1         IRanges_2.41.1             
## [13] S4Vectors_0.45.2            BiocGenerics_0.53.3        
## [15] generics_0.1.3              MatrixGenerics_1.19.0      
## [17] matrixStats_1.4.1           scds_1.23.0                
## [19] BiocStyle_2.35.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1        viridisLite_0.4.2       farver_2.1.2           
##  [4] dplyr_1.1.4             vipor_0.4.7             viridis_0.6.5          
##  [7] fastmap_1.2.0           pROC_1.18.5             digest_0.6.37          
## [10] lifecycle_1.0.4         magrittr_2.0.3          compiler_4.5.0         
## [13] rlang_1.1.4             sass_0.4.9              tools_4.5.0            
## [16] utf8_1.2.4              yaml_2.3.10             data.table_1.16.2      
## [19] knitr_1.49              labeling_0.4.3          S4Arrays_1.7.1         
## [22] xgboost_1.7.8.1         DelayedArray_0.33.2     plyr_1.8.9             
## [25] abind_1.4-8             BiocParallel_1.41.0     withr_3.0.2            
## [28] grid_4.5.0              fansi_1.0.6             beachmat_2.23.1        
## [31] colorspace_2.1-1        scales_1.3.0            tinytex_0.54           
## [34] cli_3.6.3               rmarkdown_2.29          crayon_1.5.3           
## [37] httr_1.4.7              ggbeeswarm_0.7.2        cachem_1.1.0           
## [40] zlibbioc_1.53.0         parallel_4.5.0          BiocManager_1.30.25    
## [43] XVector_0.47.0          vctrs_0.6.5             Matrix_1.7-1           
## [46] jsonlite_1.8.9          bookdown_0.41           BiocSingular_1.23.0    
## [49] BiocNeighbors_2.1.0     ggrepel_0.9.6           irlba_2.3.5.1          
## [52] beeswarm_0.4.0          magick_2.8.5            jquerylib_0.1.4        
## [55] glue_1.8.0              codetools_0.2-20        gtable_0.3.6           
## [58] UCSC.utils_1.3.0        ScaledMatrix_1.15.0     munsell_0.5.1          
## [61] tibble_3.2.1            pillar_1.9.0            htmltools_0.5.8.1      
## [64] GenomeInfoDbData_1.2.13 R6_2.5.1                evaluate_1.0.1         
## [67] lattice_0.22-6          bslib_0.8.0             Rcpp_1.0.13-1          
## [70] gridExtra_2.3           SparseArray_1.7.2       xfun_0.49              
## [73] pkgconfig_2.0.3