1 Introduction

The function AddModuleScore_UCell() operates directly on Seurat objects. UCell scores are calculated from raw counts or normalized data and returned as metadata columns. The example below defines a few simple signatures and applies them to single-cell data stored in a Seurat object.

To see how this function differs from Seurat’s own AddModuleScore() (which is not based on per-cell ranks), see this vignette.
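As a rough illustration of what "per-cell ranks" means, the sketch below computes a Mann-Whitney U-based score for a single cell in base R, following our reading of the formula in the UCell paper (Andreatta & Carmona, 2021): genes are ranked by decreasing expression within the cell, ranks above a cutoff max_rank are capped at max_rank + 1, and the score is 1 - U/(n * max_rank), where U is the rank sum of the n signature genes minus n(n + 1)/2. The function name ucell_like_score and the default max_rank = 1500 are assumptions for this example. The package handles ties, sparse zeros and edge cases in its own optimized way, so its values need not match this sketch exactly.

```r
# Conceptual sketch of a rank-based (Mann-Whitney U) signature score for ONE cell.
# Hypothetical helper for illustration; not the UCell implementation.
ucell_like_score <- function(expr, signature, max_rank = 1500) {
  # expr: numeric vector of expression values for one cell, named by gene
  # Rank genes by decreasing expression (rank 1 = most expressed), ties averaged
  ranks <- rank(-expr, ties.method = "average")
  # Signature genes missing from the data are treated as beyond max_rank
  r <- ifelse(signature %in% names(ranks), ranks[signature], max_rank + 1)
  # Cap every rank beyond max_rank at max_rank + 1
  r[r > max_rank] <- max_rank + 1
  n <- length(signature)
  u <- sum(r) - n * (n + 1) / 2  # U statistic from the rank sum
  1 - u / (n * max_rank)         # 1 when all signature genes are top-ranked
}

# Usage on a genes x cells matrix `mat` (one score per column/cell):
# scores <- apply(mat, 2, ucell_like_score, signature = c("CD3D", "CD3E", "CD2"))
```

Because the score depends only on the ordering of genes within each cell, it does not change if a cell's expression values are rescaled, which is one reason rank-based scores are robust to differences in sequencing depth.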

2 Get some testing data

For this demo, we will download a single-cell dataset of lung cancer (Zilionis et al. (2019) Immunity) through the scRNAseq package. This dataset contains >170,000 single cells; for the sake of simplicity, in this demo we will focus on immune cells, according to the authors’ annotations, and keep only the first 5000 of them.

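library(scRNAseq)

# Download the Zilionis et al. lung cancer dataset as a SingleCellExperiment
lung <- ZilionisLungData()
# Keep cells flagged as used by the authors and belonging to the NSCLC immune subset
immune <- lung$Used & lung$used_in_NSCLC_immune
lung <- lung[,immune]
# Retain the first 5000 cells to keep the demo fast
lung <- lung[,1:5000]

# Extract raw counts as a sparse matrix (genes x cells)
exp.mat <- Matrix::Matrix(counts(lung),sparse = TRUE)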
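Taking the first 5000 columns is deterministic but is not a random sample of the immune cells. If a random subset is preferred, a reproducible draw could look like the sketch below. subsample_cells() is a hypothetical helper (not part of scRNAseq or UCell), and using it would change the cells, and therefore the scores and plots, relative to those shown in this demo.

```r
# Hypothetical helper: draw a reproducible random subset of columns (cells).
subsample_cells <- function(x, n, seed = 123) {
  set.seed(seed)                       # make the draw reproducible
  n <- min(n, ncol(x))                 # never request more cells than exist
  idx <- sort(sample.int(ncol(x), n))  # sorted to preserve original column order
  x[, idx, drop = FALSE]
}

# Usage with the object from the demo (not run here):
# lung <- subsample_cells(lung, 5000)
```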

3 Define gene signatures

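Here we define some simple gene sets based on the “Human Cell Landscape” signatures (Han et al. (2020) Nature). You may edit existing signatures or add new ones as elements of the list. Genes with a trailing “-” (e.g. LCK-) are treated by UCell as negative markers, i.e. genes expected to be absent or lowly expressed in that cell type.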

signatures <- list(
    Tcell = c("CD3D","CD3E","CD3G","CD2","TRAC"),
    Myeloid = c("CD14","LYZ","CSF1R","FCER1G","SPI1","LCK-"),
    NK = c("KLRD1","NCAM1","NKG7","CD3D-","CD3E-"),
    Plasma_cell = c("IGKC","IGHG3","IGHG1","IGHA1","CD19-")
)
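To check how each signature splits into positive and negative genes under the trailing “-” convention, a small helper can separate the two groups. split_signature() is hypothetical: it only mimics the suffix convention (also stripping an optional “+”) and does not reproduce how UCell combines positive and negative scores.

```r
# Hypothetical helper: split a signature into positive and negative genes
# based on a trailing "-" (negative) or optional "+" (positive) suffix.
split_signature <- function(genes) {
  is_neg <- grepl("-$", genes)       # genes ending in "-" are negative
  clean  <- sub("[+-]$", "", genes)  # strip the suffix from all genes
  list(positive = clean[!is_neg], negative = clean[is_neg])
}

split_signature(c("KLRD1", "NCAM1", "NKG7", "CD3D-", "CD3E-"))
# For the whole list defined above: lapply(signatures, split_signature)
```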

4 Run UCell on Seurat object

library(UCell)
library(Seurat)
seurat.object <- CreateSeuratObject(counts = exp.mat, 
                                    project = "Zilionis_immune")
seurat.object <- AddModuleScore_UCell(seurat.object, 
                                      features=signatures, name=NULL)
head(seurat.object[[]])
##             orig.ident nCount_RNA nFeature_RNA Tcell   Myeloid NK Plasma_cell
## bcHTNA Zilionis_immune       7516         2613     0 0.5234000  0  0.00000000
## bcHNVA Zilionis_immune       5684         1981     0 0.5120000  0  0.01991667
## bcALZN Zilionis_immune       4558         1867     0 0.3593333  0  0.00000000
## bcFWBP Zilionis_immune       2915         1308     0 0.1558000  0  0.00000000
## bcBJYE Zilionis_immune       3576         1548     0 0.4639333  0  0.00000000
## bcGSBJ Zilionis_immune       2796         1270     0 0.5460000  0  0.00000000

Generate PCA and UMAP embeddings

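# Log-normalize counts, then select the 500 most variable genes
seurat.object <- NormalizeData(seurat.object)
seurat.object <- FindVariableFeatures(seurat.object, 
                                      selection.method = "vst", nfeatures = 500)
# Scale data and compute 20 principal components on the variable genes
seurat.object <- ScaleData(seurat.object)
seurat.object <- RunPCA(seurat.object, npcs = 20, 
                        features=VariableFeatures(seurat.object)) 
# UMAP embedding from the PCA space (fixed seed for reproducibility)
seurat.object <- RunUMAP(seurat.object, reduction = "pca", 
                         dims = 1:20, seed.use=123)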
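The normalization, scaling and PCA calls above can be approximated in base R to show what they compute conceptually. The sketch assumes Seurat’s default LogNormalize method with a scale factor of 10,000 and ScaleData’s default cap of 10 on scaled values; the toy Poisson counts are made up for illustration and are unrelated to the Zilionis data. Seurat’s own functions are optimized and offer many more options, so this is not their implementation.

```r
# Toy genes x cells count matrix (made-up data)
set.seed(1)
counts <- matrix(rpois(50 * 30, lambda = 2), nrow = 50,
                 dimnames = list(paste0("g", 1:50), paste0("c", 1:30)))

# LogNormalize: rescale each cell to 10,000 total counts, then take log1p
lognorm <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# Drop genes with zero variance (they cannot be standardized)
lognorm <- lognorm[apply(lognorm, 1, var) > 0, , drop = FALSE]

# Scale: center and standardize each gene across cells, capping values at 10
scaled <- t(scale(t(lognorm)))
scaled[scaled > 10] <- 10

# PCA with cells as observations (rows), keeping 5 components
pca <- prcomp(t(scaled), center = FALSE, rank. = 5)
dim(pca$x)  # 30 cells x 5 components
```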

Visualize UCell scores on a low-dimensional representation (UMAP)

library(ggplot2)
library(patchwork)

FeaturePlot(seurat.object, reduction = "umap", features = names(signatures))

5 Signature smoothing

Single-cell data are sparse. It can be useful to ‘impute’ scores from neighboring cells to partially correct for this sparsity. The function SmoothKNN() smooths single-cell scores by computing, for each cell, a weighted average over its k nearest neighbors in a given dimensionality reduction. It can be applied directly to Seurat objects to smooth UCell scores; by default, the smoothed scores are stored in new metadata columns with the suffix _kNN:

seurat.object <- SmoothKNN(seurat.object,
                           signature.names = names(signatures),
                           reduction="pca")
FeaturePlot(seurat.object, reduction = "umap", features = c("NK","NK_kNN"))
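For intuition, the sketch below implements a generic kNN smoothing in base R: each cell’s score is replaced by a weighted average of its own score and those of its k nearest neighbors in an embedding such as the PCA coordinates. The function knn_smooth and its 1/(1 + distance) weighting are hypothetical choices for illustration; SmoothKNN() uses its own weighting scheme and neighbor search, and the dense distance matrix used here only scales to small datasets.

```r
# Hypothetical kNN smoothing (not UCell's SmoothKNN): weighted average of each
# cell's score with the scores of its k nearest neighbours in `embedding`.
knn_smooth <- function(scores, embedding, k = 10) {
  embedding <- as.matrix(embedding)
  k <- min(k, length(scores) - 1)        # cannot use more neighbours than cells
  d <- as.matrix(dist(embedding))        # all pairwise Euclidean distances
  vapply(seq_along(scores), function(i) {
    nn <- order(d[i, ])[seq_len(k + 1)]  # the cell itself (distance 0) + k neighbours
    w <- 1 / (1 + d[i, nn])              # assumed weights: closer cells count more
    sum(w * scores[nn]) / sum(w)
  }, numeric(1))
}

# Usage sketch with the demo object (not run here):
# emb <- Embeddings(seurat.object, reduction = "pca")
# nk_smooth <- knn_smooth(seurat.object$NK, emb, k = 10)
```

A cell with a zero score but high-scoring neighbors receives a positive smoothed value, which is how smoothing partially compensates for dropouts.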

Smoothing (or imputation) was designed for UCell scores, but it can be applied to any other metadata. For instance, we can perform kNN smoothing of gene expression by copying the normalized expression values of genes of interest into metadata columns:

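genes <- c("CD2","CSF1R")

# Normalized expression for the genes of interest (genes x cells)
# (with Seurat v5, use layer="data" instead of slot="data")
expr <- Seurat::GetAssayData(seurat.object, assay="RNA", slot="data")[genes,]
# Transpose to cells x genes, as required for metadata columns
expr <- t(as.matrix(expr))
seurat.object <- AddMetaData(seurat.object, metadata=as.data.frame(expr))
# Smooth expression values over the 20 nearest neighbors in PCA space
seurat.object <- SmoothKNN(seurat.object, signature.names=genes,
                           reduction="pca", k=20, suffix = "_smooth")

FeaturePlot(seurat.object, reduction = "umap",
            features = c("CD2","CD2_smooth","CSF1R","CSF1R_smooth"))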
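The transpose in the code above matters because GetAssayData() returns a genes × cells matrix, whereas metadata holds one row per cell. A toy base-R example (matrix values and cell names are made up) shows the reshaping. We assume AddMetaData() matches rows to cells by their row names, which is why the cell barcodes should be kept as row names.

```r
# Toy stand-in for GetAssayData(...)[genes, ]: genes as rows, cells as columns
expr <- matrix(c(0.0, 1.2, 2.5,
                 3.1, 0.0, 0.4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("CD2", "CSF1R"), c("cellA", "cellB", "cellC")))

# Transpose so that cells are rows and genes are columns, as in metadata
meta <- as.data.frame(t(expr))

# Row names must be the cell names for the values to land on the right cells
stopifnot(identical(rownames(meta), colnames(expr)))
meta
```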

6 Resources

Please report any issues at the UCell GitHub repository.

More demos are available on the Bioconductor landing page and at the UCell demo repository.

If you find UCell useful, you may also check out the scGate package, which relies on UCell scores to automatically purify populations of interest based on gene signatures.

See also SignatuR for easy storing and retrieval of gene signatures.

7 References

  • Andreatta, M., Carmona, S. J. (2021) UCell: Robust and scalable single-cell gene signature scoring. Computational and Structural Biotechnology Journal.
  • Zilionis, R., Engblom, C., …, Klein, A. M. (2019) Single-Cell Transcriptomics of Human and Mouse Lung Cancers Reveals Conserved Myeloid Populations across Individuals and Species. Immunity.
  • Hao, Y., et al. (2021) Integrated analysis of multimodal single-cell data. Cell.

8 Session Info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] patchwork_1.1.2             ggplot2_3.3.6              
##  [3] sp_1.5-0                    SeuratObject_4.1.2         
##  [5] Seurat_4.2.0                UCell_2.2.0                
##  [7] scRNAseq_2.11.0             SingleCellExperiment_1.20.0
##  [9] SummarizedExperiment_1.28.0 Biobase_2.58.0             
## [11] GenomicRanges_1.50.0        GenomeInfoDb_1.34.0        
## [13] IRanges_2.32.0              S4Vectors_0.36.0           
## [15] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
## [17] matrixStats_0.62.0          BiocStyle_2.26.0           
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                    reticulate_1.26              
##   [3] tidyselect_1.2.0              RSQLite_2.2.18               
##   [5] AnnotationDbi_1.60.0          htmlwidgets_1.5.4            
##   [7] grid_4.2.1                    BiocParallel_1.32.0          
##   [9] Rtsne_0.16                    munsell_0.5.0                
##  [11] codetools_0.2-18              ica_1.0-3                    
##  [13] future_1.28.0                 miniUI_0.1.1.1               
##  [15] withr_2.5.0                   spatstat.random_3.0-0        
##  [17] colorspace_2.0-3              progressr_0.11.0             
##  [19] filelock_1.0.2                highr_0.9                    
##  [21] knitr_1.40                    ROCR_1.0-11                  
##  [23] tensor_1.5                    listenv_0.8.0                
##  [25] labeling_0.4.2                GenomeInfoDbData_1.2.9       
##  [27] polyclip_1.10-4               farver_2.1.1                 
##  [29] bit64_4.0.5                   parallelly_1.32.1            
##  [31] vctrs_0.5.0                   generics_0.1.3               
##  [33] xfun_0.34                     BiocFileCache_2.6.0          
##  [35] R6_2.5.1                      AnnotationFilter_1.22.0      
##  [37] bitops_1.0-7                  spatstat.utils_3.0-1         
##  [39] cachem_1.0.6                  DelayedArray_0.24.0          
##  [41] assertthat_0.2.1              promises_1.2.0.1             
##  [43] BiocIO_1.8.0                  scales_1.2.1                 
##  [45] rgeos_0.5-9                   gtable_0.3.1                 
##  [47] globals_0.16.1                goftest_1.2-3                
##  [49] ensembldb_2.22.0              rlang_1.0.6                  
##  [51] splines_4.2.1                 rtracklayer_1.58.0           
##  [53] lazyeval_0.2.2                spatstat.geom_3.0-3          
##  [55] BiocManager_1.30.19           yaml_2.3.6                   
##  [57] reshape2_1.4.4                abind_1.4-5                  
##  [59] GenomicFeatures_1.50.0        httpuv_1.6.6                 
##  [61] tools_4.2.1                   bookdown_0.29                
##  [63] ellipsis_0.3.2                spatstat.core_2.4-4          
##  [65] jquerylib_0.1.4               RColorBrewer_1.1-3           
##  [67] ggridges_0.5.4                Rcpp_1.0.9                   
##  [69] plyr_1.8.7                    progress_1.2.2               
##  [71] zlibbioc_1.44.0               purrr_0.3.5                  
##  [73] RCurl_1.98-1.9                prettyunits_1.1.1            
##  [75] rpart_4.1.19                  deldir_1.0-6                 
##  [77] pbapply_1.5-0                 cowplot_1.1.1                
##  [79] zoo_1.8-11                    ggrepel_0.9.1                
##  [81] cluster_2.1.4                 magrittr_2.0.3               
##  [83] magick_2.7.3                  data.table_1.14.4            
##  [85] scattermore_0.8               lmtest_0.9-40                
##  [87] RANN_2.6.1                    ProtGenerics_1.30.0          
##  [89] fitdistrplus_1.1-8            hms_1.1.2                    
##  [91] mime_0.12                     evaluate_0.17                
##  [93] xtable_1.8-4                  XML_3.99-0.12                
##  [95] gridExtra_2.3                 compiler_4.2.1               
##  [97] biomaRt_2.54.0                tibble_3.1.8                 
##  [99] KernSmooth_2.23-20            crayon_1.5.2                 
## [101] htmltools_0.5.3               mgcv_1.8-41                  
## [103] later_1.3.0                   tidyr_1.2.1                  
## [105] DBI_1.1.3                     ExperimentHub_2.6.0          
## [107] dbplyr_2.2.1                  MASS_7.3-58.1                
## [109] rappdirs_0.3.3                Matrix_1.5-1                 
## [111] cli_3.4.1                     parallel_4.2.1               
## [113] igraph_1.3.5                  pkgconfig_2.0.3              
## [115] GenomicAlignments_1.34.0      plotly_4.10.0                
## [117] spatstat.sparse_3.0-0         xml2_1.3.3                   
## [119] bslib_0.4.0                   XVector_0.38.0               
## [121] stringr_1.4.1                 digest_0.6.30                
## [123] sctransform_0.3.5             RcppAnnoy_0.0.20             
## [125] spatstat.data_3.0-0           Biostrings_2.66.0            
## [127] rmarkdown_2.17                leiden_0.4.3                 
## [129] uwot_0.1.14                   restfulr_0.0.15              
## [131] curl_4.3.3                    shiny_1.7.3                  
## [133] Rsamtools_2.14.0              rjson_0.2.21                 
## [135] nlme_3.1-160                  lifecycle_1.0.3              
## [137] jsonlite_1.8.3                BiocNeighbors_1.16.0         
## [139] viridisLite_0.4.1             fansi_1.0.3                  
## [141] pillar_1.8.1                  lattice_0.20-45              
## [143] KEGGREST_1.38.0               fastmap_1.1.0                
## [145] httr_1.4.4                    survival_3.4-0               
## [147] interactiveDisplayBase_1.36.0 glue_1.6.2                   
## [149] png_0.1-7                     BiocVersion_3.16.0           
## [151] bit_4.0.4                     stringi_1.7.8                
## [153] sass_0.4.2                    blob_1.2.3                   
## [155] AnnotationHub_3.6.0           memoise_2.0.1                
## [157] dplyr_1.0.10                  irlba_2.3.5.1                
## [159] future.apply_1.9.1