
Here, we demonstrate a grid search of clustering parameters with a mouse hippocampus VeraFISH dataset. BANKSY currently provides four algorithms for clustering the BANKSY matrix with clusterBanksy: Leiden (default), Louvain, k-means, and model-based clustering. In this vignette, we run only Leiden clustering. See ?clusterBanksy for more details on the parameters for different clustering methods.

1 Loading the data

The dataset comprises gene expression for 10,944 cells and 120 genes in 2 spatial dimensions. See ?Banksy::hippocampus for more details.

# Load libs
library(Banksy)

library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)

library(scater)
library(cowplot)
library(ggplot2)

# Load data
data(hippocampus)
gcm <- hippocampus$expression
locs <- as.matrix(hippocampus$locations)

Here, gcm is a gene by cell matrix, and locs is a matrix specifying the coordinates of the centroid for each cell.

head(gcm[,1:5])
#>         cell_1276 cell_8890 cell_691 cell_396 cell_9818
#> Sparcl1        45         0       11       22         0
#> Slc1a2         17         0        6        5         0
#> Map            10         0       12       16         0
#> Sqstm1         26         0        0        2         0
#> Atp1a2          0         0        4        3         0
#> Tnc             0         0        0        0         0
head(locs)
#>                 sdimx    sdimy
#> cell_1276  -13372.899 15776.37
#> cell_8890    8941.101 15866.37
#> cell_691   -14882.899 15896.37
#> cell_396   -15492.899 15835.37
#> cell_9818   11308.101 15846.37
#> cell_11310  14894.101 15810.37

Initialize a SpatialExperiment object and perform basic quality control. We keep cells whose total transcript count lies strictly between the 5th and 98th percentiles:

se <- SpatialExperiment(assay = list(counts = gcm), spatialCoords = locs)
colData(se) <- cbind(colData(se), spatialCoords(se))

# QC based on total counts
qcstats <- perCellQCMetrics(se)
thres <- quantile(qcstats$total, c(0.05, 0.98))
keep <- (qcstats$total > thres[1]) & (qcstats$total < thres[2])
se <- se[, keep]
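
To make the filter concrete, here is a minimal sketch on simulated totals. The vector `totals` and the `_toy` names are made up for illustration and are not the hippocampus data. It applies the same strict inequalities as the QC step above.

```r
# Hypothetical per-cell total counts; not the hippocampus data
set.seed(42)
totals <- rpois(1000, lambda = 200)

# Same rule as above: strictly between the 5th and 98th percentiles
thres_toy <- quantile(totals, c(0.05, 0.98))
keep_toy <- (totals > thres_toy[1]) & (totals < thres_toy[2])

# Fraction of cells retained; cells tied with a threshold are dropped
# because both inequalities are strict
mean(keep_toy)
```

With discrete counts, ties at the thresholds mean the retained fraction can fall slightly below the nominal 93%.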

Next, perform normalization of the data.

# Normalization to mean library size
se <- computeLibraryFactors(se)
aname <- "normcounts"
assay(se, aname) <- normalizeCounts(se, log = FALSE)
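
As a rough illustration of what this normalization does, the sketch below reproduces library size scaling in base R on a made-up count matrix (`counts_toy` is hypothetical). It assumes the size factor is each cell's total count divided by the mean total, and that the normalized value is the count divided by that factor, with no log transform.

```r
# Hypothetical 3 genes x 4 cells count matrix
counts_toy <- matrix(c(10,  0,  5,
                       20,  4,  6,
                        1,  1,  1,
                       30, 10, 20),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Library size factors: per-cell totals rescaled to have mean 1
lib_size <- colSums(counts_toy)
size_factor <- lib_size / mean(lib_size)

# Divide each cell (column) by its size factor; no log transform
norm_toy <- sweep(counts_toy, 2, size_factor, "/")

# Every cell now has the same total, equal to the mean library size
colSums(norm_toy)
```

After scaling, differences between cells reflect relative expression rather than how many transcripts were detected per cell.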

2 Parameters

BANKSY has a few key parameters. We describe these below.

For characterising neighborhoods, BANKSY computes the weighted neighborhood mean (H_0) and the azimuthal Gabor filter (H_1), which estimates gene expression gradients. Setting compute_agf=TRUE in computeBanksy computes both H_0 and H_1.

k_geom specifies the number of neighbors used to compute each H_m for m=0,1. If a single value is specified, the same k_geom will be used for each feature matrix. Alternatively, one value of k_geom can be provided per feature matrix. Here, we use k_geom[1]=15 and k_geom[2]=30 for H_0 and H_1 respectively, so that more neighbors are used to compute gradients.

We compute the neighborhood feature matrices using normalized expression (normcounts in the se object).

k_geom <- c(15, 30)
se <- computeBanksy(se, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
#> Computing neighbors...
#> Spatial mode is kNN_median
#> Parameters: k_geom=15
#> Done
#> Computing neighbors...
#> Spatial mode is kNN_median
#> Parameters: k_geom=30
#> Done
#> Computing harmonic m = 0
#> Using 15 neighbors
#> Done
#> Computing harmonic m = 1
#> Using 30 neighbors
#> Centering
#> Done
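
The role of k_geom can be illustrated with a simplified neighborhood mean in base R. This sketch is not BANKSY's implementation: it uses an unweighted mean over the k nearest neighbors, whereas BANKSY applies its own neighbor weighting, and it does not cover the AGF. All data and names (`coords`, `expr`, `k_toy`) are hypothetical.

```r
# Hypothetical toy data: 6 cells on a line, 2 genes
coords <- cbind(x = c(0, 1, 2, 10, 11, 12), y = 0)
expr <- rbind(geneA = c(1, 1, 1, 5, 5, 5),
              geneB = c(0, 2, 4, 0, 2, 4))
k_toy <- 2

# Pairwise distances; exclude each cell from its own neighbor set
d <- as.matrix(dist(coords))
diag(d) <- Inf

# Indices of the k nearest neighbors of each cell (one column per cell)
nn <- apply(d, 2, function(col) order(col)[seq_len(k_toy)])

# Unweighted neighborhood mean per gene and cell
nbr_mean <- sapply(seq_len(ncol(expr)), function(i) {
    rowMeans(expr[, nn[, i], drop = FALSE])
})
nbr_mean
```

In this toy example geneA is constant within each spatial group, so its neighborhood mean equals the cell's own value, while geneB is averaged over each cell's two nearest neighbors.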

computeBanksy populates the assays slot with H_0 and H_1 in this instance:

se
#> class: SpatialExperiment 
#> dim: 120 10205 
#> metadata(1): BANKSY_params
#> assays(4): counts normcounts H0 H1
#> rownames(120): Sparcl1 Slc1a2 ... Notch3 Egfr
#> rowData names(0):
#> colnames(10205): cell_1276 cell_691 ... cell_11635 cell_10849
#> colData names(4): sample_id sdimx sdimy sizeFactor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : sdimx sdimy
#> imgData names(1): sample_id

The lambda parameter is a mixing parameter in [0,1] that determines how much spatial information is incorporated for downstream analysis. With smaller values of lambda, BANKSY operates in cell-typing mode, while with larger values of lambda, it operates in zone-finding mode. As a starting point, we recommend lambda=0.2 for cell-typing and lambda=0.8 for zone-finding. Here, we run lambda=0, which corresponds to non-spatial clustering, and lambda=0.2 for spatially-informed cell-typing. We compute PCs with and without the AGF (H_1).

lambda <- c(0, 0.2)
se <- runBanksyPCA(se, use_agf = c(FALSE, TRUE), lambda = lambda, seed = 1000)
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
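
To build intuition for lambda, the sketch below mixes a cell's own expression with a neighborhood feature matrix. It assumes square-root weights, sqrt(1 - lambda) for own expression and sqrt(lambda) for the neighborhood mean. That weighting is an assumption made here for illustration, not something stated in this vignette, and the AGF term is left out. The helper names `mix_features` and `sq_dist` are hypothetical.

```r
# Hypothetical own-expression and neighborhood-mean matrices (3 genes x 4 cells)
set.seed(1)
own <- matrix(rnorm(12), nrow = 3)
nbr <- matrix(rnorm(12), nrow = 3)

# Stack the two blocks with assumed square-root weights
mix_features <- function(own, nbr, lambda) {
    stopifnot(lambda >= 0, lambda <= 1)
    rbind(sqrt(1 - lambda) * own, sqrt(lambda) * nbr)
}

# Squared Euclidean distance between cells i and j (columns of m)
sq_dist <- function(m, i, j) sum((m[, i] - m[, j])^2)

mixed <- mix_features(own, nbr, lambda = 0.2)

# Under this weighting, the squared distance in the mixed space is a
# lambda-weighted blend of the two squared distances
c(mixed = sq_dist(mixed, 1, 2),
  blend = 0.8 * sq_dist(own, 1, 2) + 0.2 * sq_dist(nbr, 1, 2))
```

Under this assumption, lambda=0 zeroes out the neighborhood block entirely, which matches the description of lambda=0 as non-spatial clustering.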

runBanksyPCA populates the reducedDims slot with one entry for each combination of use_agf and lambda provided.

reducedDimNames(se)
#> [1] "PCA_M0_lam0"   "PCA_M0_lam0.2" "PCA_M1_lam0"   "PCA_M1_lam0.2"

Next, we cluster the BANKSY embedding with Leiden graph-based clustering. This admits two parameters: k_neighbors and resolution. k_neighbors sets the number of nearest neighbors used to construct the shared nearest neighbors graph. Leiden clustering is then performed on the resulting graph at the resolution given by resolution. For reproducibility, we set a seed for each parameter combination.

k <- 50
res <- 1
se <- clusterBanksy(se, use_agf = c(FALSE, TRUE), lambda = lambda, k_neighbors = k, resolution = res, seed = 1000)
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
#> Using seed=1000
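
The idea behind a shared nearest neighbors graph can be sketched in base R. The example below weights each pair of cells by the Jaccard overlap of their k-nearest-neighbor sets. This is one common choice and is an assumption here; clusterBanksy may weight edges differently, and the Leiden step itself is not shown. The embedding `emb` is made up.

```r
# Hypothetical embedding: 8 cells in 2 dimensions forming two tight groups
emb <- rbind(cbind(c(0, 0.1, 0.2, 0.1), c(0, 0.1, 0, -0.1)),
             cbind(c(5, 5.1, 5.2, 5.1), c(5, 5.1, 5, 4.9)))
k_nn <- 3

# Pairwise distances; a cell is not its own neighbor
d <- as.matrix(dist(emb))
diag(d) <- Inf

# k nearest neighbors of each cell, as a list of index vectors
knn <- lapply(seq_len(nrow(emb)), function(i) order(d[i, ])[seq_len(k_nn)])

# SNN weight: Jaccard overlap of the two neighbor sets,
# where each set also includes the cell itself
n <- nrow(emb)
snn <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
        a <- c(i, knn[[i]])
        b <- c(j, knn[[j]])
        snn[i, j] <- snn[j, i] <- length(intersect(a, b)) / length(union(a, b))
    }
}
round(snn, 2)
```

Cells within the same group share all of their neighbors and get weight 1, while cells in different groups share none and get weight 0. A community detection method such as Leiden then partitions this weighted graph.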

clusterBanksy populates colData(se) with cluster labels:

colnames(colData(se))
#> [1] "sample_id"                "sdimx"                   
#> [3] "sdimy"                    "sizeFactor"              
#> [5] "clust_M0_lam0_k50_res1"   "clust_M0_lam0.2_k50_res1"
#> [7] "clust_M1_lam0_k50_res1"   "clust_M1_lam0.2_k50_res1"
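
When running a grid of parameters, it can help to construct the expected column names programmatically, for example to pull out one run's labels. The pattern below is inferred from the column names printed above and is not taken from Banksy's internal code, so treat it as an assumption.

```r
# Parameter grid matching the runs above
use_agf <- c(FALSE, TRUE)
lambda <- c(0, 0.2)
k <- 50
res <- 1

# expand.grid varies its first argument fastest: lambda within each M
grid <- expand.grid(lambda = lambda, M = as.integer(use_agf))

# Name pattern inferred from the printed colData names
clust_names <- sprintf("clust_M%d_lam%s_k%s_res%s", grid$M, grid$lambda, k, res)
clust_names
```

A name built this way can then be used to index colData(se), provided it matches one of the columns listed above.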

To compare clustering runs visually, different runs can be relabeled to minimise their differences with connectClusters:

se <- connectClusters(se)
#> clust_M1_lam0_k50_res1 --> clust_M0_lam0_k50_res1
#> clust_M0_lam0.2_k50_res1 --> clust_M1_lam0_k50_res1
#> clust_M1_lam0.2_k50_res1 --> clust_M0_lam0.2_k50_res1
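
The relabeling idea can be illustrated with a simple greedy matching on the contingency table of two label vectors. This is a stand-in for illustration only: the matching procedure used by connectClusters is not described in this vignette and may differ. The function `relabel_greedy` and the label vectors are hypothetical.

```r
# Hypothetical reference labels and a second run with shuffled label names
ref <- c(1, 1, 1, 2, 2, 3, 3, 3)
new <- c(2, 2, 2, 3, 3, 1, 1, 3)

relabel_greedy <- function(new, ref) {
    tab <- table(new = new, ref = ref)   # overlap counts between labels
    mapping <- setNames(rep(NA_character_, nrow(tab)), rownames(tab))
    while (any(tab > 0)) {
        # Pick the pair of labels with the largest remaining overlap
        idx <- which(tab == max(tab), arr.ind = TRUE)[1, ]
        mapping[rownames(tab)[idx[1]]] <- colnames(tab)[idx[2]]
        tab[idx[1], ] <- -1              # this new label is now assigned
        tab[, idx[2]] <- -1              # this reference label is now taken
    }
    # Labels left unmatched get a prefixed name to avoid collisions
    unmatched <- is.na(mapping)
    mapping[unmatched] <- paste0("new_", names(mapping)[unmatched])
    unname(mapping[as.character(new)])
}

relabeled <- relabel_greedy(new, ref)
mean(relabeled == as.character(ref))   # fraction of cells with matching labels
```

After relabeling, matching clusters share the same label, so plots of the two runs use consistent colours and differences between runs are easier to see.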

Visualise spatial coordinates with cluster labels.

cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
cplots <- lapply(cnames, function(cnm) {
    plotColData(se, x = "sdimx", y = "sdimy", point_size = 0.1, colour_by = cnm) +
        coord_equal() +
        labs(title = cnm) +
        theme(legend.title = element_blank()) +
        guides(colour = guide_legend(override.aes = list(size = 2)))
})

plot_grid(plotlist = cplots, ncol = 2)

Compare all cluster outputs with compareClusters. This function computes pairwise comparison metrics between the clustering runs in colData(se), based on the adjusted Rand index (ARI):

compareClusters(se, func = "ARI")
#>                          clust_M0_lam0_k50_res1 clust_M0_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                     0.67
#> clust_M0_lam0.2_k50_res1                  0.670                     1.00
#> clust_M1_lam0_k50_res1                    1.000                     0.67
#> clust_M1_lam0.2_k50_res1                  0.747                     0.87
#>                          clust_M1_lam0_k50_res1 clust_M1_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.747
#> clust_M0_lam0.2_k50_res1                  0.670                    0.870
#> clust_M1_lam0_k50_res1                    1.000                    0.747
#> clust_M1_lam0.2_k50_res1                  0.747                    1.000

or normalized mutual information (NMI):

compareClusters(se, func = "NMI")
#>                          clust_M0_lam0_k50_res1 clust_M0_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.741
#> clust_M0_lam0.2_k50_res1                  0.741                    1.000
#> clust_M1_lam0_k50_res1                    1.000                    0.741
#> clust_M1_lam0.2_k50_res1                  0.782                    0.915
#>                          clust_M1_lam0_k50_res1 clust_M1_lam0.2_k50_res1
#> clust_M0_lam0_k50_res1                    1.000                    0.782
#> clust_M0_lam0.2_k50_res1                  0.741                    0.915
#> clust_M1_lam0_k50_res1                    1.000                    0.782
#> clust_M1_lam0.2_k50_res1                  0.782                    1.000

See ?compareClusters for the full list of comparison measures.
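
For readers unfamiliar with the adjusted Rand index, the following base R sketch computes it from the contingency table of two label vectors using the standard pair-counting formula. The function name `adjusted_rand` is hypothetical and this is not the code used by compareClusters.

```r
# Adjusted Rand index from the contingency table of two labelings
adjusted_rand <- function(x, y) {
    stopifnot(length(x) == length(y))
    tab <- table(x, y)
    sum_ij <- sum(choose(tab, 2))            # pairs together in both labelings
    sum_a <- sum(choose(rowSums(tab), 2))    # pairs together in x
    sum_b <- sum(choose(colSums(tab), 2))    # pairs together in y
    expected <- sum_a * sum_b / choose(length(x), 2)
    (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)
}

a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
b <- c(3, 3, 3, 1, 1, 1, 2, 2, 2)   # same partition, different label names

adjusted_rand(a, b)                           # identical partitions
adjusted_rand(c(1, 1, 2, 2), c(1, 2, 1, 2))   # partitions that never agree on a pair
```

The index depends only on which cells are grouped together, not on the label names, which is why the two lambda=0 runs in the tables above can score 1 against each other.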

3 Session information

Vignette runtime:

#> Time difference of 51.71264 secs
sessionInfo()
#> R Under development (unstable) (2024-01-16 r85808)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ExperimentHub_2.11.1        AnnotationHub_3.11.1       
#>  [3] BiocFileCache_2.11.1        dbplyr_2.4.0               
#>  [5] spatialLIBD_1.15.1          ggspavis_1.9.0             
#>  [7] cowplot_1.1.3               scater_1.31.2              
#>  [9] ggplot2_3.4.4               harmony_1.2.0              
#> [11] Rcpp_1.0.12                 data.table_1.15.0          
#> [13] scran_1.31.2                scuttle_1.13.0             
#> [15] Seurat_5.0.1                SeuratObject_5.0.1         
#> [17] sp_2.1-3                    SpatialExperiment_1.13.0   
#> [19] SingleCellExperiment_1.25.0 SummarizedExperiment_1.33.3
#> [21] Biobase_2.63.0              GenomicRanges_1.55.3       
#> [23] GenomeInfoDb_1.39.6         IRanges_2.37.1             
#> [25] S4Vectors_0.41.3            BiocGenerics_0.49.1        
#> [27] MatrixGenerics_1.15.0       matrixStats_1.2.0          
#> [29] Banksy_0.99.7               BiocStyle_2.31.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] spatstat.sparse_3.0-3     bitops_1.0-7             
#>   [3] doParallel_1.0.17         httr_1.4.7               
#>   [5] RColorBrewer_1.1-3        tools_4.4.0              
#>   [7] sctransform_0.4.1         DT_0.32                  
#>   [9] utf8_1.2.4                R6_2.5.1                 
#>  [11] lazyeval_0.2.2            uwot_0.1.16              
#>  [13] withr_3.0.0               gridExtra_2.3            
#>  [15] progressr_0.14.0          cli_3.6.2                
#>  [17] spatstat.explore_3.2-6    fastDummies_1.7.3        
#>  [19] labeling_0.4.3            sass_0.4.8               
#>  [21] spatstat.data_3.0-4       ggridges_0.5.6           
#>  [23] pbapply_1.7-2             Rsamtools_2.19.3         
#>  [25] dbscan_1.1-12             aricode_1.0.3            
#>  [27] dichromat_2.0-0.1         sessioninfo_1.2.2        
#>  [29] parallelly_1.37.0         attempt_0.3.1            
#>  [31] maps_3.4.2                limma_3.59.3             
#>  [33] pals_1.8                  RSQLite_2.3.5            
#>  [35] BiocIO_1.13.0             generics_0.1.3           
#>  [37] ica_1.0-3                 spatstat.random_3.2-2    
#>  [39] dplyr_1.1.4               Matrix_1.6-5             
#>  [41] ggbeeswarm_0.7.2          fansi_1.0.6              
#>  [43] abind_1.4-5               lifecycle_1.0.4          
#>  [45] yaml_2.3.8                edgeR_4.1.17             
#>  [47] SparseArray_1.3.4         Rtsne_0.17               
#>  [49] paletteer_1.6.0           grid_4.4.0               
#>  [51] blob_1.2.4                promises_1.2.1           
#>  [53] dqrng_0.3.2               crayon_1.5.2             
#>  [55] miniUI_0.1.1.1            lattice_0.22-5           
#>  [57] beachmat_2.19.1           mapproj_1.2.11           
#>  [59] KEGGREST_1.43.0           magick_2.8.3             
#>  [61] pillar_1.9.0              knitr_1.45               
#>  [63] metapod_1.11.1            rjson_0.2.21             
#>  [65] future.apply_1.11.1       codetools_0.2-19         
#>  [67] leiden_0.4.3.1            glue_1.7.0               
#>  [69] vctrs_0.6.5               png_0.1-8                
#>  [71] spam_2.10-0               gtable_0.3.4             
#>  [73] rematch2_2.1.2            cachem_1.0.8             
#>  [75] xfun_0.42                 S4Arrays_1.3.3           
#>  [77] mime_0.12                 ggside_0.2.3             
#>  [79] survival_3.5-8            RcppHungarian_0.3        
#>  [81] iterators_1.0.14          fields_15.2              
#>  [83] statmod_1.5.0             bluster_1.13.0           
#>  [85] ellipsis_0.3.2            fitdistrplus_1.1-11      
#>  [87] ROCR_1.0-11               nlme_3.1-164             
#>  [89] bit64_4.0.5               filelock_1.0.3           
#>  [91] RcppAnnoy_0.0.22          bslib_0.6.1              
#>  [93] irlba_2.3.5.1             vipor_0.4.7              
#>  [95] KernSmooth_2.23-22        colorspace_2.1-0         
#>  [97] DBI_1.2.2                 tidyselect_1.2.0         
#>  [99] bit_4.0.5                 compiler_4.4.0           
#> [101] curl_5.2.0                BiocNeighbors_1.21.2     
#> [103] DelayedArray_0.29.4       plotly_4.10.4            
#> [105] rtracklayer_1.63.0        bookdown_0.37            
#> [107] scales_1.3.0              lmtest_0.9-40            
#> [109] rappdirs_0.3.3            stringr_1.5.1            
#> [111] digest_0.6.34             goftest_1.2-3            
#> [113] spatstat.utils_3.0-4      rmarkdown_2.25           
#> [115] benchmarkmeData_1.0.4     RhpcBLASctl_0.23-42      
#> [117] XVector_0.43.1            htmltools_0.5.7          
#> [119] pkgconfig_2.0.3           sparseMatrixStats_1.15.0 
#> [121] highr_0.10                fastmap_1.1.1            
#> [123] rlang_1.1.3               htmlwidgets_1.6.4        
#> [125] shiny_1.8.0               DelayedMatrixStats_1.25.1
#> [127] farver_2.1.1              jquerylib_0.1.4          
#> [129] zoo_1.8-12                jsonlite_1.8.8           
#> [131] BiocParallel_1.37.0       mclust_6.0.1             
#> [133] config_0.3.2              BiocSingular_1.19.0      
#> [135] RCurl_1.98-1.14           magrittr_2.0.3           
#> [137] GenomeInfoDbData_1.2.11   dotCall64_1.1-1          
#> [139] patchwork_1.2.0           munsell_0.5.0            
#> [141] viridis_0.6.5             reticulate_1.35.0        
#> [143] leidenAlg_1.1.2           stringi_1.8.3            
#> [145] zlibbioc_1.49.0           MASS_7.3-60.2            
#> [147] plyr_1.8.9                parallel_4.4.0           
#> [149] listenv_0.9.1             ggrepel_0.9.5            
#> [151] deldir_2.0-2              Biostrings_2.71.2        
#> [153] sccore_1.0.4              splines_4.4.0            
#> [155] tensor_1.5                locfit_1.5-9.8           
#> [157] igraph_2.0.2              spatstat.geom_3.2-8      
#> [159] RcppHNSW_0.6.0            reshape2_1.4.4           
#> [161] ScaledMatrix_1.11.0       XML_3.99-0.16.1          
#> [163] BiocVersion_3.19.1        evaluate_0.23            
#> [165] golem_0.4.1               BiocManager_1.30.22      
#> [167] foreach_1.5.2             httpuv_1.6.14            
#> [169] RANN_2.6.1                tidyr_1.3.1              
#> [171] purrr_1.0.2               polyclip_1.10-6          
#> [173] benchmarkme_1.0.8         future_1.33.1            
#> [175] scattermore_1.2           rsvd_1.0.5               
#> [177] xtable_1.8-4              restfulr_0.0.15          
#> [179] RSpectra_0.16-1           later_1.3.2              
#> [181] viridisLite_0.4.2         tibble_3.2.1             
#> [183] GenomicAlignments_1.39.4  AnnotationDbi_1.65.2     
#> [185] memoise_2.0.1             beeswarm_0.4.0           
#> [187] cluster_2.1.6             shinyWidgets_0.8.1       
#> [189] globals_0.16.2