1 Introduction

Cell-Type-specific Spatially Variable gene detection, or CTSV, is an R package for identifying cell-type-specific spatially variable genes in bulk sptial transcriptomics data. In this Vignette, we will introduce a standard workflow of CTSV. By utilizing single-cell RNA sequencing data as reference, we can first use existing deconvolution methods to obtain cell type proportions for each spot. Subsequently, we take the cell type proportions, location coordinates and spatial expression data as input to CTSV.

2 Usage guide

2.1 Install CTSV

CTSV is an R package available in the Bioconductor repository. It requires installing the R open source statistical programming language, which can be accessed on any operating system from CRAN. Next, you can install CTSV by using the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
  }

BiocManager::install("CTSV", version = "devel")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

If there are any issues with the installation procedure or package features, the best place would be to commit an issue at the GitHub repository.

2.2 Load example data

In order to run RCTD, the first step is to get cell-type proportions. There are some deconvolution methods such as RCTD, SPOTlight, SpatialDWLS and CARD. We provide an example data including the observed raw count bulk ST data, the location coordinate matrix, the cell-type proportion matrix and the true SV gene patterns.

suppressPackageStartupMessages(library(CTSV))
suppressPackageStartupMessages(library(SpatialExperiment))
data("CTSVexample_data", package="CTSV")
spe <- CTSVexample_data[[1]]
W <- CTSVexample_data[[2]]
gamma_true <- CTSVexample_data[[3]]
Y <- assay(spe)
# dimension of bulk ST data
dim(Y)
#> [1]  20 100
# dimension of cell-type proportion matrix:
dim(W)
#> [1] 100   2
# SV genes in each cell type:
colnames(Y)[which(gamma_true[,1] == 1)]
#> [1] "spot11" "spot12" "spot13"
colnames(Y)[which(gamma_true[,2] == 1)]
#> [1] "spot14" "spot15" "spot16"
# Number of SV genes at the aggregated level:
sum(rowSums(gamma_true)>0)
#> [1] 6

2.3 Running CTSV

We are now ready to run CTSV on the bulk ST data using ctsv function.

  • spe is a SpatialExperiment class object.
  • W is the cell-type-specific matrix with \(n\times K\) dimensions, where \(K\) is the number of cell types.
  • num_core: for parallel processing, the number of cores used. If set to 1, parallel processing is not used. The system will additionally be checked for number of available cores. Note, that we recommend setting num_core to at least 4 or 8 to improve efficiency.
  • BPPARAM: Optional additional argument for parallelization. The default is NULL, in which case num_core will be used.
result <- CTSV(spe,W,num_core = 8)

2.4 CTSV results

The results of CTSV are located in a list.

  • pval, combined p-values, a \(G\times 2K\) matrix.
  • qval stores adjusted q-values of the combined p-values, it is a \(G \times 2K\) matrix.
# View on q-value matrix
head(result$qval)
#>             [,1]      [,2]      [,3]      [,4]
#> gene1 0.19608203 0.0551131 0.4798779 0.1473743
#> gene2 0.48138007 0.1340526 0.5496413 0.1111610
#> gene3 0.51617289 0.4432612 0.5396344 0.4848866
#> gene4 0.32941836 0.4644302 0.3294184 0.5622568
#> gene5 0.06301157 0.4137664 0.5202114 0.1050800
#> gene6 0.13405257 0.5223837 0.3294184 0.3078812

Then we want to extra SV genes with an FDR level at 0.05 using svGene function. We use the q-value matrix qval returned by the CTSV function and a threshold of 0.05 as input. The output of the svGene is a list containing two elements, the first of which is a \(G\times 2K\) 0-1 matrix indicating SV genes in each cell type and axis, denoted as SV. The second element is a list with names of SV genes in each cell type, denoted as SVGene.

re <- svGene(result$qval,0.05)
# SV genes in each cell type:
re$SVGene
#> [[1]]
#> [1] "gene8"  "gene11" "gene12" "gene13"
#> 
#> [[2]]
#> [1] "gene7"  "gene14" "gene15" "gene16" "gene17" "gene19"

3 Session information

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SpatialExperiment_1.8.0     SingleCellExperiment_1.20.0
#>  [3] SummarizedExperiment_1.28.0 Biobase_2.58.0             
#>  [5] GenomicRanges_1.50.0        GenomeInfoDb_1.34.0        
#>  [7] IRanges_2.32.0              S4Vectors_0.36.0           
#>  [9] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
#> [11] matrixStats_0.62.0          CTSV_1.0.0                 
#> [13] BiocStyle_2.26.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] bitops_1.0-7              tools_4.2.1              
#>  [3] bslib_0.4.0               utf8_1.2.2               
#>  [5] R6_2.5.1                  HDF5Array_1.26.0         
#>  [7] DBI_1.1.3                 colorspace_2.0-3         
#>  [9] rhdf5filters_1.10.0       tidyselect_1.2.0         
#> [11] compiler_4.2.1            cli_3.4.1                
#> [13] DelayedArray_0.24.0       bookdown_0.29            
#> [15] sass_0.4.2                scales_1.2.1             
#> [17] stringr_1.4.1             digest_0.6.30            
#> [19] rmarkdown_2.17            R.utils_2.12.1           
#> [21] XVector_0.38.0            pscl_1.5.5               
#> [23] pkgconfig_2.0.3           htmltools_0.5.3          
#> [25] sparseMatrixStats_1.10.0  fastmap_1.1.0            
#> [27] limma_3.54.0              rlang_1.0.6              
#> [29] DelayedMatrixStats_1.20.0 jquerylib_0.1.4          
#> [31] generics_0.1.3            jsonlite_1.8.3           
#> [33] BiocParallel_1.32.0       dplyr_1.0.10             
#> [35] R.oo_1.25.0               RCurl_1.98-1.9           
#> [37] magrittr_2.0.3            GenomeInfoDbData_1.2.9   
#> [39] scuttle_1.8.0             Matrix_1.5-1             
#> [41] Rcpp_1.0.9                munsell_0.5.0            
#> [43] Rhdf5lib_1.20.0           fansi_1.0.3              
#> [45] lifecycle_1.0.3           R.methodsS3_1.8.2        
#> [47] stringi_1.7.8             yaml_2.3.6               
#> [49] edgeR_3.40.0              MASS_7.3-58.1            
#> [51] zlibbioc_1.44.0           rhdf5_2.42.0             
#> [53] plyr_1.8.7                qvalue_2.30.0            
#> [55] grid_4.2.1                parallel_4.2.1           
#> [57] dqrng_0.3.0               lattice_0.20-45          
#> [59] beachmat_2.14.0           splines_4.2.1            
#> [61] locfit_1.5-9.6            magick_2.7.3             
#> [63] knitr_1.40                pillar_1.8.1             
#> [65] rjson_0.2.21              reshape2_1.4.4           
#> [67] codetools_0.2-18          glue_1.6.2               
#> [69] evaluate_0.17             BiocManager_1.30.19      
#> [71] vctrs_0.5.0               gtable_0.3.1             
#> [73] assertthat_0.2.1          cachem_1.0.6             
#> [75] ggplot2_3.3.6             xfun_0.34                
#> [77] DropletUtils_1.18.0       tibble_3.1.8