Contents

1 Introduction

nnSVG is a method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data.

The nnSVG method is based on nearest-neighbor Gaussian processes (Datta et al., 2016, Finley et al., 2019) and uses the BRISC algorithm (Saha and Datta, 2018) for model fitting and parameter estimation. nnSVG allows identification and ranking of SVGs with flexible length scales across a tissue slide or within spatial domains defined by covariates. The method scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations.

nnSVG is implemented as an R package within the Bioconductor framework, and is available from Bioconductor.

More details describing the method are available in our preprint, available from bioRxiv.

2 Installation

The following code will install the latest release version of the nnSVG package from Bioconductor. Additional details are shown on the Bioconductor page.

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("nnSVG")

The latest development version can also be installed from the devel version of Bioconductor or from GitHub.

3 Input data format

In the examples below, we assume the input data are provided as a SpatialExperiment Bioconductor object. In this case, the outputs are stored in the rowData of the SpatialExperiment object.

However, the inputs can also be provided as a numeric matrix of normalized and transformed counts (e.g. log-transformed normalized counts) and a numeric matrix of spatial coordinates.

To provide the inputs as numeric matrices, please install the development version of the package from GitHub or the devel version of Bioconductor (which will become the new Bioconductor release version in October 2022).

4 Tutorial

4.1 Run nnSVG

Here we show a short example demonstrating how to run nnSVG.

For faster runtime in this example, we subsample the dataset and run nnSVG on only a small number of genes. For a full analysis, the subsampling step can be skipped.

library(SpatialExperiment)
library(STexampleData)
library(scran)
library(nnSVG)
library(ggplot2)
# load example dataset from STexampleData package
spe <- Visium_humanDLPFC()

dim(spe)
## [1] 33538  4992
# preprocessing steps

# keep only spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
dim(spe)
## [1] 33538  3639
# skip spot-level quality control, since this has been performed previously 
# on this dataset

# filter low-expressed and mitochondrial genes
# using default filtering parameters
spe <- filter_genes(spe)
## Gene filtering: removing mitochondrial genes
## removed 13 mitochondrial genes
## Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
## removed 30216 out of 33525 genes due to low expression
# calculate log-transformed normalized counts using scran package
# (alternatively could calculate deviance residuals using scry package)
set.seed(123)
qclus <- quickCluster(spe)
spe <- computeSumFactors(spe, cluster = qclus)
spe <- logNormCounts(spe)

assayNames(spe)
## [1] "counts"    "logcounts"
# select small set of random genes and several known SVGs for 
# faster runtime in this example
set.seed(123)
ix_random <- sample(seq_len(nrow(spe)), 10)

known_genes <- c("MOBP", "PCP4", "SNAP25", "HBB", "IGKC", "NPY")
ix_known <- which(rowData(spe)$gene_name %in% known_genes)

ix <- c(ix_known, ix_random)

spe <- spe[ix, ]
dim(spe)
## [1]   16 3639
# run nnSVG

# set seed for reproducibility
set.seed(123)
# using a single thread in this example
spe <- nnSVG(spe)

# show results
rowData(spe)
## DataFrame with 16 rows and 17 columns
##                         gene_id   gene_name    feature_type   sigma.sq
##                     <character> <character>     <character>  <numeric>
## ENSG00000211592 ENSG00000211592        IGKC Gene Expression   0.591619
## ENSG00000168314 ENSG00000168314        MOBP Gene Expression   1.864593
## ENSG00000122585 ENSG00000122585         NPY Gene Expression   0.295111
## ENSG00000244734 ENSG00000244734         HBB Gene Expression   0.348191
## ENSG00000132639 ENSG00000132639      SNAP25 Gene Expression   0.338292
## ...                         ...         ...             ...        ...
## ENSG00000130382 ENSG00000130382       MLLT1 Gene Expression 0.00782205
## ENSG00000036672 ENSG00000036672        USP2 Gene Expression 0.00239457
## ENSG00000086232 ENSG00000086232     EIF2AK1 Gene Expression 0.00320466
## ENSG00000106278 ENSG00000106278      PTPRZ1 Gene Expression 0.00476292
## ENSG00000133606 ENSG00000133606       MKRN1 Gene Expression 0.00543859
##                    tau.sq       phi    loglik   runtime      mean       var
##                 <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000211592  0.464762 20.035566  -4580.98     1.515  0.630200  1.042847
## ENSG00000168314  0.371646  0.922937  -3716.46     0.902  0.841100  1.382681
## ENSG00000122585  0.302841 68.183198  -4087.69     1.219  0.401353  0.599801
## ENSG00000244734  0.365750 27.611193  -4114.59     2.474  0.418996  0.729640
## ENSG00000132639  0.440346  3.570016  -3940.04     1.251  3.464790  0.779762
## ...                   ...       ...       ...       ...       ...       ...
## ENSG00000130382  0.291221 53.659081  -2966.12     1.978  0.299885  0.299134
## ENSG00000036672  0.243720 23.418486  -2612.19     1.778  0.248144  0.246162
## ENSG00000086232  0.275781 26.755704  -2840.19     1.309  0.276816  0.279064
## ENSG00000106278  0.385881  6.565846  -3451.62     1.350  0.357875  0.390770
## ENSG00000133606  0.277671  0.537947  -2861.92     1.193  0.295867  0.283165
##                     spcov    prop_sv loglik_lm   LR_stat      rank       pval
##                 <numeric>  <numeric> <numeric> <numeric> <numeric>  <numeric>
## ENSG00000211592  1.220514   0.560043  -5239.35  1316.742         3          0
## ENSG00000168314  1.623470   0.833808  -5752.58  4072.246         1          0
## ENSG00000122585  1.353525   0.493536  -4232.97   290.543         6          0
## ENSG00000244734  1.408312   0.487703  -4589.50   949.823         4          0
## ENSG00000132639  0.167868   0.434466  -4710.39  1540.685         2          0
## ...                   ...        ...       ...       ...       ...        ...
## ENSG00000130382  0.294921 0.02615689  -2967.13  2.018403        13 0.36451001
## ENSG00000036672  0.197201 0.00972948  -2612.50  0.629481        15 0.72997811
## ENSG00000086232  0.204503 0.01148682  -2840.77  1.141738        14 0.56503430
## ENSG00000106278  0.192844 0.01219248  -3453.35  3.459945        12 0.17728929
## ENSG00000133606  0.249257 0.01921021  -2867.31 10.770357         9 0.00458402
##                       padj
##                  <numeric>
## ENSG00000211592          0
## ENSG00000168314          0
## ENSG00000122585          0
## ENSG00000244734          0
## ENSG00000132639          0
## ...                    ...
## ENSG00000130382 0.44862770
## ENSG00000036672 0.77864332
## ENSG00000086232 0.64575349
## ENSG00000106278 0.23638572
## ENSG00000133606 0.00814937

4.2 Investigate results

The results are stored in the rowData of the SpatialExperiment object.

The main results of interest are:

  • LR_stat: likelihood ratio (LR) statistics
  • rank: rank of top SVGs according to LR statistics
  • pval: p-values from asymptotic chi-squared distribution with 2 degrees of freedom
  • padj: p-values adjusted for multiple testing, which can be used to define a cutoff for statistically significant SVGs (e.g. padj <= 0.05)
  • prop_sv: effect size, defined as proportion of spatial variance out of total variance
# number of significant SVGs
table(rowData(spe)$padj <= 0.05)
## 
## FALSE  TRUE 
##     7     9
# show results for top n SVGs
rowData(spe)[order(rowData(spe)$rank)[1:10], ]
## DataFrame with 10 rows and 17 columns
##                         gene_id   gene_name    feature_type   sigma.sq
##                     <character> <character>     <character>  <numeric>
## ENSG00000168314 ENSG00000168314        MOBP Gene Expression 1.86459294
## ENSG00000132639 ENSG00000132639      SNAP25 Gene Expression 0.33829228
## ENSG00000211592 ENSG00000211592        IGKC Gene Expression 0.59161928
## ENSG00000244734 ENSG00000244734         HBB Gene Expression 0.34819123
## ENSG00000183036 ENSG00000183036        PCP4 Gene Expression 0.22354847
## ENSG00000122585 ENSG00000122585         NPY Gene Expression 0.29511061
## ENSG00000129562 ENSG00000129562        DAD1 Gene Expression 0.03687246
## ENSG00000114923 ENSG00000114923      SLC4A3 Gene Expression 0.01123674
## ENSG00000133606 ENSG00000133606       MKRN1 Gene Expression 0.00543859
## ENSG00000149923 ENSG00000149923       PPP4C Gene Expression 0.12004347
##                    tau.sq        phi    loglik   runtime      mean       var
##                 <numeric>  <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000168314  0.371646   0.922937  -3716.46     0.902  0.841100  1.382681
## ENSG00000132639  0.440346   3.570016  -3940.04     1.251  3.464790  0.779762
## ENSG00000211592  0.464762  20.035566  -4580.98     1.515  0.630200  1.042847
## ENSG00000244734  0.365750  27.611193  -4114.59     2.474  0.418996  0.729640
## ENSG00000183036  0.456889   8.700988  -4041.98     2.091  0.684281  0.681316
## ENSG00000122585  0.302841  68.183198  -4087.69     1.219  0.401353  0.599801
## ENSG00000129562  0.484816   8.805056  -3942.80     1.225  0.561114  0.523034
## ENSG00000114923  0.237750  16.239042  -2621.78     1.072  0.249525  0.249055
## ENSG00000133606  0.277671   0.537947  -2861.92     1.193  0.295867  0.283165
## ENSG00000149923  0.132992 198.872410  -2660.25     4.130  0.235632  0.253096
##                     spcov   prop_sv loglik_lm    LR_stat      rank        pval
##                 <numeric> <numeric> <numeric>  <numeric> <numeric>   <numeric>
## ENSG00000168314  1.623470 0.8338075  -5752.58 4072.24582         1 0.00000e+00
## ENSG00000132639  0.167868 0.4344665  -4710.39 1540.68468         2 0.00000e+00
## ENSG00000211592  1.220514 0.5600433  -5239.35 1316.74199         3 0.00000e+00
## ENSG00000244734  1.408312 0.4877029  -4589.50  949.82287         4 0.00000e+00
## ENSG00000183036  0.690958 0.3285363  -4464.82  845.69334         5 0.00000e+00
## ENSG00000122585  1.353525 0.4935363  -4232.97  290.54324         6 0.00000e+00
## ENSG00000129562  0.342215 0.0706791  -3983.78   81.97837         7 0.00000e+00
## ENSG00000114923  0.424820 0.0451298  -2633.77   23.96612         8 6.24917e-06
## ENSG00000133606  0.249257 0.0192102  -2867.31   10.77036         9 4.58402e-03
## ENSG00000149923  1.470399 0.4744140  -2663.05    5.60524        10 6.06510e-02
##                        padj
##                   <numeric>
## ENSG00000168314 0.00000e+00
## ENSG00000132639 0.00000e+00
## ENSG00000211592 0.00000e+00
## ENSG00000244734 0.00000e+00
## ENSG00000183036 0.00000e+00
## ENSG00000122585 0.00000e+00
## ENSG00000129562 0.00000e+00
## ENSG00000114923 1.24983e-05
## ENSG00000133606 8.14937e-03
## ENSG00000149923 9.70416e-02
# plot spatial expression of top-ranked SVG
ix <- which(rowData(spe)$rank == 1)
ix_name <- rowData(spe)$gene_name[ix]
ix_name
## [1] "MOBP"
df <- as.data.frame(
  cbind(spatialCoords(spe), 
        expr = counts(spe)[ix, ]))

ggplot(df, aes(x = pxl_col_in_fullres, y = pxl_row_in_fullres, color = expr)) + 
  geom_point(size = 0.8) + 
  coord_fixed() + 
  scale_y_reverse() + 
  scale_color_gradient(low = "gray90", high = "blue", name = "counts") + 
  ggtitle(ix_name) + 
  theme_bw() + 
  theme(panel.grid = element_blank(), 
        axis.title = element_blank(), 
        axis.text = element_blank(), 
        axis.ticks = element_blank())

5 Session information

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplot2_3.3.6               nnSVG_1.0.4                
##  [3] scran_1.24.0                scuttle_1.6.2              
##  [5] STexampleData_1.4.5         ExperimentHub_2.4.0        
##  [7] AnnotationHub_3.4.0         BiocFileCache_2.4.0        
##  [9] dbplyr_2.2.1                SpatialExperiment_1.6.0    
## [11] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
## [13] Biobase_2.56.0              GenomicRanges_1.48.0       
## [15] GenomeInfoDb_1.32.2         IRanges_2.30.0             
## [17] S4Vectors_0.34.0            BiocGenerics_0.42.0        
## [19] MatrixGenerics_1.8.1        matrixStats_0.62.0         
## [21] BiocStyle_2.24.0           
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-3              rjson_0.2.21                 
##   [3] ellipsis_0.3.2                bluster_1.6.0                
##   [5] XVector_0.36.0                BiocNeighbors_1.14.0         
##   [7] farver_2.1.1                  bit64_4.0.5                  
##   [9] interactiveDisplayBase_1.34.0 AnnotationDbi_1.58.0         
##  [11] fansi_1.0.3                   codetools_0.2-18             
##  [13] R.methodsS3_1.8.2             sparseMatrixStats_1.8.0      
##  [15] cachem_1.0.6                  knitr_1.39                   
##  [17] jsonlite_1.8.0                cluster_2.1.3                
##  [19] png_0.1-7                     R.oo_1.25.0                  
##  [21] shiny_1.7.2                   HDF5Array_1.24.1             
##  [23] BiocManager_1.30.18           compiler_4.2.1               
##  [25] httr_1.4.3                    dqrng_0.3.0                  
##  [27] assertthat_0.2.1              Matrix_1.4-1                 
##  [29] fastmap_1.1.0                 limma_3.52.2                 
##  [31] cli_3.3.0                     later_1.3.0                  
##  [33] BiocSingular_1.12.0           BRISC_1.0.5                  
##  [35] htmltools_0.5.3               tools_4.2.1                  
##  [37] rsvd_1.0.5                    igraph_1.3.3                 
##  [39] gtable_0.3.0                  glue_1.6.2                   
##  [41] GenomeInfoDbData_1.2.8        RANN_2.6.1                   
##  [43] dplyr_1.0.9                   rappdirs_0.3.3               
##  [45] Rcpp_1.0.9                    jquerylib_0.1.4              
##  [47] vctrs_0.4.1                   Biostrings_2.64.0            
##  [49] rhdf5filters_1.8.0            DelayedMatrixStats_1.18.0    
##  [51] xfun_0.31                     stringr_1.4.0                
##  [53] beachmat_2.12.0               mime_0.12                    
##  [55] lifecycle_1.0.1               irlba_2.3.5                  
##  [57] statmod_1.4.36                edgeR_3.38.1                 
##  [59] scales_1.2.0                  zlibbioc_1.42.0              
##  [61] promises_1.2.0.1              parallel_4.2.1               
##  [63] rhdf5_2.40.0                  yaml_2.3.5                   
##  [65] curl_4.3.2                    pbapply_1.5-0                
##  [67] memoise_2.0.1                 sass_0.4.2                   
##  [69] stringi_1.7.8                 RSQLite_2.2.15               
##  [71] highr_0.9                     BiocVersion_3.15.2           
##  [73] ScaledMatrix_1.4.0            filelock_1.0.2               
##  [75] rdist_0.0.5                   BiocParallel_1.30.3          
##  [77] rlang_1.0.4                   pkgconfig_2.0.3              
##  [79] bitops_1.0-7                  evaluate_0.15                
##  [81] lattice_0.20-45               purrr_0.3.4                  
##  [83] Rhdf5lib_1.18.2               labeling_0.4.2               
##  [85] bit_4.0.4                     tidyselect_1.1.2             
##  [87] magrittr_2.0.3                bookdown_0.27                
##  [89] R6_2.5.1                      magick_2.7.3                 
##  [91] generics_0.1.3                metapod_1.4.0                
##  [93] DelayedArray_0.22.0           DBI_1.1.3                    
##  [95] withr_2.5.0                   pillar_1.8.0                 
##  [97] KEGGREST_1.36.3               RCurl_1.98-1.7               
##  [99] tibble_3.1.7                  crayon_1.5.1                 
## [101] DropletUtils_1.16.0           utf8_1.2.2                   
## [103] rmarkdown_2.14                locfit_1.5-9.6               
## [105] grid_4.2.1                    blob_1.2.3                   
## [107] digest_0.6.29                 xtable_1.8-4                 
## [109] httpuv_1.6.5                  R.utils_2.12.0               
## [111] munsell_0.5.0                 bslib_0.4.0