Contents

1 Introduction

switchde is an R package for detecting switch-like differential expression along single-cell RNA-seq trajectories. It assumes genes follow a sigmoidal pattern of gene expression and tests for differential expression using a likelihood ratio test. It also returns maximum likelihood estimates (MLE) for the sigmoid parameters, which allows filtering of genes for up or down regulation as well as where along the trajectory the regulation occurs.

The parametric form of gene expression assumed is a sigmoid:

example_sigmoid()

Governed by three parameters:

2 Installation

switchde can be installed from both Bioconductor and Github.

Example installation from Bioconductor:

source("https://bioconductor.org/biocLite.R")
biocLite("switchde")

Example installation from Github:

devtools::install_github("kieranrcampbell/switchde")

3 Example usage

We provide a brief example on some synthetic single-cell data bundled with the package. synth_gex contains a 12-by-100 expression matrix of 12 genes, and ex_pseudotime contains a pseudotime vector of length 100. We can start by plotting the expression:

data(synth_gex)
data(ex_pseudotime)

gex_cleaned <- as_data_frame(t(synth_gex)) %>% 
  mutate(Pseudotime = ex_pseudotime) %>% 
  gather(Gene, Expression, -Pseudotime)

ggplot(gex_cleaned, aes(x = Pseudotime, y = Expression)) +
  facet_wrap(~ Gene) + geom_point(shape = 21, fill = 'grey', color = 'black') +
  theme_bw() + stat_smooth(color = 'darkred', se = FALSE)

Model fitting and differential expression testing is provided by a call to the switchde function:

sde <- switchde(synth_gex, ex_pseudotime)
## Input gene-by-cell matrix: 12 genes and  100 cells

This can equivalently be called using an SCESet from the package scater:

sce <- newSCESet(synth_gex)
sde <- switchde(sce, ex_pseudotime)

This returns a data.frame with 6 columns:

We can use the function arrange from dplyr to order this by q-value:

arrange(sde, qval)
## # A tibble: 12 × 6
##      gene         pval         qval       mu0         k         t0
##     <chr>        <dbl>        <dbl>     <dbl>     <dbl>      <dbl>
## 1   Gene8 6.511826e-21 7.814191e-20  5.320604 -9.217995  0.1675885
## 2   Gene2 1.030532e-12 6.183192e-12  3.445171  6.792040  0.7013710
## 3   Gene9 3.389230e-12 1.355692e-11  5.525775 -5.048873  0.3183684
## 4   Gene1 9.478795e-12 2.843639e-11  3.864267  7.506369  0.5669419
## 5   Gene5 1.544662e-11 3.707188e-11  3.514618 10.348479  0.4752514
## 6   Gene4 4.831243e-10 9.662485e-10 12.843280 -3.224137 -0.2114432
## 7   Gene3 2.196190e-08 3.764897e-08  4.202233 -4.998180  0.5578453
## 8  Gene10 4.473107e-08 6.709660e-08  3.176998  8.471475  0.3797383
## 9   Gene7 4.273169e-06 5.697559e-06  3.778711  7.417856  0.3024221
## 10 Gene12 3.362652e-04 4.035183e-04  4.858423 -4.047125  0.7791013
## 11 Gene11 1.024067e-03 1.117164e-03  2.689331 -7.272166  0.8318200
## 12  Gene6 1.913351e-02 1.913351e-02  1.758552 -8.871320  0.8700276

We may then wish to plot the expression of a particular gene and the MLE model. This is acheived using the switchplot function, which takes three arguments:

We can easily extract the parameters using the extract_pars function and pass this to switchplot, which plots the maximum likelihood sigmoidal mean:

gene <- sde$gene[which.min(sde$qval)]
pars <- extract_pars(sde, gene)
print(pars)
##        mu0          k         t0 
##  5.3206036 -9.2179954  0.1675885
switchplot(synth_gex[gene, ], ex_pseudotime, pars)

Note that switchplot returns a ggplot which can be further customised (e.g. using theme_xxx(), etc).

4 Zero-inflation

We can also model zero inflation in the data with a dropout probability proportional to the latent gene expression magnitude. To enable this set zero_inflation = TRUE. While this model is more appropriate for single-cell RNA-seq data, it requires use of the EM algorithm so takes typically 20x longer.

zde <- switchde(synth_gex, ex_pseudotime, zero_inflated = TRUE)
## Input gene-by-cell matrix: 12 genes and  100 cells

As before it returns a data_frame, this time with an additional parameter \(\lambda\) corresponding to the dropout probability (see manuscript):

arrange(zde, qval)
## # A tibble: 12 × 7
##      gene         pval         qval       mu0         k         t0
##     <chr>        <dbl>        <dbl>     <dbl>     <dbl>      <dbl>
## 1   Gene8 5.626484e-21 6.751780e-20  5.317191 -9.207447  0.1679975
## 2   Gene2 6.340400e-13 3.804240e-12  3.459876  6.781590  0.7018734
## 3   Gene9 2.344258e-12 9.377032e-12  5.516667 -5.062499  0.3202179
## 4   Gene1 3.717684e-12 1.115305e-11  3.892665  7.442640  0.5677112
## 5   Gene5 1.017045e-11 2.440907e-11  3.522190 10.359586  0.4749875
## 6   Gene4 1.574915e-10 3.149830e-10 12.585511 -3.239785 -0.1981837
## 7   Gene3 1.465468e-08 2.512231e-08  4.225010 -4.976908  0.5568023
## 8  Gene10 2.235608e-08 3.353411e-08  3.190144  8.430648  0.3768459
## 9   Gene7 3.871759e-06 5.162346e-06  3.784151  7.402636  0.3025647
## 10 Gene12 2.958115e-04 3.549738e-04  4.903109 -3.965236  0.7756110
## 11 Gene11 9.451842e-04 1.031110e-03  2.699533 -7.179051  0.8325326
## 12  Gene6 1.435134e-02 1.435134e-02  1.778945 -8.672480  0.8692771
## # ... with 1 more variables: lambda <dbl>

We can plot the gene with the largest dropout effect and compare it to the non-zero-inflated model:

gene <- zde$gene[which.min(zde$lambda)]
pars <- extract_pars(sde, gene)
zpars <- extract_pars(zde, gene)


switchplot(synth_gex[gene, ], ex_pseudotime, pars)

switchplot(synth_gex[gene, ], ex_pseudotime, zpars)

5 Technical info

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] switchde_1.0.0      tidyr_0.6.0         dplyr_0.5.0        
## [4] scater_1.2.0        ggplot2_2.1.0       Biobase_2.34.0     
## [7] BiocGenerics_0.20.0 BiocStyle_2.2.0    
## 
## loaded via a namespace (and not attached):
##  [1] tximport_1.2.0       beeswarm_0.2.3       locfit_1.5-9.1      
##  [4] reshape2_1.4.1       lattice_0.20-34      rhdf5_2.18.0        
##  [7] colorspace_1.2-7     htmltools_0.3.5      stats4_3.3.1        
## [10] mgcv_1.8-15          yaml_2.1.13          chron_2.3-47        
## [13] XML_3.98-1.4         DBI_0.5-1            matrixStats_0.51.0  
## [16] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.20.0     
## [19] munsell_0.4.3        gtable_0.2.0         codetools_0.2-15    
## [22] evaluate_0.10        labeling_0.3         knitr_1.14          
## [25] IRanges_2.8.0        biomaRt_2.30.0       httpuv_1.3.3        
## [28] vipor_0.4.4          AnnotationDbi_1.36.0 Rcpp_0.12.7         
## [31] xtable_1.8-2         edgeR_3.16.0         scales_0.4.0        
## [34] formatR_1.4          limma_3.30.0         S4Vectors_0.12.0    
## [37] mime_0.5             gridExtra_2.2.1      rjson_0.2.15        
## [40] digest_0.6.10        stringi_1.1.2        shiny_0.14.1        
## [43] grid_3.3.1           tools_3.3.1          bitops_1.0-6        
## [46] magrittr_1.5         lazyeval_0.2.0       RCurl_1.95-4.8      
## [49] tibble_1.2           RSQLite_1.0.0        Matrix_1.2-7.1      
## [52] data.table_1.9.6     ggbeeswarm_0.5.0     shinydashboard_0.5.3
## [55] assertthat_0.1       rmarkdown_1.1        viridis_0.3.4       
## [58] R6_2.2.0             nlme_3.1-128