Introduction
Load data
Fit trajectories using Monocle
Fit Smoothers
Monocle3
- Constructing the trajectory
- Extracting the pseudotimes and cell weights for tradeSeq
Session
References

Introduction

tradeSeq is an R package that allows analysis of gene expression along trajectories. While it has been developed and applied to single-cell RNA-sequencing (scRNA-seq) data, its applicability extends beyond that, and also allows the analysis of, e.g., single-cell ATAC-seq data along trajectories or bulk RNA-seq time series datasets. For every gene in the dataset, tradeSeq fits a generalized additive model (GAM) by building on the mgcv R package. It then allows statistical inference on the GAM by assessing contrasts of the parameters of the fitted GAM model, aiding in interpreting complex datasets. All details about the tradeSeq model and statistical tests are described in our preprint (Van den Berge et al. 2019).

In this vignette, we analyze a subset of the data from (Paul et al. 2015). A SingleCellExperiment object of the data has been provided with the tradeSeq package and can be retrieved as shown below. The data and UMAP reduced dimensions were derived from following the Monocle 3 vignette.

The main vignette focuses on using tradeSeq downsteam of slingshot. Here, we present how to use tradeSeq downstream of monocle(Qiu et al. 2017).

Load data

library(tradeSeq)
library(RColorBrewer)
library(SingleCellExperiment)
library(monocle)
library(dplyr)
library(tidyr)

# For reproducibility
palette(brewer.pal(8, "Dark2"))
data(countMatrix, package = "tradeSeq")
counts <- as.matrix(countMatrix)
rm(countMatrix)
data(celltype, package = "tradeSeq")

Fit trajectories using Monocle

We will fit developmental trajectories using the monocle package.

set.seed(200)
pd <- data.frame(cells = colnames(counts), cellType = celltype)
rownames(pd) <- pd$cells
fd <- data.frame(gene_short_name = rownames(counts))
rownames(fd) <- fd$gene_short_name
cds <- newCellDataSet(counts, phenoData = new("AnnotatedDataFrame", data = pd),
                      featureData = new("AnnotatedDataFrame", data = fd))
cds <- estimateSizeFactors(cds)
cds <- reduceDimension(cds, max_components = 2)

## Registered S3 methods overwritten by 'proxy':
##   method               from    
##   print.registry_field registry
##   print.registry_entry registry

cds <- orderCells(cds)
cds <- orderCells(cds, root_state = 5)

Fit Smoothers

Only possible for DDRTree

sce <- fitGAM(cds, verbose = TRUE)

You can then analyse the sce object following the main vignette.

Monocle3

As of now (02/2020), monocle3(Cao et al. 2019), is still in its beta version. Therefore, we have no plan yet to include a S4 method for monocle3 while it is not on CRAN or Bioconductor and the format is still moving. However, we present below a way to use tradeSeq downstream of monocle3 as of version ‘0.2’, for a fully connected graph. We follow the tutorial from the monocle3 website.

Constructing the trajectory

You will need to install monocle3 from here before running the code below.

set.seed(22)
library(monocle3)
# Create a cell_data_set object
cds <- new_cell_data_set(counts, cell_metadata = pd,
                gene_metadata = data.frame(gene_short_name = rownames(counts),
                                           row.names = rownames(counts)))
# Run PCA then UMAP on the data
cds <- preprocess_cds(cds, method = "PCA")
cds <- reduce_dimension(cds, preprocess_method = "PCA",
                        reduction_method = "UMAP")

# First display, coloring by the cell types from Paul et al
plot_cells(cds, label_groups_by_cluster = FALSE, cell_size = 1,
           color_cells_by = "cellType")

# Running the clustering method. This is necessary to the construct the graph
cds <- cluster_cells(cds, reduction_method = "UMAP")
# Visualize the clusters
plot_cells(cds, color_cells_by = "cluster", cell_size = 1)

# Construct the graph
# Note that, for the rest of the code to run, the graph should be fully connected
cds <- learn_graph(cds, use_partition = FALSE)

# We find all the cells that are close to the starting point
cell_ids <- colnames(cds)[pd$cellType ==  "Multipotent progenitors"]
closest_vertex <- cds@principal_graph_aux[["UMAP"]]$pr_graph_cell_proj_closest_vertex
closest_vertex <- as.matrix(closest_vertex[colnames(cds), ])
closest_vertex <- closest_vertex[cell_ids, ]
closest_vertex <- as.numeric(names(which.max(table(closest_vertex))))
mst <- principal_graph(cds)$UMAP
root_pr_nodes <- igraph::V(mst)$name[closest_vertex]

# We compute the trajectory
cds <- order_cells(cds, root_pr_nodes = root_pr_nodes)
plot_cells(cds, color_cells_by = "pseudotime")

Extracting the pseudotimes and cell weights for tradeSeq

# Get the closest vertice for every cell
y_to_cells <-  principal_graph_aux(cds)$UMAP$pr_graph_cell_proj_closest_vertex %>%
  as.data.frame() %>%
  dplyr::mutate(cells = rownames(.)) %>%
  rename("Y" = V1)

# Get the root vertices
# It is the same node as above
root <- cds@principal_graph_aux$UMAP$root_pr_nodes

# Get the other endpoints
endpoints <- names(which(igraph::degree(mst) == 1))
endpoints <- endpoints[!endpoints %in% root]

# For each endpoint
cellWeights <- lapply(endpoints, function(endpoint) {
  # We find the path between the endpoint and the root
  path <- igraph::shortest_paths(mst, root, endpoint)$vpath[[1]]
  path <- as.character(path)
  # We find the cells that map along that path
  df <- y_to_cells %>%
    dplyr::filter(Y %in% path)
  df <- data.frame(weights = as.numeric(colnames(cds) %in% df$cells))
  colnames(df) <- endpoint
  return(df)
  }) %>% bind_cols() 
rownames(cellWeights) <- colnames(cds)
pseudotime <- matrix(pseudotime(cds), ncol = ncol(cellWeights),
                     nrow = ncol(cds), byrow = FALSE)

sce <- fitGAM(counts = counts,
              pseudotime = pseudotime,
              cellWeights = cellWeights)

Then, the sce object can be analyzed following the main vignette.

Session

sessionInfo()

## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] splines   stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] tidyr_1.1.0                 dplyr_0.8.5                
##  [3] monocle_2.16.0              DDRTree_0.1.5              
##  [5] irlba_2.3.3                 VGAM_1.1-3                 
##  [7] ggplot2_3.3.0               Matrix_1.2-18              
##  [9] SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.1
## [11] DelayedArray_0.14.0         matrixStats_0.56.0         
## [13] GenomicRanges_1.40.0        GenomeInfoDb_1.24.0        
## [15] IRanges_2.22.2              S4Vectors_0.26.1           
## [17] RColorBrewer_1.1-2          tradeSeq_1.2.01            
## [19] bigmemory_4.5.36            Biobase_2.48.0             
## [21] BiocGenerics_0.34.0         knitr_1.28                 
## 
## loaded via a namespace (and not attached):
##   [1] bigmemory.sri_0.1.3     Rtsne_0.15              colorspace_1.4-1       
##   [4] ellipsis_0.3.1          XVector_0.28.0          proxy_0.4-24           
##   [7] bit64_0.9-7             ggrepel_0.8.2           AnnotationDbi_1.50.0   
##  [10] RSpectra_0.16-0         xml2_1.3.2              codetools_0.2-16       
##  [13] docopt_0.6.1            doParallel_1.0.15       ade4_1.7-15            
##  [16] locfdr_1.1-8            annotate_1.66.0         phylobase_0.8.10       
##  [19] gridBase_0.4-7          kernlab_0.9-29          cluster_2.1.0          
##  [22] pheatmap_1.0.12         HDF5Array_1.16.0        compiler_4.0.0         
##  [25] httr_1.4.1              assertthat_0.2.1        lazyeval_0.2.2         
##  [28] limma_3.44.1            htmltools_0.4.0         prettyunits_1.1.1      
##  [31] tools_4.0.0             igraph_1.2.5            gtable_0.3.0           
##  [34] glue_1.4.1              GenomeInfoDbData_1.2.3  RANN_2.6.1             
##  [37] reshape2_1.4.4          Rcpp_1.0.4.6            softImpute_1.4         
##  [40] zinbwave_1.10.0         slam_0.1-47             NMF_0.22.0             
##  [43] vctrs_0.3.0             ape_5.3                 nlme_3.1-147           
##  [46] iterators_1.0.12        slingshot_1.6.0         xfun_0.14              
##  [49] stringr_1.4.0           lifecycle_0.2.0         rngtools_1.5           
##  [52] XML_3.99-0.3            princurve_2.1.4         edgeR_3.30.0           
##  [55] zlibbioc_1.34.0         MASS_7.3-51.6           scales_1.1.1           
##  [58] hms_0.5.3               rhdf5_2.32.0            yaml_2.2.1             
##  [61] pbapply_1.4-2           memoise_1.1.0           gridExtra_2.3          
##  [64] pkgmaker_0.31.1         RSQLite_2.2.0           fastICA_1.2-2          
##  [67] stringi_1.4.6           genefilter_1.70.0       foreach_1.5.0          
##  [70] clusterExperiment_2.8.0 densityClust_0.3        BiocParallel_1.22.0    
##  [73] bibtex_0.4.2.2          rlang_0.4.6             pkgconfig_2.0.3        
##  [76] bitops_1.0-6            qlcMatrix_0.9.7         rncl_0.8.4             
##  [79] evaluate_0.14           lattice_0.20-41         Rhdf5lib_1.10.0        
##  [82] purrr_0.3.4             bit_1.1-15.2            tidyselect_1.1.0       
##  [85] plyr_1.8.6              magrittr_1.5            R6_2.4.1               
##  [88] combinat_0.0-8          DBI_1.1.0               pillar_1.4.4           
##  [91] withr_2.2.0             mgcv_1.8-31             survival_3.1-12        
##  [94] RCurl_1.98-1.2          tibble_3.0.1            crayon_1.3.4           
##  [97] uuid_0.1-4              howmany_0.3-1           rmarkdown_2.1          
## [100] viridis_0.5.1           progress_1.2.2          locfit_1.5-9.4         
## [103] RNeXML_2.4.4            grid_4.0.0              blob_1.2.1             
## [106] FNN_1.1.3               HSMMSingleCell_1.8.0    sparsesvd_0.2          
## [109] digest_0.6.25           xtable_1.8-4            munsell_0.5.0          
## [112] registry_0.5-1          viridisLite_0.3.0

References

Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Dynamics and Regulators of Cell Fate Decisions Are Revealed by Pseudo-Temporal Ordering of Single Cells.” Nature.

Paul, Franziska, Ya’ara Arkin, Amir Giladi, DiegoÂ Adhemar Jaitin, Ephraim Kenigsberg, Hadas Keren-Shaul, Deborah Winter, et al. 2015. “Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors.” Cell 163 (7). Cell Press:1663–77. https://doi.org/10.1016/J.CELL.2015.11.013.

Qiu, Xiaojie, Qi Mao, Ying Tang, Li Wang, Raghav Chawla, Hannah A Pliner, and Cole Trapnell. 2017. “Reversed graph embedding resolves complex single-cell trajectories.” Nature Methods 14 (10):979–82. https://doi.org/10.1038/nmeth.4402.

Van den Berge, Koen, Hector Roux de Bézieux, Kelly Street, Wouter Saelens, Robrecht Cannoodt, Yvan Saeys, Sandrine Dudoit, and Lieven Clement. 2019. “Trajectory-based differential expression analysis for single-cell sequencing data.” bioRxiv, May. Cold Spring Harbor Laboratory, 623397. https://doi.org/10.1101/623397.

Using Monocle as input to tradeSeq

Koen Van den Berge and Hector Roux de Bézieux

02/20/2020