
1 uSORT workflow

The uSORT package is designed to uncover the intrinsic cell progression path from single-cell RNA-seq data. Its workflow comprises data pre-processing, preliminary PCA-based gene selection, preliminary cell ordering, refined gene selection, refined cell ordering, and post-analysis interpretation and visualization. A schematic overview of the uSORT workflow is shown in the figure below:

2 Run uSORT

The uSORT workflow can be run either through the GUI or by calling the main function.

2.1 uSORT GUI

After the uSORT package is installed, the GUI can be launched with a single command.

require(uSORT)
## Loading required package: uSORT
## Loading required package: tcltk
# uSORT_GUI()  

On macOS, the GUI appears as shown below:

In the GUI, users can choose the input file (TPM and CPM formats in a txt file are currently supported) and specify the preliminary and refined sorting methods. Clicking the parameter button allows the parameters of each method to be customized further. The parameter panel for the autoSPIN method looks like this:

In the main GUI window, enter a project name, choose the result path, and click submit. The program then runs and prints details to the R console. Once the analysis is done, the results are saved under the selected result path.

2.2 uSORT Main Function

Users can also call the package's main function, uSORT, directly. Its documentation can be viewed with the command ?uSORT. The usage and parameters of the uSORT function are shown below:

args(uSORT)
## function (exprs_file, log_transform = TRUE, remove_outliers = TRUE, 
##     preliminary_sorting_method = c("autoSPIN", "sWanderlust", 
##         "monocle", "Wanderlust", "SPIN", "none"), refine_sorting_method = c("autoSPIN", 
##         "sWanderlust", "monocle", "Wanderlust", "SPIN", "none"), 
##     project_name = "uSORT", result_directory = getwd(), nCores = 1, 
##     save_results = TRUE, reproduce_seed = 1234, scattering_cutoff_prob = 0.75, 
##     driving_force_cutoff = NULL, qval_cutoff_featureSelection = 0.05, 
##     pre_data_type = c("linear", "cyclical"), pre_SPIN_option = c("STS", 
##         "neighborhood"), pre_SPIN_sigma_width = 1, pre_autoSPIN_alpha = 0.2, 
##     pre_autoSPIN_randomization = 20, pre_wanderlust_start_cell = NULL, 
##     pre_wanderlust_dfmap_components = 4, pre_wanderlust_l = 15, 
##     pre_wanderlust_num_waypoints = 150, pre_wanderlust_waypoints_seed = 2711, 
##     pre_wanderlust_flock_waypoints = 2, ref_data_type = c("linear", 
##         "cyclical"), ref_SPIN_option = c("STS", "neighborhood"), 
##     ref_SPIN_sigma_width = 1, ref_autoSPIN_alpha = 0.2, ref_autoSPIN_randomization = 20, 
##     ref_wanderlust_start_cell = NULL, ref_wanderlust_dfmap_components = 4, 
##     ref_wanderlust_l = 15, ref_wanderlust_num_waypoints = 150, 
##     ref_wanderlust_flock_waypoints = 2, ref_wanderlust_waypoints_seed = 2711) 
## NULL
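
Several arguments in this signature, such as preliminary_sorting_method and pre_data_type, list all allowed values as a character vector. In R this pattern is conventionally resolved with match.arg(), which picks the first element when the argument is not supplied. Whether uSORT resolves these arguments with match.arg() internally is an assumption here; the signature alone does not show it. A minimal sketch with a hypothetical function, pick_method, that is not part of uSORT:

```r
# Hypothetical helper illustrating the usual R idiom for vector-valued
# defaults. It is NOT part of uSORT.
pick_method <- function(method = c("autoSPIN", "sWanderlust", "monocle",
                                   "Wanderlust", "SPIN", "none")) {
  # match.arg() returns the first choice when 'method' is missing and
  # raises an error for values outside the listed choices.
  match.arg(method)
}

default_method  <- pick_method()        # no argument supplied
explicit_method <- pick_method("SPIN")  # one of the listed choices
invalid_call    <- tryCatch(pick_method("kmeans"),
                            error = function(e) "error")
```

Under that assumption, omitting preliminary_sorting_method would select "autoSPIN". Passing the method explicitly avoids relying on the default.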

3 uSORT Example

Running the package through the GUI is straightforward, so here we demonstrate the main function with an example:

dir <- system.file('extdata', package='uSORT')
file <- list.files(dir, pattern='\\.txt$', full.names=TRUE)
# uSORT_results <- uSORT(exprs_file = file, 
#                        log_transform = TRUE,
#                        remove_outliers = TRUE,
#                        project_name = "uSORT_example",
#                        preliminary_sorting_method = "autoSPIN", 
#                        refine_sorting_method = "sWanderlust",
#                        result_directory = getwd(),
#                        save_results = TRUE,
#                        reproduce_seed = 1234)

3.1 Result Object and files

When the analysis is done, the results will be returned in a list:

#str(uSORT_results)

# List of 7
#  $ exp_raw                        : num [1:251, 1:43280] 1.08 0 0 0.62 0 0 0 0.27 1.16 0 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:251] "RMD119" "RMD087" "RMD078" "RMD225" ...
#   .. ..$ : chr [1:43280] "0610005C13Rik" "0610007P14Rik" "0610009B22Rik" "0610009E02Rik" ...
#  $ trimmed_log2exp                : num [1:241, 1:9918] 4.82 0 0 2.77 5.84 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:241] "RMD119" "RMD087" "RMD078" "RMD225" ...
#   .. ..$ : chr [1:9918] "0610007P14Rik" "0610009B22Rik" "0610009E02Rik" "0610009O20Rik" ...
#  $ preliminary_sorting_genes      : chr [1:650] "1110038B12Rik" "1190002F15Rik" "2810417H13Rik" "5430435G22Rik" ...
#  $ preliminary_sorting_order      : chr [1:241] "RMD196" "RMD236" "RMD250" "RMD220" ...
#  $ refined_sorting_genes          : chr [1:320] "Mpo" "H2-Aa" "Cd74" "H2-Ab1" ...
#  $ refined_sorting_order          : chr [1:241] "RMD271" "RMD272" "RMD265" "RMD295" ...
#  $ driverGene_refinedOrder_log2exp: num [1:241, 1:320] 13.16 10.77 12.17 9.82 9.77 ...
#   ..- attr(*, "dimnames")=List of 2
#   .. ..$ : chr [1:241] "RMD271" "RMD272" "RMD265" "RMD295" ...
#   .. ..$ : chr [1:320] "Mpo" "H2-Aa" "Cd74" "H2-Ab1" ...
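
Judging from the dimensions and names in this output, driverGene_refinedOrder_log2exp looks like trimmed_log2exp restricted to the refined driver genes, with rows arranged in the refined order. The sketch below rebuilds that relationship on a small made-up list that only mimics the field names. The toy values and the name toy_results are assumptions for illustration, not package output.

```r
# Toy stand-in for the result list; the values are random.
set.seed(1)
cells <- paste0("cell", 1:6)
genes <- paste0("gene", 1:5)
toy_results <- list(
  trimmed_log2exp = matrix(round(runif(30, 0, 10), 2), nrow = 6,
                           dimnames = list(cells, genes)),
  refined_sorting_genes = c("gene4", "gene2"),
  refined_sorting_order = c("cell3", "cell1", "cell6",
                            "cell2", "cell5", "cell4")
)

# Rows = cells in refined order, columns = refined driver genes.
driver_mat <- toy_results$trimmed_log2exp[toy_results$refined_sorting_order,
                                          toy_results$refined_sorting_genes]
dim(driver_mat)
```

Indexing by cell and gene names, rather than by position, keeps the subset correct even though the ordering vectors and the matrix rows are arranged differently.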

If save_results = TRUE, several result files are also saved:

uSORT_example_final_driver_genes_profiles.pdf:

uSORT_example_distance_heatmap_preliminary.pdf:

uSORT_example_distance_heatmap_refined.pdf:
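
The three file names above share the project_name prefix from the example call. Assuming the files are written directly into result_directory (the text implies this but does not state it outright), their expected paths can be assembled as follows. The suffix list covers only the files named above.

```r
project_name <- "uSORT_example"
result_directory <- getwd()

# Suffixes taken from the file names listed above.
suffixes <- c("final_driver_genes_profiles",
              "distance_heatmap_preliminary",
              "distance_heatmap_refined")
expected_files <- file.path(result_directory,
                            paste0(project_name, "_", suffixes, ".pdf"))

# TRUE for each file that is already present on disk.
file.exists(expected_files)
```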

If the cell types and signature genes are known, the results can be validated against this information:

# sig_genes <- read.table(file.path(system.file('extdata', package='uSORT'),  'signature_genes.txt'))
# sig_genes <- as.character(sig_genes[,1])
# spl_annotat <- read.table(file.path(system.file('extdata', package='uSORT'), 'celltype.txt'),header=T)

3.2 Preliminary ordering heatmap with signature genes

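library(gplots)  # provides heatmap.2() and bluered()
project_name <- "uSORT_example"  # same project name as in the uSORT() call above

# Expression matrix with cells in reversed preliminary order
pre_log2ex <- uSORT_results$trimmed_log2exp[rev(uSORT_results$preliminary_sorting_order), ]
# Align the cell-type annotation to the ordered cells
m <- spl_annotat[match(rownames(pre_log2ex), spl_annotat$SampleID), ]
celltype_color <- c('blue','red','black')
celltype <- c('MDP','CDP','PreDC')
cell_color <- celltype_color[match(m$GroupID, celltype)]
# Signature genes as rows, ordered cells as columns
sigGenes_log2ex <- t(pre_log2ex[ ,colnames(pre_log2ex) %in% sig_genes])
# File name to use if the plot is saved, e.g. between pdf(fileNm) and dev.off()
fileNm <- paste0(project_name, '_signatureGenes_profiles_preliminary.pdf')
heatmap.2(as.matrix(sigGenes_log2ex), 
          dendrogram='row',
          trace='none',
          col = bluered,
          Rowv=TRUE, Colv=FALSE,
          scale = 'row',
          cexRow=1.8,
          ColSideColors=cell_color, 
          margins = c(8, 8))

legend("topright",
       legend=celltype,
       col=celltype_color,
       pch=20,
       horiz=TRUE, 
       bty= "n", 
       inset=c(0,-0.01),
       pt.cex=1.5)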

3.3 Refined ordering heatmap with signature genes

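library(gplots)  # provides heatmap.2() and bluered()
project_name <- "uSORT_example"  # same project name as in the uSORT() call above

# Expression matrix with cells in refined order
ref_log2ex <- uSORT_results$trimmed_log2exp[uSORT_results$refined_sorting_order, ]
# Align the cell-type annotation to the ordered cells
m <- spl_annotat[match(rownames(ref_log2ex), spl_annotat$SampleID), ]
celltype_color <- c('blue','red','black')
celltype <- c('MDP','CDP','PreDC')
cell_color <- celltype_color[match(m$GroupID, celltype)]
# Signature genes as rows, ordered cells as columns
sigGenes_log2ex <- t(ref_log2ex[ ,colnames(ref_log2ex) %in% sig_genes])
# File name to use if the plot is saved, e.g. between pdf(fileNm) and dev.off()
fileNm <- paste0(project_name, '_signatureGenes_profiles_refine.pdf')
heatmap.2(as.matrix(sigGenes_log2ex), 
          dendrogram='row',
          trace='none',
          col = bluered,
          Rowv=TRUE, Colv=FALSE,
          scale = 'row',
          cexRow=1.8,
          ColSideColors=cell_color, 
          margins = c(8, 8))

legend("topright",
       legend=celltype,
       col=celltype_color,
       pch=20,
       horiz=TRUE, 
       bty= "n", 
       inset=c(0,-0.01),
       pt.cex=1.5)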
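
The heatmaps give a visual check. As a complementary numeric summary, one could compare each cell's position in the ordering with its annotated stage. The sketch below uses made-up cells and assumes the stages progress as MDP, CDP, PreDC (the order of the celltype vector above). The Spearman correlation is an illustrative choice, not a uSORT output, and toy_annot and toy_order are hypothetical stand-ins for spl_annotat and an ordering from the result list.

```r
# Toy annotation table with the same column names as spl_annotat.
toy_annot <- data.frame(
  SampleID = paste0("c", 1:9),
  GroupID  = rep(c("MDP", "CDP", "PreDC"), each = 3),
  stringsAsFactors = FALSE
)
# Toy ordering, standing in for uSORT_results$refined_sorting_order.
toy_order <- c("c2", "c1", "c3", "c5", "c4", "c7", "c6", "c9", "c8")

celltype <- c("MDP", "CDP", "PreDC")
# Align annotations to the ordering, then encode the stage as 1, 2, 3.
m <- toy_annot[match(toy_order, toy_annot$SampleID), ]
stage <- match(m$GroupID, celltype)

# Spearman correlation between position and stage. The sign only reflects
# the direction of the ordering, so the absolute value is kept.
rho <- abs(cor(seq_along(toy_order), stage, method = "spearman"))
```

A value of rho close to 1 would indicate that the ordering places cells of the same annotated stage together and in a consistent progression. Taking the absolute value matters because the direction of an inferred ordering is arbitrary.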

4 Session Information

sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.11-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.11-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] tcltk     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] uSORT_1.14.0     BiocStyle_2.16.0
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-6         matrixStats_0.56.0   DDRTree_0.1.5       
##  [4] RColorBrewer_1.1-2   prabclus_2.3-2       docopt_0.6.1        
##  [7] tools_4.0.0          R6_2.4.1             irlba_2.3.3         
## [10] KernSmooth_2.23-17   BiocGenerics_0.34.0  colorspace_1.4-1    
## [13] nnet_7.3-14          tidyselect_1.0.0     gridExtra_2.3       
## [16] compiler_4.0.0       Biobase_2.48.0       bookdown_0.18       
## [19] slam_0.1-47          diptest_0.75-7       caTools_1.18.0      
## [22] scales_1.1.0         DEoptimR_1.0-8       robustbase_0.93-6   
## [25] stringr_1.4.0        digest_0.6.25        rmarkdown_2.1       
## [28] sparsesvd_0.2        pkgconfig_2.0.3      htmltools_0.4.0     
## [31] limma_3.44.0         rlang_0.4.5          VGAM_1.1-2          
## [34] FNN_1.1.3            combinat_0.0-8       mclust_5.4.6        
## [37] gtools_3.8.2         dplyr_0.8.5          magrittr_1.5        
## [40] modeltools_0.2-23    Matrix_1.2-18        Rcpp_1.0.4.6        
## [43] munsell_0.5.0        viridis_0.5.1        lifecycle_0.2.0     
## [46] stringi_1.4.6        yaml_2.2.1           MASS_7.3-51.6       
## [49] flexmix_2.3-15       gplots_3.0.3         Rtsne_0.15          
## [52] plyr_1.8.6           grid_4.0.0           parallel_4.0.0      
## [55] gdata_2.18.0         ggrepel_0.8.2        crayon_1.3.4        
## [58] lattice_0.20-41      splines_4.0.0        knitr_1.28          
## [61] pillar_1.4.3         igraph_1.2.5         fpc_2.2-5           
## [64] reshape2_1.4.4       stats4_4.0.0         glue_1.4.0          
## [67] evaluate_0.14        BiocManager_1.30.10  vctrs_0.2.4         
## [70] gtable_0.3.0         RANN_2.6.1           purrr_0.3.4         
## [73] kernlab_0.9-29       assertthat_0.2.1     ggplot2_3.3.0       
## [76] xfun_0.13            monocle_2.16.0       RSpectra_0.16-0     
## [79] viridisLite_0.3.0    class_7.3-17         HSMMSingleCell_1.7.0
## [82] qlcMatrix_0.9.7      tibble_3.0.1         pheatmap_1.0.12     
## [85] cluster_2.1.0        fastICA_1.2-2        densityClust_0.3    
## [88] ellipsis_0.3.0