systemPipeR 2.8.0
This section provides general description and how to use this single cell RNAseq (scRNAseq) workflow. In the actual analysis report, this section is usually removed.
This scRNAseq workflow template is based on the counted 10x scRNA data. It means
this workflow expect users have output from cell counting programs, like cell ranger.
If you have the raw sequencing data and would like to count the cell gene counts,
please use another workflow in systemPipeR, such as SPcellranger
.
This workflow does:
This template is modified from Seurat Tutorial.
Users want to provide here background information about the design of their scRNAseq project.
This report describes the analysis of a scRNAseq project studying drug …
Typically, users want to specify here all information relevant for the analysis of their scRNAseq study. This includes detailed descriptions of files, experimental design, reference genome, gene annotations, etc.
systemPipeR
workflows can be designed and built from start to finish with a
single command, importing from an R Markdown file or stepwise in interactive
mode from the R console.
This tutorial will demonstrate how to build the workflow in an interactive mode,
appending each step. The workflow is constructed by connecting each step via
appendStep
method. Each SYSargsList
instance contains instructions needed
for processing a set of input files with a specific command-line or R software
and the paths to the corresponding outfiles generated by a particular tool/step.
To create a Workflow within systemPipeR
, we can start by defining an empty
container and checking the directory structure:
library(systemPipeR)
sal <- SPRproject()
sal
This is an empty template that contains only one demo step. Refer to our website for how to add more steps. If you prefer a more enriched template, read this page for other pre-configured templates.
cat(crayon::blue$bold("To use this workflow, following R packages are expected:\n"))
cat(c("'Seurat", "ggplot2", "ggpubr", "patchwork", "dplyr", "tibble",
"readr'\n"), sep = "', '")
### pre-end
appendStep(sal) <- LineWise(code = {
library(systemPipeR)
library(Seurat)
library(dplyr)
library(ggplot2)
library(ggpubr)
library(patchwork)
}, step_name = "load_packages")
In this example, the single cell data is preprocessed/filtered 10x data from a healthy donor. Samples taken from peripheral blood mononuclear cells (PBMCs), about 3000 cells.
Dataset can be downloaded with this link: https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
If the link is not working, visit 10x website for updated links.
For your real data, please preprocess and put the dataset inside data
directory
appendStep(sal) <- LineWise(code = {
# unzip the data
untar("data/pbmc3k_filtered_gene_bc_matrices.tar.gz", exdir = "data")
# load data
pbmc.data <- Read10X(data.dir = "data/filtered_gene_bc_matrices/hg19/")
# Use dim to see the size of dataset, example data has
# 2700 cells x 32738 genes
dim(pbmc.data)
}, step_name = "load_data", dependency = "load_packages")
We can plot to see how many cells have good expressions.
appendStep(sal) <- LineWise(code = {
at_least_one <- apply(pbmc.data, 2, function(x) sum(x > 0))
count_p1 <- tibble::as_tibble(at_least_one) %>%
ggplot() + geom_histogram(aes(x = value), binwidth = floor(nrow(pbmc.data)/400),
fill = "#6b97c2", color = "white") + theme_pubr(16) +
scale_y_continuous(expand = c(0, 0)) + scale_x_continuous(expand = c(0,
0)) + labs(title = "Distribution of detected genes",
x = "Genes with at least one tag")
count_p2 <- tibble::as_tibble(BiocGenerics::colSums(pbmc.data)) %>%
ggplot() + geom_histogram(aes(x = value), bins = floor(ncol(pbmc.data)/50),
fill = "#6b97c2", color = "white") + theme_pubr(16) +
scale_y_continuous(expand = c(0, 0)) + scale_x_continuous(expand = c(0,
0)) + labs(title = "Expression sum per cell", x = "Sum expression")
png("results/count_plots.png", 1000, 700)
count_p1 + count_p2 + patchwork::plot_annotation(tag_levels = "A")
dev.off()
}, step_name = "count_plot", dependency = "load_data")
appendStep(sal) <- LineWise(code = {
sce <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k",
min.cells = 3, min.features = 200)
# calculate mitochondria gene ratio
sce[["percent.mt"]] <- PercentageFeatureSet(sce, pattern = "^MT-")
}, step_name = "create_seurat", dependency = "load_data")
Filters applied here are:
min.cells = 3
: for a single gene, it must be presented in at least 3 cells.min.features = 200
: for any cell, it must at least have 200 genes with readable gene counts.In your real data, you may want to adjust the filters for different projects.
Some indicators are important to show the quality of the cell batch, for example
appendStep(sal) <- LineWise(code = {
png("results/qc1.png", 700, 700)
VlnPlot(sce, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"),
ncol = 3)
dev.off()
qc_p1 <- FeatureScatter(sce, feature1 = "nCount_RNA", feature2 = "percent.mt")
qc_p2 <- FeatureScatter(sce, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
png("results/qc2.png", 700, 450)
qc_p1 + qc_p2 + patchwork::plot_annotation(tag_levels = "A")
dev.off()
}, step_name = "qc_seurat", dependency = "create_seurat")