BUSseq_MCMC {BUSseq} | R Documentation |
The function BUSseq_MCMC
runs a Markov chain Monte Carlo (MCMC)
algorithm to fit the Batch effects correction with Unknown Subtypes
for scRNA-seq data (BUSseq) model. BUSseq is an interpretable Bayesian
hierarchical model that closely follows the data-generating mechanism
of scRNA-seq experiments. BUSseq can simultaneously correct batch
effects, cluster cell types, impute missing data caused by dropout
events and detect differentially expressed genes without requiring a
preliminary normalization step. We adopt the MCMC algorithm to conduct
posterior inference for the BUSseq model. Here, we denote
the total number of cells as N, the batch number as B, the gene number as G,
and the cell-type number as K.
BUSseq_MCMC(ObservedData, n.celltypes, seed = round(runif(1,1,10000)), n.cores = 8, n.iterations = 2000, n.burnin = floor(n.iterations/2), n.unchanged = min(floor(n.iterations * 0.3), 500), n.output = n.iterations/10, working_dir = getwd(), Drop_ind = rep(TRUE, length(ObservedData)), fdr = 0.05, hyper_pi = 2, hyper_gamma0 = 3, hyper_gamma1 = c(0.001, 0.01), hyper_alpha = 5, tau1sq = 50, hyper_p = c(1, 3), hyper_tau0sq = c(2, 0.01), hyper_nu = 5, hyper_delta = 5, hyper_phi = c(1, 0.1))
ObservedData |
|
n.celltypes |
|
seed |
|
n.cores |
|
n.iterations |
|
n.burnin |
|
n.unchanged |
|
n.output |
|
working_dir |
|
Drop_ind |
|
fdr |
The false discovery rate level we want to control in order to identify intrinsic genes. The default is 0.05. |
hyper_pi |
|
hyper_gamma0 |
|
hyper_gamma1 |
|
hyper_alpha |
|
tau1sq |
|
hyper_p |
|
hyper_tau0sq |
|
hyper_nu |
|
hyper_delta |
|
hyper_phi |
|
A SingleCellExperiment
object res
is outputed with an imputed
count data matrix assay(res, "imputed_data")
after imputing dropout
events. At the same time, the estimated cell type labels and relative size
factors of cells are added in the column-level metadata
int_colData(res)$BUSseq
, while the identified
intrinsic genes are stored in the row-level metadata
int_elementMetadata(res)$BUSseq
.
Morover, the dimension information, the posterior inference of parameters
and latent variables are stored as a list in metadata(res)$BUSseq
:
n.cell |
The total number of cells in all batches, a scalar. |
n.gene |
The number of genes, a scalar. |
n.batch |
The number of batches, a scalar. |
n.perbatch |
The number of cells in each batch, a B-dimensional vector. |
n.celltype |
The number of cell types specified by user, a scalar. |
n.iter |
The total number of iterations applied in the MCMC algorithm, a scalar. |
seed |
The seed for the MCMC algorithm. |
n.burnin |
The number of iterations as burnin, a scalar. |
gamma.est |
The estimated intercept and odds ratio of the logistic regression for dropout events, a B by 2 matrix. |
alpha.est |
The estimated log-scale baseline expression levels, a G-dimensional vector whose g-th element is the estimated log-scale mean gene expression level of gene g in the first cell type. |
beta.est |
The estimated cell-type effects, a G by K matrix, whose [g,k] element is the effects of cell type k on gene g compared with the first cell type. Note that the first column is zero as the first cell type is taken as the baseline cell type. |
nu.est |
The estimated location batch effects, a G by B matrix, where [g,b] element is the location batch effect on gene g in the batch b compared with the first batch. Note that the first column is zero as the first batch is taken as the reference batch without batch effects. |
delta.est |
The estimated cell-specific global effects, an N-dimensional vector. Note that the first element in each vector is zero as the first cell in each batch is taken as the reference cell. |
phi.est |
The estimated overdispersion parameters, a G by B matrix, where [g,b] element is the overdispersion parameter on gene g in batch b. |
pi.est |
The estimated cell-type proportions across batches, a B by K matrix, whose [b,k] element is the estimated proportion of cell type k in batch b. |
w.est |
The estimated cell-type indicators of each cell, a N-dimensional vector. |
p.est |
The estimated proportion of differentially expressed genes compared with the first cell type, a scalar. |
PPI.est |
The estimated posterior marginal probability of being differentially expressed genes compared with the first cell type. The return is G by K matrix. Noted that the first column consist of zeros as there is no differentially expressed gene compared with the cell type of itself. |
D.est |
The intrinsic gene indicators. The return is an N-dimensional vector. |
BIC |
The BIC value when K = |
Fangda Song
Song, Fangda, Ga Ming Angus Chan, and Yingying Wei. Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nature communications 11, no. 1 (2020): 1-15.
####################################### # Apply BUSseq to the Simulation Data # ####################################### library(SingleCellExperiment) RawCountData <- assay(BUSseqfits_example, "counts") batch_ind <- colData(BUSseqfits_example) sce <- SingleCellExperiment(assays = list(counts = RawCountData), colData = DataFrame(Batch_ind = batch_ind)) BUSseqfits_res <- BUSseq_MCMC(ObservedData = sce, seed = 1234, n.cores = 2, n.celltypes = 4, n.iterations = 500) ################################################ # Extract Estimates from the BUSseqfits Object # ################################################ #return cell type indicators w.est <- celltypes(BUSseqfits_res) #return the intercept and odds ratio of the logistic regression #for dropout events gamma.est <- dropout_coefficient_values(BUSseqfits_res) #return the log-scale baseline expression values alpha.est <- baseline_expression_values(BUSseqfits_res) #return the cell-type effects beta.est <- celltype_effects(BUSseqfits_res) #return the mean expression levels mu.est <- celltype_mean_expression(BUSseqfits_res) #return the cell-specific global effects delta.est <- cell_effect_values(BUSseqfits_res) #return the location batch effects nu.est <- location_batch_effects(BUSseqfits_res) #return the overdispersion parameters phi.est <- overdispersions(BUSseqfits_res) #return the intrinsic gene indices D.est <- intrinsic_genes_BUSseq(BUSseqfits_res) #return the BIC value BIC <- BIC_BUSseq(BUSseqfits_res) #return the raw read count matrix CountData_raw <- raw_read_counts(BUSseqfits_res) #return the imputed read count matrix CountData_imputed <- imputed_read_counts(BUSseqfits_res) #return the corrected read count matrix CountData_corrected <- corrected_read_counts(BUSseqfits_res) ################# # Visualization # ################# #generate the heatmap of raw read count data heatmap_data_BUSseq(BUSseqfits_res, project_name="Heatmap_raw") #generate the heatmap of imputed read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Imputed", project_name="Heatmap_imputed") #generate the heatmap of corrected read count data heatmap_data_BUSseq(BUSseqfits_res, data_type = "Corrected", project_name="Heatmap_corrected")