distinct_test_zero_inflated {distinct}    R Documentation
distinct_test
Test for differential state between two groups of samples.
distinct_test_zero_inflated( x, name_assays_expression = "logcounts", name_cluster = "cluster_id", name_sample = "sample_id", design, column_to_test = 2, P_1 = 100, P_2 = 500, P_3 = 2000, P_4 = 10000, N_breaks = 25, min_non_zero_cells = 20, n_cores = 1 )
x
a SummarizedExperiment or a SingleCellExperiment object.
name_assays_expression
a character ("logcounts" by default), indicating the name of the assays(x) element which stores the expression data (i.e., assays(x)$name_assays_expression). We strongly encourage using normalized data, such as counts per million (CPM) or log2-CPM (e.g., 'logcounts' as created via
name_cluster
a character ("cluster_id" by default), indicating the name of the colData(x) element which stores the cluster id of each cell (i.e., colData(x)$name_cluster).
name_sample
a character ("sample_id" by default), indicating the name of the colData(x) element which stores the sample id of each cell (i.e., colData(x)$name_sample).
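design
a design matrix of the study (e.g., as created via model.matrix), with one row per sample; its row names must be the sample ids stored in colData(x)$name_sample.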
column_to_test
the column(s) of the design to test (2 by default; do not include the intercept).
P_1
the number of permutations to use on all gene-cluster combinations (100 by default).
P_2
the number of permutations to use when a (raw) p-value is < 0.1 (500 by default).
P_3
the number of permutations to use when a (raw) p-value is < 0.01 (2,000 by default).
P_4
the number of permutations to use when a (raw) p-value is < 0.001 (10,000 by default). To obtain a finer ranking of the most significant genes, if computational resources are available, we encourage users to set P_4 = 20000.
N_breaks
the number of breaks at which to evaluate the cumulative distribution function (25 by default).
min_non_zero_cells
the minimum number of non-zero cells (across all samples) in each cluster for a gene to be evaluated (20 by default).
n_cores
the number of cores to parallelize the tasks on (1 by default). Parallelization is performed at the cluster level: each cluster is processed on a separate thread.
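The interplay of N_breaks and P_1 to P_4 can be sketched in base R as a toy two-group permutation test: per-sample empirical CDFs are evaluated at N_breaks equally spaced points, the group-averaged curves are compared, and the test is re-run with more permutations each time the raw p-value drops below the next threshold (0.1, 0.01, 0.001). This is an illustration under stated assumptions, not distinct's implementation: the helper names (ecdf_at_breaks, toy_stat, toy_perm_test, adaptive_p_value), the equally spaced breaks, the sum-of-absolute-differences statistic, the (1 + count) / (1 + P) p-value and re-running from scratch at each stage are all hypothetical choices, and no zero-inflation-specific handling is attempted.

```r
# Hypothetical helper: per-sample ECDFs evaluated at fixed breaks.
# `values`: expression of each cell; `samples`: sample id of each cell.
ecdf_at_breaks <- function(values, samples, breaks) {
  ids <- unique(samples)
  cdfs <- vapply(ids, function(s) ecdf(values[samples == s])(breaks),
                 numeric(length(breaks)))
  t(cdfs)  # one row per sample, one column per break
}

# Hypothetical statistic: sum over breaks of the absolute difference between
# the group-averaged CDFs (`group`: 2-level factor, one entry per sample row).
toy_stat <- function(cdf_mat, group) {
  lv <- levels(group)
  a <- colMeans(cdf_mat[group == lv[1], , drop = FALSE])
  b <- colMeans(cdf_mat[group == lv[2], , drop = FALSE])
  sum(abs(a - b))
}

# Permutation p-value: shuffle the sample-level group labels P times.
toy_perm_test <- function(cdf_mat, group, P) {
  obs <- toy_stat(cdf_mat, group)
  perm <- replicate(P, toy_stat(cdf_mat, sample(group)))
  (1 + sum(perm >= obs)) / (1 + P)
}

# Adaptive schedule mirroring P_1..P_4: start with P[1] permutations and
# re-run with P[k + 1] permutations while p stays below thresholds[k].
adaptive_p_value <- function(cdf_mat, group,
                             P = c(100, 500, 2000, 10000),
                             thresholds = c(0.1, 0.01, 0.001)) {
  p <- toy_perm_test(cdf_mat, group, P[1])
  for (k in seq_along(thresholds)) {
    if (p >= thresholds[k]) break  # not small enough to refine further
    p <- toy_perm_test(cdf_mat, group, P[k + 1])
  }
  p
}

# Toy data: 8 samples (4 ctrl, 4 stim), 50 cells each; stim cells are shifted.
set.seed(1)
samples <- rep(paste0("s", 1:8), each = 50)
group <- factor(rep(c("ctrl", "stim"), each = 4))
shift <- ifelse(rep(group, each = 50) == "stim", 1, 0)
values <- rnorm(length(samples), mean = shift)
breaks <- seq(min(values), max(values), length.out = 25)  # N_breaks = 25
cdf_mat <- ecdf_at_breaks(values, samples, breaks)
p <- adaptive_p_value(cdf_mat, group)
p
```

In this toy, with 4 + 4 samples there are only 70 distinct label assignments (35 up to swapping the groups, which leaves the statistic unchanged), so the p-value cannot fall much below 1/35 however many permutations are drawn.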
A data.frame object.
Columns 'gene' and 'cluster_id' contain the gene and cell-cluster names, while 'p_val', 'p_adj.loc' and 'p_adj.glb' report the raw p-values and the locally and globally adjusted p-values, respectively; adjustments use the Benjamini-Hochberg (BH) correction.
For locally adjusted p-values ('p_adj.loc'), BH correction is applied to each cluster separately; for globally adjusted p-values ('p_adj.glb'), BH correction is applied to the results from all clusters jointly.
Column 'filtered' indicates whether a gene-cluster result was filtered (TRUE) or analyzed (FALSE).
A gene-cluster combination is filtered when fewer than 'min_non_zero_cells' non-zero cells are available.
Filtered results have raw and adjusted p-values equal to 1.
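The local and global adjustments described above can be mimicked with base R's p.adjust(method = "BH") on a small, hypothetical results table. The toy values, the helper column n_non_zero, and the choice to adjust only the analyzed (non-filtered) rows before setting filtered rows to 1 are assumptions of this sketch, not a description of distinct's internals.

```r
# Hypothetical toy results: 3 genes x 2 clusters (not real distinct output).
res <- data.frame(
  gene       = rep(c("g1", "g2", "g3"), times = 2),
  cluster_id = rep(c("A", "B"), each = 3),
  p_val      = c(0.001, 0.04, 0.30, 0.002, 0.50, 0.01),
  n_non_zero = c(150, 80, 10, 200, 60, 35)  # hypothetical non-zero cell counts
)

# Filter gene-cluster combinations with too few non-zero cells.
min_non_zero_cells <- 20
res$filtered <- res$n_non_zero < min_non_zero_cells
res$p_val[res$filtered] <- 1  # filtered results get p = 1

keep <- !res$filtered

# Global adjustment: BH across all analyzed gene-cluster combinations at once.
res$p_adj.glb <- 1
res$p_adj.glb[keep] <- p.adjust(res$p_val[keep], method = "BH")

# Local adjustment: BH within each cluster separately.
res$p_adj.loc <- 1
for (cl in unique(res$cluster_id)) {
  idx <- keep & res$cluster_id == cl
  res$p_adj.loc[idx] <- p.adjust(res$p_val[idx], method = "BH")
}

res
```

Here g3 in cluster A is filtered (10 < 20 non-zero cells), so it keeps a value of 1 in 'p_val', 'p_adj.loc' and 'p_adj.glb'.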
Simone Tiberi simone.tiberi@uzh.ch
plot_cdfs, plot_densities, log2_FC, top_results
# load the input data:
data("Kang_subset", package = "distinct")
Kang_subset

# create the design of the study:
samples = Kang_subset@metadata$experiment_info$sample_id
group = Kang_subset@metadata$experiment_info$stim
design = model.matrix(~group)
# rownames of the design must indicate sample ids:
rownames(design) = samples
design

# Note that the sample names in `colData(x)$name_sample` have to be the same ones as those in `rownames(design)`.
rownames(design)
unique(SingleCellExperiment::colData(Kang_subset)$sample_id)

# In order to obtain a finer ranking for the most significant genes, if computational resources are available,
# we encourage users to increase P_4 (i.e., the number of permutations when a raw p-value is < 0.001)
# and set P_4 = 20,000 (by default P_4 = 10,000).

# The group we would like to test for is in the second column of the design, therefore we will specify: column_to_test = 2

set.seed(61217)
res = distinct_test(
  x = Kang_subset,
  name_assays_expression = "logcounts",
  name_cluster = "cell",
  design = design,
  column_to_test = 2,
  min_non_zero_cells = 20,
  n_cores = 2)

# We can optionally add the fold change (FC) and log2-FC between groups:
res = log2_FC(res = res,
              x = Kang_subset,
              name_assays_expression = "cpm",
              name_group = "stim",
              name_cluster = "cell")

# Visualize significant results:
head(top_results(res))

# Visualize significant results from a specified cluster of cells:
top_results(res, cluster = "Dendritic cells")

# By default, results from 'top_results' are sorted by (globally) adjusted p-value;
# they can also be sorted by log2-FC:
top_results(res, cluster = "Dendritic cells", sort_by = "log2FC")

# Visualize significant UP-regulated genes only:
top_results(res, up_down = "UP", cluster = "Dendritic cells")

# Plot density and cdf for gene 'ISG15' in cluster 'Dendritic cells'.
plot_densities(x = Kang_subset,
               gene = "ISG15",
               cluster = "Dendritic cells",
               name_assays_expression = "logcounts",
               name_cluster = "cell",
               name_sample = "sample_id",
               name_group = "stim")

plot_cdfs(x = Kang_subset,
          gene = "ISG15",
          cluster = "Dendritic cells",
          name_assays_expression = "logcounts",
          name_cluster = "cell",
          name_sample = "sample_id",
          name_group = "stim")
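As the examples stress, the row names of the design must match the sample ids in colData(x)$name_sample, and column_to_test indexes the columns of that design. The base-R sketch below runs both checks on hypothetical toy metadata; experiment_info, cell_sample_ids and ids_match are illustrative stand-ins, not objects taken from Kang_subset.

```r
# Hypothetical sample-level metadata (stands in for experiment_info).
experiment_info <- data.frame(
  sample_id = c("ctrl1", "ctrl2", "stim1", "stim2"),
  stim      = factor(c("ctrl", "ctrl", "stim", "stim")),
  stringsAsFactors = FALSE
)

# Hypothetical per-cell sample ids (stands in for colData(x)$sample_id).
cell_sample_ids <- rep(experiment_info$sample_id, times = c(30, 25, 40, 35))

# Build the design and label its rows with the sample ids.
group <- experiment_info$stim
design <- model.matrix(~group)
rownames(design) <- experiment_info$sample_id

# Every sample in the design must appear among the cells, and vice versa.
ids_match <- setequal(rownames(design), unique(cell_sample_ids))
ids_match

# Column 1 is the intercept; column 2 ("groupstim") is the group effect.
colnames(design)
```

Here column 2 ("groupstim") encodes the stimulation effect, which is why the examples use column_to_test = 2.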