sumCountsAcrossCells {scater}    R Documentation

Sum counts across sets of cells

Description

Sum together expression values (by default, counts) for each set of cells and for each feature.

Usage

sumCountsAcrossCells(x, ...)

aggregateAcrossCells(x, ...)

## S4 method for signature 'ANY'
sumCountsAcrossCells(x, ids, subset_row = NULL,
  subset_col = NULL, average = FALSE, BPPARAM = SerialParam())

## S4 method for signature 'SummarizedExperiment'
sumCountsAcrossCells(x, ...,
  exprs_values = "counts")

## S4 method for signature 'SummarizedExperiment'
aggregateAcrossCells(x, ids, ...,
  use_exprs_values = "counts")

## S4 method for signature 'SingleCellExperiment'
aggregateAcrossCells(x, ids, ...,
  subset_row = NULL, use_exprs_values = "counts", use_altexps = TRUE)

Arguments

x

For sumCountsAcrossCells, a numeric matrix of counts containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a count matrix.

For aggregateAcrossCells, a SingleCellExperiment or SummarizedExperiment containing a count matrix.

...

For the generics, further arguments to be passed to specific methods.

For the sumCountsAcrossCells SummarizedExperiment method, further arguments to be passed to the ANY method.

For aggregateAcrossCells, further arguments to be passed to sumCountsAcrossCells.

ids

A factor specifying the set to which each cell in x belongs, or a vector (e.g., of character labels) that can be coerced into such a factor.

Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a set.

subset_row

An integer, logical or character vector specifying the features to use. Defaults to all features.

For the SingleCellExperiment method, this argument does not affect alternative experiments, where summation is always performed for all features (or not at all, depending on use_altexps).

subset_col

An integer, logical or character vector specifying the cells to use. Defaults to all cells with non-NA entries of ids.

average

Logical scalar indicating whether the average should be computed instead of the sum.

BPPARAM

A BiocParallelParam object specifying whether and how summation should be parallelized.

exprs_values

A string or integer scalar specifying the assay of x containing the matrix of counts (or any other expression quantity that can be meaningfully summed).

use_exprs_values

A character or integer vector specifying the assay(s) of x containing count matrices.

use_altexps

Logical scalar indicating whether aggregation should be performed for alternative experiments in x.

Alternatively, a character vector specifying the names of the alternative experiments to be aggregated.

Details

This function provides a convenient method for aggregating counts across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analysis.
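To make the pseudo-bulk idea concrete, here is a minimal base-R sketch of per-set summation. The helper pseudo_bulk() and the toy counts matrix are hypothetical illustrations, not part of scater, and this is not how scater implements the computation.

```r
# Hypothetical helper: sum the columns of a features-by-cells matrix within each set.
# Illustrative sketch only, not scater's implementation.
pseudo_bulk <- function(counts, ids) {
  ids <- factor(ids)
  vapply(levels(ids), function(lvl) {
    cells <- which(ids == lvl)                 # cells belonging to this set
    rowSums(counts[, cells, drop = FALSE])     # per-feature total for the set
  }, FUN.VALUE = numeric(nrow(counts)))
}

# Toy data: 2 genes (rows) by 6 cells (columns).
counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))
ids <- c("A", "B", "A", "C", "B", "A")         # e.g. cluster labels

pb <- pseudo_bulk(counts, ids)
# Expected totals for sets A, B, C: gene1 = 17, 12, 7; gene2 = 20, 14, 8
pb
```

Each column of pb is one pseudo-bulk sample, i.e., the summed profile of all cells sharing a label.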

The behaviour of this function is equivalent to that of colsum. However, this function can operate on any matrix representation in x, can do so in a parallelized manner for large matrices without resorting to block processing, and natively supports combinations of multiple factors in ids.
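For an ordinary in-memory matrix, the same per-set sums can also be obtained with base R's rowsum(), which groups rows, applied to the transposed matrix. The sketch below only illustrates what this equivalence means on toy data; it says nothing about how scater handles other matrix representations or parallelization.

```r
# Toy data: 2 genes (rows) by 6 cells (columns).
counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))
ids <- c("A", "B", "A", "C", "B", "A")

# rowsum() sums rows sharing a group label; transposing makes cells the rows.
via_rowsum <- t(rowsum(t(counts), group = ids))

# The same totals computed one set at a time, for comparison.
via_loop <- sapply(sort(unique(ids)), function(lvl) {
  rowSums(counts[, ids == lvl, drop = FALSE])
})

stopifnot(all(via_rowsum == via_loop))
```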

Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful, e.g., to remove undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset_col.
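The two ways of excluding cells described above should give the same result. The base-R sketch below checks this on toy data using a hypothetical helper, sum_by_set(), that mimics the documented NA behaviour; it is not scater's code.

```r
# Hypothetical helper: per-set sums that skip cells whose ids entry is NA.
sum_by_set <- function(counts, ids) {
  ids <- factor(ids)                           # NA is not a level
  vapply(levels(ids), function(lvl) {
    rowSums(counts[, which(ids == lvl), drop = FALSE])  # which() drops NA matches
  }, FUN.VALUE = numeric(nrow(counts)))
}

counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))
ids <- c("A", "B", "A", "C", "B", "A")

# Route 1: mask an unwanted cell (cell3) by setting its label to NA.
masked <- ids
masked[3] <- NA
via_na <- sum_by_set(counts, masked)

# Route 2: keep only the wanted cells explicitly (the role played by subset_col).
keep <- setdiff(seq_len(ncol(counts)), 3)
via_subset <- sum_by_set(counts[, keep], ids[keep])

stopifnot(identical(via_na, via_subset))
```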

Setting average=TRUE will compute the average in each set rather than the sum. This is particularly useful if x contains expression values that have already been normalized in some manner, as computing the average avoids another round of normalization to account for differences in the size of each set.
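The difference between summing and averaging can be seen on a small matrix of already-normalized values. This base-R sketch uses a hypothetical helper, aggregate_by_set(), whose average argument only mirrors the idea of average=TRUE; it is not scater's implementation.

```r
# Hypothetical helper: per-set sums or averages (illustration only, not scater's code).
aggregate_by_set <- function(x, ids, average = FALSE) {
  ids <- factor(ids)
  vapply(levels(ids), function(lvl) {
    block <- x[, which(ids == lvl), drop = FALSE]
    if (average) rowMeans(block) else rowSums(block)
  }, FUN.VALUE = numeric(nrow(x)))
}

# Toy normalized values: 2 genes by 3 cells; set A has two cells, set B has one.
expr <- matrix(c(0.5, 1.5, 2.0, 4.0, 3.0, 1.0), nrow = 2,
               dimnames = list(c("gene1", "gene2"), paste0("cell", 1:3)))
ids <- c("A", "A", "B")

sums <- aggregate_by_set(expr, ids)
avgs <- aggregate_by_set(expr, ids, average = TRUE)

# Averages equal sums divided by the number of cells in each set.
set_sizes <- as.vector(table(factor(ids)))
stopifnot(isTRUE(all.equal(avgs, sweep(sums, 2, set_sizes, "/"))))
```

Without averaging, the totals for set A are inflated simply because it contains two cells; the averages keep both sets on the same per-cell scale.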

Value

For sumCountsAcrossCells with a factor ids, a count matrix is returned with one column per level of ids. For each feature, counts for all cells in the same set are summed together. Columns are ordered by levels(ids).
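Because columns follow levels(ids), the output order can be controlled by setting the factor levels. The base-R sketch below illustrates this ordering rule with a hypothetical helper, sum_by_level(); it does not call scater.

```r
# Hypothetical helper (illustration only): per-set sums with columns in levels(ids) order.
sum_by_level <- function(counts, ids) {
  stopifnot(is.factor(ids))
  vapply(levels(ids), function(lvl) {
    rowSums(counts[, which(ids == lvl), drop = FALSE])
  }, FUN.VALUE = numeric(nrow(counts)))
}

counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:4)))
# Levels deliberately given in non-alphabetical order.
ids <- factor(c("A", "B", "C", "A"), levels = c("C", "A", "B"))

out <- sum_by_level(counts, ids)
colnames(out)  # "C" "A" "B": follows levels(ids), not alphabetical order
```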

For sumCountsAcrossCells with a DataFrame ids, a SummarizedExperiment is returned containing a similar count matrix in the first assay. Each column corresponds to a unique combination of levels in ids and contains the sum of counts for all cells with that combination. The identities of the levels for each column are reported in the colData.
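Summation over combinations of levels can be sketched in base R with a data.frame standing in for the DataFrame. The helper sum_by_combination() is hypothetical; it orders combinations by first appearance (which may differ from scater's ordering) and does not handle NA labels.

```r
# Hypothetical helper (illustration only): sum counts within each unique
# combination of the columns of a data frame of labels.
sum_by_combination <- function(counts, ids_df) {
  combos <- unique(ids_df)                   # one row per observed combination
  rownames(combos) <- NULL
  sums <- vapply(seq_len(nrow(combos)), function(i) {
    # A cell matches if it agrees with combination i in every column.
    hits <- Reduce(`&`, Map(function(col, val) col == val, ids_df, combos[i, ]))
    rowSums(counts[, which(hits), drop = FALSE])
  }, FUN.VALUE = numeric(nrow(counts)))
  list(sums = sums, combos = combos)         # combos plays the role of colData
}

counts <- matrix(1:12, nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))
ids_df <- data.frame(label = c("A", "B", "A", "A", "B", "A"),
                     batch = c(1, 1, 2, 1, 1, 2),
                     stringsAsFactors = FALSE)

res <- sum_by_combination(counts, ids_df)
res$combos   # (A, 1), (B, 1), (A, 2)
res$sums     # one column per row of res$combos
```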

For aggregateAcrossCells, an object of the same class as x (a SingleCellExperiment or SummarizedExperiment) is returned, containing summed matrices generated by sumCountsAcrossCells on all assays specified by use_exprs_values. Column metadata is retained for the first instance of a cell from each set in ids. If ids is a DataFrame, the combination of levels corresponding to each column is also reported in the column metadata.
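The rule of keeping column metadata from the first cell of each set can be sketched with a plain data.frame standing in for colData. The toy coldata and the alignment to levels(ids) are assumptions made for this illustration; it does not reproduce aggregateAcrossCells itself.

```r
# Toy per-cell metadata standing in for colData (hypothetical example data).
coldata <- data.frame(
  cell  = paste0("cell", 1:6),
  donor = c("d1", "d2", "d1", "d3", "d2", "d1"),
  stringsAsFactors = FALSE
)
ids <- factor(c("A", "B", "A", "C", "B", "A"))

# match() returns the position of the first cell carrying each level, so the
# kept rows line up with aggregated columns ordered by levels(ids) (assumption).
first <- match(levels(ids), ids)
kept <- coldata[first, , drop = FALSE]
kept
```

Metadata from later cells in the same set (e.g., cell3 and cell6 for set A) is discarded, so per-cell fields in the result describe only one representative cell.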

Author(s)

Aaron Lun

Examples

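library(scater)  # provides mockSCE() and sumCountsAcrossCells()

# Simulate a dataset and assign each cell to one of five sets.
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Sum counts within each set: one column per set.
out <- sumCountsAcrossCells(example_sce, ids)
head(out)

# Sum counts within each unique combination of label and batch.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- sumCountsAcrossCells(example_sce, DataFrame(label=ids, batch=batches))
out2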

[Package scater version 1.14.0 Index]