agglomerate-methods {mia}    R Documentation

Agglomerate data using taxonomic information

Description

agglomerateByRank can be used to sum up data based on their association with a given taxonomic rank stored in rowData. Only ranks returned by taxonomyRanks() can be used.

Usage

## S4 method for signature 'SummarizedExperiment'
agglomerateByRank(
  x,
  rank = taxonomyRanks(x)[1],
  onRankOnly = FALSE,
  na.rm = FALSE,
  empty.fields = c(NA, "", " ", "\t", "-", "_"),
  ...
)

## S4 method for signature 'SingleCellExperiment'
agglomerateByRank(x, ..., altexp = NULL, strip_altexp = TRUE)

## S4 method for signature 'TreeSummarizedExperiment'
agglomerateByRank(x, ..., agglomerateTree = FALSE)

Arguments

x

a SummarizedExperiment object

rank

a single character value defining a taxonomic rank. Must be one of the values returned by taxonomyRanks(x).

onRankOnly

TRUE or FALSE: Should only information from the specified rank be used, or also information from the ranks above it? See details. (default: onRankOnly = FALSE)

na.rm

TRUE or FALSE: Should taxa with an empty rank be removed? Use it with caution, since taxa with empty entries on the selected rank will be dropped. Which values count as empty can be adjusted via empty.fields. (default: na.rm = FALSE)

empty.fields

a character vector defining which values should be regarded as empty. (Default: c(NA, "", " ", "\t", "-", "_")). They will be removed before agglomeration if na.rm = TRUE.

...

arguments passed to agglomerateByRank function for SummarizedExperiment objects, mergeRows and sumCountsAcrossFeatures.

altexp

String or integer scalar specifying an alternative experiment containing the input data.

strip_altexp

TRUE or FALSE: Should alternative experiments be removed prior to agglomeration? This prevents too many nested alternative experiments. (default: strip_altexp = TRUE)

agglomerateTree

TRUE or FALSE: should rowTree() also be agglomerated? (Default: agglomerateTree = FALSE)
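
The interplay of na.rm and empty.fields can be illustrated with a minimal base R sketch. It does not reproduce the internals of agglomerateByRank; the family vector is hypothetical, and the filtering shown is only an assumption about how matching against empty.fields could look.

```r
# Default values taken from the Usage section above.
empty.fields <- c(NA, "", " ", "\t", "-", "_")

# Hypothetical Family column of a taxonomy table (made up for illustration).
family <- c("FamA", NA, "", "FamB", "_", "FamA")

# %in% matches NA against NA, so missing values are flagged as empty too.
is_empty <- family %in% empty.fields

# Entries that would survive an na.rm = TRUE style filter in this sketch.
kept <- family[!is_empty]
kept
```

Extending empty.fields with project-specific placeholders (for example "unclassified") would flag those entries in the same way.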

Details

Depending on the available taxonomic data and its structure, setting onRankOnly = TRUE has certain implications for the interpretability of your results. If no loops exist (a loop meaning that two higher ranks contain the same lower rank), the results should be comparable. You can check for loops using detectLoop.
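
To make the notion of a loop concrete, here is a minimal base R sketch. It is not the implementation of detectLoop; the tax table and its labels are hypothetical. It flags lower-rank labels that occur under more than one higher-rank label and shows how grouping on the rank alone differs from grouping on the full path.

```r
# Hypothetical taxonomy table; names and values are made up for illustration.
tax <- data.frame(
  Family = c("FamA", "FamA", "FamB", "FamB"),
  Genus  = c("GenX", "GenY", "GenX", "GenZ"),
  stringsAsFactors = FALSE
)

# For each Genus label, count the distinct Family labels it appears under.
parents_per_genus <- tapply(tax$Family, tax$Genus,
                            function(f) length(unique(f)))

# A Genus with more than one parent Family forms a loop.
looped <- names(parents_per_genus)[parents_per_genus > 1]
looped

# Grouping on the Genus column alone merges GenX across both families,
# whereas grouping on Family and Genus together keeps the two apart.
n_groups_rank_only <- length(unique(tax$Genus))
n_groups_full_path <- nrow(unique(tax[, c("Family", "Genus")]))
```

When looped is empty, both ways of grouping yield the same number of groups, which is why the results are comparable in the absence of loops.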

Agglomeration sums up the values of assays at the specified taxonomic level. Summing certain assays, e.g. those that include binary or negative values, can lead to meaningless values. In those cases, consider doing agglomeration first and then transformation.
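
The effect of the order of operations can be sketched in base R with rowsum, used here as a stand-in for the summation step (an assumption for illustration, not the package's code path). The counts matrix and genus labels are hypothetical.

```r
# Hypothetical counts: three taxa (rows) in two samples (columns).
counts <- matrix(c(5, 0,
                   3, 2,
                   0, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("t1", "t2", "t3"), c("s1", "s2")))
genus <- c("GenX", "GenX", "GenY")

# Presence/absence first, then summing: GenX can exceed 1, which is no
# longer a valid presence/absence value.
pa_first <- rowsum((counts > 0) * 1, group = genus)

# Summing first, then presence/absence: every value stays 0 or 1.
sum_first <- (rowsum(counts, group = genus) > 0) * 1

pa_first
sum_first
```

In this sketch only sum_first remains a binary matrix, which matches the advice to agglomerate first and transform afterwards.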

Value

A taxonomically-agglomerated, optionally-pruned object of the same class as x.

See Also

mergeRows, sumCountsAcrossFeatures

Examples

data(GlobalPatterns)
# print the available taxonomic ranks
colnames(rowData(GlobalPatterns))
taxonomyRanks(GlobalPatterns)

# agglomerate at the Family taxonomic rank
x1 <- agglomerateByRank(GlobalPatterns, rank="Family")
## How many taxa before/after agglomeration?
nrow(GlobalPatterns)
nrow(x1)

# with agglomeration of the tree
x2 <- agglomerateByRank(GlobalPatterns, rank="Family",
                       agglomerateTree = TRUE)
nrow(x2) # same number of rows, but
rowTree(x1) # ... different
rowTree(x2) # ... tree

# If assay contains binary or negative values, summing might lead to meaningless
# values, and you will get a warning. In these cases, you might want to apply
# the transformation again after agglomerating at the chosen taxonomic level.
tse <- transformSamples(GlobalPatterns, method = "pa")
tse <- agglomerateByRank(tse, rank = "Genus")
tse <- transformSamples(tse, method = "pa")

# removing empty labels by setting na.rm = TRUE
sum(is.na(rowData(GlobalPatterns)$Family))
x3 <- agglomerateByRank(GlobalPatterns, rank="Family", na.rm = TRUE)
nrow(x3) # different from x2

# Because all the rownames are from the same rank, rownames do not include 
# prefixes, in this case "Family:". 
print(rownames(x3[1:3,]))

# To add them, use getTaxonomyLabels function.
rownames(x3) <- getTaxonomyLabels(x3, with_rank = TRUE)
print(rownames(x3[1:3,]))

## Look at enterotype dataset...
data(enterotype)
## print the available taxonomic ranks. Only 1 rank is available,
## so agglomerateByRank is not useful here
taxonomyRanks(enterotype)

[Package mia version 1.2.0 Index]