HSC_population_size_estimate {ISAnalytics}    R Documentation

Hematopoietic stem cells population size estimate.

Description

[Experimental] Estimates the population size of hematopoietic stem cells (HSCs) with capture-recapture models.

Usage

HSC_population_size_estimate(
  x,
  metadata,
  stable_timepoints = NULL,
  aggregation_key = c("SubjectID", "CellMarker", "Tissue", "TimePoint"),
  blood_lineages = blood_lineages_default(),
  timepoint_column = "TimePoint",
  seqCount_column = "seqCount_sum",
  seqCount_threshold = 3,
  nIS_threshold = 5,
  cell_type = "MYELOID",
  tissue_type = "PB"
)

Arguments

x

An aggregated integration matrix. See details.

metadata

An aggregated association file. See details.

stable_timepoints

A numeric vector of stable time points, or NULL if there are no stable time points.

aggregation_key

A character vector indicating the key used for aggregating x and metadata. Note that x and metadata should always be aggregated with the same key.

blood_lineages

A data frame containing information on the blood lineages. Users can supply their own, provided the columns CellMarker and CellType are present.

timepoint_column

The name of the time point column to use. This column must be present in aggregation_key.

seqCount_column

The name of the column in x that holds the sequence count quantification values.

seqCount_threshold

A single numeric value. After x is re-aggregated, rows with a sequence count greater than or equal to this value are kept; all other rows are discarded.

nIS_threshold

A single numeric value. A group (row) in the metadata data frame is kept only if its count of distinct integration sites is strictly greater than this value; otherwise it is discarded.

cell_type

The cell types to include in the models. Note that the matching is case-insensitive.

tissue_type

The tissue types to include in the models. Note that the matching is case-insensitive.
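
The filtering rules described for seqCount_threshold, nIS_threshold, cell_type and tissue_type can be illustrated on toy data. This is only a minimal base R sketch of the stated comparisons, not the package's implementation: the data frames x_toy and meta_toy, and the column names chr, CellType and nIS, are assumptions made up for the example.

```r
# Toy data. Only seqCount_sum comes from the documented defaults;
# the other column names (chr, CellType, nIS) are hypothetical.
x_toy <- data.frame(
    chr = c("1", "1", "2"),
    seqCount_sum = c(2, 3, 10)
)
meta_toy <- data.frame(
    SubjectID = c("S1", "S1", "S2"),
    CellType = c("Myeloid", "T", "MYELOID"),
    Tissue = c("PB", "pb", "BM"),
    nIS = c(5, 20, 6)
)

seqCount_threshold <- 3
nIS_threshold <- 5

# seqCount_threshold: keep rows with a value greater than or equal to it
x_kept <- x_toy[x_toy$seqCount_sum >= seqCount_threshold, ]

# nIS_threshold: keep groups with a count strictly greater than it
meta_kept <- meta_toy[meta_toy$nIS > nIS_threshold, ]

# cell_type / tissue_type: case-insensitive matching, emulated here by
# upper-casing both sides before comparing
meta_match <- meta_toy[
    toupper(meta_toy$CellType) %in% toupper("myeloid") &
        toupper(meta_toy$Tissue) %in% toupper("PB"),
]
```

Note the asymmetry between the two thresholds: a row with seqCount_sum equal to 3 is kept, while a group with exactly 5 integration sites is dropped.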

Value

A data frame with the results of the estimates.

Input formats

Both x and metadata should be supplied to the function in aggregated format (ideally through the use of aggregate_metadata and aggregate_values_by_key). Note that aggregation_key, that is, the vector of column names used for aggregation, must contain at least the columns SubjectID, CellMarker, Tissue and a time point column (the name of this column can be specified in the argument timepoint_column).
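
The requirement on the aggregation key can be checked before calling the function. The helper below is a hypothetical sketch in base R (check_aggregation_key is not part of ISAnalytics); it only verifies that the mandatory column names listed above appear in the key.

```r
# Hypothetical helper, not part of ISAnalytics: stops with an error if the
# key lacks any of the columns the documentation lists as mandatory.
check_aggregation_key <- function(aggregation_key,
                                  timepoint_column = "TimePoint") {
    required <- c("SubjectID", "CellMarker", "Tissue", timepoint_column)
    missing_cols <- setdiff(required, aggregation_key)
    if (length(missing_cols) > 0) {
        stop(
            "aggregation_key is missing required columns: ",
            paste(missing_cols, collapse = ", ")
        )
    }
    invisible(TRUE)
}

# The default key from the Usage section passes the check
key <- c("SubjectID", "CellMarker", "Tissue", "TimePoint")
check_aggregation_key(key)
```

A key such as c("SubjectID", "Tissue") would fail this check because CellMarker and the time point column are absent.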

On time points

If stable_timepoints is a vector with length > 1, the function looks for the first available stable time point and slices the data from that time point onward. If NULL is supplied instead, no stable time points are assumed to be available. Note that time points equal to 0 are ALWAYS discarded. In addition, to be included in the analysis, a group must have at least 2 distinct non-zero time points.
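
One possible reading of these rules, for the time points of a single group, is sketched below in base R. The helper select_timepoints is hypothetical and is not how the package is necessarily implemented. It assumes numeric time points. It also assumes that when none of the stable time points occur in the data, all non-zero time points are retained, a case the text above does not specify.

```r
# Hypothetical helper, not part of ISAnalytics: applies the time point
# rules described above to the time points of one group.
select_timepoints <- function(timepoints, stable_timepoints = NULL) {
    # Time points equal to 0 are always discarded
    tp <- sort(unique(timepoints[timepoints != 0]))
    if (!is.null(stable_timepoints)) {
        # First stable time point that is actually present in the data
        available <- intersect(sort(stable_timepoints), tp)
        if (length(available) > 0) {
            # Slice from that time point onward
            tp <- tp[tp >= available[1]]
        }
        # Assumption: if no stable time point is present, keep everything
    }
    # A group needs at least 2 distinct non-zero time points
    if (length(tp) < 2) {
        return(numeric(0))
    }
    tp
}

sliced <- select_timepoints(c(0, 30, 60, 90, 180, 360), c(90, 180, 360))
no_stable <- select_timepoints(c(0, 30, 60))
too_few <- select_timepoints(c(0, 30))
```

Here sliced holds 90, 180 and 360, no_stable holds 30 and 60, and too_few is empty because only one non-zero time point remains.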

Examples

# Load the example data sets shipped with the package
data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
# Aggregate the integration matrix values
aggreg <- aggregate_values_by_key(
    x = integration_matrices,
    association_file = association_file,
    value_cols = c("seqCount", "fragmentEstimate")
)
# Aggregate the association file (metadata)
aggreg_meta <- aggregate_metadata(association_file = association_file)
# Estimate the HSC population size for the selected cell type
estimate <- HSC_population_size_estimate(
    x = aggreg,
    metadata = aggreg_meta,
    stable_timepoints = c(90, 180, 360),
    cell_type = "Other"
)

[Package ISAnalytics version 1.4.1 Index]