compute_near_integrations {ISAnalytics}    R Documentation

Scans input matrix to find and merge near integration sites.

Description

[Stable] This function scans the input integration matrix to detect integration sites that are too "near" to each other and merges them into single integration sites, adjusting their values if needed.

Usage

compute_near_integrations(
  x,
  threshold = 4,
  keep_criteria = "max_value",
  strand_specific = TRUE,
  value_columns = c("seqCount", "fragmentEstimate"),
  max_value_column = "seqCount",
  map_as_file = TRUE,
  file_path = default_report_path()
)

Arguments

x

An integration matrix

threshold

A single integer: the minimum number of bases that must separate two integrations for them to be considered distinct. For example, with a threshold of 3 and identical chr and strand fields, integration sites with at least 3 bases between them are considered distinct (e.g. (1, 14576, +) and (1, 14580, +)).

keep_criteria

While scanning, which integration should be kept? The 2 possible choices for this parameter are:

  • "max_value": keep the integration site which has the highest value (and collapse other values on that integration).

  • "keep_first": keep the first integration.

strand_specific

Should strand be considered? If TRUE, integration sites that differ only by strand, for example (chr = "1", strand = "+", integration_locus = 14568) and (chr = "1", strand = "-", integration_locus = 14568), are considered different and are not grouped together.

value_columns

Character vector containing the names of the numeric experimental columns

max_value_column

The column in which to search for the maximum value

map_as_file

Produce recalibration map as a .tsv file?

file_path

String representing the path where the file will be saved. Can be either a folder or a file. Relevant only if map_as_file is TRUE.
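
The interplay of threshold and strand_specific can be illustrated with a few lines of base R. This is a minimal sketch that reads the argument descriptions above literally (sites are distinct when at least threshold bases lie strictly between them); the helpers are_distinct and site are hypothetical and are not part of ISAnalytics, and the package's internal comparison may differ.

```r
# Hypothetical helper, not part of ISAnalytics.
# `a` and `b` are lists with fields chr, integration_locus and strand.
are_distinct <- function(a, b, threshold = 4, strand_specific = TRUE) {
  # Sites on different chromosomes (or strands, when strand matters)
  # are never candidates for merging.
  same_group <- a$chr == b$chr &&
    (!strand_specific || a$strand == b$strand)
  if (!same_group) {
    return(TRUE)
  }
  # Count the bases lying strictly between the two loci.
  bases_between <- abs(b$integration_locus - a$integration_locus) - 1
  bases_between >= threshold
}

# Hypothetical constructor for a single integration site.
site <- function(chr, locus, strand) {
  list(chr = chr, integration_locus = locus, strand = strand)
}

are_distinct(site("1", 14576, "+"), site("1", 14580, "+"), threshold = 3)  # TRUE
are_distinct(site("1", 14576, "+"), site("1", 14579, "+"), threshold = 3)  # FALSE
are_distinct(site("1", 14568, "+"), site("1", 14568, "-"))                 # TRUE
are_distinct(site("1", 14568, "+"), site("1", 14568, "-"),
             strand_specific = FALSE)                                     # FALSE
```

The first call reproduces the example given for threshold; the last two reproduce the example given for strand_specific.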

Details

The whole matrix is scanned with a sliding window mechanism: for each row in the integration matrix an interval is calculated from the threshold value, then a "look ahead" operation detects subsequent rows whose integration loci fall within the interval. If the CompleteAmplificationIDs of the near integrations differ, only the locus value (and optionally GeneName and GeneStrand, if the matrix is annotated) is modified; otherwise, rows with the same ID are aggregated and their values are summed.

The function also produces a re-calibration map: a data frame that records the pre-recalibration values of chr, strand and integration_locus together with the values to which each integration was changed.
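
The scan described here can be sketched in base R on a toy data frame. This is an illustration under stated assumptions, not the ISAnalytics implementation: merge_near_sketch is a hypothetical function, the toy values are invented, the window is assumed to be anchored at the row that opens it, only the "max_value" criterion on seqCount is shown, and strand is always taken into account.

```r
# Toy integration matrix (values invented for illustration only).
toy <- data.frame(
  chr = c("1", "1", "1", "2"),
  integration_locus = c(14576, 14578, 14590, 500),
  strand = c("+", "+", "+", "-"),
  CompleteAmplificationID = c("s1", "s1", "s2", "s1"),
  seqCount = c(10, 30, 5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical sketch, not the ISAnalytics implementation.
merge_near_sketch <- function(x, threshold = 4) {
  x <- x[order(x$chr, x$strand, x$integration_locus), ]
  new_locus <- x$integration_locus
  i <- 1
  while (i <= nrow(x)) {
    # Look ahead: extend the window while rows share chr and strand and
    # lie within `threshold` bases of the row that opened the window.
    j <- i
    while (j < nrow(x) &&
           x$chr[j + 1] == x$chr[i] &&
           x$strand[j + 1] == x$strand[i] &&
           x$integration_locus[j + 1] - x$integration_locus[i] <= threshold) {
      j <- j + 1
    }
    window <- i:j
    # "max_value" criterion: keep the locus of the row with the highest seqCount.
    keep <- window[which.max(x$seqCount[window])]
    new_locus[window] <- x$integration_locus[keep]
    i <- j + 1
  }
  # Re-calibration map: original coordinates next to the reassigned locus.
  map <- unique(data.frame(
    chr = x$chr, strand = x$strand,
    integration_locus_before = x$integration_locus,
    integration_locus_after = new_locus,
    stringsAsFactors = FALSE
  ))
  x$integration_locus <- new_locus
  # Rows that now share coordinates and sample ID are summed; rows with
  # different IDs only had their locus changed.
  merged <- aggregate(
    seqCount ~ chr + integration_locus + strand + CompleteAmplificationID,
    data = x, FUN = sum
  )
  list(matrix = merged, map = map)
}

res <- merge_near_sketch(toy, threshold = 4)
res$matrix
res$map
```

In the toy data the sites at 14576 and 14578 fall in the same window, so both are moved to 14578 (the locus with the higher seqCount) and, because they share an ID, their counts are summed into one row. The other two rows are left unchanged.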

Value

An integration matrix with the same number of rows or fewer

Note

We recommend using this function in combination with comparison_matrix to automatically perform re-calibration on all quantification matrices.

Examples

data("integration_matrices", package = "ISAnalytics")
rec <- compute_near_integrations(
    x = integration_matrices, map_as_file = FALSE
)
head(rec)

[Package ISAnalytics version 1.4.2 Index]