outliers_by_pool_fragments {ISAnalytics} | R Documentation |
outliers_by_pool_fragments(
  metadata,
  key = "BARCODE_MUX",
  outlier_p_value_threshold = 0.01,
  normality_test = FALSE,
  normality_p_value_threshold = 0.05,
  transform_log2 = TRUE,
  per_pool_test = TRUE,
  pool_col = "PoolID",
  min_samples_per_pool = 5,
  flag_logic = "AND",
  keep_calc_cols = TRUE,
  report_path = default_report_path()
)
metadata |
The metadata data frame |
key |
A character vector of numeric column names |
outlier_p_value_threshold |
The p value threshold for a read to be considered an outlier |
normality_test |
Perform normality test? Normality is assessed for each column in the key using the Shapiro-Wilk test; if the values do not follow a normal distribution, the other calculations are skipped |
normality_p_value_threshold |
Normality threshold |
transform_log2 |
Perform a log2 transformation on values prior to the actual calculations? |
per_pool_test |
Perform the test for each pool? |
pool_col |
A character vector of the names of the columns that uniquely identify a pool |
min_samples_per_pool |
The minimum number of samples that a pool
needs to contain in order to be processed - relevant only if
|
flag_logic |
A character vector of logic operators used to build a global flag formula - only relevant if the key is longer than one. All operators must be chosen from: AND, OR, XOR, NAND, NOR, XNOR |
keep_calc_cols |
Keep the calculation columns in the output data frame? |
report_path |
The path where the report file should be saved.
Can be a folder, a file or NULL if no report should be produced.
Defaults to |
This particular test calculates, for each column in the key:
The zscore of the values
The tstudent of the values
The distribution of the tstudent values
Optionally, the test can be performed for each pool, and a normality test can be run prior to the actual calculations. Samples are flagged if this condition is met:
tdist < outlier_p_value_threshold & zscore < 0
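The condition above can be illustrated with a small sketch in base R. The exact formulas the package uses for tstudent and tdist are not spelled out here, so the conversion below is an assumption (a Grubbs-style t statistic with a one-sided tail probability on n - 2 degrees of freedom). The helper `flag_low_outliers()` and the fragment counts are hypothetical and are not part of ISAnalytics.

```r
# Hypothetical helper for illustration only; not the package's implementation.
flag_low_outliers <- function(values, p_threshold = 0.01,
                              transform_log2 = TRUE) {
  x <- if (transform_log2) log2(values) else values
  n <- length(x)
  # Standard z-score of each value
  zscore <- (x - mean(x)) / sd(x)
  # ASSUMPTION: Grubbs-style conversion of the z-score to a t statistic
  tstudent <- sqrt((n * (n - 2) * zscore^2) / ((n - 1)^2 - n * zscore^2))
  # ASSUMPTION: one-sided upper tail probability with n - 2 degrees of freedom
  tdist <- pt(tstudent, df = n - 2, lower.tail = FALSE)
  data.frame(
    value = values,
    zscore = zscore,
    tstudent = tstudent,
    tdist = tdist,
    # Flag condition as stated in the documentation
    flag = tdist < p_threshold & zscore < 0
  )
}

# Made-up fragment counts: the last sample is far below the others
frag <- c(1000, 1100, 950, 1050, 980, 1020, 990, 1010, 5)
res <- flag_low_outliers(frag)
res
```

Because the condition requires zscore < 0, only values that are unusually low are flagged; a sample with an unusually high value would pass the test even when its tdist is below the threshold.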
If the key contains more than one column an additional flag logic can be
specified for combining the results.
Example:
Suppose the key contains the names of two columns, X and Y:
key = c("X", "Y")
If we specify the argument flag_logic = "AND", then the reads will be flagged based on this global condition:
(tdist_X < outlier_p_value_threshold & zscore_X < 0) AND
(tdist_Y < outlier_p_value_threshold & zscore_Y < 0)
The user can specify one or more logical operators that will be applied in sequence.
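As a rough sketch of how operators applied in sequence might combine per-column flags, the base R snippet below folds the flags from left to right. It assumes one operator per consecutive pair of key columns; how the package handles a shorter flag_logic vector is not stated here. The names `logic_ops` and `combine_flags()` are hypothetical and are not part of ISAnalytics.

```r
# Hypothetical helpers for illustration only; not the package's internals.
logic_ops <- list(
  AND  = function(a, b) a & b,
  OR   = function(a, b) a | b,
  XOR  = function(a, b) xor(a, b),
  NAND = function(a, b) !(a & b),
  NOR  = function(a, b) !(a | b),
  XNOR = function(a, b) !xor(a, b)
)

combine_flags <- function(flags, flag_logic) {
  # flags: list of logical vectors, one per key column
  # flag_logic: ASSUMED to hold one operator per consecutive pair of columns
  stopifnot(length(flag_logic) == length(flags) - 1)
  stopifnot(all(flag_logic %in% names(logic_ops)))
  result <- flags[[1]]
  for (i in seq_along(flag_logic)) {
    # Apply the i-th operator to the running result and the next column's flag
    result <- logic_ops[[flag_logic[i]]](result, flags[[i + 1]])
  }
  result
}

# Made-up per-column flags for four samples
flag_X <- c(TRUE, TRUE, FALSE, FALSE)
flag_Y <- c(TRUE, FALSE, TRUE, FALSE)

and_flags <- combine_flags(list(flag_X, flag_Y), "AND")
xor_flags <- combine_flags(list(flag_X, flag_Y), "XOR")
```

With "AND", only the first sample, which is flagged in both columns, stays flagged; with "XOR", the second and third samples are flagged because exactly one of their column flags is set.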
A data frame of metadata with the column to_remove
Other Outlier tests:
available_outlier_tests()
data("association_file", package = "ISAnalytics")
flagged <- outliers_by_pool_fragments(association_file,
    report_path = NULL
)
head(flagged)