plot_PVCA {proBatch} | R Documentation |
Plot variance distribution by variable
plot_PVCA(data_matrix, sample_annotation,
  sample_id_col = "FullRunName",
  feature_id_col = "peptide_group_label",
  technical_covariates = c("MS_batch", "instrument"),
  biological_covariates = c("cell_line", "drug_dose"),
  fill_the_missing = 0, threshold_pca = 0.6, threshold_var = 0.01,
  colors_for_bars = NULL, theme = "classic", plot_title = NULL)
data_matrix |
features (in rows) vs samples (in columns) matrix, with feature IDs as rownames and file/sample names as colnames. Most functions assume that this is the log-transformed version of the original data |
sample_annotation |
data matrix with 1) |
sample_id_col |
name of the column in sample_annotation where the filenames (colnames of the data matrix) are found |
feature_id_col |
name of the column with the feature/gene/peptide/protein ID used in the long format representation |
technical_covariates |
vector of sample_annotation column names that are technical covariates |
biological_covariates |
vector of sample_annotation column names that are biological covariates |
fill_the_missing |
numeric value that the missing values are substituted with |
threshold_pca |
the minimum fraction of the total variability (between 0 and 1) that the selected principal components need to explain |
threshold_var |
the minimum fraction of variance (between 0 and 1) that a covariate needs to explain to be shown separately (the rest will be lumped together) |
colors_for_bars |
four-item color vector, specifying colors for the following categories: c('residual', 'biological', 'biol:techn', 'technical') |
theme |
ggplot theme, by default "classic" |
plot_title |
Title of the plot (usually, processing step + representation level (fragments, transitions, proteins)) |
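The numeric arguments fill_the_missing, threshold_pca and threshold_var can be illustrated with a minimal base R sketch. This is not the proBatch implementation: prcomp stands in for the decomposition PVCA performs, and toy_matrix and the weights vector are made-up illustration data, not package output.

```r
# Illustrative sketch only (base R); all objects here are hypothetical.
set.seed(1)

# Toy log-scale matrix: 20 features (rows) x 8 samples (columns).
toy_matrix <- matrix(rnorm(20 * 8), nrow = 20,
                     dimnames = list(paste0("feature_", 1:20),
                                     paste0("sample_", 1:8)))

# fill_the_missing: missing values are substituted with a constant.
toy_matrix[1, 1] <- NA
fill_the_missing <- 0
toy_matrix[is.na(toy_matrix)] <- fill_the_missing

# threshold_pca: keep the fewest principal components whose cumulative
# share of the total variance reaches the threshold.
threshold_pca <- 0.6
pca <- prcomp(t(toy_matrix), center = TRUE, scale. = FALSE)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
n_components <- which(cumsum(var_share) >= threshold_pca)[1]

# threshold_var: covariates explaining less than the cutoff are lumped.
threshold_var <- 0.01
weights <- c(MS_batch = 0.40, cell_line = 0.30, instrument = 0.005,
             drug_dose = 0.004, residual = 0.291)  # made-up weights
small <- weights < threshold_var
lumped <- c(weights[!small], other = sum(weights[small]))
```

With these made-up weights, instrument and drug_dose fall below threshold_var and are merged into a single "other" entry, while the remaining entries are kept as they are.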
list of two items: plot = gg, df = pvca_res
sample_annotation_to_colors, ggplot
matrix <- example_proteome_matrix[1:50, ]
pvca_plot <- plot_PVCA(matrix, example_sample_annotation,
  technical_covariates = c("MS_batch", "digestion_batch"),
  biological_covariates = c("Diet", "Sex", "Strain"))
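Before a call like the example above, it can help to check that the inputs agree with the argument descriptions. The following is a minimal base R sketch under stated assumptions: toy_matrix and toy_annotation are hypothetical stand-ins for real data, and naming the elements of colors_for_bars by category is an assumption, since the documentation only states that four colors are expected in that order.

```r
# Hypothetical toy inputs, shaped as the argument descriptions require.
toy_matrix <- matrix(
  c(10.1, 11.3,  9.8, 10.7,
    12.0, 12.4, 11.9, 12.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("pep_A", "pep_B"),
                  c("run_1", "run_2", "run_3", "run_4"))
)
toy_annotation <- data.frame(
  FullRunName = c("run_1", "run_2", "run_3", "run_4"),
  MS_batch    = c("batch_1", "batch_1", "batch_2", "batch_2"),
  cell_line   = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

sample_id_col <- "FullRunName"
technical_covariates <- "MS_batch"
biological_covariates <- "cell_line"

# Every covariate should be a column of the annotation table.
covariates <- c(technical_covariates, biological_covariates)
missing_covariates <- setdiff(covariates, colnames(toy_annotation))

# Every matrix column should have a matching annotation row.
unannotated_samples <- setdiff(colnames(toy_matrix),
                               toy_annotation[[sample_id_col]])

# Four colors in the category order listed for colors_for_bars
# (naming the elements is an assumption made for readability).
colors_for_bars <- setNames(
  c("grey70", "darkgreen", "orange", "firebrick"),
  c("residual", "biological", "biol:techn", "technical")
)
```

Empty missing_covariates and unannotated_samples vectors indicate that the covariate names and sample IDs line up; anything listed there would need fixing in the annotation before plotting.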