plot_PVCA {proBatch}    R Documentation

Plot variance distribution by variable

Description

Plot the distribution of variance across variables (covariates), as estimated by principal variance component analysis (PVCA).

Usage

plot_PVCA(data_matrix, sample_annotation, sample_id_col = "FullRunName",
  feature_id_col = "peptide_group_label",
  technical_covariates = c("MS_batch", "instrument"),
  biological_covariates = c("cell_line", "drug_dose"),
  fill_the_missing = 0, threshold_pca = 0.6, threshold_var = 0.01,
  colors_for_bars = NULL, theme = "classic", plot_title = NULL)

Arguments

data_matrix

features (in rows) vs samples (in columns) matrix, with feature IDs as row names and file/sample names as column names. Most functions assume that this is the log-transformed version of the original data.
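As a minimal sketch of the expected layout, a log-transformed features-by-samples matrix could be built in base R as follows. The feature IDs, run names, and intensities are invented for illustration and are not part of the package's example data.

```r
# Hypothetical raw intensities: 3 features (rows) x 4 samples (columns)
raw_intensities <- matrix(
  c(1024, 2048,  512, 4096,
     256,  128, 1024,  512,
    8192, 4096, 2048, 1024),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("pep_A", "pep_B", "pep_C"),               # feature IDs as row names
    c("run_01", "run_02", "run_03", "run_04")   # file/sample names as column names
  )
)

# Log-transform, since most functions assume log-scale data
data_matrix <- log2(raw_intensities)
```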

sample_annotation

data frame with 1) sample_id_col (this can be repeated as row names), 2) biological covariates and 3) technical covariates (batches etc.)

sample_id_col

name of the column in sample_annotation where the file names (the colnames of data_matrix) are found

feature_id_col

name of the column with feature/gene/peptide/protein ID used in the long-format representation df_long. In the wide-format representation data_matrix, this corresponds to the row names.

technical_covariates

vector of sample_annotation column names that are technical covariates

biological_covariates

vector of sample_annotation column names that are biologically meaningful covariates

fill_the_missing

numeric value that the missing values are substituted with
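To illustrate what this substitution means, the sketch below replaces every NA in a small hypothetical matrix with a constant. It shows the idea only and is not the package's internal code.

```r
# Hypothetical log-scale matrix with two missing values
m <- matrix(
  c(10, NA, 12,
     9, 11, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("pep_A", "pep_B"), c("run_01", "run_02", "run_03"))
)

fill_the_missing <- 0   # same value as the documented default

# Substitute every missing value with the chosen constant
m_filled <- m
m_filled[is.na(m_filled)] <- fill_the_missing
```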

threshold_pca

the minimum fraction of the total variability (a value between 0 and 1) that the selected principal components need to explain
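One way to picture this threshold: keep the smallest number of leading principal components whose cumulative explained variance reaches the threshold. The sketch below does this with base R's prcomp on simulated data. The data and the selection rule are assumptions for illustration, and the package's internal procedure may differ.

```r
set.seed(1)
# Simulated data: 20 samples (rows) x 6 features (columns)
x <- matrix(rnorm(20 * 6), nrow = 20)

# Principal component analysis on centered data
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Fraction of total variance explained by each component, and its running sum
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

threshold_pca <- 0.6   # same value as the documented default

# Smallest number of leading components that together reach the threshold
n_pcs <- which(cum_var >= threshold_pca)[1]
```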

threshold_var

the minimum fraction of the variance that a covariate needs to explain to be shown separately (covariates below this threshold are lumped together)
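The lumping can be sketched as follows. The variance fractions and the label "other" are hypothetical, and the package may name the lumped category differently.

```r
# Hypothetical variance fractions per covariate (they sum to 1)
weights <- c(MS_batch = 0.45, Diet = 0.30, Sex = 0.005,
             Strain = 0.004, resid = 0.241)

threshold_var <- 0.01   # same value as the documented default

# Covariates explaining less than the threshold are merged into one category
is_small <- weights < threshold_var
lumped <- c(weights[!is_small], other = sum(weights[is_small]))
```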

colors_for_bars

four-item color vector, specifying colors for the following categories: c('residual', 'biological', 'biol:techn', 'technical')
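A color vector in the documented category order might look like the sketch below. The specific colors are arbitrary choices for illustration. The col2rgb call from grDevices is used only to confirm that each entry is a valid R color.

```r
# Colors listed in the documented order:
# 'residual', 'biological', 'biol:techn', 'technical'
colors_for_bars <- c("grey", "darkgreen", "orange", "steelblue")

# Confirm that each entry is a valid color specification (errors if not)
rgb_values <- grDevices::col2rgb(colors_for_bars)
```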

theme

ggplot theme, by default classic. Can be easily overridden (see examples)

plot_title

Title of the plot (usually, processing step + representation level (fragments, transitions, proteins))

Value

list of two items: plot (the ggplot object, gg) and df (the PVCA results, pvca_res)

See Also

sample_annotation_to_colors, ggplot

Examples

# Use the first 50 features of the example proteome matrix
matrix_subset <- example_proteome_matrix[1:50, ]
pvca_plot <- plot_PVCA(matrix_subset, example_sample_annotation,
  technical_covariates = c("MS_batch", "digestion_batch"),
  biological_covariates = c("Diet", "Sex", "Strain"))


[Package proBatch version 1.0.0 Index]