Small variants within the genome (single nucleotide variants and small insertions/deletions) are a critical component of the genetic basis of disease. Identifying and summarizing these variants is often a first step toward developing hypotheses about their role in disease genesis and progression. The waterfall function is designed to efficiently summarize “small variant” (SNV/indel) information at the cohort level. It is useful for getting a broad sense of the types of variants observed in a cohort. Waterfall also conveys the mutation burden, the recurrently mutated genes, the mutual exclusivity or co-occurrence of mutations between genes, and the relationship of variants to clinical data.
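As a rough illustration of two of these cohort-level summaries, the sketch below uses only base R to tally per-sample mutation burden and to build a binary sample-by-gene matrix, which is the kind of table from which mutual exclusivity or co-occurrence can be read. The toy data frame maf_toy is hypothetical. This is not how waterfall computes its plots internally.

```r
# Hypothetical MAF-like data frame with the three columns waterfall requires
maf_toy <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S3"),
  Hugo_Symbol = c("TP53", "PIK3CA", "TP53", "GATA3", "PIK3CA", "PIK3CA"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "Nonsense_Mutation", "Frame_Shift_Del",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# Mutation burden: number of variant rows per sample (all classes counted)
burden <- table(maf_toy$Tumor_Sample_Barcode)
print(burden)

# Binary sample x gene matrix: TRUE if the gene carries any variant in that sample
mut_matrix <- table(maf_toy$Tumor_Sample_Barcode, maf_toy$Hugo_Symbol) > 0
print(mut_matrix)
```

Here S1 carries TP53 and PIK3CA variants while S3 carries GATA3 and PIK3CA but not TP53. Scanning such a matrix across many samples is what makes exclusivity or co-occurrence patterns visible.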
The purpose of this vignette is to display the many features of the waterfall function and give an in-depth view of its parameters and functionality. Unless otherwise stated, these examples use the data frame brcaMAF, which originates from a truncated .maf file from TCGA and is available within GenVisR. For reproducibility, the seed for all examples has been set to 426.
Parameters covered: fileType, variant_class_order
For basic use, a user only needs to read a file of the proper type into R as a data frame and supply this data frame to the waterfall function as the argument x. By default, the data frame is expected to correspond to a file in .maf (version 2.4) format (see below for additional supported formats). At a minimum, this data frame should have the column names “Tumor_Sample_Barcode”, “Hugo_Symbol”, and “Variant_Classification”, with rows corresponding to mutation events. Any value is permissible in the “Tumor_Sample_Barcode” and “Hugo_Symbol” columns, which hold a sample name and a gene name respectively, but specific values are expected in the “Variant_Classification” column (see table below). This is because waterfall can display only a single variant type per cell (i.e. gene/sample) in the main plot. To achieve this, waterfall plots the most deleterious variant in each cell, based on a hierarchy predefined for the .maf format. This hierarchy follows the top-to-bottom order of the legend output with the plot.
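The one-variant-per-cell rule can be sketched in base R: check that the required columns exist, rank each row by its position in a severity hierarchy, and keep the top-ranked row for each sample/gene pair. The abbreviated hierarchy vector and the toy data below are hypothetical. The real MAF hierarchy used by waterfall is longer and is the one shown in the plot legend, so treat this only as an illustration of the selection logic.

```r
# Hypothetical, abbreviated hierarchy (most deleterious first); waterfall's
# actual MAF hierarchy is longer and is shown in the plot legend
hierarchy <- c("Nonsense_Mutation", "Frame_Shift_Del", "Missense_Mutation", "Silent")

maf_toy <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  Hugo_Symbol = c("TP53", "TP53", "TP53", "PIK3CA"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Silent", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# Check that the minimum required columns are present
required <- c("Tumor_Sample_Barcode", "Hugo_Symbol", "Variant_Classification")
missing_cols <- setdiff(required, colnames(maf_toy))
if (length(missing_cols) > 0) {
  stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
}

# Rank each row by hierarchy position (1 = most deleterious);
# classes absent from the hierarchy get NA, which order() places last
maf_toy$rank <- match(maf_toy$Variant_Classification, hierarchy)

# Sort so the most deleterious variant comes first within each sample/gene cell,
# then keep only the first row per cell
maf_sorted <- maf_toy[order(maf_toy$Tumor_Sample_Barcode,
                            maf_toy$Hugo_Symbol,
                            maf_toy$rank), ]
cell_keys <- maf_sorted[, c("Tumor_Sample_Barcode", "Hugo_Symbol")]
collapsed <- maf_sorted[!duplicated(cell_keys), ]
print(collapsed[, required])
```

In this toy input, S1 has both a missense and a nonsense variant in TP53, and only the nonsense variant survives because it ranks higher in the hierarchy.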
# Load the GenVisR package
library("GenVisR")
set.seed(426)
# Plot with the MAF file type specified (the default); the mainRecurCutoff
# parameter is described in the next section
waterfall(brcaMAF, fileType = "MAF", mainRecurCutoff = 0.05)
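The mainRecurCutoff argument is covered in the next section. As a hedged preview, the base R sketch below assumes it acts as a minimum fraction of samples in which a gene must be mutated for that gene to stay in the main plot, and applies that idea to a hypothetical toy data frame. The inclusive comparison (>=) and the cutoff of 0.5 are illustrative assumptions, not GenVisR's internal code.

```r
# Hypothetical MAF-like rows (Variant_Classification omitted for brevity)
maf_toy <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4", "S1", "S2"),
  Hugo_Symbol = c("TP53", "TP53", "TP53", "PIK3CA", "PIK3CA", "GATA3"),
  stringsAsFactors = FALSE
)

# Cohort size; samples with no rows in the data frame are not counted here
n_samples <- length(unique(maf_toy$Tumor_Sample_Barcode))

# Count each sample at most once per gene, then compute the fraction mutated
gene_sample <- unique(maf_toy[, c("Tumor_Sample_Barcode", "Hugo_Symbol")])
recurrence <- table(gene_sample$Hugo_Symbol) / n_samples

cutoff <- 0.5  # hypothetical cutoff chosen for this toy example
kept_genes <- names(recurrence)[recurrence >= cutoff]
print(sort(recurrence, decreasing = TRUE))
print(kept_genes)
```

With four samples, TP53 is mutated in 3/4 of them and PIK3CA in 2/4, so both pass the 0.5 cutoff, while GATA3 (1/4) is dropped. The call above uses a much lower cutoff of 0.05, which keeps genes mutated in at least 5% of samples under the same assumption.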