1 Overview

Small variants within the genome (single nucleotide variants, insertions, and deletions) are a critical component of the basis for genetic diseases. Identifying and summarizing these variants is often a first step in developing hypotheses about the role of these events in disease genesis and progression. The waterfall function is designed to efficiently summarize “small variant” (SNV/indel) information at a cohort level. It is useful for obtaining a broad sense of the types of variants observed in a cohort. In addition, waterfall gives a sense of the mutation burden, the recurrently mutated genes, the mutual exclusivity or co-occurrence between genes, and the relation of variants to clinical data.

The purpose of this vignette is to display the many features of the waterfall function in order to give an in-depth view of its parameters and functionality. For these examples, the data frame brcaMAF, which originates from a truncated .maf file from TCGA and is available within GenVisR, will be used unless otherwise stated. Further, for reproducibility, the seed for all examples has been set to 426.

1.1 Functionality

1.1.1 Loading primary input

Parameters covered: fileType, variant_class_order

For basic use, a user only needs to read a file of the proper type into R as a data frame and then supply this data frame to the waterfall function as the argument x. By default, the data frame supplied is expected to correspond to a file in .maf (version 2.4) format (see below for additional supported formats). This data frame should have, at a minimum, the column names “Tumor_Sample_Barcode”, “Hugo_Symbol”, and “Variant_Classification”, and contain rows corresponding to mutation events. Any value is permissible in the “Tumor_Sample_Barcode” and “Hugo_Symbol” columns, which correspond to a sample name and a gene name respectively, but specific values are expected in the “Variant_Classification” column (see table below). This is because waterfall can display only a single variant type in the main plot for each cell (i.e. each gene/sample combination). To achieve this, waterfall plots the most deleterious variant in a cell, based on a hierarchy predefined for a .maf file. This hierarchy follows the order, from top to bottom, of the legend output with the plot.
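The selection of a single variant per cell described above can be sketched in base R. This is only an illustration of the idea, not GenVisR's internal code: the toy data frame and the four-level hierarchy below are assumptions made for the example, and the real .maf hierarchy is the one shown in the plot legend.

```r
# Toy mutation table with the three required .maf columns (hypothetical values).
toy_maf <- data.frame(
  Tumor_Sample_Barcode   = c("S1", "S1", "S2", "S2", "S3"),
  Hugo_Symbol            = c("TP53", "TP53", "TP53", "PIK3CA", "PIK3CA"),
  Variant_Classification = c("Silent", "Nonsense_Mutation", "Missense_Mutation",
                             "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

# Assumed, illustrative hierarchy ordered from most to least deleterious.
# It is a shortened stand-in for the predefined .maf hierarchy.
hierarchy <- c("Nonsense_Mutation", "Frame_Shift_Del",
               "Missense_Mutation", "Silent")

# Rank every mutation by its position in the hierarchy (1 = most deleterious).
toy_maf$rank <- match(toy_maf$Variant_Classification, hierarchy)

# Sort so that the most deleterious variant comes first within each
# sample/gene cell, then keep only the first row of every cell.
sorted <- toy_maf[order(toy_maf$Tumor_Sample_Barcode,
                        toy_maf$Hugo_Symbol,
                        toy_maf$rank), ]
cell_key <- paste(sorted$Tumor_Sample_Barcode, sorted$Hugo_Symbol, sep = "|")
collapsed <- sorted[!duplicated(cell_key),
                    c("Tumor_Sample_Barcode", "Hugo_Symbol",
                      "Variant_Classification")]
rownames(collapsed) <- NULL

print(collapsed)
```

In this toy table, sample S1 carries both a silent and a nonsense mutation in TP53; only the nonsense mutation survives the collapse, which mirrors how one variant type is shown per cell in the main plot.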

# Load the GenVisR package and set the seed used throughout this vignette
library(GenVisR)
set.seed(426)

# Plot with the MAF file type specified (the default). The mainRecurCutoff
# parameter is described in the next section.
waterfall(brcaMAF, fileType = "MAF", mainRecurCutoff = 0.05)
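Before passing a custom data frame to waterfall, it can help to confirm that the minimum .maf columns named earlier are present. The following is a minimal sketch in base R; `missing_maf_columns` is a hypothetical helper written for this example and is not part of GenVisR, and the two small data frames are made-up inputs.

```r
# Minimum column names expected for a .maf-style data frame.
required_cols <- c("Tumor_Sample_Barcode", "Hugo_Symbol",
                   "Variant_Classification")

# Hypothetical helper (not a GenVisR function): returns the required
# column names that are absent from the data frame x.
missing_maf_columns <- function(x, required = required_cols) {
  stopifnot(is.data.frame(x))
  setdiff(required, colnames(x))
}

# A made-up data frame with all required columns ...
good <- data.frame(
  Tumor_Sample_Barcode   = "S1",
  Hugo_Symbol            = "TP53",
  Variant_Classification = "Missense_Mutation",
  stringsAsFactors = FALSE
)

# ... and one that lacks the Variant_Classification column.
bad <- good[, c("Tumor_Sample_Barcode", "Hugo_Symbol")]

missing_good <- missing_maf_columns(good)
missing_bad  <- missing_maf_columns(bad)

print(missing_good)
print(missing_bad)
```

An empty result means the data frame meets the minimum column requirement; any names returned would need to be added or renamed before plotting.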