GenVisR 1.14.2
GenVisR is being converted to S4 to increase flexibility for users. The S4-based functions in this vignette are operative but changes may still occur and documentation is still being developed. Please keep in mind that the new functions based on S4 are similarily named to the established functions; both are available for used in this release. For example:
In General new functions start with a capital, old functions do not
# new S4 function (note the capital W)
?Waterfall
# older established function
?waterfall
Intuitively visualizing and interpreting data from high-throughput genomic technologies continues to be challenging. Often creating a publication ready graphic not only requires extensive manipulation of data but also an in-depth knowledge of graphics libraries. As such creating such visualizations has traditionally taken a significant amount of time in regards to both data pre-processing and aesthetic manipulations. GenVisR (Genomic Visulizations in R) attempts to alleviate this burden by providing highly customizable publication-quality graphics in an easy to use structure. Many of the plotting functions in this library have a focus in the realm of human cancer genomics however we support a large number of species and many of the plotting methods incorprated within are of use for visualizing any type of genomic abnormality.
For the majority of users we recommend installing GenVisR from the release branch of Bioconductor, Installation instructions using this method can be found on the GenVisR landing page on Bioconductor.
Please note that GenVisR imports a few packages that have “system requirements”, in most cases these requirements will already be installed. If they are not please follow the instructions to install these packages given in the R terminal. Briefly these packages are: “libcurl4-openssl-dev” and “libxml2-dev”
Once GenVisR is successfully installed it will need to be loaded. For the purposes of this vignette we do this here and set a seed to ensure reproducibility.
# set a seed
set.seed(426)
# load GenVisR into R
library(GenVisR)
Genomic changes come in many forms, one such form are single nucleotide variants (SNVS) and small insertions and deletions (INDELS). These variations while small can have a devestating impact on the health of cells and are the basis for many genomic diseases. We provide 5 functions for visualizing the impact these changes can have and to aid in recognizing patterns that could be associated with genomic diseases.
In order to be as flexible as possible GenVisR is designed to work with 3 file formats which encode information regarding small variants. The first of these are vep formated files from the Variant Effect Predictor (VEP), a tool developed by ensembl to annotate the effect a variant has on a genome. Mutation Annotation Format (MAF) files are also widely used in the bioinformatics community particularly within The Cancer Genome Atlas (TCGA) project and are supported by GenVisR. Annotation files from the Genome Modeling System (GMS) are also supported. In the interest of flexibility a fourth option exists as well, users can supply a data.table
object to any function designed to work with small variants. Users should note however that specific columns are required for functions when using this method (see specific functions for details).
The output from Waterfall()
, Lolliplot()
, Rainfall()
, SmallVariantSummary()
, and MutSpectra()
are objects which store the data which was plotted as well as the individual subplots, and the final arranged plot. The reason for storing the data and plots in such a manner is two-fold. First allowing access to the data which was plotted provides transparency as to how the plot was produced. Secondly as all this items can be accessed by the user it allows for greater flexibility for customizing plots. These data and plots can be accessed with the getData()
and getGrob()
functions respectively. The final plot can be printed with the drawPlot()
function which takes one of the afore mentioned objects.
When constructing a plot with GenVisR there are three basic steps. First one must load the data into R, as mentioned previously GenVisR supports three filetypes which can be read in this manner. An example for each is supplied below using test data installed with the package.
# get the disk location for maf test file
testFileDir <- system.file("extdata", package = "GenVisR")
testFile <- Sys.glob(paste0(testFileDir, "/brca.maf"))
# define the objects for testing
mafObject <- MutationAnnotationFormat(testFile)
# get the disk location for test files
testFileDir <- system.file("extdata", package = "GenVisR")
testFile <- Sys.glob(paste0(testFileDir, "/*.vep"))
# define the object for testing
vepObject <- VEP(testFile)
# get the disk location for test files
testFileDir <- system.file("extdata", package = "GenVisR")
testFile <- Sys.glob(paste0(testFileDir, "/FL.gms"))
# define the objects for testing
gmsObject <- GMS(testFile)
## This function is part of the new S4 feature and is under active development
In some cases you may want to view data after it has been read in, there are functions available to make viewing these data easier. Briefly these are getSample()
, getPosition()
, getMutation()
, and getMeta()
which are globally available for each object type (GMS, VEP, etc.). Let’s test one out by viewing the samples from the VEP file that was read in with VEP()
.
# view the samples from the VEP file
getSample(vepObject)
All of these functions expect a path to a file of the appropriate type. Optionally a wildcard can be supplied with * as was done with VEP()
causing the function to read in multiple files at once. With the exception of MAF files these functions expect the files being read in to have a column called “sample” (MAF files already have a sample designation within the file). If a sample column is not found one will be created based on the filenames. Each of these functions will attempt to infer a file specification version from the file header. If a version is not found one will need to be specified via the version
parameter.
Once the data is read in and stored in one of the previously mentioned objects the data can be plotted with one of the plotting functions. We will go over each plotting function in more detail in section 4 however to continue with our example pipeline let’s create a simple waterfall plot for the VEP annotated data.
waterfallPlot <- Waterfall(vepObject, recurrence = 0.4)
## This function is part of the new S4 feature and is under active development, did you mean to use waterfall() with a lower case w?
With the plot object created we can view the actual data making up the plot with the getData()
function. This is an accessor function to pull out specific data making up the plot (Refer to the R documentation for Waterfall()
to see available slots in the object which hold data). Let’s use it here to extract the data making up the main plot panel, we can specify the slot either by name or it’s index.
# extract data by the slot name
getData(waterfallPlot, name="primaryData")
# extract data by the slot index (same as above)
getData(waterfallPlot, index=1)
With the plot object defined we can now visualize the plot, this is achieved by calling the drawPlot()
function on the plot object. The plot can be saved simply by opening a graphics device using the base R functions such as pdf()
, svg()
etc. It should be noted that space within the graphics device will need to be adequate in order to draw a plot. If plots appear to be overlapping try giving the plot more space to draw via the height
and width
parameters.
# draw the plot
drawPlot(waterfallPlot)
# draw the plot and save it to a pdf
pdf(file="waterfall.pdf", height=10, width=15)
drawPlot(waterfallPlot)
dev.off()
The Waterfall plot is designed to make recognizing patterns related to co-occurring or mutually exclusive events easier to visualize across a cohort. It plots mutations for a gene/sample in a hierarchical order beginning with the most recurrently mutated gene for a cohort and working it’s way down. By default two sub-plots are also displayed which show the mutation burden for a sample and the number of genes effected by a mutation in a cohort. Let’s start by making a simple waterfall plot with default parameters from the MAF file we read in earlier.
# draw a waterfall plot for the maf object
drawPlot(Waterfall(vepObject, recurrence = 0.2))
## This function is part of the new S4 feature and is under active development, did you mean to use waterfall() with a lower case w?