RepViz-vignette

Thomas Faux, Kalle Rytkönen, Asta Laiho, Laura L. Elo

2020-04-28

Introduction

DNA sequencing has become an essential part of biomedicine and biology. Commonly, the data are analyzed with automated pipelines. However, visual inspection by a researcher is often useful both for basic quality control and as a confirmation of the analysis results. Efficient visualization also often plays an essential role in guiding the analysis design and the interpretation of the results. To provide a simple and efficient visualization of genomic regions, we have developed a replicate-driven tool, RepViz. RepViz allows simultaneous viewing of both intra- and intergroup variation in the sequencing counts of the studied conditions, as well as their comparison to the output features (e.g. identified peaks) from user-selected data analysis methods. RepViz is primarily designed for chromatin data such as ChIP-seq and ATAC-seq, but it can also be used with other sequencing data such as RNA-seq, or with combinations of different types of genomic data.

Before starting

Before running the commands, copy the example files to a temporary directory and set the working directory to it:

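# Copy the example files shipped with RepViz to the session's temporary directory
file.copy(from = list.files(system.file("extdata", package = "RepViz"), full.names = TRUE), to = tempdir())
# Work from that directory so that the relative file paths used below resolve
setwd(tempdir())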

The input files

The tool takes two CSV files as input. These CSV files contain the paths to the files that RepViz needs. The first CSV file lists the paths to the BAM files together with the group name of each file. The second CSV file lists the paths to the BED files. Below is an example of a BAM input file, BAM_input.csv, containing two columns: the path to the BAM file and the group name.

The BAM input file:

bam file      group
rep1_1.bam    NORMAL
rep2_1.bam    NORMAL
rep3_1.bam    NORMAL
rep4_1.bam    NORMAL
rep5_1.bam    NORMAL
rep1_2.bam    TUMOR
rep2_2.bam    TUMOR
rep3_2.bam    TUMOR
rep4_2.bam    TUMOR
rep5_2.bam    TUMOR
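A table like the one above can also be generated from R rather than typed by hand. The following is only a minimal sketch: the output file name `BAM_input_sketch.csv` is made up for illustration, and the layout it writes (comma-separated, no header row, no quotes) is an assumption that should be compared with the `BAM_input.csv` shipped with the package before use.

```r
# Build the two columns shown in the table: BAM path and group name.
bam_table <- data.frame(
  bam   = sprintf("rep%d_%d.bam", rep(1:5, times = 2), rep(1:2, each = 5)),
  group = rep(c("NORMAL", "TUMOR"), each = 5),
  stringsAsFactors = FALSE
)

# Assumption: comma-separated, no header row, no quoting.
# Written under a separate (hypothetical) name so the shipped example is not overwritten.
csv_path <- file.path(tempdir(), "BAM_input_sketch.csv")
write.table(bam_table, csv_path, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Inspect what was written.
readLines(csv_path)
```

Writing to a separate file keeps the shipped example intact while the layout is being checked.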

Below is an example of BED_input.csv, which contains two columns: the path to the BED file and the name that will appear in the legend. For the purpose of this example the file contains only one line, but it can contain multiple lines.

The BED input file

bed file         Legend
consensus.bed    MACS

Note that the corresponding BAI files need to be present in the same folder as the BAM files. A BAI file is the index of a BAM file. It has the same name as the BAM file with the suffix “.bai” appended. The example folder therefore contains the following files:

##  [1] "rep1_1.bam"     "rep1_1.bam.bai" "rep1_2.bam"     "rep1_2.bam.bai"
##  [5] "rep2_1.bam"     "rep2_1.bam.bai" "rep2_2.bam"     "rep2_2.bam.bai"
##  [9] "rep3_1.bam"     "rep3_1.bam.bai" "rep3_2.bam"     "rep3_2.bam.bai"
## [13] "rep4_1.bam"     "rep4_1.bam.bai" "rep4_2.bam"     "rep4_2.bam.bai"
## [17] "rep5_1.bam"     "rep5_1.bam.bai" "rep5_2.bam"     "rep5_2.bam.bai"
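Since a missing index is easy to overlook, it may help to check for it before plotting. The sketch below uses a hypothetical helper, `missing_bai()`, which is not part of RepViz; it relies only on the naming rule stated above (BAM file name plus ".bai") and is demonstrated on empty placeholder files in a scratch directory.

```r
# Hypothetical helper (not part of RepViz): return the BAM paths that
# have no "<name>.bam.bai" file next to them.
missing_bai <- function(bam_paths) {
  bam_paths[!file.exists(paste0(bam_paths, ".bai"))]
}

# Self-contained demonstration with empty placeholder files.
demo_dir <- file.path(tempdir(), "bai_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_bams <- file.path(demo_dir, c("a.bam", "b.bam"))

# Create both BAM placeholders, but an index for the first one only.
file.create(demo_bams, paste0(demo_bams[1], ".bai"))

# Only b.bam lacks an index, so only its path is returned.
missing_bai(demo_bams)
```

For real data, the paths passed to the helper would be the ones listed in the first column of the BAM input CSV file.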

The plotting function

Once the CSV input files are ready, the user needs to declare the region to be visualized together with the genome matching the BAM and BED files. For this example we use a region in the vicinity of the VPS29 gene. The genomes currently implemented are hg19, hg38 and mm10. The avgTrack and geneTrack logicals control whether the average track and the gene track are plotted.
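Before building the region object, the region string and the genome name can be sanity-checked in base R. This is a sketch only: `check_inputs()` is a hypothetical helper that is not part of RepViz, and the list of genomes is simply the one given in the text above.

```r
# Genomes listed in the text above as currently implemented.
supported_genomes <- c("hg19", "hg38", "mm10")

# Hypothetical helper (not part of RepViz): validate a "chr:start-end"
# string and a genome name, and return the parsed parts.
check_inputs <- function(region_string, genome) {
  if (!genome %in% supported_genomes) {
    stop("Unsupported genome: ", genome)
  }
  m <- regmatches(region_string,
                  regexec("^([^:]+):([0-9]+)-([0-9]+)$", region_string))[[1]]
  if (length(m) != 4) {
    stop("Region must look like 'chr12:110938000-110940000'")
  }
  start <- as.numeric(m[3])
  end <- as.numeric(m[4])
  if (start > end) {
    stop("Region start must not exceed region end")
  }
  # Width is inclusive of both ends.
  list(chr = m[2], start = start, end = end, width = end - start + 1)
}

info <- check_inputs("chr12:110938000-110940000", "hg19")
info$width  # 2001
```

With these checks passed, the same region string and genome name can be used in the plotting call.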

library(GenomicRanges)  # provides GRanges()
region <- GRanges("chr12:110938000-110940000")
RepViz::RepViz(region = region,
    genome = "hg19",
    BAM = "BAM_input.csv",
    BED = "BED_input.csv",
    avgTrack = TRUE,
    geneTrack = TRUE,
    verbose = FALSE)

Below is the output of the visualization. The two upper panels show the individual replicates within each group; the number of these panels depends on the number of groups. The third panel shows the group averages. The fourth panel visualizes the genomic regions specified in the BED file, whereas the lowest panel shows the gene track.

## The given region is not a GRanges object