Interactive Differential Expression Analysis Tool
Differential gene expression analysis has become an increasingly popular tool for identifying and viewing up- and/or down-regulated genes between two sets of samples. The goal of differential gene expression analysis is to find genes or transcripts whose difference in expression, when accounting for the variance within each condition, is higher than expected by chance. DESeq2 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html) is an R package available via Bioconductor that is designed to normalize count data from high-throughput sequencing assays such as RNA-Seq and test for differential expression (Love et al. 2014). With multiple parameters such as adjusted p-value (padj) cutoffs, log fold changes, and plot styles, altering plots created with your DE data can be a hassle as well as time-consuming. The Differential Expression Browser (DEBrowser) uses DESeq2 coupled with Shiny to update your plots in real time as you change your queries and allows for interactive browsing of your DESeq2 results. In addition to DESeq2 analysis, DEBrowser also offers a variety of other plots and analysis tools to help you visualize your data even further.
Before you start:
First, you will have to install R and/or RStudio. (On Fedora/Red Hat/CentOS, the following packages must be installed: openssl-devel, libxml2-devel, libcurl-devel, libpng-devel.) Running these simple commands will launch DEBrowser on your local machine:
# Installation instructions:
# 1. Install DEBrowser and its dependencies by running the lines below
# in R or RStudio.
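# (biocLite has been replaced by BiocManager in current Bioconductor releases)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("debrowser")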
# 2. Load the library
library(debrowser)
# 3. Start DEBrowser
startDEBrowser()
Once you have DEBrowser running, a page will load asking you to choose a TSV/CSV file or to load the demo data. In order to run DESeq2, you will need gene quantifications in a tab-separated values (TSV) file. The file must contain the gene, the transcript, and the count values for each sample you wish to enter into DEBrowser.
For example:
# TSV:
gene transcript exper_rep1 exper_rep2 control_rep1 control_rep2
DQ714826 uc007tfl.1 0.00 0.00 0.00 0.00
DQ551521 uc008bml.1 0.00 0.00 0.00 0.00
AK028549 uc011wpi.1 2.00 1.29 0.00 0.00
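Before uploading a file, it can help to check that it parses the way the layout above suggests. The following is a minimal sketch in base R, not part of DEBrowser itself: it reads a hypothetical in-memory copy of the example rows and confirms that every sample column is numeric. The variable names (`tsv_text`, `counts`, `sample_cols`) are illustrative assumptions.

```r
# Hypothetical in-memory copy of the example rows above, tab-separated.
tsv_text <- paste(
  "gene\ttranscript\texper_rep1\texper_rep2\tcontrol_rep1\tcontrol_rep2",
  "DQ714826\tuc007tfl.1\t0.00\t0.00\t0.00\t0.00",
  "DQ551521\tuc008bml.1\t0.00\t0.00\t0.00\t0.00",
  "AK028549\tuc011wpi.1\t2.00\t1.29\t0.00\t0.00",
  sep = "\n"
)

# read.delim() assumes a header row and tab separators by default.
counts <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# Everything except the two identifier columns is treated as a sample.
sample_cols <- setdiff(names(counts), c("gene", "transcript"))

# Each sample column should have been parsed as numeric count values.
all_numeric <- all(vapply(counts[sample_cols], is.numeric, logical(1)))

print(sample_cols)
print(all_numeric)
```

For a file on disk, the same check would use `read.delim("your_file.tsv")` in place of the `text =` argument.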
You can also view and use the demo data by clicking the ‘Load Demo!’ text. For the case study demo data, feel free to download our case study demo file (http://galaxyweb.umassmed.edu/pub/DC/advanced_demo.tsv). After obtaining and loading the gene quantifications file, you can view QC information for your quantifications or continue on to running DESeq2.
Figure 1: The initial options selection.
Upon selection of QC information, you will be shown an all-to-all plot of your samples. This sample-by-sample comparison will help you visualize possible discrepancies between replicate samples, in case you want to omit them from further analysis. To the left of this plot are various plot-shaping options you can alter to view the all-to-all plot more easily.
Additionally, two more QC plots are available: heatmap and PCA plots. The heatmap displays the genes for each sample within your dataset, and the PCA plot displays a principal component analysis of your dataset. You can view an interactive heatmap by selecting the ‘Interactive’ checkbox in the left side panel when the Heatmap option is selected. You can switch between these plots by selecting the type of plot you wish to view on the left panel.
Figure 2: Display of the all-to-all plot in the initial QC plots page.
You can also view the genes within your quantification file in various ways. The ‘Tables’ tab will bring you to a table based on the dataset you have selected on the left options panel. The ‘All detected’ option lists all of the genes present within your file. The ‘Selected’ option lets you browse your gene selection based on your interactive heatmap selection. The last option, ‘Most Varied’, will display your top N most varied genes. You can alter the value of N by selecting ‘most-varied’ from the dropdown menu on the left.
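To make the idea of a "top N most varied" list concrete, here is a small base R sketch. It ranks genes by per-gene variance across samples, which is an assumption made for illustration; the text does not state which variability measure DEBrowser uses internally. The count matrix is simulated.

```r
set.seed(1)

# Simulated count matrix: 10 hypothetical genes across 4 samples.
mat <- matrix(
  rpois(40, lambda = 20), nrow = 10,
  dimnames = list(
    paste0("gene", 1:10),
    c("exper_rep1", "exper_rep2", "control_rep1", "control_rep2")
  )
)

n <- 3  # the "N" in "top N most varied genes"

# Variance of each gene (row) across all samples.
row_var <- apply(mat, 1, var)

# Names of the n genes with the largest variance.
top_n <- names(sort(row_var, decreasing = TRUE))[seq_len(n)]
most_varied <- mat[top_n, , drop = FALSE]

print(most_varied)
```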
Figure 3: Display of the heatmap in the initial QC plots page.
Figure 4: Display of the PCA plot in the initial QC plots page.
Figure 5: Display of most varied genes.
Upon selecting to run DESeq, you can choose which samples belong to the first condition and which belong to the second condition for differential expression analysis. By clicking the ‘Add New Comparison’ button, you can add as many different comparisons as you want. Sample names are created from the column headers within your data file. Once you’ve selected your comparisons, you are ready to run DESeq2 to calculate differential expression by clicking on the ‘Submit!’ button.
Figure 6: Menus after loading in a sample.
After clicking on the ‘Submit!’ button, DESeq2 will analyze your comparisons and store the results in separate data tables. Shiny will then allow you to access this data, with multiple interactive features, at the click of a button. It is important to note that the resulting data produced by DESeq2 is normalized. Upon finishing the DESeq2 analysis, a tab-based menu will appear with multiple options.
Figure 7: List of the tabbed menus in DEBrowser.
The first tab, the ‘Main Plots’ section, is where you will be able to view the interactive results plots. The left-hand side of the screen contains options for altering the padj and fold change cutoff values, choosing which dataset to use (such as up- or down-regulated genes), choosing which comparison to plot, and choosing which type of plot to view your results in. Plot choices include:
Main plot
Volcano plot
MA plot
Figure 8: Main scatter plot and the zoomed in main scatterplot.
Figure 9: Main volcano Plot and the zoomed in main volcano plot.
Figure 10: Main MA plot and the zoomed in main MA plot.
Once you have selected your values, you can hit the ‘Submit!’ button to create your interactive plots!
The top left plot is whichever plot you have selected to analyze your results. Up-regulated genes are displayed in green while down-regulated genes are displayed in red. Hovering over a gene on this plot will display the bottom two plots: the gene’s normalized variation, colored by condition, in the left graph, and the normalized variation between conditions in the right graph. Hovering over a gene will also display information about that gene for both conditions you have selected. By clicking and dragging your mouse to create a selection over the main graph, you will create the top right plot, a zoomed-in version of your selection. If you change any of the parameters on the left, make sure to click the ‘Submit!’ button again to update the graphs. You can also change which dataset to use within the main plots by selecting from the dataset drop-down box. Additionally, you can further filter these datasets by typing in the genes of interest, or a regex, to search for specific genes within the dataset. It’s also worth noting that the plots are resizable as well as downloadable.
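The padj and fold change cutoffs decide which genes are shown as up-regulated or down-regulated. The sketch below illustrates one common way such a classification can be done in base R. The results table is made up, the column names `padj` and `log2FoldChange` follow DESeq2's usual output, and the exact rule DEBrowser applies is an assumption here.

```r
# Hypothetical results table with DESeq2-style column names.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.3, 3.0),
  padj = c(0.001, 0.005, 0.0005, 0.2),
  stringsAsFactors = FALSE
)

padj_cutoff <- 0.01  # adjusted p-value cutoff
fold_cutoff <- 2     # fold change cutoff, compared on the log2 scale below

# A gene must pass the padj cutoff before its direction is considered.
sig <- !is.na(res$padj) & res$padj < padj_cutoff

res$status <- "NS"  # not significant by default
res$status[sig & res$log2FoldChange >=  log2(fold_cutoff)] <- "Up"
res$status[sig & res$log2FoldChange <= -log2(fold_cutoff)] <- "Down"

print(res)
```

In this toy table, g3 passes the padj cutoff but not the fold change cutoff, and g4 passes the fold change cutoff but not the padj cutoff, so neither is labeled as up- or down-regulated.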
Figure 11: The main plots page within DEBrowser.
Figure 12: Display of the most varied genes as a scatter plot.
Figure 13: Display of the geneset list as a scatter plot.
Selecting the ‘QC Plots’ tab will take you to the quality control plots section. These QC plots are very similar to the QC plots shown before running DESeq; however, the dataset being used here depends on the one you select on the left menu. In addition to the all-to-all plot shown within the previous QC analysis, users can also view a heatmap and PCA plot of their analyzed data by selecting the proper plot on the left menu.
Figure 14: Display of the heatmap within DEBrowser.
The heatmap is a great way to analyze replicate results of genes all in one simple plot. Users have the option to change the clustering method used as well as the distance method used to display their heatmap. In addition, you can also change the size of the heatmap produced and adjust the p-adjust and fold change cut off for this plot as well. Once all of the parameters have been set, click the ‘Submit!’ button at the bottom of the left menu to generate your heatmap.
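The clustering method and distance method options correspond to two separate steps: computing pairwise distances between genes, then grouping genes hierarchically. A minimal base R sketch of those two steps follows, using simulated data and `dist()`/`hclust()` as stand-ins; which functions DEBrowser calls internally is not stated in this guide.

```r
set.seed(42)

# Simulated expression values: genes 1-3 are high, genes 4-6 are low.
mat <- rbind(
  matrix(rnorm(12, mean = 10), nrow = 3),
  matrix(rnorm(12, mean = 0),  nrow = 3)
)
rownames(mat) <- paste0("gene", 1:6)
colnames(mat) <- c("exper_rep1", "exper_rep2", "control_rep1", "control_rep2")

# Step 1: distance method (try "manhattan" or "maximum" as alternatives).
d <- dist(mat, method = "euclidean")

# Step 2: clustering method (try "average" or "ward.D2" as alternatives).
hc <- hclust(d, method = "complete")

# Cut the tree into two groups to see which genes cluster together.
groups <- cutree(hc, k = 2)
print(groups)
```

Passing the same matrix to `heatmap(mat)` would draw a clustered heatmap using these defaults.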
You can also view an interactive version of the heatmap by clicking on the ‘Interactive’ button on the left panel under the height and width options. Selecting this feature changes the heatmap into an interactive version with two colors, allowing you to select specific genes to be compared within the GO term plots. In order to use the interactive heatmap selection within your GO term query, you must use either the up+down dataset or the most varied dataset for the heatmap display. This will allow you to compare interesting clusters found within the heatmap within our GO Term analysis section, which will be discussed later within the materials.
Figure 15: View of the interactive Heatmap
Figure 16: Display of the PCA plot within DEBrowser.
Principal Component Analysis (PCA) is another excellent method of checking replicates. PCA calculates the variance across the genes of all samples within your current comparison set and creates a two-dimensional graph to represent the proportion of variance explained by different components. Within the PCA plot section you can select the p-adjust value, fold change cutoff value, which comparison set to use, which dataset to use, the height and width of the corresponding plots, as well as which principal components to analyze by changing the appropriate values on the left menu.
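As a rough illustration of what the PCA plot summarizes, the base R sketch below runs `prcomp()` on simulated data with samples as rows, then computes the proportion of variance explained by each component. This is a stand-in for illustration only; the guide does not say how DEBrowser computes its PCA, and the data and variable names are assumptions.

```r
set.seed(7)

# Simulated normalized expression: 50 hypothetical genes, 4 samples.
mat <- matrix(
  rnorm(200), nrow = 50,
  dimnames = list(
    paste0("gene", 1:50),
    c("exper_rep1", "exper_rep2", "control_rep1", "control_rep2")
  )
)

# Add a strong condition effect to the first 10 genes in the exper samples.
mat[1:10, 1:2] <- mat[1:10, 1:2] + 8

# prcomp() expects observations (samples) in rows, so transpose.
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of each sample on the first two components.
scores <- pca$x[, 1:2]

print(round(var_explained, 3))
print(scores)
```

Replicates of the same condition should land close together on the first component, which is the property the PCA plot is used to check.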
The next tab, ‘GO Term’, takes you to the ontology comparison portion of DEBrowser. From here you can select the standard dataset options such as p-adjust value, fold change cut off value, which comparison set to use, and which dataset to use on the left menu. In addition to these parameters, you also can choose from the 4 different ontology plot options: ‘enrichGO’, ‘enrichKEGG’, ‘Disease’, and ‘compareCluster’. Selecting one of these plot options queries their specific databases with your current DESeq results. By selecting the ‘selection’ dataset on the left panel after selecting specific genes from the interactive heatmap, you will be able to compare your specific gene selection within the various GO Term databases.
In order to use your selected genes from the interactive heatmap, you must first make your selection within the interactive heatmap. Next, switch to the ‘GO Term’ tab and use the ‘selected’ dataset. Once all your other parameters have been set, click ‘Submit!’ and your selected genes from the interactive heatmap will be used in your GO Term analysis.
Figure 17: Display of the GO Plot section within DEBrowser.
Figure 18: Display of the GO dotplot section within DEBrowser.
Figure 19: Display of the DO plot section within DEBrowser.
Figure 20: Display of the DO dotplot section within DEBrowser.
Figure 21: Display of the KEGG dotplot section within DEBrowser.
Figure 22: Display of Heatmap selected enriched GO Term search
The last tab, ‘Tables’, contains various result information in table formats. The ‘All Detected’ option contains the list of all the genes within the TSV/CSV provided with the corresponding DESeq analyses. Up-regulated values are shown in green while down-regulated values are displayed in red. To view any particular dataset’s custom options, the dataset type must be selected.
The ‘Up’ option contains a list of all the up-regulated genes based on the options selected on the left panel. The ‘Down’ option contains a list of all the down-regulated genes based on the options selected. The ‘Selected’ option contains the list of genes selected from the main plots section. By clicking and dragging your mouse on the main plot within the ‘Main Plots’ tab, you will then be able to see that selection in list form within the ‘Selected’ option. The ‘Gene Set’ option allows you to filter gene data based on genes entered in a text box. To create a gene set, simply type the names of the genes you wish to view in the text box on the left panel in a comma-separated format. You can also use regular expressions to search for specific gene sets. The ‘Most Varied’ option, much like the original QC ‘Most Varied’ tab, allows you to view the list of most varied genes based on user input parameters on the left panel. The ‘Comparisons’ option allows you to view the differences between your different condition comparisons. Comparisons between datasets are shown if at least one of the conditional comparisons has passed the padj value or fold change cutoff.
It is also important to note that comparisons with only one sample per condition cannot produce reliable adjusted p-values, because the variance within a condition cannot be estimated without replicates. The more replicates you have within a condition, the greater the statistical power of your comparisons.
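The two ways of defining a gene set, a comma-separated list and a regular expression, can be sketched in base R as follows. The gene names are made up for illustration, and matching case-insensitively is an assumption; the guide does not say whether DEBrowser's search is case-sensitive.

```r
# Hypothetical gene names standing in for a results table.
genes <- c("Alb", "Aldh1a1", "Fgf21", "Ppara", "Alas1", "Malat1")

# Regular expression: genes whose names start with "al".
pattern <- "^al"
hits <- genes[grepl(pattern, genes, ignore.case = TRUE)]

# Comma-separated list: split on commas and trim surrounding spaces.
gene_list <- "Fgf21, Ppara"
wanted <- trimws(strsplit(gene_list, ",")[[1]])
picked <- genes[genes %in% wanted]

print(hits)
print(picked)
```

Note that "Malat1" contains "al" but is not matched, because `^` anchors the pattern to the start of the name.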
Figure 23: Display of the up+down-regulated genes table.
Figure 24: Display of the down-regulated genes table.
Figure 25: Display of the geneset input box.
Figure 26: Display of the gene set search of the term ‘^al’.
Figure 27: Condition comparisons table within DEBrowser.
Lastly, the tables have a number of features that allow you to view your DESeq results more conveniently. By clicking on a column header, you can sort the data within the table alphabetically or numerically. You can also enter a term within the search box to search for a specific gene within the table.
With that, you’ve now successfully navigated the DEBrowser and are ready to start inserting your own data files and browsing your own experiments. Enjoy the DEBrowser!
Taking a look at the case study (Vernia S. et al. 2014), multiple heatmaps were created to display findings within the research. The heatmaps generated for the study were customized to a high level of specificity. However, using a sample dataset generated from this study, it is possible to recreate heatmaps similar to those displayed within the study’s findings.
Figure 28: All detected genes heatmap using case study data.
Figure 29: Most varied genes heatmap using case study data.
The only main difference between the plots created within DEBrowser and the plots created within the research paper is that the clustering method used within the paper was a k-means method with k equalling 6.
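For readers who want to approximate the paper's clustering outside of DEBrowser, a k-means grouping with k = 6 can be sketched with base R's `kmeans()`. The data below is simulated and the setup is an assumption for illustration; it is not the procedure or data used in the case study.

```r
set.seed(123)

# Simulated expression: six well-separated groups of 5 genes, 4 samples each.
group_means <- seq(0, 50, by = 10)
mat <- do.call(rbind, lapply(group_means, function(m) {
  matrix(rnorm(5 * 4, mean = m, sd = 0.5), nrow = 5)
}))
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

# k-means with k = 6; several random starts reduce the risk of a poor solution.
km <- kmeans(mat, centers = 6, nstart = 25)

# Number of genes assigned to each cluster.
cluster_sizes <- table(km$cluster)
print(cluster_sizes)

# Reorder genes by cluster, as one might before drawing a heatmap.
ordered_mat <- mat[order(km$cluster), , drop = FALSE]
```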
Using the ‘advanced demo’ dataset we mentioned earlier, you too can recreate these heatmaps using the same data. Browsing, changing parameters, and creating unique plots to view and analyze data can be a creative way to reproduce published analytical results. DEBrowser can be used in multiple ways to check the reproducibility of research results in a highly interactive format!