The stable version of this package is available on Bioconductor. You can install it by running the following:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("vidger")
The latest development version of ViDGER can be installed from GitHub
using the devtools package:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("btmonier/vidger", ref = "devel")
Once installed, you will have access to the following functions:
- vsBoxPlot()
- vsScatterPlot()
- vsScatterMatrix()
- vsDEGMatrix()
- vsMAPlot()
- vsMAMatrix()
- vsVolcano()
- vsVolcanoMatrix()
- vsFourWay()

How these functions work is explained later in the documentation. The
following examples use three toy data sets: df.cuff, df.deseq, and df.edger.
Each data set corresponds to one of the three RNA-seq analyses this package
covers. They can be loaded into the R workspace with the following command:
data(<data_set>)
Where <data_set> is one of the previously mentioned data sets. Two arguments
recur across these functions: type and d.factor. The type argument tells the
function how to process the data for each analytical type (i.e. "cuffdiff",
"deseq", or "edger"). The d.factor argument is used specifically for DESeq2
objects, which we will discuss in the DESeq2 section. All other arguments are
described in the help file of each function (e.g. ?vsScatterPlot).
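As a rough illustration of how type and d.factor pair with each toy data set, the sketch below stores the three combinations used throughout this document in a list and, only if vidger is installed, draws a box plot for each. The toy_sets list and the loop are illustrative names and are not part of the package; the print() call assumes the function returns a plot object.

```r
# Pairings of toy data set, `type`, and `d.factor` taken from the examples
# in this document. `toy_sets` is an illustrative name, not a package object.
toy_sets <- list(
    list(name = "df.cuff",  type = "cuffdiff", d.factor = NULL),
    list(name = "df.deseq", type = "deseq",    d.factor = "condition"),
    list(name = "df.edger", type = "edger",    d.factor = NULL)
)

# Only call into the package when it is actually available.
if (requireNamespace("vidger", quietly = TRUE)) {
    for (s in toy_sets) {
        # Load the data set by name into a scratch environment.
        env <- new.env()
        data(list = s$name, package = "vidger", envir = env)
        p <- vidger::vsBoxPlot(
            data = get(s$name, envir = env), d.factor = s$d.factor,
            type = s$type, title = TRUE, legend = TRUE, grid = TRUE
        )
        # Assumption: a plot object is returned; inside a loop it must be
        # printed explicitly to be drawn.
        print(p)
    }
}
```

Only the DESeq2 entry carries a non-NULL d.factor, which mirrors how the argument is used in the examples that follow.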
As mentioned earlier, three toy data sets are included with this package. In addition, five “real-world” data sets were used. All of the real-world data are currently unpublished and come from ongoing collaborations. These data are summarized in the following tables:
Table 1: An overview of the toy data sets included in this package. In this table, each data set is summarized in terms of what analytical software was used, organism ID, experimental layout (replicates and treatments), number of transcripts (IDs), and size of the data object in terms of megabytes (MB).
| Data | Software | Organism | Reps | Treat. | IDs | Size (MB) |
|---|---|---|---|---|---|---|
| df.cuff | CuffDiff | H. sapiens | 2 | 3 | 1200 | 0.2 |
| df.deseq | DESeq2 | D. melanogaster | 2 | 3 | 29391 | 2.3 |
| df.edger | edgeR | A. thaliana | 2 | 3 | 724 | 0.1 |
Table 2: “Real-world” (RW) data set statistics. To test the reliability of our package, real data from human collections and several plant samples were used. Each data set is summarized in terms of organism ID, number of experimental samples (n), experimental conditions, and number of transcripts (IDs).
| Data | Organism | n | Exp. Conditions | IDs |
|---|---|---|---|---|
| RW-1 | H. sapiens | 10 | Two treatment dosages taken at two time points and one control sample taken at one time point | 198002 |
| RW-2 | M. domestica | 24 | Two phenotypes taken at four time points (three replicates each) | 63517 |
| RW-3 | V. riparia: bud | 6 | Two conditions (three replicates each) | 59262 |
| RW-4 | V. riparia: shoot-tip (7 days) | 6 | Two conditions (three replicates each) | 17962 |
| RW-5 | V. riparia: shoot-tip (21 days) | 6 | Two conditions (three replicates each) | 19064 |
Box plots are a useful way to inspect the distribution of data. Here, the
vsBoxPlot() function shows the distribution of FPKM or CPM values. It extracts
the necessary results-based data from an analytical object and creates a box
plot comparing \(log_{10}\) (FPKM or CPM) distributions across experimental
treatments.
vsBoxPlot(
data = df.cuff, d.factor = NULL, type = 'cuffdiff', title = TRUE,
legend = TRUE, grid = TRUE
)
Figure 1: A box plot example using the vsBoxPlot() function with
cuffdiff data. In this example, FPKM distributions for each treatment within
an experiment are shown in the form of a box and whisker plot.
vsBoxPlot(
data = df.deseq, d.factor = 'condition', type = 'deseq',
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 2: A box plot example using the vsBoxPlot() function with
DESeq2 data. In this example, FPKM distributions for each treatment within
an experiment are shown in the form of a box and whisker plot.
vsBoxPlot(
data = df.edger, d.factor = NULL, type = 'edger',
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 3: A box plot example using the vsBoxPlot() function with edgeR
data. In this example, CPM distributions for each treatment within an
experiment are shown in the form of a box and whisker plot.
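To make the underlying idea concrete, here is a minimal base R sketch of a \(log_{10}\) box plot per treatment. It does not use vidger and is not the package's implementation; the simulated values, the treatment names, and the pseudocount of 1 are all assumptions made for illustration.

```r
# Simulated expression values for three hypothetical treatments.
set.seed(1)
expr <- data.frame(
    treatment = rep(c("trt_A", "trt_B", "trt_C"), each = 200),
    fpkm      = c(rlnorm(200, 2, 1), rlnorm(200, 2.5, 1), rlnorm(200, 3, 1))
)

# Assumption: add a pseudocount of 1 so that zero values do not become -Inf.
expr$log10_fpkm <- log10(expr$fpkm + 1)

# boxplot() returns the five-number summary per group; the plot itself is
# only drawn in an interactive session.
bp <- boxplot(
    log10_fpkm ~ treatment, data = expr, plot = interactive(),
    xlab = "Treatment", ylab = "log10(FPKM + 1)"
)
bp$stats
```

Each column of bp$stats holds the lower whisker, lower hinge, median, upper hinge, and upper whisker for one treatment.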
vsBoxPlot() supports several plot variants for showing data distributions.
A variant is selected with the aes parameter. Currently, there are six
variants:

- box: standard box plot
- violin: violin plot
- boxdot: box plot with dot plot overlay
- viodot: violin plot with dot plot overlay
- viosumm: violin plot with summary stats overlay
- notch: box plot with notch

box variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "box"
)
Figure 4: A box plot example using the aes parameter: box
violin variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "violin"
)
Figure 5: A box plot example using the aes parameter: violin
boxdot variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "boxdot"
)
Figure 6: A box plot example using the aes parameter: boxdot
viodot variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "viodot"
)
Figure 7: A box plot example using the aes parameter: viodot
viosumm variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "viosumm"
)
Figure 8: A box plot example using the aes parameter: viosumm
notch variant
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "notch"
)
Figure 9: A box plot example using the aes parameter: notch
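The variants shown above differ only in the value passed to aes, so they can also be generated in a loop. The sketch below is one possible way to do that; it runs the plotting step only if vidger is installed, and the print() call assumes the function returns a plot object.

```r
# The six `aes` values listed in this section.
variants <- c("box", "violin", "boxdot", "viodot", "viosumm", "notch")

if (requireNamespace("vidger", quietly = TRUE)) {
    # Load the edgeR toy data set into a scratch environment.
    env <- new.env()
    data("df.edger", package = "vidger", envir = env)
    for (v in variants) {
        p <- vidger::vsBoxPlot(
            data = env$df.edger, d.factor = NULL, type = "edger",
            title = TRUE, legend = TRUE, grid = TRUE, aes = v
        )
        # Assumption: a plot object is returned and needs explicit printing
        # inside a loop.
        print(p)
    }
}
```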
In addition to aesthetic changes, the fill color of each variant can
also be changed by modifying the fill.color parameter.
The available palettes are based on those found in the RColorBrewer
package. A visual list of all the palettes can be found
here.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "box", fill.color = "RdGy"
)
Figure 10: Color variant 1. A box plot example using the fill.color
parameter: RdGy.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "viosumm", fill.color = "Paired"
)
Figure 11: Color variant 2. A violin plot example using the fill.color
parameter: Paired with the aes parameter: viosumm.
data("df.edger")
vsBoxPlot(
data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
legend = TRUE, grid = TRUE, aes = "notch", fill.color = "Greys"
)
Figure 12: Color variant 3. A notched box plot example using the fill.color
parameter: Greys with the aes parameter: notch. Using these parameters,
we can also generate grey-scale plots.
This example will look at a basic scatter plot function, vsScatterPlot().
This function allows you to visualize comparisons of \(log_{10}\) values of
either FPKM or CPM measurements of two treatments depending on analytical type.
vsScatterPlot(
x = 'hESC', y = 'iPS', data = df.cuff, type = 'cuffdiff',
d.factor = NULL, title = TRUE, grid = TRUE
)
Figure 13: A scatterplot example using the vsScatterPlot() function with
Cuffdiff data. In this visualization, \(log_{10}\) comparisons are made of
fragments per kilobase of transcript per million mapped reads (FPKM)
measurements. The dashed line represents the regression line for the
comparison.
vsScatterPlot(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, type = 'deseq', d.factor = 'condition',
title = TRUE, grid = TRUE
)
Figure 14: A scatterplot example using the vsScatterPlot() function with
DESeq2 data. In this visualization, \(log_{10}\) comparisons are made of
fragments per kilobase of transcript per million mapped reads (FPKM)
measurements. The dashed line represents the regression line for the
comparison.
vsScatterPlot(
x = 'WM', y = 'MM', data = df.edger, type = 'edger',
d.factor = NULL, title = TRUE, grid = TRUE
)
Figure 15: A scatterplot example using the vsScatterPlot() function with
edgeR data. In this visualization, \(log_{10}\) comparisons are made of
counts per million (CPM) measurements. The dashed line represents the
regression line for the comparison.
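The kind of comparison drawn in these scatter plots can be approximated in base R as follows. This is a sketch on simulated data, not the package's code; the treatment vectors, the pseudocount of 1, and the use of lm() for the dashed line are assumptions.

```r
# Simulated expression values for two hypothetical treatments.
set.seed(42)
trt_x <- rlnorm(500, meanlog = 2, sdlog = 1)
trt_y <- trt_x * rlnorm(500, meanlog = 0, sdlog = 0.3)

# Assumption: a pseudocount of 1 keeps zero values finite after the log.
lx <- log10(trt_x + 1)
ly <- log10(trt_y + 1)

# Least-squares regression line through the log10-transformed values.
fit <- lm(ly ~ lx)
coef(fit)

# Draw the scatter plot with a dashed regression line (interactive use only).
if (interactive()) {
    plot(lx, ly, xlab = "log10(x + 1)", ylab = "log10(y + 1)")
    abline(fit, lty = 2)
}
```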
This example will look at an extension of the vsScatterPlot() function which
is vsScatterMatrix(). This function will create a matrix of all possible
comparisons of treatments within an experiment with additional info.
vsScatterMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)
Figure 16: A scatterplot matrix example using the vsScatterMatrix()
function with Cuffdiff data. Similar to the scatterplot function, this
visualization allows for all comparisons to be made within an experiment. In
addition to the scatterplot visuals, FPKM distributions (histograms) and
correlation (Corr) values are generated.
vsScatterMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
)
Figure 17: A scatterplot matrix example using the vsScatterMatrix()
function with DESeq2 data. Similar to the scatterplot function, this
visualization allows for all comparisons to be made within an experiment. In
addition to the scatterplot visuals, FPKM distributions (histograms) and
correlation (Corr) values are generated.
vsScatterMatrix(
data = df.edger, d.factor = NULL, type = 'edger', comp = NULL,
title = TRUE, grid = TRUE, man.title = NULL
)
Figure 18: A scatterplot matrix example using the vsScatterMatrix()
function with edgeR data. Similar to the scatterplot function, this
visualization allows for all comparisons to be made within an experiment. In
addition to the scatterplot visuals, CPM distributions (histograms) and
correlation (Corr) values are generated.
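The number of off-diagonal panels in such a matrix follows from the number of treatment pairs. A short base R sketch, using the three treatment names that appear in the Cuffdiff examples of this document, enumerates them; the variable names are illustrative.

```r
# Treatment names used in the Cuffdiff examples of this document.
treatments <- c("hESC", "iPS", "Fibroblasts")

# Every unordered pair of treatments; each column is one comparison.
trt_pairs <- combn(treatments, 2)
trt_pairs

# Number of pairwise comparisons for three treatments.
ncol(trt_pairs)
```

With three treatments there are choose(3, 2) = 3 comparisons; the count grows quadratically as treatments are added.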
The vsDEGMatrix() function allows the user to visualize the number of
differentially expressed genes (DEGs) at a given adjusted p-value (set with
the padj parameter) for each experimental treatment level. Higher color
intensity corresponds to a higher number of DEGs.
vsDEGMatrix(
data = df.cuff, padj = 0.05, d.factor = NULL, type = 'cuffdiff',
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 19: A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix() function with Cuffdiff data. With this
function, the user is able to visualize the number of DEGs at a given adjusted
p-value for each experimental treatment level. Higher color intensity
corresponds to a higher number of DEGs.
vsDEGMatrix(
data = df.deseq, padj = 0.05, d.factor = 'condition',
type = 'deseq', title = TRUE, legend = TRUE, grid = TRUE
)
Figure 20: A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix() function with DESeq2 data. With this
function, the user is able to visualize the number of DEGs at a given adjusted
p-value for each experimental treatment level. Higher color intensity
corresponds to a higher number of DEGs.
vsDEGMatrix(
data = df.edger, padj = 0.05, d.factor = NULL, type = 'edger',
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 21: A matrix of differentially expressed genes (DEGs) at a given
p-value using the vsDEGMatrix() function with edgeR data. With this
function, the user is able to visualize the number of DEGs at a given adjusted
p-value for each experimental treatment level. Higher color intensity
corresponds to a higher number of DEGs.
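The counting behind a DEG matrix can be sketched in base R as below. This is an illustration on a small hand-written table, not the package's implementation; the column names, the treatment labels, and the use of `<=` at the cutoff are assumptions.

```r
# Hypothetical per-gene results for each pairwise treatment comparison.
res <- data.frame(
    trt_1 = rep(c("A", "A", "B"), each = 3),
    trt_2 = rep(c("B", "C", "C"), each = 3),
    id    = rep(c("g1", "g2", "g3"), times = 3),
    padj  = c(0.01, 0.20, 0.04, 0.50, 0.03, 0.70, 0.001, 0.002, 0.049),
    stringsAsFactors = FALSE
)
padj_cutoff <- 0.05

# Keep only the rows that pass the adjusted p-value cutoff.
sig <- res[res$padj <= padj_cutoff, ]

# Symmetric treatment-by-treatment matrix of DEG counts.
trts <- sort(unique(c(res$trt_1, res$trt_2)))
deg <- matrix(
    0L, nrow = length(trts), ncol = length(trts),
    dimnames = list(trts, trts)
)
for (i in seq_len(nrow(sig))) {
    a <- sig$trt_1[i]
    b <- sig$trt_2[i]
    deg[a, b] <- deg[a, b] + 1L
    deg[b, a] <- deg[b, a] + 1L
}
deg
```

A heat map of deg, with darker cells for larger counts, would give a picture comparable to the figures above.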
A grey-scale option is available for this function if you wish to use a
grey-to-white gradient instead of the classic blue-to-white gradient. This
can be invoked by setting the grey.scale parameter to TRUE.
vsDEGMatrix(data = df.deseq, d.factor = "condition", type = "deseq",
grey.scale = TRUE
)
vsMAPlot() visualizes the variance between two samples in terms of gene
expression values: logarithmic fold changes (LFCs) of count data are plotted
against mean counts. For more information on how each of the aesthetics is
plotted, please refer to the figure captions and Method S1.
vsMAPlot(
x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL,
type = 'cuffdiff', padj = 0.05, y.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 22: MA plot visualization using the vsMAPlot() function with
Cuffdiff data. LFCs are plotted against mean counts to determine the variance
between two treatments in terms of gene expression. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically
significant LFCs which are less than the user-defined LFC parameter. Gray
nodes are data points that are not statistically significant. Numerical values
in parentheses for each legend color indicate the number of transcripts that
meet the prior conditions. Triangular shapes represent values that exceed the
viewing area of the graph. Node size changes represent the magnitude of the
LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines
indicate user-defined LFC values.
vsMAPlot(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE
)
Figure 23: MA plot visualization using the vsMAPlot() function with
DESeq2 data. LFCs are plotted against mean counts to determine the variance
between two treatments in terms of gene expression. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically significant
LFCs which are less than the user-defined LFC parameter. Gray nodes are data
points that are not statistically significant. Numerical values in parentheses
for each legend color indicate the number of transcripts that meet the prior
conditions. Triangular shapes represent values that exceed the viewing area of
the graph. Node size changes represent the magnitude of the LFC values (i.e.
larger shapes reflect larger LFC values). Dashed lines indicate user-defined
LFC values.
vsMAPlot(
x = 'WW', y = 'MM', data = df.edger, d.factor = NULL,
type = 'edger', padj = 0.05, y.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE
)
Figure 24: MA plot visualization using the vsMAPlot() function with
edgeR data. LFCs are plotted against mean counts to determine the variance
between two treatments in terms of gene expression. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically significant
LFCs which are less than the user-defined LFC parameter. Gray nodes are data
points that are not statistically significant. Numerical values in parentheses
for each legend color indicate the number of transcripts that meet the prior
conditions. Triangular shapes represent values that exceed the viewing area of
the graph. Node size changes represent the magnitude of the LFC values (i.e.
larger shapes reflect larger LFC values). Dashed lines indicate user-defined
LFC values.
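The quantities and color classes described in these captions can be sketched in base R as follows. This is not the package's code: the data are hand-written, the fold change is taken as y over x, the LFC is compared by absolute value, and the cutoffs of 0.05 and 1 are illustrative assumptions.

```r
# Hypothetical mean expression values for two treatments and adjusted p-values.
ma <- data.frame(
    id   = c("g1", "g2", "g3", "g4"),
    x    = c(10, 100, 50, 8),
    y    = c(40, 110, 50, 1),
    padj = c(0.01, 0.03, 0.90, 0.20)
)
padj_cutoff <- 0.05
lfc_cutoff  <- 1

# A: mean of the two treatments; M: log2 fold change of y over x (assumed).
ma$A <- (ma$x + ma$y) / 2
ma$M <- log2(ma$y / ma$x)

# Gray: not significant; blue: significant and |LFC| above the cutoff;
# green: significant and |LFC| at or below the cutoff.
significant <- ma$padj <= padj_cutoff
ma$color <- ifelse(
    !significant, "gray",
    ifelse(abs(ma$M) > lfc_cutoff, "blue", "green")
)
ma[, c("id", "A", "M", "color")]
```

Tabulating ma$color with table() gives per-class counts analogous to the numbers shown in the legend.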
Similar to a scatter plot matrix, vsMAMatrix() will produce visualizations
for all comparisons within your data set. For more information on how the
aesthetics are plotted in these visualizations, please refer to the figure
caption and Method S1.
vsMAMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
Figure 25: An MA plot matrix using the vsMAMatrix() function with Cuffdiff
data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a
matrix of MA plots for all comparisons within an experiment. LFCs are plotted
against mean counts to determine the variance between two treatments in terms
of gene expression. Blue nodes on the graph represent statistically
significant LFCs which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions.
Triangular shapes represent values that exceed the viewing area of the graph.
Node size changes represent the magnitude of the LFC values (i.e. larger
shapes reflect larger LFC values). Dashed lines indicate user-defined LFC
values.
vsMAMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
Figure 26: An MA plot matrix using the vsMAMatrix() function with DESeq2
data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a
matrix of MA plots for all comparisons within an experiment. LFCs are plotted
against mean counts to determine the variance between two treatments in terms
of gene expression. Blue nodes on the graph represent statistically
significant LFCs which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions.
Triangular shapes represent values that exceed the viewing area of the graph.
Node size changes represent the magnitude of the LFC values (i.e. larger
shapes reflect larger LFC values). Dashed lines indicate user-defined LFC
values.
vsMAMatrix(
data = df.edger, d.factor = NULL, type = 'edger',
padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE,
grid = TRUE, counts = TRUE, data.return = FALSE
)
Figure 27: An MA plot matrix using the vsMAMatrix() function with edgeR
data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a
matrix of MA plots for all comparisons within an experiment. LFCs are plotted
against mean counts to determine the variance between two treatments in terms
of gene expression. Blue nodes on the graph represent statistically
significant LFCs which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions.
Triangular shapes represent values that exceed the viewing area of the graph.
Node size changes represent the magnitude of the LFC values (i.e. larger
shapes reflect larger LFC values). Dashed lines indicate user-defined LFC
values.
The next few visualizations will focus on ways to display differential gene
expression between two or more treatments. Volcano plots visualize the variance
between two samples in terms of gene expression values, where the \(-log_{10}\)
of the calculated p-values (y-axis) is plotted against the \(log_2\) fold
changes (x-axis). These plots can be visualized with the vsVolcano() function.
For more information on how each of the aesthetics is plotted, please refer
to the figure captions and Method S1.
vsVolcano(
x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL,
type = 'cuffdiff', padj = 0.05, x.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)
Figure 28: A volcano plot example using the vsVolcano() function with
Cuffdiff data. In this visualization, comparisons are made between the
\(-log_{10}\) p-value versus the \(log_2\) fold change (LFC) between two
treatments. Blue nodes on the graph represent statistically significant LFCs
which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the viewing area of
the graph. Node size changes represent the magnitude of the LFC values (i.e.
larger shapes reflect larger LFC values). Vertical and horizontal lines
indicate user-defined LFC and adjusted p-values, respectively.
vsVolcano(
x = 'treated_paired.end', y = 'untreated_paired.end',
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, data.return = FALSE
)
Figure 29: A volcano plot example using the vsVolcano() function with
DESeq2 data. In this visualization, comparisons are made between the
\(-log_{10}\) p-value versus the \(log_2\) fold change (LFC) between two
treatments. Blue nodes on the graph represent statistically significant LFCs
which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the viewing area of
the graph. Node size changes represent the magnitude of the LFC values (i.e.
larger shapes reflect larger LFC values). Vertical and horizontal lines
indicate user-defined LFC and adjusted p-values, respectively.
vsVolcano(
x = 'WW', y = 'MM', data = df.edger, d.factor = NULL,
type = 'edger', padj = 0.05, x.lim = NULL, lfc = NULL,
title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)
Figure 30: A volcano plot example using the vsVolcano() function with
edgeR data. In this visualization, comparisons are made between the
\(-log_{10}\) p-value versus the \(log_2\) fold change (LFC) between two
treatments. Blue nodes on the graph represent statistically significant LFCs
which are greater than a user-defined LFC parameter. Green
nodes indicate statistically significant LFCs which are less than the
user-defined LFC parameter. Gray nodes are data points that are not
statistically significant. Numerical values in parentheses for each legend
color indicate the number of transcripts that meet the prior conditions. Left
and right brackets (< and >) represent values that exceed the viewing area of
the graph. Node size changes represent the magnitude of the LFC values (i.e.
larger shapes reflect larger LFC values). Vertical and horizontal lines
indicate user-defined LFC and adjusted p-values, respectively.
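The y-axis transformation and the horizontal cutoff line of a volcano plot can be sketched in base R as below. The data frame is hand-written, and using adjusted p-values on the y-axis with a cutoff of 0.05 is an assumption made for illustration rather than a statement about the package internals.

```r
# Hypothetical log2 fold changes and adjusted p-values.
vol <- data.frame(
    id   = c("g1", "g2", "g3"),
    lfc  = c(2.5, -0.4, 1.2),
    padj = c(0.001, 0.05, 0.5)
)
padj_cutoff <- 0.05

# y-axis value and the height of the horizontal significance line.
vol$neg_log10_padj <- -log10(vol$padj)
h_line <- -log10(padj_cutoff)

# Points on or above the line pass the significance cutoff.
vol$above_line <- vol$neg_log10_padj >= h_line
vol
```

Because of the negative logarithm, smaller p-values sit higher on the plot, so the most significant points form the top of the "volcano".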
Similar to the prior matrix functions, vsVolcanoMatrix() will produce
visualizations for all comparisons within your data set. For more information
on how the aesthetics are plotted in these visualizations, please refer to the
figure caption and Method S1.
vsVolcanoMatrix(
data = df.cuff, d.factor = NULL, type = 'cuffdiff',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, counts = TRUE
)
Figure 31: A volcano plot matrix using the vsVolcanoMatrix() function with
Cuffdiff data. Similar to the vsVolcano() function, vsVolcanoMatrix()
will generate a matrix of volcano plots for all comparisons within an
experiment. Comparisons are made between the \(-log_{10}\) p-value versus the
\(log_2\) fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically
significant LFCs which are less than the user-defined LFC parameter. Gray
nodes are data points that are not statistically significant. The blue and
green numbers in each facet represent the number of transcripts that meet the
criteria for blue and green nodes in each comparison. Left and right brackets
(< and >) represent values that exceed the viewing area of the graph. Node
size changes represent the magnitude of the LFC values (i.e. larger shapes
reflect larger LFC values). Vertical and horizontal lines indicate
user-defined LFC and adjusted p-values, respectively.
vsVolcanoMatrix(
data = df.deseq, d.factor = 'condition', type = 'deseq',
padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, counts = TRUE
)
Figure 32: A volcano plot matrix using the vsVolcanoMatrix() function with
DESeq2 data. Similar to the vsVolcano() function, vsVolcanoMatrix()
will generate a matrix of volcano plots for all comparisons within an
experiment. Comparisons are made between the \(-log_{10}\) p-value versus the
\(log_2\) fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically
significant LFCs which are less than the user-defined LFC parameter. Gray
nodes are data points that are not statistically significant. The blue and
green numbers in each facet represent the number of transcripts that meet the
criteria for blue and green nodes in each comparison. Left and right brackets
(< and >) represent values that exceed the viewing area of the graph. Node
size changes represent the magnitude of the LFC values (i.e. larger shapes
reflect larger LFC values). Vertical and horizontal lines indicate
user-defined LFC and adjusted p-values, respectively.
vsVolcanoMatrix(
data = df.edger, d.factor = NULL, type = 'edger', padj = 0.05,
x.lim = NULL, lfc = NULL, title = TRUE, legend = TRUE,
grid = TRUE, counts = TRUE
)
Figure 33: A volcano plot matrix using the vsVolcanoMatrix() function with
edgeR data. Similar to the vsVolcano() function, vsVolcanoMatrix()
will generate a matrix of volcano plots for all comparisons within an
experiment. Comparisons are made between the \(-log_{10}\) p-value versus the
\(log_2\) fold change (LFC) between two treatments. Blue nodes on the graph
represent statistically significant LFCs which are greater than a
user-defined LFC parameter. Green nodes indicate statistically
significant LFCs which are less than the user-defined LFC parameter. Gray
nodes are data points that are not statistically significant. The blue and
green numbers in each facet represent the number of transcripts that meet the
criteria for blue and green nodes in each comparison. Left and right brackets
(< and >) represent values that exceed the viewing area of the graph. Node
size changes represent the magnitude of the LFC values (i.e. larger shapes
reflect larger LFC values). Vertical and horizontal lines indicate
user-defined LFC and adjusted p-values, respectively.
To create four-way plots, use the vsFourWay() function. This plot
compares the \(log_2\) fold changes between two samples and a ‘control’. For more
information on how each of the aesthetics is plotted, please refer to the
figure captions and Method S1.
vsFourWay(
x = 'iPS', y = 'hESC', control = 'Fibroblasts', data = df.cuff,
d.factor = NULL, type = 'cuffdiff', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
Figure 34: A four-way plot visualization using the vsFourWay() function with
Cuffdiff data. In this example, LFC comparisons between two treatments and
a control are made. Blue nodes indicate statistically significant LFCs which
are greater than a given user-defined value for both x and y-axes. Green nodes
reflect statistically significant LFCs which are less than a user-defined
value for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined value for treatment y and less than said value for
treatment x. Gray nodes are data points that are not statistically significant
for both x and y-axes. Triangular shapes indicate values which exceed the
viewing area of the graph. Size change reflects the magnitude of LFC values
(i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed
lines indicate user-defined LFC values.
vsFourWay(
x = 'treated_paired.end', y = 'untreated_single.read',
control = 'untreated_paired.end', data = df.deseq,
d.factor = 'condition', type = 'deseq', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
Figure 35: A four-way plot visualization using the vsFourWay() function with
DESeq2 data. In this example, LFC comparisons between two treatments and a
control are made. Blue nodes indicate statistically significant LFCs which are
greater than a given user-defined value for both x and y-axes. Green nodes
reflect statistically significant LFCs which are less than a user-defined
value for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined value for treatment y and less than said value for
treatment x. Gray nodes are data points that are not statistically significant
for both x and y-axes. Triangular shapes indicate values which exceed the
viewing area of the graph. Size change reflects the magnitude of LFC values
(i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed
lines indicate user-defined LFC values.
vsFourWay(
x = 'WW', y = 'WM', control = 'MM', data = df.edger,
d.factor = NULL, type = 'edger', padj = 0.05, x.lim = NULL,
y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)
Figure 36: A four-way plot visualization using the vsFourWay() function with
edgeR data. In this example, LFC comparisons between two treatments and a
control are made. Blue nodes indicate statistically significant LFCs which are
greater than a given user-defined value for both x and y-axes. Green nodes
reflect statistically significant LFCs which are less than a user-defined
value for treatment y and greater than said value for treatment x. Similar to
green nodes, red nodes reflect statistically significant LFCs which are
greater than a user-defined value for treatment y and less than said value for
treatment x. Gray nodes are data points that are not statistically significant
for both x and y-axes. Triangular shapes indicate values which exceed the
viewing area of the graph. Size change reflects the magnitude of LFC values
(i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed
lines indicate user-defined LFC values.
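One way to read the color scheme of a four-way plot is as a classification on two LFCs at once. The base R sketch below illustrates that reading on hand-written data. It is an interpretation of the captions, not the package's implementation: treating a point as significant only when both comparisons pass, comparing LFCs by absolute value, and leaving significant points with two small LFCs gray are all assumptions.

```r
# Hypothetical LFCs of treatments x and y, each relative to the control.
fw <- data.frame(
    id     = c("g1", "g2", "g3", "g4"),
    lfc_x  = c(2.0, 1.8, 0.2, 3.0),
    lfc_y  = c(1.5, 0.3, -2.2, 2.5),
    padj_x = c(0.01, 0.02, 0.03, 0.40),
    padj_y = c(0.02, 0.01, 0.04, 0.60)
)
padj_cutoff <- 0.05
lfc_cutoff  <- 1

# Assumption: a point counts as significant when both comparisons pass.
sig   <- fw$padj_x <= padj_cutoff & fw$padj_y <= padj_cutoff
big_x <- abs(fw$lfc_x) > lfc_cutoff
big_y <- abs(fw$lfc_y) > lfc_cutoff

# Start from gray and recolor the significant points by which axes exceed
# the LFC cutoff.
fw$color <- "gray"
fw$color[sig & big_x & big_y]  <- "blue"
fw$color[sig & big_x & !big_y] <- "green"
fw$color[sig & !big_x & big_y] <- "red"
fw
```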
For point-based plots, users can highlight IDs of interest (e.g. genes or transcripts). Currently, this functionality is implemented in the following functions:

- vsScatterPlot()
- vsMAPlot()
- vsVolcano()
- vsFourWay()

To use this feature, provide a vector of IDs to the highlight parameter of
any of these functions. A typical vector would look as follows:
important_ids <- c(
"ID_001",
"ID_002",
"ID_003",
"ID_004",
"ID_005"
)
important_ids
## [1] "ID_001" "ID_002" "ID_003" "ID_004" "ID_005"
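Before passing such a vector on, it can be useful to check which IDs are actually present in the data. The base R sketch below shows one way to do that; available_ids is a hypothetical stand-in for the IDs of the data set being plotted, and how the plotting functions themselves treat unknown IDs is not covered here.

```r
important_ids <- c("ID_001", "ID_002", "ID_003", "ID_004", "ID_005")

# Hypothetical set of IDs present in the data being plotted.
available_ids <- c("ID_001", "ID_003", "ID_005", "ID_007")

# IDs that can be highlighted, and requested IDs that are absent.
found_ids   <- intersect(important_ids, available_ids)
missing_ids <- setdiff(important_ids, available_ids)

found_ids
missing_ids
```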
For specific examples using the toy data sets, please see the following four subsections.
vsScatterPlot()
data("df.cuff")
hl <- c(
"XLOC_000033",
"XLOC_000099",
"XLOC_001414",
"XLOC_001409"
)
vsScatterPlot(
x = "hESC", y = "iPS", data = df.cuff, d.factor = NULL,
type = "cuffdiff", title = TRUE, grid = TRUE, highlight = hl
)
Figure 37: Highlighting with vsScatterPlot(). IDs of interest can be
identified within basic scatter plots. When highlighted, non-important points
will turn grey while highlighted points will turn blue. Text tags will try
to optimize their location within the graph without overlapping each
other.
vsMAPlot()
hl <- c(
"FBgn0022201",
"FBgn0003042",
"FBgn0031957",
"FBgn0033853",
"FBgn0003371"
)
vsMAPlot(
x = "treated_paired.end", y = "untreated_paired.end",
data = df.deseq, d.factor = "condition", type = "deseq",
padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE,
legend = TRUE, grid = TRUE, data.return = FALSE, highlight = hl
)