Abstract
This Quick-Start is a runnable example showing the functionalities of the SpliceWiz workflow. Version 1.0.4
SpliceWiz is a graphical interface for differential alternative splicing analysis and visualization in R. It differs from other alternative splicing tools in that it is designed for users with basic bioinformatic skills to analyze datasets containing up to hundreds of samples. SpliceWiz contains a number of innovations including:
This vignette is a runnable working example of the SpliceWiz workflow. The purpose is to quickly demonstrate the basic functionalities of SpliceWiz.
We provide here a brief outline of the workflow for users to get started as quickly as possible. However, we also provide more details for those wishing to know more. Many sections will contain extra information that can be displayed when clicked on, such as these:
Note: for all runnable examples, first load the SpliceWiz library:
library(SpliceWiz)
What’s New: Novel splice detection (version 0.99.3+)
Since version 0.99.3, SpliceWiz detects novel splicing events in addition to annotated events. How this works:
To reduce false positives in novel splicing detection, SpliceWiz provides several filters that limit the number of novel junctions fed into the analysis:
Novel junctions must be observed (at any non-zero count) in a minimum number of samples (set using the novelSplicing_minSamples parameter)
Novel junctions can instead be retained if their count exceeds a threshold (novelSplicing_countThreshold) in a smaller number of samples (set using novelSplicing_minSamplesAboveThreshold)
Novel junctions can be required to share one annotated splice site (novelSplicing_requireOneAnnotatedSJ = TRUE)
Novel ASE detection is integrated into the SpliceWiz pipeline at the collation step. After compilation and processing of novel junctions / TJs, the novel transcripts are appended to the transcript annotation, which is then used to reconstruct the SpliceWiz reference. This reference is stored in the “Reference” subfolder of the output folder of the collateData() function.
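The count-based filters can be illustrated with a small base-R sketch. This is not SpliceWiz's internal code: the toy count matrix is made up, the local variable names simply mirror the collateData() parameters, and it assumes that a junction passes if either count criterion is met and that "exceeds" means strictly greater than the threshold.

```r
# Toy counts: rows = novel junctions, columns = samples (made-up values)
counts <- matrix(
    c(1, 2, 1, 0,    # J1: low counts, but observed in 3 samples
      0, 0, 15, 0,   # J2: observed in 1 sample, above the threshold
      0, 3, 0, 0,    # J3: observed in 1 sample, below the threshold
      0, 0, 0, 0),   # J4: never observed
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("J", 1:4), paste0("S", 1:4))
)

# Local stand-ins for the collateData() parameters (illustration only)
minSamples <- 3
minSamplesAboveThreshold <- 1
countThreshold <- 10

# Number of samples with any non-zero count, per junction
n_expressed <- rowSums(counts > 0)
# Number of samples whose count exceeds the threshold, per junction
n_above <- rowSums(counts > countThreshold)

# Assumption: a junction is kept if EITHER criterion is satisfied
keep <- n_expressed >= minSamples | n_above >= minSamplesAboveThreshold
names(keep)[keep]
# "J1" "J2"
```

Here J1 is kept because it appears in three samples, and J2 because a single sample exceeds the count threshold; J3 and J4 satisfy neither criterion.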
TL;DR - how to enable novel ASE mode
Set novelSplicing = TRUE when running collateData(). For example:
# Usual pipeline:
ref_path <- file.path(tempdir(), "Reference")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

# Example BAM files bundled with SpliceWiz (columns: sample, path)
bams <- SpliceWiz_example_bams()

pb_path <- file.path(tempdir(), "pb_output")
processBAM(
    bamfiles = bams$path,
    sample_names = bams$sample,
    reference_path = ref_path,
    output_path = pb_path
)

# Locate the processBAM output files to be collated
expr <- findSpliceWizOutput(pb_path)

# Modified pipeline - collateData with novel ASE discovery:
nxtse_path <- file.path(tempdir(), "NxtSE_output")
collateData(
    Experiment = expr,
    reference_path = ref_path,
    output_path = nxtse_path,
    ## NEW ##
    novelSplicing = TRUE,
        # switches on novel splice detection
    novelSplicing_requireOneAnnotatedSJ = TRUE,
        # novel junctions must share one annotated splice site
    novelSplicing_minSamples = 3,
        # retain junctions observed in 3+ samples (of any non-zero expression)
    novelSplicing_minSamplesAboveThreshold = 1,
        # only 1 sample required if its junction count exceeds a set threshold
    novelSplicing_countThreshold = 10
        # threshold for previous parameter
)
What’s New: Visualising junction reads in coverage plots (version 0.99.4+)
Since version 0.99.4, SpliceWiz visualises split/junction reads in individual samples and in sample groups.
For individual sample coverage plots (i.e. when condition is not set), junction counts for each sample are plotted. Junction counts below 0.01x of the track height are omitted to reduce clutter.
For group-normalized coverage plots (where the coverage of multiple samples in a condition group is combined), junctions are instead labeled by their “provisional PSIs”. These PSIs are calculated per junction (instead of per ASE): each junction's count is expressed as a proportion of all junction reads that share an exon cluster with the junction being assessed.
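As a worked illustration of the provisional PSI calculation, the base-R sketch below divides each junction's count by the total junction count of its exon cluster. The junction names, cluster labels and counts are invented, and assigning each junction to a single cluster is a simplifying assumption; how SpliceWiz defines exon clusters internally is not shown here.

```r
# Toy junction table (hypothetical names, clusters and counts)
junctions <- data.frame(
    junction = c("J1", "J2", "J3", "J4"),
    cluster  = c("c1", "c1", "c1", "c2"),
    count    = c(30, 10, 60, 25)
)

# Total junction reads in each cluster, repeated for every row of that cluster
cluster_total <- ave(junctions$count, junctions$cluster, FUN = sum)

# Provisional PSI = junction count / all junction reads in the same cluster
junctions$provisional_psi <- junctions$count / cluster_total
print(junctions)
```

In this toy example the three junctions in cluster c1 receive provisional PSIs of 0.3, 0.1 and 0.6, while the only junction in cluster c2 receives 1.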
TL;DR - how to enable junction plotting
Set plotJunctions = TRUE from within plotCoverage():
# Retrieve example NxtSE object
se <- SpliceWiz_example_NxtSE()

# Assign annotation of the experimental conditions
colData(se)$treatment <- rep(c("A", "B"), each = 3)

# Return a list of ggplot and plotly objects, also plotting junction counts
p <- plotCoverage(
    se = se,
    Event = "SE:SRSF3-203-exon4;SRSF3-202-int3",
    tracks = colnames(se)[1:4],
    ## NEW ##
    plotJunctions = TRUE
)
#> Warning in geom_line(data = dfJn, aes_string(x = "x", y = "yarc", group = "junction", : Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label