Introduction

SpliceWiz is a graphical interface for differential alternative splicing analysis and visualization in R. Unlike most alternative splicing tools, it is designed so that users with basic bioinformatics skills can analyze datasets containing up to hundreds of samples. SpliceWiz contains a number of innovations, including:

  • Super-fast handling of alignment BAM files using ompBAM, our developer resource for multi-threaded BAM processing
  • Alternative splicing event filters, designed to remove unreliable measurements prior to differential analysis, which improves the accuracy of reported results
  • Group-averaged coverage plots: publication-ready figures that clearly visualize differential alternative splicing between biological / experimental conditions
  • Seamless storage and recall of sequencing coverage, using the COV format, which stores the strand-specific coverage typical of current RNA-seq protocols
  • Interactive figures, including scatter and volcano plots, heatmaps, and scrollable coverage plots, powered by the shinyDashboard interface

This vignette is a runnable working example of the SpliceWiz workflow. Its purpose is to quickly demonstrate the basic functionality of SpliceWiz.

We provide here a brief outline of the workflow for users to get started as quickly as possible. However, we also provide more details for those wishing to know more. Many sections will contain extra information that can be displayed when clicked on, such as these:

Click on me for more details
In most sections, we offer more details about each step of the workflow, which can be revealed in text segments like this one. Be sure to click on buttons like these, where available.


What’s New?

Note: for all runnable examples, first load the SpliceWiz library:

library(SpliceWiz)

What’s New: Novel splice detection (version 0.99.3+)
In version 0.99.3, SpliceWiz offers detection of novel events in addition to annotated events. How this works:

  • SpliceWiz compiles counts of all junctions (split reads) compatible with splice junction events (having compatible donor / acceptor splice motifs)
  • Additionally, it uses “tandem junctions” to identify novel exons (tandem junctions are reads that are split into 3 or more segments, arising from splicing of 2+ consecutive introns within a single short read).

To reduce false positives in novel splicing detection, SpliceWiz provides several filters that limit the number of novel junctions fed into the analysis:

  • Novel junctions that are observed in only a small number of samples are removed. The minimum number of samples required to retain a novel junction is set using the novelSplicing_minSamples parameter.
  • Alternatively, junctions are retained if their expression exceeds a certain threshold (set using novelSplicing_countThreshold) in a smaller number of samples (set using novelSplicing_minSamplesAboveThreshold).
  • Further, novel junctions can be filtered by requiring at least one end to be an annotated splice site (this is enabled using novelSplicing_requireOneAnnotatedSJ = TRUE).
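
The way these filters combine can be illustrated with a minimal base-R sketch on a toy junction count matrix. This is not SpliceWiz's internal implementation: the function name filter_novel_junctions(), the toy counts, and the annotation flags are hypothetical, and whether the count threshold is inclusive is an assumption (here a count must be strictly greater than the threshold).

```r
# Toy junction-by-sample count matrix (hypothetical data)
counts <- matrix(
    c(1, 2,  0, 3, 0,  0,    # jnA: non-zero in 3 samples
      0, 0, 25, 0, 0,  0,    # jnB: one sample, high count
      0, 4,  0, 0, 0,  0,    # jnC: one sample, low count
      5, 6,  7, 8, 9, 10),   # jnD: well expressed, no annotated end
    nrow = 4, byrow = TRUE,
    dimnames = list(c("jnA", "jnB", "jnC", "jnD"), paste0("S", 1:6))
)

# Whether each junction end coincides with an annotated splice site
annotated_donor    <- c(jnA = TRUE,  jnB = FALSE, jnC = TRUE, jnD = FALSE)
annotated_acceptor <- c(jnA = FALSE, jnB = TRUE,  jnC = TRUE, jnD = FALSE)

# Hypothetical helper mirroring the three filters described above
filter_novel_junctions <- function(
        counts, annotated_donor, annotated_acceptor,
        minSamples = 3,
        minSamplesAboveThreshold = 1,
        countThreshold = 10,
        requireOneAnnotatedSJ = TRUE
) {
    # Number of samples in which each junction is observed at all
    n_expressed <- rowSums(counts > 0)
    # Number of samples in which each junction exceeds the count threshold
    n_high <- rowSums(counts > countThreshold)

    # Retain if seen in enough samples, OR highly expressed in fewer samples
    keep <- n_expressed >= minSamples | n_high >= minSamplesAboveThreshold

    # Optionally require at least one annotated splice site
    if (requireOneAnnotatedSJ) {
        keep <- keep & (annotated_donor | annotated_acceptor)
    }
    keep
}

keep <- filter_novel_junctions(counts, annotated_donor, annotated_acceptor)
print(keep)
```

In this toy example, jnA passes on sample number, jnB passes on count threshold, jnC fails both expression criteria, and jnD is expressed but rejected because neither end is annotated.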

Novel alternative splicing event (ASE) detection is integrated into the SpliceWiz pipeline at the collation step. After novel junctions and tandem junctions are compiled and processed, the novel transcripts are appended to the transcript annotation, which is then used to reconstruct the SpliceWiz reference. This reference is stored in the “Reference” subfolder of the output folder of the collateData() function.

TL;DR - how to enable novel ASE mode

  • To enable analysis involving both annotated and novel ASEs, simply set novelSplicing = TRUE when running collateData(). For example:

# Usual pipeline:
ref_path <- file.path(tempdir(), "Reference")
buildRef(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

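# Example BAM files bundled with SpliceWiz
# (a data frame with `sample` and `path` columns)
bams <- SpliceWiz_example_bams()

pb_path <- file.path(tempdir(), "pb_output")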
processBAM(
    bamfiles = bams$path,
    sample_names = bams$sample,
    reference_path = ref_path,
    output_path = pb_path
)

# Modified pipeline - collateData with novel ASE discovery:

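# Locate the processBAM() output files to use as the Experiment
expr <- findSpliceWizOutput(pb_path)

nxtse_path <- file.path(tempdir(), "NxtSE_output")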
collateData(
    Experiment = expr,
    reference_path = ref_path,
    output_path = nxtse_path,
    
        ## NEW ##
    novelSplicing = TRUE,
        # switches on novel splice detection
    
    novelSplicing_requireOneAnnotatedSJ = TRUE,
        # novel junctions must share one annotated splice site

    novelSplicing_minSamples = 3,
        # retain junctions observed in 3+ samples (of any non-zero expression)
    
    novelSplicing_minSamplesAboveThreshold = 1,
        # only 1 sample required if its junction count exceeds a set threshold
    novelSplicing_countThreshold = 10  
        # threshold for previous parameter
)


What’s New: Visualising junction reads in coverage plots (version 0.99.4+)
In version 0.99.4, SpliceWiz visualises split / junction reads in individual samples and in sample groups.

For individual sample coverage plots (i.e. when condition is not set), junction counts for each sample are plotted. Samples with low junction counts (less than 0.01x of the track height) are omitted to reduce clutter.
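
As a rough illustration of this clutter filter, the sketch below drops junctions whose counts fall below 0.01x of the track height. It assumes that "track height" means the maximum value shown on the sample's coverage track; the variable names and numbers are hypothetical and do not come from SpliceWiz.

```r
# Hypothetical track height (e.g. maximum plotted coverage) for one sample
track_height <- 500

# Hypothetical junction counts for that sample
junction_counts <- c(jn1 = 120, jn2 = 3, jn3 = 8, jn4 = 40)

# Junctions below this fraction of the track height are not drawn
min_fraction <- 0.01

shown <- junction_counts[junction_counts >= min_fraction * track_height]
print(names(shown))
```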

For group-normalized coverage plots (where the coverage of multiple samples in a condition group is combined), junctions are instead labeled with their “provisional PSIs”. These PSIs are calculated per junction (rather than per ASE): each junction's count is expressed as a proportion of all junction reads that share an exon cluster with the junction being assessed.
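
The ratio described above can be sketched in base R as follows. This is a simplified illustration, not the SpliceWiz implementation: the junction names, cluster assignments, and counts are made up, and each junction is assumed to belong to exactly one cluster.

```r
# Hypothetical junction counts, each assigned to one exon cluster
jn <- data.frame(
    junction = c("j1", "j2", "j3", "j4"),
    cluster  = c("c1", "c1", "c1", "c2"),
    count    = c(30, 10, 60, 12)
)

# Provisional PSI: junction count divided by the total count of all
# junctions in the same cluster
jn$psi <- jn$count / ave(jn$count, jn$cluster, FUN = sum)

print(jn)
```

Within a cluster the provisional PSIs sum to 1, so a junction that is the only one in its cluster (j4 here) gets a value of 1.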

TL;DR - how to enable junction plotting

  • To enable plotting of junctions, set plotJunctions = TRUE in the call to plotCoverage():
# Retrieve example NxtSE object
se <- SpliceWiz_example_NxtSE()

# Assign annotation of the experimental conditions
colData(se)$treatment <- rep(c("A", "B"), each = 3)

# Return a list of ggplot and plotly objects, also plotting junction counts
p <- plotCoverage(
    se = se,
    Event = "SE:SRSF3-203-exon4;SRSF3-202-int3",
    tracks = colnames(se)[1:4], 
    
        ## NEW ##
    plotJunctions = TRUE
)
#> Warning in geom_line(data = dfJn, aes_string(x = "x", y = "yarc", group = "junction", : Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label
#> Ignoring unknown aesthetics: label
if(interactive()) {
    # Display as plotly object
    p$final_plot
} else {
    # Display as ggplot
    as_ggplot_cov(p)
}