psichomics tutorial: command-line interface (CLI)

Nuno Saraiva-Agostinho

29 August 2017


psichomics is an interactive R package for integrative analyses of alternative splicing using data from The Cancer Genome Atlas (TCGA) (containing molecular data associated with 34 tumour types) and from the Genotype-Tissue Expression (GTEx) project (containing data for multiple normal human tissues). The data leveraged from these projects includes clinical information and transcriptomic data, such as the quantification of RNA-Seq reads aligning to splice junctions (henceforth called junction quantification) and exons.

Installing and starting the program

Install psichomics by typing the following in an R console (the R environment is required):

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("psichomics")

After the installation, load psichomics by typing:

library(psichomics)

Available functions: quick reference

Data retrieval

TCGA/Firebrowse

  • getDownloadsFolder
  • isFirebrowseUp
  • getFirebrowseCohorts
  • getFirebrowseDataTypes
  • getFirebrowseDates
  • loadFirebrowseData
  • parseTcgaSampleInfo
  • parseSampleGroups

GTEx

  • getGtexTissues
  • loadGtexData

Custom files

  • loadLocalFiles

Gene expression pre-processing

PSI quantification*getSplicingEventTypes*listSplicingAnnotations*loadAnnotation*quantifySplicing`

Preparing custom annotation

  • prepareAnnotationFromEvents
  • parseMatsAnnotation
  • parseMisoAnnotation
  • parseSuppaAnnotation
  • parseVastToolsAnnotation

Data Grouping

Dimensionality reduction

PCA

  • performPCA
  • plotPCA
  • plotVariance

ICA

  • performICA
  • plotICA

Survival analysis

Differential analyses

Gene expression and alternative splicing correlation

Annotation retrieval

Misc

Retrieving data

The quantification of each alternative splicing event is based on the proportion of junction reads that support the inclusion isoform, known as percent spliced-in or PSI (E. T. Wang et al. 2008).

To estimate this value for each splicing event, both alternative splicing annotation and junction quantification are required. While alternative splicing annotation is provided by the package, junction quantification may be retrieved from TCGA.

Download and load data from TCGA/Firebrowse

Data is downloaded from Firebrowse, a service that hosts proccessed data from TCGA as required to run the downstream analyses. Before downloading data, check the following available options:

# Available tumour types
cohorts <- getFirebrowseCohorts()

# Available sample dates
date <- getFirebrowseDates()

# Available data types
dataTypes <- getFirebrowseDataTypes()

After deciding on the options to use, download and load data of interest using:

# Set download folder
folder <- getDownloadsFolder()

# Download and load most recent junction quantification and clinical data from
# TCGA/Firebrowse for Breast Cancer
data <- loadFirebrowseData(folder=folder,
                           cohort="BRCA",
                           data=c("clinical", "junction_quantification",
                                  "RSEM_genes"),
                           date="2016-01-28")

# Select clinical and junction quantification dataset
clinical      <- data[[1]]$`Clinical data`
sampleInfo    <- data[[1]]$`Sample metadata`
junctionQuant <- data[[1]]$`Junction quantification (Illumina HiSeq)`
geneExpr      <- data[[1]]$`Gene expression`

Data is only downloaded if the files are not present in the given folder. In other words, if the files were already downloaded, the function will just load the files, so it is possible to reuse the code above just to load the requested files.

Load GTEx files

First, GTEx data needs to be downloaded from the GTEx Portal. Afterwards, load GTEx data (subject phenotype, sample attributes and junction quantification for given tissues) by following these commands:

# Replace with the correct path to these files
subjects <- "~/Downloads/GTEx_Data_V6_Annotations_SubjectPhenotypesDS.txt"
sampleAttr <- "~/Downloads/GTEx_Data_V6_Annotations_SampleAttributesDS.txt"
junctionQuant <- "~/Downloads/GTEx_junction_reads.txt"

# Check GTEx tissues available based on the sample attributes
getGtexTissues(sampleAttr)

tissues <- c("blood", "brain")
gtex <- loadGtexData(subjects, sampleAttr, junctionQuant, tissues)

If you desire to load junction quantification for all tissues, you can also do it using the following commands:

tissues <- NULL
gtex <- loadGtexData(subjectPhenotype, sampleAttr, junctionQuant, tissues)

Load unspecified local files

To load local files instead, indicate the folder of interest. Any files located in this folder and sub-folders will be attempted to load. To avoid errors during this process, files of interest should be put in a dedicated folder.

For instance, to load GTEx files in this way, create a directory called GTEx, put all files of interest inside that folder and follow these commands:

folder <- "~/Downloads/GTEx/"
ignore <- c(".aux.", ".mage-tab.")
data <- loadLocalFiles(folder, ignore=ignore)

# Select clinical and junction quantification dataset
clinical      <- data[[1]]$`Clinical data`
sampleInfo    <- data[[1]]$`Sample metadata`
junctionQuant <- data[[1]]$`Junction quantification (Illumina HiSeq)`

Filtering and normalising gene expression

As this package is not focused on gene expression analysis, we suggest to read the RNA-seq section of limma’s user guide. Nevertheless, we present the following commands to quickly filter and normalise gene expression:

# Check genes where min. counts are available in at least N samples and filter
# out genes with mean expression and variance of 0
checkCounts <- rowSums(geneExpr >= 10) >= 10
filter <- rowMeans(geneExpr) > 0 & rowVars(geneExpr) > 0 & checkCounts
geneExprFiltered <- geneExpr[filter, ]

# Normalise gene expression and perform log2-transformation
geneExprNorm <- normaliseGeneExpression(geneExprFiltered, log2transform=TRUE)

Quantifying alternative splicing

As previously mentioned, alternative splicing is quantified from the previously loaded junction quantification and an alternative splicing annotation file. To check current annotation files available:

# Available alternative splicing annotation
annotList <- listSplicingAnnotations()
annotList
##                        Human hg19/GRCh37 (2017-10-20) 
## "annotationHub_alternativeSplicingEvents.hg19_V2.rda" 
##                        Human hg19/GRCh37 (2016-10-11) 
##    "annotationHub_alternativeSplicingEvents.hg19.rda" 
##                               Human hg38 (2017-10-20) 
##    "annotationHub_alternativeSplicingEvents.hg38.rda"

Custom alternative splicing annotation obtained from SUPPA, rMATS, VAST-TOOLS or MISO can also be provided. Note that SUPPA and rMATS are able to create their splicing annotation based on transcript annotation. Read more here.

To quantify alternative splicing, first select the junction quantification, alternative splicing annotation and alternative splicing event type(s) of interest:

# Load Human (hg19/GRCh37 assembly) annotation
human <- listSplicingAnnotations()[[1]]
annotation <- loadAnnotation(human)
# Available alternative splicing event types (skipped exon, alternative 
# first/last exon, mutually exclusive exons, etc.)
eventType <- getSplicingEventTypes()
eventType
##               Skipped exon    Mutually exclusive exon 
##                       "SE"                      "MXE" 
## Alternative 5' splice site Alternative 3' splice site 
##                     "A5SS"                     "A3SS" 
##     Alternative first exon      Alternative last exon 
##                      "AFE"                      "ALE"
eventType <- "SE"

Afterwards, quantify alternative splicing using the previously defined parameters:

# Discard alternative splicing quantified using few reads
minReads <- 10

psi <- quantifySplicing(annotation, junctionQuant, eventType=eventType, 
                        minReads=minReads)
# Check the identifier of the splicing events in the resulting table
events <- rownames(psi)
head(events)
## [1] "SE_1_-_2125078_2124414_2124284_2121220_C1orf86"        
## [2] "SE_1_+_46033849_46034157_46034356_46034599_AKR1A1"     
## [3] "SE_1_-_113249700_113247790_113247722_113246428_RHOC"   
## [4] "SE_1_-_113249700_113248874_113247722_113246428_RHOC"   
## [5] "SE_1_-_154186933_154186422_154186369_154185100_C1orf43"
## [6] "SE_1_+_154241430_154241838_154241888_154242676_UBAP2L"

Note that the event identifier (for instance, SE_1_-_2125078_2124414_2124284_2121220_C1orf86) is composed of:

WARNING: all the examples shown below are performed using a small, but representative subset of the data available.

Survival analysis

Survival data can be analysed based on clinical attributes, for instance, by tumour stage and patient gender, using time to death as the follow-up time and death as the event of interest.

Around 80% of breast cancers have cells that express estrogen receptors (ER) and require estrogen binding to grow. Estrogen receptors may be blocked by tamoxifen and other drugs. These drugs are thus used as treatment or prevention for ER-positive breast cancers.

To compare the overall survival of ER-positive patients treated with and without tamoxifen, the following groups will be created:

# Get available groups in clinical data
cols <- colnames(clinical)

# Check the name from which to retrieve groups of interest
er_expr <- grep("estrogen_receptor_status", cols, value=TRUE)[1]
er_expr <- createGroupByAttribute(col=er_expr, dataset=clinical)

# Discard values other than "positive" and "negative" for ER expression
er_expr <- er_expr[c("positive", "negative")]

############################################################################
# Now the same for patients treated with tamoxifen. However, TCGA presents #
# data for the administred drugs spread through multiple columns, so...    #
############################################################################

# Look for the appropriate columns with the drugs administred
drug_name <- grep("drug_name", cols, value=TRUE)[1:23]

# Using the previous columns, look for either "tamoxifen" or "tamoxiphen"
tamoxifen <- lapply(drug_name, function(i) {
    rows <- grep("tamoxi.*", clinical[ , i])
    rownames(clinical)[rows]
})

# Collect all previous results and create a single list named tamoxifen
tamoxifen <- sort(unique(unlist(tamoxifen)))
tamoxifen <- list("tamoxifen"=tamoxifen)

# Combine all previously created groups
groups <- c(er_expr, tamoxifen)

# Assign each patient to a group
patients <- rownames(clinical)
g <- groupPerElem(groups[c("tamoxifen", "positive")], patients)

# Append the created groups to the clinical dataset
cl <- cbind(clinical, groups=g)

Before continuing, be sure you undestand how to use formulas in R.

help(formula)

To plot the survival curves:

# Create the right-hand side of a formula
formulaStr <- "groups"

# Events are retrieved based on the information available for time to event: if 
# there is info for a patient, the event is considered as occurred
daysToDeath <- "days_to_death"
survTerms  <- processSurvTerms(cl, censoring="right", event=daysToDeath,
                               timeStart=daysToDeath,
                               formulaStr=formulaStr, scale="years")

require("survival")
## Loading required package: survival
surv <- survfit(survTerms)
pvalue <- testSurvival(survTerms)

plotSurvivalCurves(surv, pvalue=pvalue)

Information regarding number of individuals and events is returned when hovering over each survival curve in the plot. The plot also allows zooming in by clicking and dragging and to omit data series by clicking on their name in the legend.

Cox Proportional Hazards model

Calculate Cox proportional hazards model using the previously used parameters by adding the argument coxph=TRUE when processing survival terms.

survTermsCox <- processSurvTerms(cl, censoring="right", event=daysToDeath,
                                 timeStart=daysToDeath,
                                 formulaStr=formulaStr, coxph=TRUE)
summary(survTermsCox)
## Call:
## coxph(formula = form, data = survData)
## 
##   n= 841, number of events= 66 
##    (255 observations deleted due to missingness)
## 
##                        coef exp(coef) se(coef)      z Pr(>|z|)   
## tamoxifen            0.9762    2.6545   0.7241  1.348  0.17756   
## tamoxifen, positive -1.1990    0.3015   0.4308 -2.783  0.00538 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                     exp(coef) exp(-coef) lower .95 upper .95
## tamoxifen              2.6545     0.3767    0.6422   10.9722
## tamoxifen, positive    0.3015     3.3167    0.1296    0.7014
## 
## Concordance= 0.582  (se = 0.038 )
## Rsquare= 0.015   (max possible= 0.523 )
## Likelihood ratio test= 12.53  on 2 df,   p=0.001901
## Wald test            = 10.07  on 2 df,   p=0.006492
## Score (logrank) test = 11.58  on 2 df,   p=0.003065

Exploring principal component analysis

Principal component analysis (PCA) is a technique to reduce data dimensionality by identifying variable combinations (called principal components) that explain the variance in the data (Ringnér 2008).

To explore alternative splicing quantification groups by estrogen receptor (ER) expression:

# Match patients with samples
samples <- colnames(psi)
match <- getPatientFromSample(samples, clinical, sampleInfo=sampleInfo)

# Filter alternative splicing quantification to colour values by positive or
# negative ER expression
erGroups <- getMatchingSamples(groups[c("positive", "negative")], samples, 
                               clinical, match=match)

Now that the groups are ready, let’s perform and plot the PCA results based on (normalised and log2-transformed) gene expression:

# Transpose data so samples are within rows
pcaData <- t(geneExprNorm[ , unlist(erGroups)])

# Missing values tolerated by gene: tolerated missing values are replaced with 
# the median expression of that gene
missingValues <- 10

# Center and scale normalised gene expression values
# Perform principal component analysis
pca <- performPCA(pcaData, center=TRUE, scale.=FALSE, 
                  missingValues=missingValues)

# Plot the variance explained by each principal component
plotVariance(pca)
# Score plot coloured according to the ER expression
plotPCA(pca, pcX="PC1", pcY="PC2", erGroups)
# Loadings plot that displays the contribution of the variables (i.e. 
# alternative splicing events)
plotPCA(pca, pcX="PC1", pcY="PC2", individuals=FALSE, loadings=TRUE)

Now the same for PSI:

# Transpose data so samples are within rows
pcaData <- t(psi[ , unlist(erGroups)])

# Missing values tolerated by splicing event: tolerated missing values are 
# replaced with the median value of that splicing event
missingValues <- 10

# Center but do not scale PSI values (they are already scaled between 0 and 1)
# Perform principal component analysis
pca <- performPCA(pcaData, center=TRUE, scale.=FALSE, 
                  missingValues=missingValues)

# Plot the variance explained by each principal component
plotVariance(pca)
# Score plot coloured according to the ER expression
plotPCA(pca, pcX="PC1", pcY="PC2", erGroups)
# Loadings plot that displays the contribution of the variables (i.e. 
# alternative splicing events)
plotPCA(pca, pcX="PC1", pcY="PC2", individuals=FALSE, loadings=TRUE)

The bubble size of the loadings plot represent the total contribution of each alternative splicing event to the selected principal components.

Note that the clinical samples from ER-positive individuals (i.e. patients whose cancer cells express estrogen receptors) seem to separate from samples from ER-negative individuals along the principal component 1. There is one alternative splicing event that may contribute to this separation: SE 10 + 79797062 79799962 79799983 79800373 RPS24.

Differential splicing analysis

By default, differential splicing analysis is performed by sample types (i.e. tumour, normal, metastasis, etc) for multiple parametric and non-parametric statistical tests. Let’s analyse the previous splicing event found:

# Choose a method to use when adjusting p-values ("none" is valid)
# help(p.adjust.methods)
pvalueAdjust <- "BH"

# Check sample types available (normal, tumour, etc.)
types <- parseSampleGroups(colnames(psi))
unique(types)
## [1] "Primary solid Tumor" "Solid Tissue Normal" "Metastatic"
# Analyse by sample types (by default) and perform all statistical analyses (by 
# default)
event <- "SE_10_+_79797062_79799962_79799983_79800373_RPS24"
eventPSI <- psi[event, ]
stats <- diffAnalyses(eventPSI, pvalueAdjust=pvalueAdjust,
                      analyses = c("wilcoxRankSum", "ttest", "kruskal", 
                                   "levene", "fligner"))
## Time difference of 0.0205 secs
## Time difference of 0.00527 secs
## Time difference of 0.00103 secs
# View(stats)
plotDistribution(eventPSI, groups=types)

Differential splicing analysis returns a data frame with the results of the statistical tests, as well as other info, such as associated gene and number of samples per group. The following statistical tests are available:

Not all statistical tests are performed depending on the number of groups available. Check if any tests show statistical significance using only tumour against normal samples (for instance, Log-rank p-values below 0.05):

filter <- grep("Tumor|Normal", types)
gr_event <- types[filter]
eventPSI <- eventPSI[ , filter]

stats <- diffAnalyses(eventPSI, groups=gr_event, pvalueAdjust=pvalueAdjust,
                      analyses = c("wilcoxRankSum", "ttest", "kruskal", 
                                   "levene", "fligner"))
## Time difference of 0.11 secs
## Time difference of 0.00637 secs
## Time difference of 0.000415 secs
## Time difference of 0.000684 secs
# View(stats)
plotDistribution(eventPSI, groups=gr_event)

Now, change the groups to positive and negative for ER expression and check if any of the tests show statistical significance:

# Filter alternative splicing quantification by positive or negative ER
# expression; match between patient information and samples as such:
erGroups <- getMatchingSamples(groups[c("negative", "positive")], samples, 
                               clinical, match=match)
eventPSI <- psi[event, ]

stats <- diffAnalyses(eventPSI, groups=erGroups, pvalueAdjust=pvalueAdjust,
                      analyses = c("wilcoxRankSum", "ttest", "kruskal", 
                                   "levene", "fligner"))
## Time difference of 0.0191 secs
## Time difference of 0.00598 secs
## Time difference of 0.000391 secs
## Time difference of 0.000642 secs
# View(stats)
plotDistribution(eventPSI, groups=erGroups)

Survival analysis

To study the impact of an alternative splicing event on prognosis, Kaplan-Meier curves may be plotted for groups of patients separated by a given PSI cut-off for the selected alternative splicing event. Let’s carry on with the splicing event from before:

event <- "SE_10_+_79797062_79799962_79799983_79800373_RPS24"

# Assign alternative splicing quantification to patients based on their samples
clinicalPSI <- getValuePerPatient(psi, match, clinical)
eventPSI <- as.numeric(clinicalPSI[event, ])

# Statistics of the alternative splicing event's quantification
summary(eventPSI)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.01419 0.46876 0.71263 0.64414 0.83500 1.00000       4

The optimal PSI cut-off that maximises the significance of their difference in survival (i.e. minimises the p-value of the Wald/Log/Logrank tests of difference in survival between individuals with PSI below and above that threshold) can be calculated as per:

opt <- optimalPSIcutoff(clinical, eventPSI, censoring="right", 
                        event="days_to_death", timeStart="days_to_death")
optimalCutoff <- opt$par # Optimal exon inclusion level
optimalCutoff
## [1] 0.6394619
optimalPvalue <- opt$value # Respective p-value
optimalPvalue
## [1] 0.0198

Finally, plot survival curves separated by the optimal quantification cut-off or any other cut-off of interest:

cutoff <- optimalCutoff
group <- labelBasedOnCutoff(eventPSI, cutoff, label="PSI values")
survTerms <- processSurvTerms(clinical, censoring="right",
                              event="days_to_death", timeStart="days_to_death", 
                              group=group)

require("survival")
surv <- survfit(survTerms)
pvalue <- testSurvival(survTerms)
plotSurvivalCurves(surv, pvalue=pvalue)

Let’s now compare the resulting curves with Kaplan-Meier curves separated by a cut-off based on gene expression:

gene <- "RPS24"

# Assign gene expression to patients based on their samples
clinicalGE <- getValuePerPatient(geneExprNorm, match, clinical)
eventGE <- as.numeric(clinicalGE[gene, ])

# Statistics of the gene expression for the selected gene
summary(eventGE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   14.87   16.28   16.65   16.70   17.05   20.05       4
# Plot survival curves based on the mean gene expression
cutoff <- mean(eventGE, na.rm=TRUE)
group  <- labelBasedOnCutoff(eventGE, cutoff, label="Gene expression")
survTerms <- processSurvTerms(clinical, censoring="right", 
                              event="days_to_death", timeStart="days_to_death", 
                              group=group)

require("survival")
surv <- survfit(survTerms)
pvalue <- testSurvival(survTerms)
plotSurvivalCurves(surv, pvalue=pvalue)

Literature support and external database information

If an event is differentially spliced and has an impact on patient survival, its association with the studied disease might be already described in the literature. Check for relevant research articles on PubMed.

It is also possible to plot transcripts and proteins related to the gene associated to a given splicing event. To retrieve and plot transcript information from ENSEMBL:

parsed <- parseSplicingEvent(event)
info <- queryEnsemblByEvent(event, species="human", assembly="hg19")
plotTranscripts(info, parsed$pos[[1]])

To retrieve and plot protein information from UniProt:

parsed <- parseSplicingEvent(event)
info <- queryEnsemblByEvent(event, species="human", assembly="hg19")

# Some of these transcripts have no corresponding protein
proteins <- info$Transcript$Translation$id
protein <- proteins[[6]]

# Convert protein identifier from ENSEMBL to UniProt
uniprot <- ensemblToUniprot(protein)
uniprot
## RS24_HUMAN (UniProtKB/Swiss-Prot) 
##                          "P62847"
uniprot <- uniprot[[1]]

plotProtein(uniprot)
## 
## Retrieving protein annotation from UniProt...
## 
## Plotting protein domains...

The protein plot shows the UniProt matches for the selected transcript. Hover the protein’s rendered domains to obtain more information on them.

Other database of interest include: - Human Protein Atlas (Cancer Atlas) allows to check the evidence of a gene at protein level for multiple cancer tissues. - VastDB shows multi-species alternative splicing profiles for diverse tissues and cell types. - UCSC Genome Browser may reveal protein domain disruptions caused by the alternative splicing event. To check so, activate the Pfam in UCSC Gene and UniProt tracks (in Genes and Gene Predictions) and check if any domains are annotated in the alternative and/or constitutive exons of the splicing event.

Exploring differential splicing analysis

To analyse differencial splicing, choose which clinical groups on which to perform the analyses. For instance, splicing events can be analysed based on sample types (e.g., tumour versus normal tissue, if available) or clinical groups of the patients (e.g. stage of the disease).

Let’s perform differential splicing analysis based on ER expression:

# Filter alternative splicing quantification by positive or negative ER
# expression; match between patient information and samples as such:
erGroups <- getMatchingSamples(groups[c("negative", "positive")], samples, 
                               clinical, match=match)
diff <- diffAnalyses(psi, erGroups)
## Time difference of 1.47 secs
## Time difference of 0.0358 secs
## Time difference of 0.000437 secs
## Time difference of 0.00113 secs
# View(diff)

Survival analysis

To study the impact of the alternative splicing events on prognosis, significant differences in survival may be identified between patients separated by a given splicing quantification cut-off. Given the time-consuming process of identifying a cut-off that minimises the survival difference, it is recommended to filter differentially spliced events supported by statistical significance before calculating the optimal cut-off and associated p-value.

# Filter statistically significant events as you see fit
filter <- diff$`Wilcoxon p-value (BH adjusted)` <= 0.05
filter <- filter[!is.na(filter)]
events <- rownames(diff[filter, ])

# Assign alternative splicing quantification to patients based on their samples
clinicalPSI <- getValuePerPatient(psi, match, clinical)

# Get time for all survival analyses of interest (as this is the same for all,
# pre-fetching this information speeds the following loop)
daysToDeath <- "days_to_death"
survTime <- getAttributesTime(clinical, event=daysToDeath, timeStart=daysToDeath)

# Prepare progress bar
min <- 1
max <- nrow(clinicalPSI)
pb <- txtProgressBar(min, max, "Calculating optimal PSI cut-off", style=3)

# Prepare empty vectors where information will be stored (faster)
optimalCutoff <- rep(NA, max)
optimalPvalue <- rep(NA, max)

# Retrieve the optimal PSI cut-off and respective p-value for multiple events
for (row in min:max) {
    singlePSI <- as.numeric(clinicalPSI[row, ])
    opt <- optimalPSIcutoff(clinical, singlePSI, censoring="right", 
                            event=daysToDeath, timeStart=daysToDeath,
                            survTime=survTime)
    optimalCutoff[row] <- opt$par # Optimal exon inclusion level
    optimalPvalue[row] <- opt$value # Respective p-value
    setTxtProgressBar(pb, row)
}

# Bind columns to differential splicing analysis if desired
diff <- cbind(diff, optimalCutoff, optimalPvalue)
# View(diff)

Afterwards, check literature information and search external databases for more information on the events of interest, including on UCSC Genome Browser for putative protein domain disruptions resulting from the splicing event.

Exploring an alternative splicing event of interest

The event SE 6 - 46823711 46822518 46822452 46821808 GPR116 has been previously reported to have potential prognostic value in breast cancer patients, where patients with higher PSI values for this event have a lower 5-year survival rate than patients with lower PSI values (Tsai et al. 2015).

Confirm so by performing differential splicing (for example, normal vs tumour samples) on this event and survival analysis by PSI cut-off. Also, check literature information and search external databases for more information, including on UCSC Genome Browser for putative protein domain disruptions resulting from the alternative splicing event.

Feedback

All feedback on the program, documentation and associated material (including this tutorial) is welcome. Please send any suggestions and comments to:

Nuno Saraiva-Agostinho (nunodanielagostinho@gmail.com) Computation Biology Lab, Instituto de Medicina Molecular (Portugal)

References

Ringnér, Markus. 2008. “What is principal component analysis?” Nature Biotechnology 26 (3): 303–4.

Tsai, Yihsuan S, Daniel Dominguez, Shawn M Gomez, and Zefeng Wang. 2015. “Transcriptome-wide identification and study of cancer-specific splicing events across multiple tumors.” Oncotarget 6 (9): 6825–39.

Wang, E. T., R. Sandberg, S. Luo, I. Khrebtukova, L. Zhang, C. Mayr, S. F. Kingsmore, G. P. Schroth, and C. B. Burge. 2008. “Alternative isoform regulation in human tissue transcriptomes.” Nature 456 (7221): 470–76.