psichomics case study: visual interface

Nuno Saraiva-Agostinho

2019-10-07


psichomics is an interactive R package for integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA) (containing molecular data associated with 34 tumour types), the Genotype-Tissue Expression (GTEx) project (containing data for multiple normal human tissues), the Sequence Read Archive (SRA) and user-provided data. The data from GTEx, TCGA and select SRA projects include subject/sample-associated information and transcriptomic data, such as the quantification of RNA-Seq reads aligning to splice junctions (henceforth called junction quantification) and exons.

Installing and starting the program

Install psichomics by typing the following in an R console (the R environment is required):

install.packages("BiocManager")
BiocManager::install("psichomics")

After the installation, start the visual interface of the program in your default web browser by typing:

library(psichomics)
psichomics()

Exploration of clinically-relevant, differentially spliced events in breast cancer

The following case study was adapted from psichomics’ original article:

Nuno Saraiva-Agostinho and Nuno L. Barbosa-Morais (2019). psichomics: graphical application for alternative splicing quantification and analysis. Nucleic Acids Research.

Breast cancer is the cancer type with the highest incidence and mortality in women (Torre et al., 2015) and multiple studies have suggested that transcriptome-wide analyses of alternative splicing changes in breast tumours are able to uncover tumour-specific biomarkers (Tsai et al., 2015; Danan-Gotthold et al., 2015; Anczuków et al., 2015). Given the relevance of early detection of breast cancer to patient survival, we can use psichomics to identify novel tumour stage-I-specific molecular signatures based on differentially spliced events.

Downloading and loading TCGA data

The quantification of each alternative splicing event is based on the proportion of junction reads that support the inclusion isoform, known as percent spliced-in or PSI (Wang et al., 2008).

To estimate this value for each splicing event, both alternative splicing annotation and junction quantification are required. While alternative splicing annotation is provided by the package, junction quantification may be retrieved from multiple sources.
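As an illustration of the PSI definition above, the following minimal sketch computes PSI for a single skipped exon from junction read counts. The function `computePSI` and the read counts are hypothetical and not part of psichomics. The sketch assumes that inclusion is supported by the two junctions flanking the alternative exon (averaged) and exclusion by the junction that skips it.

```r
# Hypothetical helper (not part of psichomics): PSI for a skipped exon
# incUp, incDown: reads for the two junctions flanking the alternative exon
# exc: reads for the junction that skips the alternative exon
computePSI <- function(incUp, incDown, exc) {
    inclusion <- (incUp + incDown) / 2  # average support for inclusion
    inclusion / (inclusion + exc)       # proportion supporting inclusion
}

# Made-up read counts for one splicing event in three samples
psi <- computePSI(incUp=c(30, 5, 0), incDown=c(50, 15, 0), exc=c(10, 30, 0))
print(psi)
```

A sample without any reads for the event (the third one here) yields an undefined value, which is one reason a minimum read count is usually required before trusting an inclusion level.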

Load the breast cancer data as follows:

  1. To load TCGA data, click on the blue panel Download/load TCGA data.
  2. Fill in the Tumour type field with Breast invasive carcinoma (BRCA).
  3. Set the most recent date available in the Date field.
  4. In the Data type field, select Clinical data, Junction quantification and Gene expression (RSEM).

Note that there is also an option for Gene expression (normalised by RSEM). However, we recommend loading the raw gene expression data instead, followed by filtering and normalisation as demonstrated afterwards.

  5. Confirm that the Folder where data is stored field points to the folder where your browser saves downloaded files.
  6. Click Load data. If the required files are not available in the given folder, a message will appear asking whether you want to download the requested data. Click Download data and, when all downloads are finished, load the data by clicking Load data again (do not change any parameters).

Downloading multiple files: Note that multiple files will be requested for download at once. Some web browsers (such as Google Chrome) will ask for your confirmation before allowing such behaviour. In order to proceed, please allow multiple downloads.

Windows limitations: If you are using Windows, note that the downloaded files have very long names that may exceed the Windows maximum path length. As a workaround, manually rename the downloaded files to have shorter names, move them all to a single folder and load that folder by going to Load user files > Folder input and selecting the newly created folder in Folder where data is stored.

Options to load TCGA data.

After the data finish loading (keep an eye on the progress bar at the top-right corner), the on-screen instructions at the right will be replaced by the loaded data, including options to view and save such data.

Filtering and normalising gene expression

To filter and normalise gene expression, click the green panel Gene expression filtering and normalisation. Within this panel, click the different grey sections (Gene filtering, Normalisation and Compute CPM and log-transform) to check the settings available for processing gene expression. When you are ready to proceed, click Filter and normalise gene expression.

Options to filter and normalise gene expression.
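The processing steps named in the panel above (gene filtering, CPM computation and log-transformation) can be sketched in base R as follows. This is only an illustration under stated assumptions: the count matrix, the filtering threshold and the pseudocount are made up, and no between-sample normalisation method is applied. It does not reproduce the implementation used by psichomics.

```r
# Made-up raw counts: 4 genes (rows) x 3 samples (columns)
counts <- matrix(c(500, 300, 200,
                     0,   1,   0,
                   100, 200, 300,
                   400, 499, 500),
                 nrow=4, byrow=TRUE,
                 dimnames=list(paste0("gene", 1:4), paste0("sample", 1:3)))

# Gene filtering: keep genes whose mean count across samples is above 10
# (threshold chosen for illustration only)
filtered <- counts[rowMeans(counts) > 10, , drop=FALSE]

# Counts per million (CPM): scale counts by the library size of each sample
libSize <- colSums(filtered)
cpm     <- t(t(filtered) / libSize) * 1e6

# Log-transform with a pseudocount of 1 to avoid log2(0)
logCPM <- log2(cpm + 1)
print(round(logCPM, 2))
```

Filtering before computing CPM means that lowly expressed genes do not contribute to the library sizes used for scaling.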

Quantifying alternative splicing

After loading the clinical and alternative splicing junction quantification data from TCGA, quantify alternative splicing by clicking the green panel Alternative splicing quantification.

  1. From the loaded data, select the junction quantification dataset to use in Alternative splicing junction quantification. For many tumour types, only one dataset is provided.
  2. Select the alternative splicing event annotation Human hg19/GRCh37 (2017-10-20). Note that there are other annotation files available and you can also load custom annotation files.

Custom splicing annotation: Additional alternative splicing annotations can be prepared for psichomics by parsing the annotation from programs like VAST-TOOLS, MISO, SUPPA and rMATS. Note that SUPPA and rMATS are able to create their splicing annotation based on transcript annotation. For more information, read this tutorial.

  3. Choose the event type(s) of interest. To follow this case study, select Skipped exon (SE), Mutually exclusive exon (MXE), Alternative 5’ splice site (A5SS), Alternative 3’ splice site (A3SS), Alternative first exon (AFE) and Alternative last exon (ALE).
  4. Set the minimum read count threshold to 10. Inclusion levels calculated with total read counts below this threshold are discarded from further analyses.
  5. Click Quantify alternative splicing.

Options to quantify alternative splicing.
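The effect of the minimum read count threshold described above can be sketched as follows. The helper `psiWithThreshold` and the read counts are hypothetical. The sketch assumes that the total read count of an event is the sum of its inclusion and exclusion reads and that inclusion levels below the threshold are set to missing.

```r
# Hypothetical helper (not part of psichomics): PSI with a minimum read count
psiWithThreshold <- function(inclusion, exclusion, minReads=10) {
    total <- inclusion + exclusion
    psi   <- inclusion / total
    psi[total < minReads] <- NA  # too few reads: discard the inclusion level
    psi
}

# Made-up inclusion and exclusion reads for one event in four samples
psi <- psiWithThreshold(inclusion=c(40, 3, 9, 0), exclusion=c(10, 4, 1, 0))
print(psi)
```

Samples with exactly 10 reads are kept, as only total read counts below the threshold are discarded.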

Data grouping

To group data for downstream analyses, look for the navigation bar at the top and click Groups. In the displayed table, confirm that three groups are automatically present based on the available sample types: Metastatic, Primary solid Tumor and Solid Tissue Normal.

Next, to create groups by tumour stage, click the Select attribute field and type tumor stage. Select the first hit (it should be patient.stage_event.pathologic_stage_tumor_stage) and click Create group.

Creating groups by tumour stage.

The table on the right will be updated with the groups created per tumour stage. Next, we will merge tumour stages so that only stages I, II, III and IV remain. To do so:

  1. Select stage i, stage ia and stage ib in the table.

Hint: You can shift-click to select multiple groups at once.

  2. Click Merge in the toolbar below the table. A new group named (stage i ∪ stage ia ∪ stage ib) will appear.
  3. Select the newly created group and the Primary solid Tumor group, then click Intersect to retrieve Tumour Stage I samples.
  4. Optionally, rename the newly created group ((stage i ∪ stage ia ∪ stage ib) ∩ Primary solid Tumor) by selecting it and clicking in the Rename selected group field at the bottom. Type Tumour Stage I and click Rename.

Do the same for tumour stages II, III and IV with their respective subgroups (ignore stage X samples, as they are uncharacterised tumour samples). We also recommend removing groups that are of no interest by selecting them and clicking Remove. You should end up with a table similar to the one below.

Changing group colours: The colours defined for each group will be used to represent those same groups in the plots throughout psichomics. To change the colour of a given group, select that group and, next to the rename field, change its associated colour (by clicking on the colour field and picking a new colour or by inputting a HEX code) and click Set colour.

Table showing data groups as created for downstream analyses.
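The Merge and Intersect operations used above correspond to set union and intersection. A minimal base R sketch of the same logic is shown below. The sample identifiers and group contents are made up, and the sketch treats every group as a plain vector of sample identifiers, which is a simplification of how groups are stored.

```r
# Made-up sample identifiers per group (hypothetical data)
groups <- list(
    "stage i"             = c("S1", "S2"),
    "stage ia"            = c("S3", "S4"),
    "stage ib"            = c("S5"),
    "Primary solid Tumor" = c("S1", "S3", "S5", "S6"),
    "Solid Tissue Normal" = c("S2", "S4"))

# Merge: union of the stage I subgroups
stageI <- Reduce(union, groups[c("stage i", "stage ia", "stage ib")])

# Intersect: keep only the stage I samples that are primary solid tumours
tumourStageI <- intersect(stageI, groups[["Primary solid Tumor"]])
print(tumourStageI)
```

Intersecting with the Primary solid Tumor group removes the normal samples taken from stage I patients, so that the final group contains tumour samples only.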

The created groups can then be saved in a text file and loaded in a future session. To do so, in the toolbar below the table click the folder icon (right next to More) and select Save elements from all groups.

Options to save and load groups.

Principal component analysis (PCA)

PCA is a technique to reduce data dimensionality by identifying variable combinations (called principal components) that explain the variance in the data (Ringnér, 2008). To analyse principal components, click on the Analyses tab located in the navigation menu at the top and select Principal component analysis (PCA).

PCA performance

To perform PCA on alternative splicing data using all samples:

  1. In Data to perform PCA on, select Inclusion levels.
  2. Select Center values and de-select Scale values.
  3. Perform PCA on All samples.
  4. In Number of missing values to tolerate per event, select 5% of the samples (i.e. 61 samples).

As PCA cannot be performed on data with missing values, missing values need to be either removed (thus discarding data from whole splicing events or genes) or imputed (i.e. replaced with the median of the non-missing values). This input sets the number of missing values tolerated per event: if a splicing event or gene has fewer than N missing values, those missing values are imputed; otherwise, the event is discarded from PCA.

  5. Perform PCA on All genes and splicing events.
  6. Click Calculate PCA.
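The settings above (tolerating a limited number of missing values per event, imputing the remaining ones, centring without scaling) can be sketched in base R as follows. The inclusion levels are simulated, and the sketch assumes that imputation uses the median of each event across samples. It illustrates the procedure with `prcomp()` and is not the code run by psichomics.

```r
set.seed(1)
# Simulated inclusion levels: 6 splicing events (rows) x 40 samples (columns)
psi <- matrix(runif(240), nrow=6,
              dimnames=list(paste0("event", 1:6), paste0("sample", 1:40)))
psi["event1", 1:5] <- NA  # 5 missing values: event will be discarded
psi["event2", 3]   <- NA  # 1 missing value: will be imputed

# Tolerate missing values in 5% of the samples (N = 2 of 40 samples):
# keep events with fewer than N missing values
tolerated <- round(0.05 * ncol(psi))
keep <- rowSums(is.na(psi)) < tolerated
psi  <- psi[keep, , drop=FALSE]

# Impute the remaining missing values with the median of each event
imputeMedian <- function(x) {
    x[is.na(x)] <- median(x, na.rm=TRUE)
    x
}
psi <- t(apply(psi, 1, imputeMedian))

# PCA with samples as observations: centre values but do not scale them
pca <- prcomp(t(psi), center=TRUE, scale.=FALSE)
print(dim(pca$x))
```

Inclusion levels already share the same 0 to 1 range, which is one reason to centre the values without scaling them.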

PCA plotting

After PCA is performed, the Plot PCA panel will automatically open. Note that the explained variance of each principal component (PC) is shown next to the respective component and that PC1 explains most of the data variance, followed by PC2, then PC3, then PC4, etc. The variance plot is also available to compare the explained variance across principal components (by clicking Show variance plot). Now:

  1. Select PC1 as the X axis.
  2. Select PC2 as the Y axis.
  3. In Sample colouring, select Colour using selected groups and in the group selection input insert Tumour Stage I, Tumour Stage II, Tumour Stage III, Tumour Stage IV and Solid Tissue Normal.
  4. In Variables to plot in loadings plot, select All variables.

For performance reasons, only the Top 100 variables that most contribute to the selected principal components are plotted by default.

  5. Click Plot PCA.

Options to perform and visualise PCA.

Two PCA plots are then rendered. The plot above is a score plot that shows the clinical samples, while the loadings plot below displays the variables (in this case, alternative splicing events). The table below the loadings plot depicts the contribution of each variable to each PC.
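The quantities shown in these plots and in the table can be derived from a PCA object as sketched below with `prcomp()` on simulated data. The contribution of each variable is computed here as its squared loading in percent, which is one common definition. Whether psichomics uses exactly this formula is an assumption.

```r
set.seed(1)
# Simulated data: 30 samples (rows) x 4 splicing events (columns)
psi <- matrix(runif(120), nrow=30,
              dimnames=list(paste0("sample", 1:30), paste0("event", 1:4)))
pca <- prcomp(psi, center=TRUE, scale.=FALSE)

# Explained variance (%) per principal component, in decreasing order
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

scores   <- pca$x         # coordinates of the samples (score plot)
loadings <- pca$rotation  # weight of each variable per PC (loadings plot)

# Assumed definition of the contribution (%) of each variable to each PC:
# squared loading, given that each column of the rotation has unit length
contribution <- 100 * loadings^2

print(round(explained, 1))
print(round(contribution, 1))
```

With this definition, the contributions of all variables to a given principal component add up to 100%.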

Hint: As with most plots in psichomics, PCA plots can be zoomed in by clicking and dragging within the plot (click Reset zoom to zoom out). To toggle the visibility of a data series represented in the plot, click its name in the plot legend.