Recent breakthroughs in bioinformatics now allow us to accurately reconstruct and quantify full-length gene isoforms from RNA-sequencing data (via tools such as Cufflinks, Kallisto and Salmon). These tools make it possible to analyzing alternative isoform usage, but unfortunatly this is rarely done. Thus, RNA-se data is typically not used to its full potential.
To solve this problem we developed IsoformSwitchAnalyzeR. IsoformSwitchAnalyzeR is an easy-to use-R package that enables statistical identification of isoform switching from RNA-seq derived quantification of novel and/or annotated full-length isoforms. IsoformSwitchAnalyzeR facilitates integration of many sources of (predicted) annotation including features, including Open Reading Frame (ORF), protein domains (via Pfam), signal peptides (via SignalP), coding potential (via CPAT) and sensitivity to Non-sense Mediated Decay (NMD). The combination of identified isoform switches and their annotation also enables IsoformSwitchAnalyzeR to predict potential functional consequences of the identified isoform switches — such as loss of protein domains or coding potential — thereby identifying isoform switches of particular interest. Lastly, IsoformSwitchAnalyzeR provides article-ready visualization methods for isoform switches, and summary statistics describing the genome-wide occurences of isoform switches, their consequences as well as the associated alternative splicing.
In summary, IsoformSwitchAnalyzeR enables analysis of RNA-seq data with isoform resolution with a focus on isoform switching (with predicted consequences), thereby expanding the usability of RNA-seq data.
What To Cite (please remember)
Analyzing Alternative Splicing (new)
Frequently Asked Questions, Problems and Errors
The usage of Alternative Transcription Start sites (aTSS), Alternative Splicing (AS) and alternative Transcription Termination Sites (aTTS) are collectively collectively results in the production of different isoforms. Alternative isoforms are widely used as recently demonstrated by The ENCODE Consortium, which found that on average, 6.3 different transcripts are generated per gene; a number whic may var considerably per gene.
The importance of analyzing isoforms instead of genes has been highlighted by many examples showing functionally important changes. One of these examples is the pyruvate kinase. In normal adult homeostasis, cells use the adult isoform (M1), which supports oxidative phosphorylation. However, almost all cancer cells use the embryonic isoform (M2), which promotes aerobic glycolysis, one of the hallmarks of cancer. Such shifts in isoform usage are termed ‘isoform switching’ and cannot be detected at when only analyzing data on gene level.
Tools such as Cufflinks, Salmon and Kallisto allows for reconstruction and quantification of full-length transcripts from RNA-seq data. Such data has the potential to facilitate genome-wide analysis of alternative isoform usage and identification of isoform switching — but unfortunately these types of analyses are still only rarely done; most analyses are on gene level only.
We hypothesize that there are multiple reasons why RNA-seq data is not used to its full potential:
To solve all these problems, we developed IsoformSwitchAnalyzeR.
IsoformSwitchAnalyzeR is an easy-to-use R package that enables the user to import (novel) full-length derived isoforms from an RNA-seq experiment into R. If annotated transcripts are analyzed, IsoformSwitchAnalyzeR offers integration with the multi-layer information stored in a GTF file including the annotated coding sequences (CDS). If transcript structures are predicted (either de-novo or guided) IsoformSwitchAnalyzeR offers an accurate tool for identifying the dominant ORF of the isoforms. The knowledge of isoform positions for the CDS/ORF allows for prediction of sensitivity to Nonsense Mediated Decay (NMD) — the mRNA quality control machinery that degrades isoforms with pre-mature termination codons (PTC).
IsoformSwitchAnalyzeR facilitates identification of isoform switches via newly developed statistical methods that tests each individual isoform for differential usage and thereby identifies the exact isoforms involved in an isoform switch.
Since we know the exon structure of the full-length isoform, IsoformSwitchAnalyzeR can extract the underlying nucleotide sequence from a reference genome. This enables integration with the Coding Potential Assessment Tool (CPAT) which predicts the coding potential of an isoform and can be used to increase accuracy of ORF predictions. By combining the CDS/ORF isoform positions with the nucleotide sequence, we can extract the most likely amino acid sequence of the CDS/ORF. The amino acid sequence enables integration of analysis of protein domains (via Pfam) and signal peptides (via SignalP) — both supported by IsoformSwitchAnalyzeR. Lastly, since the structures of all expressed isoforms from a given gene are known, one can also annotate alternative splicing - including retentions - a functionality also implemented in IsoformSwitchAnalyzeR.
Thus, in summary, IsoformSwitchAnalyzeR enables annotation of isoforms with intron retention, ORF, NMD sensitivity, coding potential, protein domains and signal peptides (and many more), resulting in the ability to predict important functional consequences of isoform switches.
IsoformSwitchAnalyzeR contains tools that allow the user to create article-ready visualizations of:
These visualizations are easy to understand and integrate all information gathered throughout the workflow. Example of visualizations can be found in the Examples of Switch Visualizations section.
Lastly IsoformSwitchAnalyzeR is based on standard Bioconductor classes such as GRanges and BSgenome. Thus, it supports all species- and annotation versions facilitated by the Bioconductor annotation packages.
Back to Table of Content.
IsoformSwitchAnalyzeR is part of the Bioconductor repository and community which means it is distributed with, and dependent on, Bioconductor. Installation of IsoformSwitchAnalyzeR is easy and can be done from within the R terminal. If it is the first time you use Bioconductor, simply copy-paste the following into an R session to install the basic Bioconductor packages:
source("http://bioconductor.org/biocLite.R")
biocLite()
If you already have installed Bioconductor, running these two commands will check whether updates for installed packages are available.
After you have installed the basic Bioconductor packages you can install IsoformSwitchAnalyzeR by copy-pasting the following into an R session:
source("http://bioconductor.org/biocLite.R")
biocLite("IsoformSwitchAnalyzeR")
This will install the IsoformSwitchAnalyzeR package as well as other R packages that are needed for IsoformSwitchAnalyzeR to work.
If you need to install from the developmental branch of Bioconductor you need to specify that - note that this is for advanced uses and should not be done unless you have good reason to:
source("http://bioconductor.org/biocLite.R")
useDevel(devel = TRUE)
biocLite("IsoformSwitchAnalyzeR")
This R package comes with plenty of documentation. Much information can be found in the R help files (which can easily be accessed by running the following command in R “?functionName”, for example “?isoformSwitchTest”). Furthermore, this vignette contains a lot of information and will be continously updated, so make sure to read both sources carefully as it contains the answers to the most Frequently Asked Questions, Problems and Errors.
If you want to report a bug/error (found in the newest version of the R package!) please make an issue with a reproducible example at github — remember to add the appropriate label.
If you have unanswered questions or comments regarding IsoformSwitchAnalyzeR, please post them on the associated google group (plase make sure the question was not already answered there).
If you have suggestions for improvements, please put them on github. This will allow other people to upvote your idea, thereby showing us there is wide support of implementing your idea.
Back to Table of Content.
The analysis performed by IsoformSwitchAnalyzeR is only possible due to a string of other tools and scientific discoveries — please read this section thoroughly and cite the appropriate articles to give credit where credit is due.
If you are using the
Refrences:
The idea behind IsoformSwitchAnalyzeR is to make it easy to do advanced post-analysis of full-length RNA-seq derived transcripts with a focus on finding, annotating and visualizing isoform switches with functional consequences. In the switch analysis workflow (see Analyzing Alternative Splicing for the alternative splicing workflow) IsoformSwitchAnalyzeR performs three high-level tasks:
A normal workflow for identification and analysis of isoform switches with functional consequences can be divided into two parts (also illustrated below in Figure 1).
Part 1) Extract Isoform Switches and Their Sequences. This part includes importing the data into R, identifying isoform switches, annotating those switches with open reading frames (ORF) and extracting the nucleotide and amino acid (peptide) sequences. The latter step enables the usage of external sequence analysis tools such as
All of the above steps are performed by the high-level function:
isoformSwitchAnalysisPart1()
See below for example of usage, and Detailed Workflow for details on the individual steps.
Part 2) Plot All Isoform Switches and Their annotation. This part involves importing and incorporating the results of the external sequence analysis, identifying intron retention, predicting functional consequences and plotting i) all genes with isoform switches and ii) summaries of general consequences of switching.
All of this can be done using the function:
isoformSwitchAnalysisPart2()
See below for usage example, and Detailed Workflow for details on the individual steps.
Alternatively if one does not plan to incorporate external sequence analysis, it is possible to run the full workflow using:
isoformSwitchAnalysisCombined()
This corresponds to running isoformSwitchAnalysisPart1() and isoformSwitchAnalysisPart2() without adding the external results.
As indicted above, a full, but less customizable, analysis of isoform switches can be done using the two high-level functions isoformSwitchAnalysisPart1() and isoformSwitchAnalysisPart2(). This section aims to show how these functions are used as well as illustrate what IsoformSwitchAnalyzeR can be used for.
Lets start by loading the R package
library(IsoformSwitchAnalyzeR)
Note that newer versions of RStudio support auto-completion of package names.
The first step is to import all data needed for the analysis and store them in the switchAnalyzeRlist object. IsoformSwitchAnalyzeR contains different functions for importing data from tools such as Salmon/Kallisto/RSEM/Cufflinks, but can be used with all isoform-level expression data using implemented general-purpose functions . For the purpose of illustrating data import, let’s use Cufflinks/Cuffdiff data as (See Importing Data Into R) for details on importing data from other tools).
cuffDB <- prepareCuffExample()
This function is just a wrapper for readCufflinks() which makes the sql database from the example Cufflinks/Cuffdiff results data included in the R package “cummeRbund” so when analyzing you own data you should just use the readCufflinks() function instead.
Once you have a CuffSet (the object type generated by readCufflinks() and prepareCuffExample()), a switchAnalyzeRlist is then created by:
aSwitchList <- importCufflinksCummeRbund(cuffDB)
Unfortunately, this example dataset is not ideal for illustrating the usability of IsoformSwitchAnalyzeR, as it only has two replicates, and the isoform switch tests rely on replicates. To illustrate the workflow, let’s instead use some of the test data from the IsoformSwitchAnalyzeR package:
data("exampleSwitchList")
exampleSwitchList
#> This switchAnalyzeRlist list contains:
#> 259 isoforms from 84 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
The data stored in this example switch list correspond to a small subset of the result of using as illustrated above. Note that although a switch identification analysis was performed (by CuffDiff), no genes with differential usage were found. This small data subset is ideal as a showcase due to the short runtimes.
Before we can run the analysis it is nessesary to know that IsoformSwitchAnalyzeR measures isoform usageas via isoform fraction (IF) values which quantifies the fration of the parent gene expression originating from a specific isoform (calculated as
We can now run the first part of the isoform switch analysis workflow which filters for non-expressed genes/isoforms, identifies isoform switches, annotates open reading frames (ORF), switches with and extracts both the nucleotide and peptide (amino acid) sequences.
### isoformSwitchAnalysisPart1 needs the genomic sequence to predict ORFs
# Genome sequences are available from Biocindoctor as BSgenome objects:
# http://bioconductor.org/packages/release/BiocViews.html#___BSgenome
# Here we use the gg19 reference genome - which can be downloaded by
# copy/pasting the following two lines into the R terminal:
# source("https://bioconductor.org/biocLite.R")
# biocLite("BSgenome.Hsapiens.UCSC.hg19")
library(BSgenome.Hsapiens.UCSC.hg19)
exampleSwitchList <- isoformSwitchAnalysisPart1(
input=exampleSwitchList,
genomeObject = Hsapiens, # the reference to the human BS genome
dIFcutoff = 0.4, # Cutoff for finding switches - set high for short runtime in example data
outputSequences = FALSE # prevents outputting of the fasta files used for external sequence analysis
)
Note that:
It is possible to supply the CuffSet (the object which links to the cufflinks/cuffdiff SQL database, created with readCufflinks{cummeRbund}) directly to the input argument instead of creating the switchAnalyzeRlist first.
The isoformSwitchAnalysisPart1() function has an argument, overwritePvalues, which overwrites the result of user-supplied p-values (such as those imported by cufflinks) with the result of running isoformSwitchTestDRIMSeq().
The switchAnalyzeRlist produced by isoformSwitchAnalysisPart1() has been reduced to only contain genes where an isoform switch (as defined by the alpha and dIFcutoff arguments) was identified. This enables much faster runtimes for the rest of the pipeline, as only isoforms from a gene with a switch are analyzed.
The number of switching features is easily summarized via:
extractSwitchSummary(exampleSwitchList, dIFcutoff = 0.4) # supply the same cutoff to the summary function
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 12 7
The second part of the isoform switch analysis workflow, which includes importing and incorporating external sequence annotation, analysis of alternative splicing, predicting functional consequences and visualizing both the general effects of isoform switches and the individual isoform switches. The combined analysis can be done by:
exampleSwitchList <- isoformSwitchAnalysisPart2(
switchAnalyzeRlist = exampleSwitchList,
dIFcutoff = 0.4, # Cutoff for finding switches - set high for short runtime in example data
n = 10, # if plotting was enabled, it would only output the top 10 switches
removeNoncodinORFs = TRUE, # Because ORF was predicted de novo
pathToCPATresultFile = system.file("extdata/cpat_results.txt" , package = "IsoformSwitchAnalyzeR"),
pathToPFAMresultFile = system.file("extdata/pfam_results.txt" , package = "IsoformSwitchAnalyzeR"),
pathToSignalPresultFile = system.file("extdata/signalP_results.txt", package = "IsoformSwitchAnalyzeR"),
codingCutoff = 0.725, # the coding potential cutoff we suggested for human
outputPlots = FALSE # keeps the function from outputting the plots from this example
)
The exampleSwitchList now contains all the information needed to analyze isoform switches and alterantive splicing both for individual genes as well as genome-wide summary statistics or analysis.
The number of isoform switches with functional consequences are easily extracted by setting “filterForConsequences = TRUE”:
extractSwitchSummary(exampleSwitchList, filterForConsequences = TRUE, dIFcutoff = 0.4) # supply the same cutoff to the summary function
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 10 5
For each of these top genes, a switch plot will be generated if “outputPlots=TRUE” were used. Let’s take a closer look at the top candidates:
The top genes with isoform switches are:
extractTopSwitches(exampleSwitchList, filterForConsequences = TRUE, n=3)
#> gene_ref gene_id gene_name condition_1 condition_2
#> 1 geneComp_00000098 XLOC_000088:KIF1B KIF1B hESC Fibroblasts
#> 2 geneComp_00000389 XLOC_001345 ARHGEF19 hESC Fibroblasts
#> 3 geneComp_00000205 XLOC_000177 LDLRAD2 hESC Fibroblasts
#> gene_switch_q_value switchConsequencesGene Rank
#> 1 2.507552e-263 TRUE 1
#> 2 1.413555e-80 TRUE 2
#> 3 9.231366e-45 TRUE 3
Let’s take a look at the isoform switch in the KIF1B gene via the switch plot produced by IsoformSwitchAnalyzeR:
switchPlot(exampleSwitchList, gene='KIF1B')
From this plot, we first note that the gene expression is not significantly changed (bottom left). Next, we see the large significant switch in isoform usage across conditions (bottom right). By comparing which isoforms are chaning (bottom right) to the isoform structure (top) it can be seen that in hESC it is primarly the long isoforms that are used, but as the cells are differentiated to fibroblasts, there is a change towards mainly using the short isoform. Interestingly, this isoform switch results in the loss of several protein domains including the PH domain which plays a role in anchoring proteins to cell membranes. This could lead to the hypothesis that KIF1B would lose its ability to bind vesicles — , which would have to be confirmed experimentally.
Note that if you want to save this plot as a pdf file via the pdf command, you need to specify “onefile = FALSE”. The folloing colde chunk that will produce a nicely-sized figure:
pdf(file = '<outoutDirAndFileName>.pdf', onefile = FALSE, height=5, width = 8)
switchPlot(exampleSwitchList, gene='KIF1B')
dev.off()
Furthermore, note that if you are analyzing multiple conditions, you will also need to specify the ‘condition1’ and ‘condition2’ arguments in the switchPlot() function to indicate specifically which comparison you want to plot.
To illustrate the genome-wide analysis of consequences of isoform switching in the different comparisons, we will use a larger dataset which is a subset of two of the TCGA Cancer types analyzed in Vitting-Seerup et al 2017.
data("exampleSwitchListAnalyzed")
The first step is to take a look at the number and overlap of isoform switches. The numbers can be extracted via the extractSwitchSummary() function and by setting “filterForConsequences=TRUE” we only extract the features involved in a switch which is predicted to have a functional consequence:
extractSwitchSummary(
exampleSwitchListAnalyzed,
filterForConsequences=TRUE
)
#> Comparison nrIsoforms nrGenes
#> 1 COAD_ctrl vs COAD_cancer 690 368
#> 2 LUAD_ctrl vs LUAD_cancer 422 218
#> 3 combined 1008 529
The number, and overlap, between condtions can be viusalized with the extractSwitchOverlap() function:
extractSwitchOverlap(
exampleSwitchListAnalyzed,
filterForConsequences=TRUE
)
We can look more into the details of the consequences by considering each type consequence seperately as follows:
extractConsequenceSummary(
exampleSwitchListAnalyzed,
consequencesToAnalyze='all',
plotGenes = FALSE, # enables analysis of genes (instead of isoforms)
asFractionTotal = FALSE # enables analysis of fraction of significant features
)
Note the “consequencesToAnalyze” argument enables analysis of only a subset of features.
From this summary plot many conclusions are possible. First of all, the most frequent changes are changes affecting protein domains and ORFs. Secondly,intron retention is more commonin LUAD than in COAD. Lastly, when considering oppositeconsequence (e.g. the gain vs loss of protein domains) its quite easy to see they are unevenly distributed (e.g. more protein domain loss than protein domain gain). This uneven distribution can be systematically analyzed using the build in enrichment analysis:
extractConsequenceEnrichment(
exampleSwitchListAnalyzed,
consequencesToAnalyze='all',
analysisOppositeConsequence = TRUE
)
For each pair of oppositeconsequences (e.g. protein domain loss vs grain) (y-axis) this plot shows the fraction of switches, affected by either of the opposing consequence, that results in the consequnce indicated (e.g. protein domain loss) (x-axis). If this fraction is significantly different from 0.5 it indicates there is a systematic biases in which consequence is detected.
From the analysis above, it is therefore quite clear, that many of the oposing consequences are significantly unevenly distributed. In other words, many types of consequences seems to be used in a group-specific manner.
When comparing the two cancer types (right vs left plot) the overall pattern seems similar - but there might be differences. This can formally be analyzed with the extractConsequenceEnrichmentComparison() function. This function with for each oposing set of consequence (e.g. protein domain loss vs grain), in a pairwise manner contrasts the individual comparisons to assess whether the ratio of loss/gains are indeed significantly different:
extractConsequenceEnrichmentComparison(
exampleSwitchListAnalyzed,
consequencesToAnalyze=c('domains_identified','intron_retention','coding_potential'),
analysisOppositeConsequence = T
)
The analysis shows that compared to LUAD, COAD have a significantly higher fraction of switches resulting in protein domain loss, but there is no difference in terms of intron retention loss or switches from noncoding to to coding transcripts.
Such a results leads to the question what cellular mechanism underlies behind these changes. Due to the detailed alternative splicing analysis performed we can analyze this using the built-in summary function:
extractSplicingEnrichment(exampleSwitchListAnalyzed)
Which is equivivalent to the consequnce enrichment analysis described above. From this analysis, we see that although the patterns of consequences looked somewhat slimiar (as illustrated above), the underlying consequences are quite different. COAD utilizes alternative 3’ acceptor sites (A3), multiple exon skipping (MES) and alternative transcription start sites (ATSS) while LUAD utilize intron retention (IR) and alternative transcription termination sites (ATTS) more. For more details (and functionality) regarding splicing analysis see the Genome-Wide Analysis of Alternative Splicing section.
Since all the data about isoform and gene expression is saved in the switchAnalyzeRlist, we can also make a set of overview plots:
data("exampleSwitchListAnalyzed")
### Vulcano like plot:
ggplot(data=exampleSwitchListAnalyzed$isoformFeatures, aes(x=dIF, y=-log10(isoform_switch_q_value))) +
geom_point(
aes( color=abs(dIF) > 0.1 & isoform_switch_q_value < 0.05 ), # default cutoff
size=1
) +
geom_hline(yintercept = -log10(0.05), linetype='dashed') + # default cutoff
geom_vline(xintercept = c(-0.1, 0.1), linetype='dashed') + # default cutoff
facet_wrap( ~ condition_2) +
#facet_grid(condition_1 ~ condition_2) + # alternative to facet_wrap if you have overlapping conditions
scale_color_manual('Signficant\nIsoform Switch', values = c('black','red')) +
labs(x='dIF', y='-Log10 ( Isoform Switch Q Value )') +
theme_bw()
As there are many dIF values (effect size) very close to zero, which have a significant isoform switch (black dots above dashed hoizontal line) this nicely illustrates why a cutoffs both on the dIF and the q-value are necessary.
Another interesting overview plot can be made as follows:
### Switch vs Gene changes:
ggplot(data=exampleSwitchListAnalyzed$isoformFeatures, aes(x=gene_log2_fold_change, y=dIF)) +
geom_point(
aes( color=abs(dIF) > 0.1 & isoform_switch_q_value < 0.05 ), # default cutoff
size=1
) +
facet_wrap(~ condition_2) +
#facet_grid(condition_1 ~ condition_2) + # alternative to facet_wrap if you have overlapping conditions
geom_hline(yintercept = 0, linetype='dashed') +
geom_vline(xintercept = 0, linetype='dashed') +
scale_color_manual('Signficant\nIsoform Switch', values = c('black','red')) +
labs(x='Gene log2 fold change', y='dIF') +
theme_bw()
Here, it is clear that changes in gene expression and isoform switches are not necessarily mutually exclusive, as there are many genes which are both differentially expressed (large gene log2FC) and contain isoform switches (color). This also highlights the importance of also analyzing RNA-seq data at isoform-level resolution.
Back to Table of Content.
Before you start on this section, we recommend you to read though the Quick Start section, as it gives the large overview, introduces some basic concepts and gives a couple of tips not repeated here.
Compared to the workflow presented in Quick Start, a full workflow for analyzing isoform switches has many sub-steps which each can be customized/optimized. In this section, we will go more into depth with each of the steps as well as provide tips and shortcuts for working with IsoformSwitchAnalyzeR. Specifically, each of the main functions behind these steps will be described and illustrated. For a more comprehensive and detailed description of each individual function please refer to the individual function documentation build into the R package (easily accessed via ?functionName).
The first section contains IsoformSwitchAnalyzeR Background Information, followed by a detailed isoform switch analysis workflow and lastly an overview of useful Other Tools in IsoformSwitchAnalyzeR is provided.
The detailed workflow consists of the following steps (illustrated in Figure 2) which, just like before, can be divided into two parts:
Part 1) Extract Isoform Switches and Their Sequences.
This corresponds to running the following functions in sequential order (which incidentally is just what the isoformSwitchAnalysisPart1() function does):
### Import data and create SwitchAnalyzeRlist
mySwitchList <- importCufflinksData() # OR
mySwitchList <- importIsoformExpression() # OR
mySwitchList <- importRdata()
### Run analysis
mySwitchList <- preFilter( mySwitchList )
mySwitchList <- isoformSwitchTestDRIMSeq( mySwitchList ) # OR
mySwitchList <- isoformSwitchTest( mySwitchList )
mySwitchList <- analyzeORF( mySwitchList )
mySwitchList <- extractSequence( mySwitchList )
Part 2) Plot All Isoform Switches and Their annotation.
This corresponds to running the following functions in sequential order (just like the isoformSwitchAnalysisPart2() function does):
### Add annotation
mySwitchList <- analyzeCPAT( mySwitchList )
mySwitchList <- analyzePFAM( mySwitchList )
mySwitchList <- analyzeSignalP( mySwitchList )
mySwitchList <- analyzeAlternativeSplicing( mySwitchList )
### Analyse consequences
mySwitchList <- analyzeSwitchConsequences( mySwitchList )
### Visual analysis
# Indiviudal switches
switchPlotTopSwitches( mySwitchList )
# global consequence analysis
extractConsequenceSummary( mySwitchList )
extractConsequenceEnrichment( mySwitchList )
extractConsequenceGenomeWide( mySwitchList )
# global splicing analysis
extractSplicingSummary( mySwitchList )
extractSplicingEnrichment( mySwitchList )
extractSplicingGenomeWide( mySwitchList )
The combined workflow is visualized here:
The switchAnalyzeRlist object is created to specifically contain and summarize all relevant information about the isoforms involved in isoform switches. The switchAnalyzeRlist object is a named list, meaning each entry in the list can be accessed by its name via the ‘$’ symbol or by using “[[‘entryName’]]”. A newly created switchAnalyzeRlist object contains 6 entries, and as the isoforms are gradually annotated and analyzed more entries are added.
data("exampleSwitchList") # A newly created switchAnalyzeRlist + switch analysis
names(exampleSwitchList)
#> [1] "isoformFeatures" "exons" "conditions"
#> [4] "designMatrix" "isoformCountMatrix" "sourceId"
#> [7] "isoformSwitchAnalysis"
data("exampleSwitchListAnalyzed") # A fully analyzed switchAnalyzeRlist
names(exampleSwitchListAnalyzed)
#> [1] "isoformFeatures" "exons"
#> [3] "conditions" "sourceId"
#> [5] "orfAnalysis" "domainAnalysis"
#> [7] "signalPeptideAnalysis" "AlternativeSplicingAnalysis"
#> [9] "switchConsequence"
The first entry ‘isoformFeatures’ is a data.frame where all relevant data about each comparison of an isoform (between conditions), as well as the analysis performed and annotation incooperate via IsoformSwitchAnalyzeR, is stored. Amongst the default information is isoform and gene id, gene and isoform expression as well as the isoform_switch_q_value and isoform_switch_q_value where the result of the differential isoform analysis is stored. The comparisons made can be identified as “from ‘condition_1’ to ‘condition_2’”, meaning ‘condition_1’ is considered the ground state and ‘condition_2’ the changed state. This also means that a positive dIF value indicates that the isoform usage is increased in ‘condition_2’ compared to ‘condition_1’. Since the ‘isoformFeatures’ entry is the most relevant part of the switchAnalyzeRlist object, the most-used standard methods have also been implemented to work directly on isoformFeatures.
### Preview
head(exampleSwitchList, 2)
#> iso_ref gene_ref isoform_id gene_id
#> 19 isoComp_00000007 geneComp_00000005 TCONS_00000007 XLOC_000005
#> 22 isoComp_00000008 geneComp_00000005 TCONS_00000008 XLOC_000005
#> condition_1 condition_2 gene_name nearest_ref_id class_code
#> 19 hESC Fibroblasts <NA> uc009vjk.2 =
#> 22 hESC Fibroblasts <NA> uc001aau.2 =
#> TSS_group_id length locus gene_status gene_overall_mean
#> 19 TSS2 2750 chr1:322036-328580 OK 193.8339
#> 22 TSS3 4369 chr1:322036-328580 OK 171.6605
#> gene_value_1 gene_value_2 gene_stderr_1 gene_stderr_2
#> 19 696.704 48.0566 3.592857 2.307488
#> 22 696.704 48.0566 3.592857 2.307488
#> gene_log2_fold_change gene_p_value gene_q_value gene_significant
#> 19 -3.85774 2.66665e-09 3.20379e-08 yes
#> 22 -3.85774 2.66665e-09 3.20379e-08 yes
#> iso_status iso_overall_mean iso_value_1 iso_value_2 iso_stderr_1
#> 19 OK 372.3803 358.383 29.28480 2.091049
#> 22 OK 372.3803 338.308 5.01291 1.322809
#> iso_stderr_2 iso_log2_fold_change iso_p_value iso_q_value
#> 19 17.489950 -3.61328 4.83698e-05 0.000548629
#> 22 6.999295 -6.07655 2.64331e-03 0.015898000
#> iso_significant IF_overall IF1 IF2 dIF
#> 19 yes 0.5618896 0.5143978 0.6093814 0.09498364
#> 22 yes 0.2949481 0.4855835 0.1043126 -0.38127092
#> isoform_switch_q_value gene_switch_q_value
#> 19 NA 1
#> 22 NA 1
# tail(exampleSwitchList, 2)
### Dimentions
dim(exampleSwitchList$isoformFeatures)
#> [1] 259 38
nrow(exampleSwitchList)
#> [1] 259
ncol(exampleSwitchList)
#> [1] 38
dim(exampleSwitchList)
#> [1] 259 38
A very useful functionality implemented in IsoformSwitchAnalyzeR is the subsetSwitchAnalyzeRlist() function, which allows for removal of isoforms and all their associated information across all entries in a switchAnalyzeRlist. The function subsets the switchAnalyzeRlist based on a vector of logicals matching the isoformFeatures entry of the list.
exampleSwitchList
#> This switchAnalyzeRlist list contains:
#> 259 isoforms from 84 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
### Subset
subsetSwitchAnalyzeRlist(
exampleSwitchList,
exampleSwitchList$isoformFeatures$gene_name == 'ARHGEF19'
)
#> This switchAnalyzeRlist list contains:
#> 3 isoforms from 1 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
Transcript structure information is stored in the exon entry of the switchAnalyzeRlist and contains the genomic coordinates for each exon in each isoform, and a column indicating which isoform it originates from. This information is stored as GenomicRanges (GRanges), which is very useful for overlapping genomic features and interacting with other Bioconductor packages.
head(exampleSwitchList$exons,2)
#> GRanges object with 2 ranges and 2 metadata columns:
#> seqnames ranges strand | isoform_id gene_id
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 322037-322228 + | TCONS_00000007 XLOC_000005
#> [2] chr1 324288-324345 + | TCONS_00000007 XLOC_000005
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
A full description of the initial switchAnalyzeRlist can be found via ?switchAnalyzeRlist and each additional entry added is described in the “value” part of the documentation for the specific function adding the entry.
Back to Table of Content
With a few exceptions, all functions in IsoformSwitchAnalyzeR can be divided into 6 groups sharing the first part of the function name (very usefull with RStudios autocompletion functionality (via the tab botton)):
The exceptions to these categories are the following:
createSwitchAnalyzeRlist()
preFilter()
Where:
Some additional tools that are essential for the workflow are omitted form this list (more about them later).
Note that we have tried to be systematic with function argument names for cutoffs, meaning that if the argument name contains “cutoff” the value supplied is not included, whereas if it contains “min” or “max” the given value is included.
Back to Table of Content
IsoformSwitchAnalyzeR can analyze the output from any tool that quantifies (de-novo/guided deconvoluted) full-length isoforms. All it requires are the following three sets of data:
Furthermore it is highly recommended to also obtain estimated replicate abundances and use them in the construction of the switchAnalyzeRlist as they better allow for effect size estimates.
From these data, a switchAnalyzeRlist (see The switchAnalyzeRlist for description) can be constructed. To facilitate the usage of IsoformSwitchAnalyzeR, several dedicated wrappers for constructing the switchAnalyzeRlist from different data sources are included in IsoformSwitchAnalyzeR. See the following sections for details about the specific import methods:
The data from Cufflinks/Cuffdiff is of special interest because Cufflinks/CuffDiff is amongst the most comprehensive tools for analyzing RNA-seq data with isoform resolution. Furthermore, Cuffdiff also supports identification of isoform switching by: a) having a test of isoform switches amongst isoforms with shared promoters (isoforms from same pre-mRNA), b) giving better estimates of gene and isoform expression uncertainties (variance) than estimated from the raw data itself. These uncertainties can be utilized with one of the isoform switch tests (at isoform level) (see Identifying Isoform Switches).
There are two ways of obtaining a switchAnalyzeRlist from cufflinks.
importCufflinksCummeRbund()
importCufflinksFiles()
Creating a switchAnalyzeRlist via cummeRbund is a two-step process. First use the cummeRbund function readCufflinks() to create the SQL backend (which in the following example is done by prepareCuffExample() on the cummeRbund example data).
cuffDB <- prepareCuffExample()
cuffDB
#> CuffSet instance with:
#> 3 samples
#> 400 genes
#> 1203 isoforms
#> 662 TSS
#> 906 CDS
#> 1062 promoters
#> 1986 splicing
#> 990 relCDS
This SQL backend is also extremely useful since it, via other functions implemented in cummeRbund, allows for a lot of (necessary!) quality control (QC) and exploratory data analysis (EDA). See section 5 in the cummeRbund manual for more information).
Now that we have the SQL connection, simply use importCufflinksCummeRbund() to create the switchAnalyzeRlist.
aSwitchList <- importCufflinksCummeRbund(cuffDB)
#> Reading cuffDB, isoforms...
#> Reading cuffDB, exons...
#> Analyzing cufflinks annotation problem...
#> Fixing cufflinks annotation problem...
#> Cufflinks annotation problem was fixed for 65 Cuff_genes
#> Extracting analysis of alternative splicing...
#> Making IsoformSwitchAanalyzeRlist...
#> Done
aSwitchList
#> This switchAnalyzeRlist list contains:
#> 1203 isoforms from 468 genes
#> 3 comparison from 3 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#> 2 iPS vs Fibroblasts 0 0
#> 3 iPS vs hESC 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
An alternative way of importing data from Cufflinks/Cuffdiff is to directly use the files outputted by Cuffdiff. This can be done with the importCufflinksFiles() function. In the example below, we link to the example files in the cummeRbund package via the system.file() function. To analyze your own data, simply supply the (full) path to the required files as a string (do not use the system.file() function).
### Please note
# The way of importing file in this example with
# "system.file('pathToFile', package="cummeRbund") is
# specialiced to access the example data in the cummeRbund package
# and not something you need to do - just supply the string e.g.
# "/myAnnotation/myGenome/myFile.filetype" to the argument
testSwitchList <- importCufflinksFiles(
pathToGTF = system.file('extdata/chr1_snippet.gtf', package = "cummeRbund"),
pathToGeneDEanalysis = system.file('extdata/gene_exp.diff', package = "cummeRbund"),
pathToIsoformDEanalysis = system.file('extdata/isoform_exp.diff', package = "cummeRbund"),
pathToGeneFPKMtracking = system.file('extdata/genes.fpkm_tracking', package = "cummeRbund"),
pathToIsoformFPKMtracking = system.file('extdata/isoforms.fpkm_tracking', package = "cummeRbund"),
pathToIsoformReadGroupTracking = system.file('extdata/isoforms.read_group_tracking', package = "cummeRbund"),
pathToSplicingAnalysis = system.file('extdata/splicing.diff', package = "cummeRbund"),
pathToReadGroups = system.file('extdata/read_groups.info', package = "cummeRbund"),
pathToRunInfo = system.file('extdata/run.info', package = "cummeRbund"),
fixCufflinksAnnotationProblem=TRUE,
quiet=TRUE
)
testSwitchList
#> This switchAnalyzeRlist list contains:
#> 1203 isoforms from 468 genes
#> 3 comparison from 3 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#> 2 iPS vs Fibroblasts 0 0
#> 3 iPS vs hESC 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
Note that both cufflinks import functions, per default:
The resulting switchAnalyzeRlist can then be used with the rest of the isoform switch analysis pipeline. The next step is typically Filtering.
While RSEM is a well-established and robust isoform quantification tool, a new generation of tools often refered to as pseudo/quasi-aligners, such as Salmon and Kallisto, are on the rise due to their speed and increased accuracy. We therefore support analysis of quantifications made with all three tools by allowing for easy import of the isoform replicate count matrix via the importIsoformExpression() function.
This function is a wrapper for the tximport R package with extra functionalities. By using tximport, it allows the isoform counts to be estimated from the abundance estimates — which is recomended as it will incorporate the bias correction performed by the tools into the switch identification. The importIsoformExpression function can furthermore perform an inter-library normalization of the abundance estimates which is necessary for a fair comparison of libraries (for in-depth discussion of normalization and why it is needed, see section 2.7 of the edgeR user guide).
The output of importIsoformExpression is a normal list which contains both a count and an abundance matrix that can then easily be used to generate a switchAnalyzeRlist with the importRdata() function (as described in the next section).
As described above, all you need to create a switchAnalyzeRlist is:
Note that:
Once you have imported the isoform expression estimates (via importIsoformExpression() or otherwise) as well as the isoform structure and the isoform annotation into R a switchAnalyzeRlist can be constructed using the general-purpose wrapper:
importRdata()
All you need to supply to the importRdata() is the estimated replicate isoform count expression — optionally, the replicate isoform abundance estimates — the isoform annotation and a design matrix indicating which samples are from the same condition. The importRdata() function will then:
Note that:
To illustrate ho this is done lets first create som example data
### Construct data needed from example data in cummeRbund package
### (The recomended way of analyzing Cufflinks/Cuffdiff data is via importCufflinksCummeRbund()
### This is just an easy way to get some example data ).
cuffDB <- prepareCuffExample()
# Extract count matrix
isoRepCount <- repCountMatrix(isoforms(cuffDB))
isoRepCount$isoform_id <- rownames(isoRepCount)
# Extract design matrix
designMatrix <- cummeRbund::replicates(cuffDB)[,c('rep_name','sample_name')]
colnames(designMatrix) <- c('sampleID','condition')
# Extract isoform structure
localAnnotaion <- import(system.file("extdata/chr1_snippet.gtf", package="cummeRbund"))
localAnnotaion <- localAnnotaion[,c('transcript_id','gene_id')]
colnames(localAnnotaion@elementMetadata)[1] <- 'isoform_id'
### Please note
# 1) The way of importing the GTF file in this example with
# "system.file('pathToFile', package="cummeRbund") is
# specialized to access the example data in the cummeRbund package
# and not somhting you need to do - just supply the string e.g.
# "/myAnnotation/myGenome/gersionQuantified.gtf" to the import function
# 2) importRdata also supports direct import of a GTF file - just supply the
# string to the "isoformExonAnnoation"" argument
### Take a look at the data
head(isoRepCount, 2)
#> iPS_0 iPS_1 hESC_0 hESC_1 Fibroblasts_0
#> TCONS_00000001 0.0000 14.99310 0 0.000000 11.6729
#> TCONS_00000002 16.0613 8.06806 0 0.654207 0.0000
#> Fibroblasts_1 isoform_id
#> TCONS_00000001 12.1679 TCONS_00000001
#> TCONS_00000002 0.0000 TCONS_00000002
designMatrix
#> sampleID condition
#> 1 iPS_0 iPS
#> 2 iPS_1 iPS
#> 3 hESC_0 hESC
#> 4 hESC_1 hESC
#> 5 Fibroblasts_0 Fibroblasts
#> 6 Fibroblasts_1 Fibroblasts
head(localAnnotaion, 3)
#> GRanges object with 3 ranges and 2 metadata columns:
#> seqnames ranges strand | isoform_id gene_id
#> <Rle> <IRanges> <Rle> | <character> <character>
#> [1] chr1 11874-12227 + | TCONS_00000003 XLOC_000001
#> [2] chr1 12646-12697 + | TCONS_00000003 XLOC_000001
#> [3] chr1 13221-14409 + | TCONS_00000003 XLOC_000001
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
From this we can create the switchAnalyzeRlist as follows:
### Create switchAnalyzeRlist
aSwitchList <- importRdata(
isoformCountMatrix=isoRepCount,
designMatrix=designMatrix,
isoformExonAnnoation=localAnnotaion, # alternatively supply a string that points directly to the GTF file
showProgress=FALSE
)
#> Step 1 of 5: Obtaining annotation...
#> Step 2 of 5: Calculating gene expression...
#> Step 3 of 5: Merging gene and isoform expression...
#> Step 4 of 5: Making comparisons...
#> Step 5 of 5: Making switchAnalyzeRlist object...
#> Done
aSwitchList
#> This switchAnalyzeRlist list contains:
#> 1203 isoforms from 400 genes
#> 3 comparison from 3 conditions
This switchAnalyzeRlist can then be used with the rest of the pipeline. The next step in a typical workflow Filtering.
Alternatives to using importRdata() are either importGTF() or createSwitchAnalyzeRlist().
If the importGTF() function is used it will generate a switchAnalyzeRlist containing all transcripts in the GTF file:
aSwitchList <- importGTF(pathToGTF = system.file("extdata/chr1_snippet.gtf", package = "cummeRbund"))
aSwitchList
#> This switchAnalyzeRlist list contains:
#> 1203 isoforms from 400 genes
#> 1 comparison from 2 conditions
#>
#> Feature analyzed:
#> [1] "ORFs"
head(aSwitchList,2)
#> iso_ref gene_ref isoform_id gene_id
#> 3 isoComp_00000001 geneComp_00000001 TCONS_00000001 XLOC_000001
#> 2 isoComp_00000002 geneComp_00000001 TCONS_00000002 XLOC_000001
#> condition_1 condition_2 gene_name class_code gene_overall_mean
#> 3 plaseholder1 plaseholder2 <NA> NA 0
#> 2 plaseholder1 plaseholder2 <NA> NA 0
#> gene_value_1 gene_value_2 gene_stderr_1 gene_stderr_2
#> 3 0 0 NA NA
#> 2 0 0 NA NA
#> gene_log2_fold_change gene_p_value gene_q_value iso_overall_mean
#> 3 0 1 1 0
#> 2 0 1 1 0
#> iso_value_1 iso_value_2 iso_stderr_1 iso_stderr_2 iso_log2_fold_change
#> 3 0 0 NA NA 0
#> 2 0 0 NA NA 0
#> iso_p_value iso_q_value IF_overall IF1 IF2 dIF isoform_switch_q_value
#> 3 1 1 NA NA NA NA NA
#> 2 1 1 NA NA NA NA NA
#> gene_switch_q_value PTC
#> 3 NA NA
#> 2 NA NA
head(aSwitchList$conditions,2)
#> condition nrReplicates
#> 1 plaseholder1 1
#> 2 plaseholder2 1
From above, it is observed that dummy variables have been inserted in both isoformFeatures and conditions entries of the switchAnalyzeRlist. This approach is well suited if you just want to annotate a transcriptome and are not interested in expression. If you are interested in expression estimates and isoform switches, it is probably easier to use importRdata.
Back to Table of Content
Once you have a switchAnalyzeRlist, there is a good chance that it contains a lot of genes/isoforms that are irrelevant for an analysis of isoform switches. Examples of such could be single isoform genes or non-expressed isoforms. These extra genes/isoforms will make the downstream analysis take (much) longer than necessary. Therefore we have implemented a pre-filtering step to remove these features before continuing with the analysus. Importantly, filtering can enhance the reliability of the downstream analysis as described in detail below.
By using preFilter() it is possible to remove genes and isoforms from all aspects of the switchAnalyzeRlist by filtering on:
Removal of single isoform genes is the default setting in preFilter() since these genes, per definition, cannot have changes in isoform usage.
Filtering on isoform expression allows removal of non-used isoforms that only appear in the switchAnalyzeRlist because they were in the isoform/gene annotation used. Furthermore, the expression filtering allows removal of lowly expressed isoforms where the expression levels might be untrustworthy. The filtering on gene expression allows for removal of lowly expressed genes which causes the calculations of the Isoform Fractions (IF) to become untrustworthy.
The filter on Isoform Fraction allows removal of isoforms that only contribute minimally to the gene expression thereby speeding up and simplifying the rest of the downstream analysis without losing important isoforms.
Filtering on unwanted isoform classes is only an option available if using data form Cufflinks/Cuffdiff since the Tuxedo workflow includes classification of transcript types. This allows for removal of, for example, transcripts classified as “Possible polymerase run-on fragment” or “Repeat”. See full description of Cufflinks/Cuffdiff transcript classification here
The preFilter() function is used as follows:
data("exampleSwitchList")
exampleSwitchList
#> This switchAnalyzeRlist list contains:
#> 259 isoforms from 84 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 0 0
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification"
exampleSwitchListFiltered <- preFilter(exampleSwitchList, geneExpressionCutoff = 1, isoformExpressionCutoff = 0, removeSingleIsoformGenes = TRUE)
#> The fitering removed 52 ( 20.08% of ) transcripts. There is now 207 isoforms left
exampleSwitchListFilteredStrict <- preFilter(exampleSwitchList, geneExpressionCutoff = 10, isoformExpressionCutoff = 3, removeSingleIsoformGenes = TRUE)
#> The fitering removed 87 ( 33.59% of ) transcripts. There is now 172 isoforms left
Back to Table of Content
As identification of isoform switches are essential for IsoformSwitchAnalyzeR, multiple ways of identifying isoform switches are supported. Currently IsoformSwitchAnalyzeR directly supports three different approaches:
However, results of other tools which can test for differential isoform usage at gene and/or isoform level can also be used. See below in Testing Isoform Switches with other Tools.
No matter which method is used, all the downstream functionality in IsoformSwitchAnalyzeR (e.g. not only the test) uses two parameters to define a significant isoform switch:
This combination is used since a Q-value is only a measure of the statistical certainty of the difference between two groups and thereby does not reflect the effect size which is measured by the dIF values.
IsoformSwitchAnalyzeR supports tests for differential isoform usage performed both at isoform and gene resolution. Note, however, that for gene-level analysis you rely on cutoffs on dIF values for identifying the isoforms involved in a switch — something that, when compared to the isoform-level analysis, could give false positives. We therefore recommend the use of isoform-level analysis whenever possible.
A recent breakthrough in the analysis of isoform usage was when the DRIMSeq package (By Nowicka et al, see What To Cite — please remember to cite the tool) was updated to not only analyze genes for differential isoform usage but provide a statistical test for each isoform analyzed thereby allowing for identification of the exact isoforms involved in a switch.
IsoformSwitchAnalyzeR supports identification of isoform switches with DRIMSeq via the wrapper isoformSwitchTestDRIMSeq() which performs the full analysis of all isoforms and comparisons in switchAnalyzeRlist. Afterwards, the results are integrated back into the switchAnalyzeRlist which is returned to the user as usual. Note that we support both the use of the gene-level and isoform-level analysis performed by DRIMSeq as controlled by the ‘testIntegration’ argument. Using this argument, it is also possible to be even more stringent by requiring that both the gene- and isoform-level analysis must be significant.
The isoformSwitchTestDRIMSeq() function is used as follows:
# Load example data and pre-filter
data("exampleSwitchList")
exampleSwitchList <- preFilter(exampleSwitchList) # preFilter for fast runtime
#> The fitering removed 52 ( 20.08% of ) transcripts. There is now 207 isoforms left
# Perform test (takes ~10 sec)
exampleSwitchListAnalyzed <- isoformSwitchTestDRIMSeq(
switchAnalyzeRlist = exampleSwitchList,
testIntegration='isoform_only',
reduceToSwitchingGenes=TRUE
)
#> Step 1 of 6: Creating DM data object...
#> Step 2 of 6: Filtering DM data object...
#> Step 3 of 6: Estimating precision paramter (this may take a while)...
#> Step 4 of 6: Fitting linear models (this may take a while)...
#> Step 5 of 6: Testing pairwise comparison(s)...
#> Step 6 of 6: Preparing output...
#> Result added switchAnalyzeRlist
#> An isoform switch analysis was performed for 54 gene comparisons (87.1%).
#> Done
# Summarize switching features
extractSwitchSummary(exampleSwitchListAnalyzed)
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 85 41
An important argument in isoformSwitchTestDRIMSeq() is the ‘reduceToSwitchingGenes’. When TRUE this arguments will casue the function to to reduce/subset the switchAnalyzeRlist to the genes which each contains at least one differential used isoform, as indicated by the alpha and dIFcutoff cutoffs. This option ensures the rest of the workflow runs significantly faster since isoforms from genes without isoform switching are not analyzed.
Please note:
The DRIMSeq approach utilizes Bayesian information sharing across genes/isoforms. We therfore recommend most users to use the DRIMSeq approach as it is much more sensitive when having a small number of samples. The exception to this recommendation is if a large number of samples (per condition) are analyzed. For such cases we recommend the statistical test implemented in IsoformSwitchAnalyzeR as it fast and the gain of the information sharing in DRIMSeq decreases as the number of samples increases.
IsoformSwitchAnalyzeR contains an implementation of a test for differential isoform usage (in the function isoformSwitchTest()) described in Vitting-Seerup et al (2017) (see What To Cite). Here, we will introduce two sections describing the test for differential isoform usage implemented in isoformSwitchTest(). First, the idea behind the test will be introduced and afterwards we will show how to use it (jump to Usage of The Statistical Test).
One of the better ways of describing isoform usage is as Isoform Fractions (IF). These are simply values describing how large a fraction of the expression of a given gene originate from a given isoform. So the Isoform Fraction of isoform x (\(IF_x\)) is given by: \(IF_x = i_x / g\) where \(i_x\) is the expression of isoform x and g is the parent gene expression. Note that the parent gene expression is the sum of the expression of all the associated isoforms. The difference between two conditions, condition 1 and 2, is then given by \(dIF= IF_2 - IF_1\).
The underlying idea for the statistical test implemented in IsoformSwitchAnalyzeR is:
More information can be found in the documentation page of isoformSwitchTest().
The test described above in The Idea Behind The Statistical Test section is implemented in IsoformSwitchAnalyzeR as the function:
isoformSwitchTest()
### Show arguments of function
args(isoformSwitchTest)
#> function (switchAnalyzeRlist, alpha = 0.05, dIFcutoff = 0.1,
#> reduceToSwitchingGenes = TRUE, calibratePvalues = TRUE, showProgress = FALSE,
#> quiet = FALSE)
#> NULL
If a single isoform from a given gene is termed significant, we define this gene as having an isoform switch, since if there is a change in the isoform usage of one isoform there must be a change in another (or multiple) isoforms compensating for the identified change. To extrapolate this to gene level, we define the q-value at the gene level (in the ‘gene_switch_q_value’ column) as equivalent to the smallest q-value of the associated isoforms (in the ’isoform_switch_q_value column).
An important argument in isoformSwitchTest() is the ‘reduceToSwitchingGenes’. When TRUE this arguments will casue the function to to reduce/subset the switchAnalyzeRlist to the genes which each contains at least one differential used isoform, as indicated by the alpha and dIFcutoff cutoffs. This option ensures the rest of the workflow runs significantly faster since isoforms from genes without isoform switching are not analyzed.
Note that if a switchAnalyzeRlist already contains the result of a switch test this will be completely overwritten by isoformSwitchTest().
# Load example data and prefilter
data("exampleSwitchList")
exampleSwitchList <- preFilter(exampleSwitchList) # preFilter for fast runtime
#> The fitering removed 52 ( 20.08% of ) transcripts. There is now 207 isoforms left
# Perfom test
exampleSwitchListAnalyzed <- isoformSwitchTest(exampleSwitchList)
#> Step 1 of 3: Filtering for eligible data...
#> Found 75 ( 36.23 %) of isoform comparisons eligable for switch analysis
#> Step 2 of 3: Testing isoform usage...
#> Step 3 of 3: Preparing output...
# Summarize swiching geatures
extractSwitchSummary(exampleSwitchListAnalyzed)
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 1 1
As suggested by the Ferguson et al (2014) the p-value calibration is not always performed (even though calibratePvalues=TRUE, see details in ?isoformSwitchTest). It is therefore useful to be able to check whether the p-value calibration was performed. To this end extractCalibrationStatus() have been implemented.
extractCalibrationStatus(exampleSwitchListAnalyzed)
#> Comparison pvalsAreCalibrated
#> 1 hESC vs Fibroblasts TRUE
Per version 1.1.06 isoformSwitchTest() will only perform the callibarion if all conditions meet the criteria but for older versions one should be careful with comparing the two conditions since they have been treated differently (one callibrated, another not callibrated). To fix this problem we can simply turn the calibration off:
# Perfom test
exampleSwitchListAnalyzed <- isoformSwitchTest(exampleSwitchList, calibratePvalues = FALSE)
#> Step 1 of 3: Filtering for eligible data...
#> Found 75 ( 36.23 %) of isoform comparisons eligable for switch analysis
#> Step 2 of 3: Testing isoform usage...
#> Step 3 of 3: Preparing output...
# Summarize swiching geatures
extractSwitchSummary(exampleSwitchListAnalyzed)
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 1 1
# check callibration status
extractCalibrationStatus(exampleSwitchListAnalyzed)
#> Comparison pvalsAreCalibrated
#> 1 hESC vs Fibroblasts FALSE
Back to Table of Content
IsoformSwichtAnalyzeR also supports the analysis of isoform switches found via other tools. All you need to do is to generate a switchAnalyzeRlist with the corresponding data via one of the import options (see Importing Data Into R) and then fill in either the isoform_switch_q_value or gene_switch_q_value columns in the isoformFeatures entry of the switchAnalyzeRlist with the multiple testing corrected p-values (q-values) from the testing tool.
If the external tool used has isoform resolution (one test per isoform), the q-values should be added to the isoform_switch_q_value column and the smallest q-value of a given gene (in a specific comparison) should be added to all the isoforms for that gene in the gene_switch_q_value column. If the external tool has lower resolution (e.g. pre-mRNA or gene level resolution) the q-values should only be added to the gene_switch_q_value column (and the isoform_switch_q_value should be set to NA).
In this way, IsoformSwitchAnalyzeR can also support non-isoform resolution testing — note however that:
Back to Table of Content
Once the isoform switches have been found, the next step is to annotate the isoforms involved in the isoform switches. The first step in such annotation is to obtain Open Reading Frames (ORFs).
If known annotated isoforms were quantified (i.e. you did not perform a (guided) de novo isoform reconstruction), we advide that you use the annotated CDS (Coding Sequence) form the annotation — which can be imported and integrated though one of the implemented import methods, see Importing Data Into R.
If you performed (guided) de-novo isoform reconstruction (isoform deconvolution), prediction of ORFs can be done with high accuracy from the transcript sequence alone (see supplementary data in Vitting-Seerup et al 2017 for benchmark). Such prediction can be done with:
analyzeORF()
This function utilizes the genomic coordinates of each transcript to extract the transcript nucleotide sequence from a reference genome (supplied via the genomeObject argument). In analyzeORF four different methods for predicting the ORF are implemented, suitable for different purposes and circumstances. The four methods are:
One important argument is the minORFlength, which will ensure that only ORFs longer than minORFlength nucleotides are annotated. Besides predicting the ORF, information about the most likely stop codon also allows for prediction of Pre-mature Termination Codon (PTC) and thereby sensitivity to degradation via the Nonsense Mediated Decay (NMD) pathway. This analysis is also implemented in the analyzeORF() function and is controlled by the PTCDistance argument.
Note that the ORF prediction can be integrated with both the CPAT results (via the removeNoncodinORFs paramter, see ?analyzeCPAT) as well as Pfam results (see Augmenting ORF Predictions with Pfam Results)
The analyzeORF() function can be used as follows:
### This example relies on the example data from the 'Usage of The Statistical Test' section above
### analyzeORF needs the genomic sequence to predict ORFs.
# These are readily available from Bioconductor as BSgenome orbjects:
# http://bioconductor.org/packages/release/BiocViews.html#___BSgenome
# Here we use Hg19 --- which can be downloaded by copy/pasting the following two lines into the R termminal:
# source("https://bioconductor.org/biocLite.R")
# biocLite("BSgenome.Hsapiens.UCSC.hg19")
library(BSgenome.Hsapiens.UCSC.hg19)
exampleSwitchListAnalyzed <- analyzeORF(exampleSwitchListAnalyzed, genomeObject = Hsapiens, showProgress=FALSE)
#> Step 1 of 3 : Extracting transcript sequences...
#> Step 2 of 3 : Locating ORFs of interest...
#> Step 3 of 3 : Scanning for PTCs...
#> 3 putative ORFs were identified and analyzed
#> Done
head(exampleSwitchListAnalyzed$orfAnalysis, 3)
#> isoform_id orfTransciptStart orfTransciptEnd orfTransciptLength
#> 1 TCONS_00003919 381 641 261
#> 2 TCONS_00003920 139 660 522
#> 3 TCONS_00003921 139 1335 1197
#> orfStarExon orfEndExon orfStartGenomic orfEndGenomic
#> 1 3 3 1354877 1354617
#> 2 2 3 1356385 1354617
#> 3 2 4 1356385 1354483
#> stopDistanceToLastJunction stopIndex PTC
#> 1 -313 0 FALSE
#> 2 -313 0 FALSE
#> 3 -447 0 FALSE
As seen above, genomic and transcript coordinates for ORF beginning and end, as well as PTC status, are added to the ‘orfAnalysis’ entry of the switchAnalyzeRlist.
If the user wants to use the longestAnnotated or mostUpstreamAnnoated methods, the analyzeORF() function requires known CDS to be supplied as described above. The CDS must be stored as a CDSSeq (see ?CDSSet) and a wrapper for downloading the CDS of the most frequently used datasets from UCSC genome browser is implemented in:
getCDS()
Now that we know the ORF of a transcript, we can obtain the corresponding protein amino acid sequence of the ORF simply by translating the nucleotide sequence of the ORF into amino acids. This opens the possibility for performing both internal and external sequence analysis which enables us to annotate the isoforms involved in isoform switches even further. To facilitate this we have implemented
extractSequence()
which allows for the extraction of both nucleotide and amino acid sequences from the switchAnalyzeRlist. To facilitate external sequence analysis extractSequence can output fasta files (one file per sequence type) and to facilitate internal sequence analysis the sequences can be added to the switchAnalyzeRlist. An example of the internal sequence analysis is a pairwise comparison of the ORFs in two switching isoforms (see Predicting Switch Consequences below).
### This example relies on the example data from the 'Analyzing Open Reading Frames' section above
exampleSwitchListAnalyzed <- extractSequence(
exampleSwitchListAnalyzed,
genomeObject = Hsapiens,
pathToOutput = '<insert_path>',
writeToFile=FALSE # to avoid output when running this example data
)
#> Step 1 of 3 : Extracting transcript nucleotide sequences...
#> Step 2 of 3 : Extracting ORF AA sequences...
#> Step 3 of 3 : Preparing output...
#> Done
head(exampleSwitchListAnalyzed$ntSequence,2)
#> A DNAStringSet instance of length 2
#> width seq names
#> [1] 1456 CTTTTCAGCAGCAGACACTCC...AATAAAGCCCTTCCTTCTAC TCONS_00003919
#> [2] 1475 CTTTTCAGCAGCAGACACTCC...AATAAAGCCCTTCCTTCTAC TCONS_00003920
head(exampleSwitchListAnalyzed$aaSequence,2)
#> A AAStringSet instance of length 2
#> width seq names
#> [1] 87 MAALRCTGLPPEDTCLPSSCW...ERPPYTWLQSEGMGLPWGFC TCONS_00003919
#> [2] 174 MDSQRPEPREEEEEEQELRWM...ERPPYTWLQSEGMGLPWGFC TCONS_00003920
Back to Table of Content
The two fasta files that are the output by extractSequence() (if writeToFile=TRUE) can be used as input, to amongst, others:
These three tools are the ones currently supported, but if you have additional ideas please do not hesitate to contact us as described in How To Get Help.
We suggest that the external sequence analysis tools are run as follows:
Back to Table of Content
After the external sequence analysis with CPAT, Pfam, and SignalP have been performed (please remember to cite as describe in What To Cite), the results can be extracted and incooperated with the switchAnalyzeRlist via (respectively)
analyzeCPAT()
analyzePFAM()
analyzeSignalP()
The functions are used as:
### Load test data (maching the external sequence analysis results)
data("exampleSwitchListIntermediary")
exampleSwitchListIntermediary
#> This switchAnalyzeRlist list contains:
#> 162 isoforms from 40 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 83 40
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification, ORFs, ntSequence, aaSequence"
### Add PFAM analysis
exampleSwitchListAnalyzed <- analyzePFAM(
switchAnalyzeRlist = exampleSwitchListIntermediary,
pathToPFAMresultFile = system.file("extdata/pfam_results.txt", package = "IsoformSwitchAnalyzeR"),
showProgress=FALSE
)
#> Converting AA coordinats to transcript and genomic coordinats...
#> Added domain information to 140 (86.42%) transcripts
### Add SignalP analysis
exampleSwitchListAnalyzed <- analyzeSignalP(
switchAnalyzeRlist = exampleSwitchListAnalyzed,
pathToSignalPresultFile = system.file("extdata/signalP_results.txt", package = "IsoformSwitchAnalyzeR")
)
#> Added signal peptide information to 17 (10.49%) transcripts
### Add CPAT analysis
exampleSwitchListAnalyzed <- analyzeCPAT(
switchAnalyzeRlist = exampleSwitchListAnalyzed,
pathToCPATresultFile = system.file("extdata/cpat_results.txt", package = "IsoformSwitchAnalyzeR"),
codingCutoff = 0.725, # the coding potential cutoff we suggested for human
removeNoncodinORFs = TRUE # because ORF was predicted de novo
)
#> Added coding potential to 162 (100%) transcripts
exampleSwitchListAnalyzed
#> This switchAnalyzeRlist list contains:
#> 162 isoforms from 40 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 hESC vs Fibroblasts 83 40
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification, ORFs, ntSequence, aaSequence, Protein Domains, Signal Peptides, Coding Potential"
Here, the pathTo<tool_name>resultFile points to the result files constructed as suggested in Advice for Running External Sequence Analysis Tools and in the detailed documentation of each of the individual functions. The part of the examples using system.file() is not nesseary when using your own data - just supply the path as a string (e.g. “/myFiles/myCPATresults.txt”)
Of particular interest is the removeNoncodinORFs argument in analyzeCPAT() since it allows integration of the CPAT and ORF analyses by removing ORFs from isoforms not predicted to be coding by CPAT. This can be particularly useful if isoforms and ORFs have been predicted de novo (both guided or non-guided). Note that if enabled (by setting to TRUE), it will effect all downstream analyses and plots as both analysis of protein domains and signal peptides require that ORFs are annotated (e.g. analyzeSwitchConsequences will not consider the domains (potentially) found by Pfam if the ORFs have been removed).
Back to Table of Content
Another type of annotation we easily can obtain, since we know the exon structure of all isoforms in a given gene (with isoform switching), is alternative splicing. Here intron retention events are of particular interest as a consequnce in isoform switches since they repressent the largest changes in isoforms. Identification of alternative splicing, alternative transcription start and termination sites can be obtained via the analyzeAlternativeSplicing().
### This example relies on the example data from the 'Importing External Sequence Analysis' section above
exampleSwitchListAnalyzed <- analyzeAlternativeSplicing(exampleSwitchListAnalyzed, quiet=TRUE)
### overview of number of intron retentions (IR)
table(exampleSwitchListAnalyzed$isoformFeatures$IR)
#>
#> 0 1 2
#> 135 25 2
Meaning 25 isoforms comntain one intron retention (IR) which two isoforms each contain two intron retentions.
IsoformSwitchAnalyzeR also supports many types of genone wide analysis of alternative splicing - for a more thorough walkthrough of the analysis of alternative splicing see the Analyzing Alternative Splicing workflow.
Back to Table of Content
If an isoform has a significant change in its contribution to gene expression, there must per definition be reciprocal changes in one (or more) isoforms in the opposite direction, compensating for the change in the first isoform. We utilize this by extracting the isoforms that are significantly differentially used and compare them to the isoforms that are compensating. Using all the information gathered through the workflow described above, the annotation of the isoform(s) used more (positive dIF) can be compared to the isoform(s) used less (negative dIF) and by systematically identify differences annotation we can identify potential function consequences of the isoform switch.
Specifically, IsoformSwitchAnalyzeR contains a function analyzeSwitchConsequences() which extracts the isoforms with significant changes in their isoform usage (defined by the alpha and dIFcutoff parameters, see Identifying Isoform Switches for details) and the isoform, with a large opposite change in isoform usage (also controlled via the dIFcutoff parameters) that compensate for the changes. Note that if an isoform-level test was not used, the gene is require to be significant (defined by the alpha paramter); but, isoforms are then selected purely based on their changes in dIF values.
These isoforms are then divided into the isoforms that increase their contribution to gene expression (positive dIF values larger than dIFcutoff) and the isoforms that decrease their contribution (negative dIF values smaller than -dIFcutoff). The isoforms with increased contribution are then (in a pairwise manner) compared to the isoform with decreasing contribution. In each of these comparisons the isoforms compared are analyzed for differences in their annotation (controlled by the consequencesToAnalyze parameter). Currently 22 different features of the isoforms can be compared, which include features such as intron retention, coding potential, NMD status, protein domains and the sequence similarity of the amino acid sequence of the annotated ORFs. For a full list, see “Details” on the documentation page for analyzeSwitchConsequences() (by running ?analyzeSwitchConsequences).
A more strict analysis can be performed by enabling the onlySigIsoforms argument, which causes analyzeSwitchConsequences() to only consider significant isoforms (defined by the alpha and dIFcutoff parameters) meaning the compensatory changes in isoform usage are ignored unless they themselves are significant.
The analysis of consequences can be performed as follows:
### This example relies on the example data from the 'Predicting Alternative Splicing' section above
# the consequences highlighted in the text above
consequencesOfInterest <- c('intron_retention','coding_potential','NMD_status','domains_identified','ORF_seq_similarity')
exampleSwitchListAnalyzed <- analyzeSwitchConsequences(
exampleSwitchListAnalyzed,
consequencesToAnalyze = consequencesOfInterest,
dIFcutoff = 0.4, # very high cutoff for fast runtimes
showProgress=FALSE
)
#> Step 1 of 4: Extracting genes with isoform switches...
#> Step 2 of 4: Analyzing 5 pairwise isoforms comparisons...
#> Step 3 of 4: Massaging isoforms comparisons results...
#> Step 4 of 4: Preparing output...
#> Identified genes with containing isoforms switching with functional consequences...
extractSwitchSummary(exampleSwitchListAnalyzed, dIFcutoff = 0.4, filterForConsequences = FALSE)
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 12 7
extractSwitchSummary(exampleSwitchListAnalyzed, dIFcutoff = 0.4, filterForConsequences = TRUE)
#> Comparison nrIsoforms nrGenes
#> 1 hESC vs Fibroblasts 10 5
Meaning that ~5/7 genes with switches (with dIF > 0.4) have isoform switches with functional consequences. For larger dataset with default parameters this is typicall 1/3.
Please note that the analyzeSwitchConsequences() function contains many parameters and cutoffs for deciding when a difference in the annotation is sufficiently different (e.g. supplying ‘ntCutoff=50’ ensures that differences in nucleotide lengths of two ORF are larger than 50 nucleotides — not just 3 nucleotides (one codon)).
Back to Table of Content
Now that we have a genome-wide characterization of isoform switches with potential consequences there are a lot of possibilites for analyzing these. IsoformSwitchAnalyzeR currently directly supports two different types of post-analysis (of switches - see Genome-Wide Analysis of Alternative Splicing below for analysis of alternative splicing):
When analyzing the individual genes with isoform switches, the genes/isoforms with the largest changes in isoform usage (i.e. “most” switching genes/isoforms) are of particular interest. IsoformSwitchAnalyzeR can help you obtain these, either by sorting for the smallest q-values (obtaining the genes with the most significant switches) or the largest absolute dIF values (extracting the genes containing the switches wite the largest effect sizes (which are still significant)). Both methods are implemented at both genes and isoforms levels in the extractTopSwitches() function and the sorting algorithm is controlled via the sortByQvals argument.
### Load already analyzed data
data("exampleSwitchListAnalyzed")
### Let's reduce the switchAnalyzeRlist to only one condition
exampleSwitchListAnalyzedSubset <- subsetSwitchAnalyzeRlist(
exampleSwitchListAnalyzed,
exampleSwitchListAnalyzed$isoformFeatures$condition_1 == 'COAD_ctrl'
)
exampleSwitchListAnalyzedSubset
#> This switchAnalyzeRlist list contains:
#> 1000 isoforms from 369 genes
#> 1 comparison from 2 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 COAD_ctrl vs COAD_cancer 692 369
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification, ORFs, Protein Domains, Signal Peptides, Alternative splicing, Switch Consequences, Coding Potential"
### Extract top switching genes (by q-value)
extractTopSwitches(
exampleSwitchListAnalyzedSubset,
filterForConsequences = TRUE,
n = 2,
sortByQvals = TRUE
)
#> gene_ref gene_id
#> 1 chr1:244998639..245008357,+COAD_ctrl chr1:244998639..245008357,+
#> 2 chr12:569543..671058,+COAD_ctrl chr12:569543..671058,+
#> gene_name condition_1 condition_2 gene_switch_q_value
#> 1 FAM36A COAD_ctrl COAD_cancer 5.344466e-06
#> 2 B4GALNT3 COAD_ctrl COAD_cancer 5.344466e-06
#> switchConsequencesGene Rank
#> 1 TRUE 1
#> 2 TRUE 2
### Extract top 2 switching genes (by dIF values)
extractTopSwitches(
exampleSwitchListAnalyzedSubset,
filterForConsequences = TRUE,
n = 2,
sortByQvals = FALSE
)
#> gene_id gene_name condition_1 condition_2
#> 1 chr10:70980059..71027314,+ HKDC1 COAD_ctrl COAD_cancer
#> 2 chr3:13573824..13679921,+ FBLN2 COAD_ctrl COAD_cancer
#> gene_switch_q_value combinedDIF switchConsequencesGene Rank
#> 1 7.463277e-06 1.216 TRUE 1
#> 2 5.344466e-06 1.167 TRUE 2
Note that by setting “n=NA” extractTopSwitches() will return a data.frame with all significant genes/isoforms - ranked as indicated by the sortByQvals argument.
Let us take a look at the switching isoforms in the ZAK gene (ranked 7 on list) which plays an important role cell cycle:
### Extract data.frame with all switching isoforms
switchingIso <- extractTopSwitches(
exampleSwitchListAnalyzedSubset,
filterForConsequences = TRUE,
n = NA, # n=NA: all features are returned
extractGenes = FALSE, # when FALSE isoforms are returned
sortByQvals = TRUE
)
subset(switchingIso, gene_name == 'ZAK')
#> iso_ref gene_ref isoform_id
#> 9 uc002uhz.2COAD_ctrl chr2:173940565..174132736,+COAD_ctrl uc002uhz.2
#> 237 uc002uhy.2COAD_ctrl chr2:173940565..174132736,+COAD_ctrl uc002uhy.2
#> gene_id gene_name condition_1 condition_2 IF1
#> 9 chr2:173940565..174132736,+ ZAK COAD_ctrl COAD_cancer 0.176
#> 237 chr2:173940565..174132736,+ ZAK COAD_ctrl COAD_cancer 0.181
#> IF2 dIF isoform_switch_q_value switchConsequencesGene
#> 9 0.408 0.232 5.344466e-06 TRUE
#> 237 0.024 -0.157 5.387221e-04 TRUE
#> comparison Rank
#> 9 COAD_ctrl_vs_COAD_cancer 9
#> 237 COAD_ctrl_vs_COAD_cancer 237
From which we can see the details of both isoforms (since we supplied “extractGenes = FALSE”) involved in the isoforms switch in ZAK. The isoform switch can visually analyzed using a switch plot which summarizes all the concatenated information about the isoforms. Specifically the switch plot is a composite plot visualizing:
switchPlot(exampleSwitchListAnalyzedSubset, gene = 'ZAK')
Note that since there is only one comparison in the switchAnalyzeRlist (after the subset), it is not necessary to specify the conditions (which could otherwise be done via the “condition1” and “condition2” arguments).
From this plot, we can easily see that the gene is not differentially expressed (bottom left), but there is an isoform switch where the long isoform is used more in COAD cancers than in normal controls. The switch from the short to the long isoform also results in the inclusion of a SAM_1 protein domain. Since the SAM_1 domain typically facilitates protein-protein interactions this switch might result in a ZAK protein which can interact with a different set of proteins in the cancer patients.
From this plot, we first note that the gene expression is not significantly changed (bottom left). Next, we see the large significant switch in isoform usage across conditions (bottom right). By comparing which isoforms are chaning (bottom right) to the isoform structure (top) it can be seen that the long isoform is used more in COAD cancers compared to healthy thissue. Interestingly, this isoform switch results in the inclusion of a SAM_1 protein domain. Since a SAM_1 domain typically facilitates protein-protein interactions this switch might result in a ZAK protein isoform which compared to the healthy tissue isoform can interact with a new/different set of proteins in the cancer patients.
One advantage of the Isoform Switch Analysis Plot is that it contains all information needed to judge the potential impact of an isoform switch. This also means a systematic screening of the top isoform switches can be made by generating this plot for each of those genes.
To facilitate such screening we have implemented switchPlotTopSwitches() which will extract the top n genes (controlled by the n argument) with significant switches (as defined by alpha and dIFcutoff) and output a pdf or png version of the corresponding switch plot to the directory given by the outputDestination argument. The function automatically ranks (by p-value or switch size) the switches and supports to either filter for isoform switches with predicted functional consequences or to output those with and those without consequences to separate folders.
switchPlotTopSwitches(
switchAnalyzeRlist = exampleSwitchListAnalyzed,
n = 10,
filterForConsequences = FALSE,
splitFunctionalConsequences = TRUE
)
If you want to output a plot for all significant genes simply set “n=NA”.
Back to Table of Content
A genome-wide analysis is both useful for getting an overview of the extent of isoform switching as well as discovering general patterns. IsoformSwitchAnalyzeR supports this by providing four different summaries/analyses for both the analysis of alternative splicing and isoform switches with predicted consequences. All functions provide a visual overview as well as a data.frame with the summary statistics. The four analysis types supported are:
Examples of all four types of analysis have been shown for isoform switches above in Examples of Switch Visualizations and the correspsonding analysis for alternative splicing can be found below in Genome-Wide Analysis of Alternative Splicing.
Back to Table of Content
Apart from the analysis presented above, IsoformSwitchAnalyzeR also contains a few other usefull tools which will be presented in this section.
The central visualization is the switch plot, created with switchPlot() function as shown above. This plot is made from four subplots, which each can be created individualy using:
switchPlotTranscript() # Visualizes the transcripts and their annotation
switchPlotGeneExp() # Visualizes the gene expression
switchPlotIsoExp() # Visualizes the isoform expression
switchPlotIsoUsage() # Visualizes the isoform usage
As illustrated here:
### Load already analyzed data
data("exampleSwitchListAnalyzed")
switchPlotTranscript(exampleSwitchListAnalyzedSubset, gene = 'TNFRSF1B')
switchPlotGeneExp (exampleSwitchListAnalyzedSubset, gene = 'TNFRSF1B', condition1='iPS', condition2='hESC')
switchPlotIsoExp(exampleSwitchListAnalyzedSubset, gene = 'TNFRSF1B', condition1='iPS', condition2='hESC')
switchPlotIsoUsage(exampleSwitchListAnalyzedSubset, gene = 'TNFRSF1B', condition1='iPS', condition2='hESC')
Another useful tool implemented is the isoformToGeneExp() function which takes a data.frame of isoform expression and summarizes it up to gene-level expression values. This function is quite useful in conjunction with the importIsoformExpression() function since the importIsoformExpression() function imports isoform-level data — which can then be summarized to gene level data with isoformToGeneExp(). This is quite analogous to what tximport can do — except it allows for incorporation of the additional features, such as inter-library normalization, implemented in importIsoformExpression().
The last tool currently built into IsoformSwitchAnalyzeR is the extractExpressionMatrix() function. The expression information stored in the switchAnalyzeRlist’s isoformFeatures is ideal for comparing multiple annotations in a specific comparison of two conditions, but is not well-suited for the comparison of multiple conditions. The extractExpressionMatrix() function solves this by converting the average expression information (gene expression, isoform expression or Isoform Fraction values — as controlled via the feature argument) into a matrix format as illustrated here:
data("exampleSwitchListIntermediary")
ifMatrix <- extractExpressionMatrix(exampleSwitchListAnalyzed)
head(ifMatrix)
#> COAD_ctrl LUAD_ctrl COAD_cancer LUAD_cancer
#> uc001aky.2 0.533 0.579 0.414 0.431
#> uc001ciu.2 0.546 0.535 0.354 0.393
#> uc001czi.3 0.665 0.214 0.385 0.418
#> uc001czm.3 0.199 0.697 0.464 0.408
#> uc001dck.2 0.659 0.874 0.380 0.538
#> uc001ebg.3 0.271 0.333 0.393 0.448
dim(ifMatrix)
#> [1] 139 4
Such a matrix can be used for global comparisons of multiple conditions and analysis of sample correlations:
# correlation plot
ggplot(melt(cor(ifMatrix)), aes(x=Var1, y=Var2, fill=value)) +
geom_tile() +
scale_fill_continuous('Correlation') +
labs(x='Condition', y='Condition') +
theme_minimal() +
theme(axis.text.x=element_text(angle=-90, hjust = 0, vjust=0.5)) # turn x-axis 90 degrees
Or expression (via a heatmap):
library(pheatmap)
pheatmap(t(ifMatrix), show_colnames=FALSE)
Back to Table of Content
Analysis of alternative splicing, here made possible by a liftover of functionalities from the now deprecated R package spliceR, is an important area of research. IsoformSwitchAnalyzeR is well suited for performing alternative splicing analysis since alterntive splicing litteraly will result in isoform switches. Please note that alternative splicing here is used as a term which covers both Alternative Splicing (AS), Alternative Transcription Start Sites (ATSS, sometimes called alternative first exon) as well as Alternative Transcription Start Sites (ATTS, sometimes called alternative last exon): although this is techncaiily incorrect it is here used for convenience.
The workflow implemented in IsoformSwitchAnalyzeR is mostly focused on identifying global changes in alternative splicing but also allows for analysis of individual isoforms/switches. A typical workflow consists of five steps. Please note that step 1-4 are included in the standard switch analysis workflow, meaning if you have already analyzed isoform switches with consequences you can skip 1-4. The five steps corresponds to using the following functions in succession:
### Step (1) Import data and create SwitchAnalyzeRlist
mySwitchList <- importCufflinksData() # OR
mySwitchList <- importIsoformExpression() # OR
mySwitchList <- importRdata()
### Step (2) Prefilter
mySwitchList <- preFilter( mySwitchList )
### Step (3) Identify differentially used isoforms
mySwitchList <- isoformSwitchTestDRIMSeq( mySwitchList, reduceToSwitchingGenes=FALSE) # OR
mySwitchList <- isoformSwitchTest( mySwitchList, reduceToSwitchingGenes=FALSE)
### Step (4) Analyze alternative splicing
mySwitchList <- analyzeAlternativeSplicing( mySwitchList )
### Step (5) Global splicing analysis
extractSplicingSummary( mySwitchList )
extractSplicingEnrichment( mySwitchList )
extractSplicingEnrichmentComparison( mySwitchList )
extractSplicingGenomeWide( mySwitchList )
Step 1-4 are shared with the standard switch analysis workflow which is describe above in the Detailed Workflow (specifically each step is described in Importing Data Into R, Filtering, Identifying Isoform Switches and Predicting Alternative Splicing) and will therefore not be repeated here. The only difference is that in some cases one might not want reduce the switchAnalyzeRList to only contain the genes which contain differentially used isoforms — as discussed below. This is done by setting reduceToSwitchingGenes=FALSE in the switch test.
The genome-wide analysis of splicing patterns is facilitated by the following four functions:
The extractSplicingSummary function does exactly what it says on the tin - it summarizes the number of events found in each comparison. To showcase it, let’s look at some already analyzed example data (meaning step 1-4 have already been done).
data("exampleSwitchListAnalyzed")
exampleSwitchListAnalyzed
#> This switchAnalyzeRlist list contains:
#> 1469 isoforms from 530 genes
#> 2 comparison from 4 conditions
#>
#> Switching features:
#> Comparison switchingIsoforms switchingGenes
#> 1 COAD_ctrl vs COAD_cancer 692 369
#> 2 LUAD_ctrl vs LUAD_cancer 422 218
#> 3 combined 1010 530
#>
#> Feature analyzed:
#> [1] "Isoform Swich Identification, ORFs, Protein Domains, Signal Peptides, Alternative splicing, Switch Consequences, Coding Potential"
Note that via the in “Feature analyzed” part of the summary we can see that the analysis of alternative splicing have been done and added to the switchAnalyzeRlist.
extractSplicingSummary(
exampleSwitchListAnalyzed,
asFractionTotal = FALSE,
plotGenes=FALSE
)
Note that it is possible to both summarize per gene or per isoform via the plotGenes argument and that the asFractionTotal argument enables analysis the fraction (if TRUE) or Number (if FALSE) of significant isoforms/genes which have a particular splice type.
From the summary above, it is quite clear that some of the alternative splicing events are not equally used - most prominently illustrated by the use of ATSS in COAD, where there is (a lot) more gain than losses. To formally analyze this type of uneven alternative splicing usage we have implemented extractSplicingEnrichment(). This function summarizes the uneven usage within each comparison by for each alternative splicing type calculate the fraction of events being gains (as opposed to loss) and perform a statistical analysis of this fraction.
splicingEnrichment <- extractSplicingEnrichment(
exampleSwitchListAnalyzed,
splicingToAnalyze='all',
returnResult=TRUE,
returnSummary=TRUE
)
From this plot it is clear to see that some alternative splicing types are preferentially used in the two cancer types, and from the statistical summary also created by extractSplicingEnrichment() (due to the returnSummary=TRUE argument):
subset(splicingEnrichment, splicingEnrichment$AStype == 'ATSS')
#> condition_1 condition_2 AStype nUp nDown propUp propUpCiLo
#> 3 COAD_ctrl COAD_cancer ATSS 234 39 0.8571429 0.8086495
#> 11 LUAD_ctrl LUAD_cancer ATSS 85 54 0.6115108 0.5248958
#> propUpCiHi propUpPval propUpQval Significant
#> 3 0.8952936 7.816835e-32 1.172525e-30 TRUE
#> 11 0.6918656 1.094134e-02 2.735335e-02 TRUE
#> Comparison
#> 3 COAD_ctrl\nvs\nCOAD_cancer
#> 11 LUAD_ctrl\nvs\nLUAD_cancer
We can indeed see that ATSS has a very significant skew in COAD. Note that the “splicingToAnalyze” argument implemented in all these summary functions enables analysis of only a subset of the splice types.
Another observation that can be made from the enrichment analysis above is that although the overall trend in usage of alternative splicing is the same the two cancer types there seem to be some differences. These differences can be analyzed using extractSplicingEnrichmentComparison() function which for each splice type, in a pairwise manner contrast the individual condition comparison, to assess whether the ratio of gains are indeed different acorss comparisons.
extractSplicingEnrichmentComparison(
exampleSwitchListAnalyzed,
splicingToAnalyze=c('A3','MES','ATSS'), # Those significant in COAD in the previous analysis
returnResult=FALSE # Preventing the summary statistics to be returned as a data.frame
)
And indeed only ATSS usage is significantly different between the two cancer types.
All of the above splicing analysis relies on the identification of pairs of isoforms involved in a switch, but the analysis of alternative splicing can also be done from the perspective of the individual splice types. This can be done by asking the question: How does the isoform usage of all isoforms utilizing a particular splicing type change - in other words is all isoforms or only only a subset of isoforms that are affected. This can be analysed with the extractSplicingGenomeWide() function.
extractSplicingGenomeWide(
exampleSwitchListAnalyzed,
featureToExtract = 'all', # all isoforms stored in the switchAnalyzeRlist
splicingToAnalyze = c('A3','MES','ATSS'), # Splice types significantly enriched in COAD
plot=TRUE,
returnResult=FALSE # Preventing the summary statistics to be returned as a data.frame
)
From which it can be seen that all isoforms using ATSS is generally used more, meaning it is a global phenomenon, whereas none of the other splice types are general (i.e. they seem to only happen in a specific subset of the data).
Please note that:
Back to Table of Content
The workflow described here is an extension of the workflow enabled in the analyzeCPAT() function via the “removeNoncodinORFs” argument which if TRUE removes ORF information from the isoforms where the CPAT analysis classifies them as non-coding. Here we will “rescue” the ORF of isoforms predicted to be noncoding by CPAT but which have a predicted protein domain. This workflow is an alternative apporach to setting “removeNoncodinORFs=TRUE” and should be used instead of that option.
Note that this is only recommended if ORFs were predicted (i.e. not imported from a GTF file). After this procedure, isoforms with ORFs will be isoforms with an ORF longer than minORFlength (if specified in analyzeORF) which are predicted to be coding by CPAT OR have a predicted protein domain (by Pfam).
Since the ORF information is stored in the ‘orfAnalysis’ analysis entry of the switchList we can remove it (by replacing it with NAs) as follows:
### Test data
data("exampleSwitchListAnalyzed")
exampleSwitchListAnalyzed
### Extract coding isoforms
nonCodingIsoforms <- unique(exampleSwitchListAnalyzed$isoformFeatures$isoform_id[
which( ! exampleSwitchListAnalyzed$isoformFeatures$codingPotential )
])
### Rescue those with protein domains
nonCodingIsoformsRescued <- setdiff(nonCodingIsoforms, exampleSwitchListAnalyzed$domainAnalysis$isoform_id)
# nr rescued
length(nonCodingIsoforms) - length(nonCodingIsoformsRescued)
### Remove noncoding isoforms ORF annotation
sum(is.na(exampleSwitchListAnalyzed$orfAnalysis$orfTransciptStart))
exampleSwitchListAnalyzed$orfAnalysis[
which( exampleSwitchListAnalyzed$orfAnalysis$isoform_id %in% nonCodingIsoformsRescued), 2:ncol(exampleSwitchListAnalyzed$orfAnalysis)
] <- NA
exampleSwitchListAnalyzed$isoformFeatures$PTC[which(exampleSwitchListAnalyzed$isoformFeatures$isoform_id %in% nonCodingIsoforms)] <- NA
sum(is.na(exampleSwitchListAnalyzed$orfAnalysis$orfTransciptStart))
Back to Table of Content
Recent research suggests that small upstream ORFs are far more frequent that previously assumed. It is therefore of particular interest to start analyzing these and here we have indirectly presented a tool which can do just so: analyzeORF().
Here we show how one could start such an analysis of small upstream ORFs.
# run ORF analysis on longest ORF
exampleSwitchListAnalyzed <- analyzeORF(exampleSwitchListAnalyzed, genomeObject = Hsapiens, method='longest')
mean(exampleSwitchListAnalyzed$orfAnalysis$orfTransciptLength)
# run ORF analysis on most upstream ORF
exampleSwitchListAnalyzed2 <- analyzeORF(exampleSwitchListAnalyzed, genomeObject = Hsapiens, orfMethod = 'mostUpstream', minORFlength = 50)
mean(exampleSwitchListAnalyzed2$orfAnalysis$orfTransciptLength)
# calculate pairwise difference
summary(
exampleSwitchListAnalyzed2$orfAnalysis$orfTransciptLength -
exampleSwitchListAnalyzed$orfAnalysis$orfTransciptLength[
match(exampleSwitchListAnalyzed2$orfAnalysis$isoform_id ,exampleSwitchListAnalyzed$orfAnalysis$isoform_id)
]
)
Back to Table of Content
The sequences stored in the SwitchAnalyzeRlist are not needed after consequences have been predicted and can thereby be removed, reducing the object size as well as loading/saving times. This has been done for the example dataset ‘exampleSwitchListAnalyzed’. This is simply done as follows
summary(exampleSwitchListIntermediary)
exampleSwitchListIntermediary$ntSequence <- NULL
exampleSwitchListIntermediary$aaSequence <- NULL
summary(exampleSwitchListIntermediary)
Back to Table of Content
There has been considerable debate about whether the default parameters for the codingPotential calculated for CPAT are too lenient (too low). This will always be a problem when having a single cutoff. One possible solution is to introduce an “unknown” class with medium size coding potential which we can then disregard.
Let’s start by looking at the distribution of coding potential values.
data("exampleSwitchListAnalyzed")
hist(exampleSwitchListAnalyzed$isoformFeatures$codingPotentialValue)
These coding potential values are summarized by the cutoff supplied in the ‘codingPotential’ column — which is what IsoformSwitchAnalyzeR uses in the downstream analysis.
### These are summarized by the cutoff supplied in the 'codingPotential' column --- which is what is used by IsoformSwitchAnalyzeR in the downstream analysis.
table(exampleSwitchListAnalyzed$isoformFeatures$codingPotential, exclude = NULL)
By simply setting the mid-range values to NA it will cause IsoformSwitchAnalyzeR to ignore them, thereby removing the isoforms with “unknown” coding potential. This can be done as follows:
exampleSwitchListAnalyzed$isoformFeatures$codingPotential <- NA
exampleSwitchListAnalyzed$isoformFeatures$codingPotential[which(exampleSwitchListAnalyzed$isoformFeatures$codingPotentialValue > 0.75)] <- TRUE
exampleSwitchListAnalyzed$isoformFeatures$codingPotential[which(exampleSwitchListAnalyzed$isoformFeatures$codingPotentialValue < 0.25)] <- FALSE
table(exampleSwitchListAnalyzed$isoformFeatures$codingPotential, exclude = NULL)
Back to Table of Content
As we have shown there are many problems with known CDS annotations (see supplementary data in Vitting-Seerup et al 2017) we have implemented two ways to ensure a high quality of the CDS imported from a GTF annotation file:
The difference between the isoforms involved in an isoform switch can arise by changes in three distinct biological mechanisms:
Since we how the structure of the isoforms involved in a isoform switch we can also analyze which (combination) of these biological mechanisms gives rise to the difference between the two isoforms involved in an isoform switch.
This is simply done by, in addition to running analyzeSwitchConsequences with the consequences you find interesting, making a separate consequence analysis of consequences (also with analyzeSwitchConsequences) where the consequences you analyze (supplied to the consequencesToAnalyze argument) are:
Then we can simply compare the result of this analysis to the isoform switches with consequences we already have identified to be of interest, and thereby identify which biological mechanisms give rise to the isoform switches with consequence you are interested in. One suggestion for such an analysis is illustrated here:
### Load example data
data("exampleSwitchListAnalyzed")
### Reduce datasize for fast runtime
selectedGenes <- unique(exampleSwitchListAnalyzed$isoformFeatures$gene_id)[50:100]
exampleSwitchListAnalyzedSubset <- subsetSwitchAnalyzeRlist(
exampleSwitchListAnalyzed,
exampleSwitchListAnalyzed$isoformFeatures$gene_id %in% selectedGenes
)
### analyze the biological mechanisms
bioMechanismeAnalysis <- analyzeSwitchConsequences(
exampleSwitchListAnalyzedSubset,
consequencesToAnalyze = c('tss','tts','intron_structure'),
showProgress = FALSE
)$switchConsequence # only the consequences are interesting here
### subset to those with differences
bioMechanismeAnalysis <- bioMechanismeAnalysis[which(bioMechanismeAnalysis$isoformsDifferent),]
### extract the consequences of interest already stored in the switchAnalyzeRlist
myConsequences <- exampleSwitchListAnalyzedSubset$switchConsequence
myConsequences <- myConsequences[which(myConsequences$isoformsDifferent),]
myConsequences$isoPair <- paste(myConsequences$isoformUpregulated, myConsequences$isoformDownregulated) # id for specific iso comparison
### Obtain the mechanisms of the isoform switches with consequences
bioMechanismeAnalysis$isoPair <- paste(bioMechanismeAnalysis$isoformUpregulated, bioMechanismeAnalysis$isoformDownregulated)
bioMechanismeAnalysis <- bioMechanismeAnalysis[which(bioMechanismeAnalysis$isoPair %in% myConsequences$isoPair),] # id for specific iso comparison
This result is best summarized in a Venn diagram:
### Create list with the isoPair ids for each consequencee
AS <- bioMechanismeAnalysis$isoPair[ which( bioMechanismeAnalysis$featureCompared == 'intron_structure')]
aTSS <- bioMechanismeAnalysis$isoPair[ which( bioMechanismeAnalysis$featureCompared == 'tss' )]
aTTS <- bioMechanismeAnalysis$isoPair[ which( bioMechanismeAnalysis$featureCompared == 'tts' )]
mechList <- list(
AS=AS,
aTSS=aTSS,
aTTS=aTTS
)
### Create venn diagram
library(VennDiagram)
#> Loading required package: futile.logger
myVenn <- venn.diagram(mechList, col='transparent', alpha=0.4, fill=RColorBrewer::brewer.pal(n=3,name='Dark2'), filename=NULL)
### Plot the venn diagram
grid.newpage() ; grid.draw(myVenn)
From this, we can see the relative importance of each of the three mechanisms, as well as the combination of these.
Back to Table of Content
Generally we do not recommend analysis of experiments without replicates as it is impossible to assess technical and biological variation. On the other hand, single-replciate pilot studies can be useful for planning larger experiments: hence this small guide. For running IsoformSwitchAnalyzeR without replicates, we have to solely rely on the dIF values to call isoform switches. This can be done by following the steps outlined in Detailed Workflow with a few modifications:
aSwitchList$isoformFeatures$gene_switch_q_value <- 1
Back to Table of Content
This problem can arrise from multiple sources but essentially means that not all isoforms quantified are found in the annotation or vise versa. This problem cannot be tolerated because it will lead to wrong estimations of Isoform Fractions if some isoforms were not quantifed or if it is the other way around we could not analyze the isoforms because the isoform structure is not annotated.
Currently we have documented this problem from two different sources:
Source 1: The GTF and FASTQ file used to quantify with Kallisto/Salmon/RSEM contain non-overlapping isoforms. We have found this when using data downloaded from Ensembl. Try gencode genes instead.
Source 2: When using Kallisto/Salmon/RSEM data older versions of tximport can cause this problm. To fix this problem you need to update the underlying package tximport. This can be done by copy/pasting this command into R:
devtools::install_github("mikelove/tximport")
Back to Table of Content
This error occures because there is differences in which isoforms are found in the annotation and which isoforms have been quantified. This problem have (so far) only been observed in data quantified with Salmon. It typically occres because during the index building Salmon collapses identical transcripts. A discussion with the Salmon authors of pros and cons of this behaviour can be found here.
We ask the user to make sure why this problem occures in the user-specific setting, and that removing the non-quantified isoforms is the desired solution, before continuing.
With this vignette, we hope to provide a thorough introduction to IsoformSwitchAnalyzeR and give some examples of what IsoformSwitchAnalyzeR can be used for.
We aim to continuously keep IsoformSwitchAnalyzeR up to date and continously improve it. The latter includes the integration of new tools as they are developed (isoform quantification, isoform switch identification, statistical ORF detection, external sequence analysis etc.), so please feel free to suggest new tools to us (see the How To Get Help section for info of how to get in contact). Tools should preferably either be light weight (runnable on an old laptop) and/or available via web-services as that will allow a larger audience to use it.
As the continued updates might change the behaviour of certain functions we encurage uses the read the NEWS file distributed with the R package whenever IsoformSwitchAnalyzeR is updated.
Back to Table of Content
sessionInfo()
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] grid stats4 parallel stats graphics grDevices utils
#> [8] datasets methods base
#>
#> other attached packages:
#> [1] VennDiagram_1.6.20 futile.logger_1.4.3
#> [3] pheatmap_1.0.8 BSgenome.Hsapiens.UCSC.hg19_1.4.0
#> [5] BSgenome_1.48.0 Biostrings_2.48.0
#> [7] XVector_0.20.0 IsoformSwitchAnalyzeR_1.2.0
#> [9] cummeRbund_2.22.0 Gviz_1.24.0
#> [11] rtracklayer_1.40.0 GenomicRanges_1.32.0
#> [13] GenomeInfoDb_1.16.0 IRanges_2.14.0
#> [15] S4Vectors_0.18.0 fastcluster_1.1.24
#> [17] reshape2_1.4.3 ggplot2_2.2.1
#> [19] RSQLite_2.1.0 BiocGenerics_0.26.0
#>
#> loaded via a namespace (and not attached):
#> [1] ProtGenerics_1.12.0 bitops_1.0-6
#> [3] matrixStats_0.53.1 bit64_0.9-7
#> [5] RColorBrewer_1.1-2 progress_1.1.2
#> [7] httr_1.3.1 rprojroot_1.3-2
#> [9] tools_3.5.0 backports_1.1.2
#> [11] R6_2.2.2 rpart_4.1-13
#> [13] Hmisc_4.1-1 DBI_0.8
#> [15] lazyeval_0.2.1 colorspace_1.3-2
#> [17] nnet_7.3-12 gridExtra_2.3
#> [19] prettyunits_1.0.2 bit_1.1-12
#> [21] curl_3.2 compiler_3.5.0
#> [23] Biobase_2.40.0 formatR_1.5
#> [25] htmlTable_1.11.2 DelayedArray_0.6.0
#> [27] labeling_0.3 scales_0.5.0
#> [29] checkmate_1.8.5 stringr_1.3.0
#> [31] digest_0.6.15 Rsamtools_1.32.0
#> [33] foreign_0.8-70 rmarkdown_1.9
#> [35] pkgconfig_2.0.1 base64enc_0.1-3
#> [37] dichromat_2.0-0 htmltools_0.3.6
#> [39] ensembldb_2.4.0 limma_3.36.0
#> [41] htmlwidgets_1.2 rlang_0.2.0
#> [43] rstudioapi_0.7 BiocParallel_1.14.0
#> [45] acepack_1.4.1 VariantAnnotation_1.26.0
#> [47] RCurl_1.95-4.10 magrittr_1.5
#> [49] GenomeInfoDbData_1.1.0 Formula_1.2-2
#> [51] Matrix_1.2-14 Rcpp_0.12.16
#> [53] munsell_0.4.3 stringi_1.1.7
#> [55] edgeR_3.22.0 MASS_7.3-50
#> [57] SummarizedExperiment_1.10.0 zlibbioc_1.26.0
#> [59] plyr_1.8.4 blob_1.1.1
#> [61] lattice_0.20-35 DRIMSeq_1.8.0
#> [63] splines_3.5.0 GenomicFeatures_1.32.0
#> [65] locfit_1.5-9.1 knitr_1.20
#> [67] pillar_1.2.2 biomaRt_2.36.0
#> [69] futile.options_1.0.1 XML_3.98-1.11
#> [71] evaluate_0.10.1 biovizBase_1.28.0
#> [73] latticeExtra_0.6-28 lambda.r_1.2.2
#> [75] data.table_1.10.4-3 gtable_0.2.0
#> [77] assertthat_0.2.0 AnnotationFilter_1.4.0
#> [79] survival_2.42-3 tibble_1.4.2
#> [81] GenomicAlignments_1.16.0 AnnotationDbi_1.42.0
#> [83] memoise_1.1.0 tximport_1.8.0
#> [85] cluster_2.0.7-1
Back to Table of Content