signeRFlow is a Shiny app that allows users to explore mutational signatures and exposures to related mutational processes. With the available modules, users can analyze their own data using different approaches, such as de novo and fitting. There is also a module to explore public datasets from TCGA.
Start the app using either RStudio or a terminal:
The app will open in a new window or in a tab of your browser.
There are three available modules in the app:
You can go through the modules independently by using the app sidebar.
In this module, you can upload a SNV matrix with counts of mutations and execute the signeR de novo algorithm, which applies a Bayesian approach to the non-negative matrix factorization (NMF) of the mutation counts into a matrix product of mutational signatures and exposures to mutational processes.
You can also provide a file with opportunities that are used as weights for the factorization. Further analysis parameters can be set, results can be visualized in different plots, and the signatures found can be compared interactively to those in the COSMIC database.
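To make the factorization idea concrete, here is a minimal sketch of the forward direction only: given hypothetical signatures and exposures, it computes the expected counts as their matrix product. It is not the signeR sampler, which infers both factors from the data. The orientation (rows are samples, columns are mutation types), the toy numbers, and the element-wise use of opportunities as weights are assumptions made for illustration.

```python
# Toy illustration of "counts ~ exposures x signatures" (not the signeR algorithm).
# Assumed orientation: rows are samples, columns are mutation types.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def expected_counts(exposures, signatures, opportunities=None):
    """Return exposures x signatures, optionally weighted element-wise.

    The element-wise weighting by opportunities is an assumption of this sketch.
    """
    rates = matmul(exposures, signatures)
    if opportunities is None:
        return rates
    return [[r * w for r, w in zip(rate_row, weight_row)]
            for rate_row, weight_row in zip(rates, opportunities)]


# Two hypothetical signatures over three mutation types (each row sums to 1).
signatures = [[0.5, 0.25, 0.25],
              [0.0, 0.5, 0.5]]
# Two hypothetical samples: exposure of each sample to each signature.
exposures = [[100, 0],
             [40, 60]]

print(expected_counts(exposures, signatures))
# [[50.0, 25.0, 25.0], [20.0, 40.0, 40.0]]
```

The de novo analysis works in the opposite direction: it starts from the observed count matrix and estimates the signatures and exposures whose product best explains it.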
You can upload a VCF file or a SNV matrix file (mandatory) with your own samples to use in the signeR de novo module. You can also upload an opportunity file or use an already built genome opportunity. Alternatively, you can upload a BED file to build an opportunity matrix.
You can upload a VCF file or a SNV matrix file from your computer by clicking the Browse button.
The SNV matrix is a tab-delimited text file with a matrix of SNV counts found in the analyzed genomes. It must contain one row for each genome sample and 97 columns: the first one with sample IDs and, after that, one column for each mutation type. Mutations should be specified in the column names (headers) by both the base change and the trinucleotide context where it occurs (for example: C>A:ACA). The table below shows an example of the SNV matrix structure.
| | C>A:ACA | C>A:ACC | C>A:ACG | C>A:ACT | C>A:CCA | … | T>G:TTT |
---|---|---|---|---|---|---|---|
PD3851a | 31 | 34 | 9 | 21 | 24 | … | 21 |
PD3904a | 110 | 91 | 9 | 87 | 108 | … | 77 |
… | … | … | … | … | … | … | … |
PD3890a | 122 | 112 | 13 | 107 | 99 | … | 50 |
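Before uploading, it can help to check that a file matches the layout described above. The following is a minimal sketch in Python, independent of signeR: `mutation_types` and `check_snv_matrix` are hypothetical helper names, and the label order is only inferred from the example table (the check itself does not depend on column order).

```python
import csv
import io

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types():
    """Return the 96 labels such as 'C>A:ACA' (order assumed from the example table)."""
    return [f"{sub}:{five}{sub[0]}{three}"
            for sub in SUBSTITUTIONS for five in BASES for three in BASES]


def check_snv_matrix(text):
    """Parse a tab-delimited SNV matrix and return {sample_id: [96 counts]}.

    Raises ValueError if the header or any row does not match the layout.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        raise ValueError("file is empty")
    header, body = rows[0], rows[1:]
    # The header may or may not carry a (possibly empty) cell above the sample IDs.
    labels = header[1:] if len(header) == 97 else header
    if sorted(labels) != sorted(mutation_types()):
        raise ValueError("header must name each of the 96 mutation types exactly once")
    counts = {}
    for row in body:
        if len(row) != 97:
            raise ValueError(f"sample {row[0]!r}: expected 97 fields, got {len(row)}")
        counts[row[0]] = [int(value) for value in row[1:]]
    return counts
```

Calling `check_snv_matrix(open("my_matrix.tsv").read())` (the file name is a placeholder) either returns the parsed counts or raises a `ValueError` that names the problem.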
If you want to upload a VCF file, you must select the genome build used in your variant calling analysis so that signeR can generate a SNV matrix of counts. You can also generate a SNV matrix from a VCF file using the method:
from the signeR package. See the documentation for more details.
Warning: You must have installed the genome `BSgenome.Hsapiens.UCSC.hg19` or `BSgenome.Hsapiens.UCSC.hg38` from the `BSgenome` package in order to use the VCF upload.
Columns:
The first column needs to contain the sample ID and other columns contain the 96 trinucleotide contexts.
Rows:
Each row contains the sample ID and the counts for each trinucleotide context.
Example file:
You can upload an opportunity matrix file or a BED file from your computer by clicking the Browse button. You can also use an already built genome opportunity for human reference genomes. This is an optional file.
The opportunity matrix is a tab-delimited text file with a matrix of counts of trinucleotide contexts found in the studied genomes. It must be structured as the SNV matrix, with mutations specified on the head line (for each SNV count, the opportunity matrix shows the total number of genomic loci where the referred mutation could have occurred). The table below shows an example of the opportunity matrix structure.
366199887 | 211452373 | 45626142 | 292410567 | 335391892 | 239339768 | … | 50233875 |
---|---|---|---|---|---|---|---|
202227618 | 116207171 | 25138239 | 161279580 | 184193767 | 131051208 | … | 177385805 |
225505378 | 130255706 | 28152934 | 179996700 | 206678032 | 147634427 | … | 199062504 |
425545790 | 245523433 | 53437284 | 339065644 | 389386002 | 278770926 | … | 375075216 |
452332390 | 259934779 | 55862550 | 361010972 | 412168035 | 292805460 | … | 396657807 |
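A similar check can be sketched for the opportunity matrix. This is a minimal Python example under stated assumptions, not part of signeR: `check_opportunity_matrix` is a hypothetical helper, and it assumes a file without a header line, with one row per sample and 96 numeric columns, as described in the Columns and Rows notes for this file.

```python
def check_opportunity_matrix(text, n_samples):
    """Parse a headerless, tab-delimited opportunity matrix into rows of floats.

    Assumes one row per sample and 96 columns (one per mutation type).
    Raises ValueError if the shape does not match or a value is not numeric.
    """
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if len(rows) != n_samples:
        raise ValueError(f"expected {n_samples} rows (one per sample), got {len(rows)}")
    matrix = []
    for number, row in enumerate(rows, start=1):
        if len(row) != 96:
            raise ValueError(f"row {number}: expected 96 columns, got {len(row)}")
        # float() raises ValueError on non-numeric cells such as a stray header.
        matrix.append([float(value) for value in row])
    return matrix
```

Passing the number of samples in the SNV matrix as `n_samples` catches the common case where the two files do not describe the same set of samples.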
If you want to upload a BED file, you must select the genome build used in your analysis so that signeR can generate the opportunities for your regions file. You can also generate an opportunity matrix from the reference genome using the method:
from the signeR package. See the documentation for more details.
Warning: You must have installed the genome `BSgenome.Hsapiens.UCSC.hg19` or `BSgenome.Hsapiens.UCSC.hg38` from the `BSgenome` package in order to use the BED upload.
Columns:
There is no header in this file and each column represents a trinucleotide context.
Rows:
Each row contains, for one sample, the count of each trinucleotide context across the whole analyzed region.
Example file:
There are some parameters that you can define before running the analysis, which is started by clicking the Start de novo analysis button: