1 Introduction

signeRFlow is a Shiny app that allows users to explore mutational signatures and exposures to the related mutational processes. With the available modules, users can analyze their own data using different approaches, such as de novo and fitting. There is also a module to explore public datasets from TCGA.

1.1 Running shiny app

Start the app using either RStudio or a terminal:

library(signeR)
signeRFlow()

The app will open in a new window or in a tab of your browser.

2 Modules

There are three available modules in the app:

  • signeR de novo: This module provides access to signeR de novo analysis to find signatures in your data, estimating both signatures and related exposures.
  • signeR fitting: This module provides access to signeR fitting analysis to find exposures to known signatures in your data. The known signatures can be uploaded or chosen from the COSMIC database. Exposures are estimated and can be explored.
  • TCGA explorer: This module provides access to the results of signeR applications to 33 datasets from TCGA.

You can go through the modules independently by using the app sidebar.

2.1 signeR de novo

In this module, you can upload a SNV matrix with counts of mutations and execute the signeR de novo algorithm, which applies a Bayesian approach to the non-negative matrix factorization (NMF) of the mutation counts into a product of two matrices: mutational signatures and exposures to mutational processes.

You can also provide a file with opportunities, which are used as weights for the factorization. Further analysis parameters can be set, results can be visualized in different plots, and the signatures found can be compared interactively to those in the COSMIC database.
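The factorization described above can be illustrated with a small simulation in base R. This is only a sketch of the structure of the model, not the signeR sampler itself: the Poisson count model, the elementwise use of opportunities as weights, the matrix orientation (mutation types in rows) and all names (P, E, W, M) are assumptions made for illustration.

```r
set.seed(1)  # reproducible toy data

n_types   <- 96  # mutation types (base change plus trinucleotide context)
n_sig     <- 2   # number of signatures (chosen arbitrarily for this sketch)
n_samples <- 5   # number of genome samples

# P: signatures, one column per signature; each column sums to 1.
P <- matrix(rgamma(n_types * n_sig, shape = 1), nrow = n_types)
P <- sweep(P, 2, colSums(P), "/")

# E: exposures, one row per signature and one column per sample.
E <- matrix(rgamma(n_sig * n_samples, shape = 2, rate = 0.01), nrow = n_sig)

# W: opportunities used as weights; all ones means "no weighting".
W <- matrix(1, nrow = n_types, ncol = n_samples)

# Expected counts are the matrix product P %*% E, weighted elementwise by W.
expected <- (P %*% E) * W

# Simulated observed counts (assumed Poisson noise around the expectation).
M <- matrix(rpois(length(expected), expected), nrow = n_types)

dim(M)  # mutation types x samples
```

The de novo analysis goes in the opposite direction: given the counts (and, optionally, the opportunities), it estimates the signatures and the exposures.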

2.1.1 Load data

You must upload a VCF file or a SNV matrix file with your own samples to use the signeR de novo module. Optionally, you can also upload an opportunity file or use an already built genome opportunity. You can also upload a BED file to build an opportunity matrix.

2.1.1.1 VCF or SNV Matrix

You can upload a VCF file or a SNV matrix file from your computer by clicking the Browse button.

The SNV matrix is a tab-delimited text file with a matrix of SNV counts found in the analyzed genomes. It must contain one row for each genome sample and 97 columns: the first one with sample IDs and, after that, one column for each mutation type. Mutations should be specified in the column names (headers) by both the base change and the trinucleotide context where it occurs (for example: C>A:ACA). The table below shows an example of the SNV matrix structure.

         C>A:ACA  C>A:ACC  C>A:ACG  C>A:ACT  C>A:CCA  T>G:TTT
PD3851a       31       34        9       21       24       21
PD3904a      110       91        9       87      108       77
PD3890a      122      112       13      107       99       50
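Before uploading, it can help to check that a file follows this layout. The sketch below reads the six columns shown in the table above from an in-memory string (a real file has 96 mutation columns) and validates the column names. It assumes that the header line omits a label for the sample ID column, as in the table. The helper is_mutation_type and the column ordering generated at the end are illustrative guesses inferred from the example, not part of signeR.

```r
# Toy file contents: only the six mutation columns shown in the table above.
snv_text <- paste(
  "C>A:ACA\tC>A:ACC\tC>A:ACG\tC>A:ACT\tC>A:CCA\tT>G:TTT",
  "PD3851a\t31\t34\t9\t21\t24\t21",
  "PD3904a\t110\t91\t9\t87\t108\t77",
  "PD3890a\t122\t112\t13\t107\t99\t50",
  sep = "\n"
)

# The header has one field fewer than the data rows, so read.table() uses
# the first column (sample IDs) as row names. check.names = FALSE keeps
# names such as "C>A:ACA" unchanged.
snv <- read.table(text = snv_text, header = TRUE, sep = "\t",
                  check.names = FALSE)

# Hypothetical helper: TRUE for names like "C>A:ACA", where the middle base
# of the context equals the reference base and differs from the new base.
is_mutation_type <- function(x) {
  grepl("^[CT]>[ACGT]:[ACGT]{3}$", x) &
    substr(x, 1, 1) == substr(x, 6, 6) &
    substr(x, 1, 1) != substr(x, 3, 3)
}
stopifnot(all(is_mutation_type(colnames(snv))))

# All 96 mutation types, in an order consistent with the example header
# (base change, then 5' base, then 3' base; this ordering is an assumption).
bases   <- c("A", "C", "G", "T")
changes <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
grid <- expand.grid(three = bases, five = bases, change = changes,
                    stringsAsFactors = FALSE)
mutation_types <- paste0(grid$change, ":", grid$five,
                         substr(grid$change, 1, 1), grid$three)
length(mutation_types)
```

For a real file, replace the text argument with the file path and additionally check that ncol(snv) is 96.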

If you want to upload a VCF file, you must select the genome build used in your variant calling analysis so that signeR can generate the SNV count matrix. You can also generate a SNV matrix from a VCF file yourself, using the genCountMatrixFromVcf function from the signeR package. See the documentation for more details.

Warning:

To use the VCF upload, the genome package matching your build, BSgenome.Hsapiens.UCSC.hg19 or BSgenome.Hsapiens.UCSC.hg38, must be installed.
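A quick way to see which of these genome packages are available in your R library is sketched below, using base R only. The variable names are illustrative. The commented-out installation line assumes that the BiocManager package is installed.

```r
genomes <- c("BSgenome.Hsapiens.UCSC.hg19", "BSgenome.Hsapiens.UCSC.hg38")

# system.file() returns "" when a package is not installed.
installed <- vapply(
  genomes,
  function(pkg) nzchar(system.file(package = pkg)),
  logical(1)
)
print(installed)

missing_genomes <- names(installed)[!installed]
if (length(missing_genomes) > 0) {
  message("Not installed: ", paste(missing_genomes, collapse = ", "))
  # Only the genome matching your build is needed. To install, for example:
  # BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
}
```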

Columns:

The first column needs to contain the sample ID and other columns contain the 96 trinucleotide contexts.

Rows:

Each row contains the sample ID and the counts for each trinucleotide context.

Example file:

21 breast cancer

2.1.1.2 Opportunity matrix

You can upload an opportunity matrix file or a BED file from your computer by clicking the Browse button. Alternatively, you can use an already built genome opportunity for human reference genomes. This file is optional.

The opportunity matrix is a tab-delimited text file with a matrix of counts of trinucleotide contexts found in the studied genomes. It must be structured as the SNV matrix, with mutations specified on the header line (for each SNV count, the opportunity matrix shows the total number of genomic loci where the referred mutation could have occurred). The table below shows an example of the opportunity matrix structure.

366199887 211452373 45626142 292410567 335391892 239339768 50233875
202227618 116207171 25138239 161279580 184193767 131051208 177385805
225505378 130255706 28152934 179996700 206678032 147634427 199062504
425545790 245523433 53437284 339065644 389386002 278770926 375075216
452332390 259934779 55862550 361010972 412168035 292805460 396657807

If you want to upload a BED file, you must select the genome build used in your analysis so that signeR can generate the opportunities for your regions file. You can also generate an opportunity matrix from the reference genome yourself, using the genOpportunityFromGenome function from the signeR package. See the documentation for more details.

Warning:

To use the BED upload, the genome package matching your build, BSgenome.Hsapiens.UCSC.hg19 or BSgenome.Hsapiens.UCSC.hg38, must be installed.

Columns:

There is no header in this file and each column represents a trinucleotide context.

Rows:

Each row contains, for one sample, the counts of each trinucleotide context in the whole analyzed region.
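As a sanity check before uploading, the layout described here (no header, one row per sample, one column per context) can be verified in base R. The sketch below uses a truncated toy matrix with three columns instead of 96. The helper check_opportunity and the toy counts are hypothetical and only illustrate the idea that the opportunity matrix should have the same shape as the SNV count matrix.

```r
# Toy opportunity file contents: two samples, three contexts, no header.
opp_text <- paste(
  "366199887\t211452373\t45626142",
  "202227618\t116207171\t25138239",
  sep = "\n"
)
opp <- as.matrix(read.table(text = opp_text, header = FALSE, sep = "\t"))
dimnames(opp) <- NULL  # the file carries no sample or column labels

# Toy SNV counts with the same shape (samples in rows).
counts <- matrix(c(31, 110, 34, 91, 9, 9), nrow = 2)

# Hypothetical helper: opportunities must be non-negative numbers and
# match the dimensions of the count matrix.
check_opportunity <- function(opp, counts) {
  stopifnot(is.numeric(opp), all(opp >= 0))
  if (!identical(dim(opp), dim(counts))) {
    stop("opportunity matrix is ", nrow(opp), " x ", ncol(opp),
         " but the count matrix is ", nrow(counts), " x ", ncol(counts))
  }
  invisible(TRUE)
}

check_opportunity(opp, counts)
```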

Example file:

21 breast cancer

2.1.2 de novo analysis

You can define some parameters before running the analysis, which is started by clicking the Start de novo analysis button: