decomposeTumorGenomes {decompTumor2Sig} | R Documentation |
Decompose tumor genomes into mutational signatures
Description
'decomposeTumorGenomes()' is the core function of this package. It
decomposes tumor genomes into a given set of mutational signatures by
computing their contributions (exposures) to the mutational load via
quadratic programming. The function takes a set of mutational signatures
and the mutation features of one or more tumor genomes and computes
weights, i.e., contributions for each of the signatures in each
individual genome. Alternatively, the function can select, for each
genome, the smallest subset of signatures whose contributions explain
at least a user-specified fraction of the variance of the genome's
mutation patterns.
Usage
decomposeTumorGenomes(genomes, signatures, minExplainedVariance=NULL,
                      minNumSignatures=2, maxNumSignatures=NULL, greedySearch=FALSE,
                      constrainToMaxContribution=FALSE, tolerance=0.1, verbose=FALSE)
Arguments
genomes |
(Mandatory) A vector, a data frame or a matrix (for an individual
tumor genome), or a list of such objects (for multiple tumors). Each
tumor genome must have the same form as the signatures.
|
signatures |
(Mandatory) A list of vectors, data frames or matrices.
Each of the objects represents one mutational signature. Vectors are
used for Alexandrov signatures, data frames or matrices for Shiraishi
signatures.
|
minExplainedVariance |
(Optional) If NULL (default), exactly maxNumSignatures signatures
(see below; default: all) will be used for decomposing each genome.
If a numeric value between 0 and 1 is specified, the function will
select, for each genome, the smallest number of signatures that
suffices to explain at least the specified fraction of the variance
of the genome's mutation patterns. E.g., if minExplainedVariance=0.99,
the smallest subset of signatures that explains at least 99% of the
variance is taken.
Please note: depending on the number of signatures, this may take
quite a while because, by default, for each number K of signatures
all possible subsets of K signatures are tested to identify the
subset that explains the largest fraction of the variance. If not
enough variance is explained, K is incremented by one.
Notes: 1) to speed up the search, the parameters minNumSignatures,
maxNumSignatures and greedySearch can be used; 2) for genomes for
which none of the tested subsets of signatures explains enough
variance, the returned exposure vector will be set to NULL.
|
minNumSignatures |
(Optional) Used only if minExplainedVariance is specified (see
above). The search for the smallest subset of signatures that
explains the required variance starts with subsets of at least this
size. This can be used to reduce the search space in a time-consuming
search over a large number of signatures. Default: 2.
|
maxNumSignatures |
(Optional) If minExplainedVariance is specified, the search for the
smallest subset of signatures that explains the required variance
will use at most maxNumSignatures signatures. This can be used to
reduce the search space in a time-consuming search over a large
number of signatures. If minExplainedVariance is NULL, then exactly
maxNumSignatures signatures will be used. The default for
maxNumSignatures is NULL (all signatures).
|
greedySearch |
(Optional) Used only if minExplainedVariance has been specified. If
greedySearch is TRUE, not all possible combinations of
minNumSignatures to maxNumSignatures signatures will be checked.
Instead, all possible combinations of exactly minNumSignatures
signatures are checked first to select the best starting set; then
the next best signature (the one giving the maximum increase in
explained variance) is added iteratively until minExplainedVariance
of the variance can be explained (or maxNumSignatures is exceeded).
NOTE: this approximate search is highly recommended for large sets
of signatures (>15)! Default: FALSE.
|
constrainToMaxContribution |
(Optional) [Note: this is EXPERIMENTAL and is usually not needed!]
If TRUE, the maximum contribution that can be attributed to a
signature will be constrained by the variant feature counts (e.g.,
specific flanking bases) observed in the individual tumor genome.
If, for example, 30% of all observed variants have a specific
feature and 60% of the variants produced by a mutational
process/signature manifest the feature, then the signature can have
contributed up to 0.3/0.6 (=0.5 or 50%) of the observed variants.
The lowest possible contribution over all signature features will be
taken as the allowed maximum contribution of the signature. This
allowed maximum will additionally be increased by the value
specified as tolerance (see below). For the illustrated example and
tolerance=0.1, a contribution of up to 0.5+0.1 = 0.6 (or 60%) of the
signature would be allowed. Default: FALSE.
|
tolerance |
(Optional) If constrainToMaxContribution is TRUE, the maximum
contribution computed for a signature is increased by this value
(see above). If constrainToMaxContribution is FALSE, the tolerance
value is ignored. Default: 0.1.
|
verbose |
(Optional) If TRUE, some information about the processed genome and
the number of signatures used will be printed. Default: FALSE.
|
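The arithmetic described for constrainToMaxContribution can be sketched in a few lines of base R. This is only an illustration of the documented example (0.3/0.6 plus the tolerance), not the package's internal code: the helper name maxContribution, the two-feature toy frequencies, and the choice to skip features with zero signature frequency are all assumptions made here.

```r
# Hypothetical helper (not part of decompTumor2Sig): upper bound on the
# contribution of one signature to one genome, following the documented
# example for constrainToMaxContribution.
maxContribution <- function(genomeFreq, sigFreq, tolerance = 0.1) {
  # Assumption: features the signature never produces impose no bound.
  used <- sigFreq > 0
  # Per feature, the signature can explain at most observed/expected;
  # the lowest ratio over all features is the allowed maximum.
  bound <- min(genomeFreq[used] / sigFreq[used])
  # The allowed maximum is then relaxed by the tolerance.
  bound + tolerance
}

# Toy data (invented): fraction of variants showing each of two features.
genomeFreq <- c(featureA = 0.3, featureB = 0.7)  # observed in the tumor
sigFreq    <- c(featureA = 0.6, featureB = 0.4)  # produced by the signature

# Ratios are 0.5 and 1.75; the lowest (0.5) plus tolerance 0.1 gives 0.6,
# matching the example in the argument description.
limit <- maxContribution(genomeFreq, sigFreq, tolerance = 0.1)
```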
Value
A list of signature weight vectors (also called 'exposures'), one
for each tumor genome. E.g., the first element of the first vector in
the list is the weight/contribution of the first signature to the
first tumor genome. IMPORTANT: If minExplainedVariance is specified,
the exposures of a genome will NOT be returned if the minimum
explained variance is not reached within the requested minimum and
maximum numbers of signatures (minNumSignatures and
maxNumSignatures)! The corresponding exposure vector will be set to
NULL.
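Because genomes that miss the variance threshold yield NULL entries, downstream code usually has to separate them from the successful decompositions. The following base R sketch shows one way to do this on a mock return value; the tumor names, signature names and numbers are invented for illustration, and whether the real result list carries names is an assumption.

```r
# Mock result in the documented shape: one exposure vector per genome,
# NULL where the minimum explained variance was not reached.
# (All names and values are invented for illustration.)
exposures <- list(
  tumorA = c(S1 = 0.7, S2 = 0.3),
  tumorB = NULL,
  tumorC = c(S1 = 0.2, S2 = 0.8)
)

# Flag genomes whose exposure vector is NULL.
failed <- vapply(exposures, is.null, logical(1))

# Names of genomes that could not be decomposed under the constraints.
failedGenomes <- names(exposures)[failed]

# Keep only genomes with a usable exposure vector.
decomposed <- exposures[!failed]
```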
Author(s)
Rosario M. Piro, Politecnico di Milano
Sandra Krueger, Freie Universitaet Berlin
Maintainer: Rosario M. Piro
E-Mail: <rmpiro@gmail.com> or <rosariomichael.piro@polimi.it>
References
http://rmpiro.net/decompTumor2Sig/
Krueger, Piro (2019) decompTumor2Sig: Identification of mutational
signatures active in individual tumors. BMC Bioinformatics
20(Suppl 4):152.
See Also
decompTumor2Sig
Examples
### get Alexandrov signatures from COSMIC
signatures <- readAlexandrovSignatures()
### load reference genome
refGenome <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
### read breast cancer genomes from Nik-Zainal et al (PMID: 22608084)
gfile <- system.file("extdata",
                     "Nik-Zainal_PMID_22608084-VCF-convertedfromMPF.vcf.gz",
                     package="decompTumor2Sig")
genomes <- readGenomesFromVCF(gfile, numBases=3, type="Alexandrov",
                              trDir=FALSE, refGenome=refGenome, verbose=FALSE)
### compute exposures
exposures <- decomposeTumorGenomes(genomes, signatures, verbose=FALSE)
### (for further examples on searching subsets, please see the vignette)
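To make the subset search described under minExplainedVariance more concrete, here is a self-contained base R sketch on toy data. It is not the package's implementation: fitExposures is a stand-in that uses optim() with a softmax parametrization instead of quadratic programming, explained variance is assumed to be 1 - RSS/TSS (the package's exact definition may differ), and all function names and toy signatures are hypothetical. The helper numFits counts model fits in the worst case, assuming one fit per tested subset or candidate signature.

```r
# Stand-in fitter (NOT the package's quadratic program): least squares
# over weights that are positive and sum to 1, via softmax + optim().
fitExposures <- function(genome, S) {
  if (ncol(S) == 1) return(1)
  rss <- function(theta) {
    w <- exp(theta) / sum(exp(theta))
    sum((genome - S %*% w)^2)
  }
  theta <- optim(rep(0, ncol(S)), rss, method = "BFGS")$par
  exp(theta) / sum(exp(theta))
}

# Assumed definition of explained variance: 1 - RSS/TSS.
explainedVariance <- function(genome, S, w) {
  1 - sum((genome - S %*% w)^2) / sum((genome - mean(genome))^2)
}

# Exhaustive search: for K = minK, minK+1, ..., test all subsets of K
# signatures; return the best subset once it reaches the threshold.
searchSubsets <- function(genome, S, minExplainedVariance,
                          minK = 2, maxK = ncol(S)) {
  stopifnot(minK <= maxK)
  for (k in minK:maxK) {
    subsets <- combn(ncol(S), k)
    best <- NULL
    for (j in seq_len(ncol(subsets))) {
      idx <- subsets[, j]
      Ssub <- S[, idx, drop = FALSE]
      w <- fitExposures(genome, Ssub)
      ev <- explainedVariance(genome, Ssub, w)
      if (is.null(best) || ev > best$explVar) {
        best <- list(subset = idx, exposures = w, explVar = ev)
      }
    }
    if (best$explVar >= minExplainedVariance) return(best)
  }
  NULL  # threshold not reached: mirrors the documented NULL result
}

# Worst-case number of fits (threshold only reached at maxK).
numFits <- function(n, minK, maxK, greedy = FALSE) {
  if (!greedy) return(sum(choose(n, minK:maxK)))
  # Greedy: all subsets of size minK, then one fit per remaining
  # candidate signature for each signature that is added.
  extra <- if (maxK > minK) sum(n - (minK:(maxK - 1))) else 0
  choose(n, minK) + extra
}

# Toy signatures over six features (invented numbers).
S <- cbind(S1 = c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
           S2 = c(0.02, 0.03, 0.05, 0.10, 0.30, 0.50),
           S3 = c(0.10, 0.10, 0.30, 0.30, 0.10, 0.10),
           S4 = c(0.30, 0.05, 0.05, 0.05, 0.05, 0.50))
genome <- 0.7 * S[, "S1"] + 0.3 * S[, "S2"]  # exact mixture of S1 and S2

res <- searchSubsets(genome, S, minExplainedVariance = 0.99, minK = 1)
```

On this toy genome no single signature reaches the threshold, so the search moves on to pairs and selects S1 and S2. For 30 signatures with minK = 2 and maxK = 30, numFits gives 1073741793 fits for the exhaustive search versus 841 for the greedy variant, which illustrates why greedySearch is recommended for large signature sets.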
[Package decompTumor2Sig version 2.9.0 Index]