appreci8R-package {appreci8R}R Documentation

appreci8R: an R/Bioconductor package for filtering SNVs and short indels with high sensitivity and high PPV

Description

appreci8R is an R version of our appreci8 algorithm (A Pipeline for PREcise variant Calling Integrating 8 tools). Variant calling results from our standard appreci8 tools (GATK, Platypus, VarScan, FreeBayes, LoFreq, SNVer, samtools and VarDict), as well as from up to 5 additional tools, are combined, evaluated and filtered.

Details

Package: appreci8R
Type: Package
Title: appreci8R: an R/Bioconductor package for filtering SNVs and short indels with high sensitivity and high PPV
Version: 1.12.0
Author: Sarah Sandmann
Maintainer: Sarah Sandmann <sarah.sandmann@uni-muenster.de>
Description: appreci8R is an R version of our appreci8 algorithm (A Pipeline for PREcise variant Calling Integrating 8 tools). Variant calling results from our standard appreci8 tools (GATK, Platypus, VarScan, FreeBayes, LoFreq, SNVer, samtools and VarDict), as well as from up to 5 additional tools, are combined, evaluated and filtered.
License: LGPL-3
Encoding: UTF-8
LazyData: true
Imports: shiny, shinyjs, DT, VariantAnnotation, BSgenome, BSgenome.Hsapiens.UCSC.hg19, TxDb.Hsapiens.UCSC.hg19.knownGene, Homo.sapiens, SNPlocs.Hsapiens.dbSNP144.GRCh37, XtraSNPlocs.Hsapiens.dbSNP144.GRCh37, rsnps, Biostrings, MafDb.1Kgenomes.phase3.hs37d5, MafDb.ExAC.r1.0.hs37d5, MafDb.gnomADex.r2.1.hs37d5, COSMIC.67, rentrez, PolyPhen.Hsapiens.dbSNP131, SIFT.Hsapiens.dbSNP137, seqinr, openxlsx, Rsamtools, stringr, utils, stats, GenomicRanges, S4Vectors, GenomicFeatures, IRanges, GenomicScores, SummarizedExperiment
Suggests: GO.db, org.Hs.eg.db
biocViews: VariantDetection, GeneticVariability, SNP, VariantAnnotation, Sequencing
RoxygenNote: 6.0.1
git_url: https://git.bioconductor.org/packages/appreci8R
git_branch: RELEASE_3_14
git_last_commit: a9fde70
git_last_commit_date: 2021-10-26
Date/Publication: 2021-10-27

For the use of next-generation sequencing in clinical routine, valid variant calling results are crucial. However, numerous variant calling tools are available. These tools usually differ in their variant calling algorithms, the characteristics reported along with the variant calls, the recommended filtration strategies for the raw calls and thus also in their output. Perfect results are hard to obtain, especially when calling variants with a low variant allele frequency (VAF): high sensitivity is usually accompanied by a low positive predictive value (PPV).
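The trade-off between sensitivity and PPV can be made concrete with a small calculation. The following is a minimal sketch in base R; the counts are hypothetical and chosen only for illustration, they are not results from appreci8R or from any of the tools named above.

```r
# Hypothetical counts for one caller compared against a validated truth set
tp <- 90   # true variants that were called
fn <- 10   # true variants that were missed
fp <- 60   # calls that are not real variants

sensitivity <- tp / (tp + fn)   # share of true variants that were recovered
ppv <- tp / (tp + fp)           # share of reported calls that are true

round(c(sensitivity = sensitivity, ppv = ppv), 2)
```

In this made-up case the caller recovers most true variants, but a large share of its calls are false positives, which is the situation that filtering is meant to improve.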

appreci8R is a package for combining and filtering the output of different variant calling tools according to the 'appreci8'-algorithm. VCF as well as TXT files containing variant calls can be evaluated. The number of variant calling tools to consider is unlimited (the user interface version is limited to 13). The final output contains a list of variant calls, classified as "probably true", "polymorphism" or "artifact".
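To illustrate what combining the output of several tools means, the following base R sketch joins two small call tables on a variant key and counts how many tools support each call. It is an assumption-laden illustration, not the package's implementation: the data frames, the column names and the n_tools column are hypothetical.

```r
# Hypothetical, already normalized calls from two tools (not real data)
gatk <- data.frame(sample = "s1", chr = c("2", "17"),
                   pos = c(25469502L, 7579472L),
                   ref = c("C", "G"), alt = c("T", "C"),
                   stringsAsFactors = FALSE)
varscan <- data.frame(sample = "s1", chr = c("17", "21"),
                      pos = c(7579472L, 36164405L),
                      ref = c("G", "A"), alt = c("C", "G"),
                      stringsAsFactors = FALSE)

# A call is identified by sample, position and alleles
key <- c("sample", "chr", "pos", "ref", "alt")
gatk$GATK <- 1L
varscan$VarScan <- 1L

# Full outer join on the key; a missing flag means "not called by this tool"
combined <- merge(gatk, varscan, by = key, all = TRUE)
combined$GATK[is.na(combined$GATK)] <- 0L
combined$VarScan[is.na(combined$VarScan)] <- 0L

# Number of tools supporting each call
combined$n_tools <- combined$GATK + combined$VarScan
combined
```

Joining on the full key only works if all tools report indels and multi-allelic sites in the same way, which is why calls have to be normalized before they are combined.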

Important note: Currently, only hg19 is supported.

Index of help topics:

annotate                Annotate and filter calls
appreci8R-package       appreci8R: an R/Bioconductor package for
                        filtering SNVs and short indels with high
                        sensitivity and high PPV
appreci8Rshiny          A user interface to perform the whole
                        appreci8-analysis
combineOutput           Combine output of different variant calling
                        tools
determineCharacteristics
                        Determine characteristics of the calls
evaluateCovAndBQ        Evaluate coverage and base quality
filterTarget            Excludes all off-target calls from further
                        analysis.
finalFiltration         Perform final filtration according to the
                        appreci8-algorithm
normalize               Normalize calls

The package contains a function performing the whole analysis using a shiny user interface - appreci8Rshiny.

Additionally, seven individual functions for performing the seven analysis steps are available:

1) filterTarget: Exclude all off-target calls from further analysis.

2) normalize: Normalize calls with respect to the reporting of indels, MNVs, several alternate alleles and complex indels.

3) annotate: Annotate calls (using VariantAnnotation), and filter the output according to the locations and consequences of interest.

4) combineOutput: Combine output of the different variant calling tools.

5) evaluateCovAndBQ: Evaluate coverage and base quality (using Rsamtools), and filter calls with insufficient coverage and/or base quality.

6) determineCharacteristics: Determine characteristics of the calls, including database check-ups and impact prediction on protein level.

7) finalFiltration: Perform final filtration according to the appreci8-algorithm.
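As an illustration of the first step, the following base R sketch keeps only calls that fall inside a set of target regions. It is a simplified stand-in and not the code used by filterTarget: the objects target, calls and on_target are hypothetical, and region boundaries are assumed to be 1-based and inclusive.

```r
# Hypothetical target regions in a BED-like layout: chromosome, start, end
target <- data.frame(chr = c("2", "17"),
                     start = c(25469500L, 7579470L),
                     end = c(25469510L, 7579475L),
                     stringsAsFactors = FALSE)

# Hypothetical raw calls from one tool
calls <- data.frame(chr = c("2", "2", "17", "21"),
                    pos = c(25469502L, 25470000L, 7579472L, 36164405L),
                    ref = c("C", "A", "G", "A"),
                    alt = c("T", "G", "C", "G"),
                    stringsAsFactors = FALSE)

# A call is on target if its position lies within any region
# on the same chromosome (boundaries assumed inclusive)
on_target <- vapply(seq_len(nrow(calls)), function(i) {
  any(target$chr == calls$chr[i] &
      target$start <= calls$pos[i] &
      calls$pos[i] <= target$end)
}, logical(1))

target_filtered <- calls[on_target, ]
target_filtered
```

For real data, an interval-overlap approach based on GenomicRanges (which the package imports) would scale better than this row-wise check.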

Author(s)

Sarah Sandmann

Maintainer: Sarah Sandmann <sarah.sandmann@uni-muenster.de>

References

More information on appreci8 can be found in our Bioinformatics paper: appreci8: A Pipeline for Precise Variant Calling Integrating 8 Tools https://doi.org/10.1093/bioinformatics/bty518.

More information on the performance of eight commonly used variant calling tools can be found in our Scientific Reports paper: Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data https://www.nature.com/articles/srep43169

See Also

appreci8Rshiny, filterTarget, normalize, annotate, combineOutput, evaluateCovAndBQ, determineCharacteristics, finalFiltration

Examples

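library(appreci8R)

# Folder for output files (left empty in this example)
output_folder <- ""

# Target regions; bedFileWithTargetRegions is a placeholder for the user's own object
target <- bedFileWithTargetRegions
targetFiltered <- list()

# Step 1: exclude off-target calls, one list element per caller
# GATK: VCF input
caller_folder <- "/test/gatk/"
targetFiltered[[1]] <- filterTarget(output_folder, "GATK", caller_folder,
                                    ".rawMutations", ".vcf", TRUE, "", "")

# VarScan: TXT input with file name suffixes "_snvs" and "_indels"
caller_folder <- "/test/varscan/"
targetFiltered[[2]] <- filterTarget(output_folder, "VarScan", caller_folder,
                                    "", ".txt", FALSE, "_snvs", "_indels", 1,
                                    2, 3, 4)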

# Step 2: normalize the target-filtered calls of each caller
normalized <- list()
normalized[[1]] <- normalize(output_folder, "GATK", targetFiltered[[1]], FALSE,
                             FALSE)
normalized[[2]] <- normalize(output_folder, "VarScan", targetFiltered[[2]], TRUE,
                             FALSE)

# Step 3: annotate calls and keep only the locations and consequences of interest
annotated <- list()
annotated[[1]] <- annotate(output_folder, "GATK", normalized[[1]],
                           locations = c("coding", "spliceSite"),
                           consequences = c("nonsynonymous", "frameshift", "nonsense"))
annotated[[2]] <- annotate(output_folder, "VarScan", normalized[[2]],
                           locations = c("coding", "spliceSite"),
                           consequences = c("nonsynonymous", "frameshift", "nonsense"))

# Step 4: combine the output of the different callers
combined <- combineOutput(output_folder, c("GATK", "VarScan"), annotated)

# Step 5: evaluate coverage and base quality using the alignment files
bam_folder <- "/test/alignment/"
filtered <- evaluateCovAndBQ(output_folder, combined, bam_folder)

# Step 6: determine characteristics (database check-ups, impact prediction)
databases <- determineCharacteristics(output_folder, filtered,
                                      predict = "Provean")

# Step 7: final filtration according to the appreci8-algorithm
final <- finalFiltration(output_folder, frequency_calls = filtered,
                         database_calls = databases, combined_calls = combined,
                         damaging_safe = -3, tolerated_safe = -1.5, primer = NA,
                         hotspots = NA, overlapTools = c("VarScan"))


[Package appreci8R version 1.12.0 Index]