1 Genotype cancer hotspots

cancerhotspots allows rapid genotyping of known somatic hotspots from tumor BAM files. It gives a quick overview of 3,181 known somatic hotspots in a matter of minutes, without spending hours on variant calling and annotation. Put simply, it fetches nucleotide frequencies at known somatic hotspots and prioritizes them by allele frequency.

The output includes a browsable HTML file with the variants passing the VAF and read-depth filters, and a TSV file with nucleotide counts for all variants analyzed.

library(maftools)
can_hs_tbl = maftools::cancerhotspots(
  bam = "Tumor.bam",
  refbuild = "GRCh37",
  mapq = 10,
  sam_flag = 1024
)
Input BAM file           :  Tumor.bam
Variants                 :  cancerhotspots_v2_GRCh37.tsv
VAF filter               :  0.050
min reads for t_allele   :  8
MAPQ filter              :  10
FLAG filter              :  1024
Coverage filter          :  30
HTSlib version           :  1.7

Processed 1000 entries..
Processed 2000 entries..
Processed 3000 entries..
Done!

Summary:
Total variants processed :  3181
Variants > 0.05 threshold:  3
Avg. depth of coverage   :  83.02
Output html report       :  Tumor.html
Output TSV file          :  Tumor.tsv

The above command generates an HTML report and a TSV file with the read counts.
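The `mapq = 10` and `sam_flag = 1024` arguments decide which reads contribute to the counts. In the SAM specification, flag bit 1024 (0x400) marks PCR or optical duplicates. The sketch below illustrates, in base R, how such a filter can be expressed. It assumes that reads carrying any of the filter bits, or with a mapping quality below the cutoff, are skipped; `keep_read()` is a hypothetical helper written for illustration and is not part of maftools.

```r
# Hypothetical helper (not a maftools function): TRUE if a read passes
# the MAPQ and FLAG filters. Assumes reads with any bit of `sam_flag`
# set are excluded, and reads with MAPQ below `min_mapq` are excluded.
keep_read <- function(flag, mapq, sam_flag = 1024L, min_mapq = 10L) {
  flag <- as.integer(flag)
  mask <- rep_len(as.integer(sam_flag), length(flag))
  bitwAnd(flag, mask) == 0L & mapq >= min_mapq
}

# Toy reads with made-up values:
#   flag 99   = paired, properly paired, mate reverse, first in pair
#   flag 1123 = 99 + 1024, the same kind of read marked as a duplicate
#   flag 147  = paired, properly paired, read reverse, second in pair
flags <- c(99L, 1123L, 147L)
mapqs <- c(60L, 60L, 5L)

kept <- keep_read(flags, mapqs)
kept
```

Here only the first read is kept: the second is a flagged duplicate and the third falls below the MAPQ cutoff.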

head(can_hs_tbl)

#         loci fa_ref NT_change Hugo_Symbol Variant_Classification AA_change              Meta VAF A   T  G  C Ins Del
# 1: 1:2491289     NA       G>A    TNFRSF14      Missense_Mutation     C111Y    deleterious(0)   0 0   0 21  0   0   0
# 2: 1:2491290     NA       C>G    TNFRSF14      Missense_Mutation     C111W    deleterious(0)   0 0   0  0 21   0   0
# 3: 1:8073432     NA       T>G      ERRFI1      Missense_Mutation     K409N    deleterious(0)   0 1  64  0  1   0   0
# 4: 1:8073434     NA       T>G      ERRFI1      Missense_Mutation     K409Q deleterious(0.04)   0 0  63  0  0   0   0
# 5: 1:8074313     NA       T>A      ERRFI1      Nonsense_Mutation     K116*                     0 0 106  0  0   0   0
# 6: 1:9779982     NA       T>C      PIK3CD      Missense_Mutation     C416R   tolerated(0.26)   0 1  18  0  0   0   0
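
To make the relationship between the nucleotide counts and the reported filters concrete, here is a minimal sketch in base R. It assumes that VAF is the count of the alternate allele (the base after `>` in `NT_change`) divided by the summed A/T/G/C depth, and it reuses the thresholds printed in the log above (VAF 0.05, at least 8 reads for the tumor allele, coverage of 30). This is an illustration under those assumptions, not necessarily how maftools computes VAF internally. The first three rows are copied from the output above; the last row is made up so that one variant passes.

```r
# Counts from the table above, plus one made-up row ("toy:1")
counts <- data.frame(
  loci      = c("1:2491289", "1:8073432", "1:9779982", "toy:1"),
  NT_change = c("G>A", "T>G", "T>C", "C>T"),
  A = c(0, 1, 1, 0),
  T = c(0, 64, 18, 12),
  G = c(21, 0, 0, 0),
  C = c(0, 1, 0, 48),
  stringsAsFactors = FALSE
)

bases <- c("A", "T", "G", "C")
m <- as.matrix(counts[, bases])

# Alternate allele is the base after ">" in NT_change
alt <- sub(".*>", "", counts$NT_change)

# Pick, for each row, the count in the column of its alternate allele
alt_n <- m[cbind(seq_len(nrow(m)), match(alt, bases))]
depth <- unname(rowSums(m))  # indels are ignored in this sketch

counts$t_reads <- alt_n
counts$depth   <- depth
counts$VAF     <- alt_n / depth

# Thresholds taken from the log: VAF, min reads for t_allele, coverage
counts$pass <- counts$VAF > 0.05 & counts$t_reads >= 8 & counts$depth >= 30

counts[, c("loci", "NT_change", "t_reads", "depth", "VAF", "pass")]
```

Only the made-up row passes (12 of 60 reads support the alternate allele); the three real rows have no reads supporting the alternate allele, which agrees with the VAF of 0 shown for them above.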

The TSV files generated by cancerhotspots() can be aggregated and converted into a MAF with the function cancerhotspotsAggr().
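
A possible way to collect several per-sample TSV files and hand them to cancerhotspotsAggr() is sketched below. The directory, the file pattern, and the assumption that the function takes a character vector of paths through a `files` argument are not stated in this document; check `?cancerhotspotsAggr` before relying on it.

```r
# Collect per-sample TSV files written by cancerhotspots().
# The directory and pattern are assumptions; adjust them so that only
# cancerhotspots() output is matched.
tsv_files <- list.files(path = ".", pattern = "\\.tsv$", full.names = TRUE)

if (length(tsv_files) > 0 && requireNamespace("maftools", quietly = TRUE)) {
  # Assumed interface: `files` is a character vector of TSV paths
  hs_maf <- maftools::cancerhotspotsAggr(files = tsv_files)
}
```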

A CLI version of cancerhotspots is also available.

2 Fetch readcounts for targeted loci

The bamreadcounts function extracts the A/T/G/C nucleotide distribution at targeted loci from BAM files. The function name is an homage to the bam-readcount tool; in addition, it supports INDELs.

# Generate sample loci: the first two columns must contain chromosome name and position
loci = data.table::data.table(chr = c("seq1", "seq2"), pos = c(1340, 1483))
loci
##       chr   pos
##    <char> <num>
## 1:   seq1  1340
## 2:   seq2  1483

Get nucleotide frequency from BAM files

#Example BAM file from Rsamtools package
# By default, positions are assumed to be in a 1-based coordinate system
bamfile = system.file("extdata", "ex1.bam", package = "Rsamtools") 
loci_rc = maftools::bamreadcounts(bam = bamfile, loci = loci) 

loci_rc
# $ex1
#         loci fa_ref A  T G  C Ins Del
# 1: seq1:1340     NA 1  0 0 62   0   0
# 2: seq2:1483     NA 0 13 0  0   0   0
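
The raw counts are often easier to read as depth and per-base fractions. The following base R sketch rebuilds the small table printed above by hand (so it runs without a BAM file) and derives those values; the column names `depth` and `major` are introduced here for illustration and are not produced by bamreadcounts.

```r
# Counts copied from the bamreadcounts() output above
rc <- data.frame(
  loci = c("seq1:1340", "seq2:1483"),
  A = c(1, 0),
  T = c(0, 13),
  G = c(0, 0),
  C = c(62, 0),
  stringsAsFactors = FALSE
)

bases <- c("A", "T", "G", "C")
m <- as.matrix(rc[, bases])

# Total A/T/G/C depth per locus (insertions and deletions not included)
rc$depth <- unname(rowSums(m))

# Per-base fraction at each locus; each row is divided by its own depth
frac <- round(m / rc$depth, 3)

# Most frequent base per locus
rc$major <- bases[max.col(m, ties.method = "first")]

cbind(rc[, c("loci", "depth", "major")], frac)
```

For these two loci the depths are 63 and 13, with C and T as the respective major bases.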

sessionInfo()
## R version 4.4.0 beta (2024-04-15 r86425)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] maftools_2.20.0
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.35      RColorBrewer_1.1-3 R6_2.5.1           fastmap_1.1.1     
##  [5] Matrix_1.7-0       xfun_0.43          lattice_0.22-6     splines_4.4.0     
##  [9] cachem_1.0.8       knitr_1.46         htmltools_0.5.8.1  rmarkdown_2.26    
## [13] lifecycle_1.0.4    cli_3.6.2          grid_4.4.0         sass_0.4.9        
## [17] data.table_1.15.4  jquerylib_0.1.4    compiler_4.4.0     tools_4.4.0       
## [21] evaluate_0.23      bslib_0.7.0        survival_3.6-4     yaml_2.3.8        
## [25] DNAcopy_1.78.0     rlang_1.1.3        jsonlite_1.8.8