library(maftools)
Note: This vignette was not evaluated.
maftools
provides a set of functions to facilitate copy
number analysis using ASCAT for tumor-normal or
tumor-only WGS datasets. Although there exists ascatNgs, it requires
the installation of Perl and C modules to fetch the read counts across
the markers. maftools
bypass these requirements entirely
within R with the C code baked in. However, maftools
only
generates the required read counts, BAF, and logR files. Downstream
analyses have to be done with ASCAT.
ASCAT is not available on CRAN or Bioconductor and needs to be installed from GitHub
::install_github(repo = 'VanLoo-lab/ascat/ASCAT') remotes
If you use maftools
functions for CNV analysis, please
cite the ASCAT publication
Van Loo P, Nordgard SH, Lingjærde OC, et al. Allele-specific copy number analysis of tumors. Proc Natl Acad Sci U S A. 2010;107(39):16910-16915. doi:10.1073/pnas.1009843107 |
Below command will generate two tsv files
tumor_nucleotide_counts.tsv
and
normal_nucleotide_counts.tsv
that can be used for
downstream analysis. Note that the function will process ~900K SNPs from
Affymetrix
Genome-Wide Human SNP 6.0 Array. The process can be sped up linearly
by increasing nthreads
which will launch each chromosome on
a separate thread. Currently hg19
and hg38
are
supported.
#Matched normal BAM files are strongly recommended
= maftools::gtMarkers(t_bam = "tumor.bam",
counts n_bam = "normal.bam",
build = "hg19")
prepAscat()
Below command takes tumor_nucleotide_counts.tsv
and
normal_nucleotide_counts.tsv
files, filter SNPs with low
coverage (default <15), estimate BAF, logR, and generates the input
files for ASCAT.
library(ASCAT)
= maftools::prepAscat(t_counts = "tumor_nucleotide_counts.tsv",
ascat.bc n_counts = "normal_nucleotide_counts.tsv",
sample_name = "tumor")
# Library sizes:
# Tumor: 1830168947
# Normal: 1321201848
# Library size difference: 1.385
# ------
# Counts file: tumor_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928607
# ------
# Counts file: normal_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928311
# ------
# Final number SNPs: 928107
# Generated following files:
# tumor_nucleotide_counts.tumour.BAF.txt
# tumor_nucleotide_counts.tumour.logR.txt
# tumor_nucleotide_counts.normal.BAF.txt
# tumor_nucleotide_counts.normal.logR.txt
# ------
Generated BAF and logR files can be processed with ASCAT functions. The below code chunk shows minimal usage with ASCAT. See here for further workflow examples.
= ASCAT::ascat.loadData(
ascat.bc Tumor_LogR_file = "tumor_nucleotide_counts.tumour.logR.txt",
Tumor_BAF_file = "tumor_nucleotide_counts.tumour.BAF.txt",
Germline_LogR_file = "tumor_nucleotide_counts.normal.logR.txt",
Germline_BAF_file = "tumor_nucleotide_counts.normal.BAF.txt",
chrs = c(1:22, "X", "Y"),
sexchromosomes = c("X", "Y")
)
::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor")
ASCAT= ASCAT::ascat.aspcf(ascat.bc)
ascat.bc ::ascat.plotSegmentedData(ascat.bc)
ASCAT= ASCAT::ascat.runAscat(ascat.bc) ascat.output
In tumor-only mode, read counts are normalized for median depth of coverage across autosomes.
= maftools::prepAscat_t(t_counts = "tumor_nucleotide_counts.tsv", sample_name = "tumor_only")
ascat.bc
# Library sizes:
# Tumor: 1830168947
# Counts file: tumor_nucleotide_counts.tsv
# Markers: 932148
# Removed 2982 duplicated loci
# Markers > 15: 928607
# Median depth of coverage (autosomes): 76
# ------
# Generated following files:
# tumor_only.tumour.BAF.txt
# tumor_only.tumour.logR.txt
# ------
The output logR and BAF files can be processed with ASCAT without matched normal data protocol:
= ASCAT::ascat.loadData(
ascat.bc Tumor_LogR_file = "tumor_only.tumour.logR.txt",
Tumor_BAF_file = "tumor_only.tumour.BAF.txt",
chrs = c(1:22, "X", "Y"),
sexchromosomes = c("X", "Y")
)
::ascat.plotRawData(ASCATobj = ascat.bc, img.prefix = "tumor_only")
ASCAT= ASCAT::ascat.predictGermlineGenotypes(ascat.bc)
ascat.gg = ASCAT::ascat.aspcf(ascat.bc, ascat.gg=ascat.gg)
ascat.bc ::ascat.plotSegmentedData(ascat.bc)
ASCAT= ASCAT::ascat.runAscat(ascat.bc) ascat.output
Alternatively, tumor logR files generated by
prepAscat()
/prepAscat_t()
can be processed
with segmentLogR()
function that performs circular binary
segmentation and returns the DNAcopy
object.
::segmentLogR(tumor_logR = "tumor.tumour.logR.txt", sample_name = "tumor")
maftools
# Analyzing: tumor
# current chromosome: 1
# current chromosome: 2
# current chromosome: 3
# current chromosome: 4
# current chromosome: 5
# current chromosome: 6
# current chromosome: 7
# current chromosome: 8
# current chromosome: 9
# current chromosome: 10
# current chromosome: 11
# current chromosome: 12
# current chromosome: 13
# current chromosome: 14
# current chromosome: 15
# current chromosome: 16
# current chromosome: 17
# current chromosome: 18
# current chromosome: 19
# current chromosome: 20
# current chromosome: 21
# current chromosome: 22
# current chromosome: MT
# current chromosome: X
# current chromosome: Y
# Segments are written to: tumor_only.tumour_cbs.seg
# Segments are plotted to: tumor_only.tumour_cbs.png
Mosdepth offers the
fastest way to estimate coverage metrics from WGS bam files. Output
generated by mosdepth can be processed with maftools function
plotMosdepth
and plotMosdepth_t
for CNV
analysis by performing segmentation and plotting.
Below mosdepth
command generates
tumor.regions.bed.gz
and normal.regions.bed.gz
that contains depth of coverage across the genome in fixed windows.
mosdepth -n -b 5000 tumor tumor.bam
mosdepth -n -b 5000 normal normal.bam
The output {prefix}.regions.bed.gz
can be imported and
analyzed with maftools
in tumor/normal or tumor only
mode.
If you use the functions for CNV analysis, please cite the mosdepth publication
Pedersen BS, Quinlan AR. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics. 2018;34(5):867-868. doi:10.1093/bioinformatics/btx699 |
plotMosdepth(
t_bed = "tumor.regions.bed.gz",
n_bed = "normal.regions.bed.gz",
segment = TRUE,
sample_name = "tumor"
)
# Coverage ratio T/N: 1.821
# Running CBS segmentation:
# Analyzing: tumor01
# current chromosome: 1
# current chromosome: 2
# current chromosome: 3
# current chromosome: 4
# current chromosome: 5
# current chromosome: 6
# current chromosome: 7
# current chromosome: 8
# current chromosome: 9
# current chromosome: 10
# current chromosome: 11
# current chromosome: 12
# current chromosome: 13
# current chromosome: 14
# current chromosome: 15
# current chromosome: 16
# current chromosome: 17
# current chromosome: 18
# current chromosome: 19
# current chromosome: 20
# current chromosome: 21
# current chromosome: 22
# current chromosome: X
# current chromosome: Y
# Segments are written to: tumor01_cbs.seg
# Plotting
Above tumor sample without the germline control, normalized for median depth of coverage
plotMosdepth_t(bed = "tumor.regions.bed.gz")
sessionInfo()
#> R version 4.2.1 Patched (2022-07-09 r82577)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_GB/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] maftools_2.14.0
#>
#> loaded via a namespace (and not attached):
#> [1] DNAcopy_1.72.0 knitr_1.40 magrittr_2.0.3 splines_4.2.1
#> [5] lattice_0.20-45 R6_2.5.1 rlang_1.0.6 fastmap_1.1.0
#> [9] stringr_1.4.1 tools_4.2.1 grid_4.2.1 data.table_1.14.4
#> [13] xfun_0.34 cli_3.4.1 jquerylib_0.1.4 htmltools_0.5.3
#> [17] yaml_2.3.6 survival_3.4-0 digest_0.6.30 Matrix_1.5-1
#> [21] RColorBrewer_1.1-3 sass_0.4.2 cachem_1.0.6 evaluate_0.17
#> [25] rmarkdown_2.17 stringi_1.7.8 compiler_4.2.1 bslib_0.4.0
#> [29] jsonlite_1.8.3