This vignette shows how to process long-read PacBio HiFi variant calls from a validated trio (HG002–HG003–HG004) and prepare them for UPDhmm analysis.
The data are the Ashkenazi trio from Genome in a Bottle (GIAB, NIST): PacBio HiFi Revio sequencing with phased DeepVariant calls on GRCh38.
# Proband (HG002)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG002.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG002.GRCh38.deepvariant.phased.vcf.gz.tbi
# Father (HG003)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG003.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG003.GRCh38.deepvariant.phased.vcf.gz.tbi
# Mother (HG004)
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG004.GRCh38.deepvariant.phased.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031/HG004.GRCh38.deepvariant.phased.vcf.gz.tbi
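The six downloads above differ only in the sample ID and the index suffix, so the URLs can also be generated in a loop. The following is a minimal sketch of that idea: the base URL and file names are copied from the commands above, while the function name `giab_trio_urls` is hypothetical and introduced only for illustration.

```shell
# Base directory of the GIAB PacBio HiFi Revio release used above.
GIAB_BASE="ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/PacBio_HiFi-Revio_20231031/pacbio-wgs-wdl_germline_20231031"

# giab_trio_urls (hypothetical helper): print one URL per line for each
# VCF and its tabix index, in the order proband, father, mother.
giab_trio_urls() {
  for sample in HG002 HG003 HG004; do
    for suffix in vcf.gz vcf.gz.tbi; do
      printf '%s/%s.GRCh38.deepvariant.phased.%s\n' "$GIAB_BASE" "$sample" "$suffix"
    done
  done
}

# List the URLs without downloading anything.
giab_trio_urls
```

To fetch the files, the list can be piped to wget, for example `giab_trio_urls | xargs -n 1 wget -c`, where `-c` resumes a partially downloaded file.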
# Merge the three per-sample VCFs into one multi-sample VCF (inputs must be indexed)
bcftools merge \
-O z \
-o trio_HiFi_GRCh38_phased.vcf.gz \
HG002.GRCh38.deepvariant.phased.vcf.gz \
HG003.GRCh38.deepvariant.phased.vcf.gz \
HG004.GRCh38.deepvariant.phased.vcf.gz
bcftools index trio_HiFi_GRCh38_phased.vcf.gz
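The R step later in this vignette refers to the samples as `HG002`, `HG003`, and `HG004`, so it can be worth confirming that the merged VCF carries exactly those sample IDs. The sketch below reads them from the `#CHROM` header line; the helper name `vcf_samples` is hypothetical, and the check assumes the merged file sits in the current directory.

```shell
# vcf_samples (hypothetical helper): print the sample IDs of a gzip- or
# bgzip-compressed VCF, one per line, by reading the #CHROM header line.
vcf_samples() {
  gzip -dc "$1" | awk -F '\t' '
    /^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Only run the check if the merged file from the step above is present.
if [ -f trio_HiFi_GRCh38_phased.vcf.gz ]; then
  vcf_samples trio_HiFi_GRCh38_phased.vcf.gz
fi
```

`bcftools query -l trio_HiFi_GRCh38_phased.vcf.gz` reports the same list. If the IDs differ from the ones expected, the names passed to `vcfCheck()` would need to be adjusted to match.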
The following filtering steps are applied:
- keep only biallelic variants
- remove sites where all trio members are reference (0/0 or 0|0)
- remove sites where all trio members are missing (./. or .|.)
bcftools view \
-m2 -M2 \
-e 'COUNT(GT="0/0" || GT="0|0")==3 || COUNT(GT="./." || GT=".|.")==3' \
-O z \
-o trio_HiFi_GRCh38_phased_biallelic_nonref_nomissing.vcf.gz \
trio_HiFi_GRCh38_phased.vcf.gz
bcftools index trio_HiFi_GRCh38_phased_biallelic_nonref_nomissing.vcf.gz
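To make the effect of the three filtering rules concrete, the sketch below applies them to five invented records using awk. It illustrates the rules as listed above and is not a replacement for `bcftools view`; the genotypes are made up, and the helper names `toy_records` and `trio_filter` are hypothetical.

```shell
# toy_records (hypothetical): five header-less records in VCF column order
# (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT + proband, father, mother).
toy_records() {
  printf 'chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t0/1\n'    # informative: kept
  printf 'chr1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0|0\t0/0\n'    # all reference: dropped
  printf 'chr1\t300\t.\tG\tA\t.\t.\t.\tGT\t./.\t.|.\t./.\n'    # all missing: dropped
  printf 'chr1\t400\t.\tT\tC,G\t.\t.\t.\tGT\t1/2\t0/1\t0/2\n'  # multiallelic: dropped
  printf 'chr1\t500\t.\tA\tC\t.\t.\t.\tGT\t0/0\t./.\t0/0\n'    # mixed ref/missing: kept
}

# trio_filter (hypothetical): apply the three rules to records on stdin.
trio_filter() {
  awk -F '\t' '
    $5 ~ /,/ || $5 == "." { next }         # rule 1: exactly one ALT allele
    {
      ref = 0; mis = 0
      for (i = 10; i <= 12; i++) {
        split($i, f, ":")                  # GT is the first FORMAT field
        if (f[1] == "0/0" || f[1] == "0|0") ref++
        if (f[1] == "./." || f[1] == ".|.") mis++
      }
      if (ref == 3 || mis == 3) next       # rules 2 and 3
      print
    }'
}

# Print the positions that survive the filter: 100 and 500.
toy_records | trio_filter | cut -f 2
```

Note that the record at position 500 is retained: two reference calls plus one missing call satisfy neither the "all reference" nor the "all missing" rule.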
library(UPDhmm)
library(VariantAnnotation)
vcf <- readVcf(
"trio_HiFi_GRCh38_phased_biallelic_nonref_nomissing.vcf.gz"
)
vcf_check <- vcfCheck(
vcf,
proband = "HG002",
father = "HG003",
mother = "HG004"
)
events <- calculateEvents(
vcf_check,
add_ratios = TRUE
)
sessionInfo()
## R version 4.6.0 Patched (2026-05-01 r89994)
## Platform: aarch64-apple-darwin23
## Running under: macOS Tahoe 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] karyoploteR_1.39.0 regioneR_1.45.0
## [3] VariantAnnotation_1.59.0 Rsamtools_2.29.0
## [5] Biostrings_2.81.2 XVector_0.53.0
## [7] SummarizedExperiment_1.43.0 Biobase_2.73.1
## [9] GenomicRanges_1.65.0 IRanges_2.47.2
## [11] S4Vectors_0.51.3 Seqinfo_1.3.0
## [13] MatrixGenerics_1.25.0 matrixStats_1.5.0
## [15] BiocGenerics_0.59.6 generics_0.1.4
## [17] dplyr_1.2.1 UPDhmm_1.9.0
## [19] BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.3.0 bitops_1.0-9 gridExtra_2.3
## [4] rlang_1.2.0 magrittr_2.0.5 biovizBase_1.61.0
## [7] otel_0.2.0 compiler_4.6.0 RSQLite_3.53.1
## [10] GenomicFeatures_1.65.0 png_0.1-9 vctrs_0.7.3
## [13] ProtGenerics_1.45.0 stringr_1.6.0 pkgconfig_2.0.3
## [16] crayon_1.5.3 fastmap_1.2.0 magick_2.9.1
## [19] backports_1.5.1 rmarkdown_2.31 UCSC.utils_1.9.0
## [22] tinytex_0.59 bit_4.6.0 xfun_0.58
## [25] cachem_1.1.0 cigarillo_1.3.0 GenomeInfoDb_1.49.1
## [28] jsonlite_2.0.0 blob_1.3.0 DelayedArray_0.39.3
## [31] BiocParallel_1.47.0 parallel_4.6.0 cluster_2.1.8.2
## [34] R6_2.6.1 stringi_1.8.7 bslib_0.11.0
## [37] RColorBrewer_1.1-3 bezier_1.1.2 rtracklayer_1.73.0
## [40] rpart_4.1.27 jquerylib_0.1.4 Rcpp_1.1.1-1.1
## [43] bookdown_0.46 knitr_1.51 base64enc_0.1-6
## [46] BiocBaseUtils_1.15.1 Matrix_1.7-5 nnet_7.3-20
## [49] tidyselect_1.2.1 rstudioapi_0.18.0 dichromat_2.0-0.1
## [52] abind_1.4-8 yaml_2.3.12 codetools_0.2-20
## [55] curl_7.1.0 lattice_0.22-9 tibble_3.3.1
## [58] KEGGREST_1.53.0 S7_0.2.2 evaluate_1.0.5
## [61] foreign_0.8-91 pillar_1.11.1 BiocManager_1.30.27
## [64] checkmate_2.3.4 RCurl_1.98-1.18 ensembldb_2.37.1
## [67] ggplot2_4.0.3 scales_1.4.0 glue_1.8.1
## [70] lazyeval_0.2.3 Hmisc_5.2-5 tools_4.6.0
## [73] BiocIO_1.23.3 data.table_1.18.4 BSgenome_1.81.0
## [76] GenomicAlignments_1.49.0 XML_3.99-0.23 grid_4.6.0
## [79] colorspace_2.1-2 AnnotationDbi_1.75.0 htmlTable_2.5.0
## [82] restfulr_0.0.16 Formula_1.2-5 cli_3.6.6
## [85] HMM_1.0.2 S4Arrays_1.13.0 AnnotationFilter_1.37.0
## [88] gtable_0.3.6 sass_0.4.10 digest_0.6.39
## [91] SparseArray_1.13.2 rjson_0.2.23 htmlwidgets_1.6.4
## [94] farver_2.1.2 memoise_2.0.1 htmltools_0.5.9
## [97] lifecycle_1.0.5 httr_1.4.8 bit64_4.8.2
## [100] bamsignals_1.45.1