1 liftOver of CTCF coordinates

As genome assemblies for model organisms continue to improve, CTCF sites annotated on previous genome assemblies become obsolete. Typically, the genome sequence itself changes little between assemblies, but the genomic coordinates shift. The liftOver method converts genomic coordinates from one genome assembly to another.
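The idea behind coordinate conversion can be illustrated with a toy example. The sketch below is not the liftOver implementation: it assumes a hypothetical list of aligned blocks (`BLOCKS`, with made-up coordinates) on a single chromosome and shifts an interval only when it falls entirely within one block. Real chain files additionally handle strand, multiple chains, and partially mapped intervals, which are ignored here.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical aligned blocks on one chromosome: (old_start, old_end, new_start).
# Coordinates are 0-based, half-open, and made up for illustration only.
BLOCKS: List[Tuple[int, int, int]] = [
    (0, 1000, 0),        # region unchanged between assemblies
    (1000, 5000, 1200),  # 200 bp were inserted upstream in the new assembly
    (5500, 9000, 5700),  # old 5000-5500 has no counterpart in the new assembly
]


def lift_interval(
    start: int, end: int, blocks: List[Tuple[int, int, int]] = BLOCKS
) -> Optional[Tuple[int, int]]:
    """Map [start, end) to the new assembly.

    Returns None when the interval is not fully contained in a single block
    (a simplification; blocks must be sorted by old_start).
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, start) - 1  # last block starting at or before `start`
    if i < 0:
        return None
    old_start, old_end, new_start = blocks[i]
    if end > old_end:
        return None  # interval runs past the block (or lies in a gap)
    offset = new_start - old_start
    return start + offset, end + offset


if __name__ == "__main__":
    for site in [(100, 119), (1500, 1519), (5100, 5119), (6000, 6019)]:
        print(site, "->", lift_interval(*site))
```

Sites that fall in regions without a counterpart in the new assembly return `None` in this sketch, which mirrors the fact that some intervals cannot be converted and are lost during liftOver.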

Some carefully curated CTCF sites are available only for older genome assemblies. Examples include the data from CTCFBSDB, available for the hg18 and mm8 genome assemblies.

To investigate whether liftOver of CTCF sites from older genome assemblies is a viable option, we tested the overlap between CTCF sites detected directly in a given genome assembly and those lifted over from other assemblies. We detected CTCF sites using the MA0139.1 PWM from the JASPAR 2022 database in the hg18, hg19, hg38, and T2T genome assemblies and converted their genomic coordinates using the corresponding liftOver chains (download_liftOver.sh and convert_liftOver.sh scripts). We observed high Jaccard overlap between CTCF sites detected in the original genome assemblies and those lifted over.
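As an illustration of the overlap measure, the following sketch computes a Jaccard index between two sets of genomic intervals. It assumes the index is taken at the base-pair level (length of the intersection divided by length of the union) on a single chromosome; the text does not state exactly how the reported values were computed, and the `native` and `lifted` intervals are invented for the example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: Iterable[Interval]) -> int:
    """Number of distinct base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Base-pair Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = list(a), list(b)
    union = covered_bp(a + b)
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


if __name__ == "__main__":
    # Made-up example: sites detected natively vs. sites lifted over.
    native = [(100, 119), (1700, 1719), (6200, 6219)]
    lifted = [(100, 119), (1700, 1719), (6210, 6229)]
    print(round(jaccard(native, lifted), 3))
```

A value of 1 means the two sets cover exactly the same bases, and a value of 0 means they share none. For genome-wide data, the same calculation would be applied per chromosome and the intersection and union lengths summed before dividing.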

Jaccard overlaps among CTCF binding sites detected in the original and liftOver human genome assemblies. CTCF sites were detected using the JASPAR 2022 MA0139.1 PWM. The correlogram was clustered using Euclidean distance and Ward.D clustering. The white-red gradient indicates low-to-high Jaccard overlaps. Jaccard values are shown in the corresponding cells.

Our results suggest that liftOver is a viable alternative for obtaining CTCF genomic annotations for different genome assemblies. We provide CTCFBSDB data converted to the hg19 and hg38 genome assemblies.

Date the vignette was generated.

#> [1] "2022-07-27 17:26:02 EDT"

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-07-27
#>  pandoc   2.5 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  BiocManager   1.30.18 2022-05-18 [2] CRAN (R 4.2.1)
#>  BiocStyle   * 2.24.0  2022-07-26 [2] Bioconductor
#>  bookdown      0.27    2022-06-14 [2] CRAN (R 4.2.1)
#>  bslib         0.4.0   2022-07-16 [2] CRAN (R 4.2.1)
#>  cachem        1.0.6   2021-08-19 [2] CRAN (R 4.2.1)
#>  cli           3.3.0   2022-04-25 [2] CRAN (R 4.2.1)
#>  digest        0.6.29  2021-12-01 [2] CRAN (R 4.2.1)
#>  evaluate      0.15    2022-02-18 [2] CRAN (R 4.2.1)
#>  fastmap       1.1.0   2021-01-25 [2] CRAN (R 4.2.1)
#>  highr         0.9     2021-04-16 [2] CRAN (R 4.2.1)
#>  htmltools     0.5.3   2022-07-18 [2] CRAN (R 4.2.1)
#>  jquerylib     0.1.4   2021-04-26 [2] CRAN (R 4.2.1)
#>  jsonlite      1.8.0   2022-02-22 [2] CRAN (R 4.2.1)
#>  knitr         1.39    2022-04-26 [2] CRAN (R 4.2.1)
#>  magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.2.1)
#>  R6            2.5.1   2021-08-19 [2] CRAN (R 4.2.1)
#>  rlang         1.0.4   2022-07-12 [2] CRAN (R 4.2.1)
#>  rmarkdown     2.14    2022-04-25 [2] CRAN (R 4.2.1)
#>  sass          0.4.2   2022-07-16 [2] CRAN (R 4.2.1)
#>  sessioninfo * 1.2.2   2021-12-06 [2] CRAN (R 4.2.1)
#>  stringi       1.7.8   2022-07-11 [2] CRAN (R 4.2.1)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.2.1)
#>  xfun          0.31    2022-05-10 [2] CRAN (R 4.2.1)
#>  yaml          2.3.5   2022-02-21 [2] CRAN (R 4.2.1)
#> 
#>  [1] /tmp/RtmpnQOK6E/Rinst2409be7b9e03ea
#>  [2] /home/biocbuild/bbs-3.15-bioc/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────