As genome assemblies for model organisms continue to improve, CTCF sites annotated on previous genome assemblies become obsolete. Typically, the underlying genome sequence changes little between assemblies, but the genomic coordinates of the same sequence shift. The liftOver method converts genomic coordinates from one genome assembly to another.
Some carefully curated CTCF sites are available only for older genome assemblies. Examples include the data from CTCFBSDB, available for hg18 and mm8 genome assemblies.
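To make the idea of coordinate conversion concrete, here is a minimal sketch of how an interval can be shifted from an old assembly to a new one. It is not the liftOver implementation: the `AlignedBlock` type, the `lift_interval` function, and all coordinates are hypothetical and chosen for illustration. Real chain files also encode strand and chromosome changes, and liftOver can map intervals that span small gaps, all of which this toy version ignores.

```python
from typing import List, NamedTuple, Optional, Tuple


class AlignedBlock(NamedTuple):
    """A gapless stretch of sequence shared by the old and new assembly (hypothetical type)."""
    chrom: str
    old_start: int  # 0-based start in the old assembly
    new_start: int  # 0-based start in the new assembly
    size: int       # block length in bp


def lift_interval(
    chrom: str, start: int, end: int, blocks: List[AlignedBlock]
) -> Optional[Tuple[str, int, int]]:
    """Shift a half-open interval [start, end) into new-assembly coordinates.

    Returns None when the interval is not fully contained in a single block;
    this toy version simply drops such intervals.
    """
    for block in blocks:
        block_end = block.old_start + block.size
        if block.chrom == chrom and block.old_start <= start and end <= block_end:
            offset = block.new_start - block.old_start
            return (chrom, start + offset, end + offset)
    return None


# Toy alignment: 200 bp were inserted in the new assembly after position 1000.
blocks = [
    AlignedBlock("chr1", old_start=0, new_start=0, size=1000),
    AlignedBlock("chr1", old_start=1000, new_start=1200, size=500),
]

lifted = lift_interval("chr1", 1100, 1119, blocks)
unmapped = lift_interval("chr1", 990, 1010, blocks)  # spans the insertion point
```

In this sketch the sequence of the site is unchanged; only its position moves by the block offset, which is why sites from older assemblies can often be reused after conversion.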
To investigate whether liftOver of CTCF sites from older genome assemblies is a viable option, we tested for overlap between CTCF sites detected directly in a given genome assembly and those lifted over from other assemblies. We detected CTCF sites using the MA0139.1 PWM from the JASPAR 2022 database in the hg18, hg19, hg38, and T2T genome assemblies and converted their genomic coordinates using the corresponding liftOver chains (download_liftOver.sh and convert_liftOver.sh scripts). We observed high Jaccard overlap among the CTCF site sets, whether detected in the original genome assemblies or lifted over.
Jaccard overlaps among CTCF binding sites detected in the original and liftOver human genome assemblies. CTCF sites were detected using the JASPAR 2022 MA0139.1 PWM. The correlogram was clustered using Euclidean distance and Ward.D clustering. The white-to-red gradient indicates low-to-high Jaccard overlap. Jaccard values are shown in the corresponding cells.
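The Jaccard overlap between two sets of genomic intervals can be sketched as follows. This is a minimal illustration that assumes a base-pair-level definition (shared bp divided by bp covered by either set); the text does not state which tool or exact definition was used, and the function names and coordinates below are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(merged: List[Interval]) -> int:
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for _, start, end in merged)


def intersection_bp(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a == chrom_b:
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance whichever interval finishes first in (chrom, end) order.
        if (chrom_a, end_a) <= (chrom_b, end_b):
            i += 1
        else:
            j += 1
    return total


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    merged_a, merged_b = merge_intervals(a), merge_intervals(b)
    shared = intersection_bp(merged_a, merged_b)
    union = covered_bp(merged_a) + covered_bp(merged_b) - shared
    return shared / union if union else 0.0


# Toy example: sites detected directly vs. sites lifted over (hypothetical coordinates).
detected = [("chr1", 100, 119), ("chr1", 500, 519)]
lifted_over = [("chr1", 100, 119), ("chr1", 510, 529)]
score = jaccard(detected, lifted_over)
```

A value of 1 means the two site sets cover exactly the same bases, and 0 means they share none, so values close to 1 indicate that lifted-over sites closely match those detected directly.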