How to Use the Fit-Hi-C R Package

Ruyu Tan

2016-10-17

Introduction

Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide chromosome conformation assays such as Hi-C. Compared to the original Python implementation, the Fit-Hi-C R port has the following advantages:

Install FitHiC

To install this package, start R and enter:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("FitHiC")

There are two ways to install the development version. The first uses biocLite from within R:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("BiocInstaller")
useDevel()
biocLite("FitHiC")
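Alternatively, download the source tarball from the shell and install it (replace x.y.z with the version number):

wget http://bioconductor.org/packages/devel/bioc/src/contrib/FitHiC_x.y.z.tar.gz
R CMD INSTALL FitHiC_x.y.z.tar.gz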

Prepare Data

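Before running Fit-Hi-C, prepare the following two input files. FRAGSFILE lists one fragment per line: the chromosome name, an extra field that is not used (Column.2), the fragment midpoint, the fragment's total contact count (Hit.Count), and a flag that is 1 if the fragment is mappable and 0 otherwise (Column.5). INTERSFILE lists one pair of fragments per line: the chromosome name and midpoint of each fragment, followed by the number of contacts observed between them.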

FRAGSFILE
Chromosome.Name Column.2 Mid.Point Hit.Count Column.5
1 0 1305 0 0
1 0 2635 233 1
1 0 4756 876 1
1 0 8568 1076 1
1 0 10384 1210 1
1 0 12246 639 1
INTERSFILE
Chromosome1.Name Mid.Point.1 Chromosome2.Name Mid.Point.2 Hit.Count
10 100894 10 150593 2
10 100894 10 162267 1
10 100894 10 169783 2
10 100894 10 179515 3
10 100894 10 182528 1
10 100894 10 185071 1

In addition, you must specify OUTDIR, the directory where the output files will be written.
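As an illustration, the base R sketch below writes tiny versions of FRAGSFILE and INTERSFILE using the first rows of the example tables above, gzip-compressed like the bundled example files, and creates an output directory. It assumes the inputs are whitespace-delimited with no header row; the helper write_fithic_input() and the file names are hypothetical. The rows are copied only to show the layout and do not form a consistent dataset (the fragments are on chromosome 1, the interactions on chromosome 10).

```r
# Minimal sketch (assumption): Fit-Hi-C inputs are whitespace-delimited,
# header-less text files, optionally gzip-compressed.

# Hypothetical helper: write a data frame as a gzipped, tab-delimited file
write_fithic_input <- function(df, path) {
  con <- gzfile(path, open = "w")
  on.exit(close(con))
  write.table(df, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(path)
}

# First rows of the FRAGSFILE example
frags <- data.frame(
  chr      = c(1, 1, 1),
  col2     = c(0, 0, 0),          # extra field, not used
  midPoint = c(1305, 2635, 4756),
  hitCount = c(0, 233, 876),
  col5     = c(0, 1, 1)           # mappability flag
)

# First rows of the INTERSFILE example
inters <- data.frame(
  chr1     = c(10, 10, 10),
  mid1     = c(100894, 100894, 100894),
  chr2     = c(10, 10, 10),
  mid2     = c(150593, 162267, 169783),
  hitCount = c(2, 1, 2)
)

workdir    <- tempdir()
fragsfile  <- file.path(workdir, "toy_fragments.gz")
intersfile <- file.path(workdir, "toy_interactions.gz")
outdir     <- file.path(workdir, "fithic_out")

write_fithic_input(frags, fragsfile)
write_fithic_input(inters, intersfile)
dir.create(outdir, showWarnings = FALSE, recursive = TRUE)

# Read the fragments file back to confirm the layout
check <- read.table(gzfile(fragsfile))
print(check)
```

Writing through gzfile() keeps the toy files in the same compressed form as the example data, so the same read.table() call works on either.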

Run Fit-Hi-C

Once the input files are ready, run Fit-Hi-C in R as follows:

library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ...)

To generate images as well, set visual to TRUE:

library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ..., visual=TRUE)
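Because a malformed input file may only fail partway through a run, it can help to check the inputs first. The sketch below is a hypothetical pre-flight helper, check_fithic_inputs(), which is not part of the FitHiC package. It assumes both inputs have five whitespace-delimited columns and no header, as in the tables above, and creates OUTDIR if it is missing.

```r
# Hypothetical pre-flight check (not part of the FitHiC package):
# confirm both input files exist and have five columns, and make sure
# OUTDIR exists before calling FitHiC().
check_fithic_inputs <- function(fragsfile, intersfile, outdir) {
  for (f in c(fragsfile, intersfile)) {
    if (!file.exists(f)) stop("Input file not found: ", f)
    # Peek at the first few lines only; read.table() also reads gzipped files
    first <- read.table(f, nrows = 5)
    if (ncol(first) != 5) {
      stop(f, " has ", ncol(first), " columns; expected 5")
    }
  }
  if (!dir.exists(outdir)) dir.create(outdir, recursive = TRUE)
  invisible(TRUE)
}

# Usage with toy files built from the example rows
tmp         <- tempdir()
frags_path  <- file.path(tmp, "frags.txt")
inters_path <- file.path(tmp, "inters.txt")
out_path    <- file.path(tmp, "out")
writeLines(c("1 0 1305 0 0", "1 0 2635 233 1"), frags_path)
writeLines(c("10 100894 10 150593 2", "10 100894 10 162267 1"), inters_path)

ok <- check_fithic_inputs(frags_path, inters_path, out_path)

# If the check passes and the package is installed:
# library("FitHiC")
# FitHiC(frags_path, inters_path, out_path, visual=TRUE)
```

The helper only inspects the first five lines, so it catches a wrong delimiter or column count cheaply but does not validate every row.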

Examples

The pre-processed Hi-C data come from the yeast EcoRI dataset of Duan et al. [1]. FRAGSFILE and INTERSFILE are located at system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz", package = "FitHiC") and system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz", package = "FitHiC"), respectively. With these inputs, run:

library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
FitHiC(fragsfile, intersfile, getwd(), libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000)

Internally, Fit-Hi-C successively calls the generate_FragPairs, read_ICE_biases, read_All_Interactions, calculating_Probabilities and fit_Spline methods. In this example, calculating_Probabilities and fit_Spline each run twice, once per pass. Execution has completed successfully once the following log appears:

## Fit-Hi-C is processing ...
## Running generate_FragPairs method ...
## Complete generate_FragPairs method [OK]
## Running read_All_Interactions method ...
## Complete read_All_Interactions method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass1.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass1.significances.txt.gz
## Complete fit_Spline method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass2.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass2.significances.txt.gz
## Complete fit_Spline method [OK]
## Execution of Fit-Hi-C completed successfully. [DONE]

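The output files are written by two of the internal methods: calculating_Probabilities writes the fithic_pass1.txt and fithic_pass2.txt files, and fit_Spline writes the spline_pass1 and spline_pass2 significances files.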

Duan_yeast_EcoRI.fithic_pass1.txt
avgGenomicDist contactProbability standardError noOfLocusPairs totalOfContactCounts
10105 3.12e-05 2.7e-06 322 22212
10315 3.05e-05 2.5e-06 330 22251
10545 2.87e-05 2.1e-06 350 22191
10779 2.97e-05 3.0e-06 344 22583
10982 3.16e-05 2.7e-06 323 22532
11196 3.32e-05 2.7e-06 302 22185
Duan_yeast_EcoRI.fithic_pass2.txt
avgGenomicDist contactProbability standardError noOfLocusPairs totalOfContactCounts
10107 1.15e-05 8e-07 252 6428
10317 1.31e-05 9e-07 266 7709
10546 1.43e-05 8e-07 281 8887
10779 1.27e-05 8e-07 285 7974
10982 1.32e-05 8e-07 255 7426
11196 1.40e-05 8e-07 238 7356
Duan_yeast_EcoRI.spline_pass1.significances.txt.gz
chr1 fragmentMid1 chr2 fragmentMid2 contactCount p_value q_value
10 100894 10 150593 2 0.9988785 1
10 100894 10 162267 1 0.9985433 1
10 100894 10 169783 2 0.9708609 1
10 100894 10 179515 3 0.8072602 1
10 100894 10 182528 1 0.9831568 1
10 100894 10 185071 1 0.9795001 1
Duan_yeast_EcoRI.spline_pass2.significances.txt.gz
chr1 fragmentMid1 chr2 fragmentMid2 contactCount p_value q_value
10 100894 10 150593 2 0.9813195 1
10 100894 10 162267 1 0.9902851 1
10 100894 10 169783 2 0.8983241 1
10 100894 10 179515 3 0.6547083 1
10 100894 10 182528 1 0.9571117 1
10 100894 10 185071 1 0.9501637 1
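A common next step is to keep only contacts that pass a significance cutoff. The sketch below reads a spline_pass*.significances.txt.gz file with base R and keeps rows whose q_value is below a threshold, sorted by p_value. It assumes the file is whitespace-delimited with the header row shown above; the helper filter_significant() is hypothetical, and the 0.05 cutoff is an illustrative choice, not a Fit-Hi-C default.

```r
# Hypothetical helper: load a significances file and keep contacts whose
# q-value is below the cutoff, most significant (smallest p-value) first.
filter_significant <- function(path, q_cutoff = 0.05) {
  sig <- read.table(path, header = TRUE)    # gzip is detected automatically
  sig <- sig[which(sig$q_value < q_cutoff), , drop = FALSE]
  sig[order(sig$p_value), , drop = FALSE]
}

# Self-contained usage: write the pass-2 example rows to a gzipped file
path <- file.path(tempdir(), "example.spline_pass2.significances.txt.gz")
con <- gzfile(path, "w")
writeLines(c(
  "chr1\tfragmentMid1\tchr2\tfragmentMid2\tcontactCount\tp_value\tq_value",
  "10\t100894\t10\t150593\t2\t0.9813195\t1",
  "10\t100894\t10\t162267\t1\t0.9902851\t1",
  "10\t100894\t10\t169783\t2\t0.8983241\t1",
  "10\t100894\t10\t179515\t3\t0.6547083\t1",
  "10\t100894\t10\t182528\t1\t0.9571117\t1",
  "10\t100894\t10\t185071\t1\t0.9501637\t1"
), con)
close(con)

hits <- filter_significant(path)
nrow(hits)   # every q_value in these six rows is 1, so no row passes

# On real output from the example run in getwd():
# hits <- filter_significant("Duan_yeast_EcoRI.spline_pass2.significances.txt.gz")
```

Using which() in the subset drops rows with a missing q_value instead of turning them into rows of NAs.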

If visual is set to TRUE, the corresponding images are also generated.

Support

For questions about the use of the Fit-Hi-C method, to request pre-processed Hi-C data, additional features or scripts, or to report bugs and provide feedback, please e-mail Ferhat Ay.

Ferhat Ay <ferhatay at uw period edu>


  1. Duan Z, et al. 2010. A three-dimensional model of the yeast genome. Nature 465: 363–367.