
1 Important note regarding the SNPhood version

SNPhood is under active development, and we highly recommend using the newest available version. In particular, we recommend using the devel branch of Bioconductor / SNPhood to make sure you have the latest features and bugfixes. If you are not sure how to switch to the devel branch, contact us; we are happy to help!

2 Motivation, Necessity, Package Scope and Limitations

2.1 Motivation and Necessity

Figure 1 - SNPhood logo.



To date, thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with complex traits and diseases. However, the vast majority of these disease-associated SNPs lie in the non-coding part of the genome and are likely to affect regulatory elements, such as enhancers and promoters, rather than the function of a protein. Thus, to understand the molecular mechanisms underlying genetic traits and diseases, it becomes increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin environment or transcription factor (TF) binding. Towards this aim, we developed SNPhood, a user-friendly Bioconductor [9] R package to investigate, quantify and visualize the local epigenetic neighborhood of a set of SNPs in terms of chromatin marks or TF binding sites using data from NGS experiments. SNPhood comprises a set of easy-to-use functions to extract, normalize and quantify reads for a genomic region, perform data quality checks, normalize read counts using input files, and investigate binding patterns using unsupervised clustering. In addition, SNPhood can be employed for identifying and visualizing allele-specific binding patterns around SNPs using a robust permutation-based FDR procedure. The regions around each SNP can be binned in a user-defined fashion to allow for analyses ranging from very broad regions to highly detailed investigations of specific binding shapes. Importantly, SNPhood supports integration with genotype information to investigate and visualize genotype-specific binding patterns.
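To make the idea of user-defined binning concrete, the following is a minimal sketch in base R. It is not SNPhood code: the helper names `bin_breaks` and `count_reads_per_bin`, the half-open region definition, and the toy read positions are all assumptions made for illustration only.

```r
# Hypothetical illustration in base R; these helpers are NOT part of SNPhood.

# Bin boundaries for a half-open region [snp_pos - flank, snp_pos + flank).
# Assumption: the region length (2 * flank) is a multiple of bin_size.
bin_breaks <- function(snp_pos, flank, bin_size) {
  stopifnot((2 * flank) %% bin_size == 0)
  seq(snp_pos - flank, snp_pos + flank, by = bin_size)
}

# Count read start positions per bin; reads outside the region are dropped.
count_reads_per_bin <- function(read_starts, breaks) {
  inside <- read_starts >= breaks[1] & read_starts < breaks[length(breaks)]
  bins <- findInterval(read_starts[inside], breaks)
  tabulate(bins, nbins = length(breaks) - 1)
}

# Toy example: SNP at position 1000, 100 bp on each side, 50 bp bins.
breaks <- bin_breaks(snp_pos = 1000, flank = 100, bin_size = 50)
read_starts <- c(905, 910, 960, 1000, 1001, 1099, 1100, 1200)
counts <- count_reads_per_bin(read_starts, breaks)
print(counts)  # 2 1 2 1
```

In this toy setup, a smaller `bin_size` yields more bins per region and therefore a finer view of the binding shape, whereas a single bin spanning the whole region reduces the analysis to one count per SNP.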

2.2 Package scope and limitations

In this section, we want to explicitly state the intended scope of the SNPhood package, its limitations, and additional / companion packages that may be used beforehand or subsequently.

First, let’s be clear what SNPhood is NOT:

  • SNPhood is NOT a peak caller for ChIP-Seq datasets; it works in an orthogonal manner instead. See below for more details and differences.
  • SNPhood is NOT a dedicated tool for quality control (QC) of ChIP-Seq datasets and offers only rudimentary QC. For more details and a dedicated discussion of this issue, see here.
  • SNPhood is NOT a tool for the discovery of Quantitative Trait Loci (QTLs). For QTL discovery, dedicated tools such as WASP [13] or MatrixEQTL [14] are available.

Instead, SNPhood aims to fill an existing gap for an increasingly common task: Current workflows for analyzing ChIP-Seq data typically involve peak calling, which summarizes the signal of each binding event into two numbers (enrichment and peak size) and usually neglects additional factors like binding shape. However, when a set of regions of interest (ROI) is already at hand - e.g. GWAS SNPs, quantitative trait loci (QTLs), etc. - a comprehensive and unbiased analysis of the molecular neighborhood of these regions, potentially in combination with allele-specific (AS) binding analyses, is better suited to investigating the underlying (epi-)genomic regulatory mechanisms than simply comparing peak sizes. Currently, such analyses are often carried out “by hand” using basic NGS tools and genome-browser-like interfaces to visualize molecular phenotype data independently for each ROI. A tool for systematic analysis of the local molecular neighborhood of regions of interest is currently lacking. SNPhood fills this gap by allowing users to investigate, quantify, and visualize the local epigenetic neighborhood of regions of interest using chromatin or TF binding data from NGS experiments. It provides a set of tools that are largely complementary to existing software for analyzing ChIP-Seq data.

Figure 2 - SNPhood feature summary and scope. Comparison and distinction of SNPhood with regard to commonly used tools for ChIP-Seq/RNA-Seq data. Green, yellow and red: Feature fully, partially or not supported, respectively.

3 Basic Mode of Action

When running the main function analyzeSNPhood, a series of steps and calculations is performed. The basic mode of action is summarized in the following schematic: