1 Introduction

Chromatin modifications orchestrate the dynamic regulation of gene expression during development and in disease. Bulk approaches have characterized the wide repertoire of histone modifications across cell types, detailing their role in shaping cell identity. However, these population-based methods do not capture the cell-to-cell heterogeneity of chromatin landscapes, limiting our appreciation of the role of chromatin in dynamic biological processes. Recent technological developments enable the mapping of histone marks at single-cell resolution, making it possible to characterize the heterogeneity of chromatin marks in complex biological systems over time. Yet existing tools used to analyze bulk histone modification profiles are not suited to the low coverage and sparsity of single-cell epigenomic datasets.

ChromSCape Overview: The application takes various single-cell formats as input and lets the user filter and cluster cells, then run differential analysis and gene set enrichment analysis between epigenomic subpopulations, in an unsupervised manner.

ChromSCape is a user-friendly interactive Shiny/R application that processes single-cell epigenomic data to assist the biological interpretation of chromatin landscapes within cell populations. ChromSCape analyses the distribution of repressive and active histone modifications as well as chromatin accessibility landscapes from single-cell datasets (scATAC-seq, scChIP-seq, scCUT&TAG…).

The goal of ChromSCape is to provide users with an interactive interface to explore and run a complete analysis (QC, preprocessing, analysis, interpretation) on various single-cell epigenomic data. The application has multiple advantages:

2 Quick Start

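To launch the application, first install ChromSCape from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
}
BiocManager::install("ChromSCape")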

Load ChromSCape

library(ChromSCape)

Launch the Shiny application

launchApp()

A browser should open with the application. If the browser doesn’t open automatically, navigate to the URL displayed in the R console after “Listening on”. The first time you open the application, you will be guided through a short tour, which you can come back to at any time by clicking the Help button in the upper right corner.

3 ChromSCape step by step

The user interface is organized into ‘Tabs’, each corresponding to a specific ‘step’ of the analysis. To access a Tab, you need to complete the previous steps. For example, before accessing the ‘Filter & Normalize’ Tab, you first need to complete the ‘Select & Import’ Tab.
Each of your projects is called an ‘Analysis’ and comprises one raw dataset plus the additional objects produced as you go further into the analysis. A folder named ‘ChromSCape_analyses’, which will contain all your Analyses, is created in the output directory the first time you create an Analysis. In this folder, each Analysis is a folder with the following organisation:

├── Analysis_1
│   ├── annotation.txt
│   ├── Filtering_Normalize_Reduce
│   │   └── Analysis_1_(…).RData
│   ├── correlation_clustering
│   │   └── Analysis_1_(…).RData
│   ├── Diff_Analysis_Gene_Sets
│   │   └── Analysis_1_(…).RData
│   └── scChIP_raw.RData

The raw data is stored at the root of the folder, and the objects are saved at each main step (‘Filtering & Normalization’, ‘Correlation Clustering’ and ‘Differential Analysis’). This enables you to close the application and later reload your analysis without redoing all those steps. It also enables you to share your analysis with colleagues simply by copying your Analysis folder.
Note: The (…) in the names of the saved objects stands for the values of the parameters used for an Analysis. If you try multiple parameter values, each result is saved this way and all trials remain accessible in the future.

3.1 Input files (before launching ChromSCape)

Various existing technologies can produce single-cell genome-wide epigenomic datasets:

  • scChIP-seq (Grosselin et al., 2019), scCUT&Tag (Kaya-Okur et al., 2019), scChIL-seq (Harada et al., 2019) and scChIC-seq (Ku et al.,) reveal the distribution of histone marks (H3K27me3, H3K4me3) or transcription factors (RNA Polymerase 2,…) with single-cell resolution.
  • scATAC-seq (Buenrostro et al., 2015) or sciATAC-seq (Cusanovich et al., 2015) assess regions of open chromatin in single cells.
  • scDNA methylation profiling, scRRBS, scWGBS …

After sequencing a single-cell epigenomic experiment, the raw sequencing files (.fastq) are demultiplexed and aligned against a reference genome, producing different kinds of data depending on the technology. ChromSCape allows users to input a variety of formats. Depending on the output of the data-engineering/pre-processing pipeline used, the signal can be already summarized into features (Count matrix, Peak-Index-Barcode files) or stored directly in raw format (single-cell BAM or single-cell BED files).

Whatever the format, ChromSCape needs the signal to be summarized into features. When raw signal is provided (scBAM or scBED), the application lets users summarize the signal of each cell into various features:

  • Genomic features (extended region around TSS of genes, enhancers)
  • Peaks called on bulk or single-cell signal (BED file must be provided by the user)
  • Genomic bins (windows of constant length, e.g. 100kbp, 50kbp, 5kbp…)

Note that summarizing takes longer with BAM files than with BED files, and when the number of features is large (e.g. 5kbp bins, enhancers…).
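To make the idea of genomic bins concrete, here is a minimal base-R sketch that counts reads per fixed-size window and per cell. It is an illustration only, not ChromSCape's own implementation: the `reads` table, its column names and the choice of assigning a read to the window containing its start coordinate are all assumptions made for this example.

```r
# Hypothetical toy input: one row per read, with the barcode of its cell.
reads <- data.frame(
  chr     = c("chr10", "chr10", "chr10", "chr10", "chr10"),
  start   = c(1200L, 48000L, 51000L, 99000L, 120000L),
  barcode = c("BC1", "BC1", "BC2", "BC2", "BC1"),
  stringsAsFactors = FALSE
)

bin_size <- 50000L  # 50 kbp windows

# Assign each read to the window that contains its start coordinate
# (assumption: other conventions, such as using the midpoint, are possible).
bin_start <- (reads$start %/% bin_size) * bin_size
bin_id <- sprintf("%s:%d-%d", reads$chr, bin_start, bin_start + bin_size)

# Cross-tabulate windows against barcodes: a features x cells count matrix.
counts <- unclass(table(bin = bin_id, cell = reads$barcode))
print(counts)
```

Smaller windows give more rows in the resulting matrix, which is why fine bins take longer to summarize.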

3.1.1 Count matrices files

Each sample should be contained in a single-cell count matrix (features x cells) in tab-separated format (extension .txt or .tsv) or comma-separated format (.csv). The first column is the genomic location in standard format (chr:start-end) and the following columns are the reads counted in each cell for the corresponding genomic feature. Note that the first entry (row 1, column 1) must be empty. All files should be placed in the same folder.
If you input a single matrix grouping different samples, you can check the ‘multiple sample’ check box and specify the number of samples. ChromSCape will automatically find the different samples based on the names of the cells, so please make sure the sample names are clearly distinct (e.g. K562_.., GM12878_..).

An example of such a dataset is available for H3K27me3 mouse scChIP-seq of paired PDX samples at: PDX mouse cells H3K27me3 scChIP-seq matrices.

                     BC969404  BC893525  BC239068  BC073314
chr10:0-50000        0         0         0         0
chr10:50000-100000   0         0         0         0
chr10:100000-150000  0         0         0         0
…
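Before importing a matrix, it can be useful to check that it follows the layout described above. The sketch below writes a small tab-separated file to a temporary location (the file and its contents are a toy example), reads it back with base R and checks that every feature name has the chr:start-end form. The regular expression is an assumption about what counts as a valid location, not a rule taken from ChromSCape.

```r
# Write a toy count matrix in the expected layout:
# tab-separated, empty top-left entry, one row per feature, one column per cell.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "\tBC969404\tBC893525\tBC239068\tBC073314",
  "chr10:0-50000\t0\t0\t0\t0",
  "chr10:50000-100000\t0\t0\t0\t0",
  "chr10:100000-150000\t0\t0\t0\t0"
), path)

# The first column becomes the row names; check.names = FALSE keeps
# the barcodes exactly as written in the header.
counts <- as.matrix(read.table(path, header = TRUE, sep = "\t",
                               row.names = 1, check.names = FALSE))

# Every feature name should look like chr:start-end (assumed pattern).
valid_location <- grepl("^[^:]+:[0-9]+-[0-9]+$", rownames(counts))
stopifnot(all(valid_location))

dim(counts)  # number of features, number of cells
```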

3.1.2 Peak-Index-Barcode files

This format groups three files containing the signal of all samples from one or multiple experiments:

  • The barcode file contains one cell barcode per line and its name ends with ’_barcode.txt’:

HBCx95_BC969404
HBCx95_BC893525
HBCx22_BC239068
HBCx22_BC073314
…

  • The peak file contains the feature locations (usually peaks) in BED format and its name ends with ’_peaks.bed’:

chr3 197959001 197961500
chr3 198080001 198081500
chr4 53001 55500
…

  • The index file contains the indexes of non-zero signal and its name ends with ’_indexes.txt’. The first column contains the peak index, the second column the barcode index, and the last column the signal:

459 1 1
461 1 1
556 1 2
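The three files together describe a sparse features x cells matrix. The following base-R sketch shows one way to rebuild that matrix from the triplets. It is not how ChromSCape reads these files internally. It assumes that peak and barcode indexes are 1-based, and it uses a small toy dataset with hypothetical file names so that every index points to an existing peak.

```r
# Toy Peak-Index-Barcode files written to a temporary folder (hypothetical names).
dir <- tempfile("pib_")
dir.create(dir)
writeLines(c("HBCx95_BC969404", "HBCx95_BC893525",
             "HBCx22_BC239068", "HBCx22_BC073314"),
           file.path(dir, "toy_barcode.txt"))
writeLines(c("chr3 197959001 197961500",
             "chr3 198080001 198081500",
             "chr4 53001 55500"),
           file.path(dir, "toy_peaks.bed"))
# Columns: peak index, barcode index, signal (assumed 1-based indexes).
writeLines(c("1 1 1", "3 1 2", "2 4 1"),
           file.path(dir, "toy_indexes.txt"))

barcodes <- readLines(file.path(dir, "toy_barcode.txt"))
peaks <- read.table(file.path(dir, "toy_peaks.bed"),
                    col.names = c("chr", "start", "end"),
                    stringsAsFactors = FALSE)
idx <- read.table(file.path(dir, "toy_indexes.txt"),
                  col.names = c("peak", "barcode", "signal"))

# Start from an all-zero matrix and fill in the non-zero entries only.
feature_names <- sprintf("%s:%d-%d", peaks$chr, peaks$start, peaks$end)
counts <- matrix(0, nrow = nrow(peaks), ncol = length(barcodes),
                 dimnames = list(feature_names, barcodes))
counts[cbind(idx$peak, idx$barcode)] <- idx$signal
print(counts)
```

A dense matrix keeps the example dependency-free. For real datasets, the same triplets could be passed to a sparse constructor such as `Matrix::sparseMatrix(i, j, x)` to save memory.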

3.1.3 Single-cell BAM or BED files

Each BAM or BED (.bed or .bed.gz) file must contain the signal of a single cell, and all files must be placed in the same folder. The signal will be summarized into bins, peaks, or regions around gene TSS, depending on the user's choice.

An example of such a dataset is publicly available for H3K27me3 single-cell CUT&Tag (Kaya-Okur et al., 2019) of K562 and H1 cells at: K562 H3K27me3 cells and K562 + H1 H3K27me3 cells. All files need to be placed in the same folder; the BED files do not need to be unzipped.

3.1.4 Alignment BAM files for Peak Calling (optional)

For the optional step of Peak Calling on clusters of cells found de novo, users have to input one BAM file per sample, with all files placed in the same folder. The barcode of each read should be stored in a specific tag (XB, CB..) and must correspond to the column names of the corresponding count matrix.

samtools view example_matrix.bam | head
NS500388:436:HNG5VAFXX:2:21311:26430:3816 89 chr10 3102405 255 51M * 0 0 CTTGGTGTCTAGTGGATCTGCTGCAGTCTTCTGTTGTCAGTGCTAAATCAC EEEEE/E6EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA NH:i:1 HI:i:1 AS:i:50 nM:i:0 XS:i:2147483647 XD:Z:GATGACAAAG XB:Z:BC969404
…
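To see which barcode a read carries, the tag can be pulled out of the optional fields of its SAM record. The sketch below does this in base R for the record shown above. `get_tag` is a hypothetical helper written for this example; it relies only on the SAM convention that the 11 mandatory fields are followed by optional TAG:TYPE:VALUE fields, with type Z for strings.

```r
# One SAM record, as printed by `samtools view` (fields are tab-separated).
sam_line <- paste(
  "NS500388:436:HNG5VAFXX:2:21311:26430:3816", "89", "chr10", "3102405",
  "255", "51M", "*", "0", "0",
  "CTTGGTGTCTAGTGGATCTGCTGCAGTCTTCTGTTGTCAGTGCTAAATCAC",
  "EEEEE/E6EEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA",
  "NH:i:1", "HI:i:1", "AS:i:50", "nM:i:0", "XS:i:2147483647",
  "XD:Z:GATGACAAAG", "XB:Z:BC969404",
  sep = "\t"
)

# The first 11 fields are mandatory; everything after them is an optional tag.
fields <- strsplit(sam_line, "\t", fixed = TRUE)[[1]]
optional <- fields[-(1:11)]

# Hypothetical helper: return the value of a string (Z) tag, or NA if absent.
get_tag <- function(optional, tag) {
  hit <- grep(paste0("^", tag, ":Z:"), optional, value = TRUE)
  if (length(hit) == 0) {
    return(NA_character_)
  }
  sub("^[^:]+:Z:", "", hit[1])
}

barcode <- get_tag(optional, "XB")
print(barcode)
```

The extracted value can then be compared with the column names of the count matrix to confirm that the BAM file and the matrix use the same barcodes.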

3.2 Create & import a dataset

Once you have launched the application, you arrive on the “Select & Import” tab. Here you can select an output directory, where your analyses will be saved. ChromSCape will automatically save the folder’s location so that you don’t have to select it each time you connect.

You must then name your analysis. The name shouldn’t contain special characters (except ’_‘). Choose either the Human (hg38) or Mouse (mm10) genome. This is used to annotate your features with the closest gene TSS for the Gene Set Analysis. Browse your computer for one or multiple matrices that will be analyzed together. To select multiple matrices, place them in the same folder and select them with (Shift + Click) or (Ctrl + Click). Finally, clicking on ’Create Analysis’ will create an analysis and import the count matrices into it. This will create a folder named ‘ChromSCape_analyses’ in the output directory you specified, inside which another folder ‘Your_analysis_name’ is nested. If you create another analysis, it will also be created under ‘ChromSCape_analyses’. If you already created an analysis in a previous session, selecting the same output directory as in that session will enable you to load your analysis.