1 Introduction

InterCellar is a Bioconductor package that provides an interactive Shiny application to enable the analysis of cell-cell communication from single-cell RNA sequencing (scRNA-seq) data. Every step of the analysis can be performed interactively, thus not requiring any programming skills. Moreover, InterCellar runs on your local machine, avoiding issues related to data privacy.

1.1 Installation

InterCellar is distributed as a Bioconductor package and requires R (version 4.1) and Bioconductor (version 3.13).

To install the InterCellar package, enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("InterCellar")

1.2 Launching the app

Once InterCellar is successfully installed, it can be loaded as follows:

library(InterCellar)

In order to start the app, please run the following command:

InterCellar::run_app( reproducible = TRUE )

InterCellar should open in a browser. If this does not happen automatically, please open a browser and navigate to the address shown in the R console (for example, Listening on http://127.0.0.1:6134). The argument reproducible = TRUE ensures that your results will be reproducible across R sessions.

2 Overview of the workflow

The following figure represents the InterCellar analysis workflow. We could not miss the opportunity of depicting it as an interstellar journey.

3 Data upload

The first step of the workflow requires the upload of pre-computed results generated by an external tool capable of predicting cell-cell communication mediated by ligand-receptor interactions. InterCellar supports both published tools, such as CellPhoneDBv2 (Efremova et al. 2020) and SingleCellSignalR (Cabello-Aguilar et al. 2020), and custom results produced by ad hoc methods, which must contain the information described in the relevant panel. For the purpose of this user guide, we will use CellPhoneDB results computed on a scRNA-seq dataset of malignant melanoma (Tirosh et al. 2016).

By navigating to 1. Data and Upload, we can simply select the folder containing CellPhoneDB results from our local drive. InterCellar will read and pre-process the data and show the resulting table in Table view. The pre-processing step consists of:

  • Mapping interaction pairs to the corresponding genes (HGNC symbols)
  • Annotating genes to their molecular function: L (ligand) or R (receptor)
  • Re-ordering the interaction pairs listed as R-L to L-R
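The re-ordering step can be pictured with a small base R sketch. The column names (geneA, geneB, typeA, typeB, int_pair) and the toy table are assumptions made for illustration; they are not InterCellar's internal format or code.

```r
# Toy interaction table; column names are hypothetical, not InterCellar's own.
ints <- data.frame(
  geneA = c("TGFB1", "EGFR", "CD74"),
  geneB = c("TGFBR2", "EGF", "MIF"),
  typeA = c("L", "R", "R"),
  typeB = c("R", "L", "L"),
  stringsAsFactors = FALSE
)

# Rows listed as R-L have their two partners swapped so that they read L-R.
swap <- ints$typeA == "R" & ints$typeB == "L"
tmp_gene <- ints$geneA[swap]
ints$geneA[swap] <- ints$geneB[swap]
ints$geneB[swap] <- tmp_gene
ints$typeA[swap] <- "L"
ints$typeB[swap] <- "R"

# Rebuild the int-pair label from the re-ordered partners.
ints$int_pair <- paste(ints$geneA, ints$geneB, sep = " & ")
print(ints)
```

In a complete table, any other per-partner columns (for example the cluster expressing each gene) would presumably have to be swapped together with the genes.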

4 Universes

Once the input data has been uploaded, InterCellar takes us to the exploration of three Universes. Each universe has its focus on a different biological domain: cell clusters, genes and functions. Specific filtering options can be applied and multiple visualization choices are available to enable a deep exploration of the cellular communication.

4.1 Cluster-verse

The focus of this universe is the clusters of cells participating in the communication. The filtering options allow the user to subset the dataset by:

  • excluding entire clusters from the data. All interactions related to the excluded clusters will be excluded as well from further steps of the analysis;
  • setting a minimum interaction score;
  • (when available) changing the p-value threshold for significant interactions (default: 0.05).

The analyst will be able to see the effect of these filtering steps by looking at the box showing the number of total interactions. Warning: these filters have global influence on the analysis, since they subset the input data!
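The effect of the three filters can be sketched in base R on a toy table. The column names (clustA, clustB, score, p_value), the values, and the choice of inclusive comparisons (>= and <=) are assumptions for illustration, not a description of InterCellar's implementation.

```r
# Toy input; column names and values are hypothetical.
ints <- data.frame(
  clustA  = c("Tcell", "Tcell", "Bcell", "CAF", "Macro"),
  clustB  = c("Bcell", "CAF", "Macro", "CAF", "Tcell"),
  score   = c(1.8, 0.4, 2.5, 3.1, 0.9),
  p_value = c(0.01, 0.03, 0.20, 0.001, 0.04),
  stringsAsFactors = FALSE
)

excluded_clusters <- c("CAF")  # clusters removed from the analysis
min_score <- 0.5               # minimum interaction score
p_threshold <- 0.05            # significance threshold

# An interaction is dropped if either partner cluster is excluded,
# if its score is too low, or if it is not significant.
in_excluded <- ints$clustA %in% excluded_clusters |
  ints$clustB %in% excluded_clusters
keep <- !in_excluded & ints$score >= min_score & ints$p_value <= p_threshold
filtered <- ints[keep, ]

# Number of interactions before and after filtering.
c(before = nrow(ints), after = nrow(filtered))
```

Because every later step would start from the filtered table, changing any of the three settings changes all downstream counts, which is the global influence the warning refers to.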

Moreover, four tabs are part of the Cluster-verse: Network, Barplot, 1 vs 1 and Table.

The Network of clusters shows the overall cellular communication. Nodes represent different clusters while edges show the total number of paracrine interactions performed between two clusters. Edges that fall back on the same cluster represent autocrine interactions.

Barplot offers two different barplots representing: (1) the total number of interactions per cluster, divided into paracrine and autocrine interactions; and (2) the relative number of interactions for a certain cell type.
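As a rough illustration of the counts behind the network and the first barplot, the following base R sketch labels each interaction as autocrine (both partners in the same cluster) or paracrine, then tallies them per cluster. The table and its column names are hypothetical.

```r
# Toy table with one row per interaction; column names are hypothetical.
ints <- data.frame(
  clustA = c("Tcell", "Tcell", "Bcell", "Tcell", "Macro"),
  clustB = c("Bcell", "Tcell", "Macro", "Macro", "Macro"),
  stringsAsFactors = FALSE
)

# Autocrine: both partners are expressed by the same cluster.
ints$type <- ifelse(ints$clustA == ints$clustB, "autocrine", "paracrine")

# Count, for each cluster, the interactions in which it takes part.
clusters <- sort(unique(c(ints$clustA, ints$clustB)))
counts <- t(sapply(clusters, function(cl) {
  involved <- ints$clustA == cl | ints$clustB == cl
  c(autocrine = sum(involved & ints$type == "autocrine"),
    paracrine = sum(involved & ints$type == "paracrine"))
}))
counts
```

A stacked barplot of this matrix, for instance barplot(t(counts), legend.text = TRUE), gives a picture comparable in spirit to the first barplot described above.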

The 1 vs 1 panel allows the comparison of two conditions in terms of the number of interactions per cluster. The .csv files exported from the Barplot panel are required as input. The outputs of this panel are: (i) a back-to-back barplot comparing the total number of interactions per cluster in the two conditions; and (ii) a radar plot comparing the relative numbers of interactions from the viewpoint of a certain cell cluster.

In the Table panel, the analyst can restrict the data exploration to a specific focus, by subsetting the data to one cluster of interest, called viewpoint, and one flow of communication among:

  • Directed, outgoing interactions (L-R): for which the viewpoint cluster sends (expresses) the ligand to the other clusters, that are in turn expressing the corresponding receptor;
  • Directed, incoming interactions (R-L): for which the viewpoint expresses the receptor that binds to the corresponding ligand sent by other clusters;
  • Undirected interactions (L-L and R-R): for which both elements of an interaction pair are either ligands or receptors.
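The three flows can be expressed as simple row filters. The sketch below assumes a table that has already been re-ordered so that directed interactions read L-R; the function name subset_by_flow and all column names are hypothetical.

```r
# Toy table, already re-ordered to L-R; column names are hypothetical.
ints <- data.frame(
  clustA = c("Tcell", "Bcell", "Tcell", "Macro"),
  clustB = c("Bcell", "Tcell", "Macro", "Tcell"),
  typeA  = c("L", "L", "R", "L"),
  typeB  = c("R", "R", "R", "L"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subset to one viewpoint cluster and one flow.
subset_by_flow <- function(df, viewpoint,
                           flow = c("outgoing", "incoming", "undirected")) {
  flow <- match.arg(flow)
  directed <- df$typeA == "L" & df$typeB == "R"
  keep <- switch(flow,
    # viewpoint expresses the ligand
    outgoing   = directed & df$clustA == viewpoint,
    # viewpoint expresses the receptor
    incoming   = directed & df$clustB == viewpoint,
    # L-L or R-R pairs involving the viewpoint on either side
    undirected = !directed & (df$clustA == viewpoint | df$clustB == viewpoint)
  )
  df[keep, ]
}

subset_by_flow(ints, "Tcell", "outgoing")
subset_by_flow(ints, "Tcell", "incoming")
subset_by_flow(ints, "Tcell", "undirected")
```

Under this definition, a directed autocrine interaction of the viewpoint cluster appears in both the outgoing and the incoming subsets.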

4.2 Gene-verse

InterCellar's second universe focuses on the genes. Filtering options are available when the input data has been generated by CellPhoneDB or SingleCellSignalR and allow the user to exclude interaction pairs (int-pairs). The Table shows all unique int-pairs enriched in our data, regardless of the clusters in which these are found. InterCellar calculates a uniqueness score for each int-pair (from 0 to 1, with 1 being the most unique) related to the fraction of cluster pairs that are enriched by this int-pair. In other words, an int-pair that is found between all clusters will have a uniqueness score close to 0, whereas an int-pair that is only found between, e.g., clusters 1 and 2 will have a uniqueness score close to 1, suggesting higher specificity of this int-pair to a small subset of cell populations. The Table also includes the Ensembl and UniProt IDs of each gene, with hyperlinks to the respective web pages to facilitate a deeper analysis.
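One way to obtain a score with the behaviour described above is to take one minus the fraction of cluster pairs in which an int-pair is found. This is a minimal sketch under that assumption; the exact formula used by InterCellar is not given here, and the table and column names are hypothetical.

```r
# Toy table: one row per (int-pair, cluster pair) occurrence.
ints <- data.frame(
  int_pair     = c("A & B", "A & B", "A & B", "A & B",
                   "C & D", "E & F", "E & F"),
  cluster_pair = c("1::2", "1::3", "2::3", "3::1",
                   "1::2", "1::2", "2::3"),
  stringsAsFactors = FALSE
)

# Total number of distinct cluster pairs observed in the data.
n_total <- length(unique(ints$cluster_pair))

# Number of distinct cluster pairs in which each int-pair is found.
n_found <- tapply(ints$cluster_pair, ints$int_pair,
                  function(x) length(unique(x)))

# Assumed score: 1 minus the fraction of cluster pairs enriched by the int-pair.
uniqueness <- 1 - n_found / n_total
uniqueness
```

With this definition, "A & B", which occurs in every cluster pair, scores 0, while "C & D", which occurs in a single cluster pair, scores highest.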

Upon selection of one or multiple int-pairs from the previous Table, a dot plot is generated and visible in the Dot Plot panel. The analyst can decide to select a subset of clusters for the visualization as well as choose different colors for high and low int-pair score.

The All vs all panel allows the comparison of dot plots generated for different conditions. The .csv files exported from the Dot Plot panel are required as input. Ideally, similar sets of int-pairs and cluster-pairs should be considered across conditions. The resulting dot plot shows only the int-pairs/cluster-pairs that are unique to a certain condition.

4.3 Function-verse

In the Function-verse, the analyst is required to perform a functional annotation before proceeding to the next steps of the analysis. To this end, InterCellar offers multiple sources of functional annotations in terms of Gene Ontology (queried from Ensembl, via the package biomaRt) and pre-downloaded pathway databases (from the package graphite). After selection of suitable sources, the annotation can be performed and a Table showing all functional terms annotated to each int-pair is displayed. Note that a functional term is annotated to an int-pair only when the functional term is enriched in both genes that are partners of the interaction. Thus, InterCellar's approach to functional annotation is driven by the underlying concept of protein-protein interaction: a function may be activated only in the presence of both protein partners (in our case, genes). This approach differs from well-known methods for functional enrichment analysis (e.g. as carried out by ToppGene), which are not based on the concept of protein-protein interactions.
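The "both partners" rule amounts to intersecting the term sets of the two genes. The sketch below uses toy annotations invented for illustration (they are not real Gene Ontology or pathway records), and all object names are hypothetical.

```r
# Toy gene-to-term annotations, invented for illustration only.
gene_terms <- list(
  TGFB1  = c("cell migration", "immune response", "TGF-beta signaling"),
  TGFBR2 = c("TGF-beta signaling", "cell migration"),
  EGF    = c("cell proliferation"),
  EGFR   = c("cell proliferation", "cell migration"),
  MIF    = c("inflammatory response"),
  CD74   = c("antigen processing")
)

int_pairs <- data.frame(
  geneA = c("TGFB1", "EGF", "MIF"),
  geneB = c("TGFBR2", "EGFR", "CD74"),
  stringsAsFactors = FALSE
)

# A term is kept for an int-pair only if it is annotated to both genes.
per_pair <- lapply(seq_len(nrow(int_pairs)), function(i) {
  shared <- intersect(gene_terms[[int_pairs$geneA[i]]],
                      gene_terms[[int_pairs$geneB[i]]])
  if (length(shared) == 0) return(NULL)  # no common term: nothing annotated
  data.frame(
    int_pair = paste(int_pairs$geneA[i], int_pairs$geneB[i], sep = " & "),
    term = shared,
    stringsAsFactors = FALSE
  )
})
annot <- do.call(rbind, Filter(Negate(is.null), per_pair))
annot
```

In this toy case "immune response" is annotated to TGFB1 only, so it is not assigned to the int-pair, and the MIF & CD74 pair receives no term at all.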

The Barplot panel summarizes the number of functional terms annotated for each source.

In the following panel, Ranking, functional terms are listed individually, along with information on occurrence (i.e. how many int-pairs have been annotated to this term) and average uniqueness (i.e. the average value of the uniqueness scores of each int-pair annotated to this term).
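Both ranking quantities are simple per-term summaries. A minimal sketch, assuming a long-format annotation table and a named vector of uniqueness scores (all names and values are hypothetical):

```r
# Toy annotation table: one row per (int-pair, functional term).
annot <- data.frame(
  int_pair = c("A & B", "A & B", "C & D", "E & F", "E & F"),
  term     = c("cell migration", "immune response", "cell migration",
               "cell migration", "immune response"),
  stringsAsFactors = FALSE
)

# Hypothetical uniqueness score of each int-pair.
uniqueness <- c("A & B" = 0.2, "C & D" = 0.8, "E & F" = 0.5)
annot$uniqueness <- unname(uniqueness[annot$int_pair])

# Occurrence: number of int-pairs annotated to each term.
occurrence <- tapply(annot$int_pair, annot$term,
                     function(x) length(unique(x)))
# Average uniqueness of the int-pairs annotated to each term.
avg_uniq <- tapply(annot$uniqueness, annot$term, mean)

ranking <- data.frame(
  term = names(occurrence),
  occurrence = as.vector(occurrence),
  average_uniqueness = as.vector(avg_uniq[names(occurrence)]),
  stringsAsFactors = FALSE
)
ranking <- ranking[order(-ranking$occurrence), ]
ranking
```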

By selecting one row of the Ranking table, we can explore the term of interest in the Sunburst plot. This visualization allows the analyst to connect functions to int-pairs and clusters. On the left side of the panel, a table lists all int-pairs annotated to the term. On the right side, the sunburst plot is composed as follows:

  • the selected functional term is shown in the inner circle of the plot;
  • the inner ring shows all “first partner” clusters, enriched by the relevant int-pairs. Specifically, clusters on the inner ring express the first gene of each int-pair;
  • the outer ring displays all “second partner” clusters, expressing the second gene of each int-pair;
  • the width of each section represents the fraction of int-pairs found in that section (also shown when hovering on the inner ring sections);
  • hovering on outer ring sections will show the individual int-pairs enriched in the cluster pairs.

5 Int-Pair Modules Analysis

The final step of the InterCellar workflow allows the analyst to define and analyze Int-Pair Modules, i.e. groups of int-pairs that share a similar functional profile. To this end, the choice of a viewpoint cluster and communication flow is required. InterCellar will subset the input data accordingly. This analysis can be repeated for each viewpoint and flow of interest.

To define the number of int-pair modules in the data subset, four visualizations are provided. On the left-hand side, the optimal number of modules is calculated by InterCellar using (1) the elbow method on the total within-cluster sum of squares (which should be minimized) and (2) the average silhouette width (which should be maximized). Both methods are standard practice in cluster analysis and are meant to guide the choice of the optimal number of modules. However, the end user is free to choose the best number of modules for each case. In general, a high (low) number of groups is reflected in high (low) specificity of a module.

Two further visualizations offer another way to investigate the optimal number of modules. A dendrogram of int-pairs shows the results of a hierarchical clustering computed on the first two components of the UMAP shown underneath. Each point of the UMAP represents one int-pair (shown by hovering), and color-coding is consistent between UMAP and dendrogram, showing the number of modules chosen. Moreover, dendrogram and UMAP are initialized with the optimal number of modules chosen by the elbow method (which usually gives higher resolution than the average silhouette).
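The two criteria can be reproduced on toy data with base R and the cluster package. This is a sketch under stated assumptions: the simulated 2-D coordinates stand in for UMAP components, and Ward linkage ("ward.D2") is an arbitrary choice, not necessarily what InterCellar uses.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(1)
# Toy 2-D embedding standing in for UMAP coordinates:
# three well-separated groups of 20 points each.
emb <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

d <- dist(emb)
hc <- hclust(d, method = "ward.D2")  # linkage method is an assumption

# Total within-cluster sum of squares for a given labelling.
wss_for <- function(labels) {
  sum(sapply(split(as.data.frame(emb), labels), function(g) {
    sum(scale(g, scale = FALSE)^2)  # squared distances to the group centroid
  }))
}

ks <- 2:6
wss <- sapply(ks, function(k) wss_for(cutree(hc, k)))
sil <- sapply(ks, function(k) {
  mean(silhouette(cutree(hc, k), d)[, "sil_width"])
})

data.frame(k = ks, wss = wss, avg_silhouette = sil)
best_k_sil <- ks[which.max(sil)]  # k with the largest average silhouette
```

The elbow is read off the wss column as the value of k after which the decrease flattens, whereas the silhouette criterion simply picks the maximum.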

Once the int-pair modules have been defined, InterCellar offers the possibility to visualize the int-pairs belonging to each module and the respective clusters in a Circle plot. Directed interactions are represented here by arrows originating from ligands (double segment) towards receptors (single segment). The Table panel summarizes the same info in a tabular format.

The last step of the int-pair modules analysis concerns functional terms. Upon selection of one int-pair module, InterCellar performs a permutation test to calculate empirical p-values assessing the significance of functional terms annotated to int-pairs of the module. A Table displays functional terms that are found significant (p-value <= 0.05 by default) for the chosen int-pair module.
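The general idea of an empirical p-value from a permutation test can be sketched as follows. The test statistic (number of int-pairs in the module annotated to the term), the null model (random modules of the same size), and the +1 correction are assumptions for illustration; the details of InterCellar's own test may differ.

```r
set.seed(42)

all_pairs  <- paste0("ip", 1:50)               # all int-pairs in the data subset
term_pairs <- paste0("ip", 1:10)               # int-pairs annotated to the term
module     <- paste0("ip", c(1:8, 41, 42))     # chosen module (10 int-pairs)

# Observed statistic: int-pairs of the module annotated to the term.
observed <- sum(module %in% term_pairs)

# Null distribution: same statistic on random modules of the same size.
n_perm <- 1000
perm <- replicate(n_perm, {
  random_module <- sample(all_pairs, length(module))
  sum(random_module %in% term_pairs)
})

# Empirical p-value with the usual +1 correction to avoid p = 0.
p_emp <- (sum(perm >= observed) + 1) / (n_perm + 1)
p_emp
```

In this toy setting 8 of the 10 int-pairs in the module carry the term, far more than expected by chance, so the empirical p-value falls well below the 0.05 default threshold.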

References

Cabello-Aguilar, Simon, Mélissa Alame, Fabien Kon-Sun-Tack, Caroline Fau, Matthieu Lacroix, and Jacques Colinge. 2020. “SingleCellSignalR: Inference of Intercellular Networks from Single-Cell Transcriptomics.” Nucleic Acids Research 48 (10): e55–e55.

Efremova, Mirjana, Miquel Vento-Tormo, Sarah A Teichmann, and Roser Vento-Tormo. 2020. “CellPhoneDB: Inferring Cell–Cell Communication from Combined Expression of Multi-Subunit Ligand–Receptor Complexes.” Nature Protocols 15 (4): 1484–1506.

Tirosh, Itay, Benjamin Izar, Sanjay M Prakadan, Marc H Wadsworth, Daniel Treacy, John J Trombetta, Asaf Rotem, et al. 2016. “Dissecting the Multicellular Ecosystem of Metastatic Melanoma by Single-Cell RNA-Seq.” Science 352 (6282): 189–96.