iSEE User’s Guide

Federico Marini1,2*, Aaron Lun3**, Charlotte Soneson4,5*** and Kevin Rue-Albrecht6****

1Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz
2Center for Thrombosis and Hemostasis (CTH), Mainz
3Cancer Research UK Cambridge Institute, University of Cambridge
4Institute of Molecular Life Sciences, University of Zurich
5SIB Swiss Institute of Bioinformatics
6Kennedy Institute of Rheumatology, University of Oxford, Headington, Oxford OX3 7FY, UK.

*marinif@uni-mainz.de
**aaron.lun@cruk.cam.ac.uk
***charlottesoneson@gmail.com
****kevin.rue-albrecht@kennedy.ox.ac.uk

4 May 2018

Package

iSEE 1.0.1

Compiled date: 2018-05-04

Last edited: 2018-03-08

License: MIT + file LICENSE

1 Introduction

iSEE is a Bioconductor package that provides an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. Particular attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results, e.g., from principal components analysis (PCA) or t-distributed stochastic neighbour embedding (t-SNE, (Van der Maaten and Hinton 2008)).

To install the package, start R and enter:

source("http://bioconductor.org/biocLite.R")
biocLite("iSEE")

To load and attach the package to your current workspace, enter:

library("iSEE")

2 Quick start

Do you want to dive deep in using iSEE right away? Here’s how to get up and running in no time:

Get a dataset to work with, such as the allen data set from the scRNAseq package. This dataset contains 379 cells from the mouse visual cortex, and is a subset of the data presented in (Tasic et al. 2016). The allen data set is packaged as a SummarizedExperiment object; convert it into a SingleCellExperiment object, where you can store the reduced dimension representations (e.g., PCA, t-SNE):

library(scRNAseq)
data(allen)
library(scater)
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
sce <- normalize(sce)
sce <- runPCA(sce)
sce <- runTSNE(sce)
sce
#> class: SingleCellExperiment 
#> dim: 20908 379 
#> metadata(3): SuppInfo which_qc log.exprs.offset
#> assays(6): tophat_counts cufflinks_fpkm ... counts logcounts
#> rownames(20908): 0610007P14Rik 0610009B22Rik ... Zzef1 Zzz3
#> rowData names(0):
#> colnames(379): SRR2140028 SRR2140022 ... SRR2139341 SRR2139336
#> colData names(22): NREADS NALIGNED ... Animal.ID passes_qc_checks_s
#> reducedDimNames(2): PCA TSNE
#> spikeNames(0):

Create the Shiny app using the iSEE function. See ?iSEE for how to configure the start-up settings of the interface. By default, the function will generate one panel of every type – see below for details on the different panel types.

app <- iSEE(sce)

Launch the app using the runApp function.

shiny::runApp(app)

Once you have started the app, look in the upper right corner for a question mark icon (), and click on the button for an introductory tour. This will perform an interactive tour of the app, based on the rintrojs package (Ganz 2016). During this tour, you will be taken through the different components of the iSEE user interface and learn the basic usage mechanisms by doing: the highlighted elements will be responding to the user’s actions, while the rest of the UI will be shaded. You can move forward and backward along the tour by clicking on the Next/Back buttons, or also using the arrow keys. You can even jump to a particular step by clicking on its circle. To exit the tour, either click on Skip, or simply click outside of the highlighted UI element.
Once you are done generating plots, click on the wrench icon () in the upper right corner, and click on the button to display R code that you can export and directly re-use in your R session. This will open a modal popup where the R code used to generate the plots is displayed in a shinyAce-based text editor. Select parts or all of it to copy-and-paste it into your analysis script/Rmarkdown file.

3 Description of the user interface

3.1 Header

The layout of the iSEE user interface is based mostly on the shinydashboard package. The dashboard header contains three dropdown menus, where a number of extra elements are available:

The iSEE diagnostics menu (), containing:
- the “Examine panel chart” functionality, identified by a chain icon (). Click on this button to obtain a graph representation of the existing links and point selection among your open plot and table panels (coded with the same colors as in the app). This can be very useful in sessions that include a large number of panels, to visualise the relationship structure between the various panels that send and receive selections of data points.
- the “Extract the R code”, functionality (). Once you are done with your live session, you might want to reproduce exactly the plots you generated. Clicking on this button opens a modal popup window, with a shinyAce-based text editor, where the code is formatted and displayed with syntax highlighting. You can copy the code to the clipboard by selecting the text (please do include the initial lines and the sessionInfo() commands for best tracking of your environment), and store it in your analysis report/script. Your code can then be further edited to finalize the plots (e.g., for publication).
The Documentation menu is accessible through the question mark icon (), which contains:
- the button to start the Quick Tour () of iSEE, which allows users to learn the basic usage mechanisms by doing. The highlighted elements will be responding to the user’s actions, while the rest of the UI will be shaded.
- the button to “Open the vignette” (), which will display the iSEE vignette, either available locally or accessing the current webpage of the package on the Bioconductor project site (internet connection is required, please keep in mind it will be referring to the most current release version, which may differ from the version installed on your system)
The Additional information icon (), with
- the “About this session” button, which reports the output of the sessionInfo() function in a modal popup window (). This is particularly useful for reproducing or reporting the environment, especially in the case of errors or unexpected behaviors.
- the “About iSEE” button () will show up the information on the development team, licensing and citation information for the iSEE package. You can follow the development of the package by checking the GitHub repository (https://github.com/csoneson/iSEE), where new functionality will be added, also based on well-considered suggestions in the form of issues and/or pull requests.

3.2 Sidebar

The dashboard sidebar of iSEE contains a dropdown menu and a button for creating new plots or tables (referred to as “panels”) in the interface. The maximum number of panels of each type are defined in the iSEE function call that launches the app. Upon clicking these buttons, additional panels of the corresponding type are inserted into the main body of the app. Color-coded tabs in the sidebar contain buttons to change the panel order ( and ) or to remove panels entirely (). The width and height of each panel can also be adjusted using .

3.3 Body

The main element in the body of iSEE is the combination of panels, generated (and optionally linked to one another) according to your actions. The number and identity of the panels and their inter-relationships can also be specified at initialization by passing appropriate arguments to iSEE. The main explanation on how the different plots and tables work is presented in Section 4.

4 Description of iSEE functionality

4.1 Overview

There are currently six different panel types that can be generated with iSEE:

Reduced dimension plots
Column data plots
Feature assay plots
Row statistics tables
Row data plots
Heat maps

For each plot panel, three different sets of parameters will be available in collapsible boxes:

Data parameters, to control parameters specific to each type of plot.
Visual parameters, to specify parameters that will determine the aspect of the plot, in terms of coloring, point features, and more (e.g. legend placement, font size)
Selection parameters to control the point selection and link relationships to other plots.

4.2 Reduced dimension plots

If a SingleCellExperiment object is supplied to iSEE, any reduced dimension results (e.g., PCA, t-SNE) is extracted from the reducedDim slot. These results are used to construct two-dimensional reduced dimension plots where each point is a sample, to facilitate efficient exploration of high-dimensional datasets. The Data parameters control the reducedDim slot to be displayed, as well as the two dimensions to plot against each other. Note that iSEE does not computed reduced dimension embeddings; they must be precomputed and available in the object to the iSEE function.

4.3 Column data plots

A column data plot involves visualizing sample metadata stored in the SummarizedExperiment object. Different fields can be used for the X and Y axes by selecting appropriate values in the plotting parameters. This plot can assume various forms, depending on the nature of the data on the x- and y-axes:

If the Y axis is continuous and the X axis is categorical, violin plots are generated (grouped by the X axis factor).
If the Y axis is categorical and the X axis is continuous, horizontal violin plots are generated (grouped by the Y axis factor).
If both are continuous, a scatter plot is generated.
If both are categorical, a plot of squares is generated where the area of each square is proportional to the number of samples within each combination of factor levels.

An X axis setting of “None” is considered to be categorical with a single level.

4.4 Feature assay plots

A feature assay plot visualizes the assayed values (e.g., gene expression) for a particular feature (e.g., gene) across the samples on the y-axis. This usually results in a (grouped) violin plot, if the X axis is set to "None" or a categorical variable; or a scatter plot, if the X axis is another continuous variable.

Note: That said, if there are categorical values for the assayed values, these will be handled as described in the column data plots.

Gene selection for the y-axis is achieved by using a linked row statistics table in another panel. Clicking on a row in the table will automatically change the assayed values plotted on the Y axis. Alternatively, the row name can be directly entered as text that corresponds to an entry of rownames(se).

Note: this is not effective if se does not contain row names.

The X axis covariate can also be selected from the plotting parameters. This can be "None", column data, or the assayed values of another feature (also identified using a linked table or via text). The measurement units are selected as one of the assays(se), which is applied to both the X and Y axes.

Obviously, any other assayed value for any feature can be visualized in this manner, not limited to the expression of genes. The only requirement for this type of panel is that the observations can be stored as a matrix in the SummarizedExperiment object.

4.5 Row statistics tables

A row statistics table contains the values of the rowData slot for the SingleCellExperiment/SummarizedExperiment object. If none are available, a column named Present is added and set to TRUE for all available genes, to avoid issues with DT::datatable and an empty DataFrame. Typically, these tables are used to link to other plots to determine the genes to use for plotting (or coloring). However, they can also be used to retrieve gene-specific annotation on the fly by specifying the annotFun parameter, e.g. using the annotateEntrez or annotateEnsembl functions, provided in iSEE. Alternatively, users can create a customized annotation function; for more details on this, please consult the manual pages ?annotateEntrez and ?annotateEnsembl.

4.6 Row data plots

A row data plot allows the visualization of information stored in the rowData slot of a SummarizedExperiment object. Its behavior mirrors the implementation for the column data plot, and correspondingly this plot can assume various forms. Please refer to Section 4.3 for more details.

4.7 Heat maps

Heat maps can display compact overviews of the data at hand in the form of color-coded matrices. These correspond to the assays stored in the SummarizedExperiment object, where features (e.g., genes) are the rows and samples are the columns.

User can select features (rows) to display from the selectize widget (which supports autocompletion), or also via other panels, like row data plots or row statistics tables. The ‘Suggest feature order’ button clusters the rows, and also rearranges the elements in the selectize according to the clustering. It is also possible to choose which assay type is displayed (logcounts being a reasonable default choice). The heat map can also be annotated, simply by selecting relevant column data. A zooming functionality is also available, restricted to the Y axis (i.e., allowing closer inspection on the individual features included).

4.8 Coloring plots by sample attributes

Coloring of points (i.e., samples) on each plot can be achieved in different ways.

The default is no color scheme ("None" in the radio button). This results in black data points.
Any column of colData(se) can be used. The plot automatically adjusts the scale to use based on whether the chosen column is continuous or categorical.
The assay values of a particular feature in each sample can be used. The feature can be chosen either via a linked row table or text input (as described for feature assay plots). Users can also specify the assays from which values are extracted.

4.9 Selecting data points and linking panels

To link one plot to another, users can instruct a plotting panel to receive a selection of data points from another (transmitting) plot, using the appropriate field in the selection parameters box. Once this is done, data point selection on the transmitting plot affects the receiving plot in a variety of ways:

If the point selection effect is set to "Restrict", only the subset of points selected in the transmitter are visible in the receiver.
If set to "Color", the selected subset of points is plotted in the receiver with a color that can be selected via the colourpicker widget
If set to "Transparent", the selected subset will be drawn with no transparency, while all non-selected points will be plotted with the specified alpha value.

4.10 Zooming in and out

This is possible by first selecting a region of interest in a plot. Then, double-clicking on the brushed area zooms into the selected area. To zoom out to the original plot, double-click outside of any brushed area.

4.11 Using an ExperimentColorMap

iSEE coordinates the coloring of every assay, column data, and row data via the ExperimentColorMap class. In particular, this allows users to override the appearance of specific assay or covariates scale, while leaving the other undefined data scales to use the default colormaps.

4.11.1 Default colormaps

In the absence of any user-defined colormap, the application falls back to a set of two default colormaps – a discrete and a continuous – according to the nature of the data displayed.

4.11.2 Definition of colormaps

Colormaps are defined functions that take one argument: the number of colors that the colormap should return. This is particularly important for continuous variables, where the number of colors defines the number of breaks evenly placed across the range of values represented by the color scale, and thereby the resolution between individual colors. Conversely, discrete colormap functions may ignore their argument, as the number of factor levels is typically known in advance, and impose a fixed number of corresponding colors. Furthermore, discrete colormaps may typically return a named vector of colors, with the name declaring which factor level each color is assigned to.

4.11.3 Specific and shared colormaps

Colormaps can be controlled and overriden by users at three different levels:

each individual assay, column data, and row data can be assigned its own distinct colormap. Those colormaps are stored as items of separate lists stored in the assays, colData, and rowData slots, respectively, of the ExperimentColorMap. This can be useful to easily remember which assay is currently shown, to apply different color scale limits to assays that vary on different ranges of values, or display boolean information in an intuitive way, among many other scenarios.
shared colormaps can be defined for all assays, all column data, and all row data, respectively, that do not have an individual colormap defined. Those colormaps are stored as items assays, colData, and rowData of lists stored in the all_discrete and all_continuous slots of the ExperimentColorMap.
globally shared colormaps can be defined for all discrete or continuous data that do not have colormaps defined in the previous two levels. Those two colormaps are stored in the global_discrete and global_continuous slots of the ExperimentColorMap.

4.11.4 Fall-back mechanism

When queried for a specific colormap of any type (assay, column data, or row data), the following process takes place:

the specific colormap is looked up in the appropriate slot (i.e., @assays, @colData, or @rowData) of the ExperimentColorMap.
if it is not defined, the shared colormap of the appropriate slot is looked up, according to the discrete or continuous nature of the data (for instance, @all_continuous$colData is looked up for continous column data without a defined individual colormap)
if it is not defined, the globally shared colormap is looked up, according to the discrete or continuous nature of the data (for instance, @global_continuous is looked up for continous column data without a defined individual colormap, in the absance of a shared colData continuous colormap)
lastly, if none of the above colormaps were defined, the ExperimentColorMap will return the default colormap, according to the discrete or continuous nature of the data.

4.11.5 Benefits

The ExperimentColorMap class offers the following major features:

a single place to define flexible and lightweight sets of colormaps, that may be saved and reused across sessions and projects outside the app, to apply consistent coloring schemes across entire projects
a simple interface through accessors colDataColorMap(colormap, "coldata_name") and setters assayColorMap(colormap, "assay_name") <- colormap_function
an elegant fallback mechanism to consistently return a colormap, even for undefined covariates, including a default discrete and continuous colormap, respectively.
three levels of colormaps override: individual, shared within slot (i.e., assays, colData, rowData), or shared globally between all discrete or continuous data scales.

Detailed examples on the use of ExperimentColorMap objects are available in the documentation ?ExperimentColorMap, as well as in use in Section 5.1.2.

5 Use cases

5.1 Use case I: Basic exploration of the Allen dataset

5.1.1 Loading the data

In this section, we illustrate how iSEE can be used to explore the allen single-cell RNA-seq data set from the scRNAseq package. It contains expression values for 379 cells from the mouse visual cortex (Tasic et al. 2016). We start by converting the provided SummarizedExperiment object to a SingleCellExperiment object and normalizing the expression values.

library(scRNAseq)
data(allen)
class(allen)
#> [1] "SummarizedExperiment"
#> attr(,"package")
#> [1] "SummarizedExperiment"
library(scater)
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
sce <- normalize(sce)

Next, we apply Principal Components Analysis (PCA) and t-distributed Stochastic Neighbour Embedding (t-SNE) to generate two reduced dimension representations of the cells. Note that all computations (e.g., dimension reduction, clustering) must be performed before passing the object to the iSEE function.

sce <- runPCA(sce)
sce <- runTSNE(sce)
reducedDimNames(sce)
#> [1] "PCA"  "TSNE"

The provided cell annotations for this data set are available in colData(sce).

colnames(colData(sce))
#>  [1] "NREADS"                       "NALIGNED"                    
#>  [3] "RALIGN"                       "TOTAL_DUP"                   
#>  [5] "PRIMER"                       "PCT_RIBOSOMAL_BASES"         
#>  [7] "PCT_CODING_BASES"             "PCT_UTR_BASES"               
#>  [9] "PCT_INTRONIC_BASES"           "PCT_INTERGENIC_BASES"        
#> [11] "PCT_MRNA_BASES"               "MEDIAN_CV_COVERAGE"          
#> [13] "MEDIAN_5PRIME_BIAS"           "MEDIAN_3PRIME_BIAS"          
#> [15] "MEDIAN_5PRIME_TO_3PRIME_BIAS" "driver_1_s"                  
#> [17] "dissection_s"                 "Core.Type"                   
#> [19] "Primary.Type"                 "Secondary.Type"              
#> [21] "Animal.ID"                    "passes_qc_checks_s"

5.1.2 Defining color maps

We define color palettes to use in iSEE when coloring according to specific cell annotations or expression values. Each palette should be a function that accepts a single numeric scalar, and returns a vector of colors of length equal to the input number.

For continuous variables, the function will be asked to generate a number of colors (21, by default). Interpolation will then be performed internally to generate a color gradient. Users can use existing color scales like viridis::viridis, or make their own interpolation points with a function that simply ignores the input number:

fpkm_color_fun <- function(n){
  x <- c("black","brown","red","orange","yellow")
  return(x)
}

For categorical variables, the function should accept the number of levels and return a color per level. This is illustrated below with a function that generates colors for the driver_1_s factor (that has three distinct levels).

driver_color_fun <- function(n){
  return(RColorBrewer::brewer.pal(n, "Set2"))
}

Alternatively, functions can simply return a named vector of colors if users want to specify the color for each level explicitly (alternatively, colors are automatically assigned to factor levels in the order in which they are found). In such cases, the function can ignore its argument, though it is the user’s responsibility to ensure that all levels are accounted for. For instance, the following colormap function will only be compatible with factors of two levels, namely "Y" and "N":

qc_color_fun <- function(n){
  qc_colors <- c("forestgreen", "firebrick1")
  names(qc_colors) <- c("Y", "N")
  return(qc_colors)
}

A variable set of colormap functions can be stored in an instance of the ExperimentColorMap class. Named functions passed as assays, colData, or rowData arguments will be used for coloring data in those slots, respectively.

ecm <- ExperimentColorMap(
  assays = list(
      counts = viridis::viridis,
      cufflinks_fpkm = fpkm_color_fun
  ),
  colData = list(
      passes_qc_checks_s = qc_color_fun,
      driver_1_s = driver_color_fun
  ),
  all_continuous = list(
    assays = viridis::viridis
  )
)
ecm
#> Class: ExperimentColorMap
#> assays(2): counts cufflinks_fpkm
#> colData(2): passes_qc_checks_s driver_1_s
#> rowData(0):
#> all_discrete(0):
#> all_continuous(1): assays

Users may change the defaults for assays or column data without defined individual colormapping functions. Furthermore, default colormap for all variables can also be altered for continuous and discrete variable, respectively. The example shown below alters the defaults for continuous variables, but the same applies for categorical factors as well.

ExperimentColorMap(
  all_continuous=list(
    assays=viridis::plasma,
    colData=viridis::inferno
  ),
  global_continuous=viridis::magma  
)
#> Class: ExperimentColorMap
#> assays(0):
#> colData(0):
#> rowData(0):
#> all_discrete(0):
#> all_continuous(2): assays colData
#> global_continuous(1)

5.1.3 Exploring the dataset

To begin the exploration, we create an iSEE app with the SingleCellExperiment object and the colormap generated above.

app <- iSEE(sce, colormap = ecm)

We run this using runApp to open the app on our browser.

shiny::runApp(app)

5.1.3.1 Overview

By default, the app starts with a dashboard that contains one reduced dimension plot, one column data plot, one feature assay plot, one row statistics table, one row data plot (if data is available), and one heat map (see Use case II for how to change this). By opening the collapsible panels named “Data parameters”, “Visual parameters”, and “Selection parameters” under each plot, we can examine and control the content and appearance of the respective plots.

5.1.3.2 The reduced dimension panel

Let us start by exploring the reduced dimension panel. As can be seen from the “Data parameters” panel, this plot shows the first two principal components. Change Type to (2) TSNE to instead see the two-dimensional t-SNE representation. Next, open the “Visual parameters” panel. By default, the points (cells) are not colored. By selecting Column data and choosing one of the variables in the dropdown menu that shows up, the cells can be colored by any of the provided annotations. Let’s choose passes_qc_checks_s, one of the annotations for which we defined a custom color map in the preparations above. Now, all cells that passed QC (Y) are colored forestgreen, while the ones that didn’t pass are colored firebrick.

5.1.3.3 The column data panel

Now let us move on to the column data panel. Here, we see the distribution of the number of reads (NREADS) across the cells in the data set, as well as the individual values for each cell. Note that the location of the points along the X axis is generated by the jittering, and does not encode any information (you can see this as X-axis = None in the Data parameters panel). We can also plot two cell annotation against each other, by setting X-axis to be Column data and choosing one of the variables in the drop down menu that pops up. For example, we can choose NALIGNED (the number of aligned reads), and we see that (as expected), there is a very strong association between the total number of reads and the number of aligned reads. Again, we can color the cell by whether or not they passed the QC, by selecting Column data in the “Visual parameters” panel and choosing passes_qc_checks_s in the dropdown menu.

5.1.3.4 The feature assay panel

Finally, let us look at the feature assay panel. This plot displays the distribution of the expression values for a given gene, which has been specified by selecting the first row in the row statistics table (this is seen in the “Data parameters” tab, as Y-axis is Gene table, and it is indicated that Y-axis gene linked to is Row statistics table 1). The values shown in the plots are taken from the logcounts assay of the provided SingleCellExperiment. We can modify this choice to any other assay that was available in the object (e.g., rsem_tpm) to display other expression values. Note how the Y axis title keeps track of what is displayed in the plot. Again, note that all assays must be precalculated before the object is passed to the iSEE function. If we would like to show the expression of a particular gene of interest (e.g., Znrf1), we can either find and select it in the row statistics table (use the search box just above the table), or we can set Y-axis to Gene text, and type Znrf1 in the text box that shows up. As for the other plots, we can color in many ways. Let us color the points in the feature assay plot by the expression of another gene (e.g., Grp). To do this, open up the “Visual parameters” panel, select Gene text, and paste Grp in the text box that shows up.

5.1.3.5 Linking panels

After exploring the individual plots, let us now see how they can be linked together using the point selection functionality. To this end, let us say that we are interested in seeing the expression of a certain gene in a particular cluster of cells apparent in the reduced dimension plot. First, open the “Selection parameters” panel under the Feature assay plot,and choose Receive brush from: to be Reduced dimension plot 1 and set the Brush effect to Transparent. Then, drag the mouse to draw a rectangle around the cluster of interest in Reduced dimension plot 1. You will see that the points in Feature assay plot 1 that are not within the rectangle in the Reduced dimension plot 1 are appearing now more transparent, allowing you to see the distribution of expression values for the chosen gene in the cluster of interest. By changing the Brush effect, you can also restrict the receiving plot to only show the selected points (Restrict) or color the selected points (be mindful of the color choice if you have already colored the points according to another covariate via the “Coloring parameters” panel).

5.2 Use case II: Changing the default start configuration

The default start configuration with one plot of each type is not optimal for all use cases. iSEE allows the user to programmatically modify the initial settings, avoiding the need to click through the choices to obtain the desired panel setup.

To demonstrate this feature, let’s assume that we are only interested in feature assay plots. The default set of panels can be changed via the initialPanels argument of the iSEE function call. Given a SingleCellExperiment/SummarizedExperiment named sce, the following code opens an app with two adjacent feature assay plots.

init <- DataFrame(
  Name = c("Feature assay plot 1", "Feature assay plot 2"),
  Width = c(6, 6)
)
app <- iSEE(sce, initialPanels = init)

The genes to show on the y-axis in the two plots can be specified via the geneExprArgs argument to iSEE. This is a DataFrame, specifying the initial parameters for each plot. In this case, we want to modify the YAxis and YAxisFeatName defaults for the two plots:

fexArg <- featAssayPlotDefaults(sce, 2)
fexArg$YAxisFeatName <- c("0610009L18Rik", "0610009B22Rik")
app <- iSEE(sce, initialPanels = init, featAssayArgs = fexArg)

This will open the app with two feature assay plots, showing the selected genes. Of course, from this starting point, all the interactive functionality is still available, and new panels can be added, modified and linked via brushing. Users should refer to ?defaults for the full list of values that can be specified in iSEE.

5.3 Use case III: Exploring mass cytometry data

The flexibility of the iSEE interface means that it can be used to explore a variety of data types that can be represented in tabular format. Here we show how iSEE can be used to explore a mass cytometry (CyTOF) data set (originally from (Bodenmiller et al. 2012)), by defining an initial panel configuration that mimics the familiar gating setup, consisting of multiple two-dimensional scatter plots linked to one another. The example data set was prepared as described by (Nowicka et al. 2017), and the code to generate a SummarizedExperiment object is available from https://github.com/lmweber/CyTOF-example-data. Here we use a subset of the original data, consisting of 5,000 randomly selected cells, for which 35 markers are measured. The SummarizedExperiment object also contains a t-SNE representation of the 5,000 cells and the results of a cell clustering performed with FlowSOM (Van Gassen et al. 2015) in order to automatically detect cell populations. The data set can be downloaded as follows:

cytof <- readRDS(gzcon(url(
  "https://www.dropbox.com/s/42kfedfpzyssqvq/cytof_bcrxl_5000cells.rds?dl=1")))

The traditional way to analyze cytometry experiments and identify known cell populations is to construct a collection of two-dimensional scatter plots, each visualizing the cells in the space of two of the measured markers. By successively selecting subsets of cells with specific patterns of marker abundance (“gating”), experienced users can assign cells to known populations. To mimic part of the original gating strategy from (Bodenmiller et al. 2012), we define a starting configuration for iSEE with five Feature assay plots, each showing two of the markers. Each plot is linked to the previous one, with the brushing effect set to “Restrict”, meaning that only the cells selected by point selection in the previous plot are shown in each plot. We also include a Reduced dimension plot, in which the cells are colored by the automatic cluster assignment. The Reduced dimension plot receives a restricting brush from the last Feature assay plot, which means that only the points that are selected by brushing in the latter will be shown in the Reduced dimension plot.

The code below sets up the initial panel configuration and brushing relationships.

initialPanels = DataFrame(
  Name = c(paste("Feature assay plot", 1:5), "Reduced dimension plot 1"),
  Width = c(rep(3, 5), 5))
# Feature assay plots
fexArg <- featAssayPlotDefaults(cytof, 5)
fexArg$XAxis <- "Feature name"
fexArg$XAxisFeatName <- c("Cell_length", "CD3", "CD7", "CD3", "CD4")
fexArg$YAxisFeatName <- c("CD45", "CD20", "CD45", "CD33", "CD45")
fexArg$SelectByPlot <- c("", paste("Feature assay plot", 1:4))
fexArg$SelectEffect <- "Restrict"
# Reduced dimension plot
redArg <- redDimPlotDefaults(cytof, 1)
redArg$ColorBy <- "Column data"
redArg$ColorByColData <- "population"
redArg$SelectByPlot <- "Feature assay plot 5"
redArg$SelectEffect <- "Restrict"

iSEE(cytof, initialPanels = initialPanels, featAssayArgs = fexArg, redDimArgs = redArg)

Once we have started iSEE in this configuration, we can start brushing from the first Feature assay plot to successively refine a cell subpopulation. Roughly following the gating strategy in (Bodenmiller et al. 2012), we select cells with high CD45 expression in Feature assay plot 1, followed by cells with low CD20 expression in Feature assay plot 2. Next, we select cells with high CD7 expression in Feature assay plot 3, and subsequently cells with high CD3 and low CD33 expression in Feature assay plot 4. Finally, by selecting either cells with high CD4 expression or those with low CD4 expression in Feature assay plot 5, and reviewing the resulting cells shown in the Reduced dimension plot 1, we can see that these are indeed the cells that were automatically classified as CD4⁺ (or CD8⁺) T-cells by the FlowSOM clustering.

At this point, we can export all the code needed to reproduce the six panels by clicking on the wrench () icon in the top right corner and selecting “Extract the R code”.

5.4 Use case III: Using the `ExperimentHub`

Here we will use iSEE to showcase a TCGA RNA-seq data set of Rsubread-summarized raw count data for 7,706 tumor samples, represented as an ExpressionSet. The R data representation was derived from GEO accession GSE62944. It is here retrieved from the Bioconductor ExperimentHub, immediately ready for post-processing and visualization.

stopifnot(
  require(ExperimentHub),
  require(SingleCellExperiment),
  require(irlba),
  require(Rtsne),
  requireNamespace("scater")
)
ehub <- ExperimentHub()
eh1 <- ehub[["EH1"]] # an ExpressionSet
se1 <- as(eh1, "SummarizedExperiment")
sce1 <- as(se1, "SingleCellExperiment")
sce1 <- scater::runPCA(sce1, exprs_values = "exprs")
irlba_out <- irlba(assay(sce1, "exprs"))
tsne_out <- Rtsne(irlba_out$v, pca = FALSE, perplexity = 50, verbose = TRUE)
reducedDim(sce1, "TSNE") <- tsne_out$Y

To demonstrate a fully featured app, that also enables row data plot panels, let us add gene-level metadata. For instance, the mean log₂-transformed expression of each gene, the count of samples in which each gene was detected, and the proportion of samples in which each gene was detected :

num_detected <- rowSums(assay(sce1, "exprs") > 0)
rowData(sce1) <- DataFrame(
  mean_log_exprs = rowMeans(log2(assay(sce1, "exprs") + 1)),
  num_detected = num_detected,
  percent_detected = num_detected * 100 / ncol(sce1)
)
if (interactive()) { iSEE(sce1) }

5.5 Using online file sharing systems

Active projects may require regular updates to the SingleCellExperiment object, that must be iteratively re-distributed to collaborators.

One possible choice is to store the up-to-date SingleCellExperiment object in a file produced by the saveRDS command, and to host this file in an online file sharing system (e.g., Drobpox).

Users may then download the file and launch the iSEE application as follows:

library(curl)
rdsURL <- "https://my.file.sharing.website.com/URI?dl=1" # download=true
temp_file <- tempfile(pattern = "iSEE_", fileext = ".rds")
message("Downloading URL to temporary location: ", temp_file)
curl_download(url = rdsURL, destfile = temp_file, quiet = FALSE)
sce <- readRDS(temp_file)
if (interactive()) { iSEE(sce) }

To save the downloaded file to a persistent location, users may adapt the following code chunk:

newLocation <- "/path/of/your/choice/new_file_name.rds"
file.copy(from = temp_file, to = newLocation)

6 FAQ

Q: Can you implement a ‘Copy to clipboard’ button in the code editor?

A: This is not necessary, as one can click anywhere in the code editor and instantly select all the code using a keyboard shortcut that depends on your operating system.

Q: When brushing with a transparency effect, it seems that data points in the receiving plot are not subsetted correctly.

A: The subsetting is correct. What you see is an artefact of overplotting: in areas excessively dense in points, transparency ceases to be an effective visual effect.

Q: Brushing on violin or square plots doesn’t seem to select anything.

A: For violin plots, points will be selected only if the brushed area includes the center of the x-tick, i.e., the center of the violin plot. This is intentional as it allows easy selection of all points in complex grouped violin plots. Indeed, the location of a specific point on the x-axis has no meaning. The same logic applies to the square plots, where only the center of each square needs to be selected to obtain all the points in the square.

Q: I’d like to try iSEE but I can’t install it/I just want a quick peek. Is there something you can do?

A: We set up an instance of iSEE running on the allen dataset at this address: http://shiny.imbei.uni-mainz.de:3838/iSEE. Please keep in mind this is only for trial purposes, yet it can show a quick way of how you or your system administrator can setup iSEE for analyzing your SummarizedExperiment/SingleCellExperiment precomputed object.

7 Additional information

Bug reports can be posted on the Bioconductor support site or raised as issues in the iSEE GitHub repository. The GitHub repository also contains the development version of the package, where new functionality will be added in the future. The authors appreciate well-considered suggestions for improvements or new features, or even better, pull requests.

If you use iSEE for your analysis, please cite it as shown below:

citation("iSEE")
#> 
#> To cite package 'iSEE' in publications use:
#> 
#>   Charlotte Soneson, Aaron Lun, Federico Marini and Kevin
#>   Rue-Albrecht (2018). iSEE: Interactive SummarizedExperiment
#>   Explorer. R package version 1.0.1.
#>   https://github.com/csoneson/iSEE
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {iSEE: Interactive SummarizedExperiment Explorer},
#>     author = {Charlotte Soneson and Aaron Lun and Federico Marini and Kevin Rue-Albrecht},
#>     year = {2018},
#>     note = {R package version 1.0.1},
#>     url = {https://github.com/csoneson/iSEE},
#>   }

Session Info

sessionInfo()
#> R version 3.5.0 (2018-04-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.4 LTS
#> 
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] scater_1.8.0                ggplot2_2.2.1              
#>  [3] scRNAseq_1.6.0              iSEE_1.0.1                 
#>  [5] SingleCellExperiment_1.2.0  SummarizedExperiment_1.10.0
#>  [7] DelayedArray_0.6.0          BiocParallel_1.14.0        
#>  [9] matrixStats_0.53.1          Biobase_2.40.0             
#> [11] GenomicRanges_1.32.1        GenomeInfoDb_1.16.0        
#> [13] IRanges_2.14.2              S4Vectors_0.18.1           
#> [15] BiocGenerics_0.26.0         BiocStyle_2.8.0            
#> 
#> loaded via a namespace (and not attached):
#>  [1] nlme_3.1-137             bitops_1.0-6            
#>  [3] bit64_0.9-7              rprojroot_1.3-2         
#>  [5] tools_3.5.0              backports_1.1.2         
#>  [7] R6_2.2.2                 DT_0.4                  
#>  [9] vipor_0.4.5              DBI_1.0.0               
#> [11] lazyeval_0.2.1           mgcv_1.8-23             
#> [13] colorspace_1.3-2         gridExtra_2.3           
#> [15] bit_1.1-12               compiler_3.5.0          
#> [17] shinyjs_1.0              colourpicker_1.0        
#> [19] bookdown_0.7             scales_0.5.0            
#> [21] stringr_1.3.0            digest_0.6.15           
#> [23] rmarkdown_1.9            rentrez_1.2.1           
#> [25] XVector_0.20.0           pkgconfig_2.0.1         
#> [27] htmltools_0.3.6          limma_3.36.0            
#> [29] htmlwidgets_1.2          rlang_0.2.0             
#> [31] RSQLite_2.1.0            shiny_1.0.5             
#> [33] DelayedMatrixStats_1.2.0 bindr_0.1.1             
#> [35] jsonlite_1.5             dplyr_0.7.4             
#> [37] RCurl_1.95-4.10          magrittr_1.5            
#> [39] GenomeInfoDbData_1.1.0   Matrix_1.2-14           
#> [41] Rhdf5lib_1.2.0           ggbeeswarm_0.6.0        
#> [43] Rcpp_0.12.16             munsell_0.4.3           
#> [45] viridis_0.5.1            edgeR_3.22.0            
#> [47] stringi_1.2.2            yaml_2.1.19             
#> [49] rintrojs_0.2.0           zlibbioc_1.26.0         
#> [51] Rtsne_0.13               rhdf5_2.24.0            
#> [53] plyr_1.8.4               grid_3.5.0              
#> [55] blob_1.1.1               promises_1.0.1          
#> [57] shinydashboard_0.7.0     miniUI_0.1.1            
#> [59] lattice_0.20-35          cowplot_0.9.2           
#> [61] locfit_1.5-9.1           knitr_1.20              
#> [63] pillar_1.2.2             igraph_1.2.1            
#> [65] rjson_0.2.15             reshape2_1.4.3          
#> [67] XML_3.98-1.11            glue_1.2.0              
#> [69] evaluate_0.10.1          data.table_1.11.0       
#> [71] httpuv_1.4.2             gtable_0.2.0            
#> [73] assertthat_0.2.0         xfun_0.1                
#> [75] mime_0.5                 xtable_1.8-2            
#> [77] later_0.7.2              viridisLite_0.3.0       
#> [79] tibble_1.4.2             beeswarm_0.2.3          
#> [81] AnnotationDbi_1.42.0     memoise_1.1.0           
#> [83] tximport_1.8.0           bindrcpp_0.2.2          
#> [85] shinyAce_0.2.1
# devtools::session_info()

References

Bodenmiller, Bernd, Eli R Zunder, Rachel Finck, Tiffany J Chen, Erica S Savig, Robert V Bruggner, Erin F Simonds, et al. 2012. “Multiplexed Mass Cytometry Profiling of Cellular States Perturbed by Small-Molecule Regulators.” Nat. Biotechnol. 30 (9):858–67.

Ganz, C. 2016. “rintrojs: A Wrapper for the Intro.js Library.” Journal of Open Source Software 1 (6). The Open Journal. http://dx.doi.org/10.21105/joss.00063.

Nowicka, Malgorzata, Carsten Krieg, Lukas M Weber, Felix J Hartmann, Silvia Guglietta, Burkhard Becher, Mitchell P Levesque, and Mark D Robinson. 2017. “CyTOF Workflow: Differential Discovery in High-Throughput High-Dimensional Cytometry Datasets.” F1000Res. 6:748.

Tasic, B., V. Menon, T. N. Nguyen, T. K. Kim, T. Jarsky, Z. Yao, B. Levi, et al. 2016. “Adult mouse cortical cell taxonomy revealed by single cell transcriptomics.” Nat. Neurosci. 19 (2):335–46.

Van der Maaten, L., and G. Hinton. 2008. “Visualizing Data Using T-SNE.” J. Mach. Learn. Res. 9 (2579-2605):85.

Van Gassen, Sofie, Britt Callebaut, Mary J Van Helden, Bart N Lambrecht, Piet Demeester, Tom Dhaene, and Yvan Saeys. 2015. “FlowSOM: Using Self-Organizing Maps for Visualization and Interpretation of Cytometry Data.” Cytometry A 87 (7):636–45.