Rapid advances in flow cytometry and other single-cell technologies have enabled high-dimensional, multi-parameter, high-throughput measurements of individual cells. Numerous questions about cell population heterogeneity can now be addressed, as these novel technologies permit single-cell analysis of antigen-specific T-cells. Unfortunately, there is a lack of computational tools to take full advantage of these complex data. COMPASS is a statistical framework that enables unbiased analysis of antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework to model all observed cell-subsets and select the most likely to be antigen-specific while regularizing the small cell counts that often arise in multi-parameter space. The model provides a posterior probability of specificity for each cell subset and each sample, which can be used to profile a subject’s immune response to external stimuli such as infection or vaccination.
We will use a simulated dataset to illustrate the use of the COMPASS
package.
First, we will outline the components used to construct the COMPASSContainer
,
the data structure used to hold data from an ICS experiment. We will first
initialize some parameters used in generating the simulated data.
library(COMPASS)
set.seed(123)
n <- 100 ## number of samples
k <- 6 ## number of markers
sid_vec <- paste0("sid_", 1:n) ## sample ids; unique names used to denote samples
iid_vec <- rep_len( paste0("iid_", 1:(n/10) ), n ) ## individual ids
The COMPASSContainer
is built out of three main components: the data,
the total counts, and the metadata. We will describe the R structure
of these components next.
This is a list of matrices, with each matrix representing a population of cells drawn from a particular sample. Each row is an individual cell, each column is a marker (cytokine), and each matrix element is an intensity measure associated with a particular cell and marker. Only cells that express at least one marker are included.
data <- replicate(n, {
nrow <- round(runif(1) * 1E4 + 1000)
ncol <- k
vals <- rexp( nrow * ncol, runif(1, 1E-5, 1E-3) )
vals[ vals < 2000 ] <- 0
output <- matrix(vals, nrow, ncol)
output <- output[ apply(output, 1, sum) > 0, ]
colnames(output) <- paste0("M", 1:k)
return(output)
})
names(data) <- sid_vec
head( data[[1]] )
## M1 M2 M3 M4 M5 M6
## [1,] 0.000 2989.978 0.000 2121.293 0.000 3119.056
## [2,] 0.000 2631.113 0.000 3423.290 3559.051 0.000
## [3,] 0.000 2970.300 0.000 3520.768 0.000 3351.278
## [4,] 0.000 0.000 3004.248 0.000 0.000 0.000
## [5,] 0.000 0.000 2364.512 0.000 5174.260 3715.566
## [6,] 3449.089 0.000 0.000 0.000 0.000 0.000
This is a named integer vector, denoting the total number of cells that were
recovered from each sample in data
.
counts <- sapply(data, nrow) + round( rnorm(n, 1E4, 1E3) )
counts <- setNames( as.integer(counts), names(counts) )
head(counts)
## sid_1 sid_2 sid_3 sid_4 sid_5 sid_6
## 12911 15610 17694 17256 11453 12499
This is a data.frame
that associates metadata information with each sample
available in data
/ counts
. We will suppose that each sample was
subject to one of two treatments named Control
and Treatment
.
meta <- data.frame(
sid=sid_vec,
iid=iid_vec,
trt=sample( c("Control", "Treatment"), n, TRUE )
)
head(meta)
## sid iid trt
## 1 sid_1 iid_1 Treatment
## 2 sid_2 iid_2 Control
## 3 sid_3 iid_3 Treatment
## 4 sid_4 iid_4 Control
## 5 sid_5 iid_5 Control
## 6 sid_6 iid_6 Control
Once we have these components, we can construct the COMPASSContainer
:
CC <- COMPASSContainer(
data=data,
counts=counts,
meta=meta,
individual_id="iid",
sample_id="sid"
)
We can see some basic information about our COMPASSContainer
:
CC
## A COMPASSContainer with 100 samples from 10 individuals, containing data across 6 markers.
summary(CC)
## A COMPASSContainer with 100 samples from 10 individuals, containing data across 6 markers.
## The metadata describes 3 variables on 100 samples. The columns are:
##
## sid iid trt
Fitting the COMPASS
model is very easy once the data has been inserted into
the COMPASSContainer
object. To specify the model, the user needs to specify
the criteria that identify samples that received a positive stimulation, and
samples that received a negative stimulation. In our data, we can specify it
as follows. (Because MCMC is used to sample the posterior and that is a slow
process, we limit ourselves to a small number of iterations for the purposes
of this vignette.)
COMPASS
is designed to give verbose output with the model fitting statement,
in order for users to compare expectations in their data to what the COMPASS
model fitting function is doing, in order to minimize potential errors.
fit <- COMPASS( CC,
treatment=trt == "Treatment",
control=trt == "Control",
iterations=100
)
## There are a total of 52 samples from 10 individuals in the 'treatment' group.
## There are multiple samples per individual; these will be aggregated.
## There are a total of 48 samples from 10 individuals in the 'control' group.
## There are multiple samples per individual; these will be aggregated.
## The model will be run on 10 paired samples.
## The category filter did not remove any categories.
## There are a total of 64 categories to be tested.
## Initializing parameters...
## Computing initial parameter estimates...
## Fitting model with 8 replications.
## Running replication 1 of 8...
## Running replication 2 of 8...
## Running replication 3 of 8...
## Running replication 4 of 8...
## Running replication 5 of 8...
## Running replication 6 of 8...
## Running replication 7 of 8...
## Running replication 8 of 8...
## Done!
## Computing the posterior difference in proportions, posterior log ratio...
## Done!
After fitting the COMPASS
model, we can examine the output in many ways:
## Extract the functionality, polyfunctionality scores as described
## within the COMPASS paper -- these are measures of the overall level
## of 'functionality' of a cell, which has shown to be correlated with
## a cell's affinity in immune response
FS <- FunctionalityScore(fit)
PFS <- PolyfunctionalityScore(fit)
## Obtain the posterior difference, posterior log ratio from a COMPASSResult
post <- Posterior(fit)
## Plot a heatmap of the mean probability of response, to visualize differences
## in expression for each category
plot(fit)
## The 'threshold' filter has removed 0 categories:
## Visualize the posterior difference, log difference with a heatmap
plot(fit, measure=PosteriorDiff(fit), threshold=0)
## The 'threshold' filter has removed 1 categories:
## !M1&!M2&!M3&!M4&!M5&!M6
plot(fit, measure=PosteriorLogDiff(fit), threshold=0)
## The 'threshold' filter has removed 1 categories:
## !M1&!M2&!M3&!M4&!M5&!M6
COMPASS
also packages a Shiny application for interactive visualization of
the fits generated by a COMPASS
call. These can be easily generated through
a call to the shinyCOMPASS
function.
shinyCOMPASS(fit, stimulated="Treatment", unstimulated="Control")
flowWorkspace
is a package used for managing and generating gates for data
obtained from flow cytometry experiments. Combined with openCyto
, users are
able to automatically gate data through flexibly-defined gating templates.
COMPASS
comes with a utility function for extracting data from a
flowWorkspace
GatingSet
object, providing a seamless workflow between
both data management and analysis for polyfunctionality cytokine data.
Data can be extracted using COMPASSContainerFromGatingSet
, with appropriate
documentation available in ?COMPASSContainerFromGatingSet
. As an example,
a researcher might first gate his data in order to find CD4+
cells, and then
gate a number of markers, or cytokines, for these CD4+
cells, in order to
identify cells expressing different combinations of markers.
Users who have gated their data with flowJo
and are interested in analyzing
their data with COMPASS
can do so by first loading and parsing their
workspace with flowWorkspace
, and next generating the COMPASSContainer
through COMPASSContainerFromGatingSet
.
Lin, L. Finak, G. Ushey, K. Seshadri C. et al. COMPASS identifies T-cell subsets correlated with clinical outcomes. Nature biotechnology (2015). doi:10.1038/nbt.3187
Finak G., Frelinger J., Newell E.W., Ramey J., Davis M.M., Kalams S.A., De Rosa S.C., Gottardo R. (2014) OpenCyto: An Open Source Infrastructure for Scalable, Robust, Reproducible, and Automated, End-to-End Flow Cytometry Data Analysis. PLoS Comput Biol 10:e1003806.
Greg Finak and Mike Jiang (2011). flowWorkspace: Infrastructure for representing and interacting with the gated cytometry. R package version 3.26.6.
Greg Finak, Andrew McDavid, Pratip Chattopadhyay, Maria Dominguez, Steve De Rosa, Mario Roederer, Raphael Gottardo. Mixture models for single-cell assays with applications to vaccine studies. Biostatistics 2014 Jan;15(1):87-101
Lynn Lin, Kevin Ushey and Greg Finak (2014). COMPASS: Combinatorial Polyfunctionality Analysis of Single Cells. R package version 1.16.2.
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.