Rapid advances in flow cytometry and other single-cell technologies have enabled high-dimensional, multi-parameter, high-throughput measurements of individual cells. Numerous questions about cell population heterogeneity can now be addressed, as these novel technologies permit single-cell analysis of antigen-specific T-cells. Unfortunately, there is a lack of computational tools to take full advantage of these complex data. COMPASS is a statistical framework that enables unbiased analysis of antigen-specific T-cell subsets. COMPASS uses a Bayesian hierarchical framework to model all observed cell-subsets and select the most likely to be antigen-specific while regularizing the small cell counts that often arise in multi-parameter space. The model provides a posterior probability of specificity for each cell subset and each sample, which can be used to profile a subjectâ€™s immune response to external stimuli such as infection or vaccination.

We will use a simulated dataset to illustrate the use of the `COMPASS`

package.

First, we will outline the components used to construct the `COMPASSContainer`

,
the data structure used to hold data from an ICS experiment. We will first
initialize some parameters used in generating the simulated data.

```
library(COMPASS)
set.seed(123)
n <- 100 ## number of samples
k <- 6 ## number of markers
sid_vec <- paste0("sid_", 1:n) ## sample ids; unique names used to denote samples
iid_vec <- rep_len(paste0("iid_", 1:(n/10)), n) ## individual ids
```

The `COMPASSContainer`

is built out of three main components: the **data**,
the total **counts**, and the **metadata**. We will describe the R structure
of these components next.

This is a list of matrices, with each matrix representing a population of cells drawn from a particular sample. Each row is an individual cell, each column is a marker (cytokine), and each matrix element is an intensity measure associated with a particular cell and marker. Only cells that express at least one marker are included.

```
data <- replicate(n, {
nrow <- round(runif(1) * 10000 + 1000)
ncol <- k
vals <- rexp(nrow * ncol, runif(1, 1e-05, 0.001))
vals[vals < 2000] <- 0
output <- matrix(vals, nrow, ncol)
output <- output[apply(output, 1, sum) > 0, ]
colnames(output) <- paste0("M", 1:k)
return(output)
})
names(data) <- sid_vec
head(data[[1]])
```

```
## M1 M2 M3 M4 M5 M6
## [1,] 0 2990 0 2121 0 3119
## [2,] 0 2631 0 3423 3559 0
## [3,] 0 2970 0 3521 0 3351
## [4,] 0 0 3004 0 0 0
## [5,] 0 0 2365 0 5174 3716
## [6,] 3449 0 0 0 0 0
```

This is a named integer vector, denoting the total number of cells that were
recovered from each sample in `data`

.

```
counts <- sapply(data, nrow) + round(rnorm(n, 10000, 1000))
counts <- setNames(as.integer(counts), names(counts))
head(counts)
```

```
## sid_1 sid_2 sid_3 sid_4 sid_5 sid_6
## 12911 15610 17694 17256 11453 12499
```

This is a `data.frame`

that associates metadata information with each sample
available in `data`

/ `counts`

. We will suppose that each sample was
subject to one of two treatments named `Control`

and `Treatment`

.

```
meta <- data.frame(sid = sid_vec, iid = iid_vec, trt = sample(c("Control", "Treatment"),
n, TRUE))
head(meta)
```

```
## sid iid trt
## 1 sid_1 iid_1 Treatment
## 2 sid_2 iid_2 Control
## 3 sid_3 iid_3 Treatment
## 4 sid_4 iid_4 Control
## 5 sid_5 iid_5 Control
## 6 sid_6 iid_6 Control
```

Once we have these components, we can construct the `COMPASSContainer`

:

```
CC <- COMPASSContainer(data = data, counts = counts, meta = meta, individual_id = "iid",
sample_id = "sid")
```

We can see some basic information about our `COMPASSContainer`

:

```
CC
```

```
## A COMPASSContainer with 100 samples and 6 markers.
```

```
summary(CC)
```

```
## A COMPASSContainer with 100 samples and 6 markers.
## The metadata describes 3 variables on 100 samples. The columns are:
##
## sid iid trt
```

Fitting the `COMPASS`

model is very easy once the data has been inserted into
the `COMPASSContainer`

object. To specify the model, the user needs to specify
the criteria that identify samples that received a positive stimulation, and
samples that received a negative stimulation. In our data, we can specify it
as follows. (Because MCMC is used to sample the posterior and that is a slow
process, we limit ourselves to a small number of iterations for the purposes
of this vignette.)

`COMPASS`

is designed to give verbose output with the model fitting statement,
in order for users to compare expectations in their data to what the `COMPASS`

model fitting function is doing, in order to minimize potential errors.

```
fit <- COMPASS(CC, treatment = trt == "Treatment", control = trt == "Control",
iterations = 100)
```

```
## There are a total of 52 samples from 10 individuals in the 'treatment' group.
## There are multiple samples per individual; these will be aggregated.
## There are a total of 48 samples from 10 individuals in the 'control' group.
## There are multiple samples per individual; these will be aggregated.
## The model will be run on 10 paired samples.
## The category filter did not remove any categories.
## There are a total of 64 categories to be tested.
## Initializing parameters...
## Computing initial parameter estimates...
## Fitting model with 8 replications.
## Running replication 1 of 8...
## Running replication 2 of 8...
## Running replication 3 of 8...
## Running replication 4 of 8...
## Running replication 5 of 8...
## Running replication 6 of 8...
## Running replication 7 of 8...
## Running replication 8 of 8...
## Done!
## Computing the posterior difference in proportions, posterior log ratio...
## Done!
```

After fitting the `COMPASS`

model, we can examine the output in many ways:

```
## Extract the functionality, polyfunctionality scores as described within
## the COMPASS paper -- these are measures of the overall level of
## 'functionality' of a cell, which has shown to be correlated with a cell's
## affinity in immune response
FS <- FunctionalityScore(fit)
PFS <- PolyfunctionalityScore(fit)
## Obtain the posterior difference, posterior log ratio from a COMPASSResult
post <- Posterior(fit)
## Plot a heatmap of the mean probability of response, to visualize
## differences in expression for each category
plot(fit)
```

```
## The 'threshold' filter has removed 0 categories:
```

```
## Visualize the posterior difference, log difference with a heatmap
plot(fit, measure = PosteriorDiff(fit), threshold = 0)
```

```
## The 'threshold' filter has removed 0 categories:
```

```
plot(fit, measure = PosteriorLogDiff(fit), threshold = 0)
```

```
## The 'threshold' filter has removed 0 categories:
```

`COMPASS`

also packages a Shiny application for interactive visualization of
the fits generated by a `COMPASS`

call. These can be easily generated through
a call to the `shinyCOMPASS`

function.

```
shinyCOMPASS(fit, stimulated="Treatment", unstimulated="Control")
```

`flowWorkspace`

is a package used for managing and generating gates for data
obtained from flow cytometry experiments. Combined with `openCyto`

, users are
able to automatically gate data through flexibly-defined gating templates.

`COMPASS`

comes with a utility function for extracting data from a
`flowWorkspace`

`GatingSet`

object, providing a seamless workflow between
both data management and analysis for polyfunctionality cytokine data.
Data can be extracted using `COMPASSContainerFromGatingSet`

, with appropriate
documentation available in `?COMPASSContainerFromGatingSet`

. As an example,
a researcher might first gate his data in order to find `CD4+`

cells, and then
gate a number of markers, or cytokines, for these `CD4+`

cells, in order to
identify cells expressing different combinations of markers.

Users who have gated their data with `flowJo`

and are interested in analyzing
their data with `COMPASS`

can do so by first loading and parsing their
workspace with `flowWorkspace`

, and next generating the `COMPASSContainer`

through `COMPASSContainerFromGatingSet`

.

Greg Finak and Mike Jiang (2011). flowWorkspace: Import flowJo Workspaces into BioConductor and replicate flowJo gating with flowCore. R package version 3.10.05.

Lynn Lin, Kevin Ushey and Greg Finak (2014). COMPASS: Combinatorial Polyfunctionality Analysis of Single Cells. R package version 1.2.1.

Greg Finak, Andrew McDavid, Pratip Chattopadhyay, Maria Dominguez, Steve De Rosa, Mario Roederer, Raphael Gottardo. Mixture models for single-cell assays with applications to vaccine studies. Biostatistics 2014 Jan;15(1):87-101

Mike Jiang, John Ramey, Greg Finak and Raphael Gottardo (2012). openCyto: Hierarchical Gating Pipeline for flow cytometry data. R package version 1.2.1.

R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

h