
An Introduction to the openCyto package

1. Introduction

The openCyto package is designed to apply automated gating methods sequentially, mimicking a manual gating scheme.

1.1. Manual gating

Traditionally, scientists draw the gates for each individual sample on each 2D projection (2 channels) within FlowJo, or draw 'template gates' on one sample, replicate them to the other samples, and then manually inspect the gates on each sample and correct them where necessary. Either way is time-consuming and subjective, and thus not suitable for the large data sets generated by high-throughput flow cytometers or for cross-lab data analysis.

Here is an XML workspace (a manual gating scheme) exported from FlowJo.

flowDataPath <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(flowDataPath, pattern = "manual.xml", full.names = TRUE)
wsfile
## [1] "/home/biocbuild/bbs-2.13-bioc/R/library/flowWorkspaceData/extdata/manual.xml"

Using the flowWorkspace package, we can load it into R,

library(flowWorkspace)
ws <- openWorkspace(wsfile)

apply (parseWorkspace) the manual gates defined in the XML to the raw FCS files,

gs <- parseWorkspace(ws, name = "T-cell", subset = 1, isNcdf = TRUE)

and then visualize the gating hierarchy:

gh <- gs[[1]]
plot(gh)

plot of chunk plot-manual-GatingHierarchy

and the gates:

plotGate(gh)

plot of chunk plot-manual-gates

This is a gating scheme for a T cell panel, which tries to identify T cell sub-populations. We can achieve the same results with the automated gating pipeline provided by this package.

1.2. Automated Gating


flowCore, flowStats, flowClust, and other packages provide many different gating methods to detect cell populations and draw the gates automatically.

The flowWorkspace package provides the GatingSet, an efficient data structure to store, query, and visualize hierarchically gated data.

By taking advantage of these tools, the openCyto package builds an automated gating pipeline from a gating template, which is essentially the same kind of hierarchical gating scheme used by biologists and scientists.

2. Create gating templates

2.1. Template format

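First of all, we need to describe the gating hierarchy in a spreadsheet (a plain text format). This spreadsheet must have the following columns, as they appear in the example template in section 2.2: alias, pop, parent, dims, gating_method, gating_args, collapseDataForGating, groupBy, preprocessing_method, preprocessing_args.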
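As a minimal sketch, a file with this layout can be written as plain CSV using base R alone. The column names are taken from the example template in section 2.2 and the two rows copy its first two entries; the file name `my_template.csv` is a placeholder chosen for illustration.

```r
# Column names follow the example template; the two rows mirror its first entries.
template <- data.frame(
  alias = c("nonDebris", "singlets"),
  pop = c("nonDebris", "singlets"),
  parent = c("root", "nonDebris"),
  dims = c("FSC-A", "FSC-A,FSC-H"),
  gating_method = c("mindensity", "singletGate"),
  gating_args = "",
  collapseDataForGating = "",
  groupBy = "",
  preprocessing_method = "",
  preprocessing_args = "",
  stringsAsFactors = FALSE
)

# Placeholder file name; a temporary directory keeps the sketch free of side effects.
template_path <- file.path(tempdir(), "my_template.csv")
write.csv(template, template_path, row.names = FALSE)

# Read the file back as text to confirm that the layout survived the round trip.
template_check <- read.csv(template_path, colClasses = "character")
print(template_check[, c("alias", "parent", "dims", "gating_method")])
```

Note that the comma inside the dims entry "FSC-A,FSC-H" is preserved because write.csv quotes character fields.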

2.2. Example template

Here is an example of a gating template.

library(openCyto)
library(data.table)
gtFile <- system.file("extdata/gating_template/tcell.csv", package = "openCyto")
dtTemplate <- fread(gtFile, autostart = 1L)
dtTemplate
##             alias              pop    parent        dims gating_method
##  1:     nonDebris        nonDebris      root       FSC-A    mindensity
##  2:      singlets         singlets nonDebris FSC-A,FSC-H   singletGate
##  3:         lymph            lymph  singlets FSC-A,SSC-A     flowClust
##  4:           cd3              cd3     lymph         CD3    mindensity
##  5:             *     cd4-/+cd8+/-       cd3     cd4,cd8    mindensity
##  6: activated cd4        CD38+HLA+  cd4+cd8-    CD38,HLA      tailgate
##  7: activated cd8        CD38+HLA+  cd4-cd8+    CD38,HLA      tailgate
##  8:      CD45_neg          CD45RA-  cd4+cd8-      CD45RA    mindensity
##  9:     CCR7_gate            CCR7+  CD45_neg        CCR7     flowClust
## 10:             * CCR7+/-CD45RA+/-  cd4+cd8- CCR7,CD45RA       refGate
## 11:             * CCR7+/-CD45RA+/-  cd4-cd8+ CCR7,CD45RA    mindensity
##               gating_args collapseDataForGating groupBy
##  1:                                                  NA
##  2:                                                  NA
##  3: K=2,target=c(1e5,5e4)                            NA
##  4:                                        TRUE       4
##  5:     gate_range=c(1,3)                            NA
##  6:                                                  NA
##  7:                                                  NA
##  8:     gate_range=c(2,3)                            NA
##  9:           neg=1,pos=1                            NA
## 10:    CD45_neg:CCR7_gate                            NA
## 11:                                                  NA
##     preprocessing_method preprocessing_args
##  1:                                      NA
##  2:                                      NA
##  3:      prior_flowClust                 NA
##  4:                                      NA
##  5:                                      NA
##  6:                                      NA
##  7:                                      NA
##  8:                                      NA
##  9:                                      NA
## 10:                                      NA
## 11:                                      NA

Each row usually corresponds to one cell population and the gating method used to obtain that population. We explain row by row how to create this gating template from the manual gating scheme.

2.2.1. “nonDebris”

dtTemplate[1, ]
##        alias       pop parent  dims gating_method gating_args
## 1: nonDebris nonDebris   root FSC-A    mindensity            
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:                            NA                                      NA

2.2.2. “singlets”

dtTemplate[2, ]
##       alias      pop    parent        dims gating_method gating_args
## 1: singlets singlets nonDebris FSC-A,FSC-H   singletGate            
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:                            NA                                      NA

2.2.3. “lymphocyte”

dtTemplate[3, ]
##    alias   pop   parent        dims gating_method           gating_args
## 1: lymph lymph singlets FSC-A,SSC-A     flowClust K=2,target=c(1e5,5e4)
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:                            NA      prior_flowClust                 NA

2.2.4. “cd3+” (Tcell)

dtTemplate[4, ]
##    alias pop parent dims gating_method gating_args collapseDataForGating
## 1:   cd3 cd3  lymph  CD3    mindensity                              TRUE
##    groupBy preprocessing_method preprocessing_args
## 1:       4                                      NA

It is similar to the nonDebris gate, except that we set collapseDataForGating to TRUE, which tells the pipeline to collapse all samples into one and apply mindensity to the collapsed data on the CD3 dimension. Once the gate is generated, it is replicated across all samples. This is only useful when individual samples do not have enough events to deduce the gate; here we do it purely as a proof of concept.
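The idea of collapsing can be illustrated with base R alone. The sketch below is not openCyto's implementation: `min_density_cut` is a hypothetical helper that places a cut at the lowest point of a kernel density estimate inside a given range (assumed here to play a role similar to `gate_range`), and the two simulated samples are made up for illustration.

```r
set.seed(1)

# Two simulated samples, each rather small to gate reliably on its own.
# Values mimic a transformed channel with a negative and a positive peak.
samples <- list(
  s1 = c(rnorm(30, mean = 0.5, sd = 0.3), rnorm(30, mean = 3.5, sd = 0.3)),
  s2 = c(rnorm(30, mean = 0.5, sd = 0.3), rnorm(30, mean = 3.5, sd = 0.3))
)

# Hypothetical helper: cut at the minimum of the density estimate within gate_range.
min_density_cut <- function(x, gate_range) {
  d <- density(x)
  inside <- d$x >= gate_range[1] & d$x <= gate_range[2]
  d$x[inside][which.min(d$y[inside])]
}

# Collapse all samples into one vector and derive a single cut point from it.
pooled <- unlist(samples, use.names = FALSE)
cut_point <- min_density_cut(pooled, gate_range = c(1, 3))

# Replicate the same gate across every sample.
frac_positive <- sapply(samples, function(x) mean(x > cut_point))
print(cut_point)
print(frac_positive)
```

Because the cut is estimated once from the pooled events, every sample receives an identical gate, which is the trade-off of collapsing: more events per estimate, but no per-sample adjustment.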

2.2.5. CD4 and CD8

The fifth row specifies pop as cd4-/+cd8+/-, which the pipeline expands into 6 rows.

dtTemplate[5, ]
##    alias          pop parent    dims gating_method       gating_args
## 1:     * cd4-/+cd8+/-    cd3 cd4,cd8    mindensity gate_range=c(1,3)
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:                            NA                                      NA

The first two rows are 1D gates generated by gating_method on each dimension (cd4 and cd8) independently:

##    alias  pop                        parent dims gating_method
## 1:  cd4+ cd4+ /nonDebris/singlets/lymph/cd3  cd4    mindensity
## 2:  cd8+ cd8+ /nonDebris/singlets/lymph/cd3  cd8    mindensity
##          gating_args collapseDataForGating groupBy preprocessing_method
## 1: gate_range=c(1,3)                                                   
## 2: gate_range=c(1,3)                                                   
##    preprocessing_args
## 1:                   
## 2:

The other 4 rows are rectangleGates that correspond to the 4 quadrants in the 2D projection (cd4 vs cd8).

##       alias      pop                        parent    dims gating_method
## 1: cd4+cd8+ cd4+cd8+ /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 2: cd4-cd8+ cd4-cd8+ /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 3: cd4+cd8- cd4+cd8- /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
## 4: cd4-cd8- cd4-cd8- /nonDebris/singlets/lymph/cd3 cd4,cd8       refGate
##                                                              gating_args
## 1: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 2: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 3: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 4: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
##    collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:                                                                      
## 2:                                                                      
## 3:                                                                      
## 4:

As we see here, "refGate" in gating_method indicates that these gates are constructed from the gate coordinates of the previous two 1D gates. Those 1D gates are thus considered "reference gates" and are referred to by a colon-separated alias string in gating_args: "cd4+:cd8+" (shown above with full paths).
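How two 1D reference gates define four quadrants can be sketched in base R. The cut points and events below are made-up values, not output of the pipeline, and the code only illustrates the principle rather than openCyto's refGate implementation.

```r
# Hypothetical 1D cut points, standing in for the cd4+ and cd8+ reference gates.
cd4_cut <- 2.0
cd8_cut <- 1.5

# A few made-up events on the (cd4, cd8) projection.
events <- data.frame(
  cd4 = c(3.1, 0.4, 2.8, 0.2),
  cd8 = c(2.9, 2.2, 0.3, 0.1)
)

# Each quadrant reuses the two reference cut points; no new gating is performed.
cd4_sign <- ifelse(events$cd4 > cd4_cut, "+", "-")
cd8_sign <- ifelse(events$cd8 > cd8_cut, "+", "-")
events$quadrant <- paste0("cd4", cd4_sign, "cd8", cd8_sign)
print(events)
```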

Alternatively, we can write out these 6 rows explicitly in the spreadsheet. The compact representation is recommended unless the user wants finer control over how the gating is done. For instance, sometimes we need different gating_methods to generate the 1D gates on cd4 and cd8, or the cd8 gating needs to depend on the cd4 gating, i.e. the parent of cd8+ is cd4+ (or cd4-) instead of cd3. Sometimes we also want a customized alias instead of the automatically generated quadrant-like name (x+y+) (e.g. the 5th row of the gating template).
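For readers who prefer the explicit form, the six rows can be generated with a few lines of base R. `expand_quadrant_rows` is a hypothetical helper written for this illustration, not a function of openCyto, and it uses short aliases for the parent and the reference gates where the pipeline's own expansion shows full paths.

```r
# Hypothetical helper that spells out the six rows behind a "+/-" style pop entry.
expand_quadrant_rows <- function(dims, parent, gating_method, gating_args) {
  one_d <- paste0(dims, "+")
  # All sign combinations; the first dimension varies fastest.
  signs <- expand.grid(first = c("+", "-"), second = c("+", "-"),
                       stringsAsFactors = FALSE)
  quads <- paste0(dims[1], signs$first, dims[2], signs$second)
  data.frame(
    alias = c(one_d, quads),
    pop = c(one_d, quads),
    parent = parent,
    dims = c(dims, rep(paste(dims, collapse = ","), 4)),
    gating_method = c(rep(gating_method, 2), rep("refGate", 4)),
    gating_args = c(rep(gating_args, 2), rep(paste(one_d, collapse = ":"), 4)),
    stringsAsFactors = FALSE
  )
}

rows <- expand_quadrant_rows(
  dims = c("cd4", "cd8"),
  parent = "cd3",
  gating_method = "mindensity",
  gating_args = "gate_range=c(1,3)"
)
print(rows)
```

Once the rows are explicit, each one can be edited on its own, for example to give the cd8+ row a different gating_method or parent.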

3. Load gating template

After the gating template is defined in the spreadsheet, it can be loaded into R:

gt_tcell <- gatingTemplate(gtFile, autostart = 1L)
gt_tcell
## --- Gating Template: default
##  with  29  populations defined

Besides looking at the spreadsheet, we can examine the gating scheme by visualizing it:

plot(gt_tcell)

plot of chunk plot-gt

As we can see, the gating scheme has been expanded as described above. Colored arrows originate from the parent population, and grey arrows originate from the reference population (or gate).

4. Run the gating pipeline

Once we are satisfied with the gating template, we can apply it to the actual flow data.

4.1. Load the raw data

First of all, we load the raw FCS files into R with ncdfFlow::read.ncdfFlowSet, which uses less memory than flowCore::read.flowSet.

fcsFiles <- list.files(flowDataPath, pattern = "CytoTrol", full.names = TRUE)
ncfs <- read.ncdfFlowSet(fcsFiles)
ncfs
## An ncdfFlowSet with 2 samples.
## flowSetId :  
## NCDF file : /tmp/RtmpLfPIIF/ncfs3c432c24f288.nc 
## An object of class 'AnnotatedDataFrame'
##   rowNames: CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_2.fcs
##   varLabels: name
##   varMetadata: labelDescription
## 
##   column names:
##     FSC-A, FSC-H, FSC-W, SSC-A, B710-A, R660-A, R780-A, V450-A, V545-A, G560-A, G780-A, Time

4.2. Compensation

Then we compensate the data. If we have compensation controls (i.e. single-stained samples), we can calculate the compensation matrix with the flowCore::spillover function. Here we simply use the compensation matrix defined in the FlowJo workspace.
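Under the usual linear model, compensation amounts to multiplying the measured intensities by the inverse of the spillover matrix. The base R sketch below uses a made-up two-channel spillover matrix and made-up signals; the channel names are borrowed from the data set only as labels.

```r
# Made-up spillover matrix: rows are dyes, columns are detectors.
# 10% of the first dye leaks into the second detector, 5% the other way round.
spill <- matrix(c(1.00, 0.10,
                  0.05, 1.00),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("B710-A", "R660-A"), c("B710-A", "R660-A")))

# Made-up true signals for three events (one row per event).
true_signal <- matrix(c(100,   0,
                          0, 200,
                         50,  50),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, colnames(spill)))

# What the detectors record once spillover mixes the channels.
observed <- true_signal %*% spill

# Compensation undoes the mixing by applying the inverse spillover matrix.
compensated <- observed %*% solve(spill)
print(round(compensated, 6))
```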

compMat <- getCompensationMatrices(gh)
ncfs_comp <- compensate(ncfs, compMat)
## [1] "copying data slice: CytoTrol_CytoTrol_1.fcs"
## [1] "copying data slice: CytoTrol_CytoTrol_2.fcs"

Here is one example showing the compensation outcome:

plot of chunk compensate_plot

4.3. Transformation

All stained channels need to be transformed properly before gating. Here we use flowCore::estimateLogicle to apply the logicle transformation.
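Why a transformation is needed can be seen with a base R stand-in. The sketch below uses `asinh`, not the logicle function: both are roughly linear near zero and logarithmic for large values, but they are different transformations, and the cofactor of 150 is an arbitrary illustrative choice.

```r
# Made-up compensated intensities, including negative values produced by compensation.
raw <- c(-200, -20, 0, 20, 200, 2000, 20000, 200000)

# Stand-in for a logicle-style transform: asinh with an arbitrary cofactor.
cofactor <- 150
transformed <- asinh(raw / cofactor)

# Negative inputs stay defined, and the upper decades are compressed.
print(data.frame(raw = raw, transformed = round(transformed, 3)))
```

A plain logarithm would fail on the zero and negative values, which is the practical reason a logicle-like scale is preferred for compensated data.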

chnls <- parameters(compMat)
# Estimate the logicle parameters from the compensated data that will be transformed
transFuncts <- estimateLogicle(ncfs_comp[[1]], channels = chnls)
ncfs_trans <- transform(ncfs_comp, transFuncts)
## [1] "copying data slice: CytoTrol_CytoTrol_1.fcs"
## [1] "copying data slice: CytoTrol_CytoTrol_2.fcs"

Here is one example showing the transformation outcome:

plot of chunk transformation_plot

4.4. Create 'GatingSet'

Once the data is preprocessed, it can be loaded into a GatingSet object.

gs <- GatingSet(ncfs_trans)
getNodes(gs[[1]])
## [1] "root"

As getNodes shows, there is only one population node, root, at this point.

4.5. Gating

Now we can apply the gating template to the data:

gating(gt_tcell, gs)

Optionally, we can run the pipeline in parallel to speed up gating, e.g.:

gating(gt_tcell, gs, mc.cores = 2, parallel_type = "multicore")
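The general idea can be illustrated with the base `parallel` package rather than openCyto itself: when samples can be gated independently, the per-sample work can be spread over several cores. In the sketch below, `gate_one_sample` is a hypothetical stand-in for gating a single sample, and the data are simulated.

```r
library(parallel)

set.seed(2)
# Made-up one-channel samples.
samples <- list(a = rnorm(100), b = rnorm(100, mean = 1), c = rnorm(100, mean = 2))

# Hypothetical stand-in for gating one sample: fraction of events above a fixed cut.
gate_one_sample <- function(x) mean(x > 0.5)

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

sequential <- lapply(samples, gate_one_sample)
in_parallel <- mclapply(samples, gate_one_sample, mc.cores = n_cores)
```

Both calls return the same per-sample results; only the wall-clock time differs once the per-sample work becomes expensive.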

4.6. Hide nodes

After gating, the hierarchy contains some extra populations generated automatically by the pipeline (e.g. the reference gates used by refGate).

plot(gs[[1]])

plot of chunk plot_afterGating

We can hide these populations if we are not interested in them:

nodesToHide <- c("cd8+", "cd4+", "cd4-cd8-", "cd4+cd8+", "cd4+cd8-/HLA+", "cd4+cd8-/CD38+", 
    "cd4-cd8+/HLA+", "cd4-cd8+/CD38+", "CD45_neg/CCR7_gate", "cd4+cd8-/CD45_neg", 
    "cd4-cd8+/CCR7+", "cd4-cd8+/CD45RA+")
# Passing FALSE to setNode hides the node
invisible(lapply(nodesToHide, function(thisNode) setNode(gs, thisNode, FALSE)))

4.7. Rename nodes

And rename the populations:

setNode(gs, "cd4+cd8-", "cd4")
setNode(gs, "cd4-cd8+", "cd8")

4.8. Visualization

plot(gs[[1]])

plot of chunk plot_afterHiding

plotGate(gs[[1]])

plot of chunk plotGate_autoGate

5. Conclusion

The openCyto package allows users to specify their gating schemes and gate the data in a data-driven fashion. It frees scientists from labor-intensive manual gating routines and increases the speed, reproducibility, and objectivity of the data analysis work.