The openCyto package is designed to facilitate automated gating methods applied in a sequential way that mimics the manual gating scheme.
Traditionally, scientists have to draw the gates for each individual sample on each 2D projection (2 channels) within flowJo.
Alternatively, they draw 'template gates' on one sample, replicate them to the other samples, and then manually inspect the gates on each sample
to correct them if necessary. Either way is time-consuming and subjective, and thus not suitable for the large data sets
generated by high-throughput flow cytometers or for cross-lab data analysis.
Here is one xml workspace (manual gating scheme) exported from flowJo.
library(flowWorkspace)
flowDataPath <- system.file("extdata", package = "flowWorkspaceData")
wsfile <- list.files(flowDataPath, pattern = "manual.xml", full.names = TRUE)
wsfile
## [1] "/home/biocbuild/bbs-3.3-bioc/R/library/flowWorkspaceData/extdata/manual.xml"
By using the flowWorkspace package, we can load it into R,
ws <- openWorkspace(wsfile)
apply (parseWorkspace) the manual gates defined in the xml to the raw FCS files,
gs <- parseWorkspace(ws, name= "T-cell", subset =1, isNcdf = TRUE)
and then visualize the gating hierarchy
gh <- gs[[1]]
plot(gh)
and the gates:
plotGate(gh)
This is a gating scheme for a T cell panel, which aims to identify T cell sub-populations.
We can achieve the same results by using the automated gating pipeline provided by this package.
flowCore, flowStats, flowClust and other packages provide many different gating methods to
detect cell populations and draw the gates automatically.
The flowWorkspace package provides the GatingSet as an efficient data structure to store, query and visualize hierarchically gated data.
By taking advantage of these tools, the openCyto package can create an automated gating pipeline from a gating template,
which is essentially the same kind of hierarchical gating scheme used by biologists and scientists.
First of all, we need to describe the gating hierarchy in a spreadsheet (a plain-text CSV file). This spreadsheet must have the following columns:
- alias: a name used to label the cell population. The path composed of the alias and its ancestor nodes (e.g. /root/A/B/alias) has to be uniquely identifiable.
- pop: a population pattern of the form A+/- or A+/-B+/-, which tells the algorithm which side (positive or negative) of a 1d gate, or which quadrant of a 2d gate, to keep. When it is in the form 'A+/-B+/-', 'A' and 'B' should be the full name (or a substring, as long as it is uniquely matched) of either a channel or a marker of the flow data.
- parent: the alias of the parent population; its path has to be uniquely identifiable.
- dims: comma-separated names specifying the dimensions (1d or 2d) used for gating. Each can be either a channel name or a stained marker name.
- gating_method: the name of the gating function (e.g. flowClust). It is invoked by a wrapper function that has the identical function name prefixed with a dot (e.g. .flowClust).
- gating_args: the named arguments passed to the gating function.
- collapseDataForGating: when TRUE, data is collapsed (within groups if groupBy is specified) before gating, and the gate is replicated across the collapsed samples. When set to FALSE (or left blank), the groupBy argument is only used by preprocessing and is ignored by gating.
- groupBy: if given, samples are split into groups by the unique combinations of study variables (i.e. column names of pData, e.g. "PTID:VISITNO"). When groupBy is numeric (N), samples are grouped by every N samples.
- preprocessing_method: the name of the preprocessing function (e.g. prior_flowClust). It is invoked by a wrapper function that has the identical function name prefixed with a dot (e.g. .prior_flowClust). The preprocessing results are then passed to the gating wrapper function through the pps_res argument.
- preprocessing_args: the named arguments passed to the preprocessing function.
Here is an example of the gating template.
library(openCyto)
library(data.table)
gtFile <- system.file("extdata/gating_template/tcell.csv", package = "openCyto")
dtTemplate <- fread(gtFile, autostart = 1L)
dtTemplate
## alias pop parent dims gating_method
## 1: nonDebris nonDebris root FSC-A mindensity
## 2: singlets singlets nonDebris FSC-A,FSC-H singletGate
## 3: lymph lymph singlets FSC-A,SSC-A flowClust
## 4: cd3 cd3 lymph CD3 mindensity
## 5: * cd4-/+cd8+/- cd3 cd4,cd8 mindensity
## 6: activated cd4 CD38+HLA+ cd4+cd8- CD38,HLA tailgate
## 7: activated cd8 CD38+HLA+ cd4-cd8+ CD38,HLA tailgate
## 8: CD45_neg CD45RA- cd4+cd8- CD45RA mindensity
## 9: CCR7_gate CCR7+ CD45_neg CCR7 flowClust
## 10: * CCR7+/-CD45RA+/- cd4+cd8- CCR7,CD45RA refGate
## 11: * CCR7+/-CD45RA+/- cd4-cd8+ CCR7,CD45RA mindensity
## gating_args collapseDataForGating groupBy
## 1: NA NA
## 2: NA NA
## 3: K=2,target=c(1e5,5e4) NA NA
## 4: TRUE 4
## 5: gate_range=c(1,3) NA NA
## 6: NA NA
## 7: tol=0.08 NA NA
## 8: gate_range=c(2,3) NA NA
## 9: neg=1,pos=1 NA NA
## 10: CD45_neg:CCR7_gate NA NA
## 11: NA NA
## preprocessing_method preprocessing_args
## 1: NA
## 2: NA
## 3: prior_flowClust NA
## 4: NA
## 5: NA
## 6: standardize_flowset NA
## 7: standardize_flowset NA
## 8: NA
## 9: NA
## 10: NA
## 11: NA
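Because the gating template is just a CSV file, it can also be written from R. The base R sketch below builds a hypothetical two-row template with the columns listed above, writes it to a temporary file and reads it back. It also shows one way a gating_args string such as K=2,target=c(1e5,5e4) can be turned into named R arguments. The helper parse_gating_args is hypothetical and is only meant to illustrate the "as if typed in the R console" idea. openCyto's own parsing may differ.

```r
# Hypothetical two-row gating template using the column layout described above.
# The populations and arguments are illustrative, not a recommended scheme.
tmpl <- data.frame(
  alias                 = c("nonDebris", "lymph"),
  pop                   = c("nonDebris", "lymph"),
  parent                = c("root", "nonDebris"),
  dims                  = c("FSC-A", "FSC-A,SSC-A"),
  gating_method         = c("mindensity", "flowClust"),
  gating_args           = c("", "K=2,target=c(1e5,5e4)"),
  collapseDataForGating = c("", ""),
  groupBy               = c("", ""),
  preprocessing_method  = c("", "prior_flowClust"),
  preprocessing_args    = c("", ""),
  stringsAsFactors      = FALSE
)

# write it as a plain CSV file and read it back with every column as character
csv_file <- tempfile(fileext = ".csv")
write.csv(tmpl, csv_file, row.names = FALSE)
tmpl2 <- read.csv(csv_file, colClasses = "character")

# hypothetical helper: evaluate "a=1,b=c(2,3)" as list(a=1, b=c(2,3))
# (illustration only; openCyto's own parser may differ)
parse_gating_args <- function(s) {
  if (is.na(s) || !nzchar(s)) return(list())   # blank cell -> no arguments
  eval(parse(text = paste0("list(", s, ")")))
}

args <- parse_gating_args(tmpl2$gating_args[2])
str(args)
```

Since the cells are evaluated as R code, a template should only be loaded from a trusted source.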
Each row of dtTemplate usually corresponds to one cell population and the gating method used to obtain that population. We will explain how to create this gating template from the manual gating scheme, row by row.
dtTemplate[1,]
## alias pop parent dims gating_method gating_args
## 1: nonDebris nonDebris root FSC-A mindensity
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1: NA NA NA
- The population is named "nonDebris" (specified in the alias field).
- Its parent node is root (which is always the first node of the gating hierarchy by default).
- mindensity (one of the gating functions provided by the openCyto package) is used as gating_method to gate on the dimension (dims) FSC-A.
- "nonDebris" (equivalent to "nonDebris+") in the pop field indicates that the positive side of the 1d gate is kept as the population of interest.
- There is no grouping or preprocessing involved in this gate, so the other columns are left blank.
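Roughly speaking, mindensity places a 1d cut at a low point of the estimated density between two cell populations. The base R sketch below illustrates that idea only on simulated bimodal data. It is not openCyto's implementation: the function min_density_cut, its peak-picking rule and its handling of gate_range are assumptions made for this example.

```r
# Simplified illustration of a minimum-density 1d cut point.
# NOT openCyto's mindensity implementation; it only sketches the idea.
min_density_cut <- function(x, gate_range = range(x)) {
  d <- density(x)                          # Gaussian kernel density estimate
  y <- d$y
  n <- length(y)
  # interior local maxima (peaks) of the estimated density
  is_peak <- c(FALSE, y[2:(n - 1)] > y[1:(n - 2)] & y[2:(n - 1)] >= y[3:n], FALSE)
  peaks <- which(is_peak)
  if (length(peaks) < 2) stop("need at least two density peaks")
  # the two highest peaks, in left-to-right order
  top2 <- sort(peaks[order(y[peaks], decreasing = TRUE)][1:2])
  # candidate cut points lie between the two peaks and inside gate_range
  idx <- top2[1]:top2[2]
  idx <- idx[d$x[idx] >= gate_range[1] & d$x[idx] <= gate_range[2]]
  if (length(idx) == 0) stop("no density valley inside gate_range")
  d$x[idx[which.min(y[idx])]]              # lowest-density point is the cut
}

set.seed(1)
# simulated bimodal intensities: a "negative" and a "positive" population
x <- c(rnorm(5000, mean = 1, sd = 0.3), rnorm(5000, mean = 3, sd = 0.3))
cutpoint <- min_density_cut(x)
keep <- x > cutpoint   # a pop such as "nonDebris" (i.e. "+") keeps the positive side
```

Restricting the search to the valley between the two highest peaks avoids picking up tiny, spurious minima in the sparse tails of the density.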
dtTemplate[2,]
## alias pop parent dims gating_method gating_args
## 1: singlets singlets nonDebris FSC-A,FSC-H singletGate
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1: NA NA NA
- The population is named "singlets" (alias field).
- Its parent node is nonDebris.
- gating_method is singletGate (a function from the flowStats package).
- A polygonGate will be generated on FSC-A and FSC-H (specified by dims) for each sample.
- "singlets" in the pop field stands for "singlets+". Because this is a 2d gate, it means we keep the area inside the polygon.
dtTemplate[3,]
## alias pop parent dims gating_method gating_args
## 1: lymph lymph singlets FSC-A,SSC-A flowClust K=2,target=c(1e5,5e4)
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1: NA NA prior_flowClust NA
- alias specifies the name of the population.
- parent points to singlets.
- flowClust is used as gating_method to do the 2-dimensional gating.
- dims is a comma-separated string: the x axis (FSC-A) goes first and the y axis (SSC-A) second. This order doesn't affect the gating process but determines how the gates are displayed.
- Any arguments that the flowClust algorithm accepts can be put in gating_args as if they were typed in the R console. See help(flowClust) for more details on these arguments.
- The flowClust algorithm accepts an extra argument, priors, which is calculated during the preprocessing stage (before the actual gating). Thus, we supply prior_flowClust as the preprocessing_method.
dtTemplate[4,]
## alias pop parent dims gating_method gating_args collapseDataForGating
## 1: cd3 cd3 lymph CD3 mindensity TRUE
## groupBy preprocessing_method preprocessing_args
## 1: 4 NA
It is similar to the nonDebris gate except that we specify collapseDataForGating as TRUE,
which tells the pipeline to collapse all samples into one and apply mindensity to the collapsed data on the CD3 dimension.
Once the gate is generated, it is replicated across all samples. This is only useful when individual samples do not have
enough events to deduce the gate. Here we do this just as a proof of concept.
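The "collapse, gate, replicate" idea, combined with a numeric groupBy (every N samples form one group), can be sketched in base R as follows. This is an illustration of the concept, not openCyto's code. The function collapse_and_gate is hypothetical, and gate_fun (median by default) is a placeholder standing in for a real 1d gating method such as mindensity.

```r
# Sketch of "collapse, gate, replicate" when groupBy is a number N.
# gate_fun is a hypothetical stand-in for a real 1d gating method.
collapse_and_gate <- function(samples, N, gate_fun = median) {
  group <- ceiling(seq_along(samples) / N)   # every N consecutive samples form a group
  cuts <- numeric(length(samples))
  names(cuts) <- names(samples)
  for (g in unique(group)) {
    idx <- which(group == g)
    pooled <- unlist(samples[idx], use.names = FALSE)  # collapse the group into one data set
    cuts[idx] <- gate_fun(pooled)                      # one gate, replicated to every member
  }
  cuts
}

set.seed(42)
# five simulated samples of a single marker
samples <- setNames(lapply(1:5, function(i) rnorm(200, mean = i)), paste0("s", 1:5))
cuts <- collapse_and_gate(samples, N = 2)    # groups: {s1, s2}, {s3, s4}, {s5}
```

Every sample in a group receives exactly the same gate, which is why collapsing mainly helps when individual samples have too few events.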
The fifth row specifies pop as cd4-/+cd8+/-, which will be expanded into 6 rows.
dtTemplate[5,]
## alias pop parent dims gating_method gating_args
## 1: * cd4-/+cd8+/- cd3 cd4,cd8 mindensity gate_range=c(1,3)
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1: NA NA NA
The first two rows are two 1d gates that will be generated by gating_method on each dimension (cd4 and cd8) independently:
## alias pop parent dims gating_method
## 1: cd4+ cd4+ /nonDebris/singlets/lymph/cd3 cd4 mindensity
## 2: cd8+ cd8+ /nonDebris/singlets/lymph/cd3 cd8 mindensity
## gating_args collapseDataForGating groupBy preprocessing_method
## 1: gate_range=c(1,3)
## 2: gate_range=c(1,3)
## preprocessing_args
## 1:
## 2:
The other 4 rows are 4 rectangleGates that correspond to the 4 quadrants in the 2d projection (cd4 vs cd8).
## alias pop parent dims gating_method
## 1: cd4+cd8+ cd4+cd8+ /nonDebris/singlets/lymph/cd3 cd4,cd8 refGate
## 2: cd4-cd8+ cd4-cd8+ /nonDebris/singlets/lymph/cd3 cd4,cd8 refGate
## 3: cd4+cd8- cd4+cd8- /nonDebris/singlets/lymph/cd3 cd4,cd8 refGate
## 4: cd4-cd8- cd4-cd8- /nonDebris/singlets/lymph/cd3 cd4,cd8 refGate
## gating_args
## 1: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 2: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 3: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## 4: /nonDebris/singlets/lymph/cd3/cd4+:/nonDebris/singlets/lymph/cd3/cd8+
## collapseDataForGating groupBy preprocessing_method preprocessing_args
## 1:
## 2:
## 3:
## 4:
As we see here, "refGate" in gating_method indicates that these gates are constructed from the gate coordinates of the previous two 1d gates.
Those 1d gates are thus considered "reference gates", and they are referred to by the colon-separated alias string in gating_args: "cd4+:cd8+".
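Conceptually, the two 1d cut points split the cd4 vs cd8 plane into four quadrants. The base R sketch below assigns events to the quadrant names used above. The thresholds and marker values are made up, and quadrant_labels is a hypothetical helper, not part of openCyto. Events lying exactly on a cut point are assigned to the negative side here, which is an assumption of the sketch.

```r
# Sketch: derive quadrant membership from two reference 1d cut points.
# Thresholds and intensities are made up for illustration.
quadrant_labels <- function(x, y, x_cut, y_cut, x_name = "cd4", y_name = "cd8") {
  x_sign <- ifelse(x > x_cut, "+", "-")   # side of the cd4 reference gate
  y_sign <- ifelse(y > y_cut, "+", "-")   # side of the cd8 reference gate
  paste0(x_name, x_sign, y_name, y_sign)
}

cd4 <- c(0.5, 2.5, 2.8, 0.2)
cd8 <- c(2.9, 0.1, 2.6, 0.3)
labs <- quadrant_labels(cd4, cd8, x_cut = 1.5, y_cut = 1.5)
table(factor(labs, levels = c("cd4+cd8+", "cd4-cd8+", "cd4+cd8-", "cd4-cd8-")))
```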
Alternatively, instead of the compact cd4-/+cd8+/- pattern, we can list these 6 rows explicitly in the spreadsheet.
However, the compact representation is recommended unless the user wants finer control over how the gating is done.
For instance, sometimes we need to use different gating_methods to generate the 1d gates on cd4 and cd8.
Or the cd8 gating needs to depend on the cd4 gating, i.e. the parent of cd8+ is cd4+ (or cd4-) instead of cd3.
Sometimes we want a customized alias other than the quadrant-like name (x+y+) that gets generated automatically
(e.g. 5th row of the gating template).
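The automatically generated names follow directly from the pop pattern. The base R sketch below expands a two-marker "+/-" pattern into the names of the two reference 1d gates and the four quadrants, in the same order as the expanded rows shown earlier. The function expand_pop is hypothetical and only mirrors the naming scheme. It is not openCyto's parser, it handles only the fully expanded A+/-B+/- form, and it assumes marker names contain no "+", "-" or "/" characters.

```r
# Hypothetical helper: expand "A+/-B+/-" into reference-gate and quadrant names.
expand_pop <- function(pattern) {
  # split into per-marker tokens, e.g. "cd4-/+" and "cd8+/-"
  tokens <- regmatches(pattern, gregexpr("[^+/-]+[+-](/[+-])?", pattern))[[1]]
  if (length(tokens) != 2 || !all(grepl("/", tokens)))
    stop("expected a two-marker pattern like 'A+/-B+/-'")
  markers <- sub("[+/-]+$", "", tokens)          # strip the sign spec: "cd4", "cd8"
  one_d <- paste0(markers, "+")                  # the two 1d reference gates
  # all four sign combinations; the first marker's sign varies fastest
  q <- expand.grid(x = c("+", "-"), y = c("+", "-"), stringsAsFactors = FALSE)
  quads <- paste0(markers[1], q$x, markers[2], q$y)
  list(ref_gates = one_d, quadrants = quads)
}

res <- expand_pop("cd4-/+cd8+/-")
res$ref_gates
res$quadrants
```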
After the gating template is defined in the spreadsheet, it can be loaded into R:
gt_tcell <- gatingTemplate(gtFile, autostart = 1L)
gt_tcell
## --- Gating Template: default
## with 29 populations defined
Besides looking at the spreadsheet, we can examine the gating scheme by visualizing it:
plot(gt_tcell)
As we can see, the gating scheme has been expanded as described above.
All the colored arrows originate from the parent population, and the grey arrows originate from the reference population (gate).
Once we are satisfied with the gating template, we can apply it to the actual flow data.
First, we load the raw FCS files into R with ncdfFlow::read.ncdfFlowSet (it uses less memory than flowCore::read.flowSet).
library(ncdfFlow)
fcsFiles <- list.files(flowDataPath, pattern = "CytoTrol", full.names = TRUE)
ncfs <- read.ncdfFlowSet(fcsFiles)
ncfs
## An ncdfFlowSet with 2 samples.
## NCDF file : /tmp/RtmpYOAyQl/ncfs90d7a35b922.nc
## An object of class 'AnnotatedDataFrame'
## rowNames: CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_2.fcs
## varLabels: name
## varMetadata: labelDescription
##
## column names:
## FSC-A, FSC-H, FSC-W, SSC-A, B710-A, R660-A, R780-A, V450-A, V545-A, G560-A, G780-A, Time
Then, compensate the data. If we have compensation controls (i.e. single-stained samples), we can calculate the
compensation matrix with the flowCore::spillover function.
Here we simply use the compensation matrix defined in the flowJo workspace.
compMat <- getCompensationMatrices(gh)
ncfs_comp <- compensate(ncfs, compMat)
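Conceptually, compensation is a linear correction: the measured intensities are multiplied by the inverse of the spillover matrix, which is how flowCore's compensate() treats a spillover matrix. The base R sketch below shows only that linear algebra, using a made-up 2-channel spillover matrix and made-up signals. It does not call flowCore.

```r
# Made-up spillover matrix: rows = dyes, columns = detectors.
# spill[1, 2] = 0.2 means 20% of dye 1's signal is also seen in detector FL2.
spill <- matrix(c(1,   0.2,
                  0.1, 1),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("FL1", "FL2")))

# made-up "true" single-dye and double-stained signals for three events
true_signal <- matrix(c(100, 0,
                          0, 50,
                         80, 40),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("FL1", "FL2")))

observed    <- true_signal %*% spill     # what the detectors record (with spillover)
compensated <- observed %*% solve(spill) # undo the spillover
observed[1, ]                            # first event: FL1 = 100, FL2 = 20
```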
Here is one example showing the compensation outcome:
All the stained channels need to be transformed properly before gating.
Here we use flowCore::estimateLogicle to do the logicle transformation.
chnls <- parameters(compMat)
# estimate the logicle parameters on the compensated data that will actually be transformed
transFuncts <- estimateLogicle(ncfs_comp[[1]], channels = chnls)
ncfs_trans <- transform(ncfs_comp, transFuncts)
Here is one example showing the transformation outcome:
Once the data is preprocessed, it can be loaded into a GatingSet object.
gs <- GatingSet(ncfs_trans)
getNodes(gs[[1]])
## [1] "root"
As getNodes shows, there is only one population node, root, at this point.
Now we can apply the gating template to the data:
gating(gt_tcell, gs)
Optionally, we can run the pipeline in parallel to speed up gating, e.g.
gating(gt_tcell, gs, mc.cores=2, parallel_type = "multicore")
After gating, there are some extra populations generated automatically by the pipeline (e.g. the reference gates used by refGate).
plot(gs[[1]])
We can hide these populations if we are not interested in them:
nodesToHide <- c("cd8+", "cd4+",
                 "cd4-cd8-", "cd4+cd8+",
                 "cd4+cd8-/HLA+", "cd4+cd8-/CD38+",
                 "cd4-cd8+/HLA+", "cd4-cd8+/CD38+",
                 "CD45_neg/CCR7_gate", "cd4+cd8-/CD45_neg",
                 "cd4-cd8+/CCR7+", "cd4-cd8+/CD45RA+")
# setNode(gs, node, FALSE) hides a node; invisible() suppresses the printed list of results
invisible(lapply(nodesToHide, function(thisNode) setNode(gs, thisNode, FALSE)))
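Several node names in the vector above are partial paths such as "cd4+cd8-/HLA+", because a bare alias like "HLA+" would be ambiguous. The base R sketch below shows one way such a partial path can be resolved by requiring a unique suffix match against the full paths. The helper resolve_path and the example paths are hypothetical illustrations of the uniqueness requirement, not flowWorkspace's code.

```r
# Hypothetical helper: resolve a partial gating path by unique suffix match.
resolve_path <- function(partial, full_paths) {
  hits <- full_paths[full_paths == partial |
                     endsWith(full_paths, paste0("/", partial))]
  if (length(hits) != 1)
    stop("'", partial, "' matches ", length(hits), " nodes")
  hits
}

# illustrative full paths in a gating hierarchy
paths <- c("/nonDebris/singlets/lymph/cd3/cd4+cd8-/HLA+",
           "/nonDebris/singlets/lymph/cd3/cd4-cd8+/HLA+",
           "/nonDebris/singlets/lymph/cd3/cd4+cd8-")

resolve_path("cd4+cd8-/HLA+", paths)          # unique: the HLA+ node under cd4+cd8-
try(resolve_path("HLA+", paths), silent = TRUE)  # ambiguous: matches two nodes
```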
And rename the populations:
setNode(gs, "cd4+cd8-", "cd4")
setNode(gs, "cd4-cd8+", "cd8")
plot(gs[[1]])
plotGate(gs[[1]])
Sometimes it is helpful (especially when working with already gated data) to interact with the GatingSet directly, without writing a complete CSV gating template.
The openCyto package allows users to specify their gating schemes and gate the data in a data-driven fashion.
It frees scientists from labor-intensive manual gating routines and increases the speed, reproducibility and objectivity of data analysis.