%% LyX 1.5.6 created this file.  For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
%
%\VignetteIndexEntry{Basic Functions for working with FlowJo}
%\VignetteDepends{flowFlowJo}
%\VignetteKeywords{}
%\VignettePackage{flowFlowJo}
%
\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{babel}

\begin{document}
\begin{center}
{\Large flowCore Access to FlowJo Workspaces}
\par\end{center}{\Large \par}

\begin{center}
John Gosink and Hugh Rand
\par\end{center}

\begin{center}
Amgen, Inc.
\par\end{center}

\begin{center}
{\footnotesize 
Vignette last rewritten:  12th December 2008
\par
Vignette last recompiled:  \today
\bigskip{}
}


\par\end{center}{\footnotesize \par}

\begin{center}
\textbf{Abstract\bigskip{}
}
\par\end{center}

The flowFlowJo package provides a way to parse FlowJo workspaces into
R data structures with flowCore compliant objects representing the
various gates, compensation matrices, and other related information.
The objective of this package it to make it easy (in R) to use compensation
and gating information that has been produced using FlowJo.\bigskip{}
\bigskip{}


\textbf{1. Introduction}

FlowJo (www.flowjo.com) is a commercially available software package
used for the gating, visualization, and analysis of data from flow
cytometry experiments. FlowJo saves its session information in an
XML formatted file called a {}``workspace'' that contains all the
information necessary to describe the gating structures, compensation,
transformation, location of the FCS files, and opened graphs and figures
last created by the user. FlowJo workspace files do not contain raw
data. This package (flowFlowJo) is a set of methods designed to extract
the file locations, gates, compensation matrices, and other data contained
in FlowJo workspace files and return the information in a manner consistent
with the BioConductor flowCore packages. The typical internal steps
taken by flowFlowJo are: 

\smallskip{}


(1) read and parse a FlowJo workspace,

(2) extract the structure of the gates for the FCS files, 

(3) extract compensation matrices for the FCS files, 

(4a) produce a series of lists of R/flowCore information necessary
to load, compensate, and gate all of the data files in R as was done
in FlowJo , --or--

(4b) directly compile a table of summary statistics for each of the
signal channels (parameters) on all the cell populations for all of
the FCS files referenced in the workspace. \medskip{}


The flowFlowJo package also contains a method for combining extracted
summary statistics information with any arbitrary meta data describing
the experimental design or layout and/or meta data embedded within
the header section of each FCS file. We describe all of these processes
in detail below. Finally, this package is known to work with FlowJo
version 7.2.2. As a commercial package FlowJo workspace formats change
from time-to-time and we cannot guarantee compatibility with future
versions.\bigskip{}


\textbf{2. Gates/Filters and Compensation Matrices}

Prior to using this package, it is assumed that FlowJo will have been
used to process (compensate and gate) one or more FCS files to produce
one or more FlowJo workspaces. This is a routine process for those
analyzing flow cytometry data. The only caution we mention is that
the location of the FCS files is held in the FlowJo workspace, typically
as absolute paths. Moving the FCS files to another location will cause
the location of these files extracted from the workspace to be in
error. For this reason the examples provided below have to go the
extra step of specifying an alternate location for the FCS files other
than as they are given within the provided demonstration FlowJo workspace.

The first step when using flowFlowJo is to read in all of the information
associated with a set of FlowJo workspaces. This is accomplished with
the \textquotedblleft{}readFlowJoList\textquotedblright{} method.
This method takes in a list, vector, or character string with the
full path of one or more FlowJo workspaces and reads in the gating
structures, compensation matrices, transformations and other information
about the FCS files referenced in the workspace(s) and returns a flowJoList
object. \medskip{}


\noindent 
<<>>=
library(flowCore);
library(XML);
library(flowFlowJo);
demoLocation <- system.file("extdata", "DemoWorkspace.wsp", package="flowFlowJo");
actualFCSLoc <- system.file("extdata/fcsFiles", package="flowFlowJo");
testList <- readFlowJoList(demoLocation, altFileLocation=actualFCSLoc);
@


\noindent \medskip{}


To be clear, the readFlowJoList method does not look for, or use,
any of the referenced FCS files. It only looks at the information
contained in the workspace(s). It is important to note that at this
stage the XML gating structures embedded within the workspace(s) are
converted into flowCore style filter objects.

The flowJoList object is a list of flowJoObj objects. Each flowJoObj
encapsulates the contents of one FlowJo workspace. Since FlowJo allows
for the possibility of having a different compensation matrix for
each FCS file referenced within a single workspace and because a given
FCS file may be dealt with separately in several different FlowJo
workspaces, the readFlowJoList method allows data from more than one
workspace to be combined into one analysis. This latter approach is
often used at our institution such that an assay may be run over many
weeks or months, with the data from each day's run being accumulated
into a single FlowJo workspace. 

After generating a flowJoList object, a user may extract a subset
of the gates. The \textquotedblleft{}getFlowJoGates\textquotedblright{}
method extracts the file name, file name with full path, filter objects,
simplified filter names, and compensation matrices associated with
all of the FCS files that match one or more file name patterns supplied
by the user.\medskip{}



<<>>=
z <- getFlowJoGates(testList, fileNamePatterns=c("C02"));
print(summary(z));
print(summary(z$filter));
@
\medskip{}


The return item is a list of lists, with each of the sub lists corresponding
to the items described above. These items may be accessed and used
arbitrarily. Note that a single FCS file may be partitioned with several
different gates (filters) and that each of these gates may dependend
on other gates as is appropriate to subdivide the cell populations.
The getFlowJoGates method accounts for this by simply expanding the
number of returned items in each of the sub lists to correspond to
each of the intermediate gates. By default, the getFlowJoGates method
concatenates each child gate to its parent gates as an intersect filter.
Correspondingly, by default, the code concatenates the gate names
for each child gate to its parents gate names with a colon. Finally,
by default each gate name is preappended with the name of the FCS
file it references. Thus, typical gate names might be:

\textquotedblleft{}SampleA3.fcs:Lymphocytes\textquotedblright{},

\textquotedblleft{}SampleA3.fcs:Lymphocytes:CD3+\textquotedblright{}, 

\textquotedblleft{}SampleA3.fcs:Lymphocytes:CD3+:CD8+\textquotedblright{}

etc...

\smallskip{}


Consider the case in which a researcher uses FlowJo to gate a set
of 30 FCS files each for 6 different cell populations and sub-populations.
A getFlowJoGates call on the resulting workspace file will return
a list of length 5 with elements: \char`\"{}fcsName\char`\"{}, \char`\"{}FCSFilename\char`\"{},
\char`\"{}filter\char`\"{}, \char`\"{}filterName\char`\"{}, and \char`\"{}compMats\char`\"{}.
Each element (e.g. fcsName) will in turn be a list of length 180 because
there are 6 gates for each of the 30 FCS files. 

Importantly, each of the 5 lists will be in the same order. Thus,
for example, the 37th filter in the {}``filter'' list will be for
the 37th FCS file in the {}``FCSFilename'' list, and that FCS file
should be compensated with the 37th compensation matrix in the {}``compMats''
list. Although other, more compact representations could be found,
these data structures as currently implemented were effective and
general enough for the problems at hand. 

Finally, FlowJo currently implements its compensation/spillover matrices
differently than they are implemented in the general flow cytometry
community. Currently, in order to obtain similar results (e.g. MFIs
and cell counts) between FlowJo and flowCore, it is necessary to apply
the compensation matrix to the data in the usual way (ie. via {}``compensate''),
and then to divide all of the observed data by the maximum of the
values in the compensation matrix. The flowFlowJo package implements
a method, flowJoCompensate, to automatically take care of this issue. 

In some cases a data analyst may wish to proceed with the list of
lists to extract and analyze the data to their own design. In many
cases however, the analyst may be satisfied with the gating choices
derived during the FlowJo session and wish to simply proceed with
a complete set of summary statistics on all of the cell populations.
This is described in section 4 below. 

\bigskip{}


\textbf{3. Transformations}

The information contained within the \textquotedblleft{}DivaSettings\textquotedblright{}
and \textquotedblleft{}TransformSettings\textquotedblright{} sections
of the FlowJo workspace is currently parsed by the readFlowJoList
method. These components are returned in the data structure produced
by the readFlowJoList method, but there are no other methods in the
flowFlowJo package that utilize these data. All of the non-scatter
gates (fluorescence channel gates) are encoded by their non-transformed
gate coordinates. Furthermore, the scatter gates (FSC, and SSC) are
currently (FlowJo 7.2.2) encoded as 1/64 of their actual (untransformed)
gating coordinates. The readFlowJoList method automatically (internally)
multiplies all of the scatter gating coordinates by 64 to adjust for
this prior to generating its flowCore filter objects. \bigskip{}


\textbf{4. FlowJo Summary Objects}

The flowFlowJo package contains a few methods for automatically extracting
the major types of information that are often needed from flow experiments
such as median fluorescent intensity, and cell counts. The first step,
however, in automating the analysis of manually gated data is to ensure
uniformity of the naming convention across all of the samples and
to confirm that all of the expected data is present. It has been our
experience that bench researchers often (accidentally) supply slightly
different names for the same cell populations, neglect to collect
certain populations, lose samples, {}``unexpectedly'' add unplanned
samples etc. during the course of a study. Such uncommunicated variances
from the experimental plan often provide hours of entertainment for
the downstream data analyst. A simple summary of the observed data
sets often helps identify these anomalies. Toward this end, the \textquotedblleft{}getFlowJoSummary\textquotedblright{}
method returns a table showing the number and counts of different
gate names associated with all of the FCS files.\medskip{}



<<>>=
getFlowJoSummary(testList, gatesByFile=FALSE, removeParentalNames=TRUE); 
getFlowJoSummary(testList, removeParentalNames=TRUE);
@
\medskip{}


Passing this check, the \textquotedblleft{}collectSummaryFlowInfo\textquotedblright{}
method returns a data structure with median fluorescent intensities
and cell counts for each of the channels (parameters) for each of
the gates, and any arbitrary header information from each of the FCS
files. It is only at this point that the flowFlowJo methods actually
accesses the FCS files. 

\medskip{}



<<>>=
summaryStatsObj <- collectSummaryFlowInfo(testList);
@


\medskip{}


Depending on the scale of the experiment and the size of the FCS files,
this method may take some time to execute. Specifically, as each FCS
file may be many Mb in size, the code only reads one FCS file into
memory at a time, extracts the appropriate information, and then moves
on to the next file. The actual steps executed by the code include: 

\smallskip{}


1. Read in the FCS file 

2. Apply compensation (accounting for the FlowJo/flowCore compensation
issues discussed above) 

3. Gate out each population (and intermediate sub-population) 

4. Collect summary statistics on each population and sub-population 

5. Collect any header/keyword information embedded in the FCS file
as requested 

6. Advance to the next FCS file

\medskip{}


A note about step 5. Each FCS file is composed of several parts in
addition to the raw list-mode data. The header section of each FCS
file contains 100+ pieces of information about each flow run including
such things as laser settings, photomultiplier gain settings, run
times, and other information. The collectSummaryFlowInfo method can
be configured to collect one or more of these items from each FCS
file using the {}``keywords'' option. Through the \textquotedblleft{}createFlowReport\textquotedblright{}
method, the summary object can be converted directly into a simple
tabular report (data frame). Alternatively the createFlowReport method
can combine the summary object with a data frame containing additional
meta data about the experiment. In that case, the data frame of additional
meta data (e.g. experimental design factors and sample information)
must contain at least the columns {}``FCSFilename'' and {}``FlowJoWorkspace''.
Each of these columns should give the full path to the relevant FCS
file and FlowJo workspace. Each row of the data frame should contain
information relevant to one sample (FCS file) of data. The other columns
of the data frame can be any arbitrary meta information such as drug
name, treatment time, sample ID, etc. The resulting flow report will
contain 1 line for each parameter of each cell population of each
FCS file along with any associated meta data and keywords from the
header section of the FCS file. For example if we had a standard R
data frame that had experimental design information in it as shown:\medskip{}



<<>>=
expDescFrame <- data.frame(Drug=c(rep("Amospho", 3), 
                                  rep("Gleevec", 3), 
                                  rep("Chloro", 3)), 
   Conc=rep(c(0.001, 0.0001, 0.00001), 3), 
   FCSFilename=dir(actualFCSLoc, full.names=TRUE), 
   FlowJoWorkspace=rep(demoLocation, length(dir(actualFCSLoc)))); 
@


\medskip{}


We can combine this data frame with the summary statistics via the
createFlowReport method to create a data frame as follows:\medskip{}



<<>>=
flowReport <- createFlowReport(summaryStatsObj,  factorsFrame=expDescFrame);
print(head(flowReport));
@


\medskip{}


At this point we are free to create any arbitrary report or visualization
of the data using any of the R packages or export some or all of the
data to text files as appropriate. The lattice package can be particularly
helpful in this endeavour. Finally, note that the demo data, gating,
and experimental meta information shown here is completely fabricated
for the purposes of this tutorial and don't reflect an actual experiment. 



\bigskip{}


\textbf{5. Summary}

In summary, the flowFlowJo package provides a set of methods for extracting
and organizing information from FlowJo workspaces and the FCS files
referenced within. In its most basic application it allows the user
to retrieve all of the gates and compensation matrices for all of
the FCS files described within one or more FlowJo workspaces. The
gates are returned as flowCore style filter objects, and the compensation
matrices are returned as numeric matrices. Additional functionality
is gained by the ability for the user to effectively run all of the
compensation and gating functions described by the workspace(s) and
automatically retrieve all of the relevant summary statistics into
a concise data structure. These data may also be easily combined with
any meta data describing the nature or source of each sample and any
experimental conditions to which they were subjected.

\bigskip{}


\textbf{6. References}

Many thanks to Mark Dalphin, Cheng Su, Adam Triester, and Florian
Hahne without whose gentle guidance none of this would be possible.
\end{document}
