flowPloidy: Overview

Tyler Smith

2016-10-17

Installation

flowPloidy is part of the BioConductor respository. To install it, you must first have the Bioconductor system installed. To do, from within R, call:

source("https://bioconductor.org/biocLite.R")
biocLite()

Then install flowPloidy:

biocLite("flowPloidy")

The examples in this vignette, and the flowPloidy help files, use data in the flowPloidyData package, which is also on BioConductor. (You don’t need flowPloidyData to use flowPloidy on your own data, only for the examples).

biocLite("flowPloidyData")

Development Version

You may also install the developmental version of flowPloidy directly from its Github repository. To do so, you will need the devtools package:

install.packages("devtools", dependencies = TRUE)
library("devtools")
install_github("plantarum/flowPloidy")

Note that the developmental version of flowPloidy is also available directly from BioConductor, if you are using the development branch. Instructions on how to do so are available on the BioConductor website.

Preliminaries

Sample Data Files

Once our packages are installed, we only need to load them into memory to use them:

library(flowPloidy)
library(flowPloidyData)

The flowPloidyData package provides a single variable, flowPloidyFiles. This is a vector of file names that points to the sample FCS files in your R installation that we will use in this vignette. The actual path to the files will differ slightly on different computers, and on different operating systems; R takes care of those details for us.

Data Channel

You will need to know which channel to use for your analysis. This will most likely remain the same for all analyses from the same flow cytometer. We can use the viewFlowChannels function to see the options:

viewFlowChannels(flowPloidyFiles[1])
## [1] "FS.INT.LIN"   "SS.INT.LIN"   "TIME"         "FL3.INT.LIN" 
## [5] "FL3.PEAK.LIN" "FS.PEAK.LIN"  "SS.PEAK.LIN"  "FS.TOF.LIN"  
## [9] "SS.TOF.LIN"

Hopefully you see something there that corresponds to the channel you’re interested in. In our case, it’s “FL3.INT.LIN”.

Simple Analysis

With all the necessary packages installed, and the name of the channel of interest identified, we are ready to proceed with a flowPloidy analysis.

fh1 <- FlowHist(file = flowPloidyFiles[1], channel = "FL3.INT.LIN",
                analyze = FALSE)

(Note that we used the analyze = FALSE option, to delay analysis until we have inspected the data, for the purposes of this tutorial. This is not necessary, and if you omit the analyze argument analysis will proceed automatically.)

Now that we have our data loaded, plot will display it for us:

plot(fh1)
FCS Histogram

FCS Histogram

During the loading process, flowPloidy attempts to locate cell populations and identify initial values for model fitting. You can see this by setting the init = TRUE argument in the plot function:

plot(fh1, init = TRUE)
Histogram with Initial Values

Histogram with Initial Values

A number of features are displayed on the histogram:

Sample A G1 estimate

a thick vertical blue line with a label at the top indicates the estimated position of the first sample (‘A’) G1 peak. A blue circle indicates the actual position on the histogram used as the initial value.

Sample A G2 estimate

a thin vertical blue line without a label indicates the estimated position of the G2 peak corresponding to the A G1 peak.

Sample B G1 and G2 estimates

as for Peak A, but orange

Initial model estimate

the thin, dashed grey line indicates the initial parameters used for the model-fitting.

The initial estimate is rough, but in this case it’s close enough for the model-fitting routines to find a solution, so we can immediately proceed to the complete analysis.

fh1 <- fhAnalyze(fh1)
## analyzing 188-15.LMD

Note that I have assigned the output of fhAnalyze() to the original flowHist object fh1. The output of the analysis is stored in a new slot in the fh1 object – all the original data remains unaltered in that object.

Now that fh1 contains a fitted model, the results will be displayed when we call plot again:

plot(fh1)
Histogram with Fitted Model

Histogram with Fitted Model

The plot has been updated to include the fitted model components in different colours:

The aggregate and S-phase components may be quite small, and therefore difficult to see.

In the upper right corner, we see a brief summary of the numerical results. For Sample A and Sample B, peak statistics are reported as position / cell counts / coefficient of variation. We can see that the coarse initial estimates were indeed close enough to give us a reasonable model fitting. The numerical results are now available by printing the FlowHist object:

fh1
## FlowHist object '188-15.LMD'
## channel: FL3.INT.LIN
## bins: 256
## linearity: variable
## debris: SC
## 7 model components: fA1, fA2, fB1, SC, AG, brA, brB
## 12 parameters: a1, Ma, Sa, a2, d, b1, Mb, Sb, SCa, Ap, BRA, BRB
## 5 special parameters: xx, SCvals, DBvals, TRvals, QDvals
## Model fit
## 
## Analysis
## ========
## 
## Ratio Peak A / Peak B: 0.728, SE: 4e-04
## 
## |       |   counts|    size|   cvs|
## |:------|--------:|-------:|-----:|
## |Peak A | 1443.859|  99.041| 0.022|
## |Peak B | 2822.705| 136.115| 0.020|
## 
## RCS: 1.59

Here we see the ratio between the peaks with the standard error of the ratio, and a table of peak data: nuclei counts (i.e., the number of nuclei modelled in the G1 peaks), peak size, the coefficient of variation for the peak, and the Reduced Chi-Square value (RCS) recommended by Bagwell (1993) as an objective assessment of model fit (but see Rabinovitch (1994) for a contrasting view).

Manually Selecting Starting Values

Sometimes, particularly with noisy histograms, flowPloidy will not produce useful starting values:

fh2 <- FlowHist(file = flowPloidyFiles["734.LMD"], channel = "FL3.INT.LIN")
## analyzing 734.LMD
plot(fh2, init = TRUE)
Histogram with Bad Initial Values

Histogram with Bad Initial Values

Here we can see that high spike around bin 50 was ignored, resulting in both the A and B peaks being placed together near bin 136. We can manually set the appropriate starting values with pickInit:

fh2 <- pickInit(fh2)

flowHist will prompt you to click on the peaks of the first and second peak manually, adding a circle to the plot at each point.

Histogram with Manually Selected Peaks

Histogram with Manually Selected Peaks

The new peak values are used to recalculate the initial values:

plot(fh2, init = TRUE)
Histogram with Corrected Initial Values

Histogram with Corrected Initial Values

Now we are ready for the analysis:

fh2 <- fhAnalyze(fh2)
## analyzing 734.LMD
plot(fh2)
Histogram with Fitted Model

Histogram with Fitted Model

fh2
## FlowHist object '734.LMD'
## channel: FL3.INT.LIN
## bins: 256
## linearity: variable
## debris: SC
## 7 model components: fA1, fA2, fB1, SC, AG, brA, brB
## 12 parameters: a1, Ma, Sa, a2, d, b1, Mb, Sb, SCa, Ap, BRA, BRB
## 5 special parameters: xx, SCvals, DBvals, TRvals, QDvals
## Model fit
## 
## Analysis
## ========
## 
## Ratio Peak A / Peak B: 0.366, SE: 0.00056
## 
## |       |   counts|    size|   cvs|
## |:------|--------:|-------:|-----:|
## |Peak A | 1011.892|  50.903| 0.035|
## |Peak B | 1598.355| 138.932| 0.027|
## 
## RCS: 2.332

Model Options

Debris Models

By default, the histogram analysis uses the Single-Cut Debris model, (described in Bagwell et al. 1991). This works well for most of our data. However, Bagwell (1993) recommends the Multi-Cut model for fresh material, and in some cases it provides better RCS values (although peak parameters rarely change significantly).

There are a few ways to switch to the Multi-Cut model. You can update a FlowHist object with the function updateFlowHist:

fh2MC <- updateFlowHist(fh2, debris = "MC")
## updating FlowHist
## analyzing 734.LMD
plot(fh2MC)

In this case, the Multi-Cut model produced a RCS value nearly twice that of the Single-Cut model. Note, however, that the peak parameters hardly changed, and the A/B ratio is identical.

You may also choose which debris model to use as an argument to FlowHist (and the related batchFlowHist discussed below):

fh2MC <- FlowHist(file = flowPloidyFiles["734.LMD"],
                  channel = "FL3.INT.LIN", debris = "MC")

A third option for changing the debris model is to use the interactive plotting function browseFlowHist, described below.

Linearity

By default, linearity (the ratio between the G1 and G2 peaks of the same sample) is fit as a parameter. Ideally, linearity should be 2, but due to issues with the running of the flow cytometer, it frequently varies between ca 1.95 and 2.05. If you want to force linearity to be fixed at 2, you can use the linearity argument to updateFlowHist or FlowHist, as we did for debris above. In this case, possible values for linearity are fixed (fixed at 2), or variable (fit as a parameter). When linearity is fixed, it is reported as NA in plot and summary functions. You may also set/change linearity with the browseFlowHist GUI described below.

Exporting Results

To summarize the results in a data table, use the tabulateFlowHist command:

tabulateFlowHist(fh1)
##                channel                components totalEvents  countsA
## 188-15.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB       10351 1443.859
##             countsB    sizeA    sizeB        cvA        cvB   ratioAB
## 188-15.LMD 2822.705 99.04086 136.1154 0.02187847 0.01987391 0.7276241
##                 ratioSE      rcs linearity
## 188-15.LMD 0.0003994698 1.590141  1.982862

To combine multiple objects in a single summary, combine them as a list:

tabulateFlowHist(list(fh1, fh2))
##                channel                components totalEvents  countsA
## 188-15.LMD FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB       10351 1443.859
## 734.LMD    FL3.INT.LIN fA1;fA2;fB1;SC;AG;brA;brB        8630 1011.892
##             countsB    sizeA    sizeB        cvA        cvB   ratioAB
## 188-15.LMD 2822.705 99.04086 136.1154 0.02187847 0.01987391 0.7276241
## 734.LMD    1598.355 50.90349 138.9319 0.03458740 0.02737535 0.3663918
##                 ratioSE      rcs linearity
## 188-15.LMD 0.0003994698 1.590141  1.982862
## 734.LMD    0.0005609404 2.331781  1.952023

As a convenience, if you pass a file name to the file argument of exportFlowHist, the table will be saved to that file:

tabulateFlowHist(list(fh1, fh2), file = "flow-results.csv")

Processing Directories with batchFlowHist

Generating a file list in R

If you have a large number of files to process, flowHist can read them all in together. The batchFlowHist function provides this feature. To use it, you need a list of file names to process. The R function list.files() is helpful here. If your FCS files are in the directory “datadir/”, you can collect all their names using:

my.files <- list.files("datadir/", full.names = TRUE)

If you have a mix of FCS files and other files together, you can supply a pattern to match only your data:

my.files <- list.files("datadir/", pattern = ".LMD", full.names = TRUE)

If you want to combine directories, use c():

my.files <- list.files("datadir1/", full.names = TRUE)
my.files <- c(my.files, list.files("datadir2/", full.names = TRUE))

For this example, I’ll continue to use the sample data included with flowPloidyData. Here, we’ll use all of them (i.e., flowPloidyFiles), not just one (e.g. flowPloidyFiles[1]).

Using batchFlowHist

The batchFlowHist() function has two required arguments:

files

a vector of file names (character), such as we generated in the previous section.

channel

a string (character) indicating the data channel to read, which we discovered using the function viewFlowChannels() above.

The following optional arguments may also be used:

bins

the number of bins to group the data into (default: 256)

verbose

boolean, if TRUE (the default) report the filenames as they are processed. Useful for debugging, or tracking progress on long-running jobs

window

numeric, the size of the moving window used to identify local peaks (default: 20)

smooth

numeric, the size of the moving window used to reduce noise in the histogram during peak detection (default: 20)

debris

character, either “SC” or “MC” to use the Single-Cut or Multi-Cut debris model, as described above. (default: “SC”)

linearity

character, either “fixed” or “variable” to set the linearity parameter to 2, or allow it to be fit as a model parameter, as described above. (default: “variable”)

If you have clean, narrow peaks, you may want to lower the values of window and smooth, say to 16 and 8 respectively. Don’t worry about finding perfect values, we provide a GUI interface (below) to allow for quick and painless correction of peak detection.

batch1 <- batchFlowHist(flowPloidyFiles, channel = "FL3.INT.LIN")
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//188-15.LMD
## analyzing 188-15.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//222.LMD
## analyzing 222.LMD
## 
## *** Analysis Failed: 222.LMD ***
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//240+S.LMD
## analyzing 240+S.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//240-4-2+rad.LMD
## analyzing 240-4-2+rad.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//248+S.LMD
## analyzing 248+S.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//248+r.LMD
## analyzing 248+r.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//337.LMD
## analyzing 337.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//350.LMD
## analyzing 350.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//358.LMD
## analyzing 358.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//597.LMD
## analyzing 597.LMD
## 
## *** Analysis Failed: 597.LMD ***
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//734.LMD
## analyzing 734.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//744.LMD
## analyzing 744.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//S.621.LMD
## analyzing S.621.LMD
## processing /home/biocbuild/bbs-3.4-bioc/R/library/flowPloidyData/extdata//SM239.LMD
## analyzing SM239.LMD

(You will see some Error messages in the output. That’s ok, we’ll fix the problems shortly!)

You can export the results with tabulateFlowHist():

tabulateFlowHist(batch1)
##                     channel                    components totalEvents
## 188-15.LMD      FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       10351
## 222.LMD         FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB        9503
## 240+S.LMD       FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       14453
## 240-4-2+rad.LMD FL3.INT.LIN fA1;fA2;fB1;fB2;SC;AG;brA;brB        5122
## 248+S.LMD       FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       14739
## 248+r.LMD       FL3.INT.LIN fA1;fA2;fB1;fB2;SC;AG;brA;brB        9026
## 337.LMD         FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       18948
## 350.LMD         FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       14426
## 358.LMD         FL3.INT.LIN         fA1;fB1;SC;AG;brA;brB       13453
## 597.LMD         FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB        9263
## 734.LMD         FL3.INT.LIN         fA1;fB1;SC;AG;brA;brB        8630
## 744.LMD         FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB        6432
## S.621.LMD       FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB       11400
## SM239.LMD       FL3.INT.LIN     fA1;fA2;fB1;SC;AG;brA;brB        5658
##                   countsA   countsB     sizeA    sizeB        cvA
## 188-15.LMD      1443.8587 2822.7054  99.04086 136.1154 0.02187847
## 222.LMD                NA        NA        NA       NA         NA
## 240+S.LMD        164.7604 2026.9687  87.69784 164.8631 0.12545655
## 240-4-2+rad.LMD  461.7506  660.9406  63.99719 108.2691 0.03053639
## 248+S.LMD       2661.7095 2935.5363  77.76789 196.9778 0.02715978
## 248+r.LMD        894.2283 2079.3352  68.76870 111.2225 0.02520056
## 337.LMD            0.0000 2293.7212  88.03744 234.0855 0.03175282
## 350.LMD         2688.3866 1646.6759 124.26213 167.4479 0.03523762
## 358.LMD          282.6209 1441.7430 131.60593 200.5481 0.04889248
## 597.LMD                NA        NA        NA       NA         NA
## 734.LMD          590.4361  924.2660 139.30629 139.0746 0.02733551
## 744.LMD          465.1411 1617.8503 102.66327 140.2114 0.02497280
## S.621.LMD       1584.5243  781.6297  66.64673 201.5002 0.04414191
## SM239.LMD       1150.2769 1612.0374  99.12578 140.9764 0.02740613
##                        cvB   ratioAB      ratioSE         rcs linearity
## 188-15.LMD      0.01987391 0.7276241 0.0003994698    1.590141  1.982862
## 222.LMD                 NA        NA           NA          NA        NA
## 240+S.LMD       0.06401397 0.5319435 0.1190280689    5.128025  1.904515
## 240-4-2+rad.LMD 0.03418960 0.5910938 0.0016707736    2.112959  2.100000
## 248+S.LMD       0.02576721 0.3948053 0.0004575979    3.455624  1.979404
## 248+r.LMD       0.03593791 0.6182986 0.0012838065 3188.197683  1.925709
## 337.LMD         0.04936347 0.3760910 1.6495732764    2.506160  1.923674
## 350.LMD         0.05406628 0.7420944 0.0034498176    1.353658  1.962301
## 358.LMD         0.05989804 0.6562311 0.0141204064    2.706424        NA
## 597.LMD                 NA        NA           NA          NA        NA
## 734.LMD         0.02577074 1.0016662 0.7096797892   16.478613        NA
## 744.LMD         0.04243955 0.7322036 0.0025099492    3.235380  1.944316
## S.621.LMD       0.03758527 0.3307526 0.0019463835    2.648945  2.016584
## SM239.LMD       0.03064514 0.7031372 0.0005715006    1.286542  1.960193

You can scroll through the resulting plots with the following code:

parOld <- par(ask = TRUE)
lapply(batch1, FUN = plot)
## press enter to scroll through your files!
par(parOld)

However, that’s a bit tedious if you need to tweak the parameters for any of the individual FlowHist datasets in the list. A more covenient approach is provided by the browseFlowHist function, which also allows us to address the histograms for which our analysis failed. This is described in the next section.

browseFlowHist

browseFlowHist uses the Shiny framework to provide an interface for viewing histograms. The main purpose of this interface is to allow users a simple, quick way to confirm that the fitted models are sensible, and to correct the most common problem: misidentified peaks that lead to bad starting parameters.

To use browseFlowHist, pass it the list of FlowHist objects we created with batchFlowHist above:

batch1 <- browseFlowHist(batch1)

IMPORTANT remember to assign browseFlowHist(batch1) to a variable. If you don’t, all of the corrections you make will be discarded when you close the app. If you want to keep the original values, assign it to a new variable, i.e., batch1_corrected <- browseFlowHist(batch1).

At this point, R should respond with:

Listening on http://127.0.0.1:3459

(the number after the : will probably be different!) and your web browser will open to display something like this:

(You may need to maximize your browser window for the layout to look ‘correct’)

In the upper-left corner you will find the navigation. Click on Next and Prev to move forwards and backwards over your list of files. Note the counter - you can’t move past the first or last files, of course.

For the first histogram, file 188-15.LMD, the model fit is quite good, so we don’t need to worry about improving the initial parameters. Click on the Linearity or Debris Model buttons to see how much difference these options make (not much in this case). Notice that as you change the options, the model fit updates automatically. The last selections you make before moving on to the next plot are the ones that will be saved!

The second plot, for 222.LMD, doesn’t show a fitted model:

This is due to bad initial peak estimates, despite the fact that they look ok. To fix this, move the Samp. A peak estimate by clicking around near the top of the highest peak. You might need to try a few different spots before you find one that works (I don’t know why this sample is so tricky!).

As you click on the plot, the analysis updates using the new initial peak estimate. If you want to move the peak for sample B, select that in the checkbox above the plot.

Once a good initial peak is selected, the analysis succeeds with a good fit:

The analysis for 240+S.LMD succeeded, and even produced a not-terrible RCS value:

However, a quick inspection of the plot shows that it’s not a very good fit at all. So we need to move the A and/or B peaks until we get something sensible:

That’s better. However, in this case, the Multi-Cut debris model fits even better:

Browse through the rest of the histograms to see which ones succeeded, and which ones need correction. You should be able to get acceptable fits (RCS < 4) for all of them, despite the fact that some of these files are poor quality data.

When you’re done, click on Return to R to get back to the command line. (You can close the browser window after you click this, not before). Now all you’re corrected analyses will be stored in the variable you passed the results of browseFlowHist() to. You can check this with:

plot(batch1[["240+S.LMD"]])

We have the corrected model fits, so we can export the data for further analysis:

tabulateFlowHist(batch1)

Remember, if you don’t pass a file argument to tabulateFlowHist it only prints the results to he screen, they aren’t saved in a file (which file could it be?). To save the results of your analysis to disk, use:

tabulateFlowHist(batch1, file = "my-analysis-output.csv")

References

Bagwell, C. Bruce. 1993. “Theoretical Aspects of Flow Cytometry Data Analysis.” In Clinical Flow Cytometry: Principles and Applications, edited by Kennth D. Bauer, Ricardo E. Duque, and T. Vincent Shankey, 41–61. Williams & Wilkins.

Bagwell, C. Bruce, Sara W. Mayo, Sherry D. Whetstone, Shelly A. Hitchcox, David R. Baker, Donald J. Herbert, Donald L. Weaver, Michael A. Jones, and Edmund J. Lovett. 1991. “DNA Histogram Debris Theory and Compensation.” Cytometry 12 (2). Wiley Subscription Services, Inc., A Wiley Company: 107–18. doi:10.1002/cyto.990120203.

Rabinovitch, Peter S. 1994. “DNA Content Histogram and Cell-Cycle Analysis.” In Methods in Cell Biology, edited by Zbigniew Darzynkiewicz, J. Paul Robinson, and Harry A. Crissman, 41:263–96. San Diego, California: Academic Press.