Checking hybridization quality of MEEBO/HEEBO set arrays

June 29, 2006

Agnes Paquet, Yuanyuan Xiao, (Jean) Yee Hwa Yang, Andrea Barczak, David Erle

 

 

1. Introduction to MEEBO/HEEBO set quality

 

This component of arrayQuality checks the overall performance of the hybridization, assuming that the quality of the print-run is satisfactory. Please refer to the PRv9mers and PRvQCHyb functions in arrayQuality for inspection of print-run quality. This suite of functions is designed specifically for MEEBO/HEEBO arrays, which provide a wide panel of control sequences, such as positive and negative controls, tiled controls and doping controls. For more details on MEEBO or HEEBO controls, please refer to Section 4 of this guide.

 

arrayQuality provides 3 types of quality plots for MEEBO/HEEBO arrays:

 

1- A diagnostic plot that includes several statistics and exploratory plots and provides quick graphical insight into the quality of the array. This plot is very similar to the other diagnostic plots that can be generated using arrayQuality.

 

2- Doping controls quality check: these plots show the performance of the doping controls that were added to the hybridization mix, and compare them to expected results.

 

3- Mismatch and tiled controls plots: these plots are designed to show the specificity of the MEEBO set and to demonstrate amplification bias toward the 3’ end of the transcripts. These plots can be used as a whole oligo set quality check rather than hybridization quality assessment.

 

Users can choose to generate any of these 3 kinds of plots as needed. In any case, the doping control performance plots and the whole-set quality plots should be interpreted with care; they will not be informative if the hybridization quality is poor.

 

 

2. Quick starting guide

 

The function in arrayQuality that produces MEEBO quality plots is called meeboQuality(). The function that produces HEEBO quality plots is called heeboQuality(). By default, both expect GenePix file format as input, and the identities of the doping controls are set to the mix used at SFGF. However, most arguments can be customized to the user’s own data; please refer to the other parts of this guide for more details. We will use the MEEBO functions as examples in this guide. All HEEBO functions work in exactly the same way.

 

1) Create a directory and move the image processing output files (e.g. .gpr files) of the slides of interest to this directory. Make sure that all files in the directory come from the SAME print-run (same GAL file).

 

2) Start R, and change R working directory to the one you have just created. In the R menu, select File, then click on “Change dir…”. Browse to your directory from the pop-up window, or enter it manually, and click OK. To double check that you are in the correct directory: in the File menu, click on “Display file(s)…”.

 

3) To load the package in your R session, type:

library(arrayQuality)

 

You may first have to install other required packages, such as marray, limma, convert, hexbin and RColorBrewer.

 

4) Place your SpikeTypeFile in the same directory as your image processing output files.

 

5) To generate all 3 kinds of quality plots for all files in the directory, type:

test <- meeboQuality()

 

6) To generate DIAGNOSTIC PLOT ONLY, type:

test <- meeboQuality(diagnosticPlot=TRUE, DOPING=FALSE, meeboSetQC=FALSE)

 

7) To generate DIAGNOSTIC PLOT and DOPING CONTROLS QC:

test <- meeboQuality(diagnosticPlot=TRUE, DOPING=TRUE, meeboSetQC=FALSE)

 

8) By default, meeboQuality uses no background subtraction and print-tip loess normalization. If you would like to change these defaults, you can modify the arguments bgMethod and normMethod to any method implemented in limma. For example, if you would like to perform background subtraction and no normalization:

 

test <- meeboQuality(bgMethod="subtract", normMethod="none")
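Before running meeboQuality(), it can help to confirm that every .gpr file in the directory really comes from the same print-run, as required in step 1. The following is a minimal sketch in base R, not part of arrayQuality. It assumes that each GenePix header contains a record of the form "GalFile=..." naming the GAL file; the helper names gal_of_gpr and check_print_run are hypothetical.

```r
# Sketch only: assumes the GPR header holds a record such as "GalFile=...".
# gal_of_gpr() and check_print_run() are hypothetical helpers, not arrayQuality functions.
gal_of_gpr <- function(path, n_header = 60) {
  header <- readLines(path, n = n_header, warn = FALSE)
  hit <- grep("GalFile=", header, value = TRUE, fixed = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Keep the text after the key, drop quotes, and reduce the path to a file name
  value <- sub(".*GalFile=", "", hit[1])
  value <- trimws(gsub("\"", "", value))
  basename(gsub("\\\\", "/", value))
}

check_print_run <- function(dir = ".") {
  gpr <- list.files(dir, pattern = "\\.gpr$", full.names = TRUE,
                    ignore.case = TRUE)
  gal <- vapply(gpr, gal_of_gpr, character(1))
  data.frame(file = basename(gpr), gal = unname(gal),
             stringsAsFactors = FALSE)
}

# Example with two small mock files written to a temporary directory
demo_dir <- tempfile("gpr_demo")
dir.create(demo_dir)
mock_header <- c("ATF\t1.0", "\"GalFile=C:\\arrays\\meebo_run1.gal\"")
writeLines(mock_header, file.path(demo_dir, "slide1.gpr"))
writeLines(mock_header, file.path(demo_dir, "slide2.gpr"))
print_run <- check_print_run(demo_dir)
print(print_run)
```

If the gal column shows more than one distinct value, the files should be split into separate directories before the quality functions are run.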

 

3. Results

 

meeboQuality will write several plots to the working directory and return, in the R working environment, an MAList object representing all tested slides, processed with the specified background subtraction and normalization. Note that if no normalization method is specified by the user, the MAList object contains raw data. More details about the results can be found in Section 6 of this document.
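For readers unfamiliar with MAList objects: in limma's convention, M is the log2 ratio of the red (Cy5) to green (Cy3) intensity and A is the average log2 intensity of the two channels. The sketch below reproduces that arithmetic in base R on made-up numbers; the function name ma_values and the input vectors are hypothetical and only illustrate what the M and A components hold.

```r
# Sketch only: M and A as defined in limma's convention, on made-up intensities.
ma_values <- function(R, G) {
  list(
    M = log2(R) - log2(G),        # log-ratio, Cy5 over Cy3
    A = (log2(R) + log2(G)) / 2   # average log-intensity
  )
}

ma <- ma_values(R = c(1024, 512, 256), G = c(256, 512, 1024))
ma$M  # 2 0 -2
ma$A  # 9 9 9
```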

 

 

4. MEEBO arrays controls

 

The MEEBO set integrates a large collection of control features corresponding to endogenous mouse/human transcripts as well as a diverse array of over 200 spiked doping control RNAs. These features can be used to examine in detail the performance of any given hybridization, based on criteria such as sensitivity, specificity, dynamic range, and linearity of the hybridization. In addition, the control features are helpful in detecting various possible hybridization or labeling biases. This section describes the properties and designs of the control features that are used to generate the quality plots in the package. For more details, please refer to the MEEBO documentation:

 

http://alizadehlab.stanford.edu/

http://www.arrays.ucsf.edu/meebo.html

 

 

Doping controls: oligo ids mCD

They correspond to probes that recognize spike-in transcripts from Methanococcus and B. subtilis (Stanford) and from commercial suppliers (Affymetrix, Ambion and Stratagene).

 

Mismatch controls: mCM

Anchored and distributed mismatched versions of 5 selected spike-in transcripts and 5 of the positive control mouse genes. Probes that perfectly match their targeted sequences are called Perfect Matches (PM), and probes that contain deliberate point mutations are called Mismatches (MM).

Each PM probe is replicated between 8 and 23 times on the array. For each PM, the corresponding MMs have the following features:

1) different numbers of mismatches: 1, 3, …, 63;

2) different directions: anchored at the extremities and distributed.

3) each MM is replicated 3 times

 

 

Negative controls: mCN

Randomized 70mers, selected not to recognize mouse transcripts.

 

Positive controls: mCP

The MEEBO set includes several kinds of positive controls:

- Ubiquitin C probe as a corner placed PMT aid, assuming that sector widths are 28 or 29 spots (192 replicates)

- Normalization genes: 10 mouse “housekeeping” genes, 20 copies of each, based on Vandesompele et al., Genome Biol. 2002 3(7):RESEARCH0034.

 

They are all gathered under Positive controls in the diagnostic plots at the moment.

 

Tiling controls: mCT

Series of probes designed to recognize sequences at varying distances from the 3’ end (used to assess 3’ bias): 11 mouse genes and selected spike-in transcripts.

 

Mouse constitutive exonic oligos: mMC

We are including these oligos in the quality plots in order to compare the expression level of control probes to the expression level of “real genes”.

 

 

5. Input files

 

Spot Type File

 

A Spot Type File is a tab-delimited text file that is used to identify the different types of spots on your array. By default, meeboQuality will use the spot types described in the figure below.

 

 

If you would like to use your own spot types, please follow the recommendations in limma’s User’s Guide to create the file. Then copy your file, say “MySpotTypes.txt”, to the SAME DIRECTORY as your gpr files, and pass the name of the file to meeboQuality:

test <- meeboQuality(SpotTypesFile="MySpotTypes.txt")
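As an illustration of what such a file could look like, the sketch below writes a small tab-delimited spot types table in base R. It follows the layout recommended in limma's User's Guide (a SpotType column, a pattern column matched against the array annotation, and a Color column) and uses the oligo id prefixes listed in Section 4. The spot type labels, the colors, and the assumption that the ids are stored in a column named ID are hypothetical; this is not the default table shipped with arrayQuality.

```r
# Sketch only: labels, colors and the "ID" column name are assumptions.
spot_types <- data.frame(
  SpotType = c("Probes", "Doping", "Mismatch", "Negative",
               "Positive", "Tiling", "Constitutive"),
  ID       = c("*", "mCD*", "mCM*", "mCN*", "mCP*", "mCT*", "mMC*"),
  Color    = c("black", "red", "orange", "blue", "green", "purple", "grey"),
  stringsAsFactors = FALSE
)

# Written to a temporary directory here; in practice, write the file to the
# directory that holds the gpr files.
spot_file <- file.path(tempdir(), "MySpotTypes.txt")
write.table(spot_types, file = spot_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm that it is a valid tab-delimited table
spot_check <- read.delim(spot_file, stringsAsFactors = FALSE)
print(spot_check)
```

The first row uses the wildcard "*" so that every spot gets a default type before the more specific patterns are applied.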

 

 

Spike Type File

 

A Spike Type File is a tab-delimited text file containing the information on the user’s spike-in mixture. An example of the file used at SFGF is presented in the figure below. This file MUST contain the following columns:

 

1) SeqID

This column contains the MEEBO sequence identifier for each probe. It is used to retrieve replicates of each doping control. This column can be replaced by the MEEBO Probe_Name, but not any of the other MEEBO set annotations (e.g. MEEBO id or probe description) because they might not be unique among replicated probes on the array. The identifier used to retrieve replicates MUST be passed as argument in meeboQuality; it is set to “SeqID” by default.

 

2) Type

This column is used to group doping controls by manufacturer or transcript. By default, the doping controls used at UCSF or SFGF will be split into Ambion, Stratagene and MJ.

 

3) MassCY3 and MassCY5

These 2 columns should contain the mass of each doping control added to the hybridization mixture; the “MassCY3” column should contain the mass of oligo labeled with Cy3; “MassCY5” the mass of oligo labeled with Cy5. Numbers should be provided using the SAME UNIT; they will be used as provided to generate the quality plots and compute the expected ratios.

 

The names of both columns should be passed as arguments to meeboQuality. By default, they are set to the SFGF doping mixture version 1.7.

 

4) Name

If you would like to use different names for the graph labels, you can specify your own names or symbols in this column of the file and pass the column name as an argument.

 

 

 

You can also add any other columns as needed in your experiment. To use your own Spike Type File, copy your file in the working directory containing your image processing output files (gpr). Then pass the necessary arguments to meeboQuality:

 

test <- meeboQuality(SpikeTypeFile="MySpikeTypeFile.txt", cy3col="MyCy3", cy5col="MyCy5", namecol="MyName")
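The sketch below shows one way to build a Spike Type File with the required columns in base R, and how an expected log-ratio follows from the two mass columns. The identifiers, types and masses are made up for illustration, and the expected log-ratio is computed as log2(MassCY5 / MassCY3), which is an assumption based on the usual Cy5-over-Cy3 convention rather than a statement of how meeboQuality computes it internally.

```r
# Sketch only: identifiers and masses are made up; both mass columns use the same unit.
spikes <- data.frame(
  SeqID   = c("spikeA", "spikeB", "spikeC"),   # hypothetical identifiers
  Type    = c("Ambion", "Ambion", "Stratagene"),
  MassCY3 = c(100, 50, 25),
  MassCY5 = c(100, 200, 25),
  Name    = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Written to a temporary directory here; in practice, write the file to the
# directory that holds the gpr files.
spike_file <- file.path(tempdir(), "MySpikeTypeFile.txt")
write.table(spikes, file = spike_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
spike_check <- read.delim(spike_file, stringsAsFactors = FALSE)

# Expected log2 ratio implied by the spiked masses (assumed Cy5 over Cy3)
spikes$ExpectedM <- log2(spikes$MassCY5 / spikes$MassCY3)
spikes$ExpectedM  # 0 2 0
```

With a file laid out this way, the column names given to cy3col, cy5col and namecol would be "MassCY3", "MassCY5" and "Name".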

 

6. Quality plots description and examples

 

Diagnostic plot

 

Filename: diagPlot.SLIDENAME.png

 

 

QC plots using mismatch controls

Mismatch controls are defined as follows: 10 transcripts have been selected as wild-type (WT) probes, with around 20 replicates of each WT, and the sequences of these WT probes have been modified to create anchored and distributed mismatch probes.

Signal intensity vs. binding energy

Boxplot of normalized raw signal intensity for all MisMatch controls and associated wild-type oligos, binned by Binding Energy.

Filename: BindingEnergy.SLIDENAME.png

Raw signal intensity: Rf + Gf (background corrected as specified)

Filtering: lowly expressed controls are removed from the plot. For a set of mismatch probes to be included, the median raw intensity of the associated wild-type controls must be greater than the 75th percentile of the intensity for the whole array.

Normalization: the normalized raw intensity for each mismatch oligo is obtained by dividing the raw intensity by the median raw intensity of the associated wild-type probes.
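The filtering and normalization rules above can be sketched in a few lines of base R. This is an illustration on made-up numbers, not the package's internal code; the function name normalize_mismatch and its arguments are hypothetical, and each input is assumed to hold raw intensities (Rf + Gf).

```r
# Sketch only: normalize_mismatch() is a hypothetical helper.
# mm_raw:    raw intensities of the mismatch probes of one transcript
# wt_raw:    raw intensities of the associated wild-type replicates
# array_raw: raw intensities of all spots on the array
normalize_mismatch <- function(mm_raw, wt_raw, array_raw) {
  wt_median <- median(wt_raw, na.rm = TRUE)
  cutoff <- quantile(array_raw, 0.75, na.rm = TRUE, names = FALSE)
  # Filtering: drop the whole set if the wild type is lowly expressed
  if (wt_median <= cutoff) return(NULL)
  # Normalization: divide by the median wild-type intensity
  mm_raw / wt_median
}

array_raw <- c(100, 200, 300, 400, 500, 600, 700, 800)
kept    <- normalize_mismatch(c(500, 250), c(900, 1000, 1100), array_raw)
dropped <- normalize_mismatch(c(50, 25), c(100, 200, 300), array_raw)
kept     # 0.50 0.25
dropped  # NULL
```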

 

 

 

Signal intensity vs. percentage of mismatch

Scatter plot of the log intensity vs. percentage of mismatched bases for each of the 10 transcripts with mismatch probes.

Filename: Mismatch.SLIDENAME.png

Anchored and distributed mismatch controls are represented separately, and the loess line for both types is overlaid on top of the scatter plot.

Each plot also includes the boxplot of log intensity for the associated WT probes and the boxplot of log intensity for negative controls. The right axis represents the percentiles of A values for the array in question, and the 50th, 75th and 90th percentiles of A values are highlighted in red.

 

 

 

Signal intensity vs. 3’ distance

For each of the 11 tiling controls, scatter plot of the raw log-intensity vs. 3’ distance, for both channels.

Filename: Tiling.SLIDENAME.png

 

 

 

QC plots using doping controls

 

Cy5 raw signal intensity vs. Cy3 raw signal intensity (log2 scale)

Scatter plot of raw Cy5 signal intensity over the raw Cy3 signal intensity (background correction is performed if requested) for all spiked doping-controls, colored by expected ratio.

Filename: Spike.Cy5vsCy3.SLIDENAME.png

A doping control will not be used if the corresponding Cy5 mass column is empty.

No filtering is performed.

 

 

 

Scatter plot of observed log-ratios

For each spiked doping-control, plot the observed log-ratios of each replicate and overlay the expected log-ratio on top.

Filename: Spike.MM.Scatter.SLIDENAME.png

If doping controls are separated into several types in the SpikeTypeFile, a plot is generated for each type.

 

 

Observed ratio vs. expected ratio

For each type of doping controls, plot the observed log-ratio vs. the expected values for each probe (letters) and the median observed log-ratios vs expected ratio. Log-ratios are extracted after background correction and normalization.

Filename: Spike.MMplot.SLIDENAME.png

If there are more than 16 spiked doping controls in one type (usually for MJs), only 1 color will be used and the median of replicated probes will not be printed. In this case, no legend is shown on the left.
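The comparison shown in this plot can be sketched numerically: for each doping control, take the median of the observed log-ratios across replicates and set it against the expected log-ratio. The example below uses base R and made-up values; the function name summarize_spikes and the column names of its result are hypothetical.

```r
# Sketch only: summarize_spikes() is a hypothetical helper on made-up data.
summarize_spikes <- function(seq_id, observed_m, expected_m) {
  med  <- tapply(observed_m, seq_id, median, na.rm = TRUE)
  expd <- tapply(expected_m, seq_id, function(x) x[1])
  data.frame(
    SeqID          = names(med),
    MedianObserved = as.numeric(med),
    Expected       = as.numeric(expd[names(med)]),
    stringsAsFactors = FALSE
  )
}

seq_id     <- c("s1", "s1", "s1", "s2", "s2", "s2")
observed_m <- c(0.9, 1.0, 1.2, -2.1, -1.8, -2.0)
expected_m <- c(1, 1, 1, -2, -2, -2)
spike_summary <- summarize_spikes(seq_id, observed_m, expected_m)
print(spike_summary)
```

A large gap between MedianObserved and Expected for many controls would suggest compression of the ratios or a problem with the doping mixture.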

 

 

Sensitivity

For each type of doping controls, boxplot of the raw signal intensity vs. spiked mass, both in log2 scale, for each channel.

Filename: Spike.Sensitivity.FILENAME.png

This plot assumes that all spiked masses are on the same mass scale.

 

 

 

 

 

Sensitivity of each individual spike

 

Boxplot of raw signal intensity (log2 scale) for each doping control, ordered by increasing mass. Doping controls are separated by spike type. A boxplot of the negative controls' log2 signal intensity and the signal intensity quartiles are provided on each graph as scale indicators.

 

Filename: Spike.Sensitivity.Indi.FILENAME.png

This plot assumes that all spiked masses are on the same mass scale.