%\VignetteIndexEntry{oneChannelGUI Overview}
%\VignetteDepends{}
%\VignetteKeywords{one channel microarray,extended Affymetrix GUI, limma, quality control, data filtering, time course, alternative splicing}
%\VignettePackage{oneChannelGUI}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt]{article}
\usepackage{times}
\usepackage{hyperref}

\textwidth=6.2in
\textheight=8.5in
\oddsidemargin=0.2in
\evensidemargin=0.2in
\headheight=0in
\headsep=0in


\begin{document}
\title{oneChannelGUI Package Vignette}
\author{Raffaele A Calogero, Francesca Cordero, Remo Sanges}
\date{February 09, 2009}
\maketitle

\section{Introduction}
This package is an add-on to affylmGUI for \textit{mouse-click} based QC, 
statistical analysis and data mining of one channel microarray data. It is designed 
for Bioconductor beginners with limited or no experience of the 
Bioconductor command line.
oneChannelGUI is a set of functions that extends the capabilities of affylmGUI and rearranges and extends 
the affylmGUI menus. 

This package allows the user to perform, in a graphical environment, the analysis
pipeline shown in the green box of figure \ref{fig:fig1}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fig1}
    \caption{\label{fig:fig1}Microarray analysis pipe-line.}
  \end{center}
\end{figure}

This vignette gives a general overview of the available graphical tools present in oneChannelGUI.
\begin{Schunk}
\begin{Sinput}
N.B.:
   All the oneChannelGUI graphical outputs are 
   visualized in the R main window to reduce RAM usage, 
   which is a critical issue when new generation Affymetrix 
   array data or large data sets are loaded.
   Furthermore, exon data generated with APT tools 
   produce, in the working directory, 
   a number of temporary files and directories. 
   A cleanup function is under development. 
   At present, the user can manually remove from the working
   folder any file starting with target, elevels or glevels,
   e.g. target51f81aeb, elevels3e9f6b76, and any folder
   starting with out or outMidas, 
   e.g. out17fb164, outMidas4a31ac4, without affecting 
   the results stored in oneChannelGUI. 
\end{Sinput}
\end{Schunk}
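
Until the cleanup function is available, the temporary entries can be listed from the R prompt before they are removed by hand.
The following is only a sketch in base R: \texttt{findAptTemp} is a hypothetical helper, not a oneChannelGUI function, and the assumption
that each prefix is followed by a hexadecimal suffix is taken from the example names given above. 
The strict pattern is used so that the user's own target file is not matched.
\begin{Schunk}
\begin{Sinput}
# Hypothetical helper (not part of oneChannelGUI): list APT leftovers.
# Assumption: each prefix is followed only by hexadecimal characters,
# as in target51f81aeb or outMidas4a31ac4.
findAptTemp <- function(dir = getwd()) {
  entries <- list.files(dir)
  isDir <- file.info(file.path(dir, entries))$isdir
  filePattern <- "^(target|elevels|glevels)[0-9a-f]+$"
  dirPattern <- "^(outMidas|out)[0-9a-f]+$"
  list(files = entries[!isDir & grepl(filePattern, entries)],
       folders = entries[isDir & grepl(dirPattern, entries)])
}

leftovers <- findAptTemp()
print(leftovers)
# After checking the list, the entries could be removed with:
# unlink(file.path(getwd(), unlist(leftovers)), recursive = TRUE)
\end{Sinput}
\end{Schunk}
The removal line is left commented out on purpose, so that nothing is deleted before the list has been inspected.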


\section{Installation}
To obtain the complete functionality of oneChannelGUI, some external software and data 
need to be installed. Please refer to the \textit{install vignette} of the oneChannelGUI package.

\section{Main graphical window}
oneChannelGUI inherits the core functionalities of affylmGUI and its main GUI.
In oneChannelGUI some extra topics are available in the left info frame of the main affylmGUI window, e.g. maSigPro results,
Normalized Exon data, APT DABG, APT MiDAS, Splice Index, etc.
Furthermore, the menus are automatically exchanged depending on the type of data loaded:
\begin{enumerate}
  \item .CEL IVT Affymetrix arrays. 
  \item .CEL exon 1.0 ST arrays uploaded in oneChannelGUI by Affymetrix APT tools or gene/exon data exported from Affymetrix Expression Console.
  \item .CEL Gene 1.0 ST arrays uploaded in oneChannelGUI by Affymetrix APT tools.
  \item GEO/flat tab delimited expression data file.
  \item ILLUMINA output from BeadStudio software version 1 and 2.
\end{enumerate}

Each item in the menus is simply a graphical implementation of a function of a specific Bioconductor library, 
e.g. ssize: sample size and statistical power estimation.
For more information on those libraries please refer to their specific vignettes, accessible from the \textit{Help menu}.

\section{File}
This menu allows the loading of .CEL IVT Affymetrix arrays as well as exon arrays, 
GEO Matrix Series files, tab delimited files containing only expression data and ILLUMINA data produced by BeadStudio software version 1 or 2.
This menu, fig. \ref{fig:fignew1}, provides the main functions needed to handle a microarray analysis project. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew1}
    \caption{\label{fig:fignew1} File menu.}
  \end{center}
\end{figure}

\subsection{New}
The item \textit{New}, fig. \ref{fig:fignew1}, allows the user to load various types of array data, using the sub menu shown in fig. \ref{fig:fignew2}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew2}
    \caption{\label{fig:fignew2} New: array type selection menu.}
  \end{center}
\end{figure}

\subsubsection{Target file structure}
To load arrays oneChannelGUI uses the information available in a file describing the experimental structure of the data set. 
This file is called \textit{target file} and it is a tab delimited file with a fixed header structure also used by affylmGUI, fig. \ref{fig:fig3}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fig3}
    \caption{\label{fig:fig3} Target file structure.}
  \end{center}
\end{figure}

\begin{Schunk}
\begin{Sinput}
IMPORTANT:
    TARGET FILE MUST NOT CONTAIN CHARACTERS LIKE ;,:_-|\!?+*^()[]{} 
\end{Sinput}
\end{Schunk} 
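
A target file can also be prepared and checked from the R prompt before it is loaded. 
The sketch below uses base R only; the sample names, file names and groups are hypothetical, and \texttt{hasForbidden} is an illustrative
helper, not a oneChannelGUI function. The FileName column is not checked, and the underscore can be dropped from the checked set 
for the Target column of the time course and classification target files described later, where it is used as separator.
\begin{Schunk}
\begin{Sinput}
# Hypothetical sample names and groups, for illustration only.
targets <- data.frame(
  Name = c("ctrl1", "ctrl2", "trt1", "trt2"),
  FileName = c("ctrl1.CEL", "ctrl2.CEL", "trt1.CEL", "trt2.CEL"),
  Target = c("ctrl", "ctrl", "trt", "trt"),
  stringsAsFactors = FALSE
)

# Characters listed in the note above, split one by one.
forbidden <- strsplit(";,:_-|\\!?+*^()[]{}", "")[[1]]

# TRUE for every string containing at least one forbidden character.
hasForbidden <- function(x, chars = forbidden) {
  vapply(x, function(s) {
    any(vapply(chars, function(ch) grepl(ch, s, fixed = TRUE),
               logical(1)))
  }, logical(1), USE.NAMES = FALSE)
}

stopifnot(!any(hasForbidden(targets$Name)),
          !any(hasForbidden(targets$Target)))

# Write the tab delimited target file (here in a temporary folder).
targetFile <- file.path(tempdir(), "targets.txt")
write.table(targets, file = targetFile, sep = "\t",
            quote = FALSE, row.names = FALSE)
\end{Sinput}
\end{Schunk}
For a time course target file the check could be run as \texttt{hasForbidden(x, setdiff(forbidden, "\_"))}.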

\subsubsection{Loading Affy .CEL files}
This sub menu, fig. \ref{fig:fignew2}, is entirely inherited from affylmGUI and allows the user to load .CEL files,
provided that a Bioconductor cdf file is available. The user will be asked to select the working folder, 
i.e. the one containing the .CEL files and the target file.

\subsubsection{Loading EXON/GENE ARRAYS}
This sub menu, fig. \ref{fig:fignew2}, allows the user to load exon/gene 1.0 ST arrays starting from .CEL files, taking advantage of Affymetrix
APT tools (\url{http://www.affymetrix.com/support/developer/powertools/index.affx}),
or from flat tab delimited files containing gene/exon level expression data exported from 
Affymetrix Expression Console (EC, \url{http://www.affymetrix.com/support/technical/software_downloads.affx}).
If the APT tool option is not used (this option works only for Exon 1.0 ST data exported from EC), 
a sub-menu allows the user to select, for tab delimited data, the organism and the subset of 
exon data to be evaluated, fig. \ref{fig:fignew3}.

\begin{Schunk}
\begin{Sinput}
IMPORTANT:
    TO USE APT TOOLS THE GENE/EXON LIBRARY FILES MUST BE DOWNLOADED.
    THIS CAN BE DONE WITH THE FUNCTION 
    oneChannelGUI: Set library folder and install Affy gene/Exon library files
    LOCATED IN THE GENERAL TOOLS MENU
\end{Sinput}
\end{Schunk} 


\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew3}
    \caption{\label{fig:fignew3} Sub menu to define the organism and the subset of exon data that will be loaded.}
  \end{center}
\end{figure} 

Subsequently, the user will select:
\begin{enumerate}
      \item a working directory and a target file, 
      \item the flat tab delimited files containing gene-level and exon-level data, respectively.
\end{enumerate}
If, instead, the APT tool option is selected, the user will select:
\begin{enumerate}
       \item the organism and the subset of exon arrays to be evaluated, fig. \ref{fig:fignew3}, 
       \item a working directory, 
       \item a target file,
       \item the type of probe set summary to be applied to gene/exon level data, fig. \ref{fig:fignew4}.
\end{enumerate}

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew4}
    \caption{\label{fig:fignew4} Sub menu to define the type of probe set summary to be applied.}
  \end{center}
\end{figure} 

Concerning the probe set summary options, fig. \ref{fig:fignew4}, PLIER and RMA are model-based algorithms: exons that are alternatively
spliced in the samples, and therefore show expression patterns different from those of
the constitutive exons, are down-weighted in the overall gene-level
target response values. A better estimate of the gene-level signal can be obtained using
IterPLIER, a variation of PLIER that iteratively discards features (probes) that do not correlate
well with the overall gene-level signal and then recalculates the signal estimate, in order to
derive a robust estimate of the gene expression value based primarily on the expression
levels of the constitutive exons.
Concerning exon-level expression estimation, most probe sets have only four probes, which is too
few for IterPLIER to be useful at the individual exon level; therefore it is better to use PLIER/RMA.

Probe set summary calculation and uploading will take a few minutes, depending on the number of .CEL files to be loaded and the PC in use.
Once the probe set summary has been calculated using the APT tool, it is also possible to calculate DABG p-values, fig. \ref{fig:fignew5}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew5}
    \caption{\label{fig:fignew5} Selecting DABG p-value calculation.}
  \end{center}
\end{figure} 

DABG stands for \textit{detection above background}; the DABG p-value is similar to the p-value used to derive presence/absence calls in MAS 5.0.
DABG p-values can be useful to remove low intensity signals, which could produce misleading results when alternative splicing
events are evaluated using the Splice Index, where signal intensity information is not considered.

The progress of the probe set summary calculation is shown in the main R window.
\begin{Schunk}
\begin{Sinput}
Gene level probe sets summary started
Read 6 cel files from: target3d92750
Opening bgp file: HuEx-1_0-st-v2.r2.antigenomic.bgp
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Expecting 1 iteration.
Doing iteration: 1
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Loading 22011 probesets and 908532 probes.
Reading 6 cel files......Done.
Processing Probesets......................Done.
Cleaning up.
Done.
Run took approximately: 9.56 minutes.

Gene level probe sets summary ended

Gene level probe sets summary ended

Exon level probe sets summary started

Exon level probe sets summary started
Read 6 cel files from: target3d92750
Opening bgp file: HuEx-1_0-st-v2.r2.antigenomic.bgp
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Expecting 1 iteration.
Doing iteration: 1
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Loading 287329 probesets and 1111849 probes.
Reading 6 cel files......Done.
Processing Probesets......................Done.
Cleaning up.
Done.
Run took approximately: 6.41 minutes.

Exon level probe sets summary ended

Exon level probe sets summary ended

DABG calculation started
Read 6 cel files from: target3d92750
Opening bgp file: HuEx-1_0-st-v2.r2.antigenomic.bgp
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Expecting 1 iteration.
Doing iteration: 1
Opening clf file: HuEx-1_0-st-v2.r2.clf
Opening pgf file: HuEx-1_0-st-v2.r2.pgf
Loading 22011 probesets and 908532 probes.
Reading 6 cel files......Done.
Processing Probesets......................Done.
Cleaning up.
Done.
Run took approximately: 3.55 minutes.

DABG calculation ended
\end{Sinput}
\end{Schunk}

\subsubsection{Loading GENE ARRAYS}
This sub menu, fig. \ref{fig:fignew2}, allows the user to load gene 1.0 ST arrays starting from .CEL files, taking advantage of Affymetrix
APT tools (\url{http://www.affymetrix.com/support/developer/powertools/index.affx}).
The user will select:
\begin{enumerate}
       \item the organism and the subset of exon arrays to be evaluated, fig. \ref{fig:fignew73}, 
       \item a working directory, 
       \item a target file,
       \item the type of probe set summary to be applied to gene/exon level data, fig. \ref{fig:fignew74}.
\end{enumerate}


\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew73}
    \caption{\label{fig:fignew73} Sub menu to define the organism and the subset of data that will be loaded.}
  \end{center}
\end{figure} 


\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew74}
    \caption{\label{fig:fignew74} Sub menu to define the type of probe set summary to be applied.}
  \end{center}
\end{figure} 

Probe set summary calculation and uploading will take a few minutes, depending on the number of .CEL files to be loaded and the PC in use.

\subsubsection{Loading ILLUMINA BeadStudio}
This sub menu, fig. \ref{fig:fignew2}, allows the user to load expression data starting from the output
of BeadStudio software. The interface can load outputs generated by BeadStudio version 1 or 2.
The Bioconductor annotation libraries for Illumina arrays are associated with the loaded data. Since the output of
BeadStudio is not log2 transformed, a popup menu offers the log2 transformation of the data. Furthermore, if the BeadStudio
data were not normalized, the user can apply the normalization procedures available in the Probe set summary menu.

\subsubsection{Loading GEO Matrix Series files}
This sub menu, fig. \ref{fig:fignew2}, allows the user to load GEO Matrix Series files.
To load a GEO Matrix Series file it is only necessary to place, in the same folder, a target file and the Matrix Series 
file downloaded from the GEO database.

\begin{Schunk}
\begin{Sinput}
NB: In the target file the FileName column must contain 
exactly the same names present in the header 
below the row !series_matrix_table_begin 
in the Matrix Series file.
The Target column can instead be derived from the row 
!Sample_description in the Matrix Series file.
\end{Sinput}
\end{Schunk}

\subsubsection{Creating a Target file from GEO matrix series file}
This function, fig. \ref{fig:fignew1}, makes it easier to create a target file for GEO matrix series files. It opens
the GEO matrix series file of interest and creates a data frame with the columns Name, FileName and Target, using the information written in the GEO file:
\begin{Schunk}
\begin{Sinput}
Name: !Sample_title
FileName: ID_REF
Target: !Sample_source_name_ch1
\end{Sinput}
\end{Schunk}
The data frame is then written to the working directory.
This target file can be further edited and used to load the GEO matrix series file in oneChannelGUI.
\begin{Schunk}
\begin{Sinput}
N.B. Editing of the target file is frequently needed to 
correctly organize the Target column, 
to fulfil the user analysis needs. 
The Target file could contain a subset of the array data 
present in the series matrix file. oneChannelGUI will load 
only the columns of
the GEO matrix series file present in the Target file.
\end{Sinput}
\end{Schunk}
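
The mapping between GEO rows and target columns described above can be sketched in base R as follows. 
This is not the code used by oneChannelGUI: \texttt{getFields} is a hypothetical helper and the six lines in \texttt{geoLines}
are a made-up miniature of a matrix series file; a real file would be read with \texttt{readLines}.
\begin{Schunk}
\begin{Sinput}
# Miniature stand-in for a GEO matrix series file (hypothetical content).
geoLines <- c(
  "!Sample_title\t\"tumor 1\"\t\"normal 1\"",
  "!Sample_source_name_ch1\t\"tumor\"\t\"normal\"",
  "!series_matrix_table_begin",
  "\"ID_REF\"\t\"GSM1\"\t\"GSM2\"",
  "\"1007_s_at\"\t7.1\t6.9",
  "!series_matrix_table_end"
)

# Return the tab separated fields that follow the given row label.
getFields <- function(lines, key) {
  hit <- grep(paste0("^\"?", key, "\"?\t"), lines, value = TRUE)[1]
  fields <- strsplit(hit, "\t", fixed = TRUE)[[1]][-1]
  gsub("\"", "", fields, fixed = TRUE)
}

targets <- data.frame(
  Name = getFields(geoLines, "!Sample_title"),
  FileName = getFields(geoLines, "ID_REF"),
  Target = getFields(geoLines, "!Sample_source_name_ch1"),
  stringsAsFactors = FALSE
)
print(targets)
\end{Sinput}
\end{Schunk}
The resulting data frame would still need the manual editing of the Target column mentioned in the note above.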

\subsubsection{Combining GEO matrix series file}
In large GEO experiments, e.g. GSE2109, the experiment is split into multiple Matrix Series files.
The function \textit{Combining GEO matrix series file}, fig. \ref{fig:fignew1}, allows the user to combine the split Matrix Series files into a single
ExpressionSet to be used in oneChannelGUI. The user needs to prepare a target file for each of the pieces of the experiment to be combined.
The function will ask the user for the number of GEO matrix series files to be combined and subsequently, for each of them, for the Target file name
and for the corresponding GEO matrix series file to be loaded.
  

\subsubsection{Loading Tab delimited files}
This sub menu, fig. \ref{fig:fignew2}, allows the user to load a tab delimited file containing expression data only.
Also in this case the target file and the expression file are the only two files needed to load the data in oneChannelGUI.
In the target file the FileName column should contain exactly the same names present in the header of the tab delimited matrix file.
Examples of target files are available at \url{http://www.bioinformatica.unito.it/bioinformatics/DAGEL.II/}.
Currently, a specialized module to load \textit{processed-data} derived from the ArrayExpress database \url{http://www.ebi.ac.uk/arrayexpress/} 
is not available.
However, \textit{processed-data}, reorganized in a flat tab delimited file containing only expression values, can be loaded in oneChannelGUI.
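
The requirement that the FileName column matches the header of the expression file can be checked before loading. 
The following base R sketch uses a small made-up expression matrix and target table; all names in it are hypothetical.
\begin{Schunk}
\begin{Sinput}
# Hypothetical expression matrix: probe sets in rows, arrays in columns.
expr <- matrix(c(7.1, 6.9, 8.2,
                 5.5, 5.7, 5.4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ps1", "ps2"), c("s1", "s2", "s3")))

# Hypothetical target table describing two of the three arrays.
targets <- data.frame(Name = c("a", "b"),
                      FileName = c("s1", "s3"),
                      Target = c("ctrl", "trt"),
                      stringsAsFactors = FALSE)

# Every FileName entry must be present in the header.
notFound <- setdiff(targets$FileName, colnames(expr))
if (length(notFound) > 0) {
  stop("Not found in the header: ", paste(notFound, collapse = ", "))
}

# Keep only the arrays described in the target table, in that order.
exprSubset <- expr[, targets$FileName, drop = FALSE]
print(exprSubset)
\end{Sinput}
\end{Schunk}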
 
\subsection{Open, Save, Save as}
A project can be saved using the functions \textit{Save as} or \textit{Save}, fig. \ref{fig:fignew1}. A saved microarray project can be 
loaded again in oneChannelGUI with the function \textit{Open}. 

\subsection{Exporting normalized expression values}
This function, fig. \ref{fig:fignew1}, allows the user to export, as tab delimited files, the expression data loaded in oneChannelGUI.
This function is also located in the \textit{filtering menu} and in the \textit{exon menu}.
If exon arrays are loaded in oneChannelGUI, it is possible to export not only the gene level expression data available in
Normalized Affy Data but also the exon level expression data. Furthermore, if they have already been calculated, it is possible to export the Splice Index,
MiDAS p-values and RP alternative splicing data.

\subsection{Info about the loaded data set}
This function, fig. \ref{fig:fignew1}, gives information about the set of data loaded in oneChannelGUI and on the corresponding annotation library,
if available.

\subsection{Attaching annotation lib info}
If a Bioconductor annotation library is available, it is attached to the data loaded in oneChannelGUI and
it will appear in the output of \textit{Info about the loaded data set}.
Using the \textit{Attaching annotation lib info} function after loading expression data as a tab delimited file, 
it is possible to attach the Bioconductor annotation library associated with those data.

\subsubsection{Probe set annotation}
The Bioconductor annotation library for IVT Affymetrix arrays or GEO Matrix Series files is 
directly attached.
For Gene and Exon 1.0 ST arrays, annotation information is currently embedded in oneChannelGUI.
For exon arrays annotation is available at the gene level for the core subset of Hs/Mm/Rn. 
As soon as Bioconductor annotation libraries become available for exon arrays, oneChannelGUI will use 
them for annotation.
Information about the available Affymetrix annotation release can be found in the main R window as part of the oneChannelGUI
release major changes.
For EXON 1.0 ST arrays, it is possible to link GenBank accession numbers and Entrez Gene identifiers (EG) to the gene-level probe sets 
present in Normalized Affy Data using the function \textit{Attaching ACC to Probe set IDs}, located in the Biological Interpretation menu.
This function also allows the user to link EGs to the gene-level probe sets of a tab delimited file that has the probe set ids in its first column.

\section{RNA target}
The first item in the menu, fig. \ref{fig:fignew6}, is inherited from affylmGUI and allows the visualization of the experimental structure
described by the target file used to load the expression data.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew6}
    \caption{\label{fig:fignew6} RNA target menu.}
  \end{center}
\end{figure} 

The second item, fig. \ref{fig:fignew6}, \textit{maSigPro create/view edesign} reorganizes the target file to extract all the information needed
to analyse a time course experiment using maSigPro.
For time course experiments a specific target file is needed, fig. \ref{fig:fig4}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fig4}
    \caption{\label{fig:fig4} Target file for time course analysis.}
  \end{center}
\end{figure}

 Each row of the column named Target, in the target file, describes the array on the basis
 of the experimental design.
 Each element needed for the construction of the time course design is separated from the others by an underscore. 
 The first three elements of the row are fixed and represent \texttt{Time Replicate Control}, all separated by an underscore:
\begin{Schunk}
\begin{Sinput}
Time_Replicate_Control
\end{Sinput}
\end{Schunk}
All the other elements refer to various experimental conditions.

Considering two different conditions to be evaluated each row is made of 5 elements:
\begin{Schunk}
\begin{Sinput}
Time_Replicate_Control_cond1_cond2 all separated by an underscore.
\end{Sinput}
\end{Schunk}

 For an experiment made of 9 arrays, with two time points, 0h and 24h, in triplicate, and two different
 experimental conditions to be evaluated, the target file will look like:
\begin{Schunk}
\begin{Sinput}
Name	FileName	Target
mC1	M1.CEL	0_1_1_0_0 
mC2	M4.CEL	0_1_1_0_0
mC3	M7.CEL	0_1_1_0_0
mE1	M3.CEL	24_2_0_1_0
mE2	M6.CEL	24_2_0_1_0
mE3	M9.CEL	24_2_0_1_0
mI1	M2.CEL	24_3_0_0_1
mI2	M5.CEL	24_3_0_0_1
mI3	M8.CEL	24_3_0_0_1
\end{Sinput}
\end{Schunk}
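
The underscore separated Target strings shown above can be turned into a design table with a few lines of base R. 
This is only a sketch of the idea; the exact object built by \textit{maSigPro create/view edesign} is not shown in this vignette,
and the column names \texttt{cond1} and \texttt{cond2} are placeholders taken from the pattern given above.
\begin{Schunk}
\begin{Sinput}
# Name and Target columns of the example target file above.
arrayNames <- c("mC1", "mC2", "mC3", "mE1", "mE2", "mE3",
                "mI1", "mI2", "mI3")
target <- c("0_1_1_0_0", "0_1_1_0_0", "0_1_1_0_0",
            "24_2_0_1_0", "24_2_0_1_0", "24_2_0_1_0",
            "24_3_0_0_1", "24_3_0_0_1", "24_3_0_0_1")

# Split every string on the underscore; one row per array.
fields <- do.call(rbind, strsplit(trimws(target), "_", fixed = TRUE))

# Convert to a numeric matrix with one named column per element.
edesign <- matrix(as.numeric(fields), nrow = nrow(fields),
                  dimnames = list(arrayNames,
                                  c("Time", "Replicate", "Control",
                                    "cond1", "cond2")))
print(edesign)
\end{Sinput}
\end{Schunk}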

The third item, fig. \ref{fig:fignew6}, instead reorganizes a target file containing information on
clinical parameters to be used for classification purposes.
In this case each clinical parameter is separated from the others by an underscore, as in the time
course case. 
The absence of a parameter \texttt{MUST} be indicated in the Target file by NA.
For an experiment made of 9 arrays with 5 different experimental/clinical parameters, 
the target file will look like:
\begin{Schunk}
\begin{Sinput}
Name	FileName	Target
mC1	M1.CEL	0_1_pos_0_NA 
mC2	M4.CEL	0_1_pos_0_yes
mC3	M7.CEL	0_1_neg_0_no
mE1	M3.CEL	24_2_neg_1_NA
mE2	M6.CEL	24_2_NA_1_yes
mE3	M9.CEL	24_2_neg_1_yes
mI1	M2.CEL	12_3_0_pos_yes
mI2	M5.CEL	12_3_0_pos_no
mI3	M8.CEL	12_3_0_pos_no
\end{Sinput}
\end{Schunk}

Once the target file is reorganized by the \textit{create/view classification parameters} function, 
the user will be asked to select an external file containing the description of the experimental/clinical parameters.
In this file, the description of each parameter is separated from the others by a carriage return.  
\begin{Schunk}
\begin{Sinput}
Drug treatment time
Tumor grade 
IHC ER
Metastasis within 5 years
Positive lymphonode
\end{Sinput}
\end{Schunk}
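
How the parameter descriptions relate to the underscore separated Target strings can be sketched in base R as follows. 
This is an illustration only, not the code of the \textit{create/view classification parameters} function; it uses two of the 
Target strings of the example above and the five descriptions as column names.
\begin{Schunk}
\begin{Sinput}
# Two Target strings taken from the example target file above.
target <- c("0_1_pos_0_NA", "24_2_NA_1_yes")

# Parameter descriptions, one per line in the external file.
descriptions <- c("Drug treatment time", "Tumor grade", "IHC ER",
                  "Metastasis within 5 years", "Positive lymphonode")

# One row per array, one column per parameter.
fields <- do.call(rbind, strsplit(target, "_", fixed = TRUE))
params <- as.data.frame(fields, stringsAsFactors = FALSE)

# The text NA marks a missing parameter.
params[params == "NA"] <- NA
colnames(params) <- descriptions
print(params)
\end{Sinput}
\end{Schunk}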

This information will be used to select a specific clinical parameter for classification analysis.   

\section{QC}
This menu is specialized depending on the type of microarray data set loaded.

\subsection{QC for IVT arrays loaded starting from .CEL files}
This menu, fig. \ref{fig:fignew7}, inherits all affylmGUI probe/probe set level quality controls; refer to affylmGUI for their usage.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew7}
    \caption{\label{fig:fignew7} QC for IVT arrays.}
  \end{center}
\end{figure}


Furthermore, after the probe set summary is calculated, sample similarities can be visualized using the
\textit{Sample QC: PCA/HCL} function, which produces a 2D PCA plot and a hierarchical clustering of the samples, fig. \ref{fig:fignew8}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew8}
    \caption{\label{fig:fignew8} Sample QC: PCA/HCL output for IVT arrays.}
  \end{center}
\end{figure}
If exon data are loaded, the results of the function \textit{Gene/Exon PCA/HCL} can be visualized at both gene and exon level.
Furthermore, the function \textit{Gene/Exon Intensity Histogram} will show the density plot of the normalized intensities
at both gene and exon level.
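
The kind of output produced by \textit{Sample QC: PCA/HCL} can be approximated at the R prompt with base R functions. 
The sketch below is not the oneChannelGUI implementation: the expression matrix is simulated, and the choice of 
Euclidean distance and average linkage is an assumption made for the example.
\begin{Schunk}
\begin{Sinput}
set.seed(1)
# Simulated log2 intensities: 200 probe sets, 6 arrays.
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("ps", 1:200),
                               c("c1", "c2", "c3", "t1", "t2", "t3")))
# Shift the first 20 probe sets in the treated arrays.
expr[1:20, 4:6] <- expr[1:20, 4:6] + 3

# Samples must be in rows for both PCA and clustering.
pca <- prcomp(t(expr))
hcl <- hclust(dist(t(expr)), method = "average")

op <- par(mfrow = c(1, 2))
plot(pca$x[, 1:2], type = "n", main = "PCA")
text(pca$x[, 1:2], labels = colnames(expr))
plot(hcl, main = "HCL")
par(op)
\end{Sinput}
\end{Schunk}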
  
\subsection{QC for GEO/flat tab delimited files}
Once probe set expression data derived from a GEO Matrix Series file or from a tab delimited expression file are loaded, the
\textit{Sample QC: PCA/HCL} function is available as QC.
The function \textit{Box plot of normalized data} is also available; it shows the intensity distribution of each array as a box plot, fig. \ref{fig:fignew72}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew72}
    \caption{\label{fig:fignew72} Box plot of normalized data.}
  \end{center}
\end{figure}
 

\subsection{QC for exon arrays}
In the case of exon arrays the QC menu is slightly different, as shown in fig. \ref{fig:fignew9}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew9}
    \caption{\label{fig:fignew9} QC menu for exon arrays.}
  \end{center}
\end{figure}

Three functions are available: 
\begin{description}
   \item[Sample QC: PCA/HCL] This function will produce a PCA/HCL for both gene and exon level data.
   \item[Gene/Exon intensity histogram] This function will produce a density histogram for gene or exon expression levels.
   \item[Controls raw intensity histogram] This function will produce a box plot of the exon (positive control) and intron
   (negative control) probe sets of housekeeping genes. Probe level data are directly extracted from CEL files using APT tools.
\end{description}

It is useful, as a quality control, to check intensities before normalization. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew75}
    \caption{\label{fig:fignew75} A set of Illumina arrays before and after data normalization.}
  \end{center}
\end{figure}

As can be seen in fig. \ref{fig:fignew75},
normalization masks the fact that something went wrong in the hybridization of a subset of arrays, i.e. those with a very narrow box plot in 
fig. \ref{fig:fignew75}A.
This problem is completely hidden in the normalized data, fig. \ref{fig:fignew75}B. 
For this reason \textit{Controls raw intensity histogram} was written for exon array data, since probe set data 
are uploaded in oneChannelGUI already normalized, via APT tools. This function produces a box plot of the exon (positive control) 
and intron (negative control) probe sets of housekeeping genes. 
This box plot gives an idea of the signals in both the high and the low intensity range.
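
Why normalization can hide a problematic array is easy to reproduce with simulated data. 
In the base R sketch below, quantile normalization is used only as an example of a normalization method (it is not necessarily the 
one applied to the arrays of fig. \ref{fig:fignew75}), \texttt{quantileNormalize} is an illustrative helper, and the intensities are simulated.
\begin{Schunk}
\begin{Sinput}
set.seed(3)
# Simulated raw log2 intensities: array a3 has a very narrow distribution.
raw <- cbind(a1 = rnorm(500, mean = 8, sd = 2),
             a2 = rnorm(500, mean = 8, sd = 2),
             a3 = rnorm(500, mean = 8, sd = 0.2))

# Plain quantile normalization: every array gets the same set of values.
quantileNormalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

normalized <- quantileNormalize(raw)

# Interquartile range of each array before and after normalization.
iqrRaw <- apply(raw, 2, IQR)
iqrNormalized <- apply(normalized, 2, IQR)
print(round(rbind(raw = iqrRaw, normalized = iqrNormalized), 2))

op <- par(mfrow = c(1, 2))
boxplot(raw, main = "Raw")
boxplot(normalized, main = "Normalized")
par(op)
\end{Sinput}
\end{Schunk}
After normalization the three arrays have the same interquartile range, so the narrow array can only be spotted in the raw data.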








\section{Study design}
This menu allows the user to investigate the statistical quality of a microarray study, fig. \ref{fig:fignew10}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew10}
    \caption{\label{fig:fignew10} Study design menu.}
  \end{center}
\end{figure}

This menu gives access to two functions, which are graphical implementations of the ssize and sizepower 
Bioconductor libraries. These functions allow the user to determine how many samples are needed to achieve a 
specified power for a test of whether a gene is differentially expressed or, conversely, to determine 
the power achieved with a given sample size.
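
The two directions of the calculation can be illustrated for a single gene with the base R function \texttt{power.t.test}. 
This is a simplified stand-in based on a two-sample t-test, not the method implemented in ssize or sizepower, and the 
effect size, standard deviation and significance level used below are hypothetical values.
\begin{Schunk}
\begin{Sinput}
# Hypothetical per-gene settings on the log2 scale:
# a 2-fold change (delta = 1), standard deviation 0.5 and a
# per-gene significance level of 0.001.

# Samples per group needed to reach a power of 0.8.
nPerGroup <- power.t.test(delta = 1, sd = 0.5,
                          sig.level = 0.001, power = 0.8)$n
print(ceiling(nPerGroup))

# Power reached with 5 samples per group.
powerAtFive <- power.t.test(n = 5, delta = 1, sd = 0.5,
                            sig.level = 0.001)$power
print(powerAtFive)
\end{Sinput}
\end{Schunk}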

\section{Probe set summary}
This menu inherits the affylmGUI probe set summary methods for IVT arrays. Furthermore, the \textit{expresso} function, 
which allows the integration of different methods for background correction, 
normalization, probe specific correction, and summary value computation, is added.
This menu is also available for GEO and tab delimited expression data files. If a data set without normalization is loaded, 
it allows the user to apply the following normalization procedures:
\begin{enumerate}
  \item Cyclic LOESS. 
  \item QUANTILE.
  \item QSPLINE.
\end{enumerate}

In the case of exon arrays this menu is not available, since expression data for exon arrays 
are either calculated by APT tools through the oneChannelGUI interface or loaded as tab delimited files 
exported by Affymetrix Expression Console. 

\section{Filtering}
A central problem in microarray data analysis is the high dimensionality of gene expression space, 
which prohibits a comprehensive statistical analysis without focusing on particular aspects of the 
joint distribution of the gene expression levels. Possible strategies are to perform data-driven nonspecific 
filtering of genes (von Heydebreck, 2004) before the actual statistical analysis or to filter, making use of 
biologically relevant a priori knowledge.
This menu allows the user to apply a variety of filtering procedures, fig. \ref{fig:fignew11}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew11}
    \caption{\label{fig:fignew11} Filtering menu for GEO/Affy IVT arrays.}
  \end{center}
\end{figure}

\subsection{Filtering by IQR}
The IQR filter will select only those probe sets characterized by a relatively wide signal distribution. 
The way the IQR filter works is shown in fig. \ref{fig:fignew12}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew12}
    \caption{\label{fig:fignew12} IQR filtering: The distributions of the various probe sets belonging to a data set are shown in red, if they are
    wide and therefore retained by the filter, and in blue, if they are narrow and therefore discarded by the filter.}
  \end{center}
\end{figure}

In oneChannelGUI it is possible to select three filtering values:
\begin{enumerate}
  \item IQR 0.1, weak filter, i.e. only the extremely unchanging probe sets are removed. 
  \item IQR 0.25, intermediate filter.
  \item IQR 0.5, strong filter, i.e. the majority of the unchanged probe sets are removed.
\end{enumerate}
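
The principle of the IQR filter can be sketched in base R as follows. The sketch assumes that the threshold is compared with the 
interquartile range of the log2 intensities of each probe set and that probe sets at or above the threshold are retained; 
\texttt{iqrFilter} is an illustrative helper and the data are simulated.
\begin{Schunk}
\begin{Sinput}
set.seed(2)
# Simulated log2 intensities: 100 nearly constant probe sets, 6 arrays.
expr <- matrix(rnorm(100 * 6, mean = 8, sd = 0.05), nrow = 100,
               dimnames = list(paste0("ps", 1:100),
                               c("c1", "c2", "c3", "t1", "t2", "t3")))
# The first 10 probe sets change in the treated arrays.
expr[1:10, 4:6] <- expr[1:10, 4:6] + 2

# Keep the probe sets whose IQR across arrays reaches the threshold.
iqrFilter <- function(x, threshold = 0.25) {
  x[apply(x, 1, IQR) >= threshold, , drop = FALSE]
}

kept <- iqrFilter(expr, threshold = 0.25)
print(rownames(kept))
\end{Sinput}
\end{Schunk}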

More information about the efficacy of the filtering procedure can be found at:
\url{http://www.bioinformatica.unito.it/oneChannelGUI/diaset.1.usa.ppt}

This filtering procedure can be applied to any kind of loaded arrays. However, it does not seem to
be very effective when applied to gene level expression data calculated with iterPlier.

\subsection{Filtering by intensity}
For IVT/GEO/tab delimited expression data files it is also possible to apply a filtering 
procedure based on intensity signals; the corresponding graphical interface is shown in fig. \ref{fig:fignew13}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew13}
    \caption{\label{fig:fignew13} Intensity filtering: This filter will retain a probe set only if a certain fraction
    of the samples are characterized by an intensity value over a certain user defined threshold.}
  \end{center}
\end{figure}
This filtering approach is quite useful to remove probe sets having very low intensity values. 
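
The rule described in the caption of fig. \ref{fig:fignew13} can be written in a few lines of base R. 
In this sketch \texttt{intensityFilter} is an illustrative helper, the small matrix is made up, and \textit{over the threshold} 
is read as strictly greater than the threshold in at least the given fraction of samples.
\begin{Schunk}
\begin{Sinput}
# Keep a probe set if at least the given fraction of samples
# has an intensity above the threshold.
intensityFilter <- function(x, threshold, fraction) {
  x[rowMeans(x > threshold) >= fraction, , drop = FALSE]
}

# Made-up log2 intensities: 3 probe sets, 4 samples.
expr <- matrix(c(5, 5, 9, 9,
                 9, 9, 9, 9,
                 4, 4, 4, 9),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3"),
                               paste0("s", 1:4)))

kept <- intensityFilter(expr, threshold = 7, fraction = 0.5)
print(rownames(kept))
\end{Sinput}
\end{Schunk}
Here ps1 and ps2 are retained, while ps3 is discarded because only one sample out of four is above the threshold.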

\subsection{Filtering by list of probe sets/EG ids}
It is also possible to filter expression data using a text file containing a list of probe set ids separated by
carriage return. If the data set is associated to a Bioconductor annotation library the filtering procedure can be 
also done using a text file containing a list of Entrez gene identifiers separated by carriage return.  

\subsection{Recovering unfiltered data}
It is possible to recover the data before the last filtering using the \textit{Recovering unfiltered data} function.

\subsection{Filtering menu: exon data}
If exon data are loaded the filtering menu appears slightly different, fig. \ref{fig:fignew50}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew50}
    \caption{\label{fig:fignew50} Filtering menu for exon data.}
  \end{center}
\end{figure}

In particular, the function \textit{Set background threshold}  
collects the exon/intron expression values for the set of housekeeping genes present on the
chip as within-chip quality controls, and offers the opportunity to set a background intensity threshold on the basis
of the desired level of overlap between the expression of exons and that of introns. 
RMA intensity calculation is preferred, fig. \ref{fig:fignew52}, since, 
if probe set summaries are calculated with Plier or iterPlier, the difference between the expression distributions of 
exons and introns is not wide enough, fig. \ref{fig:fignew51}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew52}
    \caption{\label{fig:fignew52} Human exon arrays, probe set summaries were calculated with RMA, 
                                  exon/intron distribution of HK present in the chip as quality controls.}
  \end{center}
\end{figure}   

Having set a background threshold using the exon/intron distributions of the HK genes, it is possible to 
apply to the full data set an intensity filter that removes genes and the corresponding exons 
on the basis of the selected threshold.
The intensity filter for exon arrays works exactly like that for IVT arrays, but uses a fixed threshold
defined as described above. 


\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew51}
    \caption{\label{fig:fignew51} Human exon arrays, probe set summaries were calculated with iterPlier (gene level) 
                                  and Plier (exon level), exon/intron distribution of HK present in the chip as quality controls.}
  \end{center}
\end{figure} 

Another filter that allows the removal of low intensity probe sets is based on the DABG p-values.
Using the function \textit{Filtering on DABG p-values} it is possible to select the desired level of filtering using a
mask, fig. \ref{fig:fignew53}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew53}
    \caption{\label{fig:fignew53} DABG p-value based filtering selection mask.}
  \end{center}
\end{figure} 

A threshold of 50\% means that only probe sets passing the selected DABG p-value threshold in at least half of the samples
will be kept.
As can be seen in fig. \ref{fig:fignew54}, this filtering also removes low intensity signals very near to zero.
\begin{Schunk}
\begin{Sinput}
N.B. Recovering the data prior to filtering is not 
yet implemented for DABG p-value filtering.
\end{Sinput}
\end{Schunk}
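
The logic of this filter can be illustrated outside the GUI with a few lines of base R. 
The following is only a minimal sketch and not the oneChannelGUI implementation: the matrix \texttt{dabg}, 
the names \texttt{p.cutoff} and \texttt{min.fraction} and the toy values are hypothetical, and it assumes that a probe set 
passes on an array when its DABG p-value is at or below the cut-off.
\begin{Schunk}
\begin{Sinput}
# Toy DABG p-values: 4 probe sets (rows) x 4 arrays (columns).
dabg <- matrix(c(0.01, 0.20, 0.03, 0.60,
                 0.01, 0.02, 0.04, 0.01,
                 0.50, 0.70, 0.90, 0.30,
                 0.04, 0.06, 0.01, 0.20),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ps", 1:4),
                               paste0("array", 1:4)))
p.cutoff     <- 0.05   # DABG p-value threshold
min.fraction <- 0.5    # fraction of arrays that must pass

# Fraction of arrays on which each probe set passes the cut-off
detected.fraction <- rowMeans(dabg <= p.cutoff)

# Keep the probe sets passing on at least half of the arrays
keep <- detected.fraction >= min.fraction
dabg[keep, , drop = FALSE]
\end{Sinput}
\end{Schunk}
With these toy values, ps3 never passes the cut-off and is the only probe set removed.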

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew54}
    \caption{\label{fig:fignew54} DABG p-value filtering results with parameters: DABG p-value threshold 0.05 and 
                                  50\% of arrays over the threshold.}
  \end{center}
\end{figure}   

Regarding the very low intensity probe sets present when iterPlier/Plier are used, the function 
\textit{Setting to 0 log2 intensity below 1, to be used with plier only} will set them to zero.

A filter to eliminate cross hybridizing probe sets is based on Affymetrix XHYB and CROSSHYB annotations, which are
part of the data embedded in oneChannelGUI.
The XHYB field is mainly an indicator of a weak assignment between a transcript cluster and the assigned mRNA, 
suggesting potential cross-hybridization.
CROSSHYB is a measure of the promiscuity of the probes within a probe set among transcribed sequences, and takes one of three values:
\begin{enumerate}
\item unique. All probes in the probe set perfectly match only one sequence in the putatively
 transcribed array design content. The vast majority of probe sets are unique.  
\item similar. All the probes in the probe set perfectly match more than one sequence in 
the putatively transcribed array design content.
\item mixed. The probes in the probe set either perfectly match or partially match more than one sequence 
in the putatively transcribed array design content. 
\end{enumerate}
XHYB and CROSSHYB are used to remove probe sets characterized by multiple hybridization of exon probes.
The cross-hybridization potential of a probe set is determined at the time of array design. 
The function \textit{Filtering out cross hybridizing probe sets} removes all gene level probe sets, 
and the corresponding exon data, associated with exon level probe sets mapped as XHYB or CROSSHYB.
The filtering option mask is shown in fig. \ref{fig:fignew61}.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew61}
    \caption{\label{fig:fignew61} Cross hybridization filtering options mask.}
  \end{center}
\end{figure}
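
The gene-level consequence of this filter can be sketched in base R as follows. 
This is an illustration only: the data frame \texttt{exon.annot} and its column names are hypothetical stand-ins for the 
annotation embedded in oneChannelGUI, and the sketch assumes that any CROSSHYB value other than unique flags a probe set.
\begin{Schunk}
\begin{Sinput}
# Hypothetical exon-level annotation: each exon-level probe set
# belongs to one gene-level probe set and has a CROSSHYB class.
exon.annot <- data.frame(
  exon.id  = c("e1", "e2", "e3", "e4", "e5"),
  gene.id  = c("g1", "g1", "g2", "g3", "g3"),
  crosshyb = c("unique", "unique", "similar", "unique", "mixed"),
  stringsAsFactors = FALSE)

# Gene-level probe sets with at least one flagged exon probe set
flagged.genes <- unique(
  exon.annot$gene.id[exon.annot$crosshyb != "unique"])

# Remove those genes together with all their exon-level data
kept <- exon.annot[!exon.annot$gene.id %in% flagged.genes, ]
kept
\end{Sinput}
\end{Schunk}
Here g2 and g3 are dropped, including the exon probe set e4, which is itself unique but belongs to a flagged gene.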

The function \textit{Selecting only probe sets with multiple mRNA association in ensembl} is very useful
when alternative splicing events are investigated and the researcher is interested only in those
probe sets associated with multiple transcripts annotated in the ensembl database. We strongly suggest applying this 
filter, at least to get an overview of the possible known alternative splicing events that could be collected
within the annotated ensembl data. This filter reduces both the computational time needed to calculate the splice index and 
the type I statistical error at the level of the statistical analysis for alternative splicing detection.

Specifically, this function selects at gene-level only those probe sets which are associated with multiple entries 
in the ensembl database. The filter uses the biomaRt package to collect this information from the ensembl database.

The function oneChannelGUI: \textit{Exporting Gene-level probe set ids} is useful to extract the list of probe set ids
associated to the gene-level data set loaded on oneChannelGUI.


\section{Modelling statistics}
This menu allows the user to perform limma differential expression analysis as well as time course analysis using the 
maSigPro package, fig. \ref{fig:fignew14}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew14}
    \caption{\label{fig:fignew14} Modelling statistics}
  \end{center}
\end{figure}

\subsection{limma}
The affylmGUI interface to limma is fully inherited, see limma and affylmGUI vignettes for usage. 
The function \textit{raw p-value distribution} is implemented to evaluate whether the BH/BY type I error correction methods
can be used.
To apply the BH correction two conditions should be satisfied:
\begin{enumerate}
  \item The gene expressions are independent from each other. 
  \item The raw distribution of p-values should be uniform in the non significant range.
\end{enumerate}
If the BY correction is used instead, only the second condition is required, fig. \ref{fig:fignew15}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew15}
    \caption{\label{fig:fignew15} Output of \textit{raw p-value distribution}: 
               The raw distribution of p-values is uniform in the non significant range.}
  \end{center}
\end{figure}
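
Both corrections are available in base R through \texttt{p.adjust}, so the check described above can be tried on simulated data. 
The sketch below is only an illustration under stated assumptions: the p-values are simulated (900 uniform null values plus 
100 values concentrated near zero), and it does not reproduce the oneChannelGUI code.
\begin{Schunk}
\begin{Sinput}
set.seed(1)
# Simulated raw p-values: 900 nulls (uniform) and 100 signals
p.raw <- c(runif(900), rbeta(100, 0.5, 20))

# Counts per 0.05-wide bin; a roughly flat profile away from
# zero is what the raw p-value histogram is meant to show
h <- hist(p.raw, breaks = seq(0, 1, by = 0.05), plot = FALSE)
h$counts

# BH and BY adjusted p-values
p.bh <- p.adjust(p.raw, method = "BH")
p.by <- p.adjust(p.raw, method = "BY")

# Number of p-values below 0.05 after each correction
c(BH = sum(p.bh < 0.05), BY = sum(p.by < 0.05))
\end{Sinput}
\end{Schunk}
BY adjusted p-values are never smaller than the BH ones, so BY selects at most as many probe sets as BH at the same threshold.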


The affylmGUI function \textit{Table of genes ranked in order of differential expression} is a modified version of the original found in affylmGUI
 to allow users to check with MA/Volcano plots the set of differentially expressed probe sets before saving the table, fig. \ref{fig:fignew16}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew16}
    \caption{\label{fig:fignew16} MA and Volcano plots for differentially expressed probe sets, red dots, detected by limma method.}
  \end{center}
\end{figure}

\subsection{Venn diagrams between probe set list}
This function is modified with respect to the original one present in affylmGUI to allow Venn diagrams using lists of
probe sets, saved in text files where each id is separated from the others by a carriage return, derived from any of the 
available statistical methods implemented in oneChannelGUI.
Furthermore, if a Bioconductor annotation library is linked to the loaded data set, Venn diagrams can be 
generated using the Entrez Gene ids associated with the probe sets, removing probe set redundancy.
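
The counts behind a two-list Venn diagram are plain set operations, as the following base R sketch shows. 
The two id vectors are toy examples with hypothetical names; in practice each list would be read from its text file, 
e.g. with \texttt{readLines}.
\begin{Schunk}
\begin{Sinput}
# Two toy probe set lists, one id per element
limma.ids <- c("1452968_at", "1448228_at",
               "1418028_at", "1439113_at")
sam.ids   <- c("1448228_at", "1418028_at", "1424338_at")

# Sizes of the three regions of a two-set Venn diagram
venn.counts <- c(
  limma.only = length(setdiff(limma.ids, sam.ids)),
  both       = length(intersect(limma.ids, sam.ids)),
  sam.only   = length(setdiff(sam.ids, limma.ids)))
venn.counts
\end{Sinput}
\end{Schunk}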


\subsection{Time course analysis}
Time course analysis can be performed in oneChannelGUI using the maSigPro package, fig. \ref{fig:fignew14}.

maSigPro is an R package for the analysis of single and multiseries time course microarray
experiments. maSigPro follows a two-step regression strategy to find genes
with significant temporal expression changes and significant differences between experimental
groups.

The first step in running a maSigPro analysis is to reorganize the target file using 
the function \textit{create an edesign for maSigPro}; see also the target file paragraph for time course experiment requirements.
Using the function \textit{Execute maSigPro}, the user selects the parameters needed by maSigPro, fig. \ref{fig:fignew17}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew17}
    \caption{\label{fig:fignew17} maSigPro parameters setup.}
  \end{center}
\end{figure}

\subsubsection{Level of FDR control: Q parameter}
The first step is to compute a regression fit for each gene. The p-values associated with the F-statistic of each model 
are computed and subsequently used to select significant genes. 
maSigPro corrects these p-values for multiple comparisons by applying false discovery rate (FDR) procedures. 
The level of FDR control is given by the function parameter Q, fig. \ref{fig:fignew17}.
\subsubsection{P-value cut off: alfa}
maSigPro applies, as a second step, a variable selection procedure to find the significant variables for each gene. 
This will ultimately be used to find the profile differences between experimental groups. 
At each regression step the p-value of each variable is computed, and variables enter or leave the model 
when this p-value is lower or higher than the given cut-off value alfa, fig. \ref{fig:fignew17}.
\subsubsection{R-squared threshold of the regression model}
The last step in the maSigPro analysis is to generate lists of significant genes.
As a filter, maSigPro uses the R-squared of the regression model, fig. \ref{fig:fignew17}.
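
The role of the Q and R-squared parameters can be illustrated with ordinary least squares in base R. 
The sketch below is not maSigPro and does not call it: the simulated data, the quadratic model and the names 
\texttt{fit.gene}, \texttt{Q} and \texttt{rsq} are assumptions made for illustration, and the stepwise variable selection 
controlled by alfa is omitted.
\begin{Schunk}
\begin{Sinput}
set.seed(42)
tp <- rep(c(0, 1, 2, 3), each = 3)   # 4 time points, 3 replicates
n.genes <- 50
expr <- matrix(rnorm(n.genes * length(tp)), nrow = n.genes)
# Gene 1 is given a clear linear trend over time
expr[1, ] <- 2 * tp + rnorm(length(tp), sd = 0.2)

# Per-gene regression: p-value of the model F-statistic and R-squared
fit.gene <- function(y) {
  s <- summary(lm(y ~ tp + I(tp^2)))
  f <- s$fstatistic
  c(p  = unname(pf(f[1], f[2], f[3], lower.tail = FALSE)),
    r2 = s$r.squared)
}
res <- t(apply(expr, 1, fit.gene))

Q   <- 0.05   # level of FDR control
rsq <- 0.7    # R-squared threshold

# FDR correction of the F-test p-values, then R-squared filter
p.adj    <- p.adjust(res[, "p"], method = "BH")
selected <- which(p.adj < Q & res[, "r2"] > rsq)
selected
\end{Sinput}
\end{Schunk}
Lowering Q or raising the R-squared threshold makes the selection stricter; the gene simulated with a trend is retained under the values used here.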

maSigPro calculation steps can be followed on the main R window.
The end of the maSigPro analysis will be given by a popup message, fig. \ref{fig:fignew18}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew18}
    \caption{\label{fig:fignew18} End of maSigPro calculation.}
  \end{center}
\end{figure}

N.B: The multiple testing problem is also present in maSigPro analysis. Therefore, before running maSigPro, 
remember to apply some filter based on functional information or on the distribution of the samples.

\subsubsection{View maSigPro results}
The coefficients obtained in the second regression model will be useful to cluster together 
significant genes with similar expression patterns and to visualize results.
Various visualization options are available:
\begin{enumerate}
  \item Venn diagrams, fig. \ref{fig:fignew19}. 
  \item Expression profiles saved in a pdf file, figs. \ref{fig:fignew20}, \ref{fig:fignew21}.
  \item Tab delimited files with the probe sets found differentially expressed in each of the experimental conditions.
\end{enumerate}



\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew19}
    \caption{\label{fig:fignew19} maSigPro Venn diagrams output.}
  \end{center}
\end{figure}

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew20}
    \caption{\label{fig:fignew20} Selecting the experimental condition to be used for profile plotting.}
  \end{center}
\end{figure}

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew21}
    \caption{\label{fig:fignew21} An example of profiles plotting.}
  \end{center}
\end{figure}


\section{Permutation statistics}
The permutation statistics menu, fig. \ref{fig:fignew22}, allows the user to run the two class unpaired SAM analysis
implemented in the siggenes package and a two class analysis using the rank product method implemented in
the RankProd package.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew22}
    \caption{\label{fig:fignew22} Permutation statistics menu.}
  \end{center}
\end{figure}

\subsection{SAM analysis}
The module recognizes whether a two class unpaired analysis can be performed. Subsequently, a table with DELTA values and FDRs
will be shown to the user. The user then needs to select a DELTA threshold to continue the analysis, fig. \ref{fig:fignew23}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew23}
    \caption{\label{fig:fignew23} DELTA table and DELTA value selection module.}
  \end{center}
\end{figure}

The siggenes output for differentially expressed genes, given the selected DELTA value, will be shown in the main R window,
together with an absolute log2(FC) selection module, fig. \ref{fig:fignew24}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew24}
    \caption{\label{fig:fignew24} SAM results given at specific user defined DELTA value and the absolute log2(fc) selection mask.}
  \end{center}
\end{figure}

The fold change filter allows the selection, within the SAM significant probe sets, of those with an absolute log2(FC) greater than a user defined threshold.
Subsequently, the differentially expressed genes will be shown, fig. \ref{fig:fignew25}, and the user will decide whether they should be saved.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew25}
    \caption{\label{fig:fignew25} Differentially expressed probe sets to be saved.}
  \end{center}
\end{figure}
  
\subsection{Rank product analysis}
The RankProd module is a graphical interface to the RankProd package functions for the analysis of gene expression microarray data.
The RankProd package allows the identification of differentially expressed genes using the so-called rank
product non-parametric method (Breitling et al., 2004, FEBS Letters 573:83) to identify up-regulated
or down-regulated genes under one condition against another condition, e.g. two different treatments,
two different tissue types, etc.
The user only needs to define the pfp (percentage of false prediction) threshold 
and the number of permutations to be applied, fig. \ref{fig:fignew26}. 
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew26}
    \caption{\label{fig:fignew26} RankProd selection parameters mask.}
  \end{center}
\end{figure}
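
The statistic itself is simple: within each replicate comparison the genes are ranked by fold change, and the rank product 
of a gene is the geometric mean of its ranks. The base R sketch below computes it for up-regulation on toy data; 
it is only an illustration with hypothetical values, it does not call the RankProd package, and it omits the permutation step 
from which the pfp is estimated.
\begin{Schunk}
\begin{Sinput}
# Toy log2 fold changes: 5 genes x 3 replicate comparisons
lfc <- matrix(c( 2.0,  1.8,  2.2,
                 0.1, -0.2,  0.0,
                -1.5, -1.7, -1.2,
                 0.5,  0.4,  0.6,
                -0.3,  0.2, -0.1),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("g", 1:5),
                              paste0("rep", 1:3)))

# Rank 1 = most up-regulated gene within each replicate
ranks <- apply(-lfc, 2, rank)

# Rank product: geometric mean of the ranks of each gene
rp.up <- apply(ranks, 1, function(r) prod(r)^(1 / length(r)))
sort(rp.up)
\end{Sinput}
\end{Schunk}
Gene g1 is ranked first in every replicate, so its rank product is 1, the smallest possible value.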

At the end of the analysis the user can decide whether to save the differentially expressed probe sets in a tab
delimited file. If a Bioconductor annotation library is available, Entrez Gene identifiers and Symbols will be added to the saved output.  

\subsubsection{Target structure}
In a rank product analysis of data sets from different origins, the Target column of the target file
can also contain an integer describing the data origin.
\begin{Schunk}
\begin{Sinput}
Name	FileName	Target
mC1	M1.CEL	0_1 
mC2	M4.CEL	0_1
mC3	M7.CEL	0_1
mE1	M3.CEL	0_2
mE2	M6.CEL	0_2
mE3	M9.CEL	1_1
mI1	M2.CEL	1_1
mI2	M5.CEL	1_2
mI3	M8.CEL	1_2
\end{Sinput}
\end{Schunk}
The oneChannelGUI module will select the RankProd method on the basis of the Target structure. 
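
A Target column of this kind can be split into its two components with base R, as sketched below. 
The sketch assumes that the integer before the underscore is the class label and the one after it is the data origin; 
this reading of the format, and the variable names, are assumptions and are not taken from the oneChannelGUI code.
\begin{Schunk}
\begin{Sinput}
# Target values as in the example above
target <- c("0_1", "0_1", "0_1", "0_2", "0_2",
            "1_1", "1_1", "1_2", "1_2")

# Split each value at the underscore
parts  <- do.call(rbind, strsplit(target, "_", fixed = TRUE))
cl     <- as.integer(parts[, 1])   # assumed: class label
origin <- as.integer(parts[, 2])   # assumed: data origin

# Number of arrays per class and origin
tab <- table(class = cl, origin = origin)
tab
\end{Sinput}
\end{Schunk}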

\section{Classification}
This module, fig. \ref{fig:fignew27}, provides a link to the pamr and pdmclass packages 
designed to carry out sample classification from gene expression data, respectively by 
the method of nearest shrunken centroids (Tibshirani, et al., 2002) and by penalized discriminant methods.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew27}
    \caption{\label{fig:fignew27} Classification menu.}
  \end{center}
\end{figure}

\subsubsection{Create/view/reset classification parameters}
The \textit{Create/view classification parameters} function reorganizes the Target columns separating the experimental/clinical
parameters. The \textit{Reset classification parameters} function deletes the Targets reorganization and the association to the file 
containing the names of the parameters present in the Target column of the target file.

\subsubsection{Create a training/test set}
The first step of this module is the definition of the covariate to be used for the classification analysis.
The user will be requested to select one of the clinical parameters, i.e. phenoData covariate names, 
from a table listing their names, by indicating its row number, fig.  \ref{fig:fignew29}.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew29}
    \caption{\label{fig:fignew29} Selecting the classification parameter.}
  \end{center}
\end{figure}
Subsequently, the user can decide to divide the data set into a training (2/3) and a test (1/3) set, or to 
use the full data set as training set.
All arrays which are not linked to any of the clinical/experimental parameters, i.e. those marked as NA,
will be discarded from the following analyses.  
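
A split of this kind can be sketched in base R as follows. The covariate values are hypothetical, 
and the random assignment shown here is only one plausible way to obtain a 2/3 versus 1/3 partition; it is not necessarily 
the procedure used by oneChannelGUI.
\begin{Schunk}
\begin{Sinput}
set.seed(7)
# Hypothetical covariate for 12 arrays; NA = not annotated
covariate <- c("pos", "neg", "pos", NA, "neg", "pos",
               "pos", "neg", NA, "pos", "neg", "pos")

# Arrays marked as NA are discarded
usable <- which(!is.na(covariate))

# Random 2/3 of the usable arrays for training, the rest for testing
n.train <- round(length(usable) * 2 / 3)
train   <- sort(sample(usable, n.train))
test    <- setdiff(usable, train)

list(train = train, test = test)
\end{Sinput}
\end{Schunk}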

The \textit{Create a training/test set} function then allows the access to PAMR/PDMCLASS classification tools and to a PCA 
visualization module, fig.  \ref{fig:fignew28}.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew28}
    \caption{\label{fig:fignew28} Classification methods selection mask.}
  \end{center}
\end{figure}

It is also possible to evaluate whether the classification performance is associated with a specific subset of probe sets or is a general
characteristic of the data set. Ideally, we expect that only a small subset of probe sets should be able to discriminate between groups. If 
the full data set is able to discriminate independently of the subset of probe sets considered for the classification, this might indicate some
strong bias that is not necessarily associated with the biological event under investigation, e.g. it could be due to some experimental bias.
This functionality is provided if the \textit{Probability of classification given a random set of data} 
option is selected, see fig.  \ref{fig:fignew28}.


\subsubsection{PAMR}
If the PAMR method is selected, 2-3 steps are performed, and pop-up info messages allow
the user to check the resulting plots.
Initially, the cross-validated misclassification error curves are calculated
and shown in the main R window, fig. \ref{fig:fignew30}.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew30}
    \caption{\label{fig:fignew30} Cross-validated misclassification error curves}
  \end{center}
\end{figure}
Then the user defines a shrinkage threshold and, if the number of selected probe sets is below 50, the
centroids will be plotted, fig. \ref{fig:fignew31}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew31}
    \caption{\label{fig:fignew31} Shrunken class centroids.}
  \end{center}
\end{figure}

Subsequently, the classification performance of the selected subgroup of probe sets will be
shown as a plot and as text in the R window, fig. \ref{fig:fignew32}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew32}
    \caption{\label{fig:fignew32} Cross-validated sample probabilities.}
  \end{center}
\end{figure}
Results are also available as numerical values in the R window:

\begin{Schunk}
\begin{Sinput}
    neg pos Class Error rate
neg  23   5        0.1785714
pos   2  48        0.0400000
\end{Sinput}
\end{Schunk}
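
The class error rates in this table follow directly from the counts, with rows taken as true classes and columns as predicted classes. 
The short base R sketch below recomputes them; the object names are hypothetical and the overall error rate is an 
additional quantity not printed in the output above.
\begin{Schunk}
\begin{Sinput}
# Confusion matrix from the output above
confusion <- matrix(c(23,  5,
                       2, 48),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(true      = c("neg", "pos"),
                                    predicted = c("neg", "pos")))

# Per-class error: misclassified samples / samples in the class
class.error <- 1 - diag(confusion) / rowSums(confusion)
class.error

# Overall error: all misclassified samples / all samples
overall.error <- 1 - sum(diag(confusion)) / sum(confusion)
overall.error
\end{Sinput}
\end{Schunk}
For the neg class this gives 5/28 and for the pos class 2/50, matching the values 0.1785714 and 0.0400000 reported above.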

If the results are satisfactory, the user can save the probe sets defined by this analysis, fig. \ref{fig:fignew33}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew33}
    \caption{\label{fig:fignew33} Probe sets to be used as classifier.}
  \end{center}
\end{figure}
Furthermore, if the test set was created, it will be possible to check the ability of the selected subset of genes to
separate the classes under analysis using hierarchical clustering, fig. \ref{fig:fignew34}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew34}
    \caption{\label{fig:fignew34} Testing the efficacy of the classifier on the test set by HCL.}
  \end{center}
\end{figure}

\subsubsection{PCA}
The PCA visualization method offers the possibility to see how the data set groups on the basis of the 
clinical/experimental parameter under analysis, fig. \ref{fig:fignew35}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew35}
    \caption{\label{fig:fignew35} 1st and 2nd principal components space.}
  \end{center}
\end{figure}
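
A comparable view can be obtained in base R with \texttt{prcomp}. The sketch below uses simulated data in which 20 of 100 
probe sets are shifted in one group; the data, group labels and colours are assumptions for illustration only and 
the code is not the oneChannelGUI implementation.
\begin{Schunk}
\begin{Sinput}
set.seed(3)
group <- rep(c("neg", "pos"), each = 5)

# Simulated log2 expression: 100 probe sets x 10 arrays
expr <- matrix(rnorm(100 * 10), nrow = 100)
# 20 probe sets are shifted upwards in the "pos" arrays
expr[1:20, group == "pos"] <- expr[1:20, group == "pos"] + 3

# PCA with arrays as observations (rows) after transposition
pca    <- prcomp(t(expr))
scores <- pca$x[, 1:2]

# Arrays in the space of the first two principal components
plot(scores, col = ifelse(group == "pos", "red", "blue"),
     pch = 19, xlab = "PC1", ylab = "PC2")
\end{Sinput}
\end{Schunk}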

\subsubsection{PDMCLASS}
The PDMCLASS module allows the selection of different types of classification procedures, fig. \ref{fig:fignew36}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew36}
    \caption{\label{fig:fignew36} PDMCLASS methods selection mask.}
  \end{center}
\end{figure}
 
The analysis will produce a numerical output of the efficacy of the data set as classifier:
\begin{Schunk}
\begin{Sinput}
object neg pos
   neg  21   1
   pos  22  74
attr(,"error")
[1] 0.1949153
\end{Sinput}
\end{Schunk}
Note that this part of the analysis could take some minutes, depending on the data set under analysis and the 
machine used.
Subsequently, it will be possible to select the probe sets that have the greatest influence in differentiating sample classes.
To do so, the user will be requested to select the number of top ranked probe sets and the number of permutations to be used 
for the cross-validation.
The probe sets will be shown in a Tcl/Tk table with their probabilities of discriminating between classes:
\begin{Schunk}
\begin{Sinput}
	pos vs neg
209604_s_at	1
202088_at	0.92
218807_at	0.8
211430_s_at	0.56
205081_at	0.48
213693_s_at	0.4
209138_x_at	0.44
200670_at	0.32
212099_at	0.44
208682_s_at	0.28
\end{Sinput}
\end{Schunk}
These results can be saved as a tab delimited file.
Testing the efficacy of the selected probe sets on the test set is not yet implemented.


\section{Biological Interpretation}
This section gives a graphical interface to the GOstats package and it allows the preparation of template A 
for IPA analysis on \url{http://www.ingenuity.com}, fig. \ref{fig:fignew38}. It also allows very basic meta-analysis using the metaArray package. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew38}
    \caption{\label{fig:fignew38} Biological interpretation menu.}
  \end{center}
\end{figure}

\subsection{Identifying enriched GO terms and related issues}
This function is also available for gene level exon array analysis.
Specific annotation libraries are not yet available for exon arrays.
Therefore, to perform this analysis we use the annotation information embedded in oneChannelGUI
and link the accession ids available in this annotation to Entrez Gene ids using the humanLLMappings,
mouseLLMappings and ratLLMappings packages available in Bioconductor. 
The function \textit{oneChannelGUI: Identifying enriched GO terms} searches for the presence of enriched 
GO terms within a set of differentially expressed probe sets, given a certain
probe set universe, i.e. the array data available in Normalized Affy Data. For more information about 
GO enrichment please refer to the GOstats vignette in the oneChannelGUI help menu.
The user needs to select some parameters using a selection mask, fig. \ref{fig:fignew39}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew39}
    \caption{\label{fig:fignew39} GO terms enrichment parameters selection mask.}
  \end{center}
\end{figure} 
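
Enrichment of a single GO term within a selected list, relative to a universe, is commonly assessed with a hypergeometric test. 
The base R sketch below shows that calculation for one term with hypothetical counts; it does not call GOstats, 
and all the numbers and names are assumptions chosen for illustration.
\begin{Schunk}
\begin{Sinput}
universe.size <- 10000  # ids in the universe
term.size     <- 200    # universe ids annotated to the GO term
selected.size <- 100    # differentially expressed ids
overlap       <- 10     # selected ids annotated to the GO term

# Expected overlap if the selection were random
selected.size * term.size / universe.size

# P(overlap >= 10) under the hypergeometric distribution
p.enrich <- phyper(overlap - 1,
                   term.size,
                   universe.size - term.size,
                   selected.size,
                   lower.tail = FALSE)
p.enrich
\end{Sinput}
\end{Schunk}
The choice of universe matters: the same overlap gives a different p-value if the universe is restricted, for example, to the probe sets that survive filtering.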

Subsequently, the user will be requested to select a list of differentially expressed probe sets, saved in a txt file.
The file should contain only a list of probe set ids separated by carriage returns, without a header:
\begin{Schunk}
\begin{Sinput}
1452968_at
1448228_at
1418028_at
1439113_at
1424338_at
1416503_at
1416371_at
1437165_a_at
1451047_at
1434005_at
1421916_at
1457012_at
1443823_s_at
1429379_at
1416168_at
1429974_at
1416121_at
1421917_at
1416405_at
\end{Sinput}
\end{Schunk}
The analysis could require quite a lot of RAM and when it is finished a message 
summarizing the results pops up, fig. \ref{fig:fignew40}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew40}
    \caption{\label{fig:fignew40} GO enrichment results summary message.}
  \end{center}
\end{figure}

A table with the enriched GO terms will then be shown, and it can be saved as a tab delimited file, fig. \ref{fig:fignew41}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew41}
    \caption{\label{fig:fignew41} Enriched GO terms table.}
  \end{center}
\end{figure}

In the main R window it will be possible to see a plot summarizing the GO terms relations 
existing between the enriched GO terms, fig. \ref{fig:fignew42}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew42.pdf}
    \caption{\label{fig:fignew42} Relations between enriched GO terms. Enriched GO terms, red, others, light blue.}
  \end{center}
\end{figure}

It is also possible to highlight the parents of a specific GO term using the function \textit{Plotting parents of a GO term}.
In this case a dialog will be used to pass the GO term, e.g. GO:0001525, to the function.
Subsequently, after selecting the GO class, i.e. BP, MF or CC, the results will be available in the main R window, fig. \ref{fig:fignew43}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew43}
    \caption{\label{fig:fignew43} Plotting GO term parents.}
  \end{center}
\end{figure}

It is also possible to annotate and save, in an HTML file, the subset of differentially expressed probe sets associated with a
specific enriched GO term using the function \textit{oneChannelGUI: Extracting Affy IDs linked to an enriched GO term}.
When exon arrays are used, the output of this function 
is a tab delimited file with the available annotations instead of an HTML file.
The user will be requested to select the GO term of interest, fig. \ref{fig:fignew42}, and subsequently to open the file listing the differentially
expressed probe sets used for the GO enrichment analysis.
A pop-up message will indicate when the annotation table is ready to be saved, fig. \ref{fig:fignew44}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew44}
    \caption{\label{fig:fignew44} Annotation file for a subset of differentially expressed probe sets linked to GO:0001525 BP enriched term.}
  \end{center}
\end{figure}

\subsection{oneChannelGUI: Making template A for Ingenuity analysis}
This function reorganizes the output derived by any of the tables generated by \\ limma/siggenes/RankProd to 
generate a template A to be uploaded to Ingenuity database.
The function initially requests to select the type of top table to be used, fig. \ref{fig:fignew45}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew45}
    \caption{\label{fig:fignew45} Differentially expressed probe sets tables selection mask.}
  \end{center}
\end{figure}

The output templateA table will have the following structure,  fig. \ref{fig:fignew46}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew46}
    \caption{\label{fig:fignew46} Template A structure. The first
    column contains the Gene/Protein IDs, the second column the log2(fc), only for the set of probe sets
    considered differentially expressed. The third column has 0 for all the differentially expressed probe sets and 1 for the rest.
    The 4th column has the true p-value only for the differentially expressed probe sets; for the rest it is set to 1.}
  \end{center}
\end{figure}

\subsection{Biological Interpretation for EXON 1.0 ST arrays}
When EXON 1.0 ST arrays are loaded, the Biological Interpretation menu contains the following functions.
\textit{oneChannelGUI: Attaching ACC and Entrez Gene IDs to Probe set IDs (EXON 1.0 ST)} associates
EG ids with gene-level probe sets.
\textit{oneChannelGUI: Mapping exon level prober sets to EG exons} associates the statistical and expression 
data produced by a oneChannelGUI exon-level analysis with the exonic structure of an Entrez Gene ID. 
This function uses biomaRt to retrieve the sequences of the EG exons.
The RRE database (\url{http://www6.unito.it/RRE/EN/}) is instead used to retrieve the exon-level target sequences. 
To be associated with an EG exon, the sequence of an exon-level probe set needs to be a perfectly matching substring of the exon. 
Otherwise, no exon is associated with the probe set.
Furthermore, the conservation of each exon over the various isoforms is defined.
The data frame containing this information can be saved using the function \textit{oneChannelGUI: Exporting Gene exprs and/or Exon/SI/MiDAS/RP data/elevel IDs to exon EGs}.
The column referring to exon conservation is called conserved.exons. If the value 
in this column is 1, the exon is conserved over all isoforms. If it is lower than 1, the exon is conserved only in some of the isoforms, fig. \ref{fig:fignew76}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew76}
    \caption{\label{fig:fignew76} Statistical and expression data mapped on EG exons.}
  \end{center}
\end{figure}

The previous function also exports a file where the structure of the alternatively spliced isoforms is described in a tab delimited format, fig. \ref{fig:fignew77}. 
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew77}
    \caption{\label{fig:fignew77} Structure of alternative spliced isoforms.}
  \end{center}
\end{figure}


\section{Biological Interpretation STILL UNDER DEVEL}
This menu also gives access to some meta-analysis tools, fig. \ref{fig:fignew48}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew48}
    \caption{\label{fig:fignew48} Biological Interpretation DEVEL version.}
  \end{center}
\end{figure}

It is possible to merge with the NormalizedAffyData up to 3 other data sets having
the same ids, in the same order, as the NormalizedAffyData. To merge the data sets, a tab
delimited file and a target file are needed for each data set.
Integrative correlation (Parmigiani et al. 2004), implemented in the metaArray package, can be accessed with the function
\textit{Mining similarities/dissimilarities between merged data sets (IC)}.
The function produces a histogram of the various comparisons and saves, in a tab delimited file, the IC values for the 
various comparisons.

\section{General tools}
This section allows the use of some functions which are not part of a specific Bioconductor package
but could be of general use.
The function \textit{oneChannelGUI: Update all Bioconductor libraries} runs an on-line update of all the Bioconductor
libraries present in the system. This function is very useful to stay up to date with bug fixes during the 6 month life of a
Bioconductor release.
The function \textit{oneChannelGUI: Extract a column from a tab delimited file}
allows the extraction of any of the columns of a tab delimited file. This function is particularly useful to
generate the probe set id lists to be used for Venn diagram representation.
The function \textit{oneChannelGUI: Filtering a tab delimited file}
subsets a tab delimited file given a list of values, e.g. values, symbols, probe sets, etc., present in a file
where each value is separated from the others by a carriage return.
The subsetting of the tab delimited file is performed on the basis of the column, fig. \ref{fig:fignew47} yellow, sharing the
same header as the list of values, fig. \ref{fig:fignew47}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew47}
    \caption{\label{fig:fignew47} Sub setting a tab delimited file by a list of symbols.}
  \end{center}
\end{figure}
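
The same kind of subsetting can be sketched in base R. The table, the list of values and the gene symbols below are 
hypothetical, and both are given as in-memory text rather than files so that the example is self-contained; 
with real files the \texttt{text} argument would be replaced by the file name.
\begin{Schunk}
\begin{Sinput}
# A small tab delimited table (header + 3 rows)
tab.lines <- c("ProbeSet\tSymbol\tlogFC",
               "1452968_at\tGeneA\t1.8",
               "1448228_at\tGeneB\t-2.1",
               "1418028_at\tGeneC\t0.9")
tab <- read.delim(text = tab.lines, stringsAsFactors = FALSE)

# List of values, one per line, with a header naming the column
list.lines <- c("Symbol", "GeneA", "GeneC")
wanted <- read.delim(text = list.lines, stringsAsFactors = FALSE)

# The column used for subsetting is the one sharing the header
key <- intersect(names(tab), names(wanted))
subset.tab <- tab[tab[[key]] %in% wanted[[key]], ]
subset.tab
\end{Sinput}
\end{Schunk}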

The function \textit{oneChannelGUI: Downloading Gene/Exon library files} downloads all the library files needed by the
APT tools to calculate probe set summaries for Gene and Exon 1.0 ST arrays.
The function \textit{oneChannelGUI: Set Affymetrix apt tools folder and download Reference Sequences} 
allows the user to define the folder where the APT tools were installed and to download the reference sequences
from the NCBI repository into the blast subdirectory of the APT folder. These files are compressed and have the gz extension. 
They can be unpacked manually by the user or via oneChannelGUI. The latter option is offered at the end of
the download but takes quite a long time.
 
The function \textit{oneChannelGUI: deleteLocalData} will reset the folders defined by \textit{oneChannelGUI: Downloading Gene/Exon library files}
and \textit{oneChannelGUI: Set Affymetrix apt tools folder and download Reference Sequences}. Data present in the two folders will not be deleted! 
The function \textit{oneChannelGUI: buildingLocalAnnotation} updates the internal oneChannelGUI gene-level annotations
by querying the netaffx database using the affyCompatible library. Annotation files are saved in .rda format in the data subdirectory
of the oneChannelGUI folder. Windows users need to drag those .rda files into the Rdata.zip file present in the data directory.
A file called netaffxUpdates.txt in the etc subdirectory keeps track of annotation file updates.  


\section{Help}
This menu gives access to the vignettes of the Bioconductor packages implemented in oneChannelGUI and to this oneChannelGUI vignette.



\section{Exon analysis and data mining}
Once exon data are loaded, the filtering menu appears slightly different, fig. \ref{fig:fignew50}. 

This menu provides a number of functions to identify and visualize alternative splicing events.
Users should remember that exon arrays are a relatively new technology and that very little is yet known
about their analysis. Furthermore, benchmark experiments to test the efficacy of statistical methods for 
alternative splicing detection are not yet available.
Therefore, this module will be subject to various upgrades and improvements during the year.
The part related to loading gene/exon level data is described in the File menu chapter.
If APT tools are used to calculate probe set intensities, oneChannelGUI will make available
gene level expression data in Normalized Affy Data, exon level expression data in Normalized Exon data
and, if selected, DABG p-values.
The functions currently available for exon analysis are summarised in fig. \ref{fig:fignew49}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew49}
    \caption{\label{fig:fignew49} Exon menu.}
  \end{center}
\end{figure}

Splice Index (SI), which represents the exon expression normalized with respect to the transcript expression, 
can be calculated with \textit{oneChannelGUI: Calculating splice index}. For a two group experiment 
the function \textit{oneChannelGUI: Calculating MiDAS p-value (APT)} uses APT tools to calculate MiDAS p-values
for the difference between SIs in the two conditions, i.e. alternative splicing events. 
It is possible to subset gene/exon level data on the basis of a MiDAS p-value threshold using the function 
\textit{oneChannelGUI: Selecting alternative splicing events by MiDAS p-values}.
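As an illustration only, the splice index can be sketched in R under the assumption that gene-level and exon-level
summaries are both on the log2 scale, so that normalizing the exon signal by the transcript signal becomes a subtraction.
The object names (\texttt{geneLog2}, \texttt{exonLog2}, \texttt{exonToGene}) and the toy values are hypothetical and 
do not describe oneChannelGUI internals.
\begin{Schunk}
\begin{Sinput}
# Toy log2 summaries: rows are probe sets, columns are samples
geneLog2 <- matrix(c(8, 8.2, 9, 9.1), nrow = 1,
                   dimnames = list("g1",
                     c("liver1", "liver2", "heart1", "heart2")))
exonLog2 <- matrix(c(7, 7.1, 8.0, 8.2,
                     6, 6.2, 9.0, 9.3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("e1", "e2"), colnames(geneLog2)))
exonToGene <- c(e1 = "g1", e2 = "g1")  # exon-level id -> gene-level id

# Assumed definition: SI = log2(exon) - log2(gene), sample by sample
si <- exonLog2 - geneLog2[exonToGene[rownames(exonLog2)], , drop = FALSE]

# Average SI difference between the two groups, one value per exon
group <- c("liver", "liver", "heart", "heart")
siDiff <- rowMeans(si[, group == "heart", drop = FALSE]) -
          rowMeans(si[, group == "liver", drop = FALSE])
\end{Sinput}
\end{Schunk}
In this toy case the second exon gains about two log2 units relative to its transcript in heart, which is the kind of 
change the p-value based functions described in this section are meant to detect.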
In the devel version of oneChannelGUI the rank product method (RankProd package) is also available, through
\textit{oneChannelGUI: Rank Product alternative splicing detection (devel)}, to detect
significant differences between SI or exon-level log2(intensities) in two experimental conditions, i.e. alternative splicing events.
Rank Product is a non-parametric statistic that detects items that are consistently
highly ranked in a number of lists. It is based on the
assumption that, under the null hypothesis that the order of all items is random,
the probability of finding a specific item among
the top $r$ of $n$ items in a list is $p=\frac{r}{n}$. Multiplying these probabilities leads to
the definition of the rank product $RP=\prod_{i}\frac{r_i}{n_i}$, where $r_i$ is the rank
of the item in the $i$-th list and $n_i$ is the total number of items in the $i$-th list.
The smaller the $RP$ value, the smaller the probability that the observed placement
of the item at the top of the lists is due to chance. 
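The definition of $RP$ can be followed on a small worked example. The sketch below uses base R only and made-up scores; 
it computes the statistic itself and does not reproduce the permutation-based p-values of the RankProd package. 
The names \texttt{scores}, \texttt{r} and \texttt{rp} are chosen here for illustration.
\begin{Schunk}
\begin{Sinput}
# Toy scores: rows are items, columns are lists
scores <- matrix(c(5.0, 4.8, 5.2,
                   1.0, 3.9, 0.7,
                   3.0, 1.1, 2.5,
                   0.2, 0.3, 0.1), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("item", 1:4),
                                 paste0("list", 1:3)))

# Rank within each list; rank 1 is given to the largest score
r <- apply(-scores, 2, rank)
n <- nrow(scores)   # here every list holds the same n items

# RP = product over the lists of r_i / n_i
rp <- apply(r / n, 1, prod)
\end{Sinput}
\end{Schunk}
The item ranked first in every list gets $RP=(1/4)^3$, the smallest possible value for three lists of four items.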
For performance reasons on Windows based computers, the number of random permutations is fixed to 100. 
A menu to select the number of permutations will be implemented soon.
At the end of the analysis, histograms of the p-values of class 1 $<$ class 2, of the p-values of class 1 $>$ class 2 and of the average SI 
difference are shown in the main R window.  
IMPORTANT: all filtering functions devoted to the selection of alternative splicing events now 
produce a flat file containing only the detected spliced exons. 
These files are used for exon-level probe set annotation with the function \textit{oneChannelGUI: Mapping exon level probe sets to Reference Sequences}.
It is possible to subset gene/exon level data on the basis of rank product results using the function
\textit{oneChannelGUI: Selecting alternative splicing events by RankProd p-values (devel)}.
Since benchmark experiments to test the efficacy of alternative splicing detection are not yet available,
we cannot indicate how effective the methods currently implemented in oneChannelGUI are for the detection of alternative splicing
events.
It is also possible to filter data on the basis of the absolute mean or min SI difference with the function 
\textit{oneChannelGUI: Filtering gene/exon data by absolute SI mean or min difference}, and to inspect a subset of putative
alternative splicing events, fig. \ref{fig:fignew71}, 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew71}
    \caption{\label{fig:fignew71} Example of the output of the putative alternative splicing inspection.
    The output is made of a tab delimited file, in which gene-level probe sets are associated to exon-level probe sets, and of a pdf file
    where each page contains a plot of MiDAS/RP p-values with respect to exon index (black dot MiDAS, red triangle and green triangle RP). 
    The horizontal black dashed line indicates a p-value of 0.05. The vertical yellow dashed line indicates a condition in which 
    both MiDAS and RP p-values are below 0.05. 
    In the second plot, p-values of RP are plotted versus MiDAS p-values. Those p-values that are below 0.05 both in RP and MiDAS
    will appear in the upper right rectangle. The third/fourth plot shows the behaviour of splice indexes or exon-level log2(intensity) 
    with respect to exon indexes.
    The vertical yellow dashed lines indicate the exon-level log2(intensity)/SI values associated to MiDAS and RP p-values below 0.05. 
    }
  \end{center}
\end{figure}


with the function \textit{oneChannelGUI: Inspecting splice indexes}, fig. \ref{fig:fignew49}.
\textit{oneChannelGUI: Inspecting splice indexes} also produces a txt file containing 
the gene-level probe set id and the exon-level probe set ids of all spliced exons, in the following format:
\begin{Schunk}
\begin{Sinput}
"glevel id|exon level ids "
"3899173|3899229"
"3210737|3358127"
"3358112|3234972"
"2587961|3267416"
"3644510|3357399|3357446"
"3415109|2759224"
"3611625|3308001|3308013|3308031"
"3234760|3308001|3308013|3308031"
"3267382|3308001|3308013|3308031"
"3357397|2455983|2455993|2456013"
"2759205|2455983|2455993|2456013"
"3307939|3733603|3733609"
\end{Sinput}
\end{Schunk}
These data are also stored in the affylmGUIenvironment and they are used by  
\textit{oneChannelGUI: Mapping exon level probe sets to Reference Sequences} to
identify the isoform specific alternative splicing events.
The function flowchart is shown in fig. \ref{fig:fignew78}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew78}
    \caption{\label{fig:fignew78} Mapping exon-level probe selection regions to reference sequences.
    Exon-level probe sets associated to putative alternative splicing events (ASEs) are used to query the
    RRE database \url{https://www6.unito.it/RRE/EN/} to retrieve the probe selection regions (PSRs) 
    associated to them. BLASTN is used to map PSRs on the reference sequences, which need to be downloaded
    using the function  \textit{oneChannelGUI: Set Affymetrix apt tools folder and download Reference Sequences}
    available in the general tools menu. Each reference sequence associated to a specific exon-level probe set is
    used to retrieve from org.XX.eg.db the entrez gene id (EG) of the gene linked to that specific reference sequence.
    The EG is then used to retrieve all the reference sequences associated to it. BLASTN is used again to map the
    exon-level PSR on each of the reference sequences associated to an EG, to evaluate whether the detected splicing event
    is associated to an isoform specific splicing or maps on a conserved exon.
    }
  \end{center}
\end{figure}

The final output of \textit{oneChannelGUI: Mapping exon level probe sets to Reference Sequences} is
a txt file with the following format:
\begin{Schunk}
\begin{Sinput}
eprobesetid= 2384145 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
eprobesetid= 2414420 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
NA
eprobesetid= 2455993 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
NA
eprobesetid= 2469879 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
eprobesetid= 2471251 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
eprobesetid= 2471297 isoform specific= NM_052843 EG= 84033 RefSeqs= NM_001098623 NM_052843 NP_001092093 NP_443075
\end{Sinput}
\end{Schunk}
where NA indicates that a PSR could not be perfectly mapped on a reference sequence. 


The function \textit{oneChannelGUI: Mapping exon level probe sets to exon} 
uses the output file of \textit{oneChannelGUI: Mapping exon level probe sets to Reference Sequences} 
and associates each exon-level Probe Selection Region, PSR, to an exon. The output is a fasta file where each exonic
probe set sequence is followed by the exon on which the PSR maps. The exon associated to the 
PSR is identified using the countPattern function from the Biostrings package. The mapping allows a maximum of 3 mismatches.
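
To make the idea of a mapping with a bounded number of mismatches concrete, the following base R function counts the 
positions at which a pattern matches a subject with at most 3 substitutions. It is only a stand-in written for this 
illustration: it is not the Biostrings implementation, it ignores insertions and deletions, and the function name 
\texttt{countApprox} and both sequences are hypothetical.
\begin{Schunk}
\begin{Sinput}
# Count the positions of 'subject' where 'pattern' matches with at most
# 'maxMismatch' substitutions (no insertions or deletions)
countApprox <- function(pattern, subject, maxMismatch = 3) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  if (length(p) > length(s)) return(0L)
  starts <- seq_len(length(s) - length(p) + 1)
  mism <- vapply(starts,
                 function(i) sum(s[i:(i + length(p) - 1)] != p),
                 integer(1))
  sum(mism <= maxMismatch)
}

psr  <- "ACGTACGTAC"          # hypothetical PSR fragment
exon <- "TTTTACGAACGTTCGGGG"  # hypothetical exon sequence
nApprox <- countApprox(psr, exon)                   # up to 3 mismatches
nExact  <- countApprox(psr, exon, maxMismatch = 0)  # exact matches only
\end{Sinput}
\end{Schunk}
A PSR would be assigned to an exon when the approximate count is greater than zero; exons for which only the exact count 
is zero differ from the PSR by a few bases.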

It is also possible to filter exon data integrating MiDAS p-values with RP p-values and average mean SI difference.
This option is given by the function \textit{oneChannelGUI: Selecting alternative splicing by RP/MiDAS p-values/average mean SI difference (devel)}.
Since no correction for statistical type I error is given for MiDAS, we decided to integrate two statistical tests based on different
approaches to reduce statistical type I errors. Furthermore, this filter also offers the possibility to subset data on the basis of an
average mean SI difference threshold.
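
The logic of such an integrated filter can be sketched as follows. The table, its column names and the thresholds are 
hypothetical; in particular we assume here that a single RP p-value per exon is available, e.g. the smaller of the two 
one-sided p-values.
\begin{Schunk}
\begin{Sinput}
# Hypothetical per-exon results; column names are illustrative only
res <- data.frame(
  exon   = c("e1", "e2", "e3", "e4"),
  midasP = c(0.001, 0.20, 0.03, 0.04),
  rpP    = c(0.010, 0.01, 0.30, 0.02),
  siDiff = c(2.5, 3.0, 2.2, -0.4),
  stringsAsFactors = FALSE
)

pCut    <- 0.05  # threshold applied to both p-values
diffCut <- 2     # threshold on the absolute average SI difference

# An exon is kept only when both tests agree and the SI change is large
keep <- res$midasP < pCut & res$rpP < pCut & abs(res$siDiff) >= diffCut
selectedExons <- res$exon[keep]
\end{Sinput}
\end{Schunk}
Only the first exon passes all three conditions: each of the others fails exactly one of them.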
Visualization of the splicing events using one gene-level probe set at a time is possible with the function  
\textit{oneChannelGUI: Inspecting splice indexes of one glevel probe set}.
The output, fig. \ref{fig:fignew71}, contains a plot of $-$log2(MiDAS p-values) versus the exon-level probe set index, i.e. 1 for exon 1, 2 for exon 2, etc.;
a plot of the $-$log2(p-value) consistency between MiDAS and RP, where consistency means that both methods give a p-value $<$ 0.05;
and a plot where SI or exon-level log2(intensity) are plotted versus the 
exon-level probe set index. The consistent splicing events are indicated by a yellow bar. 

\section{Example of Exon array analysis}
Data described in this example are produced using the human data set: \url{http://www.bioinformatica.unito.it/downloads/exon.zip}.
As a first step the function \textit{New} calculates gene/exon level probe set summaries.
In this example the 3 liver and 3 heart .CEL files of the Affymetrix human tissue set were used.
To detect alternative splicing a number of prefiltering steps are needed, fig. \ref{fig:fignew60}.
\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew60}
    \caption{\label{fig:fignew60} Prefiltering steps for alternative splicing analysis.}
  \end{center}
\end{figure}

 
In this example gene level probe set summaries are calculated with iterPLIER and exon level probe set summaries with PLIER, as 
suggested by Affymetrix.
The density plot of the gene and exon level data can be evaluated with the function \textit{Gene/Exon Intensity Histogram}.
To see both plots at the same time, as in fig. \ref{fig:fignew55}, you need to type in the main R window:
\begin{Schunk}
\begin{Sinput}
> par(mfrow=c(1,2))
\end{Sinput}
\end{Schunk}
and subsequently apply the \textit{Gene/Exon Intensity Histogram} command.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew55}
    \caption{\label{fig:fignew55} Iterplier gene level and plier exon level probe set distributions.}
  \end{center}
\end{figure}
A first cleanup is part of the QC checks, since sample outliers can be detected by the function
 \textit{Gene/Exon PCA/HCL}, fig. \ref{fig:fignew56}, which gives an idea of the 
homogeneity of the experimental replicates and of the level of separation between the experimental groups.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew56}
    \caption{\label{fig:fignew56} Gene and exon level PCA and hierarchical clustering.}
  \end{center}
\end{figure}

If PLIER/iterPLIER was used for expression summaries, we strongly suggest using the function 
\textit{Setting to 0 log2 intensity below 1, to be used with plier only}, which brings the negative 
log2 values, i.e. those of intensities near 0, to 0.  
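
The effect of this step can be pictured with one line of base R. This is a sketch on a made-up vector, not the code 
used by the menu function; the names \texttt{log2Int} and \texttt{floored} are hypothetical.
\begin{Schunk}
\begin{Sinput}
# Toy log2 summaries; intensities below 1 give negative log2 values
log2Int <- c(-3.2, -0.1, 0.0, 0.4, 7.9)

# Replace every negative log2 value with 0, leave the others untouched
floored <- pmax(log2Int, 0)
\end{Sinput}
\end{Schunk}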


Another cleanup is made by removing low intensity probe sets using the function \textit{Filtering on DABG p-values}.
The filtering threshold is a user decision. In this example we
remove all probe sets that show a low intensity expression in at least half of the experiments, fig. \ref{fig:fignew57}.
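
A minimal sketch of this rule, assuming that a DABG p-value above 0.05 marks a probe set as not detected in a sample; 
the matrix \texttt{dabg} and its values are invented for the illustration.
\begin{Schunk}
\begin{Sinput}
# Toy DABG p-values: rows are probe sets, columns are the 6 samples
dabg <- matrix(c(0.001, 0.002, 0.010, 0.003, 0.020, 0.001,
                 0.400, 0.300, 0.600, 0.010, 0.020, 0.030,
                 0.200, 0.010, 0.020, 0.030, 0.001, 0.040),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("ps1", "ps2", "ps3"), NULL))

pCut <- 0.05
# Number of samples in which each probe set is not detected
nUndetected <- rowSums(dabg > pCut)

# Remove probe sets undetected in at least half of the samples
keep <- nUndetected < ncol(dabg) / 2
kept <- rownames(dabg)[keep]
\end{Sinput}
\end{Schunk}
Here \texttt{ps2} is undetected in 3 of the 6 samples and is removed, while \texttt{ps3}, undetected in a single sample, is kept.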

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew57}
    \caption{\label{fig:fignew57} DABG filter: half of the data set should have a DABG p-value below 0.05.}
  \end{center}
\end{figure}

Another cleanup step is the removal of probe sets characterized by cross-hybridization. This is currently done 
using the crosshyb and xhyb annotations available in the Affymetrix annotation files. This filter is only available
for the core exon subset, fig. \ref{fig:fignew62}. A description of the options of this filter is given in the 
filtering section of this vignette.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew62}
    \caption{\label{fig:fignew62} Cross-hybridizing probe set removal for similar/mixed crosshyb type.}
  \end{center}
\end{figure}

The resulting data set is made of 14630 gene level probe sets and 211038 exon level probe sets. These values can be 
visualized using the function \textit{Info about the loaded data set} available in the file menu and in the filtering menu.

In our opinion it is better to keep the detection of gene-level differential expression
separate from the detection of exon-level alternative splicing.
Therefore, before the exon splicing analysis starts, we use a prefiltering step to remove all the transcripts which might
be differentially expressed at gene level, and we get back to them in a separate analysis.
This is done by removing probe sets characterized, at gene level, by a broad variation over samples that could be due 
to gene level differential expression.
This filter is a reverse implementation of the IQR filter, function \textit{Filtering by reverse IQR (for alternative splicing analysis only!)}, 
fig. \ref{fig:fignew58}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew58}
    \caption{\label{fig:fignew58} Reverse IQR mask.}
  \end{center}
\end{figure}
In this example, the probe sets that are characterized by strong changes are removed using a reverse filter at IQR 0.8, 
fig. \ref{fig:fignew59}. 
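
The difference between the conventional and the reverse IQR filter can be sketched in base R. We assume here that the 
threshold is compared directly with the inter-quartile range of the log2 intensities of each gene-level probe set; 
this is an assumption of the sketch, and the object names and values are hypothetical.
\begin{Schunk}
\begin{Sinput}
# Toy gene-level log2 intensities: 3 liver and 3 heart samples
geneLog2 <- rbind(stable  = c(8.0, 8.1, 7.9, 8.2, 8.0, 8.1),
                  changed = c(6.0, 6.1, 5.9, 9.0, 9.2, 9.1))

iqr <- apply(geneLog2, 1, IQR)  # spread of each probe set over samples

iqrCut <- 0.8
standardKeep <- iqr >= iqrCut   # conventional filter: keep variable ones
reverseKeep  <- iqr <  iqrCut   # reverse filter: keep the stable ones
\end{Sinput}
\end{Schunk}
The probe set with a clear liver/heart difference has a large IQR and is set aside by the reverse filter, so that it 
does not enter the alternative splicing analysis.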

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew59}
    \caption{\label{fig:fignew59} Reverse IQR filtering 0.8.}
  \end{center}
\end{figure}

This filtering reduces the data set to 7086 gene-level probe sets and 95091 exon-level probe sets.
After these filters, data are ready for the calculation of putative alternative splicing p-values 
by means of MiDAS or RankProd.

MiDAS p-values can be calculated for the data available after the preprocessing
using the function \textit{Calculating MiDAS p-value (APT)} available in the \textit{Exon menu}.
The histogram of the p-values distribution is shown in the R window, fig. \ref{fig:fignew63}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew63}
    \caption{\label{fig:fignew63} MiDAS p-values distribution. It is clear, from the p-values distribution, that type I error correction
    methods such as BH and BY cannot be applied due to the lack of a uniform distribution of p-values in the
    non-significant range.}
  \end{center}
\end{figure} 
Subsequently, the p-values based on the rank product can also be calculated using the function
\textit{Rank Product alternative splicing detection}. This function is currently only available in the 
devel version of oneChannelGUI.
The resulting histograms are shown in the main R window, fig. \ref{fig:fignew64}. 
To perform the RP p-value calculation it is necessary
to calculate the Splice Indexes first, using the function \textit{oneChannelGUI: Calculating splice index}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew64}
    \caption{\label{fig:fignew64} RankProd histograms of the p-values and average SI differences}
  \end{center}
\end{figure} 

Two different statistical approaches are used because of the current lack of benchmark experiments
allowing the evaluation of the performance of alternative splicing detection methods. In principle, if an alternative splicing event
is sufficiently robust, it should be identified independently by different methods and by different intensity
summary methods used to calculate the splice index. Furthermore, the intersection of results coming from two different
methods will reduce the number of type I errors.
In this example we have used a weak filtering threshold for both p-values, i.e. 0.05, and we have applied it
using the functions \textit{Selecting alternative splicing events by MiDAS p-values} and 
\textit{Selecting alternative splicing events by RankProd p-values (devel)}.
The results can be exported using the function \textit{Exporting Gene exprs and/or Exon/SI/MiDAS/RP data}.
The intersection between the two methods at gene and exon level is shown in fig.  \ref{fig:fignew65}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew65}
    \caption{\label{fig:fignew65} Venn diagrams of the intersection between MiDAS p-values and RP p-values. Clearly RP is less stringent
    than MiDAS.}
  \end{center}
\end{figure} 

At this point it is necessary to subset the probe sets on the basis of an SI mean difference that the user considers to be significant.
The range of SI mean difference values can be seen in the output of the RP p-values results, and filtering on gene/exon expression data
can be done using the function \textit{Filtering by mean of absolute SI mean difference}. 
In this example we use a threshold of 2 for the absolute SI mean difference. This filter yields a total of 1743 gene-level probe sets and
34033 exon-level probe sets.

An important issue at this point of the analysis is to rank the alternative splicing events in order to start studying those more
related to the biological event under study.
Our suggestion is to approach the problem at two levels:
\begin{enumerate}
  \item Search for GO enriched terms within the set of putative alternatively spliced genes.
  \item Integrate alternative splicing analysis with information that can be depicted by conventional
        differential expression analysis at gene level.
\end{enumerate}



\subsection{Search for GO enriched terms within the set of putative alternatively spliced genes.}
Enriched GO terms are searched under the hypothesis that  
alternative splicing events are linked to the biological problem under investigation, as happens for
differential expression analysis.
To do that we can use the function \textit{Identifying enriched GO terms}.
In this case we load the data after the DABG, cross hyb and revIQR filters and we search for the presence of
enriched GO terms in the subset of putative alternative splicing events.
In our example this search identified a subset of enriched GO terms, fig. \ref{fig:fignew69}. 
At this point, it will be necessary to identify the most interesting GO terms on the basis of the user's
biological knowledge, extract the corresponding putative alternative splicing events using the function 
\textit{Extracting Affy IDs linked to an enriched GO term} and inspect their splicing using the
function \textit{Inspecting splice indexes}.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew69}
    \caption{\label{fig:fignew69} Sub set of enriched GO BP terms linked to putative alternative splicing events
                                  observed between heart and liver human tissues.}
  \end{center}
\end{figure} 


 

\subsection{Integrate alternative splicing analysis with information that can be depicted by conventional differential expression analysis at gene level.}
We save the data set at this point of the exon level filtering, with the \textit{Save As} function, 
we export gene and exon level data with the \textit{Exporting Gene exprs and/or Exon/SI/MiDAS/RP data} function available in the filtering menu, 
and we get back to a conventional gene-level differential expression analysis.

The analysis procedure is described in fig. \ref{fig:fignew66}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew66}
    \caption{\label{fig:fignew66} Gene level preprocessing.}
  \end{center}
\end{figure}  

Therefore, after the DABG filter and the removal of cross-hybridizing probe sets, we apply an IQR filter to remove non-variant probe sets at gene level.
As in the case of the exon analysis, any filter applied to the gene level data will produce the corresponding subset of 
exon level data. 

The results of the IQR filter are shown in fig. \ref{fig:fignew67}. 

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew67}
    \caption{\label{fig:fignew67} IQR filter at 0.25.}
  \end{center}
\end{figure}

At this point, since our exercise is based on two groups, it is possible to apply linear model analysis
based on limma or a permutation based approach such as SAM or rank product.
To be conservative we use limma for the two-group analysis.
Checking the raw p-value distribution obtained for heart versus liver differential expression, function \textit{Raw p-value distribution plot}, 
we can confirm that the BH or BY type I error correction can be applied, since the 
distribution in the non-significant range is uniform.
The differential expression selection is shown in fig. \ref{fig:fignew68}. 
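
The selection step can be sketched with base R, where \texttt{p.adjust} provides the BY correction. The table below is 
invented and only mimics the kind of output a limma analysis gives; the thresholds are those reported in the caption 
of fig. \ref{fig:fignew68}.
\begin{Schunk}
\begin{Sinput}
# Hypothetical limma-style results for five gene-level probe sets
tab <- data.frame(
  id    = paste0("g", 1:5),
  logFC = c(2.4, -1.8, 0.3, 1.2, -3.0),
  p     = c(1e-6, 2e-5, 0.4, 0.03, 1e-7),
  stringsAsFactors = FALSE
)

# Benjamini-Yekutieli adjustment of the raw p-values
tab$adjP <- p.adjust(tab$p, method = "BY")

# Keep probe sets with |log2(fc)| > 1 and adjusted p-value < 0.01
selected <- tab$id[abs(tab$logFC) > 1 & tab$adjP < 0.01]
\end{Sinput}
\end{Schunk}
The fourth probe set has a fold change above the threshold but is discarded, because its adjusted p-value is larger than 0.01.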

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew68}
    \caption{\label{fig:fignew68} BY correction, absolute log2(fc) $>$ 1 and adjusted p-value $<$ 0.01.}
  \end{center}
\end{figure}

For this subset we search for enriched GO terms and we save the list of enriched classes.
Since the putative alternative splicing events identified previously are too many to be 
experimentally investigated, in this case we integrate the GO terms found enriched within
alternative splicing events, see previous subsection, with those found within the differential expression set.
This intersection rests on the assumption that both alternative splicing events and
differential expression should be linked to the biological problem under investigation.
Using the two lists of enriched GO terms and the Venn diagram function available in the modelling stats menu
we can detect the enriched GO terms in common between the two analyses, fig. \ref{fig:fignew70}. 
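
Outside the graphical interface, the same comparison amounts to a set intersection. In the sketch below the two vectors 
are placeholders built for the illustration and do not reproduce the actual lists of enriched terms of this example.
\begin{Schunk}
\begin{Sinput}
# Placeholder lists of enriched GO terms from the two analyses
goSplicing <- c("GO:0007155", "GO:0006936", "GO:0008152")
goDiffExpr <- c("GO:0006629", "GO:0007155", "GO:0055114")

# Terms enriched in both analyses
common <- intersect(goSplicing, goDiffExpr)
\end{Sinput}
\end{Schunk}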

\begin{figure}[htbp]
  \begin{center}
    \includegraphics{fignew70}
    \caption{\label{fig:fignew70} Intersection between the GO terms found enriched in exon level alternative splicing 
                                  analysis and gene level differential expression analysis.}
  \end{center}
\end{figure} 
From this analysis we identified 1 GO term:
\begin{Schunk}
\begin{Sinput}
GO:0007155
\end{Sinput}
\end{Schunk}

At this point, it will be necessary to extract those putative alternative splicing events using the function 
\textit{Extracting Affy IDs linked to an enriched GO term}  and to inspect their splicing using the
function \textit{Inspecting splice indexes}.

\end{document}

