% NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
% likely to be overwritten.
%
% \VignetteIndexEntry{affycoretools, refactored}
% \VignetteDepends{affycoretools, Biobase, ReportingTools, limma,
% knitr, AnnotationDbi, hgu95av2.db}
% \VignetteKeywords{Expression Analysis, Postprocessing}
% \VignettePackage{affycoretools}
%\VignetteEngine{knitr::knitr}

<<include=FALSE>>=
knitr::opts_chunk$set(tidy=FALSE)
@

\documentclass[11pt]{article}

<<style, eval=TRUE, echo=FALSE, results="asis">>=
BiocStyle::latex()
options(bitmapType = "cairo")
@ 



\parindent 0.5in




\begin{document}

\title{\bf Creating annotated output with \Biocpkg{affycoretools} and ReportingTools}

\author{James W. MacDonald\footnote{jmacdon@u.washington.edu}}

\maketitle
\tableofcontents
\newpage



\section{Overview}

The \Biocpkg{affycoretools} package is intended to help people easily create
useful output from various analyses. While affycoretools was
originally intended for those using Affymetrix microarrays, this is no
longer the case. While some functions remain Affy-centric, most are
now much more general, and can be used for any microarray or RNA-Seq
experiment.

\section{Introduction}

This package has evolved from my work as a service core
biostatistician. I routinely analyze very similar experiments, and
wanted to create a way to minimize all the cutting and pasting of code
that I found myself doing. In addition, I wanted to come up with a
good way to make an analysis reproducible, in the sense that I (or
somebody else) could easily re-create the results.

In the past this package relied on the \Biocpkg{annaffy} package, and
was intended to be used in concert with a 'Sweave' document that
contained both the code that was used to analyze the data, as well as
explanatory text that would end up in a pdf (or HTML page) that could
be given to a client. In the intervening period, people have developed
other, better packages such as \CRANpkg{knitr} and
\Biocpkg{ReportingTools} that make it much easier to create the sort
of output I like to present to my clients.

\section{Using affycoretools}

For this section we will be using the \Robject{sample.ExpressionSet}
data set that comes with the \Biocpkg{Biobase} package. Remember that
you can always run this code at home by doing this:

<<eval=FALSE>>=

library(knitr)
purl(system.file("doc/RefactoredAffycoretools.Rnw", package="affycoretools"))

@ 

And then you will have a file called RefactoredAffycoretools.R in your
working directory that you can either \Rfunction{source} or open with
\software{RStudio} or \software{Emacs/ESS}, and run by chunk or line
by line.

We first load and rename the data:

<<start>>=
suppressMessages(library(affycoretools))
data(sample.ExpressionSet)
eset <- sample.ExpressionSet
eset

@ 

<<include=FALSE>>=
featureNames(eset) <- gsub("/", "_", featureNames(eset))
@ 

This \Rclass{ExpressionSet} object is a truncated data set, based on
an Affymetrix HG-U95av2 array. There are 26 samples and 500
probesets. We will use the \Rclass{phenoData} to fit a linear model
using \Biocpkg{limma}. \bioccomment{We will not cover any aspects of fitting a
linear model here; the limma User's Guide covers this topic in
depth. In addition, this analysis isn't meant to be correct in any
sense; we are just doing this to get some data to annotate and output.}

<<model>>=

suppressMessages(library(limma))
pd <- pData(phenoData(eset))
design <- model.matrix(~0+type+sex, pd)
colnames(design) <- gsub("type|sex", "", colnames(design))
contrast <- matrix(c(1,-1,0))
colnames(contrast) <- "Case vs control"
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit)
topTable(fit2, 1)[,1:4]

@ 

At this point we can generate a data.frame, but this data.frame has no
annotation, such as gene names or symbols, etc, that say what each
probeset is measuring. The \Rclass{MArrayLM} object that we are
calling 'fit2', is capable of containing these data, and will append
those data to the \Rfunction{topTable} output.

<<annotate>>=

suppressMessages(library(hgu95av2.db))
gns <- select(hgu95av2.db, featureNames(eset), c("ENTREZID","SYMBOL","GENENAME"))
## There are one-to many mappings here, so we just
## removed duplicates in a very naive way.
gns <- gns[!duplicated(gns[,1]),]
fit2$genes <- gns
topTable(fit2, 1)[,1:3]

@ 

After adding the annotation data to the \Robject{MArrayLM} object, the
\Rfunction{topTable} output now contains the appropriate annotation
data for each probeset. At this point we can output an HTML table that
contains these data.

<<output>>=

suppressMessages(library(ReportingTools))
htab <- HTMLReport("afile", "My cool results")
publish(topTable(fit2, 1), htab)
finish(htab)

@ 

And now we have a HTML table called 'afile.html' in
our working directory, that contains the data for our top 10
genes. This table is not particularly interesting, and the
\Biocpkg{ReportingTools} package already has functionality to just do
something like 

<<rtools>>=

htab <- HTMLReport("afile2", "My cool results, ReportingTools style")
publish(fit2, htab, eset, factor = pd$type, coef = 1, n = 10)
finish(htab)

@ 

and it will automatically generate an annotated table, with some extra
plots that show the different groups, and we didn't even have to use
\Rfunction{topTable} directly. However, the default plots in the HTML
table are a combination of dotplot and boxplot, which I find weird
(see afile2.html if you are running this code yourself). Since
\Biocpkg{ReportingTools} is easily extensible, we can make changes
that are more pleasing.

<<output2>>=

d.f <- topTable(fit2, 2)
out <- makeImages(df = d.f, eset = eset, grp.factor = pd$type, design = design,
                  contrast = contrast, colind = 1, repdir = ".")
htab <- HTMLReport("afile3", "My cool results, affycoretools style")
publish(out$df, htab, .mofifyDF = list(entrezLinks, affyLinks))
finish(htab)

@ 

Note that there are two differences in the way we did things. First,
we create a \Robject{data.frame}, and then decorate it with the
plots using the \Rfunction{makeImages} function. This will by default
create dotplots (or you can specify boxplots). For the plots to fit in
an HTML table, there are no axis labels. However, each plot is also a
link, and if you click on it, a larger plot with axis labels will be
presented. See 'afile3.html', if you are running this code yourself.

All the little files that get created can get pretty messy, so the
default is to put everything into a 'reports' subdirectory, so your
working directory stays clean. For this example we over-ride the
defaults so we do not have to go searching in subdirectories for our
tables.

An alternative parameterization that probably makes more sense is to
fit coefficients for each sex/treatment combination.

<<param2>>=

grps <- factor(apply(pd[,1:2], 1, paste, collapse = "_"))
design <- model.matrix(~0+grps)
colnames(design) <- gsub("grps", "", colnames(design))
contrast <- matrix(c(1,-1,0,0,
                     0,0,1,-1,
                     1,-1,-1,1),
                   ncol = 3)
colnames(contrast) <- c("Female_Case vs Female_Control",
                        "Male_Case vs Male_Control",
                        "Interaction")
fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)
fit2$genes <- gns


@ 

With this parameterization we can look at intra-sex differences, as
well as the interaction (looking for sex-specific changes). This now
means that we have a total of three HTML tables to output, which makes
things a bit more complex to present. Luckily, this is pretty simple
to accomplish. For this step we will now use the default 'reports'
subdirectory to keep everything straight. In addition, we will trim
down the output a bit.

<<output3>>=

## get a list containing the output for each comparison
out <- lapply(1:3, function(x) topTable(fit2, x))
## process the output to add images
htab <- lapply(1:3, function(x){
    tmp <- HTMLReport(gsub("_", " ", colnames(contrast)[x]), colnames(contrast)[x], "./reports")
    tmp2 <- makeImages(out[[x]], eset, grps, design, contrast, x)
    publish(tmp2$df, tmp, .modifyDF = list(affyLinks, entrezLinks))
    finish(tmp)
    return(tmp)
})

## Now make an index.html file to contain links to the various comps
idx <- HTMLReport("index", "Links to our stuff")
publish(hwriter::hwrite("Univariate comparisons", br = TRUE, header = 2), idx)
publish(Link(htab), idx)
finish(idx)


@ 

Now there will be an index.html file in the current
directory that has individual links to each of the three comparisons
we made. This is nice, as you only have to point a client or PI to a
single link that they can use to explore all the results.

We are often asked to create a Venn diagram showing overlap between
groups. This is pretty simple to do, but it would be nicer to have an
HTML version with clickable links, so your PI or end user can see what
genes are in each cell of the Venn diagram. As an example, we can
generate a Venn diagram comparing overlapping genes between the male
and female comparisons.

<<makevenn>>=

collist <- list(1:2)
venn <- makeVenn(fit2, contrast, design, collist = collist, adj.meth = "none")
vennlnk <- vennPage(venn, "venn_diagram", "Venn diagram")


@ 

The \Rfunction{makeVenn} function also returns a \Robject{vennCounts}
object that we can use in our \CRANpkg{knitr} document to generate a
Venn diagram there as well (\ref{fig:venn}).

<<venn, fig.cap="Venn diagram">>=

vennDiagram(venn[[1]]$venncounts, cex = 0.9)

@ 

And we can add a link to our index page quite easily.

<<addvenn>>=

idx <- HTMLReport("index","Links to our stuff")
publish(hwriter::hwrite("Univariate comparisons", br = TRUE, header = 2), idx)
publish(Link(htab), idx)
publish(hwriter::hwrite("Venn diagrams", br = TRUE, header = 2), idx)
publish(Link("Venn1", vennlnk), idx)
finish(idx)

@ 

There is similar functionality for presenting the results of a GO
hypergeometric analysis (\Rfunction{makeGoTable}), and GSEA analysis,
based on the \Rfunction{romer} function in \Biocpkg{limma}
(\Rfunction{runRomer} and \Rfunction{outputRomer}).

\section{Session information}
The version of R and packages loaded when creating this vignette were:
<<sessioninfo, results="asis">>=

toLatex(sessionInfo())

@ 


\end{document}


