\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}              
\geometry{letterpaper}  
\usepackage[parfill]{parskip}    
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{amsmath}
\usepackage{dsfont}
\usepackage{tikz}
\usetikzlibrary{positioning,shapes.geometric}
\usepackage{mathrsfs}
\usepackage{hyperref}
\usepackage{enumerate}
\DeclareGraphicsRule{.tif}{png}{.png}{`convert #1 `dirname #1`/`basename #1 .tif`.png}
\usepackage{color}
\usepackage{placeins}
% \VignetteIndexEntry{Vignette for R package asmn}
% \VignetteDepends{asmn}
% \VignetteKeyword{normalization}
\usepackage{setspace}
\usepackage{cite}
\usepackage{authblk}
\usepackage{Sweave}

\title{Vignette for {\tt R} package {\tt asmn}}       
\author[1]{Anna Decker}
\author[2]{Paul Yousefi}
\affil[1]{University of California, Berkeley, Division of Biostatistics} 
\affil[2]{University of California, Berkeley, Department of Environmental Health Sciences}
\date{\today}

\begin{document}
\maketitle

\section{Introduction}
The {\tt asmn} package performs the all-sample mean normalization procedure for Illumina BeadArray 450k methylation data. The package does not provide a complete pipeline for normalizing raw data, but its functions accept data in the {\tt MethyLumiSet} format, so they integrate with existing pipelines for the analysis of methylation data. The functions also accept raw experimental and control data, together with feature information, from BeadStudio; these can be read in as {\tt data.frame} objects. 

The {\tt asmn} package is loaded by 
<<label="load", echo = TRUE>>=
library(asmn)
@
To access the help files, type {\tt help(package = asmn)} in the {\tt R} console. 

The example data come from the {\tt TCGAMethylation450k} package, and are loaded as a {\tt MethyLumiSet} object. The procedure to load the data is from a vignette for the {\tt methylumi} package. 

<<label = "data", echo = TRUE>>=
library("methylumi")
library("TCGAMethylation450k")
idatPath <- system.file('extdata/idat',package='TCGAMethylation450k')
mset450k <- methylumIDAT(getBarcodes(path=idatPath), idatPath=idatPath)
sampleNames(mset450k) <- paste0('TCGA', seq_along(sampleNames(mset450k)))
show(mset450k)
@
\section{Normalization factors}

The normalization factors are calculated using the control data for each subject. By default, the {\tt norm\_factors()} function uses the mean of all control samples to create the normalization factors. The output from this function is a list of length 2 (one element for each color channel), each element containing a vector of normalization factors equal in length to the number of subjects. 

Exactly one of {\tt controldata} or {\tt methylumidata} must be supplied; supplying both produces an error. {\tt controldata} is the raw control data, whereas {\tt methylumidata} is a {\tt MethyLumiSet} object, which may contain both control and experimental data. The {\tt subjects} argument is optional. Specifying a range of names or indices of subjects calculates the normalization factors using only the control data for those subjects rather than all samples. Finally, the {\tt type} argument must be either {\tt "raw"} or {\tt "methylumi"}, indicating the type of data being supplied. 

<<label=makenormfacs, echo = TRUE>>=
normfactors <- norm_factors(controldata=NULL, 
                            subjects=NULL, 
                            methylumidata=mset450k, 
                            type="methylumi")
str(normfactors)
normfactors
@
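
As a minimal sketch of the optional {\tt subjects} argument described above, the chunk below restricts the calculation to the first two samples. Passing integer indices {\tt 1:2} and the object name {\tt normfactors\_sub} are illustrative assumptions based on the description of the argument, so the chunk is not evaluated when the vignette is built.

<<label=subsetnormfacs, echo = TRUE, eval = FALSE>>=
# Sketch only: use the control data of the first two samples.
# Supplying integer indices is an assumption based on the description
# of the subjects argument; sample names may be used instead.
normfactors_sub <- norm_factors(controldata = NULL,
                                subjects = 1:2,
                                methylumidata = mset450k,
                                type = "methylumi")
# Inspect the structure of the returned list (one element per color channel).
str(normfactors_sub)
@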

\section{Normalization}

The normalization factors can then be used in the {\tt normalize\_asmn()} function. For data of type {\tt MethyLumiSet}, this function returns the data object with the normalized data in the {\tt betas()} slot. 

The {\tt normfactors} argument is the output from {\tt norm\_factors()}. One of {\tt rawdata} or {\tt methylumidata} must be supplied, depending on the format of the data set to be normalized (as when creating the normalization factors). 
The {\tt type} argument must be either {\tt "raw"} or {\tt "methylumi"}, indicating the type of data to be normalized. If {\tt type = "raw"}, then the {\tt featuredata} argument must also be supplied, containing information on the assay type and color channel for each probe. For {\tt MethyLumiSet} data, this information is stored in the {\tt featureData} slot. 

<<normalize, echo = TRUE>>=
featureData(mset450k)
str(fData(mset450k))
normdata <- normalize_asmn(normfactors = normfactors, 
                      rawdata=NULL, 
                      featuredata=NULL, 
                      methylumidata=mset450k, 
                      type="methylumi")
show(normdata)
@
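
One way to check the result is to compare the beta values before and after normalization with the {\tt betas()} accessor from {\tt methylumi}. The chunk below is a sketch rather than part of the package workflow: it assumes that {\tt normdata} keeps the same probes and samples, in the same order, as {\tt mset450k}, and it is not evaluated when the vignette is built.

<<label=inspectbetas, echo = TRUE, eval = FALSE>>=
# Beta value matrices (probes in rows, samples in columns).
beta_before <- betas(mset450k)
beta_after  <- betas(normdata)
# Check the dimensions first; the comparison below assumes they agree.
dim(beta_before)
dim(beta_after)
# Distribution of the change in beta values introduced by normalization.
summary(as.vector(beta_after - beta_before))
@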

For raw BeadStudio data, this function returns a {\tt data.frame} of the beta values ordered by CpG site identifier. The methylated and unmethylated sites must be identified by {\tt SignalA} and {\tt SignalB} headers, following the BeadStudio documentation. 
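
The raw-data workflow might look like the sketch below, which uses only the arguments described in this vignette. The objects {\tt controlDF}, {\tt rawDF} and {\tt featureDF} are hypothetical placeholders for {\tt data.frame} objects read from BeadStudio exports; they are not provided by the package, so the chunk is not evaluated when the vignette is built.

<<label=rawsketch, echo = TRUE, eval = FALSE>>=
# Hypothetical inputs: controlDF (raw control data), rawDF (raw
# experimental data with SignalA/SignalB columns) and featureDF
# (assay type and color channel per probe) are placeholders that
# must be read from BeadStudio output beforehand.
rawfactors <- norm_factors(controldata = controlDF,
                           subjects = NULL,
                           methylumidata = NULL,
                           type = "raw")
rawnorm <- normalize_asmn(normfactors = rawfactors,
                          rawdata = rawDF,
                          featuredata = featureDF,
                          methylumidata = NULL,
                          type = "raw")
# A data.frame of beta values ordered by CpG site identifier.
head(rawnorm)
@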

\section{Other data types}
Coercing the resulting {\tt MethyLumiSet} object to a different data type after normalization can be achieved using the {\tt as()} function. See the vignette for {\tt methylumi} for other examples. 

For example, to coerce the normalized data set from above into a {\tt MethyLumiM} object:
<<coerce, echo= TRUE>>=
normdataM <- as(normdata, 'MethyLumiM')
show(normdataM)
@

\end{document}