\documentclass[12pt]{article}
\usepackage{fullpage}


\usepackage[pdftex,
bookmarks,
bookmarksopen,
pdfauthor={David Clayton},
pdftitle={snpMatrix-differences Vignette}]
{hyperref}

\title{Differences between snpStats and snpMatrix}
\author{David Clayton}
\date{\today}

\usepackage{Sweave}
\SweaveOpts{echo=TRUE, pdf=TRUE, eps=FALSE}

\begin{document}
\setkeys{Gin}{width=1.0\textwidth}

%\VignetteIndexEntry{snpMatrix-differences}
%\VignettePackage{snpStats}

\maketitle

\section*{The {\tt snpMatrix} and {\tt snpStats} packages}
The package ``{\tt snpMatrix}'' was written to provide data classes
and methods to
facilitate the analysis of whole genome association studies in R.
In the data classes it implements,
each genotype call is stored as a single byte and, at this density,
data for single chromosomes derived from large studies and new
high-throughput gene chip platforms can be handled in memory by modern
PCs and workstations.
The object-oriented programming model introduced with version 4 of
the S-plus package, usually termed ``S4 methods'', was used to
implement these classes.

{\tt snpStats} initially arose out of the need to store, and analyse, SNP 
genotype data in which subjects cannot be assigned to the three
possible genotypes with certainty. This necessitated a change in the
way in which data are stored internally, although {\tt snpStats} can
still handle conventionally called 
genotype data stored in the original {\tt snpMatrix} storage
mode. {\tt snpStats} currently lacks some facilities which were
present in {\tt snpMatrix} (although, hopefully, the important  gaps will 
soon be filled) but it also includes several  new
facilities. This vignette simply describes differences for
users converting from the old {\tt snpMatrix} package.

\section*{Classes}
Function names have, for the most part, remained unchanged so that
existing analysis scripts will continue to work with minimal modification. 
Initially it was hoped also to maintain the old class names since the
classes were (mostly) backwards-compatible. But this proved
troublesome and, in versions 1.1.4 and later,
the class names have been changed (see
Table~1).
\begin{table}[h]
  \centering
  \begin{tabular}{ll}
    \hline
    {\tt snpMatrix} class & {\tt snpStats} class\\
    \hline
    {\tt snp.matrix} & {\tt SnpMatrix}\\
    {\tt X.snp.matrix} & {\tt XSnpMatrix}\\
    {\tt single.snp.tests}& {\tt SingleSnpTests}\\
    {\tt single.snp.tests.score}& {\tt SingleSnpTestsScore}\\
    {\tt snp.tests.glm} & {\tt GlmTests}\\
    {\tt snp.tests.glm.score} & {\tt GlmTestsScore}\\
    {\tt snp.estimates.glm} & {\tt GlmEstimates}\\
    {\tt imputation.rules}&{\tt ImputationRules}\\
    \hline
  \end{tabular}
  \caption{Changes in class names}
\end{table}
Two functions have been provided to help users
convert objects of a {\tt snpMatrix} class to the corresponding {\tt
  snpStats} class:
\begin{itemize}
\item {\tt convert.snpMatrix}:  Converts a {\tt snpMatrix} object to
  the corresponding {\tt snpStats} class
\item {\tt convert.snpMatrix.dir}: Converts all saved {\tt snpMatrix}
  objects in a given directory
\end{itemize}
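
As a minimal sketch of how these two functions might be used, assuming
the argument names given on the {\tt ?convert.snpMatrix} help page: the
file name {\tt old-genotypes.RData}, the object name {\tt old.snps} and
the directory name {\tt saved-objects} are hypothetical.
\begin{verbatim}
library(snpStats)

# Hypothetical file saved under the old snpMatrix package;
# it is assumed to contain an object called old.snps
load("old-genotypes.RData")

# Convert a single object to the corresponding snpStats class
new.snps <- convert.snpMatrix(old.snps)
class(new.snps)

# Convert every saved object in a (hypothetical) directory;
# ext= is assumed to select the files by their extension
convert.snpMatrix.dir(dir = "saved-objects", ext = "RData")
\end{verbatim}
Since the directory version operates on saved files, it seems prudent
to try it first on a copy of the directory.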
\section*{Differences}

A major difference is that the basic class, now {\tt SnpMatrix}, 
supports uncertain genotypes, as generated by imputation programs. Two
classes have been removed, namely the {\tt snp} and {\tt X.snp}
classes. These were originally devised to support a loss of dimension
of a {\tt snp.matrix} or {\tt X.snp.matrix} due to selection of a
single row or column with {\tt drop=TRUE} in force in the selection
operator {\tt []}. However, these classes were never fully satisfactory
and were seldom used. In {\tt snpStats} the row and column selection
methods do not accept a {\tt drop=} argument, so dimensions are never dropped.
A word of warning, however: if {\tt drop=} is nevertheless supplied in the
selection operator, the object is treated as a simple matrix of type
{\tt raw}; this is the class that {\tt SnpMatrix} extends, and it does
allow {\tt drop=}.
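
The following sketch illustrates the behaviour just described, using
the {\tt testdata} dataset distributed with {\tt snpStats} (which
provides the {\tt SnpMatrix} object {\tt Autosomes}). The comments
state what the text above leads one to expect rather than verified
output.
\begin{verbatim}
library(snpStats)
data(testdata)

# Selection without drop=: the result should remain a SnpMatrix
sub <- Autosomes[1:5, 1:3]
class(sub)
dim(sub)

# Selecting a single column should still give a one-column SnpMatrix
one <- Autosomes[, 1]
dim(one)

# Supplying drop= bypasses the SnpMatrix method; according to the
# text, the object is then handled as a plain matrix of type raw
plain <- Autosomes[, 1, drop = TRUE]
class(plain)
\end{verbatim}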

There has been a cosmetic, but important, change in the {\tt
  XSnpMatrix} class as compared with its forerunner. The {\tt Female}
slot has been renamed as {\tt diploid} to emphasize that this class is
not only used for SNPs on the X chromosome, but for any SNP genotypes
which may be haploid; this includes SNPs on the Y chromosome and
mitochondrial SNPs. 
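
For illustration, the renamed slot can be inspected directly. This
sketch assumes the {\tt Xchromosome} object from the {\tt testdata}
dataset distributed with {\tt snpStats}.
\begin{verbatim}
library(snpStats)
data(testdata)

class(Xchromosome)           # an XSnpMatrix object

# One logical value per row (subject): TRUE where genotypes
# are diploid, FALSE where they are haploid
table(Xchromosome@diploid)
\end{verbatim}
Old scripts which refer to the {\tt Female} slot will need to be
edited to use {\tt diploid} instead.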

The functions for computing pairwise linkage disequilibrium statistics
have been replaced by a single rewritten function, {\tt ld}. In one mode
of use this function generates a large band matrix, which is stored
using the {\tt dsCMatrix} class defined in the {\tt Matrix} package
(which is now required).
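
A minimal sketch of a call to {\tt ld} is given below. It assumes the
{\tt depth} and {\tt stats} arguments described on the {\tt ?ld} help
page; the choice of 50 SNPs and a depth of 10 is arbitrary.
\begin{verbatim}
library(snpStats)
data(testdata)

# Take the first 50 SNPs of the example SnpMatrix
snps <- Autosomes[, 1:50]

# r-squared between each SNP and its 10 nearest successors;
# only this band of the full 50 x 50 matrix is computed
r2 <- ld(snps, depth = 10, stats = "R.squared")

class(r2)   # a sparse matrix class from the Matrix package
dim(r2)
\end{verbatim}
Restricting {\tt depth} keeps the result sparse, which is what makes
storage of LD statistics feasible for large numbers of SNPs.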

The function {\tt read.pedfile} has been rewritten, this time entirely
in R. It takes different arguments from the function of the same name in
{\tt snpMatrix} and may be somewhat slower, but it is more
flexible. 
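
As a sketch only, a call to the rewritten function might look as
follows. The file names {\tt study.ped} and {\tt study.map} are
hypothetical, and the use of the {\tt snps} argument and of the
elements of the returned list reflects our reading of the
{\tt ?read.pedfile} help page, which should be consulted for the
full argument list.
\begin{verbatim}
library(snpStats)

# Hypothetical input: a pedfile and its accompanying map file
ped <- read.pedfile("study.ped", snps = "study.map")

class(ped$genotypes)   # the genotype calls, as a SnpMatrix
head(ped$fam)          # subject-level columns of the pedfile
head(ped$map)          # SNP-level information
\end{verbatim}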

The {\tt ImputationRules} class has changed as a result
of the introduction of the new storage convention for uncertain genotypes. 
In the new coding,
uncertainty of calls is represented by (grouped) posterior
probabilities of assignment to the three genotypes. This change was
necessary because one of the imputation methods in {\tt snpMatrix} 
only produced a posterior expectation of the genotype (when coded 0, 1 or 2) 
and this could not be accommodated unambiguously in the extended coding. 
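
The grouping of posterior probabilities can be explored with the
helper functions {\tt post2g} and {\tt g2post}. The sketch below
assumes that these take and return three-column matrices of
probabilities, as described on their help page; the probabilities
themselves are made up for illustration.
\begin{verbatim}
library(snpStats)

# Made-up posterior probabilities for two calls (rows);
# columns correspond to the three possible genotypes
post <- matrix(c(0.90, 0.10, 0.00,
                 0.20, 0.50, 0.30),
               ncol = 3, byrow = TRUE)

g <- post2g(post)   # one-byte (raw) codes for the uncertain calls
g

# Decoding recovers the probabilities only up to the grouping
# imposed by the one-byte storage
g2post(g)
\end{verbatim}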

The {\tt GlmTests} and {\tt GlmTestsScore} classes (formerly {\tt
  snp.tests.glm} and {\tt snp.tests.glm.score}) have changed
slightly in order to accommodate ongoing work on methods for
multinomial and multivariate phenotypes. The {\tt test.names} slot has
been renamed as {\tt snp.names} and its function has been changed slightly
(although this should only affect more complicated uses of
{\tt snp.rhs.tests}). A new slot, {\tt var.names}, has been added; this
holds the name of the variable(s) tested against SNPs.
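
The new slots can be examined on any object returned by
{\tt snp.rhs.tests}. The following sketch assumes the
{\tt testdata} dataset distributed with {\tt snpStats}, in which
{\tt subject.data} holds the case/control indicator {\tt cc} and
the stratifying variable {\tt region}; restricting the call to the
first ten SNPs is an arbitrary choice.
\begin{verbatim}
library(snpStats)
data(testdata)

# Score tests of case/control status against each of the
# first 10 SNPs, stratified by region
tests <- snp.rhs.tests(cc ~ strata(region), family = "binomial",
                       data = subject.data, snp.data = Autosomes,
                       tests = 1:10)

class(tests)       # expected to be GlmTests
slotNames(tests)   # should include snp.names and var.names
\end{verbatim}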
\end{document}

