%\VignetteIndexEntry{TypeInfo R News}
%\VignetteKeywords{type specification argument return}
%\VignettePackage{TypeInfo}

\documentclass[a4paper]{report}
\usepackage{Rnews}
%% \usepackage[round]{natbib}
%% \bibliographystyle{abbrvnat}

\usepackage{graphicx}

\newcommand{\TypeInfo}{\pkg{TypeInfo}}
\newcommand{\STS}{\code{SimultaneousTypeSpecification}}
\newcommand{\ITS}{\code{IndependentTypeSpecification}}
\newcommand{\TypedSignature}{\code{TypedSignature}}
\newcommand{\STT}{\code{StrictIsTypeTest}}
\newcommand{\DTT}{\code{DynamicTypeTest}}
\newcommand{\ITT}{\code{InheritsTypeTest}}
\newcommand{\oneWayAnova}{\command{oneWayAnova}}

\begin{document}
\begin{article}

<<echo=FALSE>>=
options(width=45)
@ 

\title{Type specification for your functions}
\author{by MT Morgan, Seth Falcon, Robert Gentleman, and Duncan Temple Lang}

\maketitle

You've written some amazing \R{} functions.  How can others, even
non-\R{} users, benefit from your hard work?  Maybe you can make it
easy for other programmers to learn about the arguments and return
values of your function?  Perhaps a web-based form or dialog box, like
those provided by \pkg{widgetInvoke}, would help users choose
appropriate arguments?  These objectives are easier to obtain when
\R{} functions provide information about themselves.

The \TypeInfo{} package annotates functions with information about
argument and return types.  \TypeInfo{} automatically checks that your
function is called with appropriate arguments.  You can then focus on
writing the code in the body of your function, rather than checking
values supplied by users.  Other \R{} programmers can ask functions
about their argument and return types.  This `reflection' opens the
door to creative possibilities, for instance automatically creating a
work flow (perhaps a graphical `wizard'?) chaining function calls
together into a complicated overall analysis.

This article illustrate how to use \TypeInfo{} to specify argument and
return types.  We start with straight-forward ways of applying type
information.  Then we illustrate the flexibility of \TypeInfo{} for
apply complicated type checks, including types satisfying arbitrary
\R{} expressions.  The article concludes with a brief look behind the
scenes to expose limitations of \TypeInfo{}, and to highlight
opportunities for using typed functions in advanced aspects of your
own work.

\section{The basics: applying \command{typeInfo}}

Suppose your colleagues clamor for a function to perform one-way
analysis of variance on data where the predictor is a \code{factor}.
Easily done in \R{} with \code{lm}, but the \R{} formula notation
might be more than needed for our simple function.  To help our
colleagues, we simplify the interface to refer to \code{response} and
\code{predictor} variables.  Many users expect an ANOVA table as
output, so we return the result of \command{anova} rather than
\command{lm}.  Here is our function:

<<oneWayAnova>>=
oneWayAnova <- function( response, predictor ) {
    expr <- substitute( response ~ predictor )
    result <- lm( as.formula( expr ))
    anova( result )
}
copyOfOneWayAnova <- oneWayAnova
@ 
% 
We make a copy of the function definition to conveniently re-apply
different type information, as will become apparent below.  To test
our function, we use data from the help page for \command{lm}:

<<analysis1>>=
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
oneWayAnova( weight, group )
@ 

\subsection{Applying type checks} 

We want to make sure our users enter the right kinds of arguments.  To
do this, we add type information to make sure that \code{response} is
\code{numeric}, and \code{predictor} a \code{factor}.  We start by
loading the \TypeInfo{} library\ldots

<<>>=
library(TypeInfo)
@ 
% 
and, after defining \code{oneWayAnova},  add type information:
% 
<<keep.source=TRUE,eval=TRUE,results=hide>>=
typeInfo(oneWayAnova) <-
  SimultaneousTypeSpecification(
    TypedSignature(
      response = "numeric",
      predictor = "factor"),
    returnType = "anova")
@ 
% 

The command \STS{} means that we want to impose a set of conditions
that apply simultaneously to all arguments (and, optionally, the
return type).  \TypedSignature{} corresponds to one set of conditions.
The structure of \TypedSignature{} is a list of argument names and
their types.  Specifying \code{returnType} advertises that our
function returns an object of type \code{anova}, and checks that it
really does return this type.  The reason for placing
\code{returnType} after \TypedSignature{} is clarified below.

Using a typed function is exactly the same as using an untyped function:
% 
<<typedFunc>>=
oneWayAnova( weight, group )
@ 

\subsection{Finding out about type information}

Once applied, functions can be queried with \command{typeInfo}.
<<introspection, eval=FALSE>>=
typeInfo(oneWayAnova)
@ 
\begin{figure*}[htbp]
<<typeInfoOutupt>>=
typeInfo(oneWayAnova)
ngroup <- as.numeric( group )
res <- 
    tryCatch(oneWayAnova( weight, ngroup ),
             error=function(err) {
                 cat("Error:", 
                     conditionMessage(err), "\n")
             })
@ 
  \caption{Finding out about type information, and the informative
  consequences of supplying incorrect arguments.}
  \label{fig:TypeInfoOutput}
\end{figure*}

The output, in Figure~\ref{fig:TypeInfoOutput}, illustrates how type
information is stored.  \command{typeInfo} returns an object of class
\STS{}.  The object contains a list of objects of class
\code{TypedSignature}, and a \code{returnType} slot.  Each
\TypedSignature{} is a list with entries for each element with type
specification.

The information returned from \command{typeInfo} provides useful
information as-is.  It can also be parsed by computer code to provide
information useful in creation of graphical widgets or other
interfaces.

\subsection{Incorrect arguments}

When the user supplies incorrect data, e.g., representing the
\code{predictor} as \code{numeric} rather than \code{factor} 
% 
<<sgroup, eval=FALSE>>=
ngroup <- as.numeric( group )
res <- try( oneWayAnova( weight, ngroup ))
@ 
% 
\TypeInfo{} intervenes with the error show in
Figure~\ref{fig:TypeInfoOutput}.  Providing \TypeInfo{} is helpful, as
our user might otherwise have performed a linear regression rather
than fixed-effects ANOVA.

\subsection{Elaborating on type signatures}

We might decide that, for our purposes, the \code{predictor} can be
either \code{factor} or \code{numeric}.  We change the \STS{} to
include another \TypedSignature{} (re-applying \command{typeInfo} does
\emph{not} automatically overwrite previous type specifications, so we
must use \command{typeInfo} on a fresh version of \code{oneWayAnova}):
% 
<<keep.source=TRUE>>=
oneWayAnova <- copyOfOneWayAnova
typeInfo(oneWayAnova) <-
  SimultaneousTypeSpecification(
    TypedSignature(
      response = "numeric",
      predictor = "factor"),
    TypedSignature(
      response = "numeric",
      predictor = "numeric"),
    returnType = "anova")
@ 

This starts to show the flexibility of \TypeInfo{}.  \STS{} allows for
more than one \TypedSignature{}.  At least one of the
\TypedSignature{}s must be correct for the function to be evaluated.
Conceptually, \STS{} performs a logical OR operation across the
\TypedSignature{}s.  On the other hand, each \TypedSignature{}
specifies conditions that must all apply.  \TypedSignature{} performs
a logical AND on its elements.  In the example here, regardless of
argument type, the function returns an object of class \code{anova}.

\section{Flexible \TypeInfo{}}

The presentation so far emphasizes the sort of basic type
specification that is likely to be most useful when making \R{}
functions available to other programming languages.  \TypeInfo{}
offers a range of methods for validating type that can be very useful
for \R{} programmers, but that employ concepts not readily translated
into other languages.  A sampling of these are presented here, along
with additional detail about the application of type specification .

\subsection{\ITT{}}

Notice in the example above that arguments are labeled with character
strings of type names.  A type specification of class \code{character}
corresponds to an \ITT{}, as indicated explicitly for the
\code{returnType} specification.  An \ITT{} requires that the object
belong to, or extends, the specified class.  For instance, the values
passed to the function in the \code{response} variable must return
\code{TRUE} from the test \code{is(response, "numeric")}.  Because of
this, \command{oneWayAnova} works with \code{response} as either
\code{numeric} or \code{integer}.
% 
<<integerVals>>=
iweight <- as.integer( weight )
oneWayAnova( iweight, group )
@ 

\subsection{\STT{} and \DTT{}}

What other ways does \TypeInfo{} offer to specify type?  \STT{}
requires an exact match between the class of an object and the
specified class(es).  To specifying a strict match for \code{response}
and \code{returnValue}, but an inherited match for \code{predictor},
write
% 
<<strictIs, keep.source=TRUE,eval=FALSE>>=
oneWayAnova <- copyOfOneWayAnova
typeInfo(oneWayAnova) <-
  SimultaneousTypeSpecification(
    TypedSignature(
      response = StrictIsTypeTest("numeric"),
      predictor = InheritsTypeTest("factor")),
    returnType = StrictIsTypeTest("anova"))
oneWayAnova(iweight, group) # ERROR
@ 
% 
Both \STT and \ITT{} accept a vector of class names.

The \DTT{} allows evaluation of arbitrary expressions during type
checking.  For instance, the ANOVA anticipates that the length of
\code{predictor} is the same as the length of \code{response}:
% 
<<dynamicTypeTest, keep.source=TRUE, eval=FALSE>>=
typeInfo(oneWayAnova) <-
  SimultaneousTypeSpecification(
    TypedSignature(
      response = "numeric",
      predictor = quote(
        length(predictor) == 
            length(response) &&
        is(predictor, "factor"))),
    returnType = StrictIsTypeTest("anova"))
short <- weight[-1]
oneWayAnova(short, group)   # ERROR
@ 
% 
Note that the expression in \DTT{} has access to argument names, and
uses \command{quote} to protect premature evaluation.  \DTT{} can also
be used in the return statement.

\subsection{Return types}

As written here, the \code{returnType} applies to all
\TypedSignature's.  That is, the function always returns an
\code{anova} object, regardless of argument type.  Actually, each
\TypedSignature{} can have its own \code{returnType}, allowing for a
return type that depends on argument type.

\subsection{\ITS{}}

We saw how several \TypedSignature{} statements allow different types
for \code{predictor}.  \ITS{} provides another mechanism to specifying
alternative types:
% 
<<IndependentTypeSpecification, keep.source=TRUE>>=
oneWayAnova <- copyOfOneWayAnova
typeInfo(oneWayAnova) <-
  IndependentTypeSpecification(
    response = "numeric",
    predictor = c( "factor", "numeric" ))
@ 

\ITS{} expects a list of argument names, each with a character vector
of possible types.  While \STS{} performs a logical OR across each
\TypedSignature{}, \ITS{} performs logical OR within arguments.

\section[Behind the scenes]{Behind the scenes: \TypeInfo{} limitations
and opportunities} 

Several aspects of \TypeInfo{} provide opportunities for creative
application.  Type tests in \TypeInfo{} form a hierarchy of S4
classes.  This makes it easy to transform \code{typeInfo} output to
structures or text representations that interface with other packages
or programming languages.  The approach is to specify methods that
traverse the \TypeInfo{} hierarchy, transforming \TypeInfo{} objects
into the desired format.  \ITT{} and \STT{} rely on named classes,
including user-defined S4 classes.  Coupled with methods for parsing
S4 objects (e.g., in the \pkg{XML} package), this provides one route
to automatically generating \pkg{widgetInvoke} graphical interfaces or
bindings for web-based services.

\TypeInfo{} is not a universal solution.  Perhaps the biggest
limitation is that \TypeInfo{} does not deal with S4 methods.  The
rationale is that arguments used for method dispatch must already
satisfy type criteria.  However, S4 methods may contain arguments that
are not used for method dispatch, and the type of these arguments
cannot be typed.  S4 method dispatch imposes a test equivalent only to
\STS{} in conjunction with \code{InheritsTypeTest}, rather than
allowing the flexibility of \TypeInfo{}.  In addition, the
\code{valueClass} argument of \command{setMethod} allows specification
but not checking of the return type.  A useful extension would enable
\command{typeInfo} to query S4 generics, providing a uniform interface
to retrieving type information.

Conceptually, \TypeInfo{} works by inserting type-checking code at the
first line of the function, and after (implicit or explicit)
\code{return} statements.  The extra code requires evaluation, slowing
execution and making \TypeInfo{} inappropriate in situations where
speed is of the essence.  In such situation, present a public
type-checked `wrapper' to help ensures only correct argument types
reach the speed-critical functions.  \TypeInfo{} does not usually have
side-effects, but a poorly written \DTT{} could alter argument or
return values unexpectedly.

\section{Final hints and tips}

We have seen how \TypeInfo{} provides reflection, annotating function
definitions with information to check argument types.  The reflection
provided by \TypeInfo{} has several advantages.  The user is informed
of incorrect arguments, avoiding possibly subtle errors during
function execution.  The programmer is free to focus on the body of
the function, rather than argument type checking.  Other programmers
can determine argument requirements or return values without detailed
inspection of code or documentation.  These and other advantages
suggest application of \TypeInfo{} as a way to enhance the
effectiveness of your R programming.

For many purposes, the combination of \STS{}, \TypedSignature{}, and a
single \code{returnType} is the best way to use \TypeInfo{}.  This
results in type signatures that translate readily into prototype
concepts in other programming languages, making it easier for both R
and non-R users to benefit from the functions you create.  More
elaborate formulations, especially \DTT{}, are unlikely to be useful
outside the R community, and may unnecessarily blur the distinction
between type information, argument validation, and function execution.
On the other hand, clearly specified argument types may aid rigorous
formal testing (e.g., unit and regression tests) of return values
prior to package release.

\TypeInfo{} can be useful for all functions, but is especially
beneficial for functions exported in a package NAMESPACE.  Don't
forget to add \TypeInfo{} to the `Depends' field of your package
DESCRIPTION file!

\end{article}
\end{document}

%% 

\begin{thebibliography}{1}
\expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
\expandafter\ifx\csname url\endcsname\relax
  \def\url#1{{\tt #1}}\fi

\bibitem[Ihaka and Gentleman(1996)]{R:Ihaka+Gentleman:1996}
R.~Ihaka and R.~Gentleman.
\newblock R: A language for data analysis and graphics.
\newblock {\em Journal of Computational and Graphical Statistics}, 5\penalty0
  (3):\penalty0 299--314, 1996.
\newblock URL \url{http://www.amstat.org/publications/jcgs/}.

\end{thebibliography}
