%\VignetteIndexEntry{Enabling packages as web services}
%\VignetteKeywords{Web services}
%\VignettePackage{RWebServices}

\documentclass[]{article}

\usepackage[colorlinks,linkcolor=blue,pagecolor=blue,urlcolor=blue]{hyperref}
\usepackage{graphicx}
\usepackage{Sweave}

\newcommand{\lang}[1]{{\texttt{#1}}}
\newcommand{\pkg}[1]{{\textsf{#1}}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\func}[1]{{\texttt{#1}}}
\newcommand{\method}[1]{{\texttt{#1}}}
\renewcommand{\arg}[1]{{\texttt{#1}}}
\newcommand{\ret}[1]{{\texttt{#1}}}
\newcommand{\obj}[1]{{\texttt{#1}}}
\newcommand{\class}[1]{{\textit{#1}}}

\newcommand{\R}{\textsf{R}}
\newcommand{\Java}{\textsf{Java}}
\newcommand{\caBIG}{\textsf{caBIG}}
\newcommand{\caGrid}{\textsf{caGrid}}
\newcommand{\introduce}{\pkg{introduce}}
\newcommand{\Globus}{\textsf{Globus}}
\newcommand{\activeMQ}{\textsf{activeMQ}}

\newcommand{\RWebServices}{\pkg{RWebServices}}
\newcommand{\TypeInfo}{\pkg{TypeInfo}}
\newcommand{\SJava}{\pkg{SJava}}

\newcommand{\file}[1]{\texttt{#1}}

\newcommand{\STS}{\code{SimultaneousTypeSpecification}}
\newcommand{\ITS}{\code{IndependentTypeSpecification}}
\newcommand{\TypedSignature}{\code{TypedSignature}}
\newcommand{\STT}{\code{StrictIsTypeTest}}
\newcommand{\DTT}{\code{DynamicTypeTest}}
\newcommand{\ITT}{\code{InheritsTypeTest}}

%% \newcommand{\STS}{\func{SimultaneousTypeSpecification}}
%% \newcommand{\ITS}{\func{IndependentTypeSpecification}}
%% \newcommand{\TypedSignature}{\func{TypedSignature}}
%% \newcommand{\STT}{\func{StrictIsTypeTest}}
%% \newcommand{\DTT}{\func{DynamicTypeTest}}
%% \newcommand{\ITT}{\func{InheritsTypeTest}}

\begin{document}

\title{Enabling \R{} packages for web or grid services}
\author{
  Martin T. Morgan\footnote{Fred Hutchinson Cancer Research Center, 
    1100 Fairview Ave.\ N., PO Box 19024 Seattle, WA 98109},
  Nianhua Li, 
  Seth Falcon,\\
  Robert Gentleman
}
\date{30 November, 2006, 20 March, 2007}
\maketitle

<<echo=FALSE>>=
options(width=69)
@ 

\section{Preliminaries}

\subsection{Prerequisites}

\RWebServices{} and associated software must be installed; see the
accompanying documentation ``Installing and testing RWebServices and
enabled packages''.

You must have a valid \R{} package, including NAMESPACE file. See the
Writing \R{} Extensions manual.

All \R{} objects to be translated to \Java{} \emph{must} be either
primitive types (e.g., numeric, character) or instances of S4 classes.

\section{Creating \Java{} templates}

\subsection{\TypeInfo}

Add type information to your functions.
\begin{enumerate}
\item Include \TypeInfo{} in the `Depends' field of the DESCRIPTION file.
\item Provide \func{typeInfo} for each method to be exposed. From the
  \pkg{caDNAcopy} package, an example is:
<<typeinfo-caDNAcopy, eval=FALSE, keep.source=TRUE>>=
typeInfo(caDNAcopy) <-
  SimultaneousTypeSpecification(
    TypedSignature(dnacopyAssays= "DNAcopyAssays",
                   dnacopyParameter="DNAcopyParameter"),
    returnType="DerivedDNAcopySegment")
@ 
%% 
Provide this information within the package, in a `.R' file after the
corresponding function (\func{caDNAcopy}) has been defined. See
documentation and vignettes in the \TypeInfo{} package for
details.
\item Install the package, e.g., 
\begin{verbatim}
R CMD INSTALL --clean <pkg>
\end{verbatim}
where \verb|<pkg>| is the name of your package. This can also be done
from within \R{} using \func{install.packages} or other means.
\end{enumerate}

\subsection{Unpack ant scripts}

Unpack ant scripts with the \R{} \func{unpackAntScript} command, or at
the command line with
\begin{verbatim}
R -e "library(RWebServices); unpackAntScript('~/tmp/<pkg>')"
\end{verbatim}
where \verb|~/tmp/<pkg>| is the path to a temporary directory.

\subsection{Create \Java{} templates}

There are several ways of proceeding. One way is to use
\func{createMap} from within \R{}. A second way is to change to the
directory where the ant scripts were unpacked, and evaluate
\begin{verbatim}
cd ~/tmp/<pkg>
ant -Dpkg=<pkg> map-package
\end{verbatim}
(\verb|~/tmp/<pkg>| is the directory where the ant scripts were unpacked).
Both methods create a directory hierarchy \verb|src/|, and usually
\verb|test/src|. 

Sometimes additional \Java{} templates may be required for extra \R{} data types.
Suppose your function returns a \class{list} of \class{DerivedDNAcopySegment}.
Your type information only shows \verb|returnType="list"|, but
you need the \Java{} templates of \class{DerivedDNAcopySegment}. If you use
\func{createMap} within \R{}, use the argument \arg{extraClasses}. If you use the
ant scripts, set the property \arg{extra.classes} in
\verb|~/tmp/<pkg>/RWebServicesTuning.properties| to \arg{DerivedDNAcopySegment}.
You can also specify multiple \R{} data types as extra classes in a
comma-delimited character string.

\section{Writing and running tests}

\subsection{Writing test code -- data}

The files
\begin{verbatim}
test/src/org/bioconductor/rserviceJms/worker/RWorkerDataTest.java
test/src/org/bioconductor/rserviceJms/worker/R/*.R
test/src/org/bioconductor/rserviceJms/worker/Data/*.data
\end{verbatim}
contain skeletons to help generate \Java{} and \R{} components for testing
data transfer between \R{} and \Java{}. Templates are established for tests
from \Java{} to \R{} for all function arguments, and from \R{} to \Java{} for all
return values. If any extra classes are specified, their tests are
established in both directions. 

The \Java{} code for testing uses the \pkg{JUnit} framework. A typical
method starts with
\begin{verbatim}
@Ignore("please initialize data")
@Test
public void TestDNAcopyParameterToR() throws Exception {
    org.bioconductor.packages.caDNAcopy.DNAcopyParameter 
        inputVal = null;
    inputVal = new ...
    String rScript = 
        getClass().getResource("R/DNAcopyParameterData.R").getFile();
    String rVariable = "DNAcopyParameterData";
    assertTrue(myService.mockJava2R(inputVal, rScript, rVariable));
}
\end{verbatim}
The first two lines are annotations for \pkg{JUnit}. The test
framework will arrange to pass \obj{inputVal} to \R{}, and use the value
of the variable \obj{rVariable} in \obj{rScript} to assess whether
the data transfer is successful.  The developer needs to customize
\obj{inputVal} and the source file (in the \verb|test/src| hierarchy).
Comment out \verb|@Ignore| to enable the test.

Serialized data instances can be added to the \verb|Data| directory. 
Brave users can even render serialized \Java{} data instances from \R{} 
data instances. Save \R{} objects into binary files, and put them in
one directory, say \verb|<data_dir>|, and then evaluate: 
\begin{verbatim}
cd ~/tmp/<pkg>
ant create-data -Daction=load -Ddata.dir=<data_dir>
\end{verbatim}
The ant task transfers those \R{} objects into \Java{} objects and saves
them into binary files in the same directory. You can then use the serialized 
\Java{} data in the test. This task requires \R{}-to-\Java{} converters
for the \R{} objects. These converters are not created for function
arguments, so make sure your \R{} objects are either a function return
type or an extra class. An alternative task
\begin{verbatim}
ant create-data -Daction=data -Ddata.name=<dataset_name> \
        -Ddata.dir=<data_dir>
\end{verbatim}
invokes the \R{} function \func{data} with argument \arg{<dataset\_name>},
and saves the serialized \Java{} data in \arg{<data\_dir>}.  The 
default \arg{<data\_dir>} for the task \func{create-data} is 
\verb|~/tmp/<pkg>/test/src/org/bioconductor/rservicesJms/worker/Data|. 

The argument \arg{action} in this ant task corresponds to the \R{} functions
\func{load} and \func{data}, respectively. If the \R{} object is provided
by the package, you can use \arg{action=data} and provide the object
name as argument \arg{data.name}. Using \arg{action=load} is more useful
for loading your own data files or for loading multiple files.

The meaning of the argument \arg{data.dir} depends on the \arg{action}
type. When \arg{action} is \verb|load|, \arg{data.dir} is the path for
both the input \R{} data files and the output \Java{} data files. Both
absolute and relative paths work, but make sure that all the files
in \arg{data.dir} are \R{} data files when you invoke the ant task.
When \arg{action} is \verb|data|, \arg{data.dir} is the path for the
output \Java{} data file. The argument \arg{data.name} is only used when
\arg{action} is \verb|data|, and it has to be an \R{} object name,
not an \R{} data file name.
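
To make the idea of a serialized \Java{} data instance concrete, the
following minimal sketch writes an object to a \file{.data} file and
reads it back. It assumes that data instances are stored with standard
\Java{} object serialization, which is not confirmed here;
\code{SegmentBean}, \code{DataInstanceSketch} and the file name are
hypothetical stand-ins for a generated data bean and its data file.
\begin{verbatim}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for a generated data bean.
final class SegmentBean implements Serializable {
    private static final long serialVersionUID = 1L;
    final String sampleId;
    final double[] segmentMeans;

    SegmentBean(String sampleId, double[] segmentMeans) {
        this.sampleId = sampleId;
        this.segmentMeans = segmentMeans.clone();
    }
}

final class DataInstanceSketch {
    // Assumption: data files use standard Java serialization.
    static void write(Path file, SegmentBean bean)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                 Files.newOutputStream(file))) {
            out.writeObject(bean);
        }
    }

    // Read the bean back, e.g., to initialize inputVal in a test.
    static SegmentBean read(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                 Files.newInputStream(file))) {
            return (SegmentBean) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("SegmentBean", ".data");
        write(file, new SegmentBean("s1", new double[] {0.1, -0.4}));
        SegmentBean copy = read(file);
        System.out.println(copy.sampleId + " "
            + Arrays.toString(copy.segmentMeans));
        Files.delete(file);
    }
}
\end{verbatim}
In a real test the object read back would be one of the classes under
\verb|org.bioconductor.packages|, not the stand-in used here.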

\subsection{Writing test code -- methods}

The file
\begin{verbatim}
test/src/org/bioconductor/rserviceJms/services/<pkg>.java
\end{verbatim}
contains a template for writing test methods. The methods in this
class arrange for input parameters to be provided by the developer,
and for the corresponding \R{} function to be invoked. The developer is
free to implement tests on the return value; the default is to compare
the return value with an expected value provided by the developer. 
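
When the return value contains floating-point numbers computed in
\R{}, exact equality can be too strict. The following is a minimal
sketch of a tolerance-based comparison that a test method might use;
the helper \code{ApproxCompare} and the tolerance are illustrative and
are not part of the generated template.
\begin{verbatim}
final class ApproxCompare {
    // True if both arrays have the same length and each pair of
    // elements differs by at most tol; NaN matches only NaN.
    static boolean approxEquals(double[] expected, double[] actual,
                                double tol) {
        if (expected == null || actual == null) {
            return expected == actual;
        }
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            double e = expected[i];
            double a = actual[i];
            if (Double.isNaN(e) || Double.isNaN(a)) {
                if (!(Double.isNaN(e) && Double.isNaN(a))) {
                    return false;
                }
            } else if (Math.abs(e - a) > tol) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] expected = {0.5, 1.25};
        double[] actual = {0.5 + 1e-12, 1.25};
        System.out.println(approxEquals(expected, actual, 1e-9));
    }
}
\end{verbatim}
Within a \pkg{JUnit} 4 test, \code{assertArrayEquals} with a delta
argument serves a similar purpose for arrays of doubles.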

\subsection{Running tests}

Tests require (1) a running activemq, (2) a `worker' to perform
calculations, and (3) the \Java{} program to run the tests. The strategy
(to be refined) is:
\begin{enumerate}
\item Open a terminal window and start activemq
\begin{verbatim}
cd $JMS_HOME
bin/activemq
\end{verbatim}
(alternatives are in the activemq documentation.)
\item Open another terminal window, compile the test and package
  source code, and start the worker:
\begin{verbatim}
cd ~/tmp/<pkg>
ant precompile start-worker
\end{verbatim}
  Several files should be compiled, and the worker should start. The
  ant task will remain active.
\item Finally, open a third terminal window and run the test program:
\begin{verbatim}
cd ~/tmp/<pkg>
ant local-test
\end{verbatim}
  The test files will be compiled and executed.
\end{enumerate}
As the test program executes, any output directed toward stderr in
\R{} (warnings or errors) will appear in the `worker' window.
Java-based errors (e.g., failed unit tests or explicit print
statements) in the test code are echoed in the local-test console, or
printed in the test output directory, \verb|test/output|.
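
Before starting the worker and the tests, it can be convenient to
confirm that the broker is reachable. The sketch below only checks
whether a TCP port accepts connections. The host and port are
assumptions about the local configuration (61616 is the conventional
default of the \activeMQ{} TCP transport), and \code{BrokerCheck} is a
hypothetical helper, not part of \RWebServices{}.
\begin{verbatim}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class BrokerCheck {
    // True if a TCP connection to host:port succeeds within
    // timeoutMillis; says nothing about the broker's health.
    static boolean isListening(String host, int port,
                               int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port),
                           timeoutMillis);
            return true;
        } catch (IOException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed host and port; adjust to the local setup.
        boolean up = isListening("localhost", 61616, 1000);
        System.out.println(up ? "broker port is open"
                              : "broker port is closed");
    }
}
\end{verbatim}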

\section{Creating web services from \Java{} templates}

The \Java{} code you have now is a standard \Java{} application.
Converting it into a web service application allows your functions to
be accessed remotely in a platform- and implementation-independent
way.  This process is enabled by
\href{http://ws.apache.org/axis/}{Apache Axis}, a \Java{} platform for
creating and deploying web service applications. Please make sure
Apache Axis is correctly installed and deployed. If you have no
existing web server, use \href{http://tomcat.apache.org/}{Apache
  Tomcat} as a starting point. Please also specify related properties
in \verb|~/tmp/<pkg>/RWebServicesEnv.properties|.

\subsection{Creating web services}
\begin{enumerate}
\item Create WSDL from \Java{} code and \Java{} templates from WSDL 
\begin{verbatim}
cd ~/tmp/<pkg>
ant gen-wsdl
\end{verbatim}
The outputs in \verb|~/tmp/<pkg>| are: 
\begin{verbatim}
wsdl/*.wsdl
wsdl/org/bioconductor/packages/*/*.java
wsdl/org/bioconductor/rservicesJms/services/*/*
\end{verbatim}
The file \verb|*.wsdl| is written in WSDL, the
\href{http://www.w3.org/TR/wsdl}{Web Service Description Language}. It
specifies the type information of your functions, and defines all
related data types. It is the agreement between the web service server
and client for service invocations. The file is generated by a tool
called Java2WSDL from Axis by extracting information from your \Java{}
code. Advanced users can customize the WSDL style via properties
\arg{wsdl.style} and \arg{wsdl.use} in
\verb|~/tmp/<pkg>/RWebServicesTuning.properties|. The default is
\code{Document/literal wrapped}.
\href{http://www-128.ibm.com/developerworks/webservices/library/ws-whichwsdl/}{More
  information} about WSDL style is available.

All other \Java{} files in directory \verb|wsdl| are generated by a
tool called WSDL2Java from Axis by extracting information from the
WSDL file.  \file{wsdl/org/bioconductor/rservicesJms/services/*/*}
contains server binding skeletons, client binding stubs and a test
template. The stubs and skeletons handle all the low-level details of
the remote method invocation. They allow seamless interactions between
your \Java{} application, Axis and web service clients.
\file{wsdl/org/bioconductor/packages/*/*.java} are \Java{}
implementations for the data type definitions in WSDL.

\item Create web service server and web service client

The outputs from WSDL2Java need to be connected with your \Java{} code.
\begin{verbatim}
cd ~/tmp/<pkg>
ant mkserver
ant mkclient
\end{verbatim}
  Two directories are created: \verb|server| and \verb|client|, to hold
all data for the web service server and client respectively. The client
is only for testing purposes. Users of your web service can create
a client from the WSDL file using any suitable tool or programming language.
\end{enumerate}

The ant tasks gen-wsdl, mkserver and mkclient can also be invoked in one
composite task:
\begin{verbatim}
cd ~/tmp/<pkg>
ant ws
\end{verbatim}
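
Since the WSDL file is the contract between server and client, it can
help to see which operations it declares. The following sketch lists
the operation names under each \code{portType} of a WSDL 1.1 document
using only the \Java{} standard library. The embedded WSDL fragment
and the names in it are made up for illustration, and
\code{WsdlOperations} is a hypothetical helper; the text of a real
file from \verb|wsdl/*.wsdl| could be passed to the same method.
\begin{verbatim}
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class WsdlOperations {
    // Namespace of WSDL 1.1 elements.
    static final String WSDL_NS =
        "http://schemas.xmlsoap.org/wsdl/";

    // Names of all operations declared in portType elements.
    static List<String> operationNames(String wsdl)
            throws Exception {
        DocumentBuilderFactory factory =
            DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(
            wsdl.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        NodeList portTypes =
            doc.getElementsByTagNameNS(WSDL_NS, "portType");
        for (int i = 0; i < portTypes.getLength(); i++) {
            Element portType = (Element) portTypes.item(i);
            NodeList ops = portType.getElementsByTagNameNS(
                WSDL_NS, "operation");
            for (int j = 0; j < ops.getLength(); j++) {
                Element op = (Element) ops.item(j);
                names.add(op.getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily abbreviated WSDL fragment.
        String wsdl =
            "<definitions xmlns='" + WSDL_NS + "'>"
            + "<portType name='caDNAcopyPort'>"
            + "<operation name='caDNAcopy'/>"
            + "</portType>"
            + "</definitions>";
        System.out.println(operationNames(wsdl));
    }
}
\end{verbatim}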

\subsection{Deploying the web service to Axis}

To deploy the service:
\begin{verbatim}
cd ~/tmp/<pkg>
ant deploy-serv
\end{verbatim}
If it fails, check the Tomcat log files for error messages. Also open
your Axis instance in a browser and view the list of deployed web services.
Sometimes the service does not appear on the list even if the above ant
call reports no error; in that case, try the ant call again. You may also want
to restart the Tomcat server after deploying the service. The deployment step
copies \verb|wsdl/org/bioconductor/rservicesJms/services/*/deploy.wsdd| to the file
\verb|<AXIS_HOME>/WEB-INF/server-config.wsdd|.
  
Always remember to undeploy the service afterwards:
\begin{verbatim}
cd ~/tmp/<pkg>
ant undeploy-serv
\end{verbatim}

\subsection{Testing the web service}

Add test code to
\begin{verbatim}
client/*/src/org/bioconductor/rservicesJms/services/*/*TestCase.java
\end{verbatim}
Make sure activemq, the `worker', and Tomcat are all running, and then
perform tests:
\begin{verbatim}
cd ~/tmp/<pkg>
ant web-test
\end{verbatim}
Test output is collated in \verb|client/test_output|.

\section{Adding \Java{} code to \R{} packages for redistribution}

After \R{} methods have been exposed and working tests developed, a
next (and optional) step is to add the \Java{} code to the original
\R{} package. In this way, the combined \R{} and \Java{} code can be
redistributed for others to use or deploy as web services.

The approach is to add \Java{} files to the directory
\verb+<pkg>/inst/rservices+. The commands
\begin{verbatim}
ant map-package unpack-package -Dpkg=<pkg>
\end{verbatim}
will then create an \RWebServices{} skeleton as outlined for
\verb+map-package+, and then copy the files in the
\verb+inst/rservices+ folder into their corresponding location in the
skeleton. The typical contents of \verb+inst/rservices+ might be
\Java{} source files and perhaps data instances used for implementing
tests or simple clients.

\section{Alternative deployments: caGrid services}

\RWebServices{} packages can be used as traditional web services, or
integrated into other projects. One example of the latter involves
\href{http://cabig.nci.nih.gov}{\caBIG{}} and
\href{http://www.cagrid.org}{\caGrid{}}. \caBIG{} is an effort by the
US National Cancer Institute to develop standardized software that
uses strongly typed data. \caGrid{} builds on this foundation to offer
analytic and data services in a grid-based computing environment built
on top of the \href{http://www.globus.org}{\Globus{}} toolkit.

Here is how one might proceed to create a \caGrid{} analytic service
based on an \RWebServices{}-enabled package; the assumption is that
\pkg{caSurvey} contains functions with \func{typeInfo} applied.
\pkg{caSurvey} has been built with \code{R CMD build --clean
  caSurvey}. One can then
\begin{verbatim}
tar xzf caSurvey_1.0.tar.gz
R CMD INSTALL --clean caSurvey
echo "library(RWebServices);unpackAntScript('caSurveyImpl')" | \
    R --vanilla
cd caSurveyImpl
ant map-package -Dpkg=caSurvey
\end{verbatim}
to start the project.  Just as described above, this creates \file{src/} and
\file{test/} directories. The test directories are meant to be populated with
unit tests to ensure that data are being translated between \R{} and \Java{}
correctly (\file{RWorkerDataTest.java}) and that the service is invoked
correctly (\file{caSurveyTest.java}). The worker tests require
\RWebServices{}, \SJava{}, and \pkg{caSurvey} to work correctly; the
service tests also require \activeMQ{} and a worker to be working
correctly. The tests are constructed and run as described above.

You can go on to create and deploy a web service (\code{ant ws deploy-serv}),
but for the workflow we want, the next step is to use \caGrid{} and the
\introduce{} tool to create a grid service. We will forward grid
service requests to the \pkg{caSurvey} application created by
\RWebServices{}' \code{map-package}.

Creating a \caGrid{} analytic service is documented in this
\href{http://gforge.nci.nih.gov/plugins/scmcvs/cvsweb.php/archvcdebpsig/analytical_services/building_analytical_services_bp.doc?cvsroot=archvcdebpsig}{best
  practices} document.  Think of the application produced by
\code{map-package} as a `silver level' application (chapter 4), with
the goal being to reach `gold level' (chapter 5). The basic steps
involved are
\begin{enumerate}
\item Create xsd from the \Java{} data beans produced by
  \RWebServices{}.
\item Create a \caGrid{} / \introduce{} `project' based on the xsd and services to
  be exposed.
\item Add relevant components from the \RWebServices{} project
  to the \caGrid{} / \introduce{} project.
\item Translate grid service requests to requests handled by the
  \RWebServices{} project.
\end{enumerate}
The first two steps are necessary when bringing any \Java{} project to
\caGrid{}, and are described in the \caGrid{} best practices document.

Components of the \RWebServices{} project need to be added to
the \file{lib} directory of the \caGrid{} project. These are:
\begin{enumerate}
\item A jar file of compiled classes, e.g.,
\begin{verbatim}
ant precompile
jar -cf caSurvey.jar -C bin .
\end{verbatim}
\item \file{rservices.jar} from \RWebServices{}, and
  \file{activemq-core-4.02.jar} and \file{geronimo-jms} from
  \activeMQ{}.
\end{enumerate}

The best practices document suggests that \caGrid{} services use
\file{<service>Impl} to wrap the underlying business logic. For us,
this means
\begin{enumerate}
\item Import data packages and the service provider, e.g.,
\begin{verbatim}
import org.bioconductor.packages.caSurvey.*;
import org.bioconductor.rserviceJms.services.caSurvey.caSurvey;
\end{verbatim}
\item Create a persistent service when the grid service is initialized, e.g., 
\begin{verbatim}
public class CaSurveyImpl extends CaSurveyImplBase {
    private caSurvey caService = null;
    public CaSurveyImpl() throws RemoteException {
        super();
        // Start our service; the service has a lifetime
        // equal to that of this instance.
        try {
            // logs/catalina.out
            System.out.println("Starting caSurvey");
            caService = new caSurvey();
        } catch (Exception ex) {
            throw new RemoteException(ex.getMessage());
        }
        System.out.println("Start caSurvey successful");
    }
    ...
\end{verbatim}
\item Forward service requests. The \code{<service>Impl} class contains
  methods, each representing a grid service. We map each to a
  \code{caSurvey} service, perhaps using \code{get} methods to access
  the grid data types. Generally:
\begin{verbatim}
    ...
    public <caGrid type> <caGrid service>(<caGrid types>)
            throws RemoteException {
        // map input types, i.e., create <caSurvey type>
        // from <caGrid type>
        <caSurvey type> var =
            new <caSurvey type>(<caGrid type>);

        // invoke service
        <caSurvey result> = null;
        try {
            <caSurvey result> =
                caService.<caSurvey method>(<caSurvey types>);
        } catch (RemoteException ex) {
            // maybe log?
            throw ex;
        }

        // map from <caSurvey result> to <caGrid result>
        return <caGrid result>;
    }
    ...
\end{verbatim}
\end{enumerate}
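
A concrete, self-contained version of this forwarding pattern is
sketched below. All types are hypothetical stand-ins:
\code{GridSurveyRequest} and \code{GridSurveyResult} play the role of
\caGrid{} types, \code{SurveyInput} and \code{SurveyOutput} the role of
\RWebServices{} data beans, and \code{SurveyService} the role of the
generated \code{caSurvey} service class. A local stub replaces the
\R{}-backed service so that the example runs on its own.
\begin{verbatim}
import java.rmi.RemoteException;

// Hypothetical caGrid-side types.
final class GridSurveyRequest {
    private final double[] scores;
    GridSurveyRequest(double[] scores) {
        this.scores = scores.clone();
    }
    double[] getScores() { return scores.clone(); }
}

final class GridSurveyResult {
    private final double mean;
    GridSurveyResult(double mean) { this.mean = mean; }
    double getMean() { return mean; }
}

// Hypothetical RWebServices-side data beans.
final class SurveyInput {
    final double[] scores;
    SurveyInput(double[] scores) { this.scores = scores; }
}

final class SurveyOutput {
    final double mean;
    SurveyOutput(double mean) { this.mean = mean; }
}

// Hypothetical stand-in for the generated service class.
interface SurveyService {
    SurveyOutput summarize(SurveyInput input)
        throws RemoteException;
}

final class SurveyForwarder {
    private final SurveyService service;

    SurveyForwarder(SurveyService service) {
        this.service = service;
    }

    // Map grid types in, invoke the service, map the result out.
    GridSurveyResult summarize(GridSurveyRequest request)
            throws RemoteException {
        SurveyInput input = new SurveyInput(request.getScores());
        SurveyOutput output = service.summarize(input);
        return new GridSurveyResult(output.mean);
    }

    public static void main(String[] args)
            throws RemoteException {
        // A local stub replaces the R-backed service here.
        SurveyService stub = in -> {
            double sum = 0;
            for (double s : in.scores) {
                sum += s;
            }
            return new SurveyOutput(sum / in.scores.length);
        };
        SurveyForwarder forwarder = new SurveyForwarder(stub);
        GridSurveyResult result = forwarder.summarize(
            new GridSurveyRequest(new double[] {1.0, 2.0, 3.0}));
        System.out.println(result.getMean());
    }
}
\end{verbatim}
Keeping the type mapping in one place, separate from the service call,
makes it possible to test the mapping without a running worker.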

\section{More information}

The vignette ``Installing and testing RWebServices and enabled
packages'' provides guidance on package and software installation.

Additional vignettes contain thoughts and `lessons learned' from this
project, and are not essential reading.


\end{document}

