parGLM-methods {parglms} | R Documentation |
This package addresses the problem of fitting GLM-like models in a scalable way, recognizing that data may be dispersed, with chunks processed in parallel, to create low-dimensional summaries from which model fits may be constructed.
signature(formula = "formula", store = "Registry")
The model data are assumed to lie in the file.dir/jobs/*
folders, with file.dir
defined in the store
, which is
an instance of Registry
.
Additional arguments must be supplied:
a function that serves as a family for stats::glm
a vector of initial values for regression
parameter estimation, must conform to expectations of formula
an integer giving the maximum number of iterations allowed
a numeric giving the tolerance criterion
Failure to specify these triggers a fatal error.
The Registry instance can be modified to include a list element
'extractor'. This must be a function with arguments store
, and
codei. The standard extraction function is
function(store, i) loadResult(store, i)
It must return a data frame, conformant with the expectations of formula
.
Limited checking is performed.
The predict method computes the linear predictor on data identified by jobid in a BatchJobs registry. Results are returned as output of foreach over the jobids specified in the predict call.
Note that setting option parGLM.showiter to TRUE will provide a message tracing progress of the optimization.
if (require(MASS) & require(BatchJobs)) { # here is the 'sharding' of a small dataset data(anorexia) # N = 72 # in .BatchJobs.R: # best setting for sharding a small dataset on a small machine: # cluster.functions = BatchJobs::makeClusterFunctionsInteractive() myr = makeRegistry("abc", file.dir=tempfile()) chs = chunk(1:nrow(anorexia), n.chunks=18) # 4 recs/chunk f = function(x) {library(MASS); data(anorexia); anorexia[x,]} batchMap(myr, f, chs) submitJobs(myr) # now getResult(myr,1) gives back a data.frame waitForJobs(myr) # simple dispersal # now myr is populated oldopt = options()$parGLM.showiter options(parGLM.showiter=TRUE) pp = parGLM( Postwt ~ Treat + Prewt, myr, family=gaussian, binit = c(0,0,0,0), maxit=10, tol=.001 ) print(summary(theLM <- lm(Postwt~Treat+Prewt, data=anorexia))) print(pp$coefficients - coef(theLM)) if (require(sandwich)) { hc0 <- vcovHC(theLM, type="HC0") print(pp$robust.variance - hc0) } } predict(pp, store=myr, jobids=2:3) options(parGLM.showiter=oldopt)