basiliskStart {basilisk} | R Documentation |
Creates a basilisk process in which Python operations (via reticulate) can be safely performed with the correct versions of Python packages.
basiliskStart(env, fork = getBasiliskFork(), shared = getBasiliskShared())

basiliskStop(proc)

basiliskRun(
  proc = NULL,
  fun,
  ...,
  env,
  fork = getBasiliskFork(),
  shared = getBasiliskShared()
)
env |
A BasiliskEnvironment object specifying the basilisk environment to use. Alternatively, a string specifying the path to an environment, though this should only be used for testing purposes. Alternatively, |
fork |
Logical scalar indicating whether forking should be performed on non-Windows systems, see |
shared |
Logical scalar indicating whether |
proc |
A process object generated by |
fun |
A function to be executed in the basilisk process. |
... |
Further arguments to be passed to |
These functions ensure that any Python operations in fun will use the environment specified by env.
This avoids version conflicts in the presence of other Python instances or environments loaded by other packages or by the user.
Thus, basilisk clients are not affected by (and if shared=FALSE, do not affect) the activity of other R packages.
If necessary, objects created in fun can persist across calls to basiliskRun, e.g., for file handles.
This requires the use of assign with envir set to findPersistentEnv() to persist a variable, and a corresponding get to retrieve that object in later calls.
See Examples for more details.
It is good practice to call basiliskStop once computation is finished.
This will close the basilisk processes and restore certain environment variables to their original state (e.g., "PYTHONPATH") so that other non-basilisk operations can operate properly.
Any Python-related operations between basiliskStart and basiliskStop should only occur via basiliskRun.
Calling reticulate functions directly will have unpredictable consequences.
Similarly, it would be unwise to interact with proc via any function other than the ones listed here.
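One way to make sure basiliskStop is always called is to register it with on.exit right after starting the process. The following is a minimal sketch, not a prescribed pattern: pandas_version is a hypothetical client function, and env is assumed to be a BasiliskEnvironment object whose environment contains pandas.

```r
# Hypothetical client function; 'env' is assumed to be a BasiliskEnvironment
# object (or, for testing only, a path to an environment) providing pandas.
pandas_version <- function(env) {
    proc <- basilisk::basiliskStart(env)
    # Stop the process when the function exits, even if an error occurs.
    on.exit(basilisk::basiliskStop(proc), add = TRUE)

    # All Python work goes through basiliskRun, never through direct
    # reticulate calls in the calling session.
    basilisk::basiliskRun(proc, fun = function() {
        pd <- reticulate::import("pandas")
        pd$`__version__`
    })
}
```

Nothing is started until pandas_version is actually called with a valid environment.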
If proc=NULL in basiliskRun, a process will be created and closed automatically.
This may be convenient in functions where persistence is not required.
Note that doing so requires specification of env.
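When no state needs to persist between calls, the automatic mode can shorten client code. This sketch assumes a hypothetical wrapper named pandas_version_oneshot and an env whose environment contains pandas; it relies only on the basiliskRun arguments shown in the usage above.

```r
# Hypothetical one-shot wrapper: 'proc' is left at its default of NULL, so
# basiliskRun creates a process for 'env' and closes it afterwards.
pandas_version_oneshot <- function(env) {
    basilisk::basiliskRun(env = env, fun = function() {
        pd <- reticulate::import("pandas")
        pd$`__version__`
    })
}
```

Because the process is closed after each call, variables stored in the persistent environment would not survive between two calls of this wrapper.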
basiliskStart returns a process object, the exact nature of which depends on fork and shared.
This object should only be used in basiliskRun and basiliskStop.
basiliskRun returns the output of fun(...) when executed inside the separate process.
basiliskStop stops the process in proc.
If shared=TRUE and no Python version has already been loaded, basiliskStart will load Python directly into the R session from the specified environment.
Similarly, if the existing environment is the same as the requested environment, basiliskStart will use that directly.
This mode is most efficient as it avoids creating any new processes, but the use of a shared Python configuration may prevent non-basilisk packages from working correctly in the same session.
Otherwise, if fork=TRUE, no Python version has already been loaded and we are not on Windows, basiliskStart will create a new process by forking.
In the forked process, basiliskStart will load the specified environment for operations in Python.
This is less efficient as it needs to create a new process but it avoids forcing a Python configuration on other packages in the same R session.
Otherwise, basiliskStart will create a parallel socket process containing a separate R session.
In the new process, basiliskStart will load the specified environment for Python operations.
This is the least efficient as it needs to transfer data over sockets but is guaranteed to work.
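The order in which these three modes are considered can be summarized as a small decision function. This is only an illustration of the rules as described above, not basilisk's actual implementation; choose_mode and all of its arguments are hypothetical names.

```r
# Illustrative only: mirrors the documented order of preference.
# 'python_loaded' = a Python version is already loaded in the R session;
# 'same_env'      = the loaded environment is the requested one.
choose_mode <- function(shared, fork, python_loaded, same_env = FALSE,
                        on_windows = .Platform$OS.type == "windows") {
    if (shared && (!python_loaded || same_env)) {
        "shared"   # load Python directly into the current R session
    } else if (fork && !python_loaded && !on_windows) {
        "fork"     # new process created by forking
    } else {
        "socket"   # separate R session reached over sockets
    }
}

choose_mode(shared = TRUE, fork = TRUE, python_loaded = FALSE)
choose_mode(shared = FALSE, fork = TRUE, python_loaded = FALSE,
            on_windows = FALSE)
choose_mode(shared = TRUE, fork = TRUE, python_loaded = TRUE,
            same_env = FALSE)
```

The three calls correspond to the shared, forked and socket cases respectively.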
Developers can control these choices directly by explicitly specifying shared and fork, while users can control them indirectly with setBasiliskFork and related functions.
If the base conda installation provided with basilisk satisfies the requirements of the client package, it is strongly recommended to set env=NULL rather than constructing a separate environment.
This is obviously easier but it is also more efficient as it increases the chance of multiple basilisk clients being able to share a common Python instance within the same R session.
In basiliskRun, there is no guarantee that fun has access to the environment in which basiliskRun is called.
This has a number of consequences for the type of code that can be written inside fun:
Functions or variables from non-base R packages used inside fun should be prefixed with the package namespace, or the package itself should be reloaded inside fun.
Any other variables used inside fun should be explicitly passed as arguments. Developers should not rely on closures to capture variables in the calling environment of basiliskRun.
Relevant global variables should be reset inside fun.
Developers should not attempt to pass complex objects that point to memory in or out of fun. This mostly refers to objects that contain custom pointers to memory, e.g., file handles, pointers to reticulate objects.
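These constraints can be illustrated with a function that depends only on its arguments and on namespace-qualified calls. The sketch below uses hypothetical names (column_medians, run_column_medians) and plain R computations for brevity; a real fun would typically call Python through reticulate.

```r
# Hypothetical worker: everything it needs arrives through its arguments,
# and it returns a plain numeric vector rather than a pointer-backed object.
column_medians <- function(mat, digits) {
    # 'stats' is not the base package, so the call is namespace-qualified.
    meds <- apply(mat, 2, stats::median)
    round(meds, digits)
}

# Hypothetical client wrapper: 'mat' and 'digits' are passed explicitly
# via '...' instead of being captured from the calling environment.
run_column_medians <- function(env, mat, digits = 2) {
    basilisk::basiliskRun(env = env, fun = column_medians,
        mat = mat, digits = digits)
}

# The worker can be checked on its own, outside any basilisk process.
column_medians(matrix(c(1, 2, 3, 4, 5, 6), ncol = 2), digits = 1)
```

Keeping the worker free of references to its defining environment means it should behave the same whether it runs in the current session, a forked process or a socket process.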
If the specified basilisk environment is not present and env is a BasiliskEnvironment object, the environment will be created upon first use of basiliskStart.
If the base conda installation is not present, it will also be installed upon first use of basiliskStart.
The motivation for this is to avoid portability problems with hard-coded paths when basilisk is provided as a binary.
By default, both the base conda installation and the environments will be placed in an external user-writable directory.
The location of this directory can be changed by setting the BASILISK_EXTERNAL_DIR environment variable to the desired path.
This may occasionally be necessary if the file path to the default location is too long for Windows, or if the default path has spaces that break the Miniconda installer.
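The variable can be set from within R. This sketch assumes that it is read when basilisk first needs the directory, so it is set before any call to basiliskStart; the path used here is a throwaway location chosen purely for illustration.

```r
# Illustrative path only: a real deployment would use a stable, short,
# user-writable location without spaces rather than the session tempdir.
ext_dir <- file.path(tempdir(), "basilisk_ext")
dir.create(ext_dir, showWarnings = FALSE, recursive = TRUE)

# Assumed to take effect if set before the first use of basiliskStart.
Sys.setenv(BASILISK_EXTERNAL_DIR = ext_dir)
Sys.getenv("BASILISK_EXTERNAL_DIR")
```

To apply the setting across sessions, the same variable could instead be defined in the user's .Renviron file.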
Advanced users may consider setting the environment variable BASILISK_USE_SYSTEM_DIR to 1 when installing basilisk and its client packages from source, which will place both the base installation and the environments in the R system directory.
This simplifies permission management and avoids duplication in enterprise settings.
Aaron Lun
setupBasiliskEnv, to set up the conda environments.
getBasiliskFork and getBasiliskShared, to control various global options.
# Loading one environment:
tmploc <- file.path(tempdir(), "my_package_B")
setupBasiliskEnv(tmploc, c('pandas=0.25.1',
    "python-dateutil=2.8.0", "pytz=2019.3"))

cl <- basiliskStart(tmploc)
basiliskRun(proc=cl, function() {
    X <- reticulate::import("pandas")
    X$`__version__`
})
basiliskStop(cl)

# Co-exists with our other environment:
tmploc2 <- file.path(tempdir(), "my_package_C")
setupBasiliskEnv(tmploc2, c('pandas=0.24.1',
    "python-dateutil=2.7.1", "pytz=2018.7"))

cl2 <- basiliskStart(tmploc2)
basiliskRun(proc=cl2, function() {
    X <- reticulate::import("pandas")
    X$`__version__`
})
basiliskStop(cl2)

# Persistence of variables is possible within a Start/Stop pair.
cl <- basiliskStart(tmploc)
basiliskRun(proc=cl, function() {
    assign(x="snake.in.my.shoes", 1, envir=basilisk::findPersistentEnv())
})
basiliskRun(proc=cl, function() {
    get("snake.in.my.shoes", envir=basilisk::findPersistentEnv())
})
basiliskStop(cl)