Contents

Introduction

In progress

In this tutorial, we will go through ways to

There are many fun ways to do it, here I am more focused for R developers.

Existing docker repos

Before you create any, make sure you don’t re-invent the wheel and use the best base image for your container as your tool chain may save your lots of time later on.

Rocker Project

Official R Docker images is called “Rocker”" project and is on github, please visit the page to find more details and Dockerfile.

Image Description
rocker/r-base base package to build from
rocker/r-devel base plus R-devel from SVN
rocker/rstudio base plus RStudio Server
rocker/hadleyverse rstudio + Hadley’s packages, LaTeX
rocker/ropensci hadleyverse + rOpenSci packages
rocker/r-devel-san base, SVN’s R-devel and SAN

Bioconductor Images

Bioconductor have a nice page about the official docker images, please read for more details.

Image
bioconductor/devel_base
bioconductor/devel_core
bioconductor/devel_flow
bioconductor/devel_microarray
bioconductor/devel_proteomics
bioconductor/devel_sequencing
bioconductor/release_base
bioconductor/release_core
bioconductor/release_flow
bioconductor/release_microarray
bioconductor/release_proteomics
bioconductor/release_sequencing

To understand the image quickly here is the short instruction for the image name:

  • release images are based on rocker/rstudio
  • devel images are based on rocker/rstudio-daily
  • base: Contains R, RStudio, and BiocInstaller + system dependencies.
  • core: base + a selection of core.
  • flow: core + all packages tagged with the FlowCytometry biocView.
  • microarray: core + all packages tagged with the Microarray biocView.
  • proteomics: core + all packages tagged with the Proteomics biocView.
  • sequencing: core + all packages tagged with the Sequencing biocView.

Dockerhub

Dockerhub also provide public/private repos, you can search existing tools without building yourself, it’s very likely some popular tool already have docker container well maintained there.

Seven Bridges Docker Registry

Tutorial comming soon.

Example Seven Bridges registry:

  • SevenBridges : images.sbgenomics.com/<repository>[:<tag>]
  • Cancer Genomics Cloud: cgc-images.sbgenomics.com/<repository>[:<tag>]

Tutorial: random number generator

Our goal here is to making a cwl app to generate unif random numbers, yes, the core function is runif(), it’s native function in R.

runif
## function (n, min = 0, max = 1) 
## .Call(C_runif, n, min, max)
## <bytecode: 0x000000000c05e160>
## <environment: namespace:stats>
runif(10)
##  [1] 0.845177693 0.747044130 0.003123868 0.336307296 0.384871132
##  [6] 0.729321697 0.731138049 0.378364347 0.079871819 0.753924910

Using docopt package

In R, we also have a nice implementation in a package called docopt, developed by Edwin de Jonge. Check out its tutorial on github.

So let’s quickly create a command line interface for our R scripts with a dummy example. Let’s turn the uniform distribution function runif into a command line tool.

when you check out the help page for runif, here is the key information you want to mark down.

Usage

runif(n, min = 0, max = 1)

Arguments

n   
number of observations. If length(n) > 1, the length is taken to be the number required.

min, max    
lower and upper limits of the distribution. Must be finite.

I will add one more parameter to set seed, here is the R script file called runif.R.

At the beginning of the commadn line script, I use docopt standard to write my tool help.

'usage: runif.R [--n=<int> --min=<float> --max=<float> --seed=<float>]

options:
 --n=<int>        number of observations. If length(n) > 1, the length is taken to be the number required [default: 1].
 --min=<float>   lower limits of the distribution. Must be finite [default: 0].
 --max=<float>   upper limits of the distribution. Must be finite [default: 1].
 --seed=<float>  seed for set.seed() function [default: 1]' -> doc

library(docopt)

Let’s first do some testing in your R session before you make it a full functional command line tool.

docopt(doc) #with no argumetns provided
## $`--n`
## [1] "1"
## 
## $`--min`
## [1] "0"
## 
## $`--max`
## [1] "1"
## 
## $`--seed`
## [1] "1"
## 
## $n
## [1] "1"
## 
## $min
## [1] "0"
## 
## $max
## [1] "1"
## 
## $seed
## [1] "1"
docopt(doc, "--n 10 --min=3 --max=5")
## $`--n`
## [1] "10"
## 
## $`--min`
## [1] "3"
## 
## $`--max`
## [1] "5"
## 
## $`--seed`
## [1] "1"
## 
## $n
## [1] "10"
## 
## $min
## [1] "3"
## 
## $max
## [1] "5"
## 
## $seed
## [1] "1"

Looks like it works, now let’s add main function script for this command line tool.

opts <- docopt(doc)
set.seed(opts$seed)
runif(n = as.integer(opts$n), 
      min = as.numeric(opts$min), 
      max = as.numeric(opts$max))
## [1] 0.2655087

Add Shebang at the top of the file, this is a complete example for runif.R command line will be like this

#!/usr/bin/Rscript
'usage: runif.R [--n=<int> --min=<float> --max=<float> --seed=<float>]

options:
 --n=<int>        number of observations. If length(n) > 1, the length is taken to be the number required [default: 1].
 --min=<float>   lower limits of the distribution. Must be finite [default: 0].
 --max=<float>   upper limits of the distribution. Must be finite [default: 1].
 --seed=<float>  seed for set.seed() function [default: 1]' -> doc

library(docopt)
opts <- docopt(doc)
set.seed(opts$seed)
runif(n = as.integer(opts$n), 
      min = as.numeric(opts$min), 
      max = as.numeric(opts$max))

Let’s test this command line.

$ runif.R --help
Loading required package: methods
usage: runif.R [--n=<int> --min=<float> --max=<float> --seed=<float>]

options:
 --n=<int>        number of observations. If length(n) > 1, the length is taken to be the number required [default: 1].
 --min=<float>   lower limits of the distribution. Must be finite [default: 0].
 --max=<float>   upper limits of the distribution. Must be finite [default: 1].
 --seed=<float>  seed for set.seed() function [default: 1]
$ runif.R
Loading required package: methods
[1] 0.2655087
$ runif.R
Loading required package: methods
[1] 0.2655087
$ runif.R --seed=123 --n 10 --min=1 --max=100
Loading required package: methods
 [1] 29.470174 79.042208 41.488715 88.418723 94.106261  5.510093 53.282443
 [8] 89.349485 55.592066 46.204859

For full example you can check my github example

Executable report with R markdown

We cannot really make a Rmarkdown file executable in it by simply put

#!/bin/bash/Rscript

In your markdown

Of course, we can figure out a way to do it in liftr or knitr. But rmarkdown allow you to pass parameters to your Rmardown template, please read this tutorial Parameterized Reports. This doesn’t solve my problem that I want to directly describe command line interface in the markdown template. However, here is alternative method:

Create an command line interface to pass params from docopt into rmarkdown::render() function. In this way, we can pass as many as possible parameters from command line interface into our Rmarkdown template.

So here we go, here is updated methods and it’s also what I use for another tutorial about RNA-seq workflow.

fl <- system.file("docker/sevenbridges/src/", "runif.R", package = "sevenbridges")

Here is the current content of command line interface

#!/usr/local/bin/Rscript
'usage: runif.R [--n=<int> --min=<float> --max=<float> --seed=<float>]

options:
 --n=<int>        number of observations. If length(n) > 1, the length is taken to be the number required [default: 1].
 --min=<float>   lower limits of the distribution. Must be finite [default: 0].
 --max=<float>   upper limits of the distribution. Must be finite [default: 1].
 --seed=<float>  seed for set.seed() function [default: 1]'  -> doc

library(docopt)
opts <- docopt(doc)

## create param list
lst <- list(n = as.integer(opts$n), 
            min = as.numeric(opts$min), 
            max = as.numeric(opts$max),
            seed = as.numeric(opts$seed))

## execute your Rmarkdown with these parameters
rmarkdown::render("/report/report.Rmd", BiocStyle::html_document(toc = TRUE),
                      output_dir = ".", params = lst)

And here is the report template

---
title: "Uniform randome number generator example"
output:
  BiocStyle::html_document:
    toc: true
    number_sections: true
    highlight: haddock
    css: style.css
    includes:
      in_header: logo.md
params:
  seed: 1
  n: 1
  min: 0
  max: 1
---

```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown(css.files = "custom.css")
```


## Summary
```{r}
set.seed(params$seed)
r <- runif(n = as.integer(params$n), 
           min = as.numeric(params$min), 
           max = as.numeric(params$max))
summary(r)
hist(r)
```

Setup dockerhub automated build

To make things more reproducible and explicit and automatic, you can do a autohook to automatically build your container/image on docker hub. Here is what I do

  1. I created some project called ‘docker’ on my github and it has all container that crated from a Dockerfile, for example, tengfei/docker/runif, please go here to check it out
  2. This folder root has a Dockerfile and subfolders for extra materials I added at build time, like script or report template.
  3. login your dockerhub account, following this tutorial to make “automated build” from your github account. Make sure you input the right location for your Dockerfile, by customizing it.
  4. Then you will have auto-build every time you push a new update in github.
  5. Start using your docker image like ‘tengfei/runif’
  6. Feel free to push it onto your SevenBridges platform registry as well.

More examples

There are more examples under ‘inst/docker’ folder, you can check out how to describe command line and build docker, how to make report template. You can read the online github code. Or you can read another tutorial about how we wrap RNA-seq workflow from bioconductor.