Master Tutorial: Use R for Cancer Genomics Cloud

Tengfei Yin <tengfei.yin@sevenbridges.com>

2018-10-11

1 Introduction

This tutorial originates from the R workshop I prepared for the 2016 Cancer Genomics Cloud Hackathon. Beginners are encouraged to read it and run through all the examples in an R IDE such as RStudio, then try to make their own app.

In this tutorial, you will learn:

  1. Use the API client in R with the sevenbridges R package to fully automate analysis
  2. Describe a command line interface with the R package docopt
  3. Make your own Docker app
  4. Describe a standard RNA-seq Bioconductor workflow in CWL with a pre-defined report template
  5. Execute it in the cloud
  6. Use the reporting tool to generate as many R Markdown reports or Shiny app reports as you want
  7. Deploy directly on a Shiny server like shinyapps.io from a “report” tool

2 Prerequisites

This tutorial doesn’t require you to be an advanced R user. All you need is R or, even better, a cool IDE like RStudio (or Emacs+ESS). Then just open this R Markdown document in your IDE. It’s easy to learn!

Suggested learning for all users: Docker.

Now we are ready to go!

2.1 Installation

First, download the R Markdown source of this page, so you can load it into RStudio or your favorite IDE to run through all the examples and tweak the setup.

The sevenbridges package is available on Bioconductor (release branch, development branch). The latest development version is on GitHub.

To install the latest development version from GitHub, run the following commands in R:
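A minimal sketch of the installation. It assumes that the GitHub repository is sbg/sevenbridges-r and uses the remotes and BiocManager packages as installers; neither the repository name nor the installer choice is stated on this page, so adjust them if your setup differs.

```r
# Assumption: the development version lives in this GitHub repository
repo <- "sbg/sevenbridges-r"

# Guarded so that sourcing this file non-interactively does not trigger an install
if (interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

  # Resolve Bioconductor dependencies through the BiocManager repositories
  remotes::install_github(
    repo,
    repos = BiocManager::repositories(),
    dependencies = TRUE
  )
}
```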

After the installation, you can always browse the vignettes.
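One way to do this, sketched with base R's browseVignettes() (the guard simply skips the call when the package is not installed yet):

```r
if (requireNamespace("sevenbridges", quietly = TRUE)) {
  # Opens an HTML index of the package vignettes in the default browser
  browseVignettes("sevenbridges")
}
```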

2.2 Register on Cancer Genomics Cloud

[Figure: Cancer Genomics Cloud homepage]

You can find login/registration on the Cancer Genomics Cloud homepage https://www.cancergenomicscloud.org. Follow the signup tutorial if you need to access TCGA Controlled Data on the CGC via NIH eRA Commons.

2.3 Authentication

After logging in, you can get your authentication token under the “Developer” tab of your account settings (tutorial).

2.4 Register on shinyapps.io (Optional)

In this tutorial, if you want to try deploying the Shiny web application automatically on a remote server like shinyapps.io, please visit https://www.shinyapps.io/ to register and log in.

Get your token and secret ready to deploy:

[Figure: shinyapps.io token and secret]

2.5 Report issues

This package is under active development and will bring many new features. If you have questions or problems with this R package at any point, please file an issue on GitHub.

If you have questions regarding the Cancer Genomics Cloud or other Seven Bridges platforms, there is a different channel for each platform. For example, the Cancer Genomics Cloud has extensive documentation and a forum.

Feedback is always welcome!

3 Quickstart

The final goal is to make a workflow that:

  1. Takes gene features, a design matrix, and BAM files as input, runs a differential expression analysis, and outputs a full report, a picture, and a count table as examples.
  2. Adds a report tool with two Shiny app templates and two R Markdown templates that collects files from the previous flow and generates new reports, and can even deploy them on shinyapps.io automatically after a task is finished.

The final workflow looks like this. It is composed of two tools: an RNA-seq analysis tool and a reporting tool.

[Figure: the quickstart workflow]

The Shiny app report with a ggvis module on the shinyapps.io server looks like this:

A ggvis interactive scatter plot

[Figure: ggvis scatter plot]

A differential expression table

[Figure: differential expression table]

A full HTML report is included as well; it is also the output of the first tool. In this way, you can orchestrate the output of many tools into a single report for your task.

[Figure: HTML report]

[Figure: heatmap]

Now let’s start building tools.

3.1 Create a project under your account via API R client

I know, we can always do it via the graphical user interface, but let’s have fun with the sevenbridges package you just installed.

For the complete API tutorial and reference manual, please read the API vignette:

vignette("api", package = "sevenbridges")

Now let’s go through some simple steps. The first thing to do is create an Auth object; almost everything starts from this object. Our API client follows the style Auth$properties$action. On the platform, Auth is your account, which contains projects, billing groups, and users; a project in turn contains tasks, apps, files, and so on. This makes it easy to guess what an API call looks like.

To create an Auth object, simply pass the token and platform name (or alternatively, the API base URL) to the Auth function. The default platform is CGC. The good news is that you can use the sevenbridges package to access any Seven Bridges platform with API (v2).

This is the main way to create an Auth object, just replace your_token with your own authentication token from the CGC:
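A minimal sketch of this direct method. The string your_token is a placeholder, and the second call shows the base-URL variant using the CGC endpoint that also appears in the configuration file example in this section.

```r
library("sevenbridges")

# "your_token" is a placeholder; paste your own CGC authentication token here
a <- Auth(token = "your_token", platform = "cgc")

# Equivalent form that passes the API base URL instead of the platform name
a <- Auth(token = "your_token", url = "https://cgc-api.sbgenomics.com/v2")
```

Creating the object only stores the credentials; the token is not checked until you make the first API call.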

Alternatively, you can save your authentication information across different platforms in a user configuration file named credentials under the directory $HOME/.sevenbridges/. This allows you to manage multiple user profiles registered on multiple Seven Bridges environments. An example user configuration file looks like this:

[aws-us-tengfei]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user

# This is a comment:
# another user on platform aws-us
[aws-us-yintengfei]
api_endpoint = https://api.sbgenomics.com/v2
auth_token = token_for_this_user

[cgc]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token = token_for_this_user

[gcp]
api_endpoint = https://gcp-api.sbgenomics.com/v2
auth_token = token_for_this_user

When you have this user configuration file ready at this default location, all you need to do is set from = "file" and choose the profile_name to use. For example:
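A sketch that uses the cgc profile from the example file above. The file.exists() guard is only there so the snippet does nothing when the configuration file has not been created yet; it is not required by the package.

```r
library("sevenbridges")

# Default location of the user configuration file described above
config_path <- path.expand("~/.sevenbridges/credentials")

if (file.exists(config_path)) {
  # "cgc" must match a profile header such as [cgc] in that file
  a <- Auth(from = "file", profile_name = "cgc")
}
```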

The third way to save authentication information is by setting system environment variables. To set the two environment variables (SB_API_ENDPOINT and SB_AUTH_TOKEN) in your system, you could use the function sbg_set_env(). For example:
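A minimal sketch, again with the CGC endpoint and a placeholder token:

```r
library("sevenbridges")

# Sets SB_API_ENDPOINT and SB_AUTH_TOKEN; "your_token" is a placeholder
sbg_set_env(
  url = "https://cgc-api.sbgenomics.com/v2",
  token = "your_token"
)

# Check what was stored
Sys.getenv("SB_API_ENDPOINT")
```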

To create an Auth object using credentials in the environment variables:
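A sketch of this method. So that the snippet also works in a fresh session, it falls back to a placeholder token when SB_AUTH_TOKEN is not set; in practice you would set your real token first.

```r
library("sevenbridges")

# Auth(from = "env") reads SB_API_ENDPOINT and SB_AUTH_TOKEN, so both must exist
if (!nzchar(Sys.getenv("SB_AUTH_TOKEN"))) {
  sbg_set_env(
    url = "https://cgc-api.sbgenomics.com/v2",
    token = "your_token"  # placeholder
  )
}

a <- Auth(from = "env")
```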

To create a new project, you need to know your billing group ID. Costs related to this project will be charged to this billing group; for now, you can play with your free credits.
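A sketch of looking up the billing group. This step calls the API, so it needs a real token; here the token is read from the SB_AUTH_TOKEN environment variable and the call is skipped when that variable is empty.

```r
library("sevenbridges")

token <- Sys.getenv("SB_AUTH_TOKEN")  # a real token is required for API calls

if (nzchar(token)) {
  a <- Auth(token = token, platform = "cgc")
  b <- a$billing()  # billing group(s) available to this account
  print(b)          # the printed record includes the billing group id
}
```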

Now let’s create a new project called “hackathon” and save it to an object named p, which is convenient for any later call related to this project.

Now check the CGC: you will see that a brand new project has been created.
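A sketch of the project creation. It assumes that your account has exactly one billing group (so that a$billing()$id is a single value) and that project_new() returns the new project object; the description text is only illustrative. As before, nothing runs unless a real token is available in SB_AUTH_TOKEN.

```r
library("sevenbridges")

token <- Sys.getenv("SB_AUTH_TOKEN")  # a real token is required for API calls

if (nzchar(token)) {
  a <- Auth(token = token, platform = "cgc")

  # Assumption: a single billing group, so $id yields one id
  bid <- a$billing()$id

  # Create the project and keep a handle to it in p
  p <- a$project_new(
    name = "hackathon",
    billing_group_id = bid,
    description = "This project is for the CGC hackathon"
  )
}
```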