This tutorial assumes you have a basic knowledge of Docker concepts.
Note: we currently support CWL draft 2 with SBG extensions; support for CWL v1.0 is planned.
In our terminology, a workflow is composed of one or more tools; to users, both are simply apps. You can imagine raw input data going through a pipeline with many nodes, where each step performs a function on the data in the flow, and in the end you get what you want: fully processed data or a result (plot, report, action).
Here are some key ideas
This may sound full of jargon and hard to understand, so here is an example. You have a CSV table full of missing values and you want to process it in 3 steps.
You can describe each step as a single module or tool and then connect them one by one to form a flow. You can also put everything into one single “tool”; the downside is that other users cannot reuse your step 1 for their own missing-value problems. So it is both an art and a science to balance flexibility and efficiency.
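As a rough illustration of the modular idea, the base R sketch below treats each step as a small function and chains them into a flow. Only the missing-value step comes from the text; the other two steps (`scale_values`, `summarise_table`), the toy table, and all function names are assumptions made up for this example and are not part of the sevenbridges package.

```r
# Toy CSV-like table with missing values (hypothetical data)
tbl <- data.frame(id = 1:4, value = c(2, NA, 6, NA))

# Step 1: replace missing values with the column mean
fill_missing <- function(df) {
  df$value[is.na(df$value)] <- mean(df$value, na.rm = TRUE)
  df
}

# Step 2 (assumed): rescale values to the range [0, 1]
scale_values <- function(df) {
  rng <- range(df$value)
  df$value <- (df$value - rng[1]) / (rng[2] - rng[1])
  df
}

# Step 3 (assumed): produce a small summary as the final result
summarise_table <- function(df) {
  c(n = nrow(df), mean = mean(df$value))
}

# Connect the steps one by one to form a flow
steps <- list(fill_missing, scale_values, summarise_table)
result <- Reduce(function(data, step) step(data), steps, tbl)
result
```

Because each step is its own function, another user could reuse `fill_missing` alone, which is the flexibility argument made above.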
Why are we using CWL? Imagine a single file representing a tool or workflow that can be executed anywhere in a reproducible manner, without installing anything, because everything is packaged in a Docker image. That is going to change the world of computational scientific research, how we do research, and how we publish results. In this package we try to hide CWL details as much as possible, so users can just use it like a typical R function.
Tool is the basic unit, and also the “lego brick” you usually start with. As a developer you also want to provide those “lego” pieces to users, so they can run them directly or build their own flows with them.
The main interface provided by the sevenbridges package is the Tool function; it is much more straightforward than composing a raw CWL JSON file from scratch. A “Tool” object in R can be exported to JSON or imported from a CWL JSON file.
I highly recommend going over The Tool Editor chapter of the Cancer Genomics Cloud documentation to understand how it works, and even trying it on the platform with the GUI. This will help you understand our R interface better.
Sometimes people share a Tool in pure JSON text format. You can simply load it into R using the convert_app function, which recognizes your JSON file class (Tool or Workflow) automatically.
library(sevenbridges)
t1 = system.file("extdata/app", "tool_star.json", package = "sevenbridges")
## convert json file into a Tool object
t1 = convert_app(t1)
## try print it yourself
## t1
In this way, you can load it, revise it, use it with the API, or edit it and export it back to a JSON file. However, the most important thing in this tutorial is that you learn how to describe a tool directly in R.
We provide a couple of utilities to help you construct your own CWL tool quickly in R. For all available utilities, please check out help("Tool").
Some utilities are useful when you execute a task: you need to know the input types, the input IDs, and whether each input is required, so that you can execute the task with the parameters it needs. Try playing with input_matrix or input_type as shown below.
## get input type information
head(t1$input_type())
reads readMatesLengthsIn readMapNumber
"File..." "enum" "int"
limitOutSJoneRead limitOutSJcollapsed outReadsUnmapped
"int" "int" "enum"
## get output type information
head(t1$output_type())
aligned_reads transcriptome_aligned_reads
"File" "File"
reads_per_gene log_files
"File" "File..."
splice_junctions chimeric_junctions
"File" "File"
## return an input matrix with more information
head(t1$input_matrix())
id label type required
1 #reads Read sequence File... TRUE
95 #sjdbGTFfile Splice junction file File... FALSE
102 #genome Genome files File TRUE
2 #readMatesLengthsIn Reads lengths enum FALSE
3 #readMapNumber Reads to map int FALSE
4 #limitOutSJoneRead Junctions max number int FALSE
prefix
1 <NA>
95 <NA>
102 <NA>
2 --readMatesLengthsIn
3 --readMapNumber
4 --limitOutSJoneRead
fileTypes
1 FASTA, FASTQ, FA, FQ, FASTQ.GZ, FQ.GZ, FASTQ.BZ2, FQ.BZ2
95 GTF, GFF, TXT
102 TAR
2 null
3 null
4 null
## return only a few fields
head(t1$input_matrix(c("id", "type", "required")))
id type required
1 #reads File... TRUE
95 #sjdbGTFfile File... FALSE
102 #genome File TRUE
2 #readMatesLengthsIn enum FALSE
3 #readMapNumber int FALSE
4 #limitOutSJoneRead int FALSE
## return only required
t1$input_matrix(required = TRUE)
id label type required prefix
1 #reads Read sequence File... TRUE <NA>
102 #genome Genome files File TRUE <NA>
fileTypes
1 FASTA, FASTQ, FA, FQ, FASTQ.GZ, FQ.GZ, FASTQ.BZ2, FQ.BZ2
102 TAR
## return an output matrix with more information
t1$output_matrix()
id label type fileTypes
1 #aligned_reads Aligned SAM/BAM File SAM, BAM
2 #transcriptome_aligned_reads Transcriptome alignments File BAM
3 #reads_per_gene Reads per gene File TAB
4 #log_files Log files File... OUT
5 #splice_junctions Splice junctions File TAB
6 #chimeric_junctions Chimeric junctions File JUNCTION
7 #unmapped_reads Unmapped reads File... FASTQ
8 #intermediate_genome Intermediate genome files File TAR
9 #chimeric_alignments Chimeric alignments File SAM
## return only a few fields
t1$output_matrix(c("id", "type"))
id type
1 #aligned_reads File
2 #transcriptome_aligned_reads File
3 #reads_per_gene File
4 #log_files File...
5 #splice_junctions File
6 #chimeric_junctions File
7 #unmapped_reads File...
8 #intermediate_genome File
9 #chimeric_alignments File
## get required input id
t1$get_required()
reads genome
"File..." "File"
## set new required input with ID, # or without #
t1$set_required(c("#reads", "winFlankNbins"))
[1] TRUE TRUE
t1$get_required()
reads winFlankNbins genome
"File..." "int" "File"
## turn off requirements for input node #reads
t1$set_required("reads", FALSE)
[1] FALSE
t1$get_required()
winFlankNbins genome
"int" "File"
## get input id
head(t1$input_id())
#STAR #STAR #STAR
"#reads" "#readMatesLengthsIn" "#readMapNumber"
#STAR #STAR #STAR
"#limitOutSJoneRead" "#limitOutSJcollapsed" "#outReadsUnmapped"
## get full input id with Tool name
head(t1$input_id(TRUE))
File... enum
"#STAR.reads" "#STAR.readMatesLengthsIn"
int int
"#STAR.readMapNumber" "#STAR.limitOutSJoneRead"
int enum
"#STAR.limitOutSJcollapsed" "#STAR.outReadsUnmapped"
## get output id
head(t1$output_id())
#STAR #STAR
"#aligned_reads" "#transcriptome_aligned_reads"
#STAR #STAR
"#reads_per_gene" "#log_files"
#STAR #STAR
"#splice_junctions" "#chimeric_junctions"
## get full output id
head(t1$output_id(TRUE))
File File
"#STAR.aligned_reads" "#STAR.transcriptome_aligned_reads"
File File...
"#STAR.reads_per_gene" "#STAR.log_files"
File File
"#STAR.splice_junctions" "#STAR.chimeric_junctions"
## get input and output object
t1$get_input(id = "#winFlankNbins")
type:
- 'null'
- int
label: Flanking regions size
description: =log2(winFlank), where win Flank is the size of the left and right flanking
regions for each window (int>0).
streamable: no
id: '#winFlankNbins'
inputBinding:
position: 0
prefix: --winFlankNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '4'
required: yes
t1$get_input(name = "ins")
[[1]]
type:
- 'null'
- int
label: Max bins between anchors
description: Max number of bins between two anchors that allows aggregation of anchors
into one window (int>0).
streamable: no
id: '#winAnchorDistNbins'
inputBinding:
position: 0
prefix: --winAnchorDistNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '9'
required: no
[[2]]
type:
- 'null'
- int
label: Max insert junctions
description: Maximum number of junction to be inserted to the genome on the fly at
the mapping stage, including those from annotations and those detected in the 1st
step of the 2-pass run.
streamable: no
id: '#limitSjdbInsertNsj'
inputBinding:
position: 0
prefix: --limitSjdbInsertNsj
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000000'
required: no
t1$get_output(id = "#aligned_reads")
type:
- 'null'
- File
label: Aligned SAM/BAM
description: Aligned sequence in SAM/BAM format.
streamable: no
id: '#aligned_reads'
outputBinding:
glob:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.outSortingType == 'SortedByCoordinate') {
sort_name = '.sortedByCoord'
}
else {
sort_name = ''
}
if ($job.inputs.outSAMtype == 'BAM') {
sam_name = "*.Aligned".concat( sort_name, '.out.bam')
}
else {
sam_name = "*.Aligned.out.sam"
}
return sam_name
}
class: Expression
sbg:fileTypes: SAM, BAM
t1$get_output(name = "gene")
type:
- 'null'
- File
label: Reads per gene
description: File with number of reads per gene. A read is counted if it overlaps
(1nt or more) one and only one gene.
streamable: no
id: '#reads_per_gene'
outputBinding:
glob: '*ReadsPerGene*'
sbg:fileTypes: TAB
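The printed output suggests that input_matrix() returns an ordinary data.frame, in which case standard subsetting applies. The sketch below does not call the package; it rebuilds a few rows by hand from the output shown above (the object name `im` is hypothetical) to show how one might separate required inputs from optional, prefixed parameters.

```r
# Hand-copied subset of the input matrix printed above (not fetched from the package)
im <- data.frame(
  id       = c("#reads", "#genome", "#readMapNumber", "#limitOutSJoneRead"),
  type     = c("File...", "File", "int", "int"),
  required = c(TRUE, TRUE, FALSE, FALSE),
  prefix   = c(NA, NA, "--readMapNumber", "--limitOutSJoneRead"),
  stringsAsFactors = FALSE
)

# IDs that must be supplied when running a task
required_ids <- im$id[im$required]

# Optional integer parameters together with their command line prefixes
optional_int <- im[!im$required & im$type == "int", c("id", "prefix")]

required_ids
optional_int
```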
Before we continue, this is what a full tool description looks like. You don’t always need to describe all those details; the following sections will walk you through examples ranging from simple ones to full ones like this.
fl <- system.file("docker/rnaseqGene/rabix", "generator.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
library(sevenbridges)
rbx <- Tool(id = "rnaseqGene",
label = "rnaseqgene",
description = "A RNA-seq Differiencial Expression Flow and Report",
hints = requirements(docker(pull = "tengfei/rnaseqgene"), cpu(1), mem(2000)),
baseCommand = "performDE.R",
inputs = list(
input(
id = "bamfiles", label = "bam files",
description = "a list of bam files",
type = "File...", ## or type = ItemArray("File")
prefix = "--bamfiles",
required = TRUE,
itemSeparator = ","
),
input(
id = "design", label = "design matrix",
type = "File",
required = TRUE,
prefix = "--design"
),
input(
id = "gtffile", label = "gene feature files",
type = "File",
stageInput = "copy",
required = TRUE,
prefix = "--gtffile"
),
input(
id = "format", label = "report foramt html or pdf",
type = enum("format", c("pdf", "html")),
prefix = "--format"
)
),
outputs = list(
output(id = "report", label = "report",
description = "A reproducible report created by Rmarkdown",
glob = Expression(engine = "#cwl-js-engine",
script = "x = $job[['inputs']][['format']];
if(x == 'undefined' || x == null){
x = 'html';
};
'rnaseqGene.' + x")),
output(id = "heatmap", label = "heatmap",
description = "A heatmap plot to show the Euclidean distance between samples",
glob = "heatmap.pdf"),
output(id = "count", label = "count",
description = "Reads counts matrix",
glob = "count.csv"),
output(id = "de", label = "Differential expression table",
description = "Differential expression table",
glob = "de.csv")
))
fl <- "inst/docker/rnaseqGene/rabix/rnaseqGene.json"
write(rbx$toJSON(pretty = TRUE), fl)
Now let’s break it down:
Some key arguments are used in the Tool function.
Hints and requirements are described with cpu, mem, docker, and fileDef, and you can easily construct them via the requirements() constructor. This is how you describe the resources you need to execute the tool, so the system knows what type of instance suits your case best.
To specify inputs and outputs: usually your command line interface accepts extra arguments as input, for example file(s), string, enum, int, float, or boolean. To specify these in your tool, use the input function and pass the result to the inputs argument as a list or a single item. You can even construct them as a data.frame, with less flexibility. input() requires the arguments id and type. output() requires only the argument id, because type is file by default.
There are some special types: ItemArray and enum. For ItemArray, the type is an array of a single type. The most common case is an input that is a list of files, for which you can write type = ItemArray("File") or simply type = "File..." to differentiate it from a single file input. When you add the “...” suffix, R will know it is an ItemArray.
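To make the “...” convention concrete, here is a small base R sketch of how such a suffix could be expanded into the array form that appears in the CWL output elsewhere in this tutorial (`type: array` with `items: File`). The function `expand_type` is hypothetical and written only for illustration; it is not necessarily how the package implements this.

```r
# Hypothetical helper: expand a type string with a "..." suffix into an array description
expand_type <- function(type) {
  if (grepl("\\.\\.\\.$", type)) {
    # "File..." -> array whose items are of type "File"
    list(type = "array", items = sub("\\.\\.\\.$", "", type))
  } else {
    # no suffix: a single value of the given type
    list(type = type)
  }
}

expand_type("File...")
expand_type("File")
```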
We also provide an enum type. When you specify an enum, please pass the required name and symbols like this: type = enum("format", c("pdf", "html")). In the UI on the platform, you will then be presented with a drop-down menu when you execute the task.
Now let’s work through examples from the simplest case to the most flexible case.
If you already have a Docker image in mind that provides the functionality you need, you can just use it. The baseCommand is the command line you want to execute in that container. stdout specifies the file in which you want to capture the standard output and collect it on the platform.
In this simple example, I know the Docker image “rocker/r-base” has a function called runif that I can call directly on the command line with Rscript -e. I then want the output to be collected via stdout and ask the file system to capture the files matching “*.txt”. Please pay attention to this: your tool may produce many intermediate files in the current folder, and if you don’t specify which outputs you need, they will all be ignored. So make sure you collect those files via the outputs parameter.
library(sevenbridges)
rbx <- Tool(id = "runif",
label = "runif",
hints = requirements(docker(pull = "rocker/r-base")),
baseCommand = "Rscript -e 'runif(100)'",
stdout = "output.txt",
outputs = output(id = "random", glob = "*.txt"))
rbx
sbg:id: runif
id: '#runif'
inputs: []
outputs:
- type:
- 'null'
- File
label: ''
description: ''
streamable: no
default: ''
id: '#random'
outputBinding:
glob: '*.txt'
requirements: []
hints:
- class: DockerRequirement
dockerPull: rocker/r-base
label: runif
class: CommandLineTool
baseCommand:
- Rscript -e 'runif(100)'
arguments: []
stdout: output.txt
rbx$toJSON()
{"sbg:id":"runif","id":"#runif","inputs":[],"outputs":[{"type":["null","File"],"label":"","description":"","streamable":false,"default":"","id":"#random","outputBinding":{"glob":"*.txt"}}],"requirements":[],"hints":[{"class":"DockerRequirement","dockerPull":"rocker/r-base"}],"label":"runif","class":"CommandLineTool","baseCommand":["Rscript -e 'runif(100)'"],"arguments":[],"stdout":"output.txt"}
By default the Tool object is shown as YAML, but you can simply convert it to JSON and copy it into the graphical editor on the Seven Bridges platform by importing the JSON.
rbx$toJSON(pretty = TRUE)
{
"sbg:id": "runif",
"id": "#runif",
"inputs": [],
"outputs": [
{
"type": ["null", "File"],
"label": "",
"description": "",
"streamable": false,
"default": "",
"id": "#random",
"outputBinding": {
"glob": "*.txt"
}
}
],
"requirements": [],
"hints": [
{
"class": "DockerRequirement",
"dockerPull": "rocker/r-base"
}
],
"label": "runif",
"class": "CommandLineTool",
"baseCommand": [
"Rscript -e 'runif(100)'"
],
"arguments": [],
"stdout": "output.txt"
}
rbx$toYAML()
[1] "sbg:id: runif\nid: '#runif'\ninputs: []\noutputs:\n- type:\n - 'null'\n - File\n label: ''\n description: ''\n streamable: no\n default: ''\n id: '#random'\n outputBinding:\n glob: '*.txt'\nrequirements: []\nhints:\n- class: DockerRequirement\n dockerPull: rocker/r-base\nlabel: runif\nclass: CommandLineTool\nbaseCommand:\n- Rscript -e 'runif(100)'\narguments: []\nstdout: output.txt\n"
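To see why the glob in the runif tool matters, the base R sketch below mimics which files a pattern such as “*.txt” would select from a working directory. The file names are invented, and the platform's own glob engine may differ in detail; `glob2rx()` from the utils package is used here only to approximate shell-style matching.

```r
# Hypothetical files left in the working directory after the tool runs
files <- c("output.txt", "runif.R", "Rplots.pdf", "notes.txt.bak")

# Translate the glob into a regular expression and keep the matches
pattern <- utils::glob2rx("*.txt")
collected <- files[grepl(pattern, files)]
collected
```

Everything that does not match the pattern would be left behind, which is why each file you care about needs a matching output.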
Now you may want to run your own R script, but you still don’t want to create a new command line tool and a new Docker image. You just want to run your script with new input files in an existing container. It is time to introduce fileDef. You can either write the script directly as a string or import an R file as content, and then provide it via requirements.
## Make a new file
fd <- fileDef(name = "runif.R",
content = "set.seed(1)
runif(100)")
## or read an existing script with readr
.srcfile <- system.file("docker/sevenbridges/src/runif.R", package = "sevenbridges")
library(readr)
fd <- fileDef(name = "runif.R",
content = read_file(.srcfile))
## add script to your tool
rbx <- Tool(id = "runif",
label = "runif",
hints = requirements(docker(pull = "rocker/r-base")),
requirements = requirements(fd),
baseCommand = "Rscript runif.R",
stdout = "output.txt",
outputs = output(id = "random", glob = "*.txt"))
How about multiple scripts?
## one script read from a file, one written inline
.srcfile <- system.file("docker/sevenbridges/src/runif.R", package = "sevenbridges")
library(readr)
fd1 <- fileDef(name = "runif.R",
content = read_file(.srcfile))
fd2 <- fileDef(name = "runif2.R",
content = "set.seed(1)
runif(100)")
rbx <- Tool(id = "runif_twoscript",
label = "runif_twoscript",
hints = requirements(docker(pull = "rocker/r-base")),
requirements = requirements(fd1, fd2),
baseCommand = "Rscript runif.R",
stdout = "output.txt",
outputs = output(id = "random", glob = "*.txt"))
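If you prefer not to depend on readr, the script content can also be read with base R. This sketch writes a throwaway script to a temporary file (so it runs anywhere) and collapses its lines into a single string, like the one read_file() returns; the variable names are placeholders.

```r
# Create a temporary script so the example is self-contained
script_path <- tempfile(fileext = ".R")
writeLines(c("set.seed(1)", "runif(100)"), script_path)

# Read all lines and join them with newlines into one string
script_content <- paste(readLines(script_path), collapse = "\n")
script_content

# The string could then be passed on, e.g.
# fileDef(name = "runif.R", content = script_content)
```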
In all the examples above, many parameters are hard-coded in your script, so you don’t have the flexibility to control how many numbers to generate. Most often, your tools or command line tools expose some input arguments to users. You need a better way to describe a command line with inputs and outputs.
Now we bring the example to the next level. For example, I prepared a Docker image called “tengfei/runif” on Docker Hub. This container has an executable command called “runif.R”. You don’t have to know what’s inside; you only have to know that when you run the command line in that container, it looks like this:
runif.R --n=100 --max=100 --min=1 --seed=123
This command outputs two files directly, so you don’t need standard output to capture the random numbers.
So the goal here is to describe this command, expose all input parameters, and collect both files.
To define input, you can specify
Output is similar; especially when you want to collect files, you can use glob for pattern matching.
## pass a input list
in.lst <- list(input(id = "number",
description = "number of observations",
type = "integer",
label = "number",
prefix = "--n",
default = 1,
required = TRUE,
cmdInclude = TRUE),
input(id = "min",
description = "lower limits of the distribution",
type = "float",
label = "min",
prefix = "--min",
default = 0),
input(id = "max",
description = "upper limits of the distribution",
type = "float",
label = "max",
prefix = "--max",
default = 1),
input(id = "seed",
description = "seed with set.seed",
type = "float",
label = "seed",
prefix = "--seed",
default = 1))
## the same method for outputs
out.lst <- list(output(id = "random",
type = "file",
label = "output",
description = "random number file",
glob = "*.txt"),
output(id = "report",
type = "file",
label = "report",
glob = "*.html"))
rbx <- Tool(id = "runif",
label = "Random number generator",
hints = requirements(docker(pull = "tengfei/runif")),
baseCommand = "runif.R",
inputs = in.lst, ## or in.df
outputs = out.lst)
Alternatively, you can use a data.frame for inputs and outputs, but it is less flexible.
in.df <- data.frame(id = c("number", "min", "max", "seed"),
description = c("number of observation",
"lower limits of the distribution",
"upper limits of the distribution",
"seed with set.seed"),
type = c("integer", "float", "float", "float"),
label = c("number" ,"min", "max", "seed"),
prefix = c("--n", "--min", "--max", "--seed"),
default = c(1, 0, 10, 123),
required = c(TRUE, FALSE, FALSE, FALSE))
out.df <- data.frame(id = c("random", "report"),
type = c("file", "file"),
glob = c("*.txt", "*.html"))
rbx <- Tool(id = "runif",
label = "Random number generator",
hints = requirements(docker(pull = "tengfei/runif"),
cpu(1), mem(2000)),
baseCommand = "runif.R",
inputs = in.df, ## or in.lst
outputs = out.df)
Now you must be wondering: I have a Docker container with R, but I don’t have any existing command line tool that I could use directly. Can I provide a script with a formal and quick command line interface to make an app for an existing container? The answer is yes. When you add a script to your tool, you can always use some trick to do so; a popular one you may already have heard of is commandArgs. A more formal one is called “docopt”, which I will show you later.
Suppose you have an R script “runif2spin.R” with three arguments using position mapping.
My base command will be something like
Rscript runif2spin.R 10 30 50
This is how you do it in your R script
fl <- system.file("docker/sevenbridges/src", "runif2spin.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
#'---
#'title: "Uniform randome number generator example"
#'output:
#' html_document:
#' toc: true
#'number_sections: true
#'highlight: haddock
#'---
#'## summary report
#'
#'This is a randome number generator
#+
args <- commandArgs(TRUE)
r <- runif(n = as.integer(args[1]),
min = as.numeric(args[2]),
max = as.numeric(args[3]))
head(r)
summary(r)
hist(r)
Ignore the comment part; I will introduce spin/stitch later.
Then just describe the tool in this way, adding your script as you learned in the previous sections.
library(readr)
fd <- fileDef(name = "runif.R",
content = read_file(fl))
rbx <- Tool(id = "runif",
label = "runif",
hints = requirements(docker(pull = "rocker/r-base"),
cpu(1), mem(2000)),
requirements = requirements(fd),
baseCommand = "Rscript runif.R",
stdout = "output.txt",
inputs = list(input(id = "number",
type = "integer",
position = 1),
input(id = "min",
type = "float",
position = 2),
input(id = "max",
type = "float",
position = 3)),
outputs = output(id = "random", glob = "output.txt"))
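The effect of `position` can be pictured with a few lines of base R: arguments are ordered by position and appended to the base command. This is only a sketch of the idea with made-up job values, not the executor's actual implementation.

```r
# Hypothetical job values keyed by input id, with the positions declared in the tool
inputs <- data.frame(
  id       = c("max", "number", "min"),
  position = c(3, 1, 2),
  value    = c(50, 10, 30)
)

# Order the arguments by position and append them to the base command
ordered <- inputs[order(inputs$position), ]
cmd <- paste("Rscript runif.R", paste(ordered$value, collapse = " "))
cmd
```

Inside the script, `commandArgs(TRUE)` would then see the three values in that order, which is what the positional indexing `args[1]`, `args[2]`, `args[3]` relies on.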
How about named arguments? I still recommend using the “docopt” package, but here is a simple way. You want the command line to look like this
Rscript runif_args.R --n=10 --min=30 --max=50
Here is how you do it in the R script.
fl <- system.file("docker/sevenbridges/src", "runif_args.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
#'---
#'title: "Uniform randome number generator example"
#'output:
#' html_document:
#' toc: true
#'number_sections: true
#'highlight: haddock
#'---
#'## summary report
#'
#'This is a randome number generator
#+
args <- commandArgs(TRUE)
## quick hack to split named arguments
splitArgs <- function(x){
res <- do.call(rbind, lapply(x, function(i){
res <- strsplit(i, "=")[[1]]
nm <- gsub("-+", "",res[1])
c(nm, res[2])
}))
.r <- res[,2]
names(.r) <- res[,1]
.r
}
args <- splitArgs(args)
#+
r <- runif(n = as.integer(args["n"]),
min = as.numeric(args["min"]),
max = as.numeric(args["max"]))
summary(r)
hist(r)
write.csv(r, file = "out.csv")
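To check what `splitArgs` returns without launching `Rscript`, you can feed it a hand-written vector in place of `commandArgs(TRUE)`. The function below is copied from the script above; the sample vector is an assumption standing in for real command line arguments.

```r
# Copied from runif_args.R: split "--name=value" strings into a named vector
splitArgs <- function(x) {
  res <- do.call(rbind, lapply(x, function(i) {
    res <- strsplit(i, "=")[[1]]
    nm <- gsub("-+", "", res[1])
    c(nm, res[2])
  }))
  .r <- res[, 2]
  names(.r) <- res[, 1]
  .r
}

# Stand-in for commandArgs(TRUE)
args <- splitArgs(c("--n=10", "--min=30", "--max=50"))

# Values are still character strings at this point, so convert before use
n <- as.integer(args["n"])
lims <- as.numeric(args[c("min", "max")])
```

Note that this quick hack removes every run of hyphens from the argument name, so it is only suitable for simple names such as the ones used here.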
Then just describe the tool in this way. Note that I use separate = FALSE and add = to my prefix as a hack.
library(readr)
fd <- fileDef(name = "runif.R",
content = read_file(fl))
rbx <- Tool(id = "runif",
label = "runif",
hints = requirements(docker(pull = "rocker/r-base"),
cpu(1), mem(2000)),
requirements = requirements(fd),
baseCommand = "Rscript runif.R",
stdout = "output.txt",
inputs = list(input(id = "number",
type = "integer",
separate = FALSE,
prefix = "--n="),
input(id = "min",
type = "float",
separate = FALSE,
prefix = "--min="),
input(id = "max",
type = "float",
separate = FALSE,
prefix = "--max=")),
outputs = output(id = "random", glob = "output.txt"))
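The role of `separate` can be sketched in base R. In CWL, when `separate` is true the prefix and the value become two separate arguments, and when it is false they are joined into one. The helper `render_arg` is hypothetical and only illustrates that rule; it is not part of the package.

```r
# Hypothetical helper mimicking how a prefix and a value are combined
render_arg <- function(prefix, value, separate = TRUE) {
  if (separate) {
    # two separate command line tokens, e.g. "--n" "10"
    c(prefix, as.character(value))
  } else {
    # one token with prefix and value glued together, e.g. "--n=10"
    paste0(prefix, value)
  }
}

args <- c(
  render_arg("--n=",   10, separate = FALSE),
  render_arg("--min=", 30, separate = FALSE),
  render_arg("--max=", 50, separate = FALSE)
)
cmd <- paste(c("Rscript runif.R", args), collapse = " ")
cmd
```

With the prefixes ending in `=` and `separate = FALSE`, every argument arrives as a single `--name=value` token, which is the form the quick argument-splitting hack in the script expects.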
You can use spin/stitch from knitr to generate a report directly from an R script with a special format. For example, let’s use the example above
fl <- system.file("docker/sevenbridges/src", "runif_args.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
#'---
#'title: "Uniform randome number generator example"
#'output:
#' html_document:
#' toc: true
#'number_sections: true
#'highlight: haddock
#'---
#'## summary report
#'
#'This is a randome number generator
#+
args <- commandArgs(TRUE)
## quick hack to split named arguments
splitArgs <- function(x){
res <- do.call(rbind, lapply(x, function(i){
res <- strsplit(i, "=")[[1]]
nm <- gsub("-+", "",res[1])
c(nm, res[2])
}))
.r <- res[,2]
names(.r) <- res[,1]
.r
}
args <- splitArgs(args)
#+
r <- runif(n = as.integer(args["n"]),
min = as.numeric(args["min"]),
max = as.numeric(args["max"]))
summary(r)
hist(r)
write.csv(r, file = "out.csv")
Your command is something like this
Rscript -e "rmarkdown::render(knitr::spin('runif_args.R', FALSE))" --args --n=100 --min=30 --max=50
And so I describe my tool like this, with the Docker image rocker/hadleyverse, which contains the knitr and rmarkdown packages.
library(readr)
fd <- fileDef(name = "runif.R",
content = read_file(fl))
rbx <- Tool(id = "runif",
label = "runif",
hints = requirements(docker(pull = "rocker/hadleyverse"),
cpu(1), mem(2000)),
requirements = requirements(fd),
baseCommand = "Rscript -e \"rmarkdown::render(knitr::spin('runif.R', FALSE))\" --args",
stdout = "output.txt",
inputs = list(input(id = "number",
type = "integer",
separate = FALSE,
prefix = "--n="),
input(id = "min",
type = "float",
separate = FALSE,
prefix = "--min="),
input(id = "max",
type = "float",
separate = FALSE,
prefix = "--max=")),
outputs = list(output(id = "stdout", type = "file", glob = "output.txt"),
output(id = "random", type = "file", glob = "*.csv"),
output(id = "report", type = "file", glob = "*.html")))
You will get a report in the end.
Sometimes you want your output files to inherit metadata from a particular input file. Just use inheritMetadataFrom in your output() call and pass the input file id. If you want to add additional metadata, you can pass a list to metadata in your output() call. For example, I want my output report to inherit all metadata from my “bam_file” input node (which I don’t have in this example, though) with two additional metadata fields.
out.lst <- list(output(id = "random",
type = "file",
label = "output",
description = "random number file",
glob = "*.txt"),
output(id = "report",
type = "file",
label = "report",
glob = "*.html",
inheritMetadataFrom = "bam_file",
metadata = list(author = "tengfei",
sample = "random")))
out.lst
[[1]]
type:
- 'null'
- File
label: output
description: random number file
streamable: no
default: ''
id: '#random'
outputBinding:
glob: '*.txt'
[[2]]
type:
- 'null'
- File
label: report
description: ''
streamable: no
default: ''
id: '#report'
outputBinding:
glob: '*.html'
sbg:inheritMetadataFrom: '#bam_file'
sbg:metadata:
author: tengfei
sample: random
fl <- system.file("docker/rnaseqGene/rabix", "generator.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
library(sevenbridges)
rbx <- Tool(id = "rnaseqGene",
label = "rnaseqgene",
description = "A RNA-seq Differiencial Expression Flow and Report",
hints = requirements(docker(pull = "tengfei/rnaseqgene"), cpu(1), mem(2000)),
baseCommand = "performDE.R",
inputs = list(
input(
id = "bamfiles", label = "bam files",
description = "a list of bam files",
type = "File...", ## or type = ItemArray("File")
prefix = "--bamfiles",
required = TRUE,
itemSeparator = ","
),
input(
id = "design", label = "design matrix",
type = "File",
required = TRUE,
prefix = "--design"
),
input(
id = "gtffile", label = "gene feature files",
type = "File",
stageInput = "copy",
required = TRUE,
prefix = "--gtffile"
),
input(
id = "format", label = "report foramt html or pdf",
type = enum("format", c("pdf", "html")),
prefix = "--format"
)
),
outputs = list(
output(id = "report", label = "report",
description = "A reproducible report created by Rmarkdown",
glob = Expression(engine = "#cwl-js-engine",
script = "x = $job[['inputs']][['format']];
if(x == 'undefined' || x == null){
x = 'html';
};
'rnaseqGene.' + x")),
output(id = "heatmap", label = "heatmap",
description = "A heatmap plot to show the Euclidean distance between samples",
glob = "heatmap.pdf"),
output(id = "count", label = "count",
description = "Reads counts matrix",
glob = "count.csv"),
output(id = "de", label = "Differential expression table",
description = "Differential expression table",
glob = "de.csv")
))
fl <- "inst/docker/rnaseqGene/rabix/rnaseqGene.json"
write(rbx$toJSON(pretty = TRUE), fl)
Note the stageInput example in the above script; you can set it to “copy” or “link”.
Batch by File
f1 = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
f1 = convert_app(f1)
f1$set_batch("sjdbGTFfile", type = "ITEM")
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 2
sbg:toolAuthor: Seven Bridges Genomics
sbg:createdOn: 1463601910
sbg:categories:
- Alignment
- RNA
sbg:contributors:
- tengfei
sbg:project: tengfei/quickstart
sbg:createdBy: tengfei
sbg:toolkitVersion: 2.4.2a
sbg:id: tengfei/quickstart/rna-seq-alignment-star-demo/2
sbg:license: Apache License 2.0
sbg:revision: 2
sbg:modifiedOn: 1463601974
sbg:modifiedBy: tengfei
sbg:revisionsInfo:
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601910
sbg:revision: 0
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601952
sbg:revision: 1
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601974
sbg:revision: 2
sbg:toolkit: STAR
id: '#tengfei/quickstart/rna-seq-alignment-star-demo/2'
inputs:
- type:
- 'null'
- items: File
type: array
label: sjdbGTFfile
streamable: no
id: '#sjdbGTFfile'
sbg:x: 160.4999759
sbg:y: 195.0833106
required: no
- type:
- items: File
type: array
label: fastq
streamable: no
id: '#fastq'
sbg:x: 164.2499914
sbg:y: 323.7499502
sbg:includeInPorts: yes
required: yes
- type:
- File
label: genomeFastaFiles
streamable: no
id: '#genomeFastaFiles'
sbg:x: 167.7499601
sbg:y: 469.9999106
required: yes
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
sbg:category: Splice junctions db parameters
sbg:x: 200.0
sbg:y: 350.0
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
sbg:category: Splice junctions db parameters
sbg:x: 200.0
sbg:y: 400.0
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- int
label: Max loci anchors
description: Max number of loci anchors are allowed to map to (int>0).
streamable: no
id: '#winAnchorMultimapNmax'
sbg:category: Windows, Anchors, Binning
sbg:x: 200.0
sbg:y: 450.0
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max bins between anchors
description: Max number of bins between two anchors that allows aggregation of anchors
into one window (int>0).
streamable: no
id: '#winAnchorDistNbins'
sbg:category: Windows, Anchors, Binning
sbg:x: 200.0
sbg:y: 500.0
sbg:toolDefaultValue: '9'
required: no
outputs:
- type:
- 'null'
- items: File
type: array
label: unmapped_reads
streamable: no
id: '#unmapped_reads'
source: '#STAR.unmapped_reads'
sbg:x: 766.2497863
sbg:y: 159.5833091
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: transcriptome_aligned_reads
streamable: no
id: '#transcriptome_aligned_reads'
source: '#STAR.transcriptome_aligned_reads'
sbg:x: 1118.9998003
sbg:y: 86.5833216
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: splice_junctions
streamable: no
id: '#splice_junctions'
source: '#STAR.splice_junctions'
sbg:x: 1282.3330177
sbg:y: 167.499976
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: reads_per_gene
streamable: no
id: '#reads_per_gene'
source: '#STAR.reads_per_gene'
sbg:x: 1394.4163557
sbg:y: 245.749964
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- items: File
type: array
label: log_files
streamable: no
id: '#log_files'
source: '#STAR.log_files'
sbg:x: 1505.0830269
sbg:y: 322.9999518
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: chimeric_junctions
streamable: no
id: '#chimeric_junctions'
source: '#STAR.chimeric_junctions'
sbg:x: 1278.7498062
sbg:y: 446.7499567
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: intermediate_genome
streamable: no
id: '#intermediate_genome'
source: '#STAR.intermediate_genome'
sbg:x: 1408.9164783
sbg:y: 386.0832876
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: chimeric_alignments
streamable: no
id: '#chimeric_alignments'
source: '#STAR.chimeric_alignments'
sbg:x: 1147.5831348
sbg:y: 503.2499285
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: sorted_bam
streamable: no
id: '#sorted_bam'
source: '#Picard_SortSam.sorted_bam'
sbg:x: 934.2498228
sbg:y: 557.2498436
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: result
streamable: no
id: '#result'
source: '#SBG_FASTQ_Quality_Detector.result'
sbg:x: 1431.6666548
sbg:y: 644.9999898
sbg:includeInPorts: yes
required: no
requirements:
- class: CreateFileRequirement
fileDef: []
hints:
- class: sbg:AWSInstanceType
value: c3.8xlarge
label: RNA-seq Alignment - STAR
description: "Alignment to a reference genome and transcriptome presents the first
step of RNA-Seq analysis. This pipeline uses STAR, an ultrafast RNA-seq aligner
capable of mapping full length RNA sequences and detecting de novo canonical junctions,
non-canonical splices, and chimeric (fusion) transcripts. It is optimized for mammalian
sequence reads, but fine tuning of its parameters enables customization to satisfy
unique needs.\n\nSTAR accepts one file per sample (or two files for paired-end data).
\ \nSplice junction annotations can optionally be collected from splice junction
databases. Set the \"Overhang length\" parameter to a value larger than zero in
order to use splice junction databases. For constant read length, this value should
(ideally) be equal to mate length decreased by 1; for long reads with non-constant
length, this value should be 100 (pipeline default). \nFastQC Analysis on FASTQ
files reveals read length distribution. STAR can detect chimeric transcripts, but
parameter \"Min segment length\" in \"Chimeric Alignments\" category must be adjusted
to a desired minimum chimeric segment length. Aligned reads are reported in BAM
format and can be viewed in a genome browser (such as IGV). A file containing detected
splice junctions is also produced.\n\nUnmapped reads are reported in FASTQ format
and can be included in an output BAM file. The \"Output unmapped reads\" and \"Write
unmapped in SAM\" parameters enable unmapped output type selection."
class: Workflow
steps:
- id: '#STAR_Genome_Generate'
inputs:
- id: '#STAR_Genome_Generate.sjdbScore'
- id: '#STAR_Genome_Generate.sjdbOverhang'
- id: '#STAR_Genome_Generate.sjdbGTFtagExonParentTranscript'
source: '#sjdbGTFtagExonParentTranscript'
- id: '#STAR_Genome_Generate.sjdbGTFtagExonParentGene'
source: '#sjdbGTFtagExonParentGene'
- id: '#STAR_Genome_Generate.sjdbGTFfile'
source: '#sjdbGTFfile'
- id: '#STAR_Genome_Generate.sjdbGTFfeatureExon'
- id: '#STAR_Genome_Generate.sjdbGTFchrPrefix'
- id: '#STAR_Genome_Generate.genomeSAsparseD'
- id: '#STAR_Genome_Generate.genomeSAindexNbases'
- id: '#STAR_Genome_Generate.genomeFastaFiles'
source: '#genomeFastaFiles'
- id: '#STAR_Genome_Generate.genomeChrBinNbits'
outputs:
- id: '#STAR_Genome_Generate.genome'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 1
sbg:job:
allocatedResources:
mem: 60000
cpu: 15
inputs:
sjdbScore: 0
sjdbGTFfeatureExon: sjdbGTFfeatureExon
sjdbOverhang: 0
sjdbGTFtagExonParentTranscript: sjdbGTFtagExonParentTranscript
genomeChrBinNbits: genomeChrBinNbits
genomeSAsparseD: 0
sjdbGTFfile:
- size: 0
secondaryFiles: []
class: File
path: /demo/test-files/chr20.gtf
sjdbGTFtagExonParentGene: sjdbGTFtagExonParentGene
genomeFastaFiles:
size: 0
secondaryFiles: []
class: File
path: /sbgenomics/test-data/chr20.fa
sjdbGTFchrPrefix: sjdbGTFchrPrefix
genomeSAindexNbases: 0
sbg:toolAuthor: Alexander Dobin/CSHL
sbg:createdOn: 1450911469
sbg:categories:
- Alignment
sbg:contributors:
- bix-demo
sbg:links:
- id: https://github.com/alexdobin/STAR
label: Homepage
- id: https://github.com/alexdobin/STAR/releases
label: Releases
- id: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
label: Manual
- id: https://groups.google.com/forum/#!forum/rna-star
label: Support
- id: http://www.ncbi.nlm.nih.gov/pubmed/23104886
label: Publication
sbg:project: bix-demo/star-2-4-2a-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: 2.4.2a
sbg:id: sevenbridges/public-apps/star-genome-generate/1
sbg:license: GNU General Public License v3.0 only
sbg:revision: 1
sbg:cmdPreview: mkdir genomeDir && /opt/STAR --runMode genomeGenerate --genomeDir
./genomeDir --runThreadN 15 --genomeFastaFiles /sbgenomics/test-data/chr20.fa
--genomeChrBinNbits genomeChrBinNbits --genomeSAindexNbases 0 --genomeSAsparseD
0 --sjdbGTFfeatureExon sjdbGTFfeatureExon --sjdbGTFtagExonParentTranscript sjdbGTFtagExonParentTranscript
--sjdbGTFtagExonParentGene sjdbGTFtagExonParentGene --sjdbOverhang 0 --sjdbScore
0 --sjdbGTFchrPrefix sjdbGTFchrPrefix --sjdbGTFfile /demo/test-files/chr20.gtf &&
tar -vcf genome.tar ./genomeDir /sbgenomics/test-data/chr20.fa
sbg:modifiedOn: 1450911470
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911469
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911470
sbg:revision: 1
sbg:toolkit: STAR
id: sevenbridges/public-apps/star-genome-generate/1
inputs:
- type:
- 'null'
- int
label: Extra alignment score
description: Extra alignment score for alignments that cross database junctions.
streamable: no
id: '#sjdbScore'
inputBinding:
position: 0
prefix: --sjdbScore
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:includeInPorts: yes
sbg:toolDefaultValue: '2'
required: no
- type:
- 'null'
- int
label: '"Overhang" length'
description: Length of the donor/acceptor sequence on each side of the junctions,
ideally = (mate_length - 1) (int >= 0), if int = 0, splice junction database
is not used.
streamable: no
id: '#sjdbOverhang'
inputBinding:
position: 0
prefix: --sjdbOverhang
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:includeInPorts: yes
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
inputBinding:
position: 0
prefix: --sjdbGTFtagExonParentTranscript
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
inputBinding:
position: 0
prefix: --sjdbGTFtagExonParentGene
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- items: File
type: array
label: Splice junction file
description: Gene model annotations and/or known transcripts.
streamable: no
id: '#sjdbGTFfile'
sbg:category: Basic
sbg:fileTypes: GTF, GFF, TXT
required: no
- type:
- 'null'
- string
label: Set exons feature
description: Feature type in GTF file to be used as exons for building transcripts.
streamable: no
id: '#sjdbGTFfeatureExon'
inputBinding:
position: 0
prefix: --sjdbGTFfeatureExon
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: exon
required: no
- type:
- 'null'
- string
label: Chromosome names
description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using
ENSEMBL annotations with UCSC genomes).
streamable: no
id: '#sjdbGTFchrPrefix'
inputBinding:
position: 0
prefix: --sjdbGTFchrPrefix
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- int
label: Suffix array sparsity
description: 'Distance between indices: use bigger numbers to decrease needed
RAM at the cost of mapping speed reduction (int>0).'
streamable: no
id: '#genomeSAsparseD'
inputBinding:
position: 0
prefix: --genomeSAsparseD
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Pre-indexing string length
description: Length (bases) of the SA pre-indexing string. Typically between
10 and 15. Longer strings will use much more memory, but allow faster searches.
For small genomes, this number needs to be scaled down, with a typical value
of min(14, log2(GenomeLength)/2 - 1). For example, for 1 megaBase genome,
this is equal to 9, for 100 kiloBase genome, this is equal to 7.
streamable: no
id: '#genomeSAindexNbases'
inputBinding:
position: 0
prefix: --genomeSAindexNbases
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '14'
required: no
- type:
- File
label: Genome fasta files
description: Reference sequence to which to align the reads.
streamable: no
id: '#genomeFastaFiles'
inputBinding:
position: 0
prefix: --genomeFastaFiles
separate: yes
sbg:cmdInclude: yes
sbg:category: Basic
sbg:fileTypes: FASTA, FA
required: yes
- type:
- 'null'
- string
label: Bins size
description: 'Set log2(chrBin), where chrBin is the size (bits) of the bins
for genome storage: each chromosome will occupy an integer number of bins.
If you are using a genome with a large (>5,000) number of chromosomes/scaffolds,
you may need to reduce this number to reduce RAM consumption. The following
scaling is recommended: genomeChrBinNbits = min(18, log2(GenomeLength/NumberOfReferences)).
For example, for 3 gigaBase genome with 100,000 chromosomes/scaffolds, this
is equal to 15.'
streamable: no
id: '#genomeChrBinNbits'
inputBinding:
position: 0
prefix: --genomeChrBinNbits
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '18'
required: no
outputs:
- type:
- 'null'
- File
label: Genome Files
description: Genome files comprise binary genome sequence, suffix arrays, text
chromosome names/lengths, splice junctions coordinates, and transcripts/genes
information.
streamable: no
id: '#genome'
outputBinding:
glob: '*.tar'
sbg:fileTypes: TAR
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/ana_d/star:2.4.2a
dockerImageId: a4b0ad2c3cae
- class: sbg:CPURequirement
value: 15
- class: sbg:MemRequirement
value: 60000
label: STAR Genome Generate
description: STAR Genome Generate is a tool that generates genome index files.
One set of files should be generated per each genome/annotation combination.
Once produced, these files could be used as long as genome/annotation combination
stays the same. Also, STAR Genome Generate which produced these files and STAR
aligner using them must be the same toolkit version.
class: CommandLineTool
arguments:
- position: 99
separate: yes
valueFrom: '&& tar -vcf genome.tar ./genomeDir'
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var sjFormat = \"False\"\n var gtfgffFormat = \"False\"\n
\ var list = $job.inputs.sjdbGTFfile\n var paths_list = []\n var joined_paths
= \"\"\n \n if (list) {\n list.forEach(function(f){return paths_list.push(f.path)})\n
\ joined_paths = paths_list.join(\" \")\n\n\n paths_list.forEach(function(f){\n
\ ext = f.replace(/^.*\\./, '')\n if (ext == \"gff\" || ext ==
\"gtf\") {\n gtfgffFormat = \"True\"\n return gtfgffFormat\n
\ }\n if (ext == \"txt\") {\n sjFormat = \"True\"\n return
sjFormat\n }\n })\n\n if ($job.inputs.sjdbGTFfile && $job.inputs.sjdbInsertSave
!= \"None\") {\n if (sjFormat == \"True\") {\n return \"--sjdbFileChrStartEnd
\".concat(joined_paths)\n }\n else if (gtfgffFormat == \"True\")
{\n return \"--sjdbGTFfile \".concat(joined_paths)\n }\n }\n
\ }\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 384.0832266
'y': 446.4998957
sbg:x: 100.0
sbg:y: 200.0
- id: '#SBG_FASTQ_Quality_Detector'
inputs:
- id: '#SBG_FASTQ_Quality_Detector.fastq'
source: '#fastq'
outputs:
- id: '#SBG_FASTQ_Quality_Detector.result'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 3
sbg:job:
allocatedResources:
mem: 1000
cpu: 1
inputs:
fastq:
size: 0
secondaryFiles: []
class: File
path: /path/to/fastq.ext
sbg:toolAuthor: Seven Bridges Genomics
sbg:createdOn: 1450911312
sbg:categories:
- FASTQ-Processing
sbg:contributors:
- bix-demo
sbg:project: bix-demo/sbgtools-demo
sbg:createdBy: bix-demo
sbg:id: sevenbridges/public-apps/sbg-fastq-quality-detector/3
sbg:license: Apache License 2.0
sbg:revision: 3
sbg:cmdPreview: python /opt/sbg_fastq_quality_scale_detector.py --fastq /path/to/fastq.ext
/path/to/fastq.ext
sbg:modifiedOn: 1450911314
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911312
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911314
sbg:revision: 3
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911313
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911313
sbg:revision: 2
sbg:toolkit: SBGTools
id: sevenbridges/public-apps/sbg-fastq-quality-detector/3
inputs:
- type:
- File
label: Fastq
description: FASTQ file.
streamable: no
id: '#fastq'
inputBinding:
position: 0
prefix: --fastq
separate: yes
sbg:cmdInclude: yes
required: yes
outputs:
- type:
- 'null'
- File
label: Result
description: Source FASTQ file with updated metadata.
streamable: no
id: '#result'
outputBinding:
glob: '*.fastq'
sbg:fileTypes: FASTQ
requirements:
- class: CreateFileRequirement
fileDef: []
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/tziotas/sbg_fastq_quality_scale_detector:1.0
dockerImageId: ''
- class: sbg:CPURequirement
value: 1
- class: sbg:MemRequirement
value: 1000
label: SBG FASTQ Quality Detector
description: FASTQ Quality Scale Detector detects which quality encoding scheme
was used in your reads and automatically enters the proper value in the "Quality
Scale" metadata field.
class: CommandLineTool
arguments: []
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 375.3333179
'y': 323.5833156
sbg:x: 300.0
sbg:y: 200.0
- id: '#Picard_SortSam'
inputs:
- id: '#Picard_SortSam.validation_stringency'
default: SILENT
- id: '#Picard_SortSam.sort_order'
default: Coordinate
- id: '#Picard_SortSam.quiet'
- id: '#Picard_SortSam.output_type'
- id: '#Picard_SortSam.memory_per_job'
- id: '#Picard_SortSam.max_records_in_ram'
- id: '#Picard_SortSam.input_bam'
source: '#STAR.aligned_reads'
- id: '#Picard_SortSam.create_index'
default: 'True'
- id: '#Picard_SortSam.compression_level'
outputs:
- id: '#Picard_SortSam.sorted_bam'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 2
sbg:job:
allocatedResources:
mem: 2048
cpu: 1
inputs:
sort_order: Coordinate
input_bam:
path: /root/dir/example.tested.bam
memory_per_job: 2048
output_type: ~
create_index: ~
sbg:toolAuthor: Broad Institute
sbg:createdOn: 1450911168
sbg:categories:
- SAM/BAM-Processing
sbg:contributors:
- bix-demo
sbg:links:
- id: http://broadinstitute.github.io/picard/index.html
label: Homepage
- id: https://github.com/broadinstitute/picard/releases/tag/1.138
label: Source Code
- id: http://broadinstitute.github.io/picard/
label: Wiki
- id: https://github.com/broadinstitute/picard/zipball/master
label: Download
- id: http://broadinstitute.github.io/picard/
label: Publication
sbg:project: bix-demo/picard-1-140-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: '1.140'
sbg:id: sevenbridges/public-apps/picard-sortsam-1-140/2
sbg:license: MIT License, Apache 2.0 Licence
sbg:revision: 2
sbg:cmdPreview: java -Xmx2048M -jar /opt/picard-tools-1.140/picard.jar SortSam
OUTPUT=example.tested.sorted.bam INPUT=/root/dir/example.tested.bam SORT_ORDER=coordinate INPUT=/root/dir/example.tested.bam
SORT_ORDER=coordinate /root/dir/example.tested.bam
sbg:modifiedOn: 1450911170
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911168
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911169
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911170
sbg:revision: 2
sbg:toolkit: Picard
id: sevenbridges/public-apps/picard-sortsam-1-140/2
inputs:
- type:
- 'null'
- name: validation_stringency
symbols:
- STRICT
- LENIENT
- SILENT
type: enum
label: Validation stringency
description: 'Validation stringency for all SAM files read by this program.
Setting stringency to SILENT can improve performance when processing a BAM
file in which variable-length data (read, qualities, tags) do not otherwise
need to be decoded. This option can be set to ''null'' to clear the default
value. Possible values: {STRICT, LENIENT, SILENT}.'
streamable: no
id: '#validation_stringency'
inputBinding:
position: 0
prefix: VALIDATION_STRINGENCY=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.validation_stringency)
{
return $job.inputs.validation_stringency
}
else
{
return "SILENT"
}
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: SILENT
required: no
- type:
- name: sort_order
symbols:
- Unsorted
- Queryname
- Coordinate
type: enum
label: Sort order
description: 'Sort order of the output file. Possible values: {unsorted, queryname,
coordinate}.'
streamable: no
id: '#sort_order'
inputBinding:
position: 3
prefix: SORT_ORDER=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
p = $job.inputs.sort_order.toLowerCase()
return p
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: Coordinate
sbg:altPrefix: SO
required: yes
- type:
- 'null'
- name: quiet
symbols:
- 'True'
- 'False'
type: enum
label: Quiet
description: 'This parameter indicates whether to suppress job-summary info
on System.err. This option can be set to ''null'' to clear the default value.
Possible values: {true, false}.'
streamable: no
id: '#quiet'
inputBinding:
position: 0
prefix: QUIET=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: 'False'
required: no
- type:
- 'null'
- name: output_type
symbols:
- BAM
- SAM
- SAME AS INPUT
type: enum
label: Output format
description: Since Picard tools can output both SAM and BAM files, user can
choose the format of the output file.
streamable: no
id: '#output_type'
sbg:category: Other input types
sbg:toolDefaultValue: SAME AS INPUT
required: no
- type:
- 'null'
- int
label: Memory per job
description: Amount of RAM memory to be used per job. Defaults to 2048 MB for
single threaded jobs.
streamable: no
id: '#memory_per_job'
sbg:toolDefaultValue: '2048'
required: no
- type:
- 'null'
- int
label: Max records in RAM
description: When writing SAM files that need to be sorted, this parameter will
specify the number of records stored in RAM before spilling to disk. Increasing
this number reduces the number of file handles needed to sort a SAM file,
and increases the amount of RAM needed. This option can be set to 'null' to
clear the default value.
streamable: no
id: '#max_records_in_ram'
inputBinding:
position: 0
prefix: MAX_RECORDS_IN_RAM=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: '500000'
required: no
- type:
- File
label: Input BAM
description: The BAM or SAM file to sort.
streamable: no
id: '#input_bam'
inputBinding:
position: 1
prefix: INPUT=
separate: no
sbg:cmdInclude: yes
sbg:category: File inputs
sbg:fileTypes: BAM, SAM
sbg:altPrefix: I
required: yes
- type:
- 'null'
- name: create_index
symbols:
- 'True'
- 'False'
type: enum
label: Create index
description: 'This parameter indicates whether to create a BAM index when writing
a coordinate-sorted BAM file. This option can be set to ''null'' to clear
the default value. Possible values: {true, false}.'
streamable: no
id: '#create_index'
inputBinding:
position: 5
prefix: CREATE_INDEX=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: 'False'
required: no
- type:
- 'null'
- int
label: Compression level
description: Compression level for all compressed files created (e.g. BAM and
GELI). This option can be set to 'null' to clear the default value.
streamable: no
id: '#compression_level'
inputBinding:
position: 0
prefix: COMPRESSION_LEVEL=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: '5'
required: no
outputs:
- type:
- 'null'
- File
label: Sorted BAM/SAM
description: Sorted BAM or SAM file.
streamable: no
id: '#sorted_bam'
outputBinding:
glob: '*.sorted.?am'
sbg:fileTypes: BAM, SAM
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
engineCommand: cwl-engine.js
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/mladenlsbg/picard:1.140
dockerImageId: eab0e70b6629
- class: sbg:CPURequirement
value: 1
- class: sbg:MemRequirement
value:
engine: '#cwl-js-engine'
script: "{\n if($job.inputs.memory_per_job){\n \treturn $job.inputs.memory_per_job\n
\ }\n \treturn 2048\n}"
class: Expression
label: Picard SortSam
description: Picard SortSam sorts the input SAM or BAM. Input and output formats
are determined by the file extension.
class: CommandLineTool
arguments:
- position: 0
prefix: OUTPUT=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: "{\n filename = $job.inputs.input_bam.path\n ext = $job.inputs.output_type\n\nif
(ext === \"BAM\")\n{\n return filename.split('.').slice(0, -1).concat(\"sorted.bam\").join(\".\").replace(/^.*[\\\\\\/]/,
'')\n }\n\nelse if (ext === \"SAM\")\n{\n return filename.split('.').slice(0,
-1).concat(\"sorted.sam\").join('.').replace(/^.*[\\\\\\/]/, '')\n}\n\nelse
\n{\n\treturn filename.split('.').slice(0, -1).concat(\"sorted.\"+filename.split('.').slice(-1)[0]).join(\".\").replace(/^.*[\\\\\\/]/,
'')\n}\n}"
class: Expression
- position: 1000
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n filename = $job.inputs.input_bam.path\n \n /* figuring out
output file type */\n ext = $job.inputs.output_type\n if (ext === \"BAM\")\n
\ {\n out_extension = \"BAM\"\n }\n else if (ext === \"SAM\")\n {\n
\ out_extension = \"SAM\"\n }\n else \n {\n\tout_extension = filename.split('.').slice(-1)[0].toUpperCase()\n
\ } \n \n /* if exist moving .bai in bam.bai */\n if ($job.inputs.create_index
=== 'True' && $job.inputs.sort_order === 'Coordinate' && out_extension ==
\"BAM\")\n {\n \n old_name = filename.split('.').slice(0, -1).concat('sorted.bai').join('.').replace(/^.*[\\\\\\/]/,
'')\n new_name = filename.split('.').slice(0, -1).concat('sorted.bam.bai').join('.').replace(/^.*[\\\\\\/]/,
'')\n return \"; mv \" + \" \" + old_name + \" \" + new_name\n }\n\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 773.0831807
'y': 470.9165939
sbg:x: 500.0
sbg:y: 200.0
- id: '#STAR'
inputs:
- id: '#STAR.winFlankNbins'
- id: '#STAR.winBinNbits'
- id: '#STAR.winAnchorMultimapNmax'
source: '#winAnchorMultimapNmax'
- id: '#STAR.winAnchorDistNbins'
source: '#winAnchorDistNbins'
- id: '#STAR.twopassMode'
- id: '#STAR.twopass1readsN'
- id: '#STAR.sjdbScore'
- id: '#STAR.sjdbOverhang'
default: 100
- id: '#STAR.sjdbInsertSave'
- id: '#STAR.sjdbGTFtagExonParentTranscript'
- id: '#STAR.sjdbGTFtagExonParentGene'
- id: '#STAR.sjdbGTFfile'
source: '#sjdbGTFfile'
- id: '#STAR.sjdbGTFfeatureExon'
- id: '#STAR.sjdbGTFchrPrefix'
- id: '#STAR.seedSearchStartLmaxOverLread'
- id: '#STAR.seedSearchStartLmax'
- id: '#STAR.seedSearchLmax'
- id: '#STAR.seedPerWindowNmax'
- id: '#STAR.seedPerReadNmax'
- id: '#STAR.seedNoneLociPerWindow'
- id: '#STAR.seedMultimapNmax'
- id: '#STAR.scoreStitchSJshift'
- id: '#STAR.scoreInsOpen'
- id: '#STAR.scoreInsBase'
- id: '#STAR.scoreGenomicLengthLog2scale'
- id: '#STAR.scoreGapNoncan'
- id: '#STAR.scoreGapGCAG'
- id: '#STAR.scoreGapATAC'
- id: '#STAR.scoreGap'
- id: '#STAR.scoreDelOpen'
- id: '#STAR.scoreDelBase'
- id: '#STAR.rg_seq_center'
- id: '#STAR.rg_sample_id'
- id: '#STAR.rg_platform_unit_id'
- id: '#STAR.rg_platform'
- id: '#STAR.rg_mfl'
- id: '#STAR.rg_library_id'
- id: '#STAR.reads'
source: '#SBG_FASTQ_Quality_Detector.result'
- id: '#STAR.readMatesLengthsIn'
- id: '#STAR.readMapNumber'
- id: '#STAR.quantTranscriptomeBan'
- id: '#STAR.quantMode'
default: TranscriptomeSAM
- id: '#STAR.outSortingType'
default: SortedByCoordinate
- id: '#STAR.outSJfilterReads'
- id: '#STAR.outSJfilterOverhangMin'
- id: '#STAR.outSJfilterIntronMaxVsReadN'
- id: '#STAR.outSJfilterDistToOtherSJmin'
- id: '#STAR.outSJfilterCountUniqueMin'
- id: '#STAR.outSJfilterCountTotalMin'
- id: '#STAR.outSAMunmapped'
- id: '#STAR.outSAMtype'
default: BAM
- id: '#STAR.outSAMstrandField'
- id: '#STAR.outSAMreadID'
- id: '#STAR.outSAMprimaryFlag'
- id: '#STAR.outSAMorder'
- id: '#STAR.outSAMmode'
- id: '#STAR.outSAMmapqUnique'
- id: '#STAR.outSAMheaderPG'
- id: '#STAR.outSAMheaderHD'
- id: '#STAR.outSAMflagOR'
- id: '#STAR.outSAMflagAND'
- id: '#STAR.outSAMattributes'
- id: '#STAR.outReadsUnmapped'
default: Fastx
- id: '#STAR.outQSconversionAdd'
- id: '#STAR.outFilterType'
- id: '#STAR.outFilterScoreMinOverLread'
- id: '#STAR.outFilterScoreMin'
- id: '#STAR.outFilterMultimapScoreRange'
- id: '#STAR.outFilterMultimapNmax'
- id: '#STAR.outFilterMismatchNoverReadLmax'
- id: '#STAR.outFilterMismatchNoverLmax'
- id: '#STAR.outFilterMismatchNmax'
- id: '#STAR.outFilterMatchNminOverLread'
- id: '#STAR.outFilterMatchNmin'
- id: '#STAR.outFilterIntronMotifs'
- id: '#STAR.limitSjdbInsertNsj'
- id: '#STAR.limitOutSJoneRead'
- id: '#STAR.limitOutSJcollapsed'
- id: '#STAR.limitBAMsortRAM'
- id: '#STAR.genomeDirName'
- id: '#STAR.genome'
source: '#STAR_Genome_Generate.genome'
- id: '#STAR.clip5pNbases'
- id: '#STAR.clip3pNbases'
- id: '#STAR.clip3pAfterAdapterNbases'
- id: '#STAR.clip3pAdapterSeq'
- id: '#STAR.clip3pAdapterMMp'
- id: '#STAR.chimSegmentMin'
- id: '#STAR.chimScoreSeparation'
- id: '#STAR.chimScoreMin'
- id: '#STAR.chimScoreJunctionNonGTAG'
- id: '#STAR.chimScoreDropMax'
- id: '#STAR.chimOutType'
- id: '#STAR.chimJunctionOverhangMin'
- id: '#STAR.alignWindowsPerReadNmax'
- id: '#STAR.alignTranscriptsPerWindowNmax'
- id: '#STAR.alignTranscriptsPerReadNmax'
- id: '#STAR.alignSplicedMateMapLminOverLmate'
- id: '#STAR.alignSplicedMateMapLmin'
- id: '#STAR.alignSoftClipAtReferenceEnds'
- id: '#STAR.alignSJoverhangMin'
- id: '#STAR.alignSJDBoverhangMin'
- id: '#STAR.alignMatesGapMax'
- id: '#STAR.alignIntronMin'
- id: '#STAR.alignIntronMax'
- id: '#STAR.alignEndsType'
outputs:
- id: '#STAR.unmapped_reads'
- id: '#STAR.transcriptome_aligned_reads'
- id: '#STAR.splice_junctions'
- id: '#STAR.reads_per_gene'
- id: '#STAR.log_files'
- id: '#STAR.intermediate_genome'
- id: '#STAR.chimeric_junctions'
- id: '#STAR.chimeric_alignments'
- id: '#STAR.aligned_reads'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 4
sbg:job:
allocatedResources:
mem: 60000
cpu: 15
inputs:
alignWindowsPerReadNmax: 0
outSAMheaderPG: outSAMheaderPG
GENOME_DIR_NAME: ''
outFilterMatchNminOverLread: 0
rg_platform_unit_id: rg_platform_unit
alignTranscriptsPerReadNmax: 0
readMapNumber: 0
alignSplicedMateMapLminOverLmate: 0
alignMatesGapMax: 0
outFilterMultimapNmax: 0
clip5pNbases:
- 0
outSAMstrandField: None
readMatesLengthsIn: NotEqual
outSAMattributes: Standard
seedMultimapNmax: 0
rg_mfl: rg_mfl
chimSegmentMin: 0
winAnchorDistNbins: 0
outSortingType: SortedByCoordinate
outFilterMultimapScoreRange: 0
sjdbInsertSave: Basic
clip3pAfterAdapterNbases:
- 0
scoreDelBase: 0
outFilterMatchNmin: 0
twopass1readsN: 0
outSAMunmapped: None
genome:
size: 0
secondaryFiles: []
class: File
path: genome.ext
sjdbGTFtagExonParentTranscript: ''
limitBAMsortRAM: 0
alignEndsType: Local
seedNoneLociPerWindow: 0
rg_sample_id: rg_sample
sjdbGTFtagExonParentGene: ''
chimScoreMin: 0
outSJfilterIntronMaxVsReadN:
- 0
twopassMode: Basic
alignSplicedMateMapLmin: 0
outSJfilterReads: All
outSAMprimaryFlag: OneBestScore
outSJfilterCountTotalMin:
- 3
- 1
- 1
- 1
outSAMorder: Paired
outSAMflagAND: 0
chimScoreSeparation: 0
alignSJoverhangMin: 0
outFilterScoreMin: 0
seedSearchStartLmax: 0
scoreGapGCAG: 0
scoreGenomicLengthLog2scale: 0
outFilterIntronMotifs: None
outFilterMismatchNmax: 0
reads:
- size: 0
secondaryFiles: []
class: File
metadata:
format: fastq
paired_end: '1'
seq_center: illumina
path: /test-data/mate_1.fastq.bz2
scoreGap: 0
outSJfilterOverhangMin:
- 30
- 12
- 12
- 12
outSAMflagOR: 0
outSAMmode: Full
rg_library_id: ''
chimScoreJunctionNonGTAG: 0
scoreInsOpen: 0
clip3pAdapterSeq:
- clip3pAdapterSeq
chimScoreDropMax: 0
outFilterType: Normal
scoreGapATAC: 0
rg_platform: Ion Torrent PGM
clip3pAdapterMMp:
- 0
sjdbGTFfeatureExon: ''
outQSconversionAdd: 0
quantMode: TranscriptomeSAM
alignIntronMin: 0
scoreInsBase: 0
scoreGapNoncan: 0
seedSearchLmax: 0
outSJfilterDistToOtherSJmin:
- 0
outFilterScoreMinOverLread: 0
alignSJDBoverhangMin: 0
limitOutSJcollapsed: 0
winAnchorMultimapNmax: 0
outFilterMismatchNoverLmax: 0
rg_seq_center: ''
outSAMheaderHD: outSAMheaderHD
chimOutType: Within
quantTranscriptomeBan: IndelSoftclipSingleend
limitOutSJoneRead: 0
alignTranscriptsPerWindowNmax: 0
sjdbOverhang: ~
outReadsUnmapped: Fastx
scoreStitchSJshift: 0
seedPerWindowNmax: 0
outSJfilterCountUniqueMin:
- 3
- 1
- 1
- 1
scoreDelOpen: 0
sjdbGTFfile:
- path: /demo/test-data/chr20.gtf
clip3pNbases:
- 0
- 3
winBinNbits: 0
sjdbScore: ~
seedSearchStartLmaxOverLread: 0
alignIntronMax: 0
seedPerReadNmax: 0
outFilterMismatchNoverReadLmax: 0
winFlankNbins: 0
sjdbGTFchrPrefix: chrPrefix
alignSoftClipAtReferenceEnds: 'Yes'
outSAMreadID: Standard
outSAMtype: BAM
chimJunctionOverhangMin: 0
limitSjdbInsertNsj: 0
outSAMmapqUnique: 0
sbg:toolAuthor: Alexander Dobin/CSHL
sbg:createdOn: 1450911471
sbg:categories:
- Alignment
sbg:contributors:
- ana_d
- bix-demo
- uros_sipetic
sbg:links:
- id: https://github.com/alexdobin/STAR
label: Homepage
- id: https://github.com/alexdobin/STAR/releases
label: Releases
- id: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
label: Manual
- id: https://groups.google.com/forum/#!forum/rna-star
label: Support
- id: http://www.ncbi.nlm.nih.gov/pubmed/23104886
label: Publication
sbg:project: bix-demo/star-2-4-2a-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: 2.4.2a
sbg:id: sevenbridges/public-apps/star/4
sbg:license: GNU General Public License v3.0 only
sbg:revision: 4
sbg:cmdPreview: tar -xvf genome.ext && /opt/STAR --runThreadN 15 --readFilesCommand
bzcat --sjdbGTFfile /demo/test-data/chr20.gtf --sjdbGTFchrPrefix chrPrefix
--sjdbInsertSave Basic --twopass1readsN 0 --chimOutType WithinBAM --outSAMattrRGline
ID:1 CN:illumina PI:rg_mfl PL:Ion_Torrent_PGM PU:rg_platform_unit SM:rg_sample --quantMode
TranscriptomeSAM --outFileNamePrefix ./mate_1.fastq.bz2. --readFilesIn /test-data/mate_1.fastq.bz2 &&
tar -vcf mate_1.fastq.bz2._STARgenome.tar ./mate_1.fastq.bz2._STARgenome &&
mv mate_1.fastq.bz2.Unmapped.out.mate1 mate_1.fastq.bz2.Unmapped.out.mate1.fastq
sbg:modifiedOn: 1462889222
sbg:modifiedBy: ana_d
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911471
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911473
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911475
sbg:revision: 2
- sbg:modifiedBy: uros_sipetic
sbg:modifiedOn: 1462878528
sbg:revision: 3
- sbg:modifiedBy: ana_d
sbg:modifiedOn: 1462889222
sbg:revision: 4
sbg:toolkit: STAR
id: sevenbridges/public-apps/star/4
inputs:
- type:
- 'null'
- int
label: Flanking regions size
description: =log2(winFlank), where winFlank is the size of the left and right
flanking regions for each window (int>0).
streamable: no
id: '#winFlankNbins'
inputBinding:
position: 0
prefix: --winFlankNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:includeInPorts: yes
sbg:toolDefaultValue: '4'
required: no
- type:
- 'null'
- int
label: Bin size
description: =log2(winBin), where winBin is the size of the bin for the windows/clustering,
each window will occupy an integer number of bins (int>0).
streamable: no
id: '#winBinNbits'
inputBinding:
position: 0
prefix: --winBinNbits
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:includeInPorts: yes
sbg:toolDefaultValue: '16'
required: no
- type:
- 'null'
- int
label: Max loci anchors
description: Max number of loci anchors are allowed to map to (int>0).
streamable: no
id: '#winAnchorMultimapNmax'
inputBinding:
position: 0
prefix: --winAnchorMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max bins between anchors
description: Max number of bins between two anchors that allows aggregation
of anchors into one window (int>0).
streamable: no
id: '#winAnchorDistNbins'
inputBinding:
position: 0
prefix: --winAnchorDistNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '9'
required: no
- type:
- 'null'
- name: twopassMode
symbols:
- None
- Basic
type: enum
label: Two-pass mode
description: '2-pass mapping mode. None: 1-pass mapping; Basic: basic 2-pass
mapping, with all 1st pass junctions inserted into the genome indices on the
fly.'
streamable: no
id: '#twopassMode'
inputBinding:
position: 0
prefix: --twopassMode
separate: yes
sbg:cmdInclude: yes
sbg:category: 2-pass mapping
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Reads to process in 1st step
description: 'Number of reads to process for the 1st step. 0: 1-step only, no
2nd pass; use very large number to map all reads in the first step (int>0).'
streamable: no
id: '#twopass1readsN'
sbg:category: 2-pass mapping
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- int
label: Extra alignment score
description: Extra alignment score for alignments that cross database junctions.
streamable: no
id: '#sjdbScore'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '2'
required: no
- type:
- 'null'
- int
label: '"Overhang" length'
description: Length of the donor/acceptor sequence on each side of the junctions,
ideally = (mate_length - 1) (int >= 0), if int = 0, splice junction database
is not used.
streamable: no
id: '#sjdbOverhang'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- name: sjdbInsertSave
symbols:
- Basic
- All
- None
type: enum
label: Save junction files
description: 'Which files to save when sjdb junctions are inserted on the fly
at the mapping step. None: not saving files at all; Basic: only small junction/transcript
files; All: all files including big Genome, SA and SAindex. These files are
output as archive.'
streamable: no
id: '#sjdbInsertSave'
sbg:category: Splice junctions database
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
sbg:category: Splice junctions database
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
sbg:category: Splice junctions database
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- items: File
type: array
label: Splice junction file
description: Gene model annotations and/or known transcripts. No need to include
this input, except in case of using "on the fly" annotations.
streamable: no
id: '#sjdbGTFfile'
sbg:category: Basic
sbg:fileTypes: GTF, GFF, TXT
required: no
- type:
- 'null'
- string
label: Set exons feature
description: Feature type in GTF file to be used as exons for building transcripts.
streamable: no
id: '#sjdbGTFfeatureExon'
sbg:category: Splice junctions database
sbg:toolDefaultValue: exon
required: no
- type:
- 'null'
- string
label: Chromosome names
description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using
ENSEMBL annotations with UCSC genomes).
streamable: no
id: '#sjdbGTFchrPrefix'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- float
label: Search start point normalized
description: seedSearchStartLmax normalized to read length (sum of mates' lengths
for paired-end reads).
streamable: no
id: '#seedSearchStartLmaxOverLread'
inputBinding:
position: 0
prefix: --seedSearchStartLmaxOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '1.0'
required: no
- type:
- 'null'
- int
label: Search start point
description: Defines the search start point through the read - the read is split
into pieces no longer than this value (int>0).
streamable: no
id: '#seedSearchStartLmax'
inputBinding:
position: 0
prefix: --seedSearchStartLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max seed length
description: Defines the maximum length of the seeds, if =0 max seed length
is infinite (int>=0).
streamable: no
id: '#seedSearchLmax'
inputBinding:
position: 0
prefix: --seedSearchLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Max seeds per window
description: Max number of seeds per window (int>=0).
streamable: no
id: '#seedPerWindowNmax'
inputBinding:
position: 0
prefix: --seedPerWindowNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max seeds per read
description: Max number of seeds per read (int>=0).
streamable: no
id: '#seedPerReadNmax'
inputBinding:
position: 0
prefix: --seedPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '1000'
required: no
- type:
- 'null'
- int
label: Max one-seed loci per window
description: Max number of one seed loci per window (int>=0).
streamable: no
id: '#seedNoneLociPerWindow'
inputBinding:
position: 0
prefix: --seedNoneLociPerWindow
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- int
label: Filter pieces for stitching
description: Only pieces that map fewer than this value are utilized in the
stitching procedure (int>=0).
streamable: no
id: '#seedMultimapNmax'
inputBinding:
position: 0
prefix: --seedMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- int
label: Max score reduction
description: Maximum score reduction while searching for SJ boundaries in the
stitching step.
streamable: no
id: '#scoreStitchSJshift'
inputBinding:
position: 0
prefix: --scoreStitchSJshift
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Insertion Open Penalty
description: Insertion open penalty.
streamable: no
id: '#scoreInsOpen'
inputBinding:
position: 0
prefix: --scoreInsOpen
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- int
label: Insertion extension penalty
description: Insertion extension penalty per base (in addition to --scoreInsOpen).
streamable: no
id: '#scoreInsBase'
inputBinding:
position: 0
prefix: --scoreInsBase
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- float
label: Log scaled score
description: 'Extra score logarithmically scaled with genomic length of the
alignment: <int>*log2(genomicLength).'
streamable: no
id: '#scoreGenomicLengthLog2scale'
inputBinding:
position: 0
prefix: --scoreGenomicLengthLog2scale
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-0.25'
required: no
- type:
- 'null'
- int
label: Non-canonical gap open
description: Non-canonical gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapNoncan'
inputBinding:
position: 0
prefix: --scoreGapNoncan
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-8'
required: no
- type:
- 'null'
- int
label: GC/AG and CT/GC gap open
description: GC/AG and CT/GC gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapGCAG'
inputBinding:
position: 0
prefix: --scoreGapGCAG
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-4'
required: no
- type:
- 'null'
- int
label: AT/AC and GT/AT gap open
description: AT/AC and GT/AT gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapATAC'
inputBinding:
position: 0
prefix: --scoreGapATAC
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-8'
required: no
- type:
- 'null'
- int
label: Gap open penalty
description: Gap open penalty.
streamable: no
id: '#scoreGap'
inputBinding:
position: 0
prefix: --scoreGap
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Deletion open penalty
description: Deletion open penalty.
streamable: no
id: '#scoreDelOpen'
inputBinding:
position: 0
prefix: --scoreDelOpen
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- int
label: Deletion extension penalty
description: Deletion extension penalty per base (in addition to --scoreDelOpen).
streamable: no
id: '#scoreDelBase'
inputBinding:
position: 0
prefix: --scoreDelBase
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- string
label: Sequencing center
description: Specify the sequencing center for RG line.
streamable: no
id: '#rg_seq_center'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Sample ID
description: Specify the sample ID for RG line.
streamable: no
id: '#rg_sample_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Platform unit ID
description: Specify the platform unit ID for RG line.
streamable: no
id: '#rg_platform_unit_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- name: rg_platform
symbols:
- LS 454
- Helicos
- Illumina
- ABI SOLiD
- Ion Torrent PGM
- PacBio
type: enum
label: Platform
description: Specify the version of the technology that was used for sequencing
or assaying.
streamable: no
id: '#rg_platform'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Median fragment length
description: Specify the median fragment length for RG line.
streamable: no
id: '#rg_mfl'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Library ID
description: Specify the library ID for RG line.
streamable: no
id: '#rg_library_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- items: File
type: array
label: Read sequence
description: Read sequence.
streamable: no
id: '#reads'
inputBinding:
position: 10
separate: yes
itemSeparator: ' '
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var list = [].concat($job.inputs.reads)\n \n var resp
= []\n \n if (list.length == 1){\n resp.push(list[0].path)\n \n
\ }else if (list.length == 2){ \n \n left = \"\"\n right =
\"\"\n \n for (index = 0; index < list.length; ++index) {\n \n
\ if (list[index].metadata != null){\n if (list[index].metadata.paired_end
== 1){\n left = list[index].path\n }else if (list[index].metadata.paired_end
== 2){\n right = list[index].path\n }\n }\n }\n
\ \n if (left != \"\" && right != \"\"){ \n resp.push(left)\n
\ resp.push(right)\n }\n }\n else if (list.length > 2){\n left
= []\n right = []\n \n for (index = 0; index < list.length;
++index) {\n \n if (list[index].metadata != null){\n if
(list[index].metadata.paired_end == 1){\n left.push(list[index].path)\n
\ }else if (list[index].metadata.paired_end == 2){\n right.push(list[index].path)\n
\ }\n }\n }\n left_join = left.join()\n right_join
= right.join()\n if (left != [] && right != []){ \n resp.push(left_join)\n
\ resp.push(right_join)\n }\t\n }\n \n if(resp.length > 0){
\ \n return \"--readFilesIn \".concat(resp.join(\" \"))\n }\n}"
class: Expression
sbg:cmdInclude: yes
sbg:category: Basic
sbg:fileTypes: FASTA, FASTQ, FA, FQ, FASTQ.GZ, FQ.GZ, FASTQ.BZ2, FQ.BZ2
required: yes
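# Illustration only (not part of the exported app): a reading of how the '#reads'
# expression above builds its argument. All file names below are placeholders.
#   1 file                        -> --readFilesIn <path>
#   2 files (paired_end 1 and 2)  -> --readFilesIn <mate1_path> <mate2_path>
#   more than 2 files             -> --readFilesIn <mate1_a>,<mate1_b> <mate2_a>,<mate2_b>
# Files without paired_end metadata are skipped in the multi-file branches.
# Note: in the branch for more than 2 files, the check `left != []` compares array
# identity in JavaScript, so it is always true and does not guard against empty lists.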
- type:
- 'null'
- name: readMatesLengthsIn
symbols:
- NotEqual
- Equal
type: enum
label: Reads lengths
description: Equal/Not equal - lengths of names, sequences, qualities for both
mates are the same/not the same. "Not equal" is safe in all situations.
streamable: no
id: '#readMatesLengthsIn'
inputBinding:
position: 0
prefix: --readMatesLengthsIn
separate: yes
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: NotEqual
required: no
- type:
- 'null'
- int
label: Reads to map
description: Number of reads to map from the beginning of the file.
streamable: no
id: '#readMapNumber'
inputBinding:
position: 0
prefix: --readMapNumber
separate: yes
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- name: quantTranscriptomeBan
symbols:
- IndelSoftclipSingleend
- Singleend
type: enum
label: Prohibit alignment type
description: 'Prohibit various alignment types. IndelSoftclipSingleend: prohibit
indels, soft clipping and single-end alignments - compatible with RSEM; Singleend:
prohibit single-end alignments.'
streamable: no
id: '#quantTranscriptomeBan'
inputBinding:
position: 0
prefix: --quantTranscriptomeBan
separate: yes
sbg:cmdInclude: yes
sbg:category: Quantification of Annotations
sbg:toolDefaultValue: IndelSoftclipSingleend
required: no
- type:
- 'null'
- name: quantMode
symbols:
- TranscriptomeSAM
- GeneCounts
type: enum
label: Quantification mode
description: Types of quantification requested. 'TranscriptomeSAM' option outputs
SAM/BAM alignments to transcriptome into a separate file. With 'GeneCounts'
option, STAR will count number of reads per gene while mapping.
streamable: no
id: '#quantMode'
sbg:category: Quantification of Annotations
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- name: outSortingType
symbols:
- Unsorted
- SortedByCoordinate
- Unsorted SortedByCoordinate
type: enum
label: Output sorting type
description: Type of output sorting.
streamable: no
id: '#outSortingType'
sbg:category: Output
sbg:toolDefaultValue: SortedByCoordinate
required: no
- type:
- 'null'
- name: outSJfilterReads
symbols:
- All
- Unique
type: enum
label: Collapsed junctions reads
description: 'Which reads to consider for collapsed splice junctions output.
All: all reads, unique- and multi-mappers; Unique: uniquely mapping reads
only.'
streamable: no
id: '#outSJfilterReads'
inputBinding:
position: 0
prefix: --outSJfilterReads
separate: yes
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: All
required: no
- type:
- 'null'
- items: int
type: array
label: Min overhang SJ
description: Minimum overhang length for splice junctions on both sides for
each of the motifs. To set no output for desired motif, assign -1 to the corresponding
field. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterOverhangMin'
inputBinding:
position: 0
prefix: --outSJfilterOverhangMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 30 12 12 12
required: no
- type:
- 'null'
- items: int
type: array
label: Max gap allowed
description: 'Maximum gap allowed for junctions supported by 1,2,3...N reads
(int >= 0) i.e. by default junctions supported by 1 read can have gaps <=50000b,
by 2 reads: <=100000b, by 3 reads: <=200000. By 4 or more reads: any gap <=alignIntronMax.
Does not apply to annotated junctions.'
streamable: no
id: '#outSJfilterIntronMaxVsReadN'
inputBinding:
position: 0
prefix: --outSJfilterIntronMaxVsReadN
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 50000 100000 200000
required: no
- type:
- 'null'
- items: int
type: array
label: Min distance to other donor/acceptor
description: Minimum allowed distance to other junctions' donor/acceptor for
each of the motifs (int >= 0). Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterDistToOtherSJmin'
inputBinding:
position: 0
prefix: --outSJfilterDistToOtherSJmin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 10 0 5 10
required: no
- type:
- 'null'
- items: int
type: array
label: Min unique count
description: Minimum uniquely mapping read count per junction for each of the
motifs. To set no output for desired motif, assign -1 to the corresponding
field. Junctions are output if one of --outSJfilterCountUniqueMin OR --outSJfilterCountTotalMin
conditions is satisfied. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterCountUniqueMin'
inputBinding:
position: 0
prefix: --outSJfilterCountUniqueMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 3 1 1 1
required: no
- type:
- 'null'
- items: int
type: array
label: Min total count
description: Minimum total (multi-mapping+unique) read count per junction for
each of the motifs. To set no output for desired motif, assign -1 to the corresponding
field. Junctions are output if one of --outSJfilterCountUniqueMin OR --outSJfilterCountTotalMin
conditions is satisfied. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterCountTotalMin'
inputBinding:
position: 0
prefix: --outSJfilterCountTotalMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 3 1 1 1
required: no
- type:
- 'null'
- name: outSAMunmapped
symbols:
- None
- Within
type: enum
label: Write unmapped in SAM
description: 'Output of unmapped reads in the SAM format. None: no output; Within:
output unmapped reads within the main SAM file (i.e. Aligned.out.sam).'
streamable: no
id: '#outSAMunmapped'
inputBinding:
position: 0
prefix: --outSAMunmapped
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- name: outSAMtype
symbols:
- SAM
- BAM
type: enum
label: Output format
description: Format of output alignments.
streamable: no
id: '#outSAMtype'
inputBinding:
position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
SAM_type = $job.inputs.outSAMtype
SORT_type = $job.inputs.outSortingType
if (SAM_type && SORT_type) {
return "--outSAMtype ".concat(SAM_type, " ", SORT_type)
}
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: SAM
required: no
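# Illustration only (not part of the exported app): the '#outSAMtype' expression above
# merges two inputs into a single argument. Assuming outSAMtype is BAM and
# outSortingType is SortedByCoordinate, the script returns:
#   --outSAMtype BAM SortedByCoordinate
# If either input is unset, the script returns nothing, so it contributes no
# --outSAMtype argument.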
- type:
- 'null'
- name: outSAMstrandField
symbols:
- None
- intronMotif
type: enum
label: Strand field flag
description: 'Cufflinks-like strand field flag. None: not used; intronMotif:
strand derived from the intron motif. Reads with inconsistent and/or non-canonical
introns are filtered out.'
streamable: no
id: '#outSAMstrandField'
inputBinding:
position: 0
prefix: --outSAMstrandField
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- name: outSAMreadID
symbols:
- Standard
- Number
type: enum
label: Read ID
description: 'Read ID record type. Standard: first word (until space) from the
FASTx read ID line, removing /1,/2 from the end; Number: read number (index)
in the FASTx file.'
streamable: no
id: '#outSAMreadID'
inputBinding:
position: 0
prefix: --outSAMreadID
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Standard
required: no
- type:
- 'null'
- name: outSAMprimaryFlag
symbols:
- OneBestScore
- AllBestScore
type: enum
label: Primary alignments
description: 'Which alignments are considered primary - all others will be marked
with 0x100 bit in the FLAG. OneBestScore: only one alignment with the best
score is primary; AllBestScore: all alignments with the best score are primary.'
streamable: no
id: '#outSAMprimaryFlag'
inputBinding:
position: 0
prefix: --outSAMprimaryFlag
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: OneBestScore
required: no
- type:
- 'null'
- name: outSAMorder
symbols:
- Paired
- PairedKeepInputOrder
type: enum
label: Sorting in SAM
description: 'Type of sorting for the SAM output. Paired: one mate after the
other for all paired alignments; PairedKeepInputOrder: one mate after the
other for all paired alignments, the order is kept the same as in the input
FASTQ files.'
streamable: no
id: '#outSAMorder'
inputBinding:
position: 0
prefix: --outSAMorder
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Paired
required: no
- type:
- 'null'
- name: outSAMmode
symbols:
- Full
- NoQS
type: enum
label: SAM mode
description: 'Mode of SAM output. Full: full SAM output; NoQS: full SAM but
without quality scores.'
streamable: no
id: '#outSAMmode'
inputBinding:
position: 0
prefix: --outSAMmode
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Full
required: no
- type:
- 'null'
- int
label: MAPQ value
description: MAPQ value for unique mappers (0 to 255).
streamable: no
id: '#outSAMmapqUnique'
inputBinding:
position: 0
prefix: --outSAMmapqUnique
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '255'
required: no
- type:
- 'null'
- string
label: SAM header @PG
description: Extra @PG (software) line of the SAM header (in addition to STAR).
streamable: no
id: '#outSAMheaderPG'
inputBinding:
position: 0
prefix: --outSAMheaderPG
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- string
label: SAM header @HD
description: '@HD (header) line of the SAM header.'
streamable: no
id: '#outSAMheaderHD'
inputBinding:
position: 0
prefix: --outSAMheaderHD
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- int
label: OR SAM flag
description: Set specific bits of the SAM FLAG.
streamable: no
id: '#outSAMflagOR'
inputBinding:
position: 0
prefix: --outSAMflagOR
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: AND SAM flag
description: Set specific bits of the SAM FLAG.
streamable: no
id: '#outSAMflagAND'
inputBinding:
position: 0
prefix: --outSAMflagAND
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '65535'
required: no
- type:
- 'null'
- name: outSAMattributes
symbols:
- Standard
- NH
- All
- None
type: enum
label: SAM attributes
description: 'Desired SAM attributes, in the order desired for the output SAM.
NH: any combination in any order; Standard: NH HI AS nM; All: NH HI AS nM
NM MD jM jI; None: no attributes.'
streamable: no
id: '#outSAMattributes'
inputBinding:
position: 0
prefix: --outSAMattributes
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Standard
required: no
- type:
- 'null'
- name: outReadsUnmapped
symbols:
- None
- Fastx
type: enum
label: Output unmapped reads
description: 'Output of unmapped reads (besides SAM). None: no output; Fastx:
output in separate fasta/fastq files, Unmapped.out.mate1/2.'
streamable: no
id: '#outReadsUnmapped'
inputBinding:
position: 0
prefix: --outReadsUnmapped
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Quality conversion
description: Add this number to the quality score (e.g. to convert from Illumina
to Sanger, use -31).
streamable: no
id: '#outQSconversionAdd'
inputBinding:
position: 0
prefix: --outQSconversionAdd
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: outFilterType
symbols:
- Normal
- BySJout
type: enum
label: Filtering type
description: 'Type of filtering. Normal: standard filtering using only current
alignment; BySJout: keep only those reads that contain junctions that passed
filtering into SJ.out.tab.'
streamable: no
id: '#outFilterType'
inputBinding:
position: 0
prefix: --outFilterType
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: Normal
required: no
- type:
- 'null'
- float
label: Min score normalized
description: '''Minimum score'' normalized to read length (sum of mates'' lengths
for paired-end reads).'
streamable: no
id: '#outFilterScoreMinOverLread'
inputBinding:
position: 0
prefix: --outFilterScoreMinOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min score
description: Alignment will be output only if its score is higher than this
value.
streamable: no
id: '#outFilterScoreMin'
inputBinding:
position: 0
prefix: --outFilterScoreMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Multimapping score range
description: The score range below the maximum score for multimapping alignments.
streamable: no
id: '#outFilterMultimapScoreRange'
inputBinding:
position: 0
prefix: --outFilterMultimapScoreRange
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Max number of mappings
description: Read alignments will be output only if the read maps fewer than
this value, otherwise no alignments will be output.
streamable: no
id: '#outFilterMultimapNmax'
inputBinding:
position: 0
prefix: --outFilterMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- float
label: Mismatches to *read* length
description: Alignment will be output only if its ratio of mismatches to *read*
length is less than this value.
streamable: no
id: '#outFilterMismatchNoverReadLmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNoverReadLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- float
label: Mismatches to *mapped* length
description: Alignment will be output only if its ratio of mismatches to *mapped*
length is less than this value.
streamable: no
id: '#outFilterMismatchNoverLmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNoverLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.3'
required: no
- type:
- 'null'
- int
label: Max number of mismatches
description: Alignment will be output only if it has fewer mismatches than this
value.
streamable: no
id: '#outFilterMismatchNmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- float
label: Min matched bases normalized
description: '''Minimum matched bases'' normalized to read length (sum of mates''
lengths for paired-end reads).'
streamable: no
id: '#outFilterMatchNminOverLread'
inputBinding:
position: 0
prefix: --outFilterMatchNminOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min matched bases
description: Alignment will be output only if the number of matched bases is
higher than this value.
streamable: no
id: '#outFilterMatchNmin'
inputBinding:
position: 0
prefix: --outFilterMatchNmin
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: outFilterIntronMotifs
symbols:
- None
- RemoveNoncanonical
- RemoveNoncanonicalUnannotated
type: enum
label: Motifs filtering
description: 'Filter alignments using their motifs. None: no filtering; RemoveNoncanonical:
filter out alignments that contain non-canonical junctions; RemoveNoncanonicalUnannotated:
filter out alignments that contain non-canonical unannotated junctions when
using annotated splice junctions database. The annotated non-canonical junctions
will be kept.'
streamable: no
id: '#outFilterIntronMotifs'
inputBinding:
position: 0
prefix: --outFilterIntronMotifs
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Max insert junctions
description: Maximum number of junctions to be inserted into the genome on the
fly at the mapping stage, including those from annotations and those detected
in the 1st step of the 2-pass run.
streamable: no
id: '#limitSjdbInsertNsj'
inputBinding:
position: 0
prefix: --limitSjdbInsertNsj
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000000'
required: no
- type:
- 'null'
- int
label: Junctions max number
description: Max number of junctions for one read (including all multi-mappers).
streamable: no
id: '#limitOutSJoneRead'
inputBinding:
position: 0
prefix: --limitOutSJoneRead
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000'
required: no
- type:
- 'null'
- int
label: Collapsed junctions max number
description: Max number of collapsed junctions.
streamable: no
id: '#limitOutSJcollapsed'
inputBinding:
position: 0
prefix: --limitOutSJcollapsed
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000000'
required: no
- type:
- 'null'
- int
label: Limit BAM sorting memory
description: Maximum available RAM for sorting BAM. If set to 0, it will be
set to the genome index size.
streamable: no
id: '#limitBAMsortRAM'
inputBinding:
position: 0
prefix: --limitBAMsortRAM
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- string
label: Genome dir name
description: Name of the directory which contains genome files (when genome.tar
is uncompressed).
streamable: no
id: '#genomeDirName'
inputBinding:
position: 0
prefix: --genomeDir
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: $job.inputs.genomeDirName || "genomeDir"
class: Expression
sbg:cmdInclude: yes
sbg:category: Basic
sbg:toolDefaultValue: genomeDir
required: no
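# Illustration only (not part of the exported app): the '#genomeDirName' expression
# above falls back to a default name when the input is not set. The value my_index
# is hypothetical.
#   genomeDirName unset      -> --genomeDir genomeDir
#   genomeDirName: my_index  -> --genomeDir my_index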
- type:
- File
label: Genome files
description: Genome files created using STAR Genome Generate.
streamable: no
id: '#genome'
sbg:category: Basic
sbg:fileTypes: TAR
required: yes
- type:
- 'null'
- items: int
type: array
label: Clip 5p bases
description: Number of bases to clip from 5p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip5pNbases'
inputBinding:
position: 0
prefix: --clip5pNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: int
type: array
label: Clip 3p bases
description: Number of bases to clip from 3p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pNbases'
inputBinding:
position: 0
prefix: --clip3pNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: int
type: array
label: Clip 3p after adapter seq.
description: Number of bases to clip from 3p of each mate after the adapter
clipping. In case only one value is given, it will be assumed the same for
both mates.
streamable: no
id: '#clip3pAfterAdapterNbases'
inputBinding:
position: 0
prefix: --clip3pAfterAdapterNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: string
type: array
label: Clip 3p adapter sequence
description: Adapter sequence to clip from 3p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pAdapterSeq'
inputBinding:
position: 0
prefix: --clip3pAdapterSeq
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- items: float
type: array
label: Max mismatches proportions
description: Max proportion of mismatches for 3p adapter clipping for each mate.
In case only one value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pAdapterMMp'
inputBinding:
position: 0
prefix: --clip3pAdapterMMp
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0.1'
required: no
- type:
- 'null'
- int
label: Min segment length
description: Minimum length of chimeric segment length, if =0, no chimeric output
(int>=0).
streamable: no
id: '#chimSegmentMin'
inputBinding:
position: 0
prefix: --chimSegmentMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '15'
required: no
- type:
- 'null'
- int
label: Min separation score
description: Minimum difference (separation) between the best chimeric score
and the next one (int>=0).
streamable: no
id: '#chimScoreSeparation'
inputBinding:
position: 0
prefix: --chimScoreSeparation
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- int
label: Min total score
description: Minimum total (summed) score of the chimeric segments (int>=0).
streamable: no
id: '#chimScoreMin'
inputBinding:
position: 0
prefix: --chimScoreMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Non-GT/AG penalty
description: Penalty for a non-GT/AG chimeric junction.
streamable: no
id: '#chimScoreJunctionNonGTAG'
inputBinding:
position: 0
prefix: --chimScoreJunctionNonGTAG
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- int
label: Max drop score
description: Max drop (difference) of chimeric score (the sum of scores of all
chimeric segments) from the read length (int>=0).
streamable: no
id: '#chimScoreDropMax'
inputBinding:
position: 0
prefix: --chimScoreDropMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '20'
required: no
- type:
- 'null'
- name: chimOutType
symbols:
- SeparateSAMold
- Within
type: enum
label: Chimeric output type
description: 'Type of chimeric output. SeparateSAMold: output old SAM into separate
Chimeric.out.sam file; Within: output into main aligned SAM/BAM files.'
streamable: no
id: '#chimOutType'
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: SeparateSAMold
required: no
- type:
- 'null'
- int
label: Min junction overhang
description: Minimum overhang for a chimeric junction (int>=0).
streamable: no
id: '#chimJunctionOverhangMin'
inputBinding:
position: 0
prefix: --chimJunctionOverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '20'
required: no
- type:
- 'null'
- float
label: Max windows per read
description: Max number of windows per read (int>0).
streamable: no
id: '#alignWindowsPerReadNmax'
inputBinding:
position: 0
prefix: --alignWindowsPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- int
label: Max transcripts per window
description: Max number of transcripts per window (int>0).
streamable: no
id: '#alignTranscriptsPerWindowNmax'
inputBinding:
position: 0
prefix: --alignTranscriptsPerWindowNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- int
label: Max transcripts per read
description: Max number of different alignments per read to consider (int>0).
streamable: no
id: '#alignTranscriptsPerReadNmax'
inputBinding:
position: 0
prefix: --alignTranscriptsPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- float
label: Min mapped length normalized
description: alignSplicedMateMapLmin normalized to mate length (float>0).
streamable: no
id: '#alignSplicedMateMapLminOverLmate'
inputBinding:
position: 0
prefix: --alignSplicedMateMapLminOverLmate
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min mapped length
description: Minimum mapped length for a read mate that is spliced (int>0).
streamable: no
id: '#alignSplicedMateMapLmin'
inputBinding:
position: 0
prefix: --alignSplicedMateMapLmin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: alignSoftClipAtReferenceEnds
symbols:
- 'Yes'
- 'No'
type: enum
label: Soft clipping
description: 'Option which allows soft clipping of alignments at the reference
(chromosome) ends. Can be disabled for compatibility with Cufflinks/Cuffmerge.
Yes: Enables soft clipping; No: Disables soft clipping.'
streamable: no
id: '#alignSoftClipAtReferenceEnds'
inputBinding:
position: 0
prefix: --alignSoftClipAtReferenceEnds
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: 'Yes'
required: no
- type:
- 'null'
- int
label: Min overhang
description: Minimum overhang (i.e. block size) for spliced alignments (int>0).
streamable: no
id: '#alignSJoverhangMin'
inputBinding:
position: 0
prefix: --alignSJoverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '5'
required: no
- type:
- 'null'
- int
label: 'Min overhang: annotated'
description: Minimum overhang (i.e. block size) for annotated (sjdb) spliced
alignments (int>0).
streamable: no
id: '#alignSJDBoverhangMin'
inputBinding:
position: 0
prefix: --alignSJDBoverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '3'
required: no
- type:
- 'null'
- int
label: Max mates gap
description: Maximum gap between two mates, if 0, max intron gap will be determined
by (2^winBinNbits)*winAnchorDistNbins.
streamable: no
id: '#alignMatesGapMax'
inputBinding:
position: 0
prefix: --alignMatesGapMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Min intron size
description: 'Minimum intron size: genomic gap is considered intron if its length
>= alignIntronMin, otherwise it is considered Deletion (int>=0).'
streamable: no
id: '#alignIntronMin'
inputBinding:
position: 0
prefix: --alignIntronMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '21'
required: no
- type:
- 'null'
- int
label: Max intron size
description: Maximum intron size, if 0, max intron size will be determined by
(2^winBinNbits)*winAnchorDistNbins.
streamable: no
id: '#alignIntronMax'
inputBinding:
position: 0
prefix: --alignIntronMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: alignEndsType
symbols:
- Local
- EndToEnd
type: enum
label: Alignment type
description: 'Type of read ends alignment. Local: standard local alignment with
soft-clipping allowed. EndToEnd: force end to end read alignment, do not soft-clip.'
streamable: no
id: '#alignEndsType'
inputBinding:
position: 0
prefix: --alignEndsType
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: Local
required: no
outputs:
- type:
- 'null'
- items: File
type: array
label: Unmapped reads
description: Output of unmapped reads.
streamable: no
id: '#unmapped_reads'
outputBinding:
glob: '*Unmapped.out*'
sbg:fileTypes: FASTQ
- type:
- 'null'
- File
label: Transcriptome alignments
description: Alignments translated into transcript coordinates.
streamable: no
id: '#transcriptome_aligned_reads'
outputBinding:
glob: '*Transcriptome*'
sbg:fileTypes: BAM
- type:
- 'null'
- File
label: Splice junctions
description: High confidence collapsed splice junctions in tab-delimited format.
Only junctions supported by uniquely mapping reads are reported.
streamable: no
id: '#splice_junctions'
outputBinding:
glob: '*SJ.out.tab'
sbg:fileTypes: TAB
- type:
- 'null'
- File
label: Reads per gene
description: File with number of reads per gene. A read is counted if it overlaps
(1nt or more) one and only one gene.
streamable: no
id: '#reads_per_gene'
outputBinding:
glob: '*ReadsPerGene*'
sbg:fileTypes: TAB
- type:
- 'null'
- items: File
type: array
label: Log files
description: Log files produced during alignment.
streamable: no
id: '#log_files'
outputBinding:
glob: '*Log*.out'
sbg:fileTypes: OUT
- type:
- 'null'
- File
label: Intermediate genome files
description: Archive with genome files produced when annotations are included
on the fly (in the mapping step).
streamable: no
id: '#intermediate_genome'
outputBinding:
glob: '*_STARgenome.tar'
sbg:fileTypes: TAR
- type:
- 'null'
- File
label: Chimeric junctions
description: If chimSegmentMin in 'Chimeric Alignments' section is set to 0,
'Chimeric Junctions' won't be output.
streamable: no
id: '#chimeric_junctions'
outputBinding:
glob: '*Chimeric.out.junction'
sbg:fileTypes: JUNCTION
- type:
- 'null'
- File
label: Chimeric alignments
description: Aligned Chimeric sequences SAM - if chimSegmentMin = 0, no Chimeric
Alignment SAM and Chimeric Junctions outputs.
streamable: no
id: '#chimeric_alignments'
outputBinding:
glob: '*.Chimeric.out.sam'
sbg:fileTypes: SAM
- type:
- 'null'
- File
label: Aligned SAM/BAM
description: Aligned sequence in SAM/BAM format.
streamable: no
id: '#aligned_reads'
outputBinding:
glob:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.outSortingType == 'SortedByCoordinate') {
sort_name = '.sortedByCoord'
}
else {
sort_name = ''
}
if ($job.inputs.outSAMtype == 'BAM') {
sam_name = "*.Aligned".concat( sort_name, '.out.bam')
}
else {
sam_name = "*.Aligned.out.sam"
}
return sam_name
}
class: Expression
sbg:fileTypes: SAM, BAM
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/ana_d/star:2.4.2a
dockerImageId: a4b0ad2c3cae
- class: sbg:MemRequirement
value: 60000
- class: sbg:CPURequirement
value: 15
label: STAR
description: STAR is an ultrafast universal RNA-seq aligner. It has very high
mapping speed, accurate alignment of contiguous and spliced reads, detection
of polyA-tails, non-canonical splices and chimeric (fusion) junctions. It works
with reads starting from lengths ~15 bases up to ~300 bases. In case of having
longer reads, use of STAR Long is recommended.
class: CommandLineTool
arguments:
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
file = [].concat($job.inputs.reads)[0].path
extension = /(?:\.([^.]+))?$/.exec(file)[1]
if (extension == "gz") {
return "--readFilesCommand zcat"
} else if (extension == "bz2") {
return "--readFilesCommand bzcat"
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var sjFormat = \"False\"\n var gtfgffFormat = \"False\"\n
\ var list = $job.inputs.sjdbGTFfile\n var paths_list = []\n var joined_paths
= \"\"\n \n if (list) {\n list.forEach(function(f){return paths_list.push(f.path)})\n
\ joined_paths = paths_list.join(\" \")\n\n\n paths_list.forEach(function(f){\n
\ ext = f.replace(/^.*\\./, '')\n if (ext == \"gff\" || ext ==
\"gtf\") {\n gtfgffFormat = \"True\"\n return gtfgffFormat\n
\ }\n if (ext == \"txt\") {\n sjFormat = \"True\"\n return
sjFormat\n }\n })\n\n if ($job.inputs.sjdbGTFfile && $job.inputs.sjdbInsertSave
!= \"None\") {\n if (sjFormat == \"True\") {\n return \"--sjdbFileChrStartEnd
\".concat(joined_paths)\n }\n else if (gtfgffFormat == \"True\")
{\n return \"--sjdbGTFfile \".concat(joined_paths)\n }\n }\n
\ }\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n a = b = c = d = e = f = g = []\n if ($job.inputs.sjdbGTFchrPrefix)
{\n a = [\"--sjdbGTFchrPrefix\", $job.inputs.sjdbGTFchrPrefix]\n }\n
\ if ($job.inputs.sjdbGTFfeatureExon) {\n b = [\"--sjdbGTFfeatureExon\",
$job.inputs.sjdbGTFfeatureExon]\n }\n if ($job.inputs.sjdbGTFtagExonParentTranscript)
{\n c = [\"--sjdbGTFtagExonParentTranscript\", $job.inputs.sjdbGTFtagExonParentTranscript]\n
\ }\n if ($job.inputs.sjdbGTFtagExonParentGene) {\n d = [\"--sjdbGTFtagExonParentGene\",
$job.inputs.sjdbGTFtagExonParentGene]\n }\n if ($job.inputs.sjdbOverhang)
{\n e = [\"--sjdbOverhang\", $job.inputs.sjdbOverhang]\n }\n if ($job.inputs.sjdbScore)
{\n f = [\"--sjdbScore\", $job.inputs.sjdbScore]\n }\n if ($job.inputs.sjdbInsertSave)
{\n g = [\"--sjdbInsertSave\", $job.inputs.sjdbInsertSave]\n }\n \n
\ \n \n if ($job.inputs.sjdbInsertSave != \"None\" && $job.inputs.sjdbGTFfile)
{\n new_list = a.concat(b, c, d, e, f, g)\n return new_list.join(\"
\")\n }\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.twopassMode == "Basic") {
return "--twopass1readsN ".concat($job.inputs.twopass1readsN)
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.chimOutType == "Within") {
return "--chimOutType ".concat("Within", $job.inputs.outSAMtype)
}
else {
return "--chimOutType SeparateSAMold"
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n var param_list = []\n \n function add_param(key, value){\n
\ if (value == \"\") {\n return\n }\n else {\n return
param_list.push(key.concat(\":\", value))\n }\n }\n \n add_param('ID',
\"1\")\n if ($job.inputs.rg_seq_center) {\n add_param('CN', $job.inputs.rg_seq_center)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.seq_center) {\n add_param('CN',
[].concat($job.inputs.reads)[0].metadata.seq_center)\n }\n if ($job.inputs.rg_library_id)
{\n add_param('LB', $job.inputs.rg_library_id)\n } else if ([].concat($job.inputs.reads)[0].metadata.library_id)
{\n add_param('LB', [].concat($job.inputs.reads)[0].metadata.library_id)\n
\ }\n if ($job.inputs.rg_mfl) {\n add_param('PI', $job.inputs.rg_mfl)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.median_fragment_length)
{\n add_param('PI', [].concat($job.inputs.reads)[0].metadata.median_fragment_length)\n
\ }\n if ($job.inputs.rg_platform) {\n add_param('PL', $job.inputs.rg_platform.replace(/
/g,\"_\"))\n } else if ([].concat($job.inputs.reads)[0].metadata.platform)
{\n add_param('PL', [].concat($job.inputs.reads)[0].metadata.platform.replace(/
/g,\"_\"))\n }\n if ($job.inputs.rg_platform_unit_id) {\n add_param('PU',
$job.inputs.rg_platform_unit_id)\n } else if ([].concat($job.inputs.reads)[0].metadata.platform_unit_id)
{\n add_param('PU', [].concat($job.inputs.reads)[0].metadata.platform_unit_id)\n
\ }\n if ($job.inputs.rg_sample_id) {\n add_param('SM', $job.inputs.rg_sample_id)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.sample_id) {\n add_param('SM',
[].concat($job.inputs.reads)[0].metadata.sample_id)\n }\n return \"--outSAMattrRGline
\".concat(param_list.join(\" \"))\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.sjdbGTFfile && $job.inputs.quantMode) {
return "--quantMode ".concat($job.inputs.quantMode)
}
}
class: Expression
- position: 100
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n intermediate = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\"._STARgenome\")\n source = \"./\".concat(intermediate)\n
\ destination = intermediate.concat(\".tar\")\n if ($job.inputs.sjdbGTFfile
&& $job.inputs.sjdbInsertSave && $job.inputs.sjdbInsertSave != \"None\")
{\n return \"&& tar -vcf \".concat(destination, \" \", source)\n }\n}"
class: Expression
- position: 0
prefix: --outFileNamePrefix
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n return \"./\".concat(common_prefix.replace(
/\\-$|\\_$|\\.$/, '' ), \".\")\n}"
class: Expression
- position: 101
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n mate1 = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\".Unmapped.out.mate1\")\n mate2 = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\".Unmapped.out.mate2\")\n mate1fq = mate1.concat(\".fastq\")\n
\ mate2fq = mate2.concat(\".fastq\")\n if ($job.inputs.outReadsUnmapped
== \"Fastx\" && arr.length > 1) {\n return \"&& mv \".concat(mate1, \"
\", mate1fq, \" && mv \", mate2, \" \", mate2fq)\n }\n else if ($job.inputs.outReadsUnmapped
== \"Fastx\" && arr.length == 1) {\n return \"&& mv \".concat(mate1,
\" \", mate1fq)\n }\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 624.0
'y': 323
sbg:x: 700.0
sbg:y: 200.0
sbg:canvas_zoom: 0.6
sbg:canvas_y: -16
sbg:canvas_x: -41
sbg:batchInput: '#sjdbGTFfile'
sbg:batchBy:
type: item
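Several of the expressions in the STAR tool printed above (for example the one attached to --outFileNamePrefix) derive output names from the common prefix of the input read file names. The following is a minimal standalone sketch of that logic in plain JavaScript, run outside the CWL expression engine. The helper name outFileNamePrefix and the file paths are made up for illustration; only sharedStart and the regular expressions are taken from the expression itself.

```javascript
// Standalone sketch of the prefix logic used in the STAR tool's expressions.
// outFileNamePrefix is a hypothetical wrapper; the paths below are made up.

// Longest common leading substring of a list of strings:
// after sorting, only the first and last elements need to be compared.
function sharedStart(names) {
  var sorted = names.concat().sort();
  var first = sorted[0];
  var last = sorted[sorted.length - 1];
  var i = 0;
  while (i < first.length && first.charAt(i) === last.charAt(i)) i++;
  return first.substring(0, i);
}

function outFileNamePrefix(paths) {
  // Keep only the file name: normalize backslashes, then drop the directory part.
  var names = paths.map(function (p) {
    return p.replace(/\\/g, '/').replace(/.*\//, '');
  });
  // Strip one trailing '-', '_' or '.' from the common prefix, then append a dot.
  return './'.concat(sharedStart(names).replace(/\-$|\_$|\.$/, ''), '.');
}

var prefix = outFileNamePrefix(['/data/sampleA_1.fastq', '/data/sampleA_2.fastq']);
console.log(prefix); // ./sampleA.
```

With paired-end files named sampleA_1.fastq and sampleA_2.fastq, the prefix becomes ./sampleA., which is why the output globs in the tool match names such as *Unmapped.out* and *SJ.out.tab.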
You can also batch by other criteria, such as metadata. The following example batches by sample_id and library_id.
f1 = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
f1 = convert_app(f1)
f1$set_batch("sjdbGTFfile", c("metadata.sample_id", "metadata.library_id"))
criteria provided, convert type from ITEM to CRITERIA
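To make the criteria concrete: as I understand it, batching by metadata groups the input files by the values of the listed metadata fields, and each group is processed as its own batch. The sketch below illustrates only that grouping idea in plain JavaScript. It is not the platform's implementation; the helper names getPath and groupByCriteria, the file names, and the metadata values are all hypothetical.

```javascript
// Conceptual sketch only: group files by the values of the given criteria.
// Helper names, file names and metadata values are hypothetical.

// Resolve a dotted path such as 'metadata.sample_id' on an object.
function getPath(obj, dotted) {
  return dotted.split('.').reduce(function (cur, key) {
    return cur == null ? undefined : cur[key];
  }, obj);
}

// Build one group per distinct combination of criteria values.
function groupByCriteria(files, criteria) {
  var groups = {};
  files.forEach(function (f) {
    var key = criteria.map(function (c) {
      return String(getPath(f, c));
    }).join('|');
    (groups[key] = groups[key] || []).push(f.path);
  });
  return groups;
}

var files = [
  { path: 'a_1.gtf', metadata: { sample_id: 'S1', library_id: 'L1' } },
  { path: 'a_2.gtf', metadata: { sample_id: 'S1', library_id: 'L1' } },
  { path: 'b_1.gtf', metadata: { sample_id: 'S2', library_id: 'L1' } }
];

var batches = groupByCriteria(files, ['metadata.sample_id', 'metadata.library_id']);
console.log(batches);
```

Here the first two files share both sample_id and library_id and land in the same group, while the third file forms a group of its own.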
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 2
sbg:toolAuthor: Seven Bridges Genomics
sbg:createdOn: 1463601910
sbg:categories:
- Alignment
- RNA
sbg:contributors:
- tengfei
sbg:project: tengfei/quickstart
sbg:createdBy: tengfei
sbg:toolkitVersion: 2.4.2a
sbg:id: tengfei/quickstart/rna-seq-alignment-star-demo/2
sbg:license: Apache License 2.0
sbg:revision: 2
sbg:modifiedOn: 1463601974
sbg:modifiedBy: tengfei
sbg:revisionsInfo:
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601910
sbg:revision: 0
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601952
sbg:revision: 1
- sbg:modifiedBy: tengfei
sbg:modifiedOn: 1463601974
sbg:revision: 2
sbg:toolkit: STAR
id: '#tengfei/quickstart/rna-seq-alignment-star-demo/2'
inputs:
- type:
- 'null'
- items: File
type: array
label: sjdbGTFfile
streamable: no
id: '#sjdbGTFfile'
sbg:x: 160.4999759
sbg:y: 195.0833106
required: no
batchType: metadata.library_id
- type:
- items: File
type: array
label: fastq
streamable: no
id: '#fastq'
sbg:x: 164.2499914
sbg:y: 323.7499502
sbg:includeInPorts: yes
required: yes
- type:
- File
label: genomeFastaFiles
streamable: no
id: '#genomeFastaFiles'
sbg:x: 167.7499601
sbg:y: 469.9999106
required: yes
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
sbg:category: Splice junctions db parameters
sbg:x: 200.0
sbg:y: 350.0
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
sbg:category: Splice junctions db parameters
sbg:x: 200.0
sbg:y: 400.0
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- int
label: Max loci anchors
description: Max number of loci anchors are allowed to map to (int>0).
streamable: no
id: '#winAnchorMultimapNmax'
sbg:category: Windows, Anchors, Binning
sbg:x: 200.0
sbg:y: 450.0
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max bins between anchors
description: Max number of bins between two anchors that allows aggregation of anchors
into one window (int>0).
streamable: no
id: '#winAnchorDistNbins'
sbg:category: Windows, Anchors, Binning
sbg:x: 200.0
sbg:y: 500.0
sbg:toolDefaultValue: '9'
required: no
outputs:
- type:
- 'null'
- items: File
type: array
label: unmapped_reads
streamable: no
id: '#unmapped_reads'
source: '#STAR.unmapped_reads'
sbg:x: 766.2497863
sbg:y: 159.5833091
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: transcriptome_aligned_reads
streamable: no
id: '#transcriptome_aligned_reads'
source: '#STAR.transcriptome_aligned_reads'
sbg:x: 1118.9998003
sbg:y: 86.5833216
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: splice_junctions
streamable: no
id: '#splice_junctions'
source: '#STAR.splice_junctions'
sbg:x: 1282.3330177
sbg:y: 167.499976
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: reads_per_gene
streamable: no
id: '#reads_per_gene'
source: '#STAR.reads_per_gene'
sbg:x: 1394.4163557
sbg:y: 245.749964
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- items: File
type: array
label: log_files
streamable: no
id: '#log_files'
source: '#STAR.log_files'
sbg:x: 1505.0830269
sbg:y: 322.9999518
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: chimeric_junctions
streamable: no
id: '#chimeric_junctions'
source: '#STAR.chimeric_junctions'
sbg:x: 1278.7498062
sbg:y: 446.7499567
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: intermediate_genome
streamable: no
id: '#intermediate_genome'
source: '#STAR.intermediate_genome'
sbg:x: 1408.9164783
sbg:y: 386.0832876
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: chimeric_alignments
streamable: no
id: '#chimeric_alignments'
source: '#STAR.chimeric_alignments'
sbg:x: 1147.5831348
sbg:y: 503.2499285
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: sorted_bam
streamable: no
id: '#sorted_bam'
source: '#Picard_SortSam.sorted_bam'
sbg:x: 934.2498228
sbg:y: 557.2498436
sbg:includeInPorts: yes
required: no
- type:
- 'null'
- File
label: result
streamable: no
id: '#result'
source: '#SBG_FASTQ_Quality_Detector.result'
sbg:x: 1431.6666548
sbg:y: 644.9999898
sbg:includeInPorts: yes
required: no
requirements:
- class: CreateFileRequirement
fileDef: []
hints:
- class: sbg:AWSInstanceType
value: c3.8xlarge
label: RNA-seq Alignment - STAR
description: "Alignment to a reference genome and transcriptome presents the first
step of RNA-Seq analysis. This pipeline uses STAR, an ultrafast RNA-seq aligner
capable of mapping full length RNA sequences and detecting de novo canonical junctions,
non-canonical splices, and chimeric (fusion) transcripts. It is optimized for mammalian
sequence reads, but fine tuning of its parameters enables customization to satisfy
unique needs.\n\nSTAR accepts one file per sample (or two files for paired-end data).
\ \nSplice junction annotations can optionally be collected from splice junction
databases. Set the \"Overhang length\" parameter to a value larger than zero in
order to use splice junction databases. For constant read length, this value should
(ideally) be equal to mate length decreased by 1; for long reads with non-constant
length, this value should be 100 (pipeline default). \nFastQC Analysis on FASTQ
files reveals read length distribution. STAR can detect chimeric transcripts, but
parameter \"Min segment length\" in \"Chimeric Alignments\" category must be adjusted
to a desired minimum chimeric segment length. Aligned reads are reported in BAM
format and can be viewed in a genome browser (such as IGV). A file containing detected
splice junctions is also produced.\n\nUnmapped reads are reported in FASTQ format
and can be included in an output BAM file. The \"Output unmapped reads\" and \"Write
unmapped in SAM\" parameters enable unmapped output type selection."
class: Workflow
steps:
- id: '#STAR_Genome_Generate'
inputs:
- id: '#STAR_Genome_Generate.sjdbScore'
- id: '#STAR_Genome_Generate.sjdbOverhang'
- id: '#STAR_Genome_Generate.sjdbGTFtagExonParentTranscript'
source: '#sjdbGTFtagExonParentTranscript'
- id: '#STAR_Genome_Generate.sjdbGTFtagExonParentGene'
source: '#sjdbGTFtagExonParentGene'
- id: '#STAR_Genome_Generate.sjdbGTFfile'
source: '#sjdbGTFfile'
- id: '#STAR_Genome_Generate.sjdbGTFfeatureExon'
- id: '#STAR_Genome_Generate.sjdbGTFchrPrefix'
- id: '#STAR_Genome_Generate.genomeSAsparseD'
- id: '#STAR_Genome_Generate.genomeSAindexNbases'
- id: '#STAR_Genome_Generate.genomeFastaFiles'
source: '#genomeFastaFiles'
- id: '#STAR_Genome_Generate.genomeChrBinNbits'
outputs:
- id: '#STAR_Genome_Generate.genome'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 1
sbg:job:
allocatedResources:
mem: 60000
cpu: 15
inputs:
sjdbScore: 0
sjdbGTFfeatureExon: sjdbGTFfeatureExon
sjdbOverhang: 0
sjdbGTFtagExonParentTranscript: sjdbGTFtagExonParentTranscript
genomeChrBinNbits: genomeChrBinNbits
genomeSAsparseD: 0
sjdbGTFfile:
- size: 0
secondaryFiles: []
class: File
path: /demo/test-files/chr20.gtf
sjdbGTFtagExonParentGene: sjdbGTFtagExonParentGene
genomeFastaFiles:
size: 0
secondaryFiles: []
class: File
path: /sbgenomics/test-data/chr20.fa
sjdbGTFchrPrefix: sjdbGTFchrPrefix
genomeSAindexNbases: 0
sbg:toolAuthor: Alexander Dobin/CSHL
sbg:createdOn: 1450911469
sbg:categories:
- Alignment
sbg:contributors:
- bix-demo
sbg:links:
- id: https://github.com/alexdobin/STAR
label: Homepage
- id: https://github.com/alexdobin/STAR/releases
label: Releases
- id: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
label: Manual
- id: https://groups.google.com/forum/#!forum/rna-star
label: Support
- id: http://www.ncbi.nlm.nih.gov/pubmed/23104886
label: Publication
sbg:project: bix-demo/star-2-4-2a-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: 2.4.2a
sbg:id: sevenbridges/public-apps/star-genome-generate/1
sbg:license: GNU General Public License v3.0 only
sbg:revision: 1
sbg:cmdPreview: mkdir genomeDir && /opt/STAR --runMode genomeGenerate --genomeDir
./genomeDir --runThreadN 15 --genomeFastaFiles /sbgenomics/test-data/chr20.fa
--genomeChrBinNbits genomeChrBinNbits --genomeSAindexNbases 0 --genomeSAsparseD
0 --sjdbGTFfeatureExon sjdbGTFfeatureExon --sjdbGTFtagExonParentTranscript sjdbGTFtagExonParentTranscript
--sjdbGTFtagExonParentGene sjdbGTFtagExonParentGene --sjdbOverhang 0 --sjdbScore
0 --sjdbGTFchrPrefix sjdbGTFchrPrefix --sjdbGTFfile /demo/test-files/chr20.gtf &&
tar -vcf genome.tar ./genomeDir /sbgenomics/test-data/chr20.fa
sbg:modifiedOn: 1450911470
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911469
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911470
sbg:revision: 1
sbg:toolkit: STAR
id: sevenbridges/public-apps/star-genome-generate/1
inputs:
- type:
- 'null'
- int
label: Extra alignment score
description: Extra alignment score for alignments that cross database junctions.
streamable: no
id: '#sjdbScore'
inputBinding:
position: 0
prefix: --sjdbScore
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:includeInPorts: yes
sbg:toolDefaultValue: '2'
required: no
- type:
- 'null'
- int
label: '"Overhang" length'
description: Length of the donor/acceptor sequence on each side of the junctions,
ideally = (mate_length - 1) (int >= 0), if int = 0, splice junction database
is not used.
streamable: no
id: '#sjdbOverhang'
inputBinding:
position: 0
prefix: --sjdbOverhang
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:includeInPorts: yes
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
inputBinding:
position: 0
prefix: --sjdbGTFtagExonParentTranscript
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
inputBinding:
position: 0
prefix: --sjdbGTFtagExonParentGene
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- items: File
type: array
label: Splice junction file
description: Gene model annotations and/or known transcripts.
streamable: no
id: '#sjdbGTFfile'
sbg:category: Basic
sbg:fileTypes: GTF, GFF, TXT
required: no
- type:
- 'null'
- string
label: Set exons feature
description: Feature type in GTF file to be used as exons for building transcripts.
streamable: no
id: '#sjdbGTFfeatureExon'
inputBinding:
position: 0
prefix: --sjdbGTFfeatureExon
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: exon
required: no
- type:
- 'null'
- string
label: Chromosome names
description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using
ENSEMBL annotations with UCSC genomes).
streamable: no
id: '#sjdbGTFchrPrefix'
inputBinding:
position: 0
prefix: --sjdbGTFchrPrefix
separate: yes
sbg:cmdInclude: yes
sbg:category: Splice junctions db parameters
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- int
label: Suffix array sparsity
description: 'Distance between indices: use bigger numbers to decrease needed
RAM at the cost of mapping speed reduction (int>0).'
streamable: no
id: '#genomeSAsparseD'
inputBinding:
position: 0
prefix: --genomeSAsparseD
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Pre-indexing string length
description: Length (bases) of the SA pre-indexing string. Typically between
10 and 15. Longer strings will use much more memory, but allow faster searches.
For small genomes, this number needs to be scaled down, with a typical value
of min(14, log2(GenomeLength)/2 - 1). For example, for 1 megaBase genome,
this is equal to 9, for 100 kiloBase genome, this is equal to 7.
streamable: no
id: '#genomeSAindexNbases'
inputBinding:
position: 0
prefix: --genomeSAindexNbases
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '14'
required: no
- type:
- File
label: Genome fasta files
description: Reference sequence to which to align the reads.
streamable: no
id: '#genomeFastaFiles'
inputBinding:
position: 0
prefix: --genomeFastaFiles
separate: yes
sbg:cmdInclude: yes
sbg:category: Basic
sbg:fileTypes: FASTA, FA
required: yes
- type:
- 'null'
- string
label: Bins size
description: 'Set log2(chrBin), where chrBin is the size (bits) of the bins
for genome storage: each chromosome will occupy an integer number of bins.
If you are using a genome with a large (>5,000) number of chromosomes/scaffolds,
you may need to reduce this number to reduce RAM consumption. The following
scaling is recommended: genomeChrBinNbits = min(18, log2(GenomeLength/NumberOfReferences)).
For example, for 3 gigaBase genome with 100,000 chromosomes/scaffolds, this
is equal to 15.'
streamable: no
id: '#genomeChrBinNbits'
inputBinding:
position: 0
prefix: --genomeChrBinNbits
separate: yes
sbg:cmdInclude: yes
sbg:category: Genome generation parameters
sbg:toolDefaultValue: '18'
required: no
outputs:
- type:
- 'null'
- File
label: Genome Files
description: Genome files comprise binary genome sequence, suffix arrays, text
chromosome names/lengths, splice junctions coordinates, and transcripts/genes
information.
streamable: no
id: '#genome'
outputBinding:
glob: '*.tar'
sbg:fileTypes: TAR
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/ana_d/star:2.4.2a
dockerImageId: a4b0ad2c3cae
- class: sbg:CPURequirement
value: 15
- class: sbg:MemRequirement
value: 60000
label: STAR Genome Generate
description: STAR Genome Generate is a tool that generates genome index files.
One set of files should be generated per each genome/annotation combination.
Once produced, these files could be used as long as genome/annotation combination
stays the same. Also, STAR Genome Generate which produced these files and STAR
aligner using them must be the same toolkit version.
class: CommandLineTool
arguments:
- position: 99
separate: yes
valueFrom: '&& tar -vcf genome.tar ./genomeDir'
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var sjFormat = \"False\"\n var gtfgffFormat = \"False\"\n
\ var list = $job.inputs.sjdbGTFfile\n var paths_list = []\n var joined_paths
= \"\"\n \n if (list) {\n list.forEach(function(f){return paths_list.push(f.path)})\n
\ joined_paths = paths_list.join(\" \")\n\n\n paths_list.forEach(function(f){\n
\ ext = f.replace(/^.*\\./, '')\n if (ext == \"gff\" || ext ==
\"gtf\") {\n gtfgffFormat = \"True\"\n return gtfgffFormat\n
\ }\n if (ext == \"txt\") {\n sjFormat = \"True\"\n return
sjFormat\n }\n })\n\n if ($job.inputs.sjdbGTFfile && $job.inputs.sjdbInsertSave
!= \"None\") {\n if (sjFormat == \"True\") {\n return \"--sjdbFileChrStartEnd
\".concat(joined_paths)\n }\n else if (gtfgffFormat == \"True\")
{\n return \"--sjdbGTFfile \".concat(joined_paths)\n }\n }\n
\ }\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 384.0832266
'y': 446.4998957
sbg:x: 100.0
sbg:y: 200.0
- id: '#SBG_FASTQ_Quality_Detector'
inputs:
- id: '#SBG_FASTQ_Quality_Detector.fastq'
source: '#fastq'
outputs:
- id: '#SBG_FASTQ_Quality_Detector.result'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 3
sbg:job:
allocatedResources:
mem: 1000
cpu: 1
inputs:
fastq:
size: 0
secondaryFiles: []
class: File
path: /path/to/fastq.ext
sbg:toolAuthor: Seven Bridges Genomics
sbg:createdOn: 1450911312
sbg:categories:
- FASTQ-Processing
sbg:contributors:
- bix-demo
sbg:project: bix-demo/sbgtools-demo
sbg:createdBy: bix-demo
sbg:id: sevenbridges/public-apps/sbg-fastq-quality-detector/3
sbg:license: Apache License 2.0
sbg:revision: 3
sbg:cmdPreview: python /opt/sbg_fastq_quality_scale_detector.py --fastq /path/to/fastq.ext
/path/to/fastq.ext
sbg:modifiedOn: 1450911314
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911312
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911314
sbg:revision: 3
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911313
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911313
sbg:revision: 2
sbg:toolkit: SBGTools
id: sevenbridges/public-apps/sbg-fastq-quality-detector/3
inputs:
- type:
- File
label: Fastq
description: FASTQ file.
streamable: no
id: '#fastq'
inputBinding:
position: 0
prefix: --fastq
separate: yes
sbg:cmdInclude: yes
required: yes
outputs:
- type:
- 'null'
- File
label: Result
description: Source FASTQ file with updated metadata.
streamable: no
id: '#result'
outputBinding:
glob: '*.fastq'
sbg:fileTypes: FASTQ
requirements:
- class: CreateFileRequirement
fileDef: []
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/tziotas/sbg_fastq_quality_scale_detector:1.0
dockerImageId: ''
- class: sbg:CPURequirement
value: 1
- class: sbg:MemRequirement
value: 1000
label: SBG FASTQ Quality Detector
description: FASTQ Quality Scale Detector detects which quality encoding scheme
was used in your reads and automatically enters the proper value in the "Quality
Scale" metadata field.
class: CommandLineTool
arguments: []
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 375.3333179
'y': 323.5833156
sbg:x: 300.0
sbg:y: 200.0
- id: '#Picard_SortSam'
inputs:
- id: '#Picard_SortSam.validation_stringency'
default: SILENT
- id: '#Picard_SortSam.sort_order'
default: Coordinate
- id: '#Picard_SortSam.quiet'
- id: '#Picard_SortSam.output_type'
- id: '#Picard_SortSam.memory_per_job'
- id: '#Picard_SortSam.max_records_in_ram'
- id: '#Picard_SortSam.input_bam'
source: '#STAR.aligned_reads'
- id: '#Picard_SortSam.create_index'
default: 'True'
- id: '#Picard_SortSam.compression_level'
outputs:
- id: '#Picard_SortSam.sorted_bam'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 2
sbg:job:
allocatedResources:
mem: 2048
cpu: 1
inputs:
sort_order: Coordinate
input_bam:
path: /root/dir/example.tested.bam
memory_per_job: 2048
output_type: ~
create_index: ~
sbg:toolAuthor: Broad Institute
sbg:createdOn: 1450911168
sbg:categories:
- SAM/BAM-Processing
sbg:contributors:
- bix-demo
sbg:links:
- id: http://broadinstitute.github.io/picard/index.html
label: Homepage
- id: https://github.com/broadinstitute/picard/releases/tag/1.138
label: Source Code
- id: http://broadinstitute.github.io/picard/
label: Wiki
- id: https://github.com/broadinstitute/picard/zipball/master
label: Download
- id: http://broadinstitute.github.io/picard/
label: Publication
sbg:project: bix-demo/picard-1-140-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: '1.140'
sbg:id: sevenbridges/public-apps/picard-sortsam-1-140/2
sbg:license: MIT License, Apache 2.0 Licence
sbg:revision: 2
sbg:cmdPreview: java -Xmx2048M -jar /opt/picard-tools-1.140/picard.jar SortSam
OUTPUT=example.tested.sorted.bam INPUT=/root/dir/example.tested.bam SORT_ORDER=coordinate INPUT=/root/dir/example.tested.bam
SORT_ORDER=coordinate /root/dir/example.tested.bam
sbg:modifiedOn: 1450911170
sbg:modifiedBy: bix-demo
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911168
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911169
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911170
sbg:revision: 2
sbg:toolkit: Picard
id: sevenbridges/public-apps/picard-sortsam-1-140/2
inputs:
- type:
- 'null'
- name: validation_stringency
symbols:
- STRICT
- LENIENT
- SILENT
type: enum
label: Validation stringency
description: 'Validation stringency for all SAM files read by this program.
Setting stringency to SILENT can improve performance when processing a BAM
file in which variable-length data (read, qualities, tags) do not otherwise
need to be decoded. This option can be set to ''null'' to clear the default
value. Possible values: {STRICT, LENIENT, SILENT}.'
streamable: no
id: '#validation_stringency'
inputBinding:
position: 0
prefix: VALIDATION_STRINGENCY=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.validation_stringency)
{
return $job.inputs.validation_stringency
}
else
{
return "SILENT"
}
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: SILENT
required: no
- type:
- name: sort_order
symbols:
- Unsorted
- Queryname
- Coordinate
type: enum
label: Sort order
description: 'Sort order of the output file. Possible values: {unsorted, queryname,
coordinate}.'
streamable: no
id: '#sort_order'
inputBinding:
position: 3
prefix: SORT_ORDER=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
p = $job.inputs.sort_order.toLowerCase()
return p
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: Coordinate
sbg:altPrefix: SO
required: yes
- type:
- 'null'
- name: quiet
symbols:
- 'True'
- 'False'
type: enum
label: Quiet
description: 'This parameter indicates whether to suppress job-summary info
on System.err. This option can be set to ''null'' to clear the default value.
Possible values: {true, false}.'
streamable: no
id: '#quiet'
inputBinding:
position: 0
prefix: QUIET=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: 'False'
required: no
- type:
- 'null'
- name: output_type
symbols:
- BAM
- SAM
- SAME AS INPUT
type: enum
label: Output format
description: Since Picard tools can output both SAM and BAM files, user can
choose the format of the output file.
streamable: no
id: '#output_type'
sbg:category: Other input types
sbg:toolDefaultValue: SAME AS INPUT
required: no
- type:
- 'null'
- int
label: Memory per job
description: Amount of RAM memory to be used per job. Defaults to 2048 MB for
single threaded jobs.
streamable: no
id: '#memory_per_job'
sbg:toolDefaultValue: '2048'
required: no
- type:
- 'null'
- int
label: Max records in RAM
description: When writing SAM files that need to be sorted, this parameter will
specify the number of records stored in RAM before spilling to disk. Increasing
this number reduces the number of file handles needed to sort a SAM file,
and increases the amount of RAM needed. This option can be set to 'null' to
clear the default value.
streamable: no
id: '#max_records_in_ram'
inputBinding:
position: 0
prefix: MAX_RECORDS_IN_RAM=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: '500000'
required: no
- type:
- File
label: Input BAM
description: The BAM or SAM file to sort.
streamable: no
id: '#input_bam'
inputBinding:
position: 1
prefix: INPUT=
separate: no
sbg:cmdInclude: yes
sbg:category: File inputs
sbg:fileTypes: BAM, SAM
sbg:altPrefix: I
required: yes
- type:
- 'null'
- name: create_index
symbols:
- 'True'
- 'False'
type: enum
label: Create index
description: 'This parameter indicates whether to create a BAM index when writing
a coordinate-sorted BAM file. This option can be set to ''null'' to clear
the default value. Possible values: {true, false}.'
streamable: no
id: '#create_index'
inputBinding:
position: 5
prefix: CREATE_INDEX=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: 'False'
required: no
- type:
- 'null'
- int
label: Compression level
description: Compression level for all compressed files created (e.g. BAM and
GELI). This option can be set to 'null' to clear the default value.
streamable: no
id: '#compression_level'
inputBinding:
position: 0
prefix: COMPRESSION_LEVEL=
separate: no
sbg:cmdInclude: yes
sbg:category: Other input types
sbg:toolDefaultValue: '5'
required: no
outputs:
- type:
- 'null'
- File
label: Sorted BAM/SAM
description: Sorted BAM or SAM file.
streamable: no
id: '#sorted_bam'
outputBinding:
glob: '*.sorted.?am'
sbg:fileTypes: BAM, SAM
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
engineCommand: cwl-engine.js
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/mladenlsbg/picard:1.140
dockerImageId: eab0e70b6629
- class: sbg:CPURequirement
value: 1
- class: sbg:MemRequirement
value:
engine: '#cwl-js-engine'
script: "{\n if($job.inputs.memory_per_job){\n \treturn $job.inputs.memory_per_job\n
\ }\n \treturn 2048\n}"
class: Expression
label: Picard SortSam
description: Picard SortSam sorts the input SAM or BAM. Input and output formats
are determined by the file extension.
class: CommandLineTool
arguments:
- position: 0
prefix: OUTPUT=
separate: no
valueFrom:
engine: '#cwl-js-engine'
script: "{\n filename = $job.inputs.input_bam.path\n ext = $job.inputs.output_type\n\nif
(ext === \"BAM\")\n{\n return filename.split('.').slice(0, -1).concat(\"sorted.bam\").join(\".\").replace(/^.*[\\\\\\/]/,
'')\n }\n\nelse if (ext === \"SAM\")\n{\n return filename.split('.').slice(0,
-1).concat(\"sorted.sam\").join('.').replace(/^.*[\\\\\\/]/, '')\n}\n\nelse
\n{\n\treturn filename.split('.').slice(0, -1).concat(\"sorted.\"+filename.split('.').slice(-1)[0]).join(\".\").replace(/^.*[\\\\\\/]/,
'')\n}\n}"
class: Expression
- position: 1000
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n filename = $job.inputs.input_bam.path\n \n /* figuring out
output file type */\n ext = $job.inputs.output_type\n if (ext === \"BAM\")\n
\ {\n out_extension = \"BAM\"\n }\n else if (ext === \"SAM\")\n {\n
\ out_extension = \"SAM\"\n }\n else \n {\n\tout_extension = filename.split('.').slice(-1)[0].toUpperCase()\n
\ } \n \n /* if exist moving .bai in bam.bai */\n if ($job.inputs.create_index
=== 'True' && $job.inputs.sort_order === 'Coordinate' && out_extension ==
\"BAM\")\n {\n \n old_name = filename.split('.').slice(0, -1).concat('sorted.bai').join('.').replace(/^.*[\\\\\\/]/,
'')\n new_name = filename.split('.').slice(0, -1).concat('sorted.bam.bai').join('.').replace(/^.*[\\\\\\/]/,
'')\n return \"; mv \" + \" \" + old_name + \" \" + new_name\n }\n\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 773.0831807
'y': 470.9165939
sbg:x: 500.0
sbg:y: 200.0
- id: '#STAR'
inputs:
- id: '#STAR.winFlankNbins'
- id: '#STAR.winBinNbits'
- id: '#STAR.winAnchorMultimapNmax'
source: '#winAnchorMultimapNmax'
- id: '#STAR.winAnchorDistNbins'
source: '#winAnchorDistNbins'
- id: '#STAR.twopassMode'
- id: '#STAR.twopass1readsN'
- id: '#STAR.sjdbScore'
- id: '#STAR.sjdbOverhang'
default: 100
- id: '#STAR.sjdbInsertSave'
- id: '#STAR.sjdbGTFtagExonParentTranscript'
- id: '#STAR.sjdbGTFtagExonParentGene'
- id: '#STAR.sjdbGTFfile'
source: '#sjdbGTFfile'
- id: '#STAR.sjdbGTFfeatureExon'
- id: '#STAR.sjdbGTFchrPrefix'
- id: '#STAR.seedSearchStartLmaxOverLread'
- id: '#STAR.seedSearchStartLmax'
- id: '#STAR.seedSearchLmax'
- id: '#STAR.seedPerWindowNmax'
- id: '#STAR.seedPerReadNmax'
- id: '#STAR.seedNoneLociPerWindow'
- id: '#STAR.seedMultimapNmax'
- id: '#STAR.scoreStitchSJshift'
- id: '#STAR.scoreInsOpen'
- id: '#STAR.scoreInsBase'
- id: '#STAR.scoreGenomicLengthLog2scale'
- id: '#STAR.scoreGapNoncan'
- id: '#STAR.scoreGapGCAG'
- id: '#STAR.scoreGapATAC'
- id: '#STAR.scoreGap'
- id: '#STAR.scoreDelOpen'
- id: '#STAR.scoreDelBase'
- id: '#STAR.rg_seq_center'
- id: '#STAR.rg_sample_id'
- id: '#STAR.rg_platform_unit_id'
- id: '#STAR.rg_platform'
- id: '#STAR.rg_mfl'
- id: '#STAR.rg_library_id'
- id: '#STAR.reads'
source: '#SBG_FASTQ_Quality_Detector.result'
- id: '#STAR.readMatesLengthsIn'
- id: '#STAR.readMapNumber'
- id: '#STAR.quantTranscriptomeBan'
- id: '#STAR.quantMode'
default: TranscriptomeSAM
- id: '#STAR.outSortingType'
default: SortedByCoordinate
- id: '#STAR.outSJfilterReads'
- id: '#STAR.outSJfilterOverhangMin'
- id: '#STAR.outSJfilterIntronMaxVsReadN'
- id: '#STAR.outSJfilterDistToOtherSJmin'
- id: '#STAR.outSJfilterCountUniqueMin'
- id: '#STAR.outSJfilterCountTotalMin'
- id: '#STAR.outSAMunmapped'
- id: '#STAR.outSAMtype'
default: BAM
- id: '#STAR.outSAMstrandField'
- id: '#STAR.outSAMreadID'
- id: '#STAR.outSAMprimaryFlag'
- id: '#STAR.outSAMorder'
- id: '#STAR.outSAMmode'
- id: '#STAR.outSAMmapqUnique'
- id: '#STAR.outSAMheaderPG'
- id: '#STAR.outSAMheaderHD'
- id: '#STAR.outSAMflagOR'
- id: '#STAR.outSAMflagAND'
- id: '#STAR.outSAMattributes'
- id: '#STAR.outReadsUnmapped'
default: Fastx
- id: '#STAR.outQSconversionAdd'
- id: '#STAR.outFilterType'
- id: '#STAR.outFilterScoreMinOverLread'
- id: '#STAR.outFilterScoreMin'
- id: '#STAR.outFilterMultimapScoreRange'
- id: '#STAR.outFilterMultimapNmax'
- id: '#STAR.outFilterMismatchNoverReadLmax'
- id: '#STAR.outFilterMismatchNoverLmax'
- id: '#STAR.outFilterMismatchNmax'
- id: '#STAR.outFilterMatchNminOverLread'
- id: '#STAR.outFilterMatchNmin'
- id: '#STAR.outFilterIntronMotifs'
- id: '#STAR.limitSjdbInsertNsj'
- id: '#STAR.limitOutSJoneRead'
- id: '#STAR.limitOutSJcollapsed'
- id: '#STAR.limitBAMsortRAM'
- id: '#STAR.genomeDirName'
- id: '#STAR.genome'
source: '#STAR_Genome_Generate.genome'
- id: '#STAR.clip5pNbases'
- id: '#STAR.clip3pNbases'
- id: '#STAR.clip3pAfterAdapterNbases'
- id: '#STAR.clip3pAdapterSeq'
- id: '#STAR.clip3pAdapterMMp'
- id: '#STAR.chimSegmentMin'
- id: '#STAR.chimScoreSeparation'
- id: '#STAR.chimScoreMin'
- id: '#STAR.chimScoreJunctionNonGTAG'
- id: '#STAR.chimScoreDropMax'
- id: '#STAR.chimOutType'
- id: '#STAR.chimJunctionOverhangMin'
- id: '#STAR.alignWindowsPerReadNmax'
- id: '#STAR.alignTranscriptsPerWindowNmax'
- id: '#STAR.alignTranscriptsPerReadNmax'
- id: '#STAR.alignSplicedMateMapLminOverLmate'
- id: '#STAR.alignSplicedMateMapLmin'
- id: '#STAR.alignSoftClipAtReferenceEnds'
- id: '#STAR.alignSJoverhangMin'
- id: '#STAR.alignSJDBoverhangMin'
- id: '#STAR.alignMatesGapMax'
- id: '#STAR.alignIntronMin'
- id: '#STAR.alignIntronMax'
- id: '#STAR.alignEndsType'
outputs:
- id: '#STAR.unmapped_reads'
- id: '#STAR.transcriptome_aligned_reads'
- id: '#STAR.splice_junctions'
- id: '#STAR.reads_per_gene'
- id: '#STAR.log_files'
- id: '#STAR.intermediate_genome'
- id: '#STAR.chimeric_junctions'
- id: '#STAR.chimeric_alignments'
- id: '#STAR.aligned_reads'
hints: []
run:
sbg:validationErrors: []
sbg:sbgMaintained: no
sbg:latestRevision: 4
sbg:job:
allocatedResources:
mem: 60000
cpu: 15
inputs:
alignWindowsPerReadNmax: 0
outSAMheaderPG: outSAMheaderPG
GENOME_DIR_NAME: ''
outFilterMatchNminOverLread: 0
rg_platform_unit_id: rg_platform_unit
alignTranscriptsPerReadNmax: 0
readMapNumber: 0
alignSplicedMateMapLminOverLmate: 0
alignMatesGapMax: 0
outFilterMultimapNmax: 0
clip5pNbases:
- 0
outSAMstrandField: None
readMatesLengthsIn: NotEqual
outSAMattributes: Standard
seedMultimapNmax: 0
rg_mfl: rg_mfl
chimSegmentMin: 0
winAnchorDistNbins: 0
outSortingType: SortedByCoordinate
outFilterMultimapScoreRange: 0
sjdbInsertSave: Basic
clip3pAfterAdapterNbases:
- 0
scoreDelBase: 0
outFilterMatchNmin: 0
twopass1readsN: 0
outSAMunmapped: None
genome:
size: 0
secondaryFiles: []
class: File
path: genome.ext
sjdbGTFtagExonParentTranscript: ''
limitBAMsortRAM: 0
alignEndsType: Local
seedNoneLociPerWindow: 0
rg_sample_id: rg_sample
sjdbGTFtagExonParentGene: ''
chimScoreMin: 0
outSJfilterIntronMaxVsReadN:
- 0
twopassMode: Basic
alignSplicedMateMapLmin: 0
outSJfilterReads: All
outSAMprimaryFlag: OneBestScore
outSJfilterCountTotalMin:
- 3
- 1
- 1
- 1
outSAMorder: Paired
outSAMflagAND: 0
chimScoreSeparation: 0
alignSJoverhangMin: 0
outFilterScoreMin: 0
seedSearchStartLmax: 0
scoreGapGCAG: 0
scoreGenomicLengthLog2scale: 0
outFilterIntronMotifs: None
outFilterMismatchNmax: 0
reads:
- size: 0
secondaryFiles: []
class: File
metadata:
format: fastq
paired_end: '1'
seq_center: illumina
path: /test-data/mate_1.fastq.bz2
scoreGap: 0
outSJfilterOverhangMin:
- 30
- 12
- 12
- 12
outSAMflagOR: 0
outSAMmode: Full
rg_library_id: ''
chimScoreJunctionNonGTAG: 0
scoreInsOpen: 0
clip3pAdapterSeq:
- clip3pAdapterSeq
chimScoreDropMax: 0
outFilterType: Normal
scoreGapATAC: 0
rg_platform: Ion Torrent PGM
clip3pAdapterMMp:
- 0
sjdbGTFfeatureExon: ''
outQSconversionAdd: 0
quantMode: TranscriptomeSAM
alignIntronMin: 0
scoreInsBase: 0
scoreGapNoncan: 0
seedSearchLmax: 0
outSJfilterDistToOtherSJmin:
- 0
outFilterScoreMinOverLread: 0
alignSJDBoverhangMin: 0
limitOutSJcollapsed: 0
winAnchorMultimapNmax: 0
outFilterMismatchNoverLmax: 0
rg_seq_center: ''
outSAMheaderHD: outSAMheaderHD
chimOutType: Within
quantTranscriptomeBan: IndelSoftclipSingleend
limitOutSJoneRead: 0
alignTranscriptsPerWindowNmax: 0
sjdbOverhang: ~
outReadsUnmapped: Fastx
scoreStitchSJshift: 0
seedPerWindowNmax: 0
outSJfilterCountUniqueMin:
- 3
- 1
- 1
- 1
scoreDelOpen: 0
sjdbGTFfile:
- path: /demo/test-data/chr20.gtf
clip3pNbases:
- 0
- 3
winBinNbits: 0
sjdbScore: ~
seedSearchStartLmaxOverLread: 0
alignIntronMax: 0
seedPerReadNmax: 0
outFilterMismatchNoverReadLmax: 0
winFlankNbins: 0
sjdbGTFchrPrefix: chrPrefix
alignSoftClipAtReferenceEnds: 'Yes'
outSAMreadID: Standard
outSAMtype: BAM
chimJunctionOverhangMin: 0
limitSjdbInsertNsj: 0
outSAMmapqUnique: 0
sbg:toolAuthor: Alexander Dobin/CSHL
sbg:createdOn: 1450911471
sbg:categories:
- Alignment
sbg:contributors:
- ana_d
- bix-demo
- uros_sipetic
sbg:links:
- id: https://github.com/alexdobin/STAR
label: Homepage
- id: https://github.com/alexdobin/STAR/releases
label: Releases
- id: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
label: Manual
- id: https://groups.google.com/forum/#!forum/rna-star
label: Support
- id: http://www.ncbi.nlm.nih.gov/pubmed/23104886
label: Publication
sbg:project: bix-demo/star-2-4-2a-demo
sbg:createdBy: bix-demo
sbg:toolkitVersion: 2.4.2a
sbg:id: sevenbridges/public-apps/star/4
sbg:license: GNU General Public License v3.0 only
sbg:revision: 4
sbg:cmdPreview: tar -xvf genome.ext && /opt/STAR --runThreadN 15 --readFilesCommand
bzcat --sjdbGTFfile /demo/test-data/chr20.gtf --sjdbGTFchrPrefix chrPrefix
--sjdbInsertSave Basic --twopass1readsN 0 --chimOutType WithinBAM --outSAMattrRGline
ID:1 CN:illumina PI:rg_mfl PL:Ion_Torrent_PGM PU:rg_platform_unit SM:rg_sample --quantMode
TranscriptomeSAM --outFileNamePrefix ./mate_1.fastq.bz2. --readFilesIn /test-data/mate_1.fastq.bz2 &&
tar -vcf mate_1.fastq.bz2._STARgenome.tar ./mate_1.fastq.bz2._STARgenome &&
mv mate_1.fastq.bz2.Unmapped.out.mate1 mate_1.fastq.bz2.Unmapped.out.mate1.fastq
sbg:modifiedOn: 1462889222
sbg:modifiedBy: ana_d
sbg:revisionsInfo:
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911471
sbg:revision: 0
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911473
sbg:revision: 1
- sbg:modifiedBy: bix-demo
sbg:modifiedOn: 1450911475
sbg:revision: 2
- sbg:modifiedBy: uros_sipetic
sbg:modifiedOn: 1462878528
sbg:revision: 3
- sbg:modifiedBy: ana_d
sbg:modifiedOn: 1462889222
sbg:revision: 4
sbg:toolkit: STAR
id: sevenbridges/public-apps/star/4
inputs:
- type:
- 'null'
- int
label: Flanking regions size
description: =log2(winFlank), where winFlank is the size of the left and right
flanking regions for each window (int>0).
streamable: no
id: '#winFlankNbins'
inputBinding:
position: 0
prefix: --winFlankNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:includeInPorts: yes
sbg:toolDefaultValue: '4'
required: no
- type:
- 'null'
- int
label: Bin size
description: =log2(winBin), where winBin is the size of the bin for the windows/clustering,
each window will occupy an integer number of bins (int>0).
streamable: no
id: '#winBinNbits'
inputBinding:
position: 0
prefix: --winBinNbits
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:includeInPorts: yes
sbg:toolDefaultValue: '16'
required: no
- type:
- 'null'
- int
label: Max loci anchors
description: Max number of loci anchors are allowed to map to (int>0).
streamable: no
id: '#winAnchorMultimapNmax'
inputBinding:
position: 0
prefix: --winAnchorMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max bins between anchors
description: Max number of bins between two anchors that allows aggregation
of anchors into one window (int>0).
streamable: no
id: '#winAnchorDistNbins'
inputBinding:
position: 0
prefix: --winAnchorDistNbins
separate: yes
sbg:cmdInclude: yes
sbg:category: Windows, Anchors, Binning
sbg:toolDefaultValue: '9'
required: no
- type:
- 'null'
- name: twopassMode
symbols:
- None
- Basic
type: enum
label: Two-pass mode
description: '2-pass mapping mode. None: 1-pass mapping; Basic: basic 2-pass
mapping, with all 1st pass junctions inserted into the genome indices on the
fly.'
streamable: no
id: '#twopassMode'
inputBinding:
position: 0
prefix: --twopassMode
separate: yes
sbg:cmdInclude: yes
sbg:category: 2-pass mapping
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Reads to process in 1st step
description: 'Number of reads to process for the 1st step. 0: 1-step only, no
2nd pass; use very large number to map all reads in the first step (int>0).'
streamable: no
id: '#twopass1readsN'
sbg:category: 2-pass mapping
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- int
label: Extra alignment score
description: Extra alignment score for alignments that cross database junctions.
streamable: no
id: '#sjdbScore'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '2'
required: no
- type:
- 'null'
- int
label: '"Overhang" length'
description: Length of the donor/acceptor sequence on each side of the junctions,
ideally = (mate_length - 1) (int >= 0), if int = 0, splice junction database
is not used.
streamable: no
id: '#sjdbOverhang'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- name: sjdbInsertSave
symbols:
- Basic
- All
- None
type: enum
label: Save junction files
description: 'Which files to save when sjdb junctions are inserted on the fly
at the mapping step. None: not saving files at all; Basic: only small junction/transcript
files; All: all files including big Genome, SA and SAindex. These files are
output as archive.'
streamable: no
id: '#sjdbInsertSave'
sbg:category: Splice junctions database
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- string
label: Exons' parents name
description: Tag name to be used as exons’ transcript-parents.
streamable: no
id: '#sjdbGTFtagExonParentTranscript'
sbg:category: Splice junctions database
sbg:toolDefaultValue: transcript_id
required: no
- type:
- 'null'
- string
label: Gene name
description: Tag name to be used as exons’ gene-parents.
streamable: no
id: '#sjdbGTFtagExonParentGene'
sbg:category: Splice junctions database
sbg:toolDefaultValue: gene_id
required: no
- type:
- 'null'
- items: File
type: array
label: Splice junction file
description: Gene model annotations and/or known transcripts. No need to include
this input, except in case of using "on the fly" annotations.
streamable: no
id: '#sjdbGTFfile'
sbg:category: Basic
sbg:fileTypes: GTF, GFF, TXT
required: no
- type:
- 'null'
- string
label: Set exons feature
description: Feature type in GTF file to be used as exons for building transcripts.
streamable: no
id: '#sjdbGTFfeatureExon'
sbg:category: Splice junctions database
sbg:toolDefaultValue: exon
required: no
- type:
- 'null'
- string
label: Chromosome names
description: Prefix for chromosome names in a GTF file (e.g. 'chr' for using
ENSEMBL annotations with UCSC genomes).
streamable: no
id: '#sjdbGTFchrPrefix'
sbg:category: Splice junctions database
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- float
label: Search start point normalized
description: seedSearchStartLmax normalized to read length (sum of mates' lengths
for paired-end reads).
streamable: no
id: '#seedSearchStartLmaxOverLread'
inputBinding:
position: 0
prefix: --seedSearchStartLmaxOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '1.0'
required: no
- type:
- 'null'
- int
label: Search start point
description: Defines the search start point through the read - the read is split
into pieces no longer than this value (int>0).
streamable: no
id: '#seedSearchStartLmax'
inputBinding:
position: 0
prefix: --seedSearchStartLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max seed length
description: Defines the maximum length of the seeds, if =0 max seed length
is infinite (int>=0).
streamable: no
id: '#seedSearchLmax'
inputBinding:
position: 0
prefix: --seedSearchLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Max seeds per window
description: Max number of seeds per window (int>=0).
streamable: no
id: '#seedPerWindowNmax'
inputBinding:
position: 0
prefix: --seedPerWindowNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '50'
required: no
- type:
- 'null'
- int
label: Max seeds per read
description: Max number of seeds per read (int>=0).
streamable: no
id: '#seedPerReadNmax'
inputBinding:
position: 0
prefix: --seedPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '1000'
required: no
- type:
- 'null'
- int
label: Max one-seed loci per window
description: Max number of one seed loci per window (int>=0).
streamable: no
id: '#seedNoneLociPerWindow'
inputBinding:
position: 0
prefix: --seedNoneLociPerWindow
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- int
label: Filter pieces for stitching
description: Only pieces that map fewer than this value are utilized in the
stitching procedure (int>=0).
streamable: no
id: '#seedMultimapNmax'
inputBinding:
position: 0
prefix: --seedMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- int
label: Max score reduction
description: Maximum score reduction while searching for SJ boundaries in the
stitching step.
streamable: no
id: '#scoreStitchSJshift'
inputBinding:
position: 0
prefix: --scoreStitchSJshift
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Insertion Open Penalty
description: Insertion open penalty.
streamable: no
id: '#scoreInsOpen'
inputBinding:
position: 0
prefix: --scoreInsOpen
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- int
label: Insertion extension penalty
description: Insertion extension penalty per base (in addition to --scoreInsOpen).
streamable: no
id: '#scoreInsBase'
inputBinding:
position: 0
prefix: --scoreInsBase
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- float
label: Log scaled score
description: 'Extra score logarithmically scaled with genomic length of the
alignment: <int>*log2(genomicLength).'
streamable: no
id: '#scoreGenomicLengthLog2scale'
inputBinding:
position: 0
prefix: --scoreGenomicLengthLog2scale
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-0.25'
required: no
- type:
- 'null'
- int
label: Non-canonical gap open
description: Non-canonical gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapNoncan'
inputBinding:
position: 0
prefix: --scoreGapNoncan
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-8'
required: no
- type:
- 'null'
- int
label: GC/AG and CT/GC gap open
description: GC/AG and CT/GC gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapGCAG'
inputBinding:
position: 0
prefix: --scoreGapGCAG
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-4'
required: no
- type:
- 'null'
- int
label: AT/AC and GT/AT gap open
description: AT/AC and GT/AT gap open penalty (in addition to --scoreGap).
streamable: no
id: '#scoreGapATAC'
inputBinding:
position: 0
prefix: --scoreGapATAC
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-8'
required: no
- type:
- 'null'
- int
label: Gap open penalty
description: Gap open penalty.
streamable: no
id: '#scoreGap'
inputBinding:
position: 0
prefix: --scoreGap
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Deletion open penalty
description: Deletion open penalty.
streamable: no
id: '#scoreDelOpen'
inputBinding:
position: 0
prefix: --scoreDelOpen
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- int
label: Deletion extension penalty
description: Deletion extension penalty per base (in addition to --scoreDelOpen).
streamable: no
id: '#scoreDelBase'
inputBinding:
position: 0
prefix: --scoreDelBase
separate: yes
sbg:cmdInclude: yes
sbg:category: Scoring
sbg:toolDefaultValue: '-2'
required: no
- type:
- 'null'
- string
label: Sequencing center
description: Specify the sequencing center for RG line.
streamable: no
id: '#rg_seq_center'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Sample ID
description: Specify the sample ID for RG line.
streamable: no
id: '#rg_sample_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Platform unit ID
description: Specify the platform unit ID for RG line.
streamable: no
id: '#rg_platform_unit_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- name: rg_platform
symbols:
- LS 454
- Helicos
- Illumina
- ABI SOLiD
- Ion Torrent PGM
- PacBio
type: enum
label: Platform
description: Specify the version of the technology that was used for sequencing
or assaying.
streamable: no
id: '#rg_platform'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Median fragment length
description: Specify the median fragment length for RG line.
streamable: no
id: '#rg_mfl'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- 'null'
- string
label: Library ID
description: Specify the library ID for RG line.
streamable: no
id: '#rg_library_id'
sbg:category: Read group
sbg:toolDefaultValue: Inferred from metadata
required: no
- type:
- items: File
type: array
label: Read sequence
description: Read sequence.
streamable: no
id: '#reads'
inputBinding:
position: 10
separate: yes
itemSeparator: ' '
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var list = [].concat($job.inputs.reads)\n \n var resp
= []\n \n if (list.length == 1){\n resp.push(list[0].path)\n \n
\ }else if (list.length == 2){ \n \n left = \"\"\n right =
\"\"\n \n for (index = 0; index < list.length; ++index) {\n \n
\ if (list[index].metadata != null){\n if (list[index].metadata.paired_end
== 1){\n left = list[index].path\n }else if (list[index].metadata.paired_end
== 2){\n right = list[index].path\n }\n }\n }\n
\ \n if (left != \"\" && right != \"\"){ \n resp.push(left)\n
\ resp.push(right)\n }\n }\n else if (list.length > 2){\n left
= []\n right = []\n \n for (index = 0; index < list.length;
++index) {\n \n if (list[index].metadata != null){\n if
(list[index].metadata.paired_end == 1){\n left.push(list[index].path)\n
\ }else if (list[index].metadata.paired_end == 2){\n right.push(list[index].path)\n
\ }\n }\n }\n left_join = left.join()\n right_join
= right.join()\n if (left.length > 0 && right.length > 0){ \n resp.push(left_join)\n
\ resp.push(right_join)\n }\t\n }\n \n if(resp.length > 0){
\ \n return \"--readFilesIn \".concat(resp.join(\" \"))\n }\n}"
class: Expression
sbg:cmdInclude: yes
sbg:category: Basic
sbg:fileTypes: FASTA, FASTQ, FA, FQ, FASTQ.GZ, FQ.GZ, FASTQ.BZ2, FQ.BZ2
required: yes
- type:
- 'null'
- name: readMatesLengthsIn
symbols:
- NotEqual
- Equal
type: enum
label: Reads lengths
description: Equal/Not equal - lengths of names, sequences, qualities for both
mates are the same/not the same. "Not equal" is safe in all situations.
streamable: no
id: '#readMatesLengthsIn'
inputBinding:
position: 0
prefix: --readMatesLengthsIn
separate: yes
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: NotEqual
required: no
- type:
- 'null'
- int
label: Reads to map
description: Number of reads to map from the beginning of the file.
streamable: no
id: '#readMapNumber'
inputBinding:
position: 0
prefix: --readMapNumber
separate: yes
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- name: quantTranscriptomeBan
symbols:
- IndelSoftclipSingleend
- Singleend
type: enum
label: Prohibit alignment type
description: 'Prohibit various alignment types. IndelSoftclipSingleend: prohibit
indels, soft clipping and single-end alignments - compatible with RSEM; Singleend:
prohibit single-end alignments.'
streamable: no
id: '#quantTranscriptomeBan'
inputBinding:
position: 0
prefix: --quantTranscriptomeBan
separate: yes
sbg:cmdInclude: yes
sbg:category: Quantification of Annotations
sbg:toolDefaultValue: IndelSoftclipSingleend
required: no
- type:
- 'null'
- name: quantMode
symbols:
- TranscriptomeSAM
- GeneCounts
type: enum
label: Quantification mode
description: Types of quantification requested. 'TranscriptomeSAM' option outputs
SAM/BAM alignments to transcriptome into a separate file. With 'GeneCounts'
option, STAR will count number of reads per gene while mapping.
streamable: no
id: '#quantMode'
sbg:category: Quantification of Annotations
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- name: outSortingType
symbols:
- Unsorted
- SortedByCoordinate
- Unsorted SortedByCoordinate
type: enum
label: Output sorting type
description: Type of output sorting.
streamable: no
id: '#outSortingType'
sbg:category: Output
sbg:toolDefaultValue: SortedByCoordinate
required: no
- type:
- 'null'
- name: outSJfilterReads
symbols:
- All
- Unique
type: enum
label: Collapsed junctions reads
description: 'Which reads to consider for collapsed splice junctions output.
All: all reads, unique- and multi-mappers; Unique: uniquely mapping reads
only.'
streamable: no
id: '#outSJfilterReads'
inputBinding:
position: 0
prefix: --outSJfilterReads
separate: yes
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: All
required: no
- type:
- 'null'
- items: int
type: array
label: Min overhang SJ
description: Minimum overhang length for splice junctions on both sides for
each of the motifs. To set no output for desired motif, assign -1 to the corresponding
field. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterOverhangMin'
inputBinding:
position: 0
prefix: --outSJfilterOverhangMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 30 12 12 12
required: no
- type:
- 'null'
- items: int
type: array
label: Max gap allowed
description: 'Maximum gap allowed for junctions supported by 1,2,3...N reads
(int >= 0) i.e. by default junctions supported by 1 read can have gaps <=50000b,
by 2 reads: <=100000b, by 3 reads: <=200000b. By 4 or more reads: any gap <=alignIntronMax.
Does not apply to annotated junctions.'
streamable: no
id: '#outSJfilterIntronMaxVsReadN'
inputBinding:
position: 0
prefix: --outSJfilterIntronMaxVsReadN
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 50000 100000 200000
required: no
- type:
- 'null'
- items: int
type: array
label: Min distance to other donor/acceptor
description: Minimum allowed distance to other junctions' donor/acceptor for
each of the motifs (int >= 0). Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterDistToOtherSJmin'
inputBinding:
position: 0
prefix: --outSJfilterDistToOtherSJmin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 10 0 5 10
required: no
- type:
- 'null'
- items: int
type: array
label: Min unique count
description: Minimum uniquely mapping read count per junction for each of the
motifs. To set no output for desired motif, assign -1 to the corresponding
field. Junctions are output if either the --outSJfilterCountUniqueMin OR the --outSJfilterCountTotalMin
condition is satisfied. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterCountUniqueMin'
inputBinding:
position: 0
prefix: --outSJfilterCountUniqueMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 3 1 1 1
required: no
- type:
- 'null'
- items: int
type: array
label: Min total count
description: Minimum total (multi-mapping+unique) read count per junction for
each of the motifs. To set no output for desired motif, assign -1 to the corresponding
field. Junctions are output if either the --outSJfilterCountUniqueMin OR the --outSJfilterCountTotalMin
condition is satisfied. Does not apply to annotated junctions.
streamable: no
id: '#outSJfilterCountTotalMin'
inputBinding:
position: 0
prefix: --outSJfilterCountTotalMin
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: 'Output filtering: splice junctions'
sbg:toolDefaultValue: 3 1 1 1
required: no
- type:
- 'null'
- name: outSAMunmapped
symbols:
- None
- Within
type: enum
label: Write unmapped in SAM
description: 'Output of unmapped reads in the SAM format. None: no output; Within:
output unmapped reads within the main SAM file (i.e. Aligned.out.sam).'
streamable: no
id: '#outSAMunmapped'
inputBinding:
position: 0
prefix: --outSAMunmapped
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- name: outSAMtype
symbols:
- SAM
- BAM
type: enum
label: Output format
description: Format of output alignments.
streamable: no
id: '#outSAMtype'
inputBinding:
position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
SAM_type = $job.inputs.outSAMtype
SORT_type = $job.inputs.outSortingType
if (SAM_type && SORT_type) {
return "--outSAMtype ".concat(SAM_type, " ", SORT_type)
}
}
class: Expression
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: SAM
required: no
- type:
- 'null'
- name: outSAMstrandField
symbols:
- None
- intronMotif
type: enum
label: Strand field flag
description: 'Cufflinks-like strand field flag. None: not used; intronMotif:
strand derived from the intron motif. Reads with inconsistent and/or non-canonical
introns are filtered out.'
streamable: no
id: '#outSAMstrandField'
inputBinding:
position: 0
prefix: --outSAMstrandField
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- name: outSAMreadID
symbols:
- Standard
- Number
type: enum
label: Read ID
description: 'Read ID record type. Standard: first word (until space) from the
FASTx read ID line, removing /1,/2 from the end; Number: read number (index)
in the FASTx file.'
streamable: no
id: '#outSAMreadID'
inputBinding:
position: 0
prefix: --outSAMreadID
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Standard
required: no
- type:
- 'null'
- name: outSAMprimaryFlag
symbols:
- OneBestScore
- AllBestScore
type: enum
label: Primary alignments
description: 'Which alignments are considered primary - all others will be marked
with 0x100 bit in the FLAG. OneBestScore: only one alignment with the best
score is primary; AllBestScore: all alignments with the best score are primary.'
streamable: no
id: '#outSAMprimaryFlag'
inputBinding:
position: 0
prefix: --outSAMprimaryFlag
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: OneBestScore
required: no
- type:
- 'null'
- name: outSAMorder
symbols:
- Paired
- PairedKeepInputOrder
type: enum
label: Sorting in SAM
description: 'Type of sorting for the SAM output. Paired: one mate after the
other for all paired alignments; PairedKeepInputOrder: one mate after the
other for all paired alignments, the order is kept the same as in the input
FASTQ files.'
streamable: no
id: '#outSAMorder'
inputBinding:
position: 0
prefix: --outSAMorder
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Paired
required: no
- type:
- 'null'
- name: outSAMmode
symbols:
- Full
- NoQS
type: enum
label: SAM mode
description: 'Mode of SAM output. Full: full SAM output; NoQS: full SAM but
without quality scores.'
streamable: no
id: '#outSAMmode'
inputBinding:
position: 0
prefix: --outSAMmode
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Full
required: no
- type:
- 'null'
- int
label: MAPQ value
description: MAPQ value for unique mappers (0 to 255).
streamable: no
id: '#outSAMmapqUnique'
inputBinding:
position: 0
prefix: --outSAMmapqUnique
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '255'
required: no
- type:
- 'null'
- string
label: SAM header @PG
description: Extra @PG (software) line of the SAM header (in addition to STAR).
streamable: no
id: '#outSAMheaderPG'
inputBinding:
position: 0
prefix: --outSAMheaderPG
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- string
label: SAM header @HD
description: '@HD (header) line of the SAM header.'
streamable: no
id: '#outSAMheaderHD'
inputBinding:
position: 0
prefix: --outSAMheaderHD
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- int
label: OR SAM flag
description: Set specific bits of the SAM FLAG.
streamable: no
id: '#outSAMflagOR'
inputBinding:
position: 0
prefix: --outSAMflagOR
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: AND SAM flag
description: Unset specific bits of the SAM FLAG (bitwise AND with this value).
streamable: no
id: '#outSAMflagAND'
inputBinding:
position: 0
prefix: --outSAMflagAND
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '65535'
required: no
- type:
- 'null'
- name: outSAMattributes
symbols:
- Standard
- NH
- All
- None
type: enum
label: SAM attributes
description: 'Desired SAM attributes, in the order desired for the output SAM.
NH: any combination in any order; Standard: NH HI AS nM; All: NH HI AS nM
NM MD jM jI; None: no attributes.'
streamable: no
id: '#outSAMattributes'
inputBinding:
position: 0
prefix: --outSAMattributes
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: Standard
required: no
- type:
- 'null'
- name: outReadsUnmapped
symbols:
- None
- Fastx
type: enum
label: Output unmapped reads
description: 'Output of unmapped reads (besides SAM). None: no output; Fastx:
output in separate fasta/fastq files, Unmapped.out.mate1/2.'
streamable: no
id: '#outReadsUnmapped'
inputBinding:
position: 0
prefix: --outReadsUnmapped
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Quality conversion
description: Add this number to the quality score (e.g. to convert from Illumina
to Sanger, use -31).
streamable: no
id: '#outQSconversionAdd'
inputBinding:
position: 0
prefix: --outQSconversionAdd
separate: yes
sbg:cmdInclude: yes
sbg:category: Output
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: outFilterType
symbols:
- Normal
- BySJout
type: enum
label: Filtering type
description: 'Type of filtering. Normal: standard filtering using only current
alignment; BySJout: keep only those reads that contain junctions that passed
filtering into SJ.out.tab.'
streamable: no
id: '#outFilterType'
inputBinding:
position: 0
prefix: --outFilterType
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: Normal
required: no
- type:
- 'null'
- float
label: Min score normalized
description: '''Minimum score'' normalized to read length (sum of mates'' lengths
for paired-end reads).'
streamable: no
id: '#outFilterScoreMinOverLread'
inputBinding:
position: 0
prefix: --outFilterScoreMinOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min score
description: Alignment will be output only if its score is higher than this
value.
streamable: no
id: '#outFilterScoreMin'
inputBinding:
position: 0
prefix: --outFilterScoreMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Multimapping score range
description: The score range below the maximum score for multimapping alignments.
streamable: no
id: '#outFilterMultimapScoreRange'
inputBinding:
position: 0
prefix: --outFilterMultimapScoreRange
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- int
label: Max number of mappings
description: Read alignments will be output only if the read maps to fewer loci than
this value; otherwise no alignments will be output.
streamable: no
id: '#outFilterMultimapNmax'
inputBinding:
position: 0
prefix: --outFilterMultimapNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- float
label: Mismatches to *read* length
description: Alignment will be output only if its ratio of mismatches to *read*
length is less than this value.
streamable: no
id: '#outFilterMismatchNoverReadLmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNoverReadLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '1'
required: no
- type:
- 'null'
- float
label: Mismatches to *mapped* length
description: Alignment will be output only if its ratio of mismatches to *mapped*
length is less than this value.
streamable: no
id: '#outFilterMismatchNoverLmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNoverLmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.3'
required: no
- type:
- 'null'
- int
label: Max number of mismatches
description: Alignment will be output only if it has fewer mismatches than this
value.
streamable: no
id: '#outFilterMismatchNmax'
inputBinding:
position: 0
prefix: --outFilterMismatchNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- float
label: Min matched bases normalized
description: '''Minimum matched bases'' normalized to read length (sum of mates''
lengths for paired-end reads).'
streamable: no
id: '#outFilterMatchNminOverLread'
inputBinding:
position: 0
prefix: --outFilterMatchNminOverLread
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min matched bases
description: Alignment will be output only if the number of matched bases is
higher than this value.
streamable: no
id: '#outFilterMatchNmin'
inputBinding:
position: 0
prefix: --outFilterMatchNmin
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: outFilterIntronMotifs
symbols:
- None
- RemoveNoncanonical
- RemoveNoncanonicalUnannotated
type: enum
label: Motifs filtering
description: 'Filter alignment using their motifs. None: no filtering; RemoveNoncanonical:
filter out alignments that contain non-canonical junctions; RemoveNoncanonicalUnannotated:
filter out alignments that contain non-canonical unannotated junctions when
using annotated splice junctions database. The annotated non-canonical junctions
will be kept.'
streamable: no
id: '#outFilterIntronMotifs'
inputBinding:
position: 0
prefix: --outFilterIntronMotifs
separate: yes
sbg:cmdInclude: yes
sbg:category: Output filtering
sbg:toolDefaultValue: None
required: no
- type:
- 'null'
- int
label: Max insert junctions
description: Maximum number of junctions to be inserted into the genome on the
fly at the mapping stage, including those from annotations and those detected
in the 1st step of the 2-pass run.
streamable: no
id: '#limitSjdbInsertNsj'
inputBinding:
position: 0
prefix: --limitSjdbInsertNsj
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000000'
required: no
- type:
- 'null'
- int
label: Junctions max number
description: Max number of junctions for one read (including all multi-mappers).
streamable: no
id: '#limitOutSJoneRead'
inputBinding:
position: 0
prefix: --limitOutSJoneRead
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000'
required: no
- type:
- 'null'
- int
label: Collapsed junctions max number
description: Max number of collapsed junctions.
streamable: no
id: '#limitOutSJcollapsed'
inputBinding:
position: 0
prefix: --limitOutSJcollapsed
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '1000000'
required: no
- type:
- 'null'
- int
label: Limit BAM sorting memory
description: Maximum available RAM for sorting BAM. If set to 0, it will be
set to the genome index size.
streamable: no
id: '#limitBAMsortRAM'
inputBinding:
position: 0
prefix: --limitBAMsortRAM
separate: yes
sbg:cmdInclude: yes
sbg:category: Limits
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- string
label: Genome dir name
description: Name of the directory which contains genome files (when genome.tar
is uncompressed).
streamable: no
id: '#genomeDirName'
inputBinding:
position: 0
prefix: --genomeDir
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: $job.inputs.genomeDirName || "genomeDir"
class: Expression
sbg:cmdInclude: yes
sbg:category: Basic
sbg:toolDefaultValue: genomeDir
required: no
- type:
- File
label: Genome files
description: Genome files created using STAR Genome Generate.
streamable: no
id: '#genome'
sbg:category: Basic
sbg:fileTypes: TAR
required: yes
- type:
- 'null'
- items: int
type: array
label: Clip 5p bases
description: Number of bases to clip from 5p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip5pNbases'
inputBinding:
position: 0
prefix: --clip5pNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: int
type: array
label: Clip 3p bases
description: Number of bases to clip from 3p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pNbases'
inputBinding:
position: 0
prefix: --clip3pNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: int
type: array
label: Clip 3p after adapter seq.
description: Number of bases to clip from 3p of each mate after the adapter
clipping. In case only one value is given, it will be assumed the same for
both mates.
streamable: no
id: '#clip3pAfterAdapterNbases'
inputBinding:
position: 0
prefix: --clip3pAfterAdapterNbases
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- items: string
type: array
label: Clip 3p adapter sequence
description: Adapter sequence to clip from 3p of each mate. In case only one
value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pAdapterSeq'
inputBinding:
position: 0
prefix: --clip3pAdapterSeq
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '-'
required: no
- type:
- 'null'
- items: float
type: array
label: Max mismatches proportions
description: Max proportion of mismatches for 3p adapter clipping for each mate.
In case only one value is given, it will be assumed the same for both mates.
streamable: no
id: '#clip3pAdapterMMp'
inputBinding:
position: 0
prefix: --clip3pAdapterMMp
separate: yes
itemSeparator: ' '
sbg:cmdInclude: yes
sbg:category: Read parameters
sbg:toolDefaultValue: '0.1'
required: no
- type:
- 'null'
- int
label: Min segment length
description: Minimum length of a chimeric segment; if set to 0, no chimeric output
(int>=0).
streamable: no
id: '#chimSegmentMin'
inputBinding:
position: 0
prefix: --chimSegmentMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '15'
required: no
- type:
- 'null'
- int
label: Min separation score
description: Minimum difference (separation) between the best chimeric score
and the next one (int>=0).
streamable: no
id: '#chimScoreSeparation'
inputBinding:
position: 0
prefix: --chimScoreSeparation
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '10'
required: no
- type:
- 'null'
- int
label: Min total score
description: Minimum total (summed) score of the chimeric segments (int>=0).
streamable: no
id: '#chimScoreMin'
inputBinding:
position: 0
prefix: --chimScoreMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Non-GT/AG penalty
description: Penalty for a non-GT/AG chimeric junction.
streamable: no
id: '#chimScoreJunctionNonGTAG'
inputBinding:
position: 0
prefix: --chimScoreJunctionNonGTAG
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '-1'
required: no
- type:
- 'null'
- int
label: Max drop score
description: Max drop (difference) of chimeric score (the sum of scores of all
chimeric segments) from the read length (int>=0).
streamable: no
id: '#chimScoreDropMax'
inputBinding:
position: 0
prefix: --chimScoreDropMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '20'
required: no
- type:
- 'null'
- name: chimOutType
symbols:
- SeparateSAMold
- Within
type: enum
label: Chimeric output type
description: 'Type of chimeric output. SeparateSAMold: output old SAM into separate
Chimeric.out.sam file; Within: output into main aligned SAM/BAM files.'
streamable: no
id: '#chimOutType'
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: SeparateSAMold
required: no
- type:
- 'null'
- int
label: Min junction overhang
description: Minimum overhang for a chimeric junction (int>=0).
streamable: no
id: '#chimJunctionOverhangMin'
inputBinding:
position: 0
prefix: --chimJunctionOverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Chimeric Alignments
sbg:toolDefaultValue: '20'
required: no
- type:
- 'null'
- int
label: Max windows per read
description: Max number of windows per read (int>0).
streamable: no
id: '#alignWindowsPerReadNmax'
inputBinding:
position: 0
prefix: --alignWindowsPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- int
label: Max transcripts per window
description: Max number of transcripts per window (int>0).
streamable: no
id: '#alignTranscriptsPerWindowNmax'
inputBinding:
position: 0
prefix: --alignTranscriptsPerWindowNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '100'
required: no
- type:
- 'null'
- int
label: Max transcripts per read
description: Max number of different alignments per read to consider (int>0).
streamable: no
id: '#alignTranscriptsPerReadNmax'
inputBinding:
position: 0
prefix: --alignTranscriptsPerReadNmax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '10000'
required: no
- type:
- 'null'
- float
label: Min mapped length normalized
description: alignSplicedMateMapLmin normalized to mate length (float>0).
streamable: no
id: '#alignSplicedMateMapLminOverLmate'
inputBinding:
position: 0
prefix: --alignSplicedMateMapLminOverLmate
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0.66'
required: no
- type:
- 'null'
- int
label: Min mapped length
description: Minimum mapped length for a read mate that is spliced (int>0).
streamable: no
id: '#alignSplicedMateMapLmin'
inputBinding:
position: 0
prefix: --alignSplicedMateMapLmin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: alignSoftClipAtReferenceEnds
symbols:
- 'Yes'
- 'No'
type: enum
label: Soft clipping
description: 'Option which allows soft clipping of alignments at the reference
(chromosome) ends. Can be disabled for compatibility with Cufflinks/Cuffmerge.
Yes: Enables soft clipping; No: Disables soft clipping.'
streamable: no
id: '#alignSoftClipAtReferenceEnds'
inputBinding:
position: 0
prefix: --alignSoftClipAtReferenceEnds
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: 'Yes'
required: no
- type:
- 'null'
- int
label: Min overhang
description: Minimum overhang (i.e. block size) for spliced alignments (int>0).
streamable: no
id: '#alignSJoverhangMin'
inputBinding:
position: 0
prefix: --alignSJoverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '5'
required: no
- type:
- 'null'
- int
label: 'Min overhang: annotated'
description: Minimum overhang (i.e. block size) for annotated (sjdb) spliced
alignments (int>0).
streamable: no
id: '#alignSJDBoverhangMin'
inputBinding:
position: 0
prefix: --alignSJDBoverhangMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '3'
required: no
- type:
- 'null'
- int
label: Max mates gap
description: Maximum gap between two mates, if 0, max intron gap will be determined
by (2^winBinNbits)*winAnchorDistNbins.
streamable: no
id: '#alignMatesGapMax'
inputBinding:
position: 0
prefix: --alignMatesGapMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- int
label: Min intron size
description: 'Minimum intron size: genomic gap is considered intron if its length
>= alignIntronMin, otherwise it is considered a deletion (int>=0).'
streamable: no
id: '#alignIntronMin'
inputBinding:
position: 0
prefix: --alignIntronMin
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '21'
required: no
- type:
- 'null'
- int
label: Max intron size
description: Maximum intron size, if 0, max intron size will be determined by
(2^winBinNbits)*winAnchorDistNbins.
streamable: no
id: '#alignIntronMax'
inputBinding:
position: 0
prefix: --alignIntronMax
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: '0'
required: no
- type:
- 'null'
- name: alignEndsType
symbols:
- Local
- EndToEnd
type: enum
label: Alignment type
description: 'Type of read ends alignment. Local: standard local alignment with
soft-clipping allowed. EndToEnd: force end to end read alignment, do not soft-clip.'
streamable: no
id: '#alignEndsType'
inputBinding:
position: 0
prefix: --alignEndsType
separate: yes
sbg:cmdInclude: yes
sbg:category: Alignments and Seeding
sbg:toolDefaultValue: Local
required: no
outputs:
- type:
- 'null'
- items: File
type: array
label: Unmapped reads
description: Output of unmapped reads.
streamable: no
id: '#unmapped_reads'
outputBinding:
glob: '*Unmapped.out*'
sbg:fileTypes: FASTQ
- type:
- 'null'
- File
label: Transcriptome alignments
description: Alignments translated into transcript coordinates.
streamable: no
id: '#transcriptome_aligned_reads'
outputBinding:
glob: '*Transcriptome*'
sbg:fileTypes: BAM
- type:
- 'null'
- File
label: Splice junctions
description: High confidence collapsed splice junctions in tab-delimited format.
Only junctions supported by uniquely mapping reads are reported.
streamable: no
id: '#splice_junctions'
outputBinding:
glob: '*SJ.out.tab'
sbg:fileTypes: TAB
- type:
- 'null'
- File
label: Reads per gene
description: File with number of reads per gene. A read is counted if it overlaps
(1nt or more) one and only one gene.
streamable: no
id: '#reads_per_gene'
outputBinding:
glob: '*ReadsPerGene*'
sbg:fileTypes: TAB
- type:
- 'null'
- items: File
type: array
label: Log files
description: Log files produced during alignment.
streamable: no
id: '#log_files'
outputBinding:
glob: '*Log*.out'
sbg:fileTypes: OUT
- type:
- 'null'
- File
label: Intermediate genome files
description: Archive with genome files produced when annotations are included
on the fly (in the mapping step).
streamable: no
id: '#intermediate_genome'
outputBinding:
glob: '*_STARgenome.tar'
sbg:fileTypes: TAR
- type:
- 'null'
- File
label: Chimeric junctions
description: If chimSegmentMin in 'Chimeric Alignments' section is set to 0,
'Chimeric Junctions' won't be output.
streamable: no
id: '#chimeric_junctions'
outputBinding:
glob: '*Chimeric.out.junction'
sbg:fileTypes: JUNCTION
- type:
- 'null'
- File
label: Chimeric alignments
description: Aligned chimeric sequences in SAM format. If chimSegmentMin = 0, neither
the Chimeric Alignments SAM nor the Chimeric Junctions output is produced.
streamable: no
id: '#chimeric_alignments'
outputBinding:
glob: '*.Chimeric.out.sam'
sbg:fileTypes: SAM
- type:
- 'null'
- File
label: Aligned SAM/BAM
description: Aligned sequence in SAM/BAM format.
streamable: no
id: '#aligned_reads'
outputBinding:
glob:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.outSortingType == 'SortedByCoordinate') {
sort_name = '.sortedByCoord'
}
else {
sort_name = ''
}
if ($job.inputs.outSAMtype == 'BAM') {
sam_name = "*.Aligned".concat( sort_name, '.out.bam')
}
else {
sam_name = "*.Aligned.out.sam"
}
return sam_name
}
class: Expression
sbg:fileTypes: SAM, BAM
requirements:
- class: ExpressionEngineRequirement
id: '#cwl-js-engine'
requirements:
- class: DockerRequirement
dockerPull: rabix/js-engine
hints:
- class: DockerRequirement
dockerPull: images.sbgenomics.com/ana_d/star:2.4.2a
dockerImageId: a4b0ad2c3cae
- class: sbg:MemRequirement
value: 60000
- class: sbg:CPURequirement
value: 15
label: STAR
description: STAR is an ultrafast universal RNA-seq aligner. It has very high
mapping speed, accurate alignment of contiguous and spliced reads, detection
of polyA-tails, non-canonical splices and chimeric (fusion) junctions. It works
with reads starting from lengths ~15 bases up to ~300 bases. In case of having
longer reads, use of STAR Long is recommended.
class: CommandLineTool
arguments:
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
file = [].concat($job.inputs.reads)[0].path
extension = /(?:\.([^.]+))?$/.exec(file)[1]
if (extension == "gz") {
return "--readFilesCommand zcat"
} else if (extension == "bz2") {
return "--readFilesCommand bzcat"
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\t\n var sjFormat = \"False\"\n var gtfgffFormat = \"False\"\n
\ var list = $job.inputs.sjdbGTFfile\n var paths_list = []\n var joined_paths
= \"\"\n \n if (list) {\n list.forEach(function(f){return paths_list.push(f.path)})\n
\ joined_paths = paths_list.join(\" \")\n\n\n paths_list.forEach(function(f){\n
\ ext = f.replace(/^.*\\./, '')\n if (ext == \"gff\" || ext ==
\"gtf\") {\n gtfgffFormat = \"True\"\n return gtfgffFormat\n
\ }\n if (ext == \"txt\") {\n sjFormat = \"True\"\n return
sjFormat\n }\n })\n\n if ($job.inputs.sjdbGTFfile && $job.inputs.sjdbInsertSave
!= \"None\") {\n if (sjFormat == \"True\") {\n return \"--sjdbFileChrStartEnd
\".concat(joined_paths)\n }\n else if (gtfgffFormat == \"True\")
{\n return \"--sjdbGTFfile \".concat(joined_paths)\n }\n }\n
\ }\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n a = b = c = d = e = f = g = []\n if ($job.inputs.sjdbGTFchrPrefix)
{\n a = [\"--sjdbGTFchrPrefix\", $job.inputs.sjdbGTFchrPrefix]\n }\n
\ if ($job.inputs.sjdbGTFfeatureExon) {\n b = [\"--sjdbGTFfeatureExon\",
$job.inputs.sjdbGTFfeatureExon]\n }\n if ($job.inputs.sjdbGTFtagExonParentTranscript)
{\n c = [\"--sjdbGTFtagExonParentTranscript\", $job.inputs.sjdbGTFtagExonParentTranscript]\n
\ }\n if ($job.inputs.sjdbGTFtagExonParentGene) {\n d = [\"--sjdbGTFtagExonParentGene\",
$job.inputs.sjdbGTFtagExonParentGene]\n }\n if ($job.inputs.sjdbOverhang)
{\n e = [\"--sjdbOverhang\", $job.inputs.sjdbOverhang]\n }\n if ($job.inputs.sjdbScore)
{\n f = [\"--sjdbScore\", $job.inputs.sjdbScore]\n }\n if ($job.inputs.sjdbInsertSave)
{\n g = [\"--sjdbInsertSave\", $job.inputs.sjdbInsertSave]\n }\n \n
\ \n \n if ($job.inputs.sjdbInsertSave != \"None\" && $job.inputs.sjdbGTFfile)
{\n new_list = a.concat(b, c, d, e, f, g)\n return new_list.join(\"
\")\n }\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.twopassMode == "Basic") {
return "--twopass1readsN ".concat($job.inputs.twopass1readsN)
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.chimOutType == "Within") {
return "--chimOutType ".concat("Within", $job.inputs.outSAMtype)
}
else {
return "--chimOutType SeparateSAMold"
}
}
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n var param_list = []\n \n function add_param(key, value){\n
\ if (value == \"\") {\n return\n }\n else {\n return
param_list.push(key.concat(\":\", value))\n }\n }\n \n add_param('ID',
\"1\")\n if ($job.inputs.rg_seq_center) {\n add_param('CN', $job.inputs.rg_seq_center)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.seq_center) {\n add_param('CN',
[].concat($job.inputs.reads)[0].metadata.seq_center)\n }\n if ($job.inputs.rg_library_id)
{\n add_param('LB', $job.inputs.rg_library_id)\n } else if ([].concat($job.inputs.reads)[0].metadata.library_id)
{\n add_param('LB', [].concat($job.inputs.reads)[0].metadata.library_id)\n
\ }\n if ($job.inputs.rg_mfl) {\n add_param('PI', $job.inputs.rg_mfl)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.median_fragment_length)
{\n add_param('PI', [].concat($job.inputs.reads)[0].metadata.median_fragment_length)\n
\ }\n if ($job.inputs.rg_platform) {\n add_param('PL', $job.inputs.rg_platform.replace(/
/g,\"_\"))\n } else if ([].concat($job.inputs.reads)[0].metadata.platform)
{\n add_param('PL', [].concat($job.inputs.reads)[0].metadata.platform.replace(/
/g,\"_\"))\n }\n if ($job.inputs.rg_platform_unit_id) {\n add_param('PU',
$job.inputs.rg_platform_unit_id)\n } else if ([].concat($job.inputs.reads)[0].metadata.platform_unit_id)
{\n add_param('PU', [].concat($job.inputs.reads)[0].metadata.platform_unit_id)\n
\ }\n if ($job.inputs.rg_sample_id) {\n add_param('SM', $job.inputs.rg_sample_id)\n
\ } else if ([].concat($job.inputs.reads)[0].metadata.sample_id) {\n add_param('SM',
[].concat($job.inputs.reads)[0].metadata.sample_id)\n }\n return \"--outSAMattrRGline
\".concat(param_list.join(\" \"))\n}"
class: Expression
- position: 0
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: |-
{
if ($job.inputs.sjdbGTFfile && $job.inputs.quantMode) {
return "--quantMode ".concat($job.inputs.quantMode)
}
}
class: Expression
- position: 100
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n intermediate = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\"._STARgenome\")\n source = \"./\".concat(intermediate)\n
\ destination = intermediate.concat(\".tar\")\n if ($job.inputs.sjdbGTFfile
&& $job.inputs.sjdbInsertSave && $job.inputs.sjdbInsertSave != \"None\")
{\n return \"&& tar -vcf \".concat(destination, \" \", source)\n }\n}"
class: Expression
- position: 0
prefix: --outFileNamePrefix
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n return \"./\".concat(common_prefix.replace(
/\\-$|\\_$|\\.$/, '' ), \".\")\n}"
class: Expression
- position: 101
separate: yes
valueFrom:
engine: '#cwl-js-engine'
script: "{\n function sharedStart(array){\n var A= array.concat().sort(),
\n a1= A[0], a2= A[A.length-1], L= a1.length, i= 0;\n while(i<L &&
a1.charAt(i)=== a2.charAt(i)) i++;\n return a1.substring(0, i);\n }\n
\ path_list = []\n arr = [].concat($job.inputs.reads)\n arr.forEach(function(f){return
path_list.push(f.path.replace(/\\\\/g,'/').replace( /.*\\//, '' ))})\n common_prefix
= sharedStart(path_list)\n mate1 = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\".Unmapped.out.mate1\")\n mate2 = common_prefix.replace( /\\-$|\\_$|\\.$/,
'' ).concat(\".Unmapped.out.mate2\")\n mate1fq = mate1.concat(\".fastq\")\n
\ mate2fq = mate2.concat(\".fastq\")\n if ($job.inputs.outReadsUnmapped
== \"Fastx\" && arr.length > 1) {\n return \"&& mv \".concat(mate1, \"
\", mate1fq, \" && mv \", mate2, \" \", mate2fq)\n }\n else if ($job.inputs.outReadsUnmapped
== \"Fastx\" && arr.length == 1) {\n return \"&& mv \".concat(mate1,
\" \", mate1fq)\n }\n}"
class: Expression
stdin: ''
stdout: ''
successCodes: []
temporaryFailCodes: []
x: 624.0
'y': 323
sbg:x: 700.0
sbg:y: 200.0
sbg:canvas_zoom: 0.6
sbg:canvas_y: -16
sbg:canvas_x: -41
sbg:batchInput: '#sjdbGTFfile'
sbg:batchBy:
type: criteria
criteria:
- metadata.sample_id
- metadata.library_id
When you push your app to the platform, you will see the batch option available on the task page or in the workflow editor.
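To make the batch criteria above more concrete, here is a rough sketch of the grouping they imply: input files that share the same sample_id and library_id metadata fall into the same batch group. The data frame and file names below are made up for illustration, and the platform's actual batching logic is not reproduced here; this only mimics the grouping idea in base R.

```r
## Hypothetical input files with their metadata (illustrative values only)
files <- data.frame(
  name       = c("a_1.fastq", "a_2.fastq", "b_1.fastq", "b_2.fastq"),
  sample_id  = c("S1", "S1", "S2", "S2"),
  library_id = c("L1", "L1", "L1", "L1"),
  stringsAsFactors = FALSE
)

## One group per unique (sample_id, library_id) combination
groups <- split(files$name,
                list(files$sample_id, files$library_id),
                drop = TRUE)
print(groups)
```

Each element of groups corresponds to one set of files that would be processed together under these criteria.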
The graphical user interface on the Seven Bridges Platform is far more convenient
Yes, you can use the same function convert_app to import a JSON file.
f1 = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
f1 = convert_app(f1)
## show it
## f1
Just like a Tool object, a Flow object comes with convenient utilities, which are especially useful when you execute a task.
f1 = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
f1 = convert_app(f1)
## input matrix
head(f1$input_matrix())
id label type required
1 #sjdbGTFfile sjdbGTFfile File... FALSE
2 #fastq fastq File... TRUE
3 #genomeFastaFiles genomeFastaFiles File TRUE
4 #sjdbGTFtagExonParentTranscript Exons' parents name string FALSE
5 #sjdbGTFtagExonParentGene Gene name string FALSE
6 #winAnchorMultimapNmax Max loci anchors int FALSE
fileTypes
1 null
2 null
3 null
4 null
5 null
6 null
## by name
head(f1$input_matrix(c("id", "type", "required")))
id type required
1 #sjdbGTFfile File... FALSE
2 #fastq File... TRUE
3 #genomeFastaFiles File TRUE
4 #sjdbGTFtagExonParentTranscript string FALSE
5 #sjdbGTFtagExonParentGene string FALSE
6 #winAnchorMultimapNmax int FALSE
## return only required
head(f1$input_matrix(required = TRUE))
id label type required fileTypes
2 #fastq fastq File... TRUE null
3 #genomeFastaFiles genomeFastaFiles File TRUE null
## return everything
head(f1$input_matrix(NULL))
id type required fileTypes
1 #sjdbGTFfile File... FALSE null
2 #fastq File... TRUE null
3 #genomeFastaFiles File TRUE null
4 #sjdbGTFtagExonParentTranscript string FALSE null
5 #sjdbGTFtagExonParentGene string FALSE null
6 #winAnchorMultimapNmax int FALSE null
label category stageInput streamable
1 sjdbGTFfile null null FALSE
2 fastq null null FALSE
3 genomeFastaFiles null null FALSE
4 Exons' parents name Splice junctions db parameters null FALSE
5 Gene name Splice junctions db parameters null FALSE
6 Max loci anchors Windows, Anchors, Binning null FALSE
sbg.x sbg.y sbg.includeInPorts
1 160.50 195.0833 NA
2 164.25 323.7500 TRUE
3 167.75 469.9999 NA
4 200.00 350.0000 NA
5 200.00 400.0000 NA
6 200.00 450.0000 NA
description
1 <NA>
2 <NA>
3 <NA>
4 Tag name to be used as exons’ transcript-parents.
5 Tag name to be used as exons’ gene-parents.
6 Max number of loci anchors are allowed to map to (int>0).
sbg.toolDefaultValue
1 <NA>
2 <NA>
3 <NA>
4 transcript_id
5 gene_id
6 50
link_to
1 #STAR_Genome_Generate.sjdbGTFfile | #STAR.sjdbGTFfile
2 #SBG_FASTQ_Quality_Detector.fastq
3 #STAR_Genome_Generate.genomeFastaFiles
4 #STAR_Genome_Generate.sjdbGTFtagExonParentTranscript
5 #STAR_Genome_Generate.sjdbGTFtagExonParentGene
6 #STAR.winAnchorMultimapNmax
## return an output matrix with more information
head(f1$output_matrix())
id label type
1 #unmapped_reads unmapped_reads File...
2 #transcriptome_aligned_reads transcriptome_aligned_reads File
3 #splice_junctions splice_junctions File
4 #reads_per_gene reads_per_gene File
5 #log_files log_files File...
6 #chimeric_junctions chimeric_junctions File
fileTypes
1 null
2 null
3 null
4 null
5 null
6 null
## return only a few fields
head(f1$output_matrix(c("id", "type")))
id type
1 #unmapped_reads File...
2 #transcriptome_aligned_reads File
3 #splice_junctions File
4 #reads_per_gene File
5 #log_files File...
6 #chimeric_junctions File
## return everything
head(f1$output_matrix(NULL))
id label type
1 #unmapped_reads unmapped_reads File...
2 #transcriptome_aligned_reads transcriptome_aligned_reads File
3 #splice_junctions splice_junctions File
4 #reads_per_gene reads_per_gene File
5 #log_files log_files File...
6 #chimeric_junctions chimeric_junctions File
fileTypes required source streamable
1 null FALSE #STAR.unmapped_reads FALSE
2 null FALSE #STAR.transcriptome_aligned_reads FALSE
3 null FALSE #STAR.splice_junctions FALSE
4 null FALSE #STAR.reads_per_gene FALSE
5 null FALSE #STAR.log_files FALSE
6 null FALSE #STAR.chimeric_junctions FALSE
sbg.includeInPorts sbg.x sbg.y link_to
1 TRUE 766.2498 159.58331 #STAR.unmapped_reads
2 TRUE 1118.9998 86.58332 #STAR.transcriptome_aligned_reads
3 TRUE 1282.3330 167.49998 #STAR.splice_junctions
4 TRUE 1394.4164 245.74996 #STAR.reads_per_gene
5 TRUE 1505.0830 322.99995 #STAR.log_files
6 TRUE 1278.7498 446.74996 #STAR.chimeric_junctions
## flow inputs
f1$input_type()
sjdbGTFfile fastq
"File..." "File..."
genomeFastaFiles sjdbGTFtagExonParentTranscript
"File" "string"
sjdbGTFtagExonParentGene winAnchorMultimapNmax
"string" "int"
winAnchorDistNbins
"int"
## flow outputs
f1$output_type()
unmapped_reads transcriptome_aligned_reads
"File..." "File"
splice_junctions reads_per_gene
"File" "File"
log_files chimeric_junctions
"File..." "File"
intermediate_genome chimeric_alignments
"File" "File"
sorted_bam result
"File" "File"
## list tools
f1$list_tool()
label
1 STAR Genome Generate
2 SBG FASTQ Quality Detector
3 Picard SortSam
4 STAR
sbgid
1 sevenbridges/public-apps/star-genome-generate/1
2 sevenbridges/public-apps/sbg-fastq-quality-detector/3
3 sevenbridges/public-apps/picard-sortsam-1-140/2
4 sevenbridges/public-apps/star/4
id
1 #STAR_Genome_Generate
2 #SBG_FASTQ_Quality_Detector
3 #Picard_SortSam
4 #STAR
## f1$get_tool("STAR")
There are more utilities; please check the examples at help(Flow).
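Since the matrices printed above are ordinary data frames, base R subsetting is enough to pick out the inputs you still need to supply before running a task. The sketch below uses a small hand-built stand-in (im, a hypothetical name) with the same columns as f1$input_matrix(c("id", "type", "required")), so it runs without the package; with a real flow you would assign the result of that call instead.

```r
## Stand-in for f1$input_matrix(c("id", "type", "required"));
## the values are copied from the output shown above.
im <- data.frame(
  id = c("#sjdbGTFfile", "#fastq", "#genomeFastaFiles",
         "#sjdbGTFtagExonParentTranscript"),
  type = c("File...", "File...", "File", "string"),
  required = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

## Required inputs only
required_ids <- im$id[im$required]

## File-typed inputs ("File" or the array form "File...")
file_ids <- im$id[grepl("^File", im$type)]

## Strip the leading "#" to get plain input names
required_names <- sub("^#", "", required_ids)
print(required_names)
```

The plain names are the form used when passing an inputs list to a task, as in inputs = list(min = 1, max = 10).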
To create a workflow, we provide a simple interface to pipe your tools into a single workflow. It works in situations like
Note: for complicated workflow construction, I highly recommend using our graphical interface; there is no better way.
Let’s create tools from scratch to perform a simple task
library(sevenbridges)
## A tool that generates 100 random numbers
t1 <- Tool(id = "runif new test 3", label = "random number",
hints = requirements(docker(pull = "rocker/r-base")),
baseCommand = "Rscript -e 'x = runif(100); write.csv(x, file = \"random.txt\", row.names = FALSE)'",
outputs = output(id = "random",
type = "file",
glob = "random.txt"))
## A tool that takes the log
fd <- fileDef(name = "log.R",
content = "args = commandArgs(TRUE)
x = read.table(args[1], header = TRUE)[,'x']
x = log(x)
write.csv(x, file = 'random_log.txt', row.names = FALSE)
")
t2 <- Tool(id = "log new test 3", label = "get log",
hints = requirements(docker(pull = "rocker/r-base")),
requirements = requirements(fd),
baseCommand = "Rscript log.R",
inputs = input(id = "number",
type = "file"),
outputs = output(id = "log",
type = "file",
glob = "*.txt"))
## A tool that computes the mean
fd <- fileDef(name = "mean.R",
content = "args = commandArgs(TRUE)
x = read.table(args[1], header = TRUE)[,'x']
x = mean(x)
write.csv(x, file = 'random_mean.txt', row.names = FALSE)")
t3 <- Tool(id = "mean new test 3", label = "get mean",
hints = requirements(docker(pull = "rocker/r-base")),
requirements = requirements(fd),
baseCommand = "Rscript mean.R",
inputs = input(id = "number",
type = "file"),
outputs = output(id = "mean",
type = "file",
glob = "*.txt"))
f = t1 %>>% t2
flow_output: #get_log.log
f = link(t1, t2, "#random", "#number")
flow_output: #get_log.log
## you cannot directly copy-paste it
## please use the API to push it; we will register each tool for you.
# library(clipr)
# write_clip(f$toJSON(pretty = TRUE))
t2 <- Tool(id = "log new test 3", label = "get log",
hints = requirements(docker(pull = "rocker/r-base")),
## requirements = requirements(fd),
baseCommand = "Rscript log.R",
inputs = input(id = "number",
type = "file",
secondaryFiles = sevenbridges:::set_box(".bai")),
outputs = output(id = "log",
type = "file",
glob = "*.txt"))
# library(clipr)
# write_clip(t2$toJSON(pretty = TRUE))
Note: this workflow contains tools that do not exist on the platform, so if you directly copy and paste the JSON into the GUI, it won’t work properly. A simple alternative is to push your app to the platform via the API. This adds the new tools one by one to your project before adding your workflow app to the platform. Alternatively, if you connect two tools that you know already exist on the platform, you don’t need to do so.
## auto-check tool info and push new tools
p$app_add("new_flow_log", f)
Now let’s connect two tools
Checking potential mappings is easy with the function link_what, which prints the matched inputs and outputs. The generic function link then allows you to connect two Tool objects.
If you don’t specify which inputs/outputs to expose at the flow level for the new Flow object, it will expose all available ones and print a message. Otherwise, please provide the parameters flow_input and flow_output with full ids.
t1 = system.file("extdata/app", "tool_unpack_fastq.json",
package = "sevenbridges")
t2 = system.file("extdata/app", "tool_star.json",
package = "sevenbridges")
t1 = convert_app(t1)
t2 = convert_app(t2)
## check possible link
link_what(t1, t2)
$File...
$File...$from
id label type fileTypes
1 #output_fastq_files Output FASTQ files File... FASTQ
full.name
1 #SBG_Unpack_FASTQs
$File...$to
id label type required prefix
1 #reads Read sequence File... TRUE <NA>
95 #sjdbGTFfile Splice junction file File... FALSE <NA>
fileTypes full.name
1 FASTA, FASTQ, FA, FQ, FASTQ.GZ, FQ.GZ, FASTQ.BZ2, FQ.BZ2 #STAR
95 GTF, GFF, TXT #STAR
## link
f1 = link(t1, t2, "output_fastq_files", "reads")
flow_input: #SBG_Unpack_FASTQs.input_archive_file / #STAR.sjdbGTFfile / #STAR.genome
flow_output: #STAR.aligned_reads / #STAR.transcriptome_aligned_reads / #STAR.reads_per_gene / #STAR.log_files / #STAR.splice_junctions / #STAR.chimeric_junctions / #STAR.unmapped_reads / #STAR.intermediate_genome / #STAR.chimeric_alignments
## link
t1$output_id(TRUE)
File...
"#SBG_Unpack_FASTQs.output_fastq_files"
t2$input_id(TRUE)
File...
"#STAR.reads"
enum
"#STAR.readMatesLengthsIn"
int
"#STAR.readMapNumber"
int
"#STAR.limitOutSJoneRead"
int
"#STAR.limitOutSJcollapsed"
enum
"#STAR.outReadsUnmapped"
int
"#STAR.outQSconversionAdd"
enum
"#STAR.outSAMtype"
enum
"#STAR.outSortingType"
enum
"#STAR.outSAMmode"
enum
"#STAR.outSAMstrandField"
enum
"#STAR.outSAMattributes"
enum
"#STAR.outSAMunmapped"
enum
"#STAR.outSAMorder"
enum
"#STAR.outSAMprimaryFlag"
enum
"#STAR.outSAMreadID"
int
"#STAR.outSAMmapqUnique"
int
"#STAR.outSAMflagOR"
int
"#STAR.outSAMflagAND"
string
"#STAR.outSAMheaderHD"
string
"#STAR.outSAMheaderPG"
string
"#STAR.rg_seq_center"
string
"#STAR.rg_library_id"
string
"#STAR.rg_mfl"
enum
"#STAR.rg_platform"
string
"#STAR.rg_platform_unit_id"
string
"#STAR.rg_sample_id"
enum
"#STAR.outFilterType"
int
"#STAR.outFilterMultimapScoreRange"
int
"#STAR.outFilterMultimapNmax"
int
"#STAR.outFilterMismatchNmax"
float
"#STAR.outFilterMismatchNoverLmax"
float
"#STAR.outFilterMismatchNoverReadLmax"
int
"#STAR.outFilterScoreMin"
float
"#STAR.outFilterScoreMinOverLread"
int
"#STAR.outFilterMatchNmin"
float
"#STAR.outFilterMatchNminOverLread"
enum
"#STAR.outFilterIntronMotifs"
enum
"#STAR.outSJfilterReads"
int...
"#STAR.outSJfilterOverhangMin"
int...
"#STAR.outSJfilterCountUniqueMin"
int...
"#STAR.outSJfilterCountTotalMin"
int...
"#STAR.outSJfilterDistToOtherSJmin"
int...
"#STAR.outSJfilterIntronMaxVsReadN"
int
"#STAR.scoreGap"
int
"#STAR.scoreGapNoncan"
int
"#STAR.scoreGapGCAG"
int
"#STAR.scoreGapATAC"
float
"#STAR.scoreGenomicLengthLog2scale"
int
"#STAR.scoreDelOpen"
int
"#STAR.scoreDelBase"
int
"#STAR.scoreInsOpen"
int
"#STAR.scoreInsBase"
int
"#STAR.scoreStitchSJshift"
int
"#STAR.seedSearchStartLmax"
float
"#STAR.seedSearchStartLmaxOverLread"
int
"#STAR.seedSearchLmax"
int
"#STAR.seedMultimapNmax"
int
"#STAR.seedPerReadNmax"
int
"#STAR.seedPerWindowNmax"
int
"#STAR.seedNoneLociPerWindow"
int
"#STAR.alignIntronMin"
int
"#STAR.alignIntronMax"
int
"#STAR.alignMatesGapMax"
int
"#STAR.alignSJoverhangMin"
int
"#STAR.alignSJDBoverhangMin"
int
"#STAR.alignSplicedMateMapLmin"
float
"#STAR.alignSplicedMateMapLminOverLmate"
float
"#STAR.alignWindowsPerReadNmax"
int
"#STAR.alignTranscriptsPerWindowNmax"
int
"#STAR.alignTranscriptsPerReadNmax"
enum
"#STAR.alignEndsType"
enum
"#STAR.alignSoftClipAtReferenceEnds"
int
"#STAR.winAnchorMultimapNmax"
int
"#STAR.winBinNbits"
int
"#STAR.winAnchorDistNbins"
int
"#STAR.winFlankNbins"
int
"#STAR.chimSegmentMin"
int
"#STAR.chimScoreMin"
int
"#STAR.chimScoreDropMax"
int
"#STAR.chimScoreSeparation"
int
"#STAR.chimScoreJunctionNonGTAG"
int
"#STAR.chimJunctionOverhangMin"
enum
"#STAR.quantMode"
int
"#STAR.twopass1readsN"
enum
"#STAR.twopassMode"
string
"#STAR.genomeDirName"
enum
"#STAR.sjdbInsertSave"
string
"#STAR.sjdbGTFchrPrefix"
string
"#STAR.sjdbGTFfeatureExon"
string
"#STAR.sjdbGTFtagExonParentTranscript"
string
"#STAR.sjdbGTFtagExonParentGene"
int
"#STAR.sjdbOverhang"
int
"#STAR.sjdbScore"
File...
"#STAR.sjdbGTFfile"
int...
"#STAR.clip3pNbases"
int...
"#STAR.clip5pNbases"
string...
"#STAR.clip3pAdapterSeq"
float...
"#STAR.clip3pAdapterMMp"
int...
"#STAR.clip3pAfterAdapterNbases"
enum
"#STAR.chimOutType"
File
"#STAR.genome"
int
"#STAR.limitSjdbInsertNsj"
enum
"#STAR.quantTranscriptomeBan"
int
"#STAR.limitBAMsortRAM"
f2 = link(t1, t2, "output_fastq_files", "reads",
flow_input = "#SBG_Unpack_FASTQs.input_archive_file",
flow_output = "#STAR.log_files")
flow_input: #SBG_Unpack_FASTQs.input_archive_file / #STAR.genome
flow_output: #STAR.log_files
# library(clipr)
# write_clip(f2$toJSON())
tool.in = system.file("extdata/app", "tool_unpack_fastq.json", package = "sevenbridges")
flow.in = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
t1 = convert_app(tool.in)
f2 = convert_app(flow.in)
## consult link_map() first
f2$link_map()
id
1 #STAR_Genome_Generate.sjdbGTFtagExonParentTranscript
2 #STAR_Genome_Generate.sjdbGTFtagExonParentGene
3 #STAR_Genome_Generate.sjdbGTFfile
4 #STAR_Genome_Generate.genomeFastaFiles
5 #SBG_FASTQ_Quality_Detector.fastq
6 #Picard_SortSam.input_bam
7 #STAR.winAnchorMultimapNmax
8 #STAR.winAnchorDistNbins
9 #STAR.sjdbGTFfile
10 #STAR.reads
11 #STAR.genome
12 #unmapped_reads
13 #transcriptome_aligned_reads
14 #splice_junctions
15 #reads_per_gene
16 #log_files
17 #chimeric_junctions
18 #intermediate_genome
19 #chimeric_alignments
20 #sorted_bam
21 #result
source type
1 #sjdbGTFtagExonParentTranscript input
2 #sjdbGTFtagExonParentGene input
3 #sjdbGTFfile input
4 #genomeFastaFiles input
5 #fastq input
6 #STAR.aligned_reads input
7 #winAnchorMultimapNmax input
8 #winAnchorDistNbins input
9 #sjdbGTFfile input
10 #SBG_FASTQ_Quality_Detector.result input
11 #STAR_Genome_Generate.genome input
12 #STAR.unmapped_reads output
13 #STAR.transcriptome_aligned_reads output
14 #STAR.splice_junctions output
15 #STAR.reads_per_gene output
16 #STAR.log_files output
17 #STAR.chimeric_junctions output
18 #STAR.intermediate_genome output
19 #STAR.chimeric_alignments output
20 #Picard_SortSam.sorted_bam output
21 #SBG_FASTQ_Quality_Detector.result output
## then link
f3 = link(t1, f2, c("output_fastq_files"), c("#SBG_FASTQ_Quality_Detector.fastq"))
link_what(f2, t1)
$File
$File$from
id label type required
2 #transcriptome_aligned_reads transcriptome_aligned_reads File FALSE
3 #splice_junctions splice_junctions File FALSE
4 #reads_per_gene reads_per_gene File FALSE
6 #chimeric_junctions chimeric_junctions File FALSE
7 #intermediate_genome intermediate_genome File FALSE
8 #chimeric_alignments chimeric_alignments File FALSE
9 #sorted_bam sorted_bam File FALSE
10 #result result File FALSE
fileTypes link_to
2 null #STAR.transcriptome_aligned_reads
3 null #STAR.splice_junctions
4 null #STAR.reads_per_gene
6 null #STAR.chimeric_junctions
7 null #STAR.intermediate_genome
8 null #STAR.chimeric_alignments
9 null #Picard_SortSam.sorted_bam
10 null #SBG_FASTQ_Quality_Detector.result
$File$to
id label type required
1 #input_archive_file Input archive file File TRUE
prefix fileTypes
1 --input_archive_file TAR, TAR.GZ, TGZ, TAR.BZ2, TBZ2, GZ, BZ2, ZIP
f4 = link(f2, t1, c("#Picard_SortSam.sorted_bam", "#SBG_FASTQ_Quality_Detector.result"), c("#input_archive_file", "#input_archive_file"))
flow_input: #SBG_Unpack_FASTQs.input_archive_file
flow_output: #SBG_Unpack_FASTQs.output_fastq_files
## todo
## all outputs
## flow + flow
## print message when name wrong
# library(clipr)
# write_clip(f4$toJSON())
With the API functions, you can directly load your Tool into your account and run a task. For a “how-to”, please check the complete API guide.
Here is a quick demo:
a = Auth(url = "api_url", token = "your_token")
p = a$project("demo")
app.runif = p$app_add("runif555", rbx)
aid = app.runif$id
tsk = p$task_add(name = "Draft runif simple",
description = "Description for runif",
app = aid,
inputs = list(min = 1, max = 10))
tsk$run()
1. from CLI
While developing tools, it is useful to test them locally first. For that we can use rabix (reproducible analyses for bioinformatics), https://github.com/rabix. To test your tool with the latest implementation of rabix in Java (called bunny), you can use the docker image tengfei/testenv:
docker pull tengfei/testenv
Dump your rabix tool as JSON into a directory that also contains the input data: write(rbx$toJSON(pretty = TRUE), file = "<data_dir>/<tool>.json"). Then make an inputs.json file in the same directory to declare the input parameters (you can use relative paths from inputs.json to the data). Create the container:
docker run --privileged --name bunny -v </path/to/data_dir>:/bunny_data -dit tengfei/testenv
Execute the tool:
docker exec bunny bash -c 'cd /opt/bunny && ./rabix.sh -e /bunny_data /bunny_data/<tool>.json /bunny_data/inputs.json'
You’ll see the running logs from within the container, and also the output dir inside
NOTE: tengfei/testenv has R, Python, Java… so many tools can work without a docker requirement set. If you do set a docker requirement, however, you need to pull the image inside the container first, so that the docker container can run inside the running bunny container.
NOTE: inputs.json can also be inputs.yaml if you find it easier to declare inputs in YAML.
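As a small sketch of how inputs.json could be produced from R, the snippet below writes the parameters for a tool whose inputs are named number and max (like the runif example in this tutorial). It assumes that inputs.json is a flat mapping from input id to value, which mirrors the params list passed to test_tool; the temporary directory stands in for your real data directory, and file inputs are not covered here.

```r
library(jsonlite)

## Input values; the names must match the tool's input ids
params <- list(number = 3, max = 5)

## Hypothetical location: replace tempdir() with your real <data_dir>
data_dir <- tempdir()
inputs_file <- file.path(data_dir, "inputs.json")

## auto_unbox = TRUE writes scalars as 3 rather than [3]
writeLines(toJSON(params, auto_unbox = TRUE, pretty = TRUE), inputs_file)

## Read the file back to check its content
inputs <- fromJSON(inputs_file)
str(inputs)
```

Writing the file from the same list you would pass to test_tool keeps the CLI and R test runs in sync.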
2. from R
library(sevenbridges)
in.df <- data.frame(id = c("number", "min", "max", "seed"),
description = c("number of observation",
"lower limits of the distribution",
"upper limits of the distribution",
"seed with set.seed"),
type = c("integer", "float", "float", "float"),
label = c("number" ,"min", "max", "seed"),
prefix = c("--n", "--min", "--max", "--seed"),
default = c(1, 0, 10, 123),
required = c(TRUE, FALSE, FALSE, FALSE))
out.df <- data.frame(id = c("random", "report"),
type = c("file", "file"),
glob = c("*.txt", "*.html"))
rbx <- Tool(id = "runif",
label = "Random number generator",
hints = requirements(docker(pull = "tengfei/runif"),
cpu(1), mem(2000)),
baseCommand = "runif.R",
inputs = in.df, ## or ins.df
outputs = out.df)
params <- list(number=3, max=5)
set_test_env("tengfei/testenv", "mount_dir")
test_tool(rbx, params)