STAR.align.folder {ORFik} R Documentation

Align all libraries in folder with STAR

Description

Aligns all files in the folder either as paired-end or as single-end reads, so if you have a mix of both, split them into two different folders.
If STAR halts at "... loading genome", it means the STAR index was aborted early. You then need to run STAR.remove.crashed.genome() with the genome that crashed, and rerun.

Usage

STAR.align.folder(
  input.dir,
  output.dir,
  index.dir,
  star.path = STAR.install(),
  fastp = install.fastp(),
  paired.end = "no",
  steps = "tr-ge",
  adapter.sequence = "auto",
  min.length = 15,
  trim.front = 0,
  alignment.type = "Local",
  max.cpus = min(90, detectCores() - 1),
  wait = TRUE,
  include.subfolders = "n",
  script.folder = system.file("STAR_Aligner", "RNA_Align_pipeline_folder.sh", package =
    "ORFik"),
  script.single = system.file("STAR_Aligner", "RNA_Align_pipeline.sh", package =
    "ORFik")
)

Arguments

input.dir

path to fasta/fastq files to align. Files can be uncompressed (.fastq, .fq, .fa etc.) or compressed with .gz, and can hold either paired-end or single-end reads.

output.dir

directory to save the alignment output in. The usage above gives this argument no default, so it must be supplied.

index.dir

path to STAR index folder. Path returned from ORFik function STAR.index, when you created the index folders.

star.path

path to STAR, default: STAR.install(). If STAR is not installed at the default location, it will be installed there. If you already have STAR, set this to the path of a runnable STAR executable.

fastp

path to the fastp trimmer, default: install.fastp(). If you already have it installed somewhere else, give that path. Only works on unix (Linux or Mac OS); if not on unix, use your favorite trimmer and give the output files from that trimmer as input.dir here.

paired.end

default "no", alternative "yes". Pairs are auto-detected by file name. If "yes" when running on a folder: the folder must contain an even number of files, and the two files of a pair must share the same prefix and differ only in a suffix such as _1 and _2, 1 and 2, etc.
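To check a folder against this naming rule before starting a long alignment, a small base R sketch such as the one below may help. The helper pair_by_suffix is hypothetical and not part of ORFik; it only handles the _1 / _2 convention and does not reproduce the matching logic of the ORFik bash script.

```r
# Hypothetical helper (not part of ORFik): group read files into mate pairs
# using a trailing _1 / _2 tag in the file name.
pair_by_suffix <- function(files) {
  # Drop the extension, e.g. ".fastq.gz" or ".fq"
  stem <- sub("\\.(fastq|fq|fasta|fa)(\\.gz)?$", "", basename(files))
  if (!all(grepl("_[12]$", stem)))
    stop("each file name must end in _1 or _2 before the extension")
  prefix <- sub("_[12]$", "", stem)   # shared part of the name
  mate   <- sub("^.*_", "", stem)     # "1" or "2"
  p1 <- prefix[mate == "1"]
  p2 <- prefix[mate == "2"]
  if (anyDuplicated(p1) > 0 || anyDuplicated(p2) > 0 || !setequal(p1, p2))
    stop("each prefix needs exactly one _1 file and one _2 file")
  ord <- order(p1)
  data.frame(prefix = p1[ord],
             mate1  = files[mate == "1"][ord],
             mate2  = files[mate == "2"][match(p1[ord], p2)],
             stringsAsFactors = FALSE)
}

# Example file names (made up for illustration)
files <- c("ctrl_2.fastq.gz", "ctrl_1.fastq.gz", "kd_1.fastq.gz", "kd_2.fastq.gz")
pairs <- pair_by_suffix(files)
print(pairs)
```

An unpaired file (for example only "a_1.fq") makes the helper stop with an error, which is easier to diagnose than a failed alignment run.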

steps

a character, default: "tr-ge" (trimming, then genome alignment).
Steps of depletion and alignment wanted. The possible candidates are: tr: trim reads, ph: phix depletion, rR: rRNA depletion, nc: ncRNA depletion, tR: tRNA depletion, ge: genome alignment, all: run all steps.
If not "all", give a subset of these, separated by "-" ("tr-ph-rR-nc-tR-ge").
In the bash script it is reformatted to this style: trimming and genome alignment is "tr-ge", and "all" becomes "tr-ph-rR-nc-tR-ge". The genome alignment step is almost always included, unless you are doing pure contaminant analysis. For Ribo-seq and TCP(RCP-seq) you should do rR (ribosomal RNA depletion), which means the rRNA step must have been included when you made the STAR index (usually just download a Silva rRNA database for SSU&LSU at: https://www.arb-silva.de/).
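The step codes above can be validated before a run with a small sketch like the following. The function expand_steps is hypothetical and not part of ORFik. It assumes the steps are always emitted in the order tr, ph, rR, nc, tR, ge; that ordering is an illustration choice, not a statement about how the bash script processes the string.

```r
# Hypothetical helper (not part of ORFik): validate a steps string and
# expand "all" into the full dash-separated form.
expand_steps <- function(steps = "tr-ge") {
  known <- c("tr", "ph", "rR", "nc", "tR", "ge")
  if (identical(steps, "all"))
    return(paste(known, collapse = "-"))
  parts <- strsplit(steps, "-", fixed = TRUE)[[1]]
  unknown <- setdiff(parts, known)
  if (length(unknown) > 0)
    stop("unknown step(s): ", paste(unknown, collapse = ", "))
  # Assumption: keep the requested steps in the listed order
  paste(known[known %in% parts], collapse = "-")
}

expand_steps("all")       # "tr-ph-rR-nc-tR-ge"
expand_steps("tr-rR-ge")  # "tr-rR-ge"
```

Step codes are case sensitive (rR and tR are different steps), so a typo such as "tr-rr-ge" is rejected here instead of being passed on.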

adapter.sequence

character, default: "auto" (auto-detect the adapter). Auto-detection is not very reliable for Ribo-seq, so for Ribo-seq you should specify the adapter yourself, else alignment will most likely fail! Manually assigned adapters look like: "ATCTCGTATGCCGTCTTCTGCTTG" or "AAAAAAAAAAAAA".

min.length

15, minimum length of reads to pass filter.

trim.front

0, default: trim 0 bases from the 5' end. For Ribo-seq use 0. Ignored if tr (trim) is not one of the steps in "steps".

alignment.type

default: "Local": standard local alignment with soft-clipping allowed, "EndToEnd" (global): force end-to-end read alignment, does not soft-clip.

max.cpus

integer, default: min(90, detectCores() - 1), number of threads to use. Default is minimum of 90 and maximum cores - 1
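The default expression can be reproduced outside the function to see how many threads would be requested on a given machine. The sketch below is an illustration only: safe_max_cpus is a hypothetical helper, not part of ORFik. It adds two guards that the default expression does not have, because parallel::detectCores() can return NA on some platforms and the default evaluates to 0 on a single-core machine.

```r
# Hypothetical helper (not part of ORFik): same cap as the default,
# but never below 1 and tolerant of detectCores() returning NA.
safe_max_cpus <- function(cores = parallel::detectCores(), cap = 90) {
  if (is.na(cores)) return(1)
  max(1, min(cap, cores - 1))
}

safe_max_cpus()     # value depends on the machine
safe_max_cpus(8)    # 7
safe_max_cpus(200)  # 90
```

The result can be passed as max.cpus, for example on a shared server where leaving one core free is not enough and a lower cap is wanted.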

wait

a logical (not NA) indicating whether the R interpreter should wait for the command to finish, or run it asynchronously. This will be ignored (and the interpreter will always wait) if intern = TRUE. When running the command asynchronously, no output will be displayed on the Rgui console in Windows (it will be dropped, instead).

include.subfolders

"n" (no). If "y", do a recursive search downwards for fasta/fastq files.
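To preview which files a flat search and a recursive search would pick up, a base R sketch like this can be used. The helper find_read_files and its extension pattern are assumptions made for illustration; the actual file discovery is done by the ORFik bash script and may differ.

```r
# Hypothetical helper (not part of ORFik): list read files in a folder,
# optionally descending into subfolders.
find_read_files <- function(input.dir, include.subfolders = "n") {
  list.files(input.dir,
             pattern = "\\.(fastq|fq|fasta|fa)(\\.gz)?$",
             recursive = identical(include.subfolders, "y"),
             full.names = TRUE)
}

# Build a throwaway folder with one nested read file and one unrelated file
demo <- tempfile("star_demo")
dir.create(file.path(demo, "batch2"), recursive = TRUE)
invisible(file.create(file.path(demo, "a.fastq.gz"),
                      file.path(demo, "batch2", "b.fq"),
                      file.path(demo, "notes.txt")))

basename(find_read_files(demo))        # top-level read file only
basename(find_read_files(demo, "y"))   # also the file in batch2
```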

script.folder

location of the STAR folder alignment script, default internal ORFik file. You can change it and give your own if you need special alignments.

script.single

location of STAR single file alignment script, default internal ORFik file. You can change it and give your own if you need special alignments.

Details

Can only run on unix systems (Linux and Mac), and requires a minimum of 30GB memory for genomes like human, rat, zebrafish etc. The trimmer used is fastp (the fastest I could find), which works on Mac and Linux. If you want to use your own trimmer, set input.dir to the location of the trimmed files from your program.

Value

output.dir, can be used as input in ORFik::create.experiment

See Also

Other STAR: STAR.align.single(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), getGenomeAndAnnotation(), install.fastp()

Examples

# Use your own paths for annotation or the ORFik way

## use ORFik way:
output.dir <- "/Bio_data/references/Human"
# arguments <- getGenomeAndAnnotation("Homo sapiens", output.dir)
# index <- STAR.index(arguments, output.dir)
# STAR.align.folder("data/raw_data/human_rna_seq", "data/processed/human_rna_seq",
#                    index, paired.end = "no")
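For Ribo-seq, the argument descriptions above suggest adding the rR step and giving the adapter explicitly. A hedged sketch of such a call follows. All paths are hypothetical placeholders, the adapter is the example sequence from the argument description and must be replaced by the one used in your library preparation, and the index is assumed to have been built with the rRNA step included. The call itself is guarded so that nothing runs unless ORFik is installed and the input folder exists.

```r
# Arguments for a single-end Ribo-seq run; every path is a placeholder.
ribo_args <- list(
  input.dir        = "data/raw_data/human_ribo_seq",
  output.dir       = "data/processed/human_ribo_seq",
  index.dir        = "/Bio_data/references/Human/STAR_index/",  # as returned by STAR.index
  paired.end       = "no",
  steps            = "tr-rR-ge",                   # trim, rRNA depletion, genome
  adapter.sequence = "ATCTCGTATGCCGTCTTCTGCTTG",   # example adapter, replace with yours
  trim.front       = 0
)

# Only attempt the alignment when the package and the data are available
if (requireNamespace("ORFik", quietly = TRUE) && dir.exists(ribo_args$input.dir)) {
  aligned.dir <- do.call(ORFik::STAR.align.folder, ribo_args)
}
```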

[Package ORFik version 1.8.6 Index]