generic_long_pipeline {FLAMES}R Documentation

Generic FLAMES pipeline

Description

Generic implementation of the flames pipeline. Used for both bulk reads and single cell reads.

Usage

generic_long_pipeline(
  annot,
  fastq,
  in_bam,
  outdir,
  genome_fa,
  minimap2_dir,
  downsample_ratio,
  config_file,
  do_genome_align,
  do_isoform_id = TRUE,
  do_read_realign,
  do_transcript_quanti,
  gen_raw_isoform,
  has_UMI,
  MAX_DIST,
  MAX_TS_DIST,
  MAX_SPLICE_MATCH_DIST,
  min_fl_exon_len,
  Max_site_per_splice,
  Min_sup_cnt,
  Min_cnt_pct,
  Min_sup_pct,
  strand_specific,
  remove_incomp_reads,
  use_junctions,
  no_flank,
  use_annotation,
  min_tr_coverage,
  min_read_coverage
)

Arguments

annot

The file path to gene annotations file in gff3 format

fastq

The file path to input fastq file

in_bam

optional BAM file which replaces fastq directory argument. This skips the genome alignment and realignment steps

outdir

The path to directory to store all output files.

genome_fa

The file path to genome fasta file.

minimap2_dir

Path to the directory containing minimap2, if it is not in PATH. Only required if either or both of do_genome_align and do_read_realign are TRUE.

downsample_ratio

Integer; downsampling ratio if performing downsampling analysis.

config_file

File path to the JSON configuration file. If specified, config_file overrides all configuration parameters

do_genome_align

Boolean; specifies whether to run the genome alignment step. TRUE is recommended

do_isoform_id

Boolean; specifies whether to run the isoform identification step. TRUE is recommended

do_read_realign

Boolean; specifies whether to run the read realignment step. TRUE is recommended

do_transcript_quanti

Boolean; specifies whether to run the transcript quantification step. TRUE is recommended

gen_raw_isoform

Boolean; specifies whether a gff3 should be generated containing the raw isoform information in the isoform identification step

has_UMI

Boolean; specifies if the data contains UMI.

MAX_DIST

Real; maximum distance allowed when merging splicing sites in isoform consensus clustering.

MAX_TS_DIST

Real; maximum distance allowed when merging transcript start/end position in isoform consensus clustering.

MAX_SPLICE_MATCH_DIST

Real; maximum distance allowed when merging splice site called from the data and the reference annotation.

min_fl_exon_len

Real; minimum length for the first exon outside the gene body in reference annotation. This is to correct the alignment artifact

Max_site_per_splice

Real; maximum transcript start/end site combinations allowed per splice chain

Min_sup_cnt

Real; minimum number of read support an isoform. Decreasing this number will significantly increase the number of isoform detected.

Min_cnt_pct

Real; minimum percentage of count for an isoform relative to total count for the same gene.

Min_sup_pct

Real; minimum percentage of count for an splice chain that support a given transcript start/end site combination.

strand_specific

1, -1 or 0. 1 indicates if reads are in the same strand as mRNA, -1 indicates reads are reverse complemented, 0 indicates reads are not strand specific.

remove_incomp_reads

Real; determines the strength of truncated isoform filtering. Larger number means more stringent filtering.

use_junctions

Boolean; determiens whether to use known splice junctions to help correct the alignment results

no_flank

Boolean; passed to minimap2 for synthetic spike-in data. Refer to Minimap2 document for more details

use_annotation

Boolean; specifies whether to use reference to help annotate known isoforms

min_tr_coverage

Real; minimum percentage of isoform coverage for a read to be aligned to that isoform

min_read_coverage

Real; minimum percentage of read coverage for a read to be uniquely aligned to that isoform

Value

This generic function returns a named list containing the output file names of the provided output files in the given 'outdir' directory. These files are loaded into R in either a SummarizedExperiment or SingleCellExperiment object by the callers to this function, 'sc_long_pipeline()' and 'bulk_long_pipeline()' respectively.


[Package FLAMES version 0.99.31 Index]