bambu {bambu}R Documentation

long read isoform reconstruction and quantification

Description

This function takes bam file of genomic alignments and performs isoform recontruction and gene and transcript expression quantification. It also allows saving of read class files of alignments, extending provided annotations, and quantification based on extended annotations. When multiple samples are provided, extended annotations will be combined across samples to allow comparison.

Usage

bambu(
    reads = NULL,
    rcFile = NULL,
    rcOutDir = NULL,
    annotations = NULL,
    genome = NULL,
    stranded = FALSE,
    ncore = 1,
    yieldSize = NULL,
    opt.discovery = NULL,
    opt.em = NULL,
    discovery = TRUE,
    quant = TRUE,
    verbose = FALSE,
    lowMemory = FALSE
)

Arguments

reads

A string or a vector of strings specifying the paths of bam files for genomic alignments, or a BamFile object or a BamFileList object (see Rsamtools).

rcFile

A string or a vector of strings specifying the read class files that are saved during previous run of bambu.

rcOutDir

A string variable specifying the path to where read class files will be saved.

annotations

A TxDb object or A GRangesList object obtained by prepareAnnotations.

genome

A fasta file or a BSGenome object.

stranded

A boolean for strandedness, defaults to FALSE.

ncore

specifying number of cores used when parallel processing is used, defaults to 1.

yieldSize

see Rsamtools.

opt.discovery

A list of controlling parameters for isoform reconstruction process:

prefix

specifying prefix for new gene Ids (genePrefix.number), defaults to empty

remove.subsetTx

indicating whether filter to remove read classes which are a subset of known transcripts(), defaults to TRUE

min.readCount

specifying minimun read count to consider a read class valid in a sample, defaults to 2

min.readFractionByGene

specifying minimum relative read count per gene, highly expressed genes will have many high read count low relative abundance transcripts that can be filtered, defaults to 0.05

min.sampleNumber

specifying minimum sample number with minimum read count, defaults to 1

min.exonDistance

specifying minum distance to known transcript to be considered valid as new, defaults to 35bp

min.exonOverlap

specifying minimum number of bases shared with annotation to be assigned to the same gene id, defaults to 10bp

min.primarySecondaryDist

specifying the minimum number of distance threshold, defaults to 5bp

min.primarySecondaryDistStartEnd1

specifying the minimum number of distance threshold, used for extending annotation, defaults to 5bp

min.primarySecondaryDistStartEnd2

specifying the minimum number of distance threshold, used for estimating distance to annotation, defaults to 5bp

min.txScore.multiExon

specifying the minimum transcript level threshold for multi-exon transcripts during sample combining, defaults to 0

min.txScore.singleExon

specifying the minimum transcript level threshold for single-exon transcripts during sample combining, defaults to 1

max.txNDR

specifying the maximum NDR rate to novel transcript output from detected read classes, defaults to 0.1

opt.em

A list of controlling parameters for quantification algorithm estimation process:

maxiter

specifying maximum number of run iterations, defaults to 10000

degradationBias

correcting for degradation bias, defaults to TRUE

conv

specifying the covergence threshold control, defaults to 0.0001

minvalue

specifying the minvalue for convergence consideration, defaults to 0.00000001

discovery

A logical variable indicating whether annotations are to be extended for quantification.

quant

A logical variable indicating whether quantification will be performed

verbose

A logical variable indicating whether processing messages will be printed.

lowMemory

Read classes will be processed by chromosomes when lowMemory is specified. This option provides an efficient way to process big samples.

Details

Main function

Value

bambu will output different results depending on whether quant mode is on. By default, quant is set to TRUE, so bambu will generate a SummarizedExperiment object that contains the transcript expression estimates. Transcript expression estimates can be accessed by counts(), including the following variables

counts

expression estimates

CPM

sequencing depth normalized estimates

fullLengthCounts

estimates of read counts mapped as full length reads for each transcript

partialLengthCounts

estimates of read counts mapped as partial length reads for each transcript

uniqueCounts

counts of reads that are uniquely mapped to each transcript

theta

raw estimates

Output annotations that are usually the annotations with/without novel transcripts/genes added, depending on whether discovery mode is on can be accessed by rowRanges() Transcript to gene map can be accessed by rowData(), with eqClass that defining equivalent class for each transcript

In the case when quant is set to FALSE, i.e., only transcript discovery is performed, bambu will report the grangeslist of the extended annotations.

Examples

## =====================
test.bam <- system.file("extdata",
    "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam",
    package = "bambu")
fa.file <- system.file("extdata", 
    "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fa", 
    package = "bambu")
gr <- readRDS(system.file("extdata", 
    "annotationGranges_txdbGrch38_91_chr9_1_1000000.rds",
    package = "bambu"))
se <- bambu(reads = test.bam, annotations = gr, 
    genome = fa.file, discovery = TRUE, quant = TRUE)

[Package bambu version 1.99.2 Index]