An introduction to the bambu package using NanoporeRNASeq data


NanoporeRNASeq contains RNA-Seq data from the K562 and MCF7 cell lines, generated by the SG-NEx project. Each cell line has three replicates: one direct RNA sequencing run and two cDNA sequencing runs. The files contain reads aligned to chromosome 22 (1:25409234) of the human genome (GRCh38).

Accessing NanoporeRNASeq data

Load the NanoporeRNASeq package
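
The code for this step is not shown here. A minimal sketch, assuming the package has already been installed from Bioconductor:

```r
# attach the data package so its datasets become available
library("NanoporeRNASeq")
```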


List the samples
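
The listing that follows is the printed sample table. A minimal sketch of how it can be produced, assuming the packaged table is named `SGNexSamples` (the name is an assumption and does not appear elsewhere in this text):

```r
library("NanoporeRNASeq")
# SGNexSamples is the assumed name of the sample metadata shipped with the package
data("SGNexSamples")
# print one row per sample, with platform, cell line, and protocol
SGNexSamples
```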

##> DataFrame with 6 rows and 6 columns
##>                sample_id    Platform    cellLine    protocol cancer_type
##>              <character> <character> <character> <character> <character>
##> 1 K562_directcDNA_repl..      MinION        K562  directcDNA   Leukocyte
##> 2 K562_directcDNA_repl..     GridION        K562  directcDNA   Leukocyte
##> 3 K562_directRNA_repli..     GridION        K562   directRNA   Leukocyte
##> 4 MCF7_directcDNA_repl..      MinION        MCF7  directcDNA      Breast
##> 5 MCF7_directcDNA_repl..     GridION        MCF7  directcDNA      Breast
##> 6 MCF7_directRNA_repli..     GridION        MCF7   directRNA      Breast
##>                fileNames
##>              <character>
##> 1 NanoporeRNASeq/versi..
##> 2 NanoporeRNASeq/versi..
##> 3 NanoporeRNASeq/versi..
##> 4 NanoporeRNASeq/versi..
##> 5 NanoporeRNASeq/versi..
##> 6 NanoporeRNASeq/versi..

List the available BAM files

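library(ExperimentHub)
NanoporeData <- query(ExperimentHub(), c("NanoporeRNA", "GRCh38", "Bam"))
# EH3813 is assumed here as the sixth BAM file (six samples are listed above)
bamFiles <- Rsamtools::BamFileList(NanoporeData[["EH3808"]], NanoporeData[["EH3809"]], 
    NanoporeData[["EH3810"]], NanoporeData[["EH3811"]], NanoporeData[["EH3812"]], 
    NanoporeData[["EH3813"]])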

Get the annotation GRangesList
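
The annotation object `HsChr22BambuAnnotation` is used later in this text. A minimal sketch of loading and printing it, assuming it ships as a dataset of the NanoporeRNASeq package and can be loaded with `data()`:

```r
library("NanoporeRNASeq")
# load the chromosome 22 annotation (one GRanges of exons per transcript)
data("HsChr22BambuAnnotation")
HsChr22BambuAnnotation
```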

##> GRangesList object of length 1500:
##> $ENST00000043402
##> GRanges object with 2 ranges and 2 metadata columns:
##>       seqnames            ranges strand | exon_rank exon_endRank
##>          <Rle>         <IRanges>  <Rle> | <integer>    <integer>
##>   [1]       22 20241415-20243110      - |         2            1
##>   [2]       22 20268071-20268531      - |         1            2
##>   -------
##>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
##> $ENST00000086933
##> GRanges object with 3 ranges and 2 metadata columns:
##>       seqnames            ranges strand | exon_rank exon_endRank
##>          <Rle>         <IRanges>  <Rle> | <integer>    <integer>
##>   [1]       22 19148576-19149095      - |         3            1
##>   [2]       22 19149663-19149916      - |         2            2
##>   [3]       22 19150025-19150283      - |         1            3
##>   -------
##>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
##> $ENST00000155674
##> GRanges object with 8 ranges and 2 metadata columns:
##>       seqnames            ranges strand | exon_rank exon_endRank
##>          <Rle>         <IRanges>  <Rle> | <integer>    <integer>
##>   [1]       22 17137511-17138357      - |         8            1
##>   [2]       22 17138550-17138738      - |         7            2
##>   [3]       22 17141059-17141233      - |         6            3
##>   [4]       22 17143098-17143131      - |         5            4
##>   [5]       22 17145024-17145117      - |         4            5
##>   [6]       22 17148448-17148560      - |         3            6
##>   [7]       22 17149542-17149745      - |         2            7
##>   [8]       22 17165209-17165287      - |         1            8
##>   -------
##>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
##> ...
##> <1497 more elements>

Visualizing gene of interest from a single bam file

We can visualize a single sample at one transcript of interest, ENST00000215832 (MAPK1).

library(ggbio)
# exons of the transcript of interest
range <- HsChr22BambuAnnotation$ENST00000215832
# plot annotation track
tx <- autoplot(range, aes(type = model, col = strand), group.selfish = TRUE)
# plot coverage track
coverage <- autoplot(bamFiles[[1]], aes(col = coverage), which = range)

# merge the tracks into one plot
tracks(annotation = tx, coverage = coverage, heights = c(1, 3)) + theme_minimal()

Running Bambu with NanoporeRNASeq data

Load the bambu package
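
The code for this step is not shown here. A minimal sketch, assuming bambu has already been installed from Bioconductor:

```r
# attach bambu so that bambu() and plotBambu() are available
library(bambu)
```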


Run bambu

Applying bambu to bamFiles

se <- bambu(reads = bamFiles, annotations = HsChr22BambuAnnotation, genome = "BSgenome.Hsapiens.NCBI.GRCh38")

bambu returns a SummarizedExperiment object

##> class: RangedSummarizedExperiment 
##> dim: 1930 6 
##> metadata(0):
##> assays(2): counts CPM
##> rownames(1930): tx.1 tx.2 ... ENST00000641933 ENST00000641967
##> rowData names(4): TXNAME GENEID eqClass newTxClass
##> colnames(6): 6b19395d7ffc_3844 6b194debc183_3846 ... 6b19512ade6e_3852
##>   6b19364b6b8e_3854
##> colData names(1): name
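
The summary above lists two assays (`counts` and `CPM`) and row-level metadata such as `TXNAME` and `GENEID`. The sketch below shows how such an object is typically inspected with the standard SummarizedExperiment accessors. It uses a small toy object with made-up values (`toy`, `sampleA`, `sampleB`, `g1`, and `g2` are all hypothetical) rather than the real `se`. The CPM values are computed with the conventional counts-per-million formula, which is only assumed to match what bambu reports.

```r
library(SummarizedExperiment)

# toy stand-in for the object returned by bambu(); all values are made up
counts <- matrix(c(10, 0, 5, 20, 3, 7), nrow = 3,
    dimnames = list(c("tx.1", "tx.2", "tx.3"), c("sampleA", "sampleB")))
# conventional counts per million: scale each column to sum to 1e6
cpm <- t(t(counts) / colSums(counts)) * 1e6
toy <- SummarizedExperiment(
    assays = list(counts = counts, CPM = cpm),
    rowData = DataFrame(TXNAME = rownames(counts),
        GENEID = c("g1", "g1", "g2"), row.names = rownames(counts)))

assays(toy)$counts   # transcript-by-sample count matrix
assays(toy)$CPM      # the same estimates on the CPM scale
rowData(toy)         # per-transcript metadata
colnames(toy)        # sample names
```

With the real object, the same accessors would be called on `se` in place of `toy`.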

Visualizing gene examples

We can visualize the annotated and novel isoforms identified in this gene example using plot functions from bambu

plotBambu(se, type = "annotation", gene_id = "ENSG00000099968")

##> [[1]]
##> TableGrob (3 x 1) "arrange": 3 grobs
##>   z     cells    name                grob
##> 1 1 (2-2,1-1) arrange      gtable[layout]
##> 2 2 (3-3,1-1) arrange      gtable[layout]
##> 3 3 (1-1,1-1) arrange text[GRID.text.248]
