NanoporeRNASeq contains RNA-Seq data from the K562 and MCF7 cell lines that were generated by the SG-NEx project (https://github.com/GoekeLab/sg-nex-data). Each of these cell line has three replicates, with 1 direct RNA sequencing data and 2 cDNA sequencing data. The files contains reads aligned to the human genome (Grch38) chromosome 22 (1:25409234).
data("SGNexSamples")
SGNexSamples
##> DataFrame with 6 rows and 6 columns
##> sample_id Platform cellLine protocol cancer_type
##> <character> <character> <character> <character> <character>
##> 1 K562_directcDNA_repl.. MinION K562 directcDNA Leukocyte
##> 2 K562_directcDNA_repl.. GridION K562 directcDNA Leukocyte
##> 3 K562_directRNA_repli.. GridION K562 directRNA Leukocyte
##> 4 MCF7_directcDNA_repl.. MinION MCF7 directcDNA Breast
##> 5 MCF7_directcDNA_repl.. GridION MCF7 directcDNA Breast
##> 6 MCF7_directRNA_repli.. GridION MCF7 directRNA Breast
##> fileNames
##> <character>
##> 1 NanoporeRNASeq/versi..
##> 2 NanoporeRNASeq/versi..
##> 3 NanoporeRNASeq/versi..
##> 4 NanoporeRNASeq/versi..
##> 5 NanoporeRNASeq/versi..
##> 6 NanoporeRNASeq/versi..
data("HsChr22BambuAnnotation")
HsChr22BambuAnnotation
##> GRangesList object of length 1500:
##> $ENST00000043402
##> GRanges object with 2 ranges and 2 metadata columns:
##> seqnames ranges strand | exon_rank exon_endRank
##> <Rle> <IRanges> <Rle> | <integer> <integer>
##> [1] 22 20241415-20243110 - | 2 1
##> [2] 22 20268071-20268531 - | 1 2
##> -------
##> seqinfo: 1 sequence from an unspecified genome; no seqlengths
##>
##> $ENST00000086933
##> GRanges object with 3 ranges and 2 metadata columns:
##> seqnames ranges strand | exon_rank exon_endRank
##> <Rle> <IRanges> <Rle> | <integer> <integer>
##> [1] 22 19148576-19149095 - | 3 1
##> [2] 22 19149663-19149916 - | 2 2
##> [3] 22 19150025-19150283 - | 1 3
##> -------
##> seqinfo: 1 sequence from an unspecified genome; no seqlengths
##>
##> $ENST00000155674
##> GRanges object with 8 ranges and 2 metadata columns:
##> seqnames ranges strand | exon_rank exon_endRank
##> <Rle> <IRanges> <Rle> | <integer> <integer>
##> [1] 22 17137511-17138357 - | 8 1
##> [2] 22 17138550-17138738 - | 7 2
##> [3] 22 17141059-17141233 - | 6 3
##> [4] 22 17143098-17143131 - | 5 4
##> [5] 22 17145024-17145117 - | 4 5
##> [6] 22 17148448-17148560 - | 3 6
##> [7] 22 17149542-17149745 - | 2 7
##> [8] 22 17165209-17165287 - | 1 8
##> -------
##> seqinfo: 1 sequence from an unspecified genome; no seqlengths
##>
##> ...
##> <1497 more elements>
We can visualize the one sample for a single gene ENST00000215832 (MAPK1)
library(ggbio)
range <- HsChr22BambuAnnotation$ENST00000215832
# plot mismatch track
library(BSgenome.Hsapiens.NCBI.GRCh38)
# plot annotation track
tx <- autoplot(range, aes(type = model, col = strand), group.selfish = TRUE)
# plot coverage track
coverage <- autoplot(bamFiles[[1]], aes(col = coverage), which = range)
# merge the tracks into one plot
tracks(annotation = tx, coverage = coverage, heights = c(1, 3)) + theme_minimal()
Applying bambu to bamFiles
se <- bambu(reads = bamFiles, annotations = HsChr22BambuAnnotation, genome = "BSgenome.Hsapiens.NCBI.GRCh38")
bambu returns a SummarizedExperiment object
se
##> class: RangedSummarizedExperiment
##> dim: 1930 6
##> metadata(0):
##> assays(2): counts CPM
##> rownames(1930): tx.1 tx.2 ... ENST00000641933 ENST00000641967
##> rowData names(4): TXNAME GENEID eqClass newTxClass
##> colnames(6): 6b19395d7ffc_3844 6b194debc183_3846 ... 6b19512ade6e_3852
##> 6b19364b6b8e_3854
##> colData names(1): name
We can visualize the annotated and novel isoforms identified in this gene example using plot functions from bambu
##> [[1]]
##> TableGrob (3 x 1) "arrange": 3 grobs
##> z cells name grob
##> 1 1 (2-2,1-1) arrange gtable[layout]
##> 2 2 (3-3,1-1) arrange gtable[layout]
##> 3 3 (1-1,1-1) arrange text[GRID.text.248]
sessionInfo()
##> R version 4.0.3 (2020-10-10)
##> Platform: x86_64-pc-linux-gnu (64-bit)
##> Running under: Ubuntu 18.04.5 LTS
##>
##> Matrix products: default
##> BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
##> LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##>
##> locale:
##> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
##> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
##> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
##> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
##> [9] LC_ADDRESS=C LC_TELEPHONE=C
##> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##>
##> attached base packages:
##> [1] stats4 parallel stats graphics grDevices utils datasets
##> [8] methods base
##>
##> other attached packages:
##> [1] bambu_1.0.0
##> [2] SummarizedExperiment_1.20.0
##> [3] Biobase_2.50.0
##> [4] MatrixGenerics_1.2.0
##> [5] matrixStats_0.57.0
##> [6] BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000
##> [7] BSgenome_1.58.0
##> [8] rtracklayer_1.50.0
##> [9] ggbio_1.38.0
##> [10] ggplot2_3.3.2
##> [11] Rsamtools_2.6.0
##> [12] Biostrings_2.58.0
##> [13] XVector_0.30.0
##> [14] GenomicRanges_1.42.0
##> [15] GenomeInfoDb_1.26.0
##> [16] IRanges_2.24.0
##> [17] S4Vectors_0.28.0
##> [18] NanoporeRNASeq_1.0.0
##> [19] ExperimentHub_1.16.0
##> [20] AnnotationHub_2.22.0
##> [21] BiocFileCache_1.14.0
##> [22] dbplyr_1.4.4
##> [23] BiocGenerics_0.36.0
##>
##> loaded via a namespace (and not attached):
##> [1] colorspace_1.4-1 ellipsis_0.3.1
##> [3] biovizBase_1.38.0 htmlTable_2.1.0
##> [5] base64enc_0.1-3 dichromat_2.0-0
##> [7] rstudioapi_0.11 farver_2.0.3
##> [9] bit64_4.0.5 interactiveDisplayBase_1.28.0
##> [11] AnnotationDbi_1.52.0 xml2_1.3.2
##> [13] codetools_0.2-16 splines_4.0.3
##> [15] knitr_1.30 Formula_1.2-4
##> [17] cluster_2.1.0 png_0.1-7
##> [19] graph_1.68.0 shiny_1.5.0
##> [21] BiocManager_1.30.10 compiler_4.0.3
##> [23] httr_1.4.2 backports_1.1.10
##> [25] assertthat_0.2.1 Matrix_1.2-18
##> [27] fastmap_1.0.1 lazyeval_0.2.2
##> [29] later_1.1.0.1 formatR_1.7
##> [31] htmltools_0.5.0 prettyunits_1.1.1
##> [33] tools_4.0.3 gtable_0.3.0
##> [35] glue_1.4.2 GenomeInfoDbData_1.2.4
##> [37] reshape2_1.4.4 dplyr_1.0.2
##> [39] rappdirs_0.3.1 Rcpp_1.0.5
##> [41] vctrs_0.3.4 iterators_1.0.13
##> [43] xfun_0.18 stringr_1.4.0
##> [45] mime_0.9 lifecycle_0.2.0
##> [47] ensembldb_2.14.0 XML_3.99-0.5
##> [49] zlibbioc_1.36.0 scales_1.1.1
##> [51] VariantAnnotation_1.36.0 ProtGenerics_1.22.0
##> [53] hms_0.5.3 promises_1.1.1
##> [55] RBGL_1.66.0 AnnotationFilter_1.14.0
##> [57] RColorBrewer_1.1-2 yaml_2.2.1
##> [59] curl_4.3 memoise_1.1.0
##> [61] gridExtra_2.3 biomaRt_2.46.0
##> [63] rpart_4.1-15 reshape_0.8.8
##> [65] latticeExtra_0.6-29 stringi_1.5.3
##> [67] RSQLite_2.2.1 BiocVersion_3.12.0
##> [69] foreach_1.5.1 checkmate_2.0.0
##> [71] GenomicFeatures_1.42.0 BiocParallel_1.24.0
##> [73] shape_1.4.5 rlang_0.4.8
##> [75] pkgconfig_2.0.3 bitops_1.0-6
##> [77] evaluate_0.14 lattice_0.20-41
##> [79] purrr_0.3.4 labeling_0.4.2
##> [81] GenomicAlignments_1.26.0 htmlwidgets_1.5.2
##> [83] bit_4.0.4 tidyselect_1.1.0
##> [85] GGally_2.0.0 plyr_1.8.6
##> [87] magrittr_1.5 R6_2.5.0
##> [89] generics_0.0.2 Hmisc_4.4-1
##> [91] DelayedArray_0.16.0 DBI_1.1.0
##> [93] pillar_1.4.6 foreign_0.8-80
##> [95] withr_2.3.0 survival_3.2-7
##> [97] RCurl_1.98-1.2 nnet_7.3-14
##> [99] tibble_3.0.4 crayon_1.3.4
##> [101] OrganismDbi_1.32.0 rmarkdown_2.5
##> [103] jpeg_0.1-8.1 progress_1.2.2
##> [105] grid_4.0.3 data.table_1.13.2
##> [107] blob_1.2.1 digest_0.6.27
##> [109] xtable_1.8-4 httpuv_1.5.4
##> [111] glmnet_4.0-2 openssl_1.4.3
##> [113] munsell_0.5.0 askpass_1.1