knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1 Introduction

This package contains unCOVERApp, a Shiny graphical application for the clinical assessment of sequence coverage. unCOVERApp allows:

2 Installation and example

Install the latest version of uncoverappLib using BiocManager.

uncoverappLib requires:


install.packages("BiocManager")
BiocManager::install("uncoverappLib")
library(uncoverappLib)

Alternatively, it can be installed from GitHub using:


#library(devtools)
#install_github("Manuelaio/uncoverappLib")
#library(uncoverappLib)

When uncoverappLib is loaded for the first time, the annotation files must be downloaded. The getAnnotationFiles() function downloads the annotation files from Zenodo and parses them using the uncoverappLib package. The function does not return an R object; instead, it stores the annotation files (sorted.bed.gz and sorted.bed.gz.tbi) in a cache and prints the cache path. The local cache is managed by the BiocFileCache Bioconductor package. It is sufficient to run getAnnotationFiles(verbose= TRUE) once after installing the uncoverappLib package, as shown below. Preprocessing can take a few minutes; therefore, when running the vignette, users can pass vignette= TRUE to download an example annotation file instead, as below.

library(uncoverappLib)
#> Warning: replacing previous import 'utils::findMatches' by
#> 'S4Vectors::findMatches' when loading 'AnnotationDbi'
#> 
#> 
#getAnnotationFiles(verbose= TRUE, vignette= TRUE)

Preprocessing can take a few minutes.

3 Input file

All unCOVERApp functionalities rely on a BED-style input file containing tab-separated genomic coordinates (chromosome, start position, end position), the coverage value, and the reference:alternate allele counts for each position. On the first page, Preprocessing, users can prepare the input file by specifying the genes to be examined and the BAM file(s) to be inspected. Users should be able to provide a text file listing the genes of interest and a text file listing the BAM file(s). The example gene list shipped with the package can be located as follows:

gene.list<- system.file("extdata", "mygene.txt", package = "uncoverappLib")

Type the following command to load our example:


bam_example <- system.file("extdata", "example_POLG.bam", package = "uncoverappLib")

print(bam_example)
#> [1] "/tmp/RtmplBqZYB/Rinst28866f1a51abdc/uncoverappLib/extdata/example_POLG.bam"

write.table(bam_example, file= "./bam.list", quote= FALSE, row.names = FALSE, 
            col.names = FALSE)

and launch the app with the run.uncoverapp(where="browser") command. After running run.uncoverapp(where="browser"), the Shiny app opens in your default browser. RStudio users can choose where to launch uncoverapp using the where option:

If the where option is not defined, uncoverapp will launch with the default option of R.

On the first page, Preprocessing, users can load mygene.txt in Load input file and bam.list in Load bam file(s) list. In general, a target BED file can also be used instead of gene names by selecting the Target Bed option in Choose the type of your input file. Users should also specify the reference genome in the Genome box and the chromosome notation of their BAM file(s) in the Chromosome Notation box: the number option refers to the 1, 2, …, X, M chromosome notation, while the chr option refers to the chr1, chr2, …, chrX, chrM notation. Users can also set the minimum mapping quality (MAPQ) and the minimum base quality (QUAL) in their respective boxes; the default value for both is 1. Users can download the Statistical_Summary report to obtain coverage metrics per gene (List of genes name) or per amplicon (Target Bed), according to the uploaded input file. The report summarizes the following information: mean, median, number of positions under 20x, and percentage of positions above 20x.
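The four metrics named in the report can be reproduced by hand for a single region. The following is a minimal sketch in base R, not the package's own implementation: the helper name summarize_coverage, the made-up coverage values, and the reading of "above 20x" as coverage of at least 20 are all assumptions made for illustration.

```r
# Hypothetical helper: summarize per-position coverage for one gene or amplicon.
# Assumption: "under 20x" means coverage < 20 and "above 20x" means coverage >= 20.
summarize_coverage <- function(coverage, threshold = 20) {
  stopifnot(is.numeric(coverage), length(coverage) > 0)
  n_under <- sum(coverage < threshold)
  data.frame(
    mean = mean(coverage),
    median = median(coverage),
    n_under = n_under,
    pct_above = 100 * (length(coverage) - n_under) / length(coverage)
  )
}

# Made-up coverage values for eight positions.
example_coverage <- c(68, 70, 73, 73, 74, 75, 12, 5)
coverage_summary <- summarize_coverage(example_coverage)
print(coverage_summary)
```

Changing the threshold argument shows how the same region would be reported under a stricter or looser coverage requirement.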

To run the example, choose the chr chromosome notation and the hg19 genome reference, and leave the minimum mapping and base qualities at their default settings, as shown in the following screenshot of the Preprocessing page:

Figure 1: Screenshot of Preprocessing page

unCOVERApp input file generation fails if incorrect gene names are specified. In that case, a table of the unrecognized gene name(s) is displayed. Below is a snippet of the unCOVERApp input file generated by the preprocessing step performed for the example:


chr15   89859516        89859516        68      A:68
chr15   89859517        89859517        70      T:70
chr15   89859518        89859518        73      A:2;G:71
chr15   89859519        89859519        73      A:73
chr15   89859520        89859520        74      C:74
chr15   89859521        89859521        75      C:1;T:74
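To inspect such a file outside the app, it can be read with base R. The sketch below rebuilds the six rows shown above as tab-separated text, splits the allele-count field, and checks that the counts add up to the reported coverage. The column names and the parse_counts helper are hypothetical, chosen only for this illustration.

```r
# The six example rows, in the tab-separated layout described above:
# chromosome, start, end, coverage, allele counts.
input_lines <- c(
  "chr15\t89859516\t89859516\t68\tA:68",
  "chr15\t89859517\t89859517\t70\tT:70",
  "chr15\t89859518\t89859518\t73\tA:2;G:71",
  "chr15\t89859519\t89859519\t73\tA:73",
  "chr15\t89859520\t89859520\t74\tC:74",
  "chr15\t89859521\t89859521\t75\tC:1;T:74"
)

# Column names are hypothetical; the file itself has no header row.
coverage_table <- read.table(
  text = input_lines, sep = "\t", header = FALSE,
  col.names = c("chromosome", "start", "end", "coverage", "counts"),
  stringsAsFactors = FALSE
)

# Split a string such as "A:2;G:71" into a named integer vector.
parse_counts <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], ":", fixed = TRUE)
  counts <- vapply(pairs, function(p) as.integer(p[2]), integer(1))
  names(counts) <- vapply(pairs, function(p) p[1], character(1))
  counts
}

allele_counts <- lapply(coverage_table$counts, parse_counts)

# Check that the allele counts add up to the reported coverage at each position.
counts_match <- vapply(allele_counts, sum, integer(1)) == coverage_table$coverage
print(counts_match)
```

For a file on disk, the text argument would be replaced by the file path.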

The preprocessing time depends on the size of the BAM file(s) and on the number of genes to investigate. In general, if many genes (e.g. > 50) are to be analyzed, we recommend running the buildInput function in the R console before launching the app. This function also returns a file with the .txt extension containing a statistical report for each gene/amplicon. Alternatively, other tools do a similar job and can be used to generate the unCOVERApp input file (for instance bedtools, samtools, or gatk). In this case, users can load the file directly on the Coverage Analysis page in the Select input file box.

Once pre-processing is done, users can move to the Coverage Analysis page and push the load prepared input file button.

Figure 2: Screenshot of Coverage Analysis page

To assess sequence coverage of the example, the following input parameters must be specified in the sidebar of the Coverage Analysis section

Other input sections, such as Chromosome, Transcript ID, START genomic position, END genomic position and Region coordinate, are filled in dynamically.

4 Output

unCOVERApp generates the following outputs:

Figure 3: Screenshot of output of UCSC gene table

Figure 4: Screenshot of output of Exon genomic coordinate positions from UCSC table

Figure 5: Screenshot of output of gene coverage

Figure 6: Zoom of exon 10

Figure 7: Example of uncovered positions annotated with dbNSFP

By clicking on the download button, users can save the table in spreadsheet format, with certain cells colored according to pre-specified thresholds for AF, CADD, MAP-CAP, SIFT, Polyphen2, ClinVar, OMIM ID, HGVSp, HGVSc and others.

On the Calculate maximum credible allele frequency page, users can set allele frequency cut-offs based on specific assumptions about the genetic architecture of the disease. If no cut-off is specified, variants with an allele frequency > 5 % are filtered out instead. More details are available here. Moreover, users may click on the "download" button and save the resulting table in spreadsheet format.
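The effect of an allele frequency cut-off can be illustrated on a small table. This is only a sketch in base R, not the app's code: the data frame, its column names, the filter_by_af helper, and the choice to keep positions with no reported allele frequency are assumptions made for the example.

```r
# Hypothetical annotated positions; values and column names are made up.
variants <- data.frame(
  position = c(89859516, 89859518, 89859521, 89859530),
  allele_frequency = c(0.0001, 0.12, NA, 0.05)
)

# Keep variants at or below the cut-off; the 5% default mirrors the text above.
# Assumption: variants without a reported allele frequency (NA) are retained.
filter_by_af <- function(x, cutoff = 0.05) {
  keep <- is.na(x$allele_frequency) | x$allele_frequency <= cutoff
  x[keep, , drop = FALSE]
}

default_filtered <- filter_by_af(variants)
strict_filtered <- filter_by_af(variants, cutoff = 0.001)
print(default_filtered)
print(strict_filtered)
```

A stricter cut-off, such as one derived from a maximum credible allele frequency, removes more of the common variants from the table.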

The Binomial distribution page returns the 95 % binomial probability distribution of the variant-supporting reads at the input genomic position (Genomic position). Users should define the expected allele fraction (the expected fraction of variant reads, i.e. the probability of success) and Variant reads (the minimum number of variant reads required by the user to support variant calling, i.e. the number of successes). The color of the comment changes according to the binomial proportion interval. If the estimated 95 % confidence interval includes or is higher than the user-defined Variant reads, the comment appears in blue; otherwise, if it is lower, the comment appears in red.
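The reasoning behind this page can be sketched with the binomial quantile function in base R. The code below is an illustration under stated assumptions, not the app's actual computation: the helper name binomial_check and its arguments are hypothetical, the read depth at the position is taken as the number of trials, and a central 95 % interval is used.

```r
# Hypothetical helper; argument names are assumptions made for this sketch.
binomial_check <- function(depth, allele_fraction, variant_reads, level = 0.95) {
  alpha <- 1 - level
  # Central interval for the number of variant-supporting reads at this depth.
  interval <- qbinom(c(alpha / 2, 1 - alpha / 2),
                     size = depth, prob = allele_fraction)
  # Blue when the interval includes or lies above the required number of
  # variant reads, red when the whole interval lies below it.
  color <- if (interval[2] >= variant_reads) "blue" else "red"
  list(lower = interval[1], upper = interval[2], color = color)
}

# Heterozygous variant (expected allele fraction 0.5), 10 variant reads required.
high_depth <- binomial_check(depth = 100, allele_fraction = 0.5, variant_reads = 10)
low_depth <- binomial_check(depth = 10, allele_fraction = 0.5, variant_reads = 10)
print(high_depth$color)
print(low_depth$color)
```

With the same expected allele fraction, the position covered by 10 reads cannot plausibly yield 10 variant reads, whereas the position covered by 100 reads can.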

5 Session information

#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] uncoverappLib_1.10.0 BiocStyle_2.28.0    
#> 
#> loaded via a namespace (and not attached):
#>   [1] later_1.3.0                             
#>   [2] BiocIO_1.10.0                           
#>   [3] bitops_1.0-7                            
#>   [4] filelock_1.0.2                          
#>   [5] tibble_3.2.1                            
#>   [6] graph_1.78.0                            
#>   [7] XML_3.99-0.14                           
#>   [8] rpart_4.1.19                            
#>   [9] lifecycle_1.0.3                         
#>  [10] Homo.sapiens_1.3.1                      
#>  [11] processx_3.8.1                          
#>  [12] lattice_0.21-8                          
#>  [13] ensembldb_2.24.0                        
#>  [14] OrganismDbi_1.42.0                      
#>  [15] backports_1.4.1                         
#>  [16] magrittr_2.0.3                          
#>  [17] openxlsx_4.2.5.2                        
#>  [18] Hmisc_5.0-1                             
#>  [19] sass_0.4.5                              
#>  [20] rmarkdown_2.21                          
#>  [21] jquerylib_0.1.4                         
#>  [22] yaml_2.3.7                              
#>  [23] rlist_0.4.6.2                           
#>  [24] shinyBS_0.61.1                          
#>  [25] httpuv_1.6.9                            
#>  [26] zip_2.3.0                               
#>  [27] Gviz_1.44.0                             
#>  [28] DBI_1.1.3                               
#>  [29] RColorBrewer_1.1-3                      
#>  [30] zlibbioc_1.46.0                         
#>  [31] GenomicRanges_1.52.0                    
#>  [32] AnnotationFilter_1.24.0                 
#>  [33] biovizBase_1.48.0                       
#>  [34] BiocGenerics_0.46.0                     
#>  [35] RCurl_1.98-1.12                         
#>  [36] nnet_7.3-18                             
#>  [37] VariantAnnotation_1.46.0                
#>  [38] rappdirs_0.3.3                          
#>  [39] GenomeInfoDbData_1.2.10                 
#>  [40] IRanges_2.34.0                          
#>  [41] S4Vectors_0.38.0                        
#>  [42] condformat_0.10.0                       
#>  [43] codetools_0.2-19                        
#>  [44] DelayedArray_0.26.0                     
#>  [45] DT_0.27                                 
#>  [46] xml2_1.3.3                              
#>  [47] tidyselect_1.2.0                        
#>  [48] shinyWidgets_0.7.6                      
#>  [49] matrixStats_0.63.0                      
#>  [50] stats4_4.3.0                            
#>  [51] BiocFileCache_2.8.0                     
#>  [52] base64enc_0.1-3                         
#>  [53] GenomicAlignments_1.36.0                
#>  [54] jsonlite_1.8.4                          
#>  [55] ellipsis_0.3.2                          
#>  [56] Formula_1.2-5                           
#>  [57] tools_4.3.0                             
#>  [58] progress_1.2.2                          
#>  [59] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
#>  [60] Rcpp_1.0.10                             
#>  [61] glue_1.6.2                              
#>  [62] gridExtra_2.3                           
#>  [63] xfun_0.39                               
#>  [64] MatrixGenerics_1.12.0                   
#>  [65] GenomeInfoDb_1.36.0                     
#>  [66] dplyr_1.1.2                             
#>  [67] BiocManager_1.30.20                     
#>  [68] fastmap_1.1.1                           
#>  [69] latticeExtra_0.6-30                     
#>  [70] fansi_1.0.4                             
#>  [71] shinyjs_2.1.0                           
#>  [72] digest_0.6.31                           
#>  [73] R6_2.5.1                                
#>  [74] mime_0.12                               
#>  [75] colorspace_2.1-0                        
#>  [76] GO.db_3.17.0                            
#>  [77] jpeg_0.1-10                             
#>  [78] dichromat_2.0-0.1                       
#>  [79] markdown_1.6                            
#>  [80] biomaRt_2.56.0                          
#>  [81] RSQLite_2.3.1                           
#>  [82] utf8_1.2.3                              
#>  [83] generics_0.1.3                          
#>  [84] data.table_1.14.8                       
#>  [85] rtracklayer_1.60.0                      
#>  [86] prettyunits_1.1.1                       
#>  [87] httr_1.4.5                              
#>  [88] htmlwidgets_1.6.2                       
#>  [89] pkgconfig_2.0.3                         
#>  [90] gtable_0.3.3                            
#>  [91] blob_1.2.4                              
#>  [92] XVector_0.40.0                          
#>  [93] htmltools_0.5.5                         
#>  [94] bookdown_0.33                           
#>  [95] RBGL_1.76.0                             
#>  [96] ProtGenerics_1.32.0                     
#>  [97] scales_1.2.1                            
#>  [98] Biobase_2.60.0                          
#>  [99] TxDb.Hsapiens.UCSC.hg38.knownGene_3.17.0
#> [100] png_0.1-8                               
#> [101] EnsDb.Hsapiens.v75_2.99.0               
#> [102] knitr_1.42                              
#> [103] rstudioapi_0.14                         
#> [104] rjson_0.2.21                            
#> [105] checkmate_2.1.0                         
#> [106] curl_5.0.0                              
#> [107] org.Hs.eg.db_3.17.0                     
#> [108] cachem_1.0.7                            
#> [109] stringr_1.5.0                           
#> [110] parallel_4.3.0                          
#> [111] shinycssloaders_1.0.0                   
#> [112] foreign_0.8-84                          
#> [113] AnnotationDbi_1.62.0                    
#> [114] restfulr_0.0.15                         
#> [115] pillar_1.9.0                            
#> [116] grid_4.3.0                              
#> [117] vctrs_0.6.2                             
#> [118] promises_1.2.0.1                        
#> [119] dbplyr_2.3.2                            
#> [120] EnsDb.Hsapiens.v86_2.99.0               
#> [121] xtable_1.8-4                            
#> [122] cluster_2.1.4                           
#> [123] htmlTable_2.4.1                         
#> [124] evaluate_0.20                           
#> [125] GenomicFeatures_1.52.0                  
#> [126] cli_3.6.1                               
#> [127] compiler_4.3.0                          
#> [128] Rsamtools_2.16.0                        
#> [129] rlang_1.1.0                             
#> [130] crayon_1.5.2                            
#> [131] interp_1.1-4                            
#> [132] ps_1.7.5                                
#> [133] stringi_1.7.12                          
#> [134] deldir_1.0-6                            
#> [135] BiocParallel_1.34.0                     
#> [136] munsell_0.5.0                           
#> [137] Biostrings_2.68.0                       
#> [138] lazyeval_0.2.2                          
#> [139] Matrix_1.5-4                            
#> [140] BSgenome_1.68.0                         
#> [141] hms_1.1.3                               
#> [142] bit64_4.0.5                             
#> [143] ggplot2_3.4.2                           
#> [144] KEGGREST_1.40.0                         
#> [145] shiny_1.7.4                             
#> [146] SummarizedExperiment_1.30.0             
#> [147] highr_0.10                              
#> [148] memoise_2.0.1                           
#> [149] bslib_0.4.2                             
#> [150] bit_4.0.5