run_coverageQC {InPAS} | R Documentation |
Calculate coverage over gene bodies and 3UTRs. This function is used for quality control of the coverage.The coverage rate can show the complexity of RNA-seq library.
run_coverageQC( sqlite_db, TxDb, edb, genome, cutoff_readsNum = 1, cutoff_expdGene_cvgRate = 0.1, cutoff_expdGene_sampleRate = 0.5, removeScaffolds = FALSE, BPPARAM = NULL, which = NULL, ... )
sqlite_db |
A path to the SQLite database for InPAS, i.e. the output of
|
TxDb |
An object of GenomicFeatures::TxDb |
edb |
An object of ensembldb::EnsDb |
genome |
An object of BSgenome::BSgenome |
cutoff_readsNum |
cutoff reads number. If the coverage in the location is greater than cutoff_readsNum,the location will be treated as covered by signal |
cutoff_expdGene_cvgRate |
cutoff_expdGene_cvgRate and cutoff_expdGene_sampleRate are the parameters used to calculate which gene is expressed in all input dataset. cutoff_expdGene_cvgRateset the cutoff value for the coverage rate of each gene; cutoff_expdGene_sampleRateset the cutoff value for ratio of numbers of expressed and all samples for each gene. for example, by default, cutoff_expdGene_cvgRate=0.1 and cutoff_expdGene_sampleRate=0.5,suppose there are 4 samples, for one gene, if the coverage rates by base are:0.05, 0.12, 0.2, 0.17, this gene will be count as expressed gene because mean(c(0.05,0.12, 0.2, 0.17) > cutoff_expdGene_cvgRate) > cutoff_expdGene_sampleRate if the coverage rates by base are: 0.05, 0.12, 0.07, 0.17, this gene will be count as un-expressed gene because mean(c(0.05, 0.12, 0.07, 0.17) > cutoff_expdGene_cvgRate)<= cutoff_expdGene_sampleRate |
cutoff_expdGene_sampleRate |
See cutoff_expdGene_cvgRate |
removeScaffolds |
A logical(1) vector, whether the scaffolds should be removed from the genome If you use a TxDb containing alternative scaffolds, you'd better to remove the scaffolds. |
BPPARAM |
an optional BiocParallel::BiocParallelParam instance determining the parallel back-end to be used during evaluation, or a list of BiocParallelParam instances, to be applied in sequence for nested calls to bplapply. It can be set to NULL or bpparam() |
which |
an object of GenomicRanges::GRanges or NULL. If it is not NULL, only the exons overlapping the given ranges are used. For fast data quality control, set which to Granges for one or a few large chromosomes. |
... |
Not used yet |
A data frame with colnames: gene.coverage.rate: coverage per base for all genes, expressed.gene.coverage.rate: coverage per base for expressed genes, UTR3.coverage.rate: coverage per base for all 3' UTRs,UTR3.expressed.gene.subset.coverage. rate: coverage per base for 3' UTRs of expressed genes. and rownames: the names of coverage
Jianhong Ou, Haibo Liu
if (interactive()) { library("BSgenome.Mmusculus.UCSC.mm10") library("TxDb.Mmusculus.UCSC.mm10.knownGene") library("EnsDb.Mmusculus.v79") genome <- BSgenome.Mmusculus.UCSC.mm10 TxDb <- TxDb.Mmusculus.UCSC.mm10.knownGene edb <- EnsDb.Mmusculus.v79 bedgraphs <- system.file("extdata",c("Baf3.extract.bedgraph", "UM15.extract.bedgraph"), package = "InPAS") tags <- c("Baf3", "UM15") metadata <- data.frame(tag = tags, condition = c("Baf3", "UM15"), bedgraph_file = bedgraphs) outdir = tempdir() write.table(metadata, file =file.path(outdir, "metadata.txt"), sep = "\t", quote = FALSE, row.names = FALSE) sqlite_db <- setup_sqlitedb(metadata = file.path(outdir, "metadata.txt"), outdir) coverage <- list() for (i in seq_along(bedgraphs)){ coverage[[tags[i]]] <- get_ssRleCov(bedgraph = bedgraphs[i], tag = tags[i], genome = genome, sqlite_db = sqlite_db, outdir = outdir, removeScaffolds = TRUE, BPPARAM = NULL) } coverage_files <- assemble_allCov(sqlite_db, outdir, genome, removeScaffolds = FALSE) run_coverageQC(sqlite_db, TxDb, edb, genome, removeScaffolds = TRUE, which = GRanges("chr6", ranges = IRanges(98013000, 140678000))) }