split_on_utr_and_add_feature {chimeraviz} | R Documentation
This function looks for ranges (exons) in the GRanges object within which the coding DNA sequence (CDS) starts or stops. Any such exon is split at the CDS boundary, and every range in the GRanges object is then tagged as either "protein_coding", "5utr", or "3utr". The returned GRanges object has these feature values set in mcols(gr)$feature.
split_on_utr_and_add_feature(gr)
gr: The GRanges object we want to split and tag with feature info.
An updated GRanges object with feature values set.
# Load fusion data and choose a fusion object:
defuseData <- system.file(
  "extdata",
  "defuse_833ke_results.filtered.tsv",
  package="chimeraviz")
fusions <- import_defuse(defuseData, "hg19", 1)
fusion <- get_fusion_by_id(fusions, 5267)

# Create edb object
edbSqliteFile <- system.file(
  "extdata",
  "Homo_sapiens.GRCh37.74.sqlite",
  package="chimeraviz")
edb <- ensembldb::EnsDb(edbSqliteFile)

# Get all exons for all transcripts in the genes in the fusion transcript
allTranscripts <- ensembldb::exonsBy(
  edb,
  filter = list(
    AnnotationFilter::GeneIdFilter(
      c(
        partner_gene_ensembl_id(upstream_partner_gene(fusion)),
        partner_gene_ensembl_id(downstream_partner_gene(fusion))))),
  columns = c(
    "gene_id",
    "gene_name",
    "tx_id",
    "tx_cds_seq_start",
    "tx_cds_seq_end",
    "exon_id"))

# Extract one of the GRanges objects
gr <- allTranscripts[[1]]

# Check how many ranges there are here
length(gr)
# Should be 9 ranges

# Split the ranges containing the cds start/stop positions and add feature
# values:
gr <- split_on_utr_and_add_feature(gr)

# Check the length again
length(gr)
# Should be 11 now, as the range containing the cds_start position and the
# range containing the cds_stop position have each been split into two
# separate ranges
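
The split-and-tag idea from the description can be illustrated without any Bioconductor dependencies. The following is a minimal base R sketch, not the chimeraviz implementation: split_and_tag is a hypothetical helper, exons are held in a plain data frame rather than a GRanges object, and it assumes a plus-strand transcript with 1-based inclusive coordinates and cds_start <= cds_end.

```r
# Hypothetical helper (not part of chimeraviz): split exons at the CDS
# boundaries and tag each resulting piece.
# Assumptions: plus strand, 1-based inclusive coordinates,
# cds_start <= cds_end.
split_and_tag <- function(exons, cds_start, cds_end) {
  pieces <- lapply(seq_len(nrow(exons)), function(i) {
    s <- exons$start[i]
    e <- exons$end[i]
    # Candidate pieces of this exon: before the CDS, inside it, after it
    starts  <- c(s, max(s, cds_start), max(s, cds_end + 1))
    ends    <- c(min(e, cds_start - 1), min(e, cds_end), e)
    feature <- c("5utr", "protein_coding", "3utr")
    # Drop candidates that are empty for this exon
    keep <- starts <= ends
    data.frame(
      start = starts[keep],
      end = ends[keep],
      feature = feature[keep])
  })
  do.call(rbind, pieces)
}

# Three made-up exons; the CDS starts inside the first and stops inside
# the last, so both of those exons are split in two
exons <- data.frame(start = c(100, 300, 420), end = c(200, 400, 500))
tagged <- split_and_tag(exons, cds_start = 150, cds_end = 450)
nrow(tagged)
# 5 pieces from 3 exons
```

As in the example above, where 9 ranges become 11, the number of pieces here exceeds the number of input exons by two, because only the exons containing a CDS boundary are split. A minus-strand transcript would need the "5utr" and "3utr" labels swapped, which this sketch does not handle.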