Organism.dplyr creates an on-disk SQLite database that holds data for an organism, combining the identifier mappings of an ‘org’ package (e.g., org.Hs.eg.db) with the genome coordinate functionality of a ‘TxDb’ package (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene). It aims to provide an integrated presentation of identifiers and genomic coordinates. A src_organism object is created to point to the database.
The src_organism object is an extension of src_sql and src_sqlite from dplyr, so it inherits all dplyr methods. It also implements the select() interface from AnnotationDbi and the genomic coordinate extractors from GenomicFeatures.
The src_organism() constructor creates an on-disk SQLite database file with data from a given ‘TxDb’ package and the corresponding ‘org’ package. When dbpath is given, the file is created at that path; otherwise a temporary file is created.
library(Organism.dplyr)
Running src_organism() without a given path saves the SQLite file to tempdir():
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
Alternatively, you can provide an explicit path where the SQLite file should be saved, and re-use the database at a later date.
path <- "path/to/my.sqlite"
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene", path)
supportedOrganisms() provides a list of supported organisms with their corresponding ‘org’ and ‘TxDb’ packages.
supportedOrganisms()
## # A tibble: 21 x 3
## organism OrgDb TxDb
## <chr> <chr> <chr>
## 1 Bos taurus org.Bt.eg.db TxDb.Btaurus.UCSC.bosTau8.refGene
## 2 Caenorhabditis elegans org.Ce.eg.db TxDb.Celegans.UCSC.ce11.refGene
## 3 Caenorhabditis elegans org.Ce.eg.db TxDb.Celegans.UCSC.ce6.ensGene
## 4 Canis familiaris org.Cf.eg.db TxDb.Cfamiliaris.UCSC.canFam3.refGene
## 5 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm3.ensGene
## 6 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm6.ensGene
## 7 Danio rerio org.Dr.eg.db TxDb.Drerio.UCSC.danRer10.refGene
## 8 Gallus gallus org.Gg.eg.db TxDb.Ggallus.UCSC.galGal4.refGene
## 9 Homo sapiens org.Hs.eg.db TxDb.Hsapiens.UCSC.hg18.knownGene
## 10 Homo sapiens org.Hs.eg.db TxDb.Hsapiens.UCSC.hg19.knownGene
## # … with 11 more rows
Alternatively, src_ucsc() creates the SQLite database from an organism name, genome and id. The organism name (either the scientific name or the common name) must be provided; if genome and/or id are not provided, the most recent ‘TxDb’ package is used.
src <- src_ucsc("human", path)
An existing on-disk SQLite file can be accessed without recreating the database. A version of the database created with TxDb.Hsapiens.UCSC.hg38.knownGene, with just 50 Entrez gene identifiers, is distributed with the Organism.dplyr package:
src <- src_organism(dbpath = hg38light())
src
## src: sqlite 3.33.0 [/tmp/RtmpzmwPb0/Rinst35685ec978c/Organism.dplyr/extdata/light.hg38.knownGene.sqlite]
## tbls: id, id_accession, id_go, id_go_all, id_omim_pm, id_protein,
## id_transcript, ranges_cds, ranges_exon, ranges_gene, ranges_tx
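As a minimal sketch of re-using a database file, the light database can be copied to another location and reopened by path. The variable names `my_path` and `src_copy` are hypothetical, and `tempfile()` is used only so the sketch runs anywhere; a real workflow would use a persistent path.

```r
library(Organism.dplyr)

# Hypothetical location for the database file (illustration only).
my_path <- tempfile(fileext = ".sqlite")

# Copy the light database shipped with the package to that location.
file.copy(hg38light(), my_path)

# Reopen the existing file by path; the database is not recreated.
src_copy <- src_organism(dbpath = my_path)
src_tbls(src_copy)
```

The same `src_organism(dbpath = ...)` call should apply to a file written earlier by `src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene", path)`.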
All methods from package dplyr can be used for a src_organism object.
Look at all available tables.
src_tbls(src)
## [1] "id_accession" "id_transcript" "id" "id_omim_pm"
## [5] "id_protein" "id_go" "id_go_all" "ranges_gene"
## [9] "ranges_tx" "ranges_exon" "ranges_cds"
Look at data from one specific table.
tbl(src, "id")
## # Source: table<id> [?? x 6]
## # Database: sqlite 3.33.0 []
## entrez map ensembl symbol genename alias
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 19q13.43 ENSG00000121410 A1BG alpha-1-B glycoprotein A1B
## 2 1 19q13.43 ENSG00000121410 A1BG alpha-1-B glycoprotein ABG
## 3 1 19q13.43 ENSG00000121410 A1BG alpha-1-B glycoprotein GAB
## 4 1 19q13.43 ENSG00000121410 A1BG alpha-1-B glycoprotein HYST2477
## 5 1 19q13.43 ENSG00000121410 A1BG alpha-1-B glycoprotein A1BG
## 6 10 8p22 ENSG00000156006 NAT2 N-acetyltransferase 2 AAC2
## 7 10 8p22 ENSG00000156006 NAT2 N-acetyltransferase 2 NAT-2
## 8 10 8p22 ENSG00000156006 NAT2 N-acetyltransferase 2 PNAT
## 9 10 8p22 ENSG00000156006 NAT2 N-acetyltransferase 2 NAT2
## 10 100 20q13.12 ENSG00000196839 ADA adenosine deaminase ADA
## # … with more rows
Look at fields of one table.
colnames(tbl(src, "id"))
## [1] "entrez" "map" "ensembl" "symbol" "genename" "alias"
Below are some examples of querying tables using dplyr.
Gene symbols starting with “SNORD” (the notation SNORD% is from SQL, with % representing a wild-card match to any string):
tbl(src, "id") %>%
filter(symbol %like% "SNORD%") %>%
dplyr::select(entrez, map, ensembl, symbol) %>%
distinct() %>% arrange(symbol) %>% collect()
## # A tibble: 8 x 4
## entrez map ensembl symbol
## <chr> <chr> <chr> <chr>
## 1 100033413 15q11.2 ENSG00000207063 SNORD116-1
## 2 100033414 15q11.2 ENSG00000207001 SNORD116-2
## 3 100033415 15q11.2 ENSG00000207014 SNORD116-3
## 4 100033416 15q11.2 ENSG00000275529 SNORD116-4
## 5 100033417 15q11.2 ENSG00000207191 SNORD116-5
## 6 100033418 15q11.2 ENSG00000207442 SNORD116-6
## 7 100033419 15q11.2 ENSG00000207133 SNORD116-7
## 8 100033420 15q11.2 ENSG00000207093 SNORD116-8
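To see how the `%like%` filter reaches the database, the generated SQL can be printed with `show_query()` from dplyr. This is a sketch against the light database; the object names `snord` and `res` are chosen here for illustration.

```r
library(Organism.dplyr)
src <- src_organism(dbpath = hg38light())

# Build the query lazily; nothing is fetched from the database yet.
snord <- tbl(src, "id") %>%
    filter(symbol %like% "SNORD%") %>%
    dplyr::select(entrez, symbol) %>%
    distinct()

# Print the SQL that dbplyr generates; %like% becomes an SQL LIKE clause.
show_query(snord)

# Fetch the rows and check the prefix on the R side.
res <- collect(snord)
all(startsWith(toupper(res$symbol), "SNORD"))
```

The check uses `toupper()` because SQLite's LIKE is case-insensitive for ASCII characters by default, so the SQL match is slightly looser than `startsWith()`.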
inner_join(tbl(src, "id"), tbl(src, "id_go")) %>%
filter(symbol == "ADA") %>%
dplyr::select(entrez, ensembl, symbol, go, evidence, ontology)
## Joining, by = "entrez"
## # Source: lazy query [?? x 6]
## # Database: sqlite 3.33.0 []
## entrez ensembl symbol go evidence ontology
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 100 ENSG00000196839 ADA GO:0001666 IDA BP
## 2 100 ENSG00000196839 ADA GO:0001821 IEA BP
## 3 100 ENSG00000196839 ADA GO:0001829 IEA BP
## 4 100 ENSG00000196839 ADA GO:0001883 IEA MF
## 5 100 ENSG00000196839 ADA GO:0001889 IEA BP
## 6 100 ENSG00000196839 ADA GO:0001890 IEA BP
## 7 100 ENSG00000196839 ADA GO:0002314 IEA BP
## 8 100 ENSG00000196839 ADA GO:0002636 IEA BP
## 9 100 ENSG00000196839 ADA GO:0002686 IEA BP
## 10 100 ENSG00000196839 ADA GO:0002906 IEA BP
## # … with more rows
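The "Joining, by" message reports the column that dplyr picked for the join. A sketch that names the key explicitly, which silences the message and makes the join condition visible in the code (the object name `ada_go` is an assumption for illustration):

```r
library(Organism.dplyr)
src <- src_organism(dbpath = hg38light())

# Join on the shared "entrez" column explicitly instead of relying on
# dplyr's automatic detection of common columns.
ada_go <- inner_join(tbl(src, "id"), tbl(src, "id_go"), by = "entrez") %>%
    filter(symbol == "ADA") %>%
    dplyr::select(entrez, symbol, go, evidence, ontology) %>%
    distinct() %>%
    collect()

head(ada_go)
```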
txcount <- inner_join(tbl(src, "id"), tbl(src, "ranges_tx")) %>%
dplyr::select(symbol, tx_id) %>%
group_by(symbol) %>%
summarize(count = n()) %>%
dplyr::select(symbol, count) %>%
arrange(desc(count)) %>%
collect(n=Inf)
## Joining, by = "entrez"
txcount
## # A tibble: 18 x 2
## symbol count
## <chr> <int>
## 1 AKT3 396
## 2 NAALAD2 200
## 3 A1BG 40
## 4 MED6 39
## 5 CDH2 25
## 6 NR2E3 18
## 7 ADA 9
## 8 LINC02584 8
## 9 NAT2 8
## 10 SNORD116-2 4
## 11 SNORD116-3 4
## 12 SNORD116-5 4
## 13 SNORD116-1 3
## 14 POU5F1P5 2
## 15 SNORD116-4 2
## 16 SNORD116-8 2
## 17 DUXB 1
## 18 ZBTB11-AS1 1
library(ggplot2)
ggplot(txcount, aes(x = symbol, y = count)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ggtitle("Transcript count") +
labs(x = "Symbol") +
labs(y = "Count")
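Note that the `id` table holds one row per alias, so joining it to `ranges_tx` repeats each transcript once per alias, and `n()` counts those repeated rows. If the number of distinct transcripts per symbol is what is wanted, a sketch using `n_distinct()` might look as follows; the names `tx_distinct` and `n_tx` are assumptions made for this example.

```r
library(Organism.dplyr)
src <- src_organism(dbpath = hg38light())

# Count each transcript once per symbol, regardless of how many alias
# rows the id table holds for that gene.
tx_distinct <- inner_join(tbl(src, "id"), tbl(src, "ranges_tx"),
                          by = "entrez") %>%
    group_by(symbol) %>%
    summarize(n_tx = n_distinct(tx_id)) %>%
    arrange(desc(n_tx)) %>%
    collect(n = Inf)

tx_distinct
```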
inner_join(tbl(src, "id"), tbl(src, "ranges_gene")) %>%
filter(symbol %in% c("ADA", "NAT2")) %>%
dplyr::select(gene_chrom, gene_start, gene_end, gene_strand,
symbol, map) %>%
collect() %>% GenomicRanges::GRanges()
## Joining, by = "entrez"
## GRanges object with 5 ranges and 2 metadata columns:
## seqnames ranges strand | symbol map
## <Rle> <IRanges> <Rle> | <character> <character>
## [1] chr8 18391282-18401218 + | NAT2 8p22
## [2] chr8 18391282-18401218 + | NAT2 8p22
## [3] chr8 18391282-18401218 + | NAT2 8p22
## [4] chr8 18391282-18401218 + | NAT2 8p22
## [5] chr20 44619522-44652233 - | ADA 20q13.12
## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
The methods select(), keytypes(), keys(), columns() and mapIds() from AnnotationDbi are implemented for src_organism objects.
Use keytypes() to discover which keytypes can be passed to the keytype argument of select() or keys().
keytypes(src)
## [1] "accnum" "alias" "cds_chrom" "cds_end" "cds_id"
## [6] "cds_name" "cds_start" "cds_strand" "ensembl" "ensemblprot"
## [11] "ensembltrans" "entrez" "enzyme" "evidence" "evidenceall"
## [16] "exon_chrom" "exon_end" "exon_id" "exon_name" "exon_rank"
## [21] "exon_start" "exon_strand" "gene_chrom" "gene_end" "gene_start"
## [26] "gene_strand" "genename" "go" "goall" "ipi"
## [31] "map" "omim" "ontology" "ontologyall" "pfam"
## [36] "pmid" "prosite" "refseq" "symbol" "tx_chrom"
## [41] "tx_end" "tx_id" "tx_name" "tx_start" "tx_strand"
## [46] "tx_type" "unigene" "uniprot"
Use columns() to discover which kinds of data can be returned for the src_organism object.
columns(src)
## [1] "accnum" "alias" "cds_chrom" "cds_end" "cds_id"
## [6] "cds_name" "cds_start" "cds_strand" "ensembl" "ensemblprot"
## [11] "ensembltrans" "entrez" "enzyme" "evidence" "evidenceall"
## [16] "exon_chrom" "exon_end" "exon_id" "exon_name" "exon_rank"
## [21] "exon_start" "exon_strand" "gene_chrom" "gene_end" "gene_start"
## [26] "gene_strand" "genename" "go" "goall" "ipi"
## [31] "map" "omim" "ontology" "ontologyall" "pfam"
## [36] "pmid" "prosite" "refseq" "symbol" "tx_chrom"
## [41] "tx_end" "tx_id" "tx_name" "tx_start" "tx_strand"
## [46] "tx_type" "unigene" "uniprot"
keys() returns keys for the src_organism object. By default it returns the primary keys for the database; when the keytype argument is used, it returns the keys of that keytype.
Keys of entrez
head(keys(src))
## [1] "1" "10" "100" "1000" "10000" "100008586"
Keys of symbol
head(keys(src, "symbol"))
## [1] "A1BG" "NAT2" "ADA" "CDH2" "AKT3" "GAGE12F"
select() retrieves data for the selected keys, columns and keytype arguments. If the requested columns have multiple matches for the keys, select_tbl() returns a tibble with one row for each possible match, and select() returns the result as a data frame.
keytype <- "symbol"
keys <- c("ADA", "NAT2")
columns <- c("entrez", "tx_id", "tx_name","exon_id")
select_tbl(src, keys, columns, keytype)
## Joining, by = "entrez"
## # Source: lazy query [?? x 5]
## # Database: sqlite 3.33.0 []
## symbol entrez tx_id tx_name exon_id
## <chr> <chr> <int> <chr> <int>
## 1 NAT2 10 92729 ENST00000286479.4 259633
## 2 NAT2 10 92729 ENST00000286479.4 259635
## 3 NAT2 10 92730 ENST00000520116.1 259634
## 4 NAT2 10 92730 ENST00000520116.1 259636
## 5 NAT2 10 92729 ENST00000286479.4 259633
## 6 NAT2 10 92729 ENST00000286479.4 259635
## 7 NAT2 10 92730 ENST00000520116.1 259634
## 8 NAT2 10 92730 ENST00000520116.1 259636
## 9 NAT2 10 92729 ENST00000286479.4 259633
## 10 NAT2 10 92729 ENST00000286479.4 259635
## # … with more rows
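To contrast the two return types, the sketch below runs the same query through both functions. Calling `select()` through the `AnnotationDbi::` namespace is a choice made here to avoid ambiguity with `dplyr::select()`; the names `tb` and `df` are illustrative.

```r
library(Organism.dplyr)
src <- src_organism(dbpath = hg38light())

keys <- c("ADA", "NAT2")
columns <- c("entrez", "tx_id", "tx_name")
keytype <- "symbol"

# Lazy table: rows stay in the database until collect() is called.
tb <- select_tbl(src, keys, columns, keytype)

# Data frame: the AnnotationDbi-style interface returns rows in memory.
df <- AnnotationDbi::select(src, keys, columns, keytype)

class(tb)
class(df)
```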
mapIds() gets the mapped ids (column) for a set of keys of a particular keytype. The result is usually returned as a named character vector.
mapIds(src, keys, column = "tx_name", keytype)
## Joining, by = "entrez"
## ADA NAT2
## "ENST00000372874.9" "ENST00000286479.4"
mapIds(src, keys, column = "tx_name", keytype, multiVals="CharacterList")
## Joining, by = "entrez"
## CharacterList of length 2
## [["ADA"]] ENST00000372874.9 ENST00000464097.5 ... ENST00000545776.5
## [["NAT2"]] ENST00000286479.4 ENST00000520116.1 ... ENST00000520116.1
Eleven genomic coordinate extractor methods are available in this package: transcripts(), exons(), cds(), genes(), promoters(), transcriptsBy(), exonsBy(), cdsBy(), intronsByTranscript(), fiveUTRsByTranscript() and threeUTRsByTranscript(). Data can be returned in two forms, for instance as a tibble (transcripts_tbl()) or as a GRanges or GRangesList (transcripts()).
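A short sketch of the two forms for one extractor, using the light database and a filter on the symbol “ADA” (the object names `ada`, `tx_tbl` and `tx_gr` are chosen for illustration):

```r
library(Organism.dplyr)
src <- src_organism(dbpath = hg38light())

ada <- SymbolFilter("ADA")

# Tibble-style result: a lazy table backed by the SQLite database.
tx_tbl <- transcripts_tbl(src, filter = ada)

# Genomic ranges result: the same transcripts as a GRanges object.
tx_gr <- transcripts(src, filter = ada)

class(tx_gr)
```

The tibble form suits further dplyr manipulation, while the GRanges form plugs into range-based operations such as overlaps.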
Filters can be applied to all extractor functions. The output can be restricted by an AnnotationFilter, an AnnotationFilterList, or an expression that can be translated into an AnnotationFilterList. Valid filters can be retrieved with supportedFilters(src).
supportedFilters(src)
## filter field
## 1 AccnumFilter accnum
## 2 AliasFilter alias
## 3 CdsChromFilter cds_chrom
## 45 CdsEndFilter cds_end
## 43 CdsIdFilter cds_id
## 4 CdsNameFilter cds_name
## 44 CdsStartFilter cds_start
## 5 CdsStrandFilter cds_strand
## 6 EnsemblFilter ensembl
## 7 EnsemblprotFilter ensemblprot
## 8 EnsembltransFilter ensembltrans
## 9 EntrezFilter entrez
## 10 EnzymeFilter enzyme
## 11 EvidenceFilter evidence
## 12 EvidenceallFilter evidenceall
## 13 ExonChromFilter exon_chrom
## 48 ExonEndFilter exon_end
## 46 ExonIdFilter exon_id
## 14 ExonNameFilter exon_name
## 49 ExonRankFilter exon_rank
## 47 ExonStartFilter exon_start
## 15 ExonStrandFilter exon_strand
## 17 FlybaseCgFilter flybase_cg
## 16 FlybaseFilter flybase
## 18 FlybaseProtFilter flybase_prot
## 55 GRangesFilter granges
## 19 GeneChromFilter gene_chrom
## 51 GeneEndFilter gene_end
## 50 GeneStartFilter gene_start
## 20 GeneStrandFilter gene_strand
## 21 GenenameFilter genename
## 22 GoFilter go
## 23 GoallFilter goall
## 24 IpiFilter ipi
## 25 MapFilter map
## 26 MgiFilter mgi
## 27 OmimFilter omim
## 28 OntologyFilter ontology
## 29 OntologyallFilter ontologyall
## 30 PfamFilter pfam
## 31 PmidFilter pmid
## 32 PrositeFilter prosite
## 33 RefseqFilter refseq
## 34 SymbolFilter symbol
## 35 TxChromFilter tx_chrom
## 54 TxEndFilter tx_end
## 52 TxIdFilter tx_id
## 36 TxNameFilter tx_name
## 53 TxStartFilter tx_start
## 37 TxStrandFilter tx_strand
## 38 TxTypeFilter tx_type
## 39 UnigeneFilter unigene
## 40 UniprotFilter uniprot
## 41 WormbaseFilter wormbase
## 42 ZfinFilter zfin
All filters take two parameters, value and condition. The condition can be one of “==”, “!=”, “startsWith”, “endsWith”, “>”, “<”, “>=” and “<=”; the default condition is “==”.
EnsemblFilter("ENSG00000196839")
## class: EnsemblFilter
## condition: ==
## value: ENSG00000196839
SymbolFilter("SNORD", "startsWith")
## class: SymbolFilter
## condition: startsWith
## value: SNORD
The following illustrates several ways of inputting filters to an extractor function.
smbl <- SymbolFilter("SNORD", "startsWith")
transcripts_tbl(src, filter=smbl)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr15 25051477 25051571 + 155668 ENST00000384335.1 SNORD116-1
## 2 chr15 25054210 25054304 + 155672 ENST00000384274.1 SNORD116-2
## 3 chr15 25056860 25056954 + 155673 ENST00000384287.1 SNORD116-3
## 4 chr15 25059538 25059633 + 155674 ENST00000384733.1 SNORD116-4
## 5 chr15 25062333 25062427 + 155676 ENST00000384462.1 SNORD116-5
## 6 chr15 25065026 25065121 + 155679 ENST00000384711.1 SNORD116-2
## 7 chr15 25067788 25067882 + 155687 ENST00000384404.1 SNORD116-5
## 8 chr15 25070432 25070526 + 155689 ENST00000384365.1 SNORD116-8
## 9 chr15 25073107 25073201 + 155690 ENST00000384000.1 SNORD116-3
filter <- AnnotationFilterList(smbl)
transcripts_tbl(src, filter=filter)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr15 25051477 25051571 + 155668 ENST00000384335.1 SNORD116-1
## 2 chr15 25054210 25054304 + 155672 ENST00000384274.1 SNORD116-2
## 3 chr15 25056860 25056954 + 155673 ENST00000384287.1 SNORD116-3
## 4 chr15 25059538 25059633 + 155674 ENST00000384733.1 SNORD116-4
## 5 chr15 25062333 25062427 + 155676 ENST00000384462.1 SNORD116-5
## 6 chr15 25065026 25065121 + 155679 ENST00000384711.1 SNORD116-2
## 7 chr15 25067788 25067882 + 155687 ENST00000384404.1 SNORD116-5
## 8 chr15 25070432 25070526 + 155689 ENST00000384365.1 SNORD116-8
## 9 chr15 25073107 25073201 + 155690 ENST00000384000.1 SNORD116-3
transcripts_tbl(src, filter=~smbl)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr15 25051477 25051571 + 155668 ENST00000384335.1 SNORD116-1
## 2 chr15 25054210 25054304 + 155672 ENST00000384274.1 SNORD116-2
## 3 chr15 25056860 25056954 + 155673 ENST00000384287.1 SNORD116-3
## 4 chr15 25059538 25059633 + 155674 ENST00000384733.1 SNORD116-4
## 5 chr15 25062333 25062427 + 155676 ENST00000384462.1 SNORD116-5
## 6 chr15 25065026 25065121 + 155679 ENST00000384711.1 SNORD116-2
## 7 chr15 25067788 25067882 + 155687 ENST00000384404.1 SNORD116-5
## 8 chr15 25070432 25070526 + 155689 ENST00000384365.1 SNORD116-8
## 9 chr15 25073107 25073201 + 155690 ENST00000384000.1 SNORD116-3
A GRangesFilter() can also be used as a filter for the methods that return their results as GRanges or GRangesList.
gr <- GRangesFilter(GenomicRanges::GRanges("chr15:25062333-25065121"))
transcripts(src, filter=~smbl & gr)
## GRanges object with 2 ranges and 3 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr15 25062333-25062427 + | 155676 ENST00000384462.1
## [2] chr15 25065026-25065121 + | 155679 ENST00000384711.1
## symbol
## <character>
## [1] SNORD116-5
## [2] SNORD116-2
## -------
## seqinfo: 595 sequences (1 circular) from hg38 genome
Filters in extractor functions support &, |, ! (negation), and () (grouping).
Transcript coordinates of gene symbol equal to “ADA” and transcript start position less than 44619810:
transcripts_tbl(src, filter = AnnotationFilterList(
SymbolFilter("ADA"),
TxStartFilter(44619810,"<"),
logicOp="&")
)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr20 44619522 44626491 - 210819 ENST00000464097.5 ADA
## 2 chr20 44619522 44651699 - 210820 ENST00000372874.9 ADA
## 3 chr20 44619579 44651681 - 210821 ENST00000536532.5 ADA
## Equivalent to
transcripts_tbl(src, filter = ~symbol == "ADA" & tx_start < 44619810)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr20 44619522 44626491 - 210819 ENST00000464097.5 ADA
## 2 chr20 44619522 44651699 - 210820 ENST00000372874.9 ADA
## 3 chr20 44619579 44651681 - 210821 ENST00000536532.5 ADA
Transcript coordinates of gene symbol equal to “ADA” or transcript end position equal to 243843236:
txend <- TxEndFilter(243843236, '==')
transcripts_tbl(src, filter = ~symbol == "ADA" | txend)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr1 243488233 243843236 - 19678 ENST00000336199.9 AKT3
## 2 chr20 44619522 44626491 - 210819 ENST00000464097.5 ADA
## 3 chr20 44619522 44651699 - 210820 ENST00000372874.9 ADA
## 4 chr20 44619579 44651681 - 210821 ENST00000536532.5 ADA
## 5 chr20 44619810 44651691 - 210822 ENST00000492931.5 ADA
## 6 chr20 44619810 44651691 - 210823 ENST00000537820.1 ADA
## 7 chr20 44619810 44651691 - 210824 ENST00000539235.5 ADA
## 8 chr20 44626323 44651661 - 210825 ENST00000545776.5 ADA
## 9 chr20 44626517 44652114 - 210826 ENST00000536076.1 ADA
## 10 chr20 44636071 44652233 - 210827 ENST00000535573.1 ADA
Using negation to find transcript coordinates of gene symbol equal to “ADA” and transcript start position NOT less than 44618910:
transcripts_tbl(src, filter = ~symbol == "ADA" & !tx_start < 44618910)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr20 44619522 44626491 - 210819 ENST00000464097.5 ADA
## 2 chr20 44619522 44651699 - 210820 ENST00000372874.9 ADA
## 3 chr20 44619579 44651681 - 210821 ENST00000536532.5 ADA
## 4 chr20 44619810 44651691 - 210822 ENST00000492931.5 ADA
## 5 chr20 44619810 44651691 - 210823 ENST00000537820.1 ADA
## 6 chr20 44619810 44651691 - 210824 ENST00000539235.5 ADA
## 7 chr20 44626323 44651661 - 210825 ENST00000545776.5 ADA
## 8 chr20 44626517 44652114 - 210826 ENST00000536076.1 ADA
## 9 chr20 44636071 44652233 - 210827 ENST00000535573.1 ADA
Using grouping to combine several conditions in a longer filter statement:
transcripts_tbl(src,
filter = ~(symbol == 'ADA' & !(tx_start >= 44619810 | tx_end < 44651742)) |
(smbl & !tx_end > 25056954)
)
## # Source: lazy query [?? x 7]
## # Database: sqlite 3.33.0 []
## # Ordered by: tx_id
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr15 25051477 25051571 + 155668 ENST00000384335.1 SNORD116-1
## 2 chr15 25054210 25054304 + 155672 ENST00000384274.1 SNORD116-2
## 3 chr15 25056860 25056954 + 155673 ENST00000384287.1 SNORD116-3
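The grouped query returns no ADA rows, which can be checked in plain R. The sketch below copies the `tx_start` and `tx_end` values of the ADA transcripts listed earlier in this section and evaluates the negated group directly; the variable names `grouped` and `expanded` are illustrative.

```r
# tx_start and tx_end of the nine ADA transcripts shown earlier.
tx_start <- c(44619522, 44619522, 44619579, 44619810, 44619810,
              44619810, 44626323, 44626517, 44636071)
tx_end <- c(44626491, 44651699, 44651681, 44651691, 44651691,
            44651691, 44651661, 44652114, 44652233)

# The negated group from the filter expression.
grouped <- !(tx_start >= 44619810 | tx_end < 44651742)

# The same condition with the negation distributed over the group.
expanded <- tx_start < 44619810 & tx_end >= 44651742

identical(grouped, expanded)  # TRUE
any(grouped)                  # FALSE: no ADA transcript passes
```

The three transcripts that start before 44619810 all end before 44651742, so the ADA branch of the filter is empty and only the SNORD rows remain.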
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggplot2_3.3.2 GenomicRanges_1.42.0 GenomeInfoDb_1.26.0
## [4] IRanges_2.24.0 S4Vectors_0.28.0 BiocGenerics_0.36.0
## [7] Organism.dplyr_1.18.0 AnnotationFilter_1.14.0 dplyr_1.0.2
## [10] BiocStyle_2.18.0
##
## loaded via a namespace (and not attached):
## [1] MatrixGenerics_1.2.0 Biobase_2.50.0
## [3] httr_1.4.2 bit64_4.0.5
## [5] assertthat_0.2.1 askpass_1.1
## [7] BiocManager_1.30.10 BiocFileCache_1.14.0
## [9] blob_1.2.1 GenomeInfoDbData_1.2.4
## [11] Rsamtools_2.6.0 yaml_2.2.1
## [13] progress_1.2.2 pillar_1.4.6
## [15] RSQLite_2.2.1 lattice_0.20-41
## [17] glue_1.4.2 digest_0.6.27
## [19] XVector_0.30.0 colorspace_1.4-1
## [21] htmltools_0.5.0 Matrix_1.2-18
## [23] XML_3.99-0.5 pkgconfig_2.0.3
## [25] biomaRt_2.46.0 magick_2.5.0
## [27] bookdown_0.21 zlibbioc_1.36.0
## [29] purrr_0.3.4 scales_1.1.1
## [31] BiocParallel_1.24.0 tibble_3.0.4
## [33] openssl_1.4.3 farver_2.0.3
## [35] generics_0.0.2 ellipsis_0.3.1
## [37] withr_2.3.0 SummarizedExperiment_1.20.0
## [39] GenomicFeatures_1.42.0 lazyeval_0.2.2
## [41] cli_2.1.0 magrittr_1.5
## [43] crayon_1.3.4 memoise_1.1.0
## [45] evaluate_0.14 fansi_0.4.1
## [47] xml2_1.3.2 tools_4.0.3
## [49] prettyunits_1.1.1 hms_0.5.3
## [51] lifecycle_0.2.0 matrixStats_0.57.0
## [53] stringr_1.4.0 munsell_0.5.0
## [55] DelayedArray_0.16.0 AnnotationDbi_1.52.0
## [57] Biostrings_2.58.0 compiler_4.0.3
## [59] rlang_0.4.8 grid_4.0.3
## [61] RCurl_1.98-1.2 rappdirs_0.3.1
## [63] labeling_0.4.2 bitops_1.0-6
## [65] rmarkdown_2.5 gtable_0.3.0
## [67] DBI_1.1.0 curl_4.3
## [69] R6_2.4.1 GenomicAlignments_1.26.0
## [71] knitr_1.30 rtracklayer_1.50.0
## [73] utf8_1.1.4 bit_4.0.4
## [75] stringi_1.5.3 Rcpp_1.0.5
## [77] vctrs_0.3.4 dbplyr_1.4.4
## [79] tidyselect_1.1.0 xfun_0.18