1 Introduction

Organism.dplyr creates an on-disk sqlite database that holds data for one organism, combining the identifier mappings of an ‘org’ package (e.g., org.Hs.eg.db) with the genome coordinates of a ‘TxDb’ package (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene). It aims to provide an integrated presentation of identifiers and genomic coordinates. A src_organism object is created to point to the database.

The src_organism object extends src_sql and src_sqlite from dplyr, and therefore inherits all dplyr methods. It also implements the select() interface from AnnotationDbi and the genomic coordinate extractors from GenomicFeatures.

2 Constructing a src_organism

2.1 Make sqlite database from ‘TxDb’ package

The src_organism() constructor creates an on-disk sqlite database file with data from a given ‘TxDb’ package and the corresponding ‘org’ package. When dbpath is given, the file is created at that path; otherwise a temporary file is created.

library(Organism.dplyr)

Running src_organism() without a path saves the sqlite file in tempdir():

src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

Alternatively, you can provide an explicit path where the sqlite file should be saved, and re-use the database at a later date.

path <- "path/to/my.sqlite"
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene", path)
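
Re-using a saved database can be sketched as follows. This is a minimal illustration, not a prescribed workflow: a temporary file stands in for a permanent path, and the small database returned by hg38light() (shipped with the package) is copied there as a stand-in for the expensive build step.

```r
library(Organism.dplyr)

# Assumption: a temporary file plays the role of a permanent location
# such as "path/to/my.sqlite".
path <- tempfile(fileext = ".sqlite")

# Stand-in for the build step: copy the light database distributed
# with Organism.dplyr to the chosen path.
file.copy(hg38light(), path)

# In a later session, open the existing file by path only;
# no 'TxDb' package name is needed because nothing is rebuilt.
src <- src_organism(dbpath = path)
src_tbls(src)
```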

supportedOrganisms() lists the supported organisms together with their corresponding ‘org’ and ‘TxDb’ packages.

supportedOrganisms()
## # A tibble: 21 x 3
##    organism                OrgDb        TxDb                                 
##    <chr>                   <chr>        <chr>                                
##  1 Bos taurus              org.Bt.eg.db TxDb.Btaurus.UCSC.bosTau8.refGene    
##  2 Caenorhabditis elegans  org.Ce.eg.db TxDb.Celegans.UCSC.ce11.refGene      
##  3 Caenorhabditis elegans  org.Ce.eg.db TxDb.Celegans.UCSC.ce6.ensGene       
##  4 Canis familiaris        org.Cf.eg.db TxDb.Cfamiliaris.UCSC.canFam3.refGene
##  5 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm3.ensGene  
##  6 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm6.ensGene  
##  7 Danio rerio             org.Dr.eg.db TxDb.Drerio.UCSC.danRer10.refGene    
##  8 Gallus gallus           org.Gg.eg.db TxDb.Ggallus.UCSC.galGal4.refGene    
##  9 Homo sapiens            org.Hs.eg.db TxDb.Hsapiens.UCSC.hg18.knownGene    
## 10 Homo sapiens            org.Hs.eg.db TxDb.Hsapiens.UCSC.hg19.knownGene    
## # … with 11 more rows
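
Because supportedOrganisms() returns an ordinary tibble, it can be narrowed with the usual dplyr verbs. The sketch below, which assumes the column names shown in the output above, lists the ‘TxDb’ packages available for human.

```r
library(Organism.dplyr)
library(dplyr)

# Keep only the rows for one organism; the column names
# (organism, OrgDb, TxDb) are those printed by supportedOrganisms().
human <- supportedOrganisms() %>%
    dplyr::filter(organism == "Homo sapiens")

# Candidate 'TxDb' package names to pass to src_organism()
human$TxDb
```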

2.2 Make sqlite database from organism name

The sqlite database can also be created with src_ucsc() from an organism name, optionally with a genome and an id. The organism name (either the organism name or its common name) must be provided. If genome and/or id are not provided, the most recent ‘TxDb’ package is used.

src <- src_ucsc("human", dbpath = path)

2.3 Access existing sqlite file

An existing on-disk sqlite file can be accessed without recreating the database. A version of the database created with TxDb.Hsapiens.UCSC.hg38.knownGene, restricted to just 50 Entrez gene identifiers, is distributed with the Organism.dplyr package.

src <- src_organism(dbpath = hg38light())
src
## src:  sqlite 3.29.0 [/tmp/RtmpTopsOK/Rinst563b76fc9bc5/Organism.dplyr/extdata/light.hg38.knownGene.sqlite]
## tbls: id, id_accession, id_go, id_go_all, id_omim_pm, id_protein,
##   id_transcript, ranges_cds, ranges_exon, ranges_gene, ranges_tx

3 The “dplyr” interface

All methods from package dplyr can be used for a src_organism object.

Look at all available tables.

src_tbls(src)
##  [1] "id_accession"  "id_transcript" "id"            "id_omim_pm"   
##  [5] "id_protein"    "id_go"         "id_go_all"     "ranges_gene"  
##  [9] "ranges_tx"     "ranges_exon"   "ranges_cds"

Look at data from one specific table.

tbl(src, "id")
## # Source:   table<id> [?? x 6]
## # Database: sqlite 3.29.0 []
##    entrez map      ensembl         symbol genename               alias   
##    <chr>  <chr>    <chr>           <chr>  <chr>                  <chr>   
##  1 1      19q13.43 ENSG00000121410 A1BG   alpha-1-B glycoprotein A1B     
##  2 1      19q13.43 ENSG00000121410 A1BG   alpha-1-B glycoprotein ABG     
##  3 1      19q13.43 ENSG00000121410 A1BG   alpha-1-B glycoprotein GAB     
##  4 1      19q13.43 ENSG00000121410 A1BG   alpha-1-B glycoprotein HYST2477
##  5 1      19q13.43 ENSG00000121410 A1BG   alpha-1-B glycoprotein A1BG    
##  6 10     8p22     ENSG00000156006 NAT2   N-acetyltransferase 2  AAC2    
##  7 10     8p22     ENSG00000156006 NAT2   N-acetyltransferase 2  NAT-2   
##  8 10     8p22     ENSG00000156006 NAT2   N-acetyltransferase 2  PNAT    
##  9 10     8p22     ENSG00000156006 NAT2   N-acetyltransferase 2  NAT2    
## 10 100    20q13.12 ENSG00000196839 ADA    adenosine deaminase    ADA     
## # … with more rows

Look at fields of one table.

colnames(tbl(src, "id"))
## [1] "entrez"   "map"      "ensembl"  "symbol"   "genename" "alias"
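
The same idea extends to every table at once. The following sketch combines src_tbls() and colnames() from above to build a named list of fields per table; the variable names are illustrative only.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

# Names of all tables in the database
tables <- src_tbls(src)

# For each table, record its column names; the list is named by table
fields <- lapply(setNames(tables, tables), function(name) {
    colnames(tbl(src, name))
})

# Fields available in the gene ranges table
fields[["ranges_gene"]]
```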

Below are some examples of querying tables using dplyr.

  1. Gene symbol starting with “SNORD” (the notation SNORD% is from SQL, with % representing a wild-card match to any string)
tbl(src, "id") %>%
    filter(symbol %like% "SNORD%") %>%
    dplyr::select(entrez, map, ensembl, symbol) %>%
    distinct() %>% arrange(symbol) %>% collect()
## # A tibble: 8 x 4
##   entrez    map     ensembl         symbol    
##   <chr>     <chr>   <chr>           <chr>     
## 1 100033413 15q11.2 ENSG00000207063 SNORD116-1
## 2 100033414 15q11.2 ENSG00000207001 SNORD116-2
## 3 100033415 15q11.2 ENSG00000207014 SNORD116-3
## 4 100033416 15q11.2 ENSG00000275529 SNORD116-4
## 5 100033417 15q11.2 ENSG00000207191 SNORD116-5
## 6 100033418 15q11.2 ENSG00000207442 SNORD116-6
## 7 100033419 15q11.2 ENSG00000207133 SNORD116-7
## 8 100033420 15q11.2 ENSG00000207093 SNORD116-8
  2. Gene ontology (GO) info for gene symbol “ADA”
inner_join(tbl(src, "id"), tbl(src, "id_go")) %>%
    filter(symbol == "ADA") %>%
    dplyr::select(entrez, ensembl, symbol, go, evidence, ontology)
## Joining, by = "entrez"
## # Source:   lazy query [?? x 6]
## # Database: sqlite 3.29.0 []
##    entrez ensembl         symbol go         evidence ontology
##    <chr>  <chr>           <chr>  <chr>      <chr>    <chr>   
##  1 100    ENSG00000196839 ADA    GO:0001666 IDA      BP      
##  2 100    ENSG00000196839 ADA    GO:0001821 IEA      BP      
##  3 100    ENSG00000196839 ADA    GO:0001829 IEA      BP      
##  4 100    ENSG00000196839 ADA    GO:0001883 IEA      MF      
##  5 100    ENSG00000196839 ADA    GO:0001889 IEA      BP      
##  6 100    ENSG00000196839 ADA    GO:0001890 IEA      BP      
##  7 100    ENSG00000196839 ADA    GO:0002314 IEA      BP      
##  8 100    ENSG00000196839 ADA    GO:0002636 IEA      BP      
##  9 100    ENSG00000196839 ADA    GO:0002686 IEA      BP      
## 10 100    ENSG00000196839 ADA    GO:0002906 IEA      BP      
## # … with more rows
  3. Gene transcript counts per gene symbol
txcount <- inner_join(tbl(src, "id"), tbl(src, "ranges_tx")) %>%
    dplyr::select(symbol, tx_id) %>%
    group_by(symbol) %>%
    summarize(count = n()) %>%
    dplyr::select(symbol, count) %>%
    arrange(desc(count)) %>%
    collect(n=Inf)
## Joining, by = "entrez"
txcount
## # A tibble: 18 x 2
##    symbol     count
##    <chr>      <int>
##  1 AKT3         396
##  2 NAALAD2      200
##  3 A1BG          40
##  4 MED6          39
##  5 CDH2          25
##  6 NR2E3         18
##  7 ADA            9
##  8 LINC02584      8
##  9 NAT2           8
## 10 SNORD116-2     4
## 11 SNORD116-3     4
## 12 SNORD116-5     4
## 13 SNORD116-1     3
## 14 POU5F1P5       2
## 15 SNORD116-4     2
## 16 SNORD116-8     2
## 17 DUXB           1
## 18 ZBTB11-AS1     1
library(ggplot2)
ggplot(txcount, aes(x = symbol, y = count)) + 
    geom_bar(stat="identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ggtitle("Transcript count") +
    labs(x = "Symbol") +
    labs(y = "Count")

  4. Gene coordinates of symbol “ADA” and “NAT2” as GRanges
inner_join(tbl(src, "id"), tbl(src, "ranges_gene")) %>%
    filter(symbol %in% c("ADA", "NAT2")) %>%
    dplyr::select(gene_chrom, gene_start, gene_end, gene_strand,
                  symbol, map) %>%
    collect() %>% GenomicRanges::GRanges()
## Joining, by = "entrez"
## GRanges object with 5 ranges and 2 metadata columns:
##       seqnames            ranges strand |      symbol         map
##          <Rle>         <IRanges>  <Rle> | <character> <character>
##   [1]     chr8 18391282-18401218      + |        NAT2        8p22
##   [2]     chr8 18391282-18401218      + |        NAT2        8p22
##   [3]     chr8 18391282-18401218      + |        NAT2        8p22
##   [4]     chr8 18391282-18401218      + |        NAT2        8p22
##   [5]    chr20 44619522-44652233      - |         ADA    20q13.12
##   -------
##   seqinfo: 2 sequences from an unspecified genome; no seqlengths
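
In the last query the NAT2 range appears four times because the id table holds one row per alias, so the join repeats the gene coordinates for each alias. A possible variant, sketched here under the assumption that one range per gene is wanted, adds distinct() before collecting and names the join column explicitly.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

gr <- inner_join(tbl(src, "id"), tbl(src, "ranges_gene"), by = "entrez") %>%
    dplyr::filter(symbol %in% c("ADA", "NAT2")) %>%
    dplyr::select(gene_chrom, gene_start, gene_end, gene_strand,
                  symbol, map) %>%
    distinct() %>%      # drop rows duplicated by the alias column
    collect() %>%
    GenomicRanges::GRanges()
gr
```

Passing by = "entrez" also suppresses the "Joining, by" message, since the join column is no longer guessed.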

4 The “select” interface

Methods select(), keytypes(), keys(), columns() and mapIds() from AnnotationDbi are implemented for src_organism objects.

Use keytypes() to discover which keytypes can be passed to the keytype argument of select() or keys().

keytypes(src)
##  [1] "accnum"       "alias"        "cds_chrom"    "cds_end"     
##  [5] "cds_id"       "cds_name"     "cds_start"    "cds_strand"  
##  [9] "ensembl"      "ensemblprot"  "ensembltrans" "entrez"      
## [13] "enzyme"       "evidence"     "evidenceall"  "exon_chrom"  
## [17] "exon_end"     "exon_id"      "exon_name"    "exon_rank"   
## [21] "exon_start"   "exon_strand"  "gene_chrom"   "gene_end"    
## [25] "gene_start"   "gene_strand"  "genename"     "go"          
## [29] "goall"        "ipi"          "map"          "omim"        
## [33] "ontology"     "ontologyall"  "pfam"         "pmid"        
## [37] "prosite"      "refseq"       "symbol"       "tx_chrom"    
## [41] "tx_end"       "tx_id"        "tx_name"      "tx_start"    
## [45] "tx_strand"    "tx_type"      "unigene"      "uniprot"

Use columns() to discover which kinds of data can be returned for the src_organism object.

columns(src)
##  [1] "accnum"       "alias"        "cds_chrom"    "cds_end"     
##  [5] "cds_id"       "cds_name"     "cds_start"    "cds_strand"  
##  [9] "ensembl"      "ensemblprot"  "ensembltrans" "entrez"      
## [13] "enzyme"       "evidence"     "evidenceall"  "exon_chrom"  
## [17] "exon_end"     "exon_id"      "exon_name"    "exon_rank"   
## [21] "exon_start"   "exon_strand"  "gene_chrom"   "gene_end"    
## [25] "gene_start"   "gene_strand"  "genename"     "go"          
## [29] "goall"        "ipi"          "map"          "omim"        
## [33] "ontology"     "ontologyall"  "pfam"         "pmid"        
## [37] "prosite"      "refseq"       "symbol"       "tx_chrom"    
## [41] "tx_end"       "tx_id"        "tx_name"      "tx_start"    
## [45] "tx_strand"    "tx_type"      "unigene"      "uniprot"

keys() returns keys for the src_organism object. By default it returns the primary keys of the database; when the keytype argument is used, it returns the keys of that keytype.

Keys of entrez

head(keys(src))
## [1] "1"         "10"        "100"       "1000"      "10000"     "100008586"

Keys of symbol

head(keys(src, "symbol"))
## [1] "A1BG"    "NAT2"    "ADA"     "CDH2"    "AKT3"    "GAGE12F"
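
One practical use of keys() is to check a set of identifiers before querying. The sketch below uses an invented symbol, "NOT_A_GENE", purely to illustrate an identifier that is absent from the database.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

# "NOT_A_GENE" is a made-up symbol, included to show a failed lookup
wanted <- c("ADA", "NAT2", "NOT_A_GENE")

# TRUE for symbols present in the database, FALSE otherwise
valid <- wanted %in% keys(src, "symbol")

# Symbols that can safely be used as keys in select() or mapIds()
wanted[valid]
```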

select_tbl() retrieves data as a tibble for the given keys, columns and keytype arguments, while select() returns the data as a data frame. If the requested columns have multiple matches for the keys, the result contains one row for each possible match.

keytype <- "symbol"
keys <- c("ADA", "NAT2")
columns <- c("entrez", "tx_id", "tx_name","exon_id")
select_tbl(src, keys, columns, keytype)
## Joining, by = "entrez"
## # Source:   lazy query [?? x 5]
## # Database: sqlite 3.29.0 []
##    symbol entrez  tx_id tx_name           exon_id
##    <chr>  <chr>   <int> <chr>               <int>
##  1 NAT2   10      92729 ENST00000286479.4  259633
##  2 NAT2   10      92729 ENST00000286479.4  259635
##  3 NAT2   10      92730 ENST00000520116.1  259634
##  4 NAT2   10      92730 ENST00000520116.1  259636
##  5 ADA    100    210819 ENST00000464097.5  585778
##  6 ADA    100    210819 ENST00000464097.5  585782
##  7 ADA    100    210819 ENST00000464097.5  585784
##  8 ADA    100    210819 ENST00000464097.5  585786
##  9 ADA    100    210819 ENST00000464097.5  585788
## 10 ADA    100    210819 ENST00000464097.5  585789
## # … with more rows
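
The difference between the two return types can be sketched as follows. The explicit AnnotationDbi:: prefix is a choice made here to avoid any confusion with dplyr::select(); the variable names are illustrative.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

keytype <- "symbol"
keys <- c("ADA", "NAT2")
columns <- c("entrez", "tx_id", "tx_name", "exon_id")

# select_tbl() gives a lazy query; collect() brings all rows into memory
lazy <- select_tbl(src, keys, columns, keytype)
in_memory <- collect(lazy, n = Inf)

# select() returns a plain data frame directly
df <- AnnotationDbi::select(src, keys, columns, keytype)

class(in_memory)
class(df)
```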

mapIds() gets the mapped ids (column) for a set of keys of a particular keytype. The result is usually returned as a named character vector.

mapIds(src, keys, column = "tx_name", keytype)
## Joining, by = "entrez"
##                 ADA                NAT2 
## "ENST00000464097.5" "ENST00000286479.4"
mapIds(src, keys, column = "tx_name", keytype, multiVals="CharacterList")
## Joining, by = "entrez"
## CharacterList of length 2
## [["ADA"]] ENST00000464097.5 ENST00000372874.9 ... ENST00000535573.1
## [["NAT2"]] ENST00000286479.4 ENST00000520116.1
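
With multiVals = "CharacterList" every match is kept, so the result can be used to count matches per key. A minimal sketch, assuming S4Vectors::elementNROWS() as the counting helper:

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

symbols <- c("ADA", "NAT2")

# One list element per key, holding all matching transcript names
tx <- mapIds(src, symbols, column = "tx_name", keytype = "symbol",
             multiVals = "CharacterList")

# Number of transcript names mapped to each symbol
n_tx <- S4Vectors::elementNROWS(tx)
n_tx
```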

5 Genomic Coordinates Extractor Interfaces

Eleven genomic coordinate extractor methods are available in this package: transcripts(), exons(), cds(), genes(), promoters(), transcriptsBy(), exonsBy(), cdsBy(), intronsByTranscript(), fiveUTRsByTranscript(), threeUTRsByTranscript(). Each extractor comes in two versions: one returning a tibble (e.g., transcripts_tbl()) and one returning a GRanges or GRangesList (e.g., transcripts()).

Filters can be applied to all extractor functions. The output can be restricted by an AnnotationFilter, an AnnotationFilterList, or an expression that can be translated into an AnnotationFilterList. Valid filters can be retrieved with supportedFilters(src).

supportedFilters(src)
##                filter        field
## 1        AccnumFilter       accnum
## 2         AliasFilter        alias
## 3      CdsChromFilter    cds_chrom
## 45       CdsEndFilter      cds_end
## 43        CdsIdFilter       cds_id
## 4       CdsNameFilter     cds_name
## 44     CdsStartFilter    cds_start
## 5     CdsStrandFilter   cds_strand
## 6       EnsemblFilter      ensembl
## 7   EnsemblprotFilter  ensemblprot
## 8  EnsembltransFilter ensembltrans
## 9        EntrezFilter       entrez
## 10       EnzymeFilter       enzyme
## 11     EvidenceFilter     evidence
## 12  EvidenceallFilter  evidenceall
## 13    ExonChromFilter   exon_chrom
## 48      ExonEndFilter     exon_end
## 46       ExonIdFilter      exon_id
## 14     ExonNameFilter    exon_name
## 49     ExonRankFilter    exon_rank
## 47    ExonStartFilter   exon_start
## 15   ExonStrandFilter  exon_strand
## 17    FlybaseCgFilter   flybase_cg
## 16      FlybaseFilter      flybase
## 18  FlybaseProtFilter flybase_prot
## 55      GRangesFilter      granges
## 19    GeneChromFilter   gene_chrom
## 51      GeneEndFilter     gene_end
## 50    GeneStartFilter   gene_start
## 20   GeneStrandFilter  gene_strand
## 21     GenenameFilter     genename
## 22           GoFilter           go
## 23        GoallFilter        goall
## 24          IpiFilter          ipi
## 25          MapFilter          map
## 26          MgiFilter          mgi
## 27         OmimFilter         omim
## 28     OntologyFilter     ontology
## 29  OntologyallFilter  ontologyall
## 30         PfamFilter         pfam
## 31         PmidFilter         pmid
## 32      PrositeFilter      prosite
## 33       RefseqFilter       refseq
## 34       SymbolFilter       symbol
## 35      TxChromFilter     tx_chrom
## 54        TxEndFilter       tx_end
## 52         TxIdFilter        tx_id
## 36       TxNameFilter      tx_name
## 53      TxStartFilter     tx_start
## 37     TxStrandFilter    tx_strand
## 38       TxTypeFilter      tx_type
## 39      UnigeneFilter      unigene
## 40      UniprotFilter      uniprot
## 41     WormbaseFilter     wormbase
## 42         ZfinFilter         zfin

All filters take two parameters: value and condition. The condition can be one of “==”, “!=”, “startsWith”, “endsWith”, “>”, “<”, “>=” and “<=”; the default condition is “==”.

EnsemblFilter("ENSG00000196839")
## class: EnsemblFilter 
## condition: == 
## value: ENSG00000196839
SymbolFilter("SNORD", "startsWith")
## class: SymbolFilter 
## condition: startsWith 
## value: SNORD
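
A numeric filter with a non-default condition works the same way. The sketch below builds one and reads its parts back with the accessor functions from AnnotationFilter; the field name corresponds to the field column of supportedFilters().

```r
library(Organism.dplyr)

# Transcripts starting before position 44619810
flt <- TxStartFilter(44619810, "<")

# Accessors from AnnotationFilter, called with an explicit namespace
AnnotationFilter::field(flt)      # database field the filter applies to
AnnotationFilter::condition(flt)  # comparison operator
AnnotationFilter::value(flt)      # value compared against
```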

The following illustrates several ways of inputting filters to an extractor function.

smbl <- SymbolFilter("SNORD", "startsWith")
transcripts_tbl(src, filter=smbl)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol    
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr>     
## 1 chr15    25051477 25051571 +         155668 ENST00000384335.1 SNORD116-1
## 2 chr15    25054210 25054304 +         155672 ENST00000384274.1 SNORD116-2
## 3 chr15    25056860 25056954 +         155673 ENST00000384287.1 SNORD116-3
## 4 chr15    25059538 25059633 +         155674 ENST00000384733.1 SNORD116-4
## 5 chr15    25062333 25062427 +         155676 ENST00000384462.1 SNORD116-5
## 6 chr15    25065026 25065121 +         155679 ENST00000384711.1 SNORD116-2
## 7 chr15    25067788 25067882 +         155687 ENST00000384404.1 SNORD116-5
## 8 chr15    25070432 25070526 +         155689 ENST00000384365.1 SNORD116-8
## 9 chr15    25073107 25073201 +         155690 ENST00000384000.1 SNORD116-3
filter <- AnnotationFilterList(smbl)
transcripts_tbl(src, filter=filter)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol    
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr>     
## 1 chr15    25051477 25051571 +         155668 ENST00000384335.1 SNORD116-1
## 2 chr15    25054210 25054304 +         155672 ENST00000384274.1 SNORD116-2
## 3 chr15    25056860 25056954 +         155673 ENST00000384287.1 SNORD116-3
## 4 chr15    25059538 25059633 +         155674 ENST00000384733.1 SNORD116-4
## 5 chr15    25062333 25062427 +         155676 ENST00000384462.1 SNORD116-5
## 6 chr15    25065026 25065121 +         155679 ENST00000384711.1 SNORD116-2
## 7 chr15    25067788 25067882 +         155687 ENST00000384404.1 SNORD116-5
## 8 chr15    25070432 25070526 +         155689 ENST00000384365.1 SNORD116-8
## 9 chr15    25073107 25073201 +         155690 ENST00000384000.1 SNORD116-3
transcripts_tbl(src, filter=~smbl) 
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol    
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr>     
## 1 chr15    25051477 25051571 +         155668 ENST00000384335.1 SNORD116-1
## 2 chr15    25054210 25054304 +         155672 ENST00000384274.1 SNORD116-2
## 3 chr15    25056860 25056954 +         155673 ENST00000384287.1 SNORD116-3
## 4 chr15    25059538 25059633 +         155674 ENST00000384733.1 SNORD116-4
## 5 chr15    25062333 25062427 +         155676 ENST00000384462.1 SNORD116-5
## 6 chr15    25065026 25065121 +         155679 ENST00000384711.1 SNORD116-2
## 7 chr15    25067788 25067882 +         155687 ENST00000384404.1 SNORD116-5
## 8 chr15    25070432 25070526 +         155689 ENST00000384365.1 SNORD116-8
## 9 chr15    25073107 25073201 +         155690 ENST00000384000.1 SNORD116-3

A GRangesFilter() can also be used as a filter for the methods that return their result as a GRanges or GRangesList.

gr <- GRangesFilter(GenomicRanges::GRanges("chr15:25062333-25065121"))
transcripts(src, filter=~smbl & gr)
## GRanges object with 2 ranges and 3 metadata columns:
##       seqnames            ranges strand |     tx_id           tx_name
##          <Rle>         <IRanges>  <Rle> | <integer>       <character>
##   [1]    chr15 25062333-25062427      + |    155676 ENST00000384462.1
##   [2]    chr15 25065026-25065121      + |    155679 ENST00000384711.1
##            symbol
##       <character>
##   [1]  SNORD116-5
##   [2]  SNORD116-2
##   -------
##   seqinfo: 595 sequences (1 circular) from hg38 genome

Filters in extractor functions support & (and), | (or), ! (negation), and () (grouping). The following returns transcript coordinates for gene symbol “ADA” with a transcript start position less than 44619810.

transcripts_tbl(src, filter = AnnotationFilterList(
    SymbolFilter("ADA"),
    TxStartFilter(44619810,"<"),
    logicOp="&")
)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr> 
## 1 chr20    44619522 44626491 -         210819 ENST00000464097.5 ADA   
## 2 chr20    44619522 44651699 -         210820 ENST00000372874.9 ADA   
## 3 chr20    44619579 44651681 -         210821 ENST00000536532.5 ADA
## Equivalent to
transcripts_tbl(src, filter = ~symbol == "ADA" & tx_start < 44619810)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr> 
## 1 chr20    44619522 44626491 -         210819 ENST00000464097.5 ADA   
## 2 chr20    44619522 44651699 -         210820 ENST00000372874.9 ADA   
## 3 chr20    44619579 44651681 -         210821 ENST00000536532.5 ADA

Transcript coordinates where the gene symbol equals “ADA” or the transcript end position equals 243843236.

txend <- TxEndFilter(243843236, '==')
transcripts_tbl(src, filter = ~symbol == "ADA" | txend)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##    tx_chrom  tx_start    tx_end tx_strand  tx_id tx_name           symbol
##    <chr>        <int>     <int> <chr>      <int> <chr>             <chr> 
##  1 chr1     243488233 243843236 -          19678 ENST00000336199.9 AKT3  
##  2 chr20     44619522  44626491 -         210819 ENST00000464097.5 ADA   
##  3 chr20     44619522  44651699 -         210820 ENST00000372874.9 ADA   
##  4 chr20     44619579  44651681 -         210821 ENST00000536532.5 ADA   
##  5 chr20     44619810  44651691 -         210822 ENST00000492931.5 ADA   
##  6 chr20     44619810  44651691 -         210823 ENST00000537820.1 ADA   
##  7 chr20     44619810  44651691 -         210824 ENST00000539235.5 ADA   
##  8 chr20     44626323  44651661 -         210825 ENST00000545776.5 ADA   
##  9 chr20     44626517  44652114 -         210826 ENST00000536076.1 ADA   
## 10 chr20     44636071  44652233 -         210827 ENST00000535573.1 ADA

Using negation to find transcript coordinates where the gene symbol equals “ADA” and the transcript start position is NOT less than 44618910.

transcripts_tbl(src, filter = ~symbol == "ADA" & !tx_start < 44618910)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr> 
## 1 chr20    44619522 44626491 -         210819 ENST00000464097.5 ADA   
## 2 chr20    44619522 44651699 -         210820 ENST00000372874.9 ADA   
## 3 chr20    44619579 44651681 -         210821 ENST00000536532.5 ADA   
## 4 chr20    44619810 44651691 -         210822 ENST00000492931.5 ADA   
## 5 chr20    44619810 44651691 -         210823 ENST00000537820.1 ADA   
## 6 chr20    44619810 44651691 -         210824 ENST00000539235.5 ADA   
## 7 chr20    44626323 44651661 -         210825 ENST00000545776.5 ADA   
## 8 chr20    44626517 44652114 -         210826 ENST00000536076.1 ADA   
## 9 chr20    44636071 44652233 -         210827 ENST00000535573.1 ADA

Using grouping to combine several conditions in one longer filter statement.

transcripts_tbl(src,
    filter = ~(symbol == 'ADA' & !(tx_start >= 44619810 | tx_end < 44651742)) | 
              (smbl & !tx_end > 25056954)
)
## # Source:     lazy query [?? x 7]
## # Database:   sqlite 3.29.0 []
## # Ordered by: tx_id
##   tx_chrom tx_start   tx_end tx_strand  tx_id tx_name           symbol    
##   <chr>       <int>    <int> <chr>      <int> <chr>             <chr>     
## 1 chr15    25051477 25051571 +         155668 ENST00000384335.1 SNORD116-1
## 2 chr15    25054210 25054304 +         155672 ENST00000384274.1 SNORD116-2
## 3 chr15    25056860 25056954 +         155673 ENST00000384287.1 SNORD116-3
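
A negated group can be read through De Morgan's law: !(A | B) is the same as !A & !B. As a sketch of that reasoning, the ADA part of the filter above is written both ways and the resulting transcript ids are compared; the variable names are illustrative.

```r
library(Organism.dplyr)

src <- src_organism(dbpath = hg38light())

# Grouped and negated form, as in the ADA part of the filter above
grouped <- transcripts_tbl(src,
    filter = ~symbol == "ADA" &
              !(tx_start >= 44619810 | tx_end < 44651742)
) %>% collect()

# Expanded form: each comparison inside the group is negated individually
expanded <- transcripts_tbl(src,
    filter = ~symbol == "ADA" & tx_start < 44619810 & tx_end >= 44651742
) %>% collect()

# Both forms should select the same transcripts
identical(sort(grouped$tx_id), sort(expanded$tx_id))
```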
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.2.1           GenomicRanges_1.38.0    GenomeInfoDb_1.22.0    
##  [4] IRanges_2.20.0          S4Vectors_0.24.0        BiocGenerics_0.32.0    
##  [7] Organism.dplyr_1.14.0   AnnotationFilter_1.10.0 dplyr_0.8.3            
## [10] BiocStyle_2.14.0       
## 
## loaded via a namespace (and not attached):
##  [1] Biobase_2.46.0              httr_1.4.1                 
##  [3] bit64_0.9-7                 assertthat_0.2.1           
##  [5] askpass_1.1                 BiocManager_1.30.9         
##  [7] BiocFileCache_1.10.0        blob_1.2.0                 
##  [9] GenomeInfoDbData_1.2.2      Rsamtools_2.2.0            
## [11] yaml_2.2.0                  progress_1.2.2             
## [13] pillar_1.4.2                RSQLite_2.1.2              
## [15] backports_1.1.5             lattice_0.20-38            
## [17] glue_1.3.1                  digest_0.6.22              
## [19] XVector_0.26.0              colorspace_1.4-1           
## [21] htmltools_0.4.0             Matrix_1.2-17              
## [23] XML_3.98-1.20               pkgconfig_2.0.3            
## [25] biomaRt_2.42.0              bookdown_0.14              
## [27] zlibbioc_1.32.0             purrr_0.3.3                
## [29] scales_1.0.0                BiocParallel_1.20.0        
## [31] tibble_2.1.3                openssl_1.4.1              
## [33] withr_2.1.2                 SummarizedExperiment_1.16.0
## [35] GenomicFeatures_1.38.0      lazyeval_0.2.2             
## [37] cli_1.1.0                   magrittr_1.5               
## [39] crayon_1.3.4                memoise_1.1.0              
## [41] evaluate_0.14               fansi_0.4.0                
## [43] tools_3.6.1                 prettyunits_1.0.2          
## [45] hms_0.5.1                   matrixStats_0.55.0         
## [47] stringr_1.4.0               munsell_0.5.0              
## [49] DelayedArray_0.12.0         AnnotationDbi_1.48.0       
## [51] Biostrings_2.54.0           compiler_3.6.1             
## [53] rlang_0.4.1                 grid_3.6.1                 
## [55] RCurl_1.95-4.12             rappdirs_0.3.1             
## [57] labeling_0.3                bitops_1.0-6               
## [59] rmarkdown_1.16              gtable_0.3.0               
## [61] DBI_1.0.0                   curl_4.2                   
## [63] R6_2.4.0                    GenomicAlignments_1.22.0   
## [65] knitr_1.25                  rtracklayer_1.46.0         
## [67] utf8_1.1.4                  bit_1.1-14                 
## [69] zeallot_0.1.0               stringi_1.4.3              
## [71] Rcpp_1.0.2                  vctrs_0.2.0                
## [73] dbplyr_1.4.2                tidyselect_0.2.5           
## [75] xfun_0.10