Gene Summaries from RefSeq Database

Zuguang Gu (z.gu@dkfz.de)

2022-10-05

This package provides long descriptions of genes collected from the RefSeq database. In each RefSeq record, the text in the “COMMENT” section that starts with “Summary:” is extracted as the description of the gene, as in the following example record:

LOCUS       NM_012363                936 bp    mRNA    linear   PRI 12-FEB-2021
DEFINITION  Homo sapiens olfactory receptor family 1 subfamily N member 1
            (OR1N1), mRNA.
ACCESSION   NM_012363 XM_071152
VERSION     NM_012363.1
KEYWORDS    RefSeq; MANE Select.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 936)
  AUTHORS   Malnic B, Godfrey PA and Buck LB.
  TITLE     The human olfactory receptor gene family
  JOURNAL   Proc Natl Acad Sci U S A 101 (8), 2584-2589 (2004)
   PUBMED   14983052
  REMARK    Erratum:[Proc Natl Acad Sci U S A. 2004 May 4;101(18):7205]
REFERENCE   2  (bases 1 to 936)
  AUTHORS   Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R,
            Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach
            H, Lancet D and Shamir R.
  TITLE     DEFOG: a practical scheme for deciphering families of genes
  JOURNAL   Genomics 80 (3), 295-302 (2002)
   PUBMED   12213199
REFERENCE   3  (bases 1 to 936)
  AUTHORS   Rouquier S, Taviaux S, Trask BJ, Brand-Arpon V, van den Engh G,
            Demaille J and Giorgi D.
  TITLE     Distribution of olfactory receptor genes in the human genome
  JOURNAL   Nat Genet 18 (3), 243-250 (1998)
   PUBMED   9500546
  REMARK    Erratum:[Nat Genet 1998 May;19(1):102]
COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The
            reference sequence was derived from AL359636.17.
            On Apr 5, 2004 this sequence version replaced XM_071152.1.
            
            Summary: Olfactory receptors interact with odorant molecules in the
            nose, to initiate a neuronal response that triggers the perception
            of a smell. The olfactory receptor proteins are members of a large
            family of G-protein-coupled receptors (GPCR) arising from single
            coding-exon genes. Olfactory receptors share a 7-transmembrane
            domain structure with many neurotransmitter and hormone receptors
            and are responsible for the recognition and G protein-mediated
            transduction of odorant signals. The olfactory receptor gene family
            is the largest in the genome. The nomenclature assigned to the
            olfactory receptor genes and proteins for this organism is
            independent of other organisms. [provided by RefSeq, Jul 2008].
            
            ##RefSeq-Attributes-START##
            MANE Ensembl match     :: ENST00000304880.2/ ENSP00000306974.2
            RefSeq Select criteria :: based on single protein-coding transcript
            ##RefSeq-Attributes-END##
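
The extraction rule described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's actual parsing code: the helper name extract_summary() is hypothetical, the record is abridged from the example, and the summary is assumed to end at the first blank line that follows “Summary:”.

```r
# Abridged COMMENT section of the example record above.
record <- c(
  "COMMENT     REVIEWED REFSEQ: This record has been curated by NCBI staff. The",
  "            reference sequence was derived from AL359636.17.",
  "            ",
  "            Summary: Olfactory receptors interact with odorant molecules in the",
  "            nose, to initiate a neuronal response that triggers the perception",
  "            of a smell. [provided by RefSeq, Jul 2008].",
  "            ",
  "            ##RefSeq-Attributes-START##",
  "            ##RefSeq-Attributes-END##"
)

# Hypothetical helper (not part of GeneSummary): return the paragraph that
# starts with "Summary:", or NA if the record has no such paragraph.
extract_summary <- function(lines) {
  body <- trimws(lines)                    # drop the flat-file indentation
  start <- grep("^Summary:", body)
  if (length(start) == 0) return(NA_character_)
  start <- start[1]
  # Assumption: the summary paragraph ends at the next blank line.
  blank <- which(body == "")
  after <- blank[blank > start]
  end <- if (length(after) > 0) after[1] - 1 else length(body)
  txt <- paste(body[start:end], collapse = " ")
  sub("^Summary:[[:space:]]*", "", txt)    # remove the "Summary:" label
}

summary_text <- extract_summary(record)
cat(strwrap(summary_text, width = 70), sep = "\n")
```

Joining the wrapped lines with a single space restores the summary as one string, which is the form stored in a table column such as Gene_summary.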

The function loadGeneSummary() returns the gene summary table. Setting the organism argument to either the full organism name or the corresponding taxon ID returns a table of that organism's genes and their summaries:

library(GeneSummary)
## Gene summaries were retrieved from RefSeq database release 214 (Sep 30, 2022).
tb = loadGeneSummary(organism = 9606)
# or use the full organism name
# tb = loadGeneSummary(organism = "Homo sapiens")
dim(tb)
## [1] 53545     6
head(tb)
##   RefSeq_accession     Organism Taxon_ID   Gene_ID      Review_status
## 1      NR_039609.1 Homo sapiens     9606 100616498 PROVISIONAL REFSEQ
## 2      NR_030183.1 Homo sapiens     9606    574461 PROVISIONAL REFSEQ
## 3      NR_039939.1 Homo sapiens     9606 100616159 PROVISIONAL REFSEQ
## 4      NR_107042.1 Homo sapiens     9606 102465874 PROVISIONAL REFSEQ
## 5      NR_030222.1 Homo sapiens     9606    574500 PROVISIONAL REFSEQ
## 6      NR_030188.1 Homo sapiens     9606    574466 PROVISIONAL REFSEQ
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Gene_summary
## 1 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 2 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 3 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 4 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 5 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
## 6 microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved in post-transcriptional regulation of gene expression in multicellular organisms by affecting both the stability and translation of mRNAs. miRNAs are transcribed by RNA polymerase II as part of capped and polyadenylated primary transcripts (pri-miRNAs) that can be either protein-coding or non-coding. The primary transcript is cleaved by the Drosha ribonuclease III enzyme to produce an approximately 70-nt stem-loop precursor miRNA (pre-miRNA), which is further cleaved by the cytoplasmic Dicer ribonuclease to generate the mature miRNA and antisense miRNA star (miRNA*) products. The mature miRNA is incorporated into a RNA-induced silencing complex (RISC), which recognizes target mRNAs through imperfect base pairing with the miRNA and most commonly results in translational inhibition or destabilization of the target mRNA. The RefSeq represents the predicted microRNA stem-loop.
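
Because the summaries are long, printing the whole table is hard to read. A minimal sketch of looking up one gene and wrapping its summary is given below. To keep it runnable without the package, it uses a small stand-in data frame (tb_toy) with the same column names as the table above; the two rows are copied from the output, with the summary truncated to its first sentence. The helper get_summary() is hypothetical, and storing Gene_ID as character is an assumption.

```r
# Stand-in for the table returned by loadGeneSummary(); same column names,
# two rows taken from the output above, summaries truncated for brevity.
mirna_text <- paste(
  "microRNAs (miRNAs) are short (20-24 nt) non-coding RNAs that are involved",
  "in post-transcriptional regulation of gene expression in multicellular",
  "organisms by affecting both the stability and translation of mRNAs."
)
tb_toy <- data.frame(
  RefSeq_accession = c("NR_039609.1", "NR_030183.1"),
  Organism = "Homo sapiens",
  Taxon_ID = 9606,
  Gene_ID = c("100616498", "574461"),   # assumed to be character here
  Review_status = "PROVISIONAL REFSEQ",
  Gene_summary = mirna_text,
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the distinct summaries recorded for one gene ID.
# Pass the ID as a character string to avoid number formatting issues.
get_summary <- function(tb, gene_id) {
  hit <- tb[tb$Gene_ID == gene_id, , drop = FALSE]
  unique(hit$Gene_summary)
}

s <- get_summary(tb_toy, "574461")
cat(strwrap(s, width = 70), sep = "\n")
```

With the real table, tb_toy would simply be replaced by the result of loadGeneSummary().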

Setting organism to NULL returns the table for all organisms:

tb = loadGeneSummary(organism = NULL)
sort(table(tb$Organism))
## 
##                       Aedes aegypti                     Aotus nancymaae 
##                                   1                                   1 
##                 Aplysia californica                   Bison bison bison 
##                                   1                                   1 
##                 Callorhinchus milii                   Macaca nemestrina 
##                                   1                                   1 
##              Mandrillus leucophaeus             Rhinopithecus roxellana 
##                                   1                                   1 
##                  Anas platyrhynchos                     Cercocebus atys 
##                                   2                                   2 
##                      Chelonia mydas        Colobus angolensis palliatus 
##                                   2                                   2 
##                   Crassostrea gigas                     Geospiza fortis 
##                                   2                                   2 
##                 Latimeria chalumnae                  Loxodonta africana 
##                                   2                                   2 
##             Melopsittacus undulatus                  Nannospalax galili 
##                                   2                                   2 
##                   Python bivittatus                  Alligator sinensis 
##                                   2                                   3 
##            Amphimedon queenslandica                 Chlorocebus sabaeus 
##                                   3                                   3 
##                       Columba livia                       Falco cherrug 
##                                   3                                   3 
##                    Falco peregrinus                 Oncorhynchus mykiss 
##                                   3                                   3 
##               Orycteropus afer afer                 Pelodiscus sinensis 
##                                   3                                   3 
##                         Salmo salar              Zonotrichia albicollis 
##                                   3                                   3 
##          Alligator mississippiensis                           Bos mutus 
##                                   4                                   4 
##                 Ficedula albicollis                 Meleagris gallopavo 
##                                   4                                   4 
##                     Myotis brandtii                      Myotis davidii 
##                                   4                                   4 
##               Pseudopodoces humilis              Ailuropoda melanoleuca 
##                                   4                                   5 
##                  Astyanax mexicanus Balaenoptera acutorostrata scammoni 
##                                   5                                   5 
##                       Camelus ferus               Elephantulus edwardii 
##                                   5                                   5 
##                     Panthera tigris                    Poecilia formosa 
##                                   5                                   5 
##                     Chrysemys picta               Heterocephalus glaber 
##                                   6                                   6 
##                  Otolemur garnettii                    Physeter catodon 
##                                   6                                   6 
##                 Saimiri boliviensis                       Sorex araneus 
##                                   6                                   6 
##                     Cavia porcellus                 Chinchilla lanigera 
##                                   7                                   7 
##                Dasypus novemcinctus             Leptonychotes weddellii 
##                                   7                                   7 
##                    Myotis lucifugus                       Octodon degus 
##                                   7                                   7 
##                  Tursiops truncatus           Ceratotherium simum simum 
##                                   7                                   8 
##                  Condylura cristata                   Echinops telfairi 
##                                   8                                   8 
##                 Erinaceus europaeus                     Jaculus jaculus 
##                                   8                                   8 
##                Mesocricetus auratus               Mustela putorius furo 
##                                   8                                   8 
##                   Ochotona princeps                     Pteropus alecto 
##                                   8                                   8 
##                       Vicugna pacos              Chrysochloris asiatica 
##                                   8                                   9 
##                         Felis catus          Ictidomys tridecemlineatus 
##                                   9                                   9 
##                  Lipotes vexillifer         Odobenus rosmarus divergens 
##                                   9                                   9 
##                        Orcinus orca      Trichechus manatus latirostris 
##                                   9                                   9 
##                      Hydra vulgaris                Microtus ochrogaster 
##                                  10                                  10 
##                        Papio anubis                     Bubalus bubalis 
##                                  10                                  11 
##                 Macaca fascicularis                 Nomascus leucogenys 
##                                  11                                  11 
##      Peromyscus maniculatus bairdii                        Pongo abelii 
##                                  11                                  14 
##                  Callithrix jacchus       Strongylocentrotus purpuratus 
##                                  15                                  64 
##                Sarcophilus harrisii                      Xenopus laevis 
##                                  65                                  84 
##                       Brassica rapa            Saccoglossus kowalevskii 
##                                  89                                  90 
##                        Cucumis melo                          Ovis aries 
##                                 104                                 115 
##                 Acyrthosiphon pisum                     Malus domestica 
##                                 125                                 130 
##                   Takifugu rubripes                     Citrus sinensis 
##                                 140                                 146 
##                Solanum lycopersicum                      Vitis vinifera 
##                                 152                                 156 
##                     Oryzias latipes                            Zea mays 
##                                 161                                 166 
##                        Pan paniscus                    Tupaia chinensis 
##                                 179                                 184 
##                   Solanum tuberosum                  Cricetulus griseus 
##                                 215                                 236 
##                  Xenopus tropicalis                 Taeniopygia guttata 
##                                 244                                 248 
##                      Apis mellifera                        Capra hircus 
##                                 254                                 277 
##                 Anolis carolinensis             Brachypodium distachyon 
##                                 293                                 312 
##               Oryctolagus cuniculus                  Ciona intestinalis 
##                                 319                                 331 
##                 Nasonia vitripennis                 Tribolium castaneum 
##                                 332                                 333 
##                     Gorilla gorilla            Ornithorhynchus anatinus 
##                                 374                                 396 
##                          Sus scrofa                         Bombyx mori 
##                                 402                                 423 
##                         Danio rerio                    Eptesicus fuscus 
##                                 440                                 494 
##                         Glycine max                      Macaca mulatta 
##                                 670                                 677 
##                     Pan troglodytes               Monodelphis domestica 
##                                 680                                 685 
##                       Gallus gallus              Canis lupus familiaris 
##                                 966                                1085 
##                      Equus caballus                          Bos taurus 
##                                1463                                1966 
##                   Rattus norvegicus                        Mus musculus 
##                                2059                                6254 
##                        Homo sapiens 
##                               53545
sort(table(tb$Review_status))
## 
##   PREDICTED REFSEQ    INFERRED REFSEQ   VALIDATED REFSEQ PROVISIONAL REFSEQ 
##                  9               2351               6550              17462 
##    REVIEWED REFSEQ 
##              52208

The table can be restricted to a specific review status via the status argument, e.g. to keep only "reviewed" records:

tb = loadGeneSummary(organism = NULL, status = "reviewed")
sort(table(tb$Review_status))
## REVIEWED REFSEQ 
##           52208
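
The same kind of restriction can also be applied to a table that has already been loaded. The sketch below is an assumption-laden illustration rather than the package's implementation: filter_status() is a hypothetical helper, the stand-in table tb_toy uses made-up gene IDs, and the mapping from a lower-case status such as "reviewed" to the label "REVIEWED REFSEQ" is inferred from the output above.

```r
# Stand-in table; Gene_ID values are placeholders, Review_status labels
# follow the ones printed above.
tb_toy <- data.frame(
  Gene_ID = c("g1", "g2", "g3", "g4"),
  Review_status = c("REVIEWED REFSEQ", "PROVISIONAL REFSEQ",
                    "REVIEWED REFSEQ", "VALIDATED REFSEQ"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose Review_status matches one of the
# requested statuses, given in lower case without the "REFSEQ" suffix.
filter_status <- function(tb, status) {
  wanted <- paste(toupper(status), "REFSEQ")
  tb[tb$Review_status %in% wanted, , drop = FALSE]
}

reviewed <- filter_status(tb_toy, "reviewed")
table(reviewed$Review_status)

# Several statuses can be kept at once.
curated <- filter_status(tb_toy, c("reviewed", "validated"))
nrow(curated)
```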

The version of the data is shown by printing GeneSummary:

GeneSummary
## RefSeq gene summaries
##   RefSeq release: 214 
##   Source: https://ftp.ncbi.nih.gov/refseq/release/complete/*.rna.gbff.gz 
##   Number of organisms: 129 
##   Built date:  2022-09-30
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] GeneSummary_0.99.4
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.2  magrittr_2.0.3 
##  [5] evaluate_0.16   stringi_1.7.8   rlang_1.0.6     cachem_1.0.6   
##  [9] cli_3.4.1       jquerylib_0.1.4 bslib_0.4.0     rmarkdown_2.16 
## [13] tools_4.2.1     stringr_1.4.1   xfun_0.33       yaml_2.3.5     
## [17] fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3 knitr_1.40     
## [21] sass_0.4.2