R version: R Under development (unstable) (2025-03-13 r87965)
Bioconductor version: 3.21
Package version: 1.31.1
Annotation resources make up a significant proportion of the Bioconductor project[1]. There is also a diverse set of online resources available, which are accessed using specific packages. This walkthrough describes the most popular of these resources and gives some high-level examples of how to use them.
Bioconductor annotation resources have traditionally been used near the end of an analysis. After the bulk of the data analysis, annotations would be used interpretatively to learn about the most significant results. But increasingly, they are also used as a starting point or even as an intermediate step to help guide a study that is still in progress. In addition to this, what it means for something to be an annotation is also becoming less clear than it once was. It used to be clear that annotations were only those things that had been established after multiple different studies had been performed (such as the primary role of a gene product). But today many large data sets are treated by communities in much the same way that classic annotations once were: as a reference for additional comparisons.
Another change that is underway with annotations in Bioconductor is in the way that they are obtained. In the past annotations existed almost exclusively as separate annotation packages[2,3,4]. Today packages are still an enormous source of annotations. The current release repository contains over eight hundred annotation packages. This table summarizes some of the more important classes of annotation objects that are often accessed using packages:
Object Type | Example Package Name | Contents |
---|---|---|
TxDb | TxDb.Hsapiens.UCSC.hg19.knownGene | Transcriptome ranges for the known gene track of Homo sapiens, e.g., introns, exons, UTR regions. |
OrgDb | org.Hs.eg.db | Gene-based information for Homo sapiens; useful for mapping between gene IDs, Names, Symbols, GO and KEGG identifiers, etc. |
BSgenome | BSgenome.Hsapiens.UCSC.hg19 | Full genome sequence for Homo sapiens. |
src_organism | Organism.dplyr | Collection of multiple annotations for a common organism and genome build. |
AnnotationHub | AnnotationHub | Provides a convenient interface to annotations from many different sources; objects are returned as fully parsed Bioconductor data objects or as the name of a file on disk. |
But in spite of the popularity of annotation packages, annotations are increasingly also being pulled down from web services like biomaRt[5,6,7] or from the AnnotationHub[8]. And both of these represent enormous resources for annotation data.
In part because of the rapidly evolving landscape, it is currently impossible in a single document to cover every possible annotation or even every kind of annotation present in Bioconductor. So here we will instead go over the most popular annotation resources and describe them in a way intended to expose common patterns used for accessing them. The hope is that a user with this information will be able to make educated guesses about how to find and use additional resources that will inevitably be added later. Topics that will be covered will include the following:
In this chapter we make use of several Bioconductor packages. You can install them with BiocManager::install():
if (!"BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager")
BiocManager::install(c("AnnotationHub", "Homo.sapiens",
                       "Organism.dplyr",
                       "TxDb.Hsapiens.UCSC.hg19.knownGene",
                       "TxDb.Hsapiens.UCSC.hg38.knownGene",
                       "BSgenome.Hsapiens.UCSC.hg19", "biomaRt",
                       "TxDb.Athaliana.BioMart.plantsmart22"))
The usage of the installed packages will be described in detail within the Usage section.
The top of the list for learning about annotation resources is the relatively new AnnotationHub package[8]. The AnnotationHub was created to provide a convenient access point for end users to find a large range of different annotation objects for use with Bioconductor. Resources found in the AnnotationHub are easy to discover and are presented to the user as familiar Bioconductor data objects. Because it is a recent addition, the AnnotationHub allows access to a broad range of annotation-like objects, some of which may not have been considered annotations even a few years ago. To get started with the AnnotationHub, users only need to load the package and then create a local AnnotationHub object like this:
library(AnnotationHub)
ah <- AnnotationHub()
The very first time that you call the AnnotationHub, it will create a cache directory on your system and download the latest metadata for the hub's current contents. From that time forward, whenever you download one of the hub's data objects, it will also cache those files in the local directory so that if you request the information again, you will be able to access it quickly.
The show method of an AnnotationHub object will tell you how many resources are currently accessible using that object, as well as give a high-level overview of the most common kinds of data present.
ah
## AnnotationHub with 70637 records
## # snapshotDate(): 2025-04-08
## # $dataprovider: Ensembl, BroadInstitute, UCSC, ftp://ftp.ncbi.nlm.nih.gov/g...
## # $species: Homo sapiens, Mus musculus, Drosophila melanogaster, Rattus norv...
## # $rdataclass: GRanges, TwoBitFile, BigWigFile, EnsDb, Rle, OrgDb, ChainFile...
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH5012"]]'
##
## title
## AH5012 | Chromosome Band
## AH5013 | STS Markers
## AH5014 | FISH Clones
## AH5015 | Recomb Rate
## AH5016 | ENCODE Pilot
## ... ...
## AH121712 | Data.table for PubMed Author Information
## AH121713 | Data.table for PMC
## AH121714 | Data.table for MeSH (Descriptor)
## AH121715 | Data.table for MeSH (Qualifier)
## AH121716 | Data.table for MeSH (SCR)
As you can see from the object above, there are a LOT of different resources available. So normally when you get an AnnotationHub object, the first thing you want to do is to filter it to remove unwanted resources.
Fortunately, the AnnotationHub has several different kinds of metadata that you can use for searching and subsetting. To see the different categories all you need to do is to type the name of your AnnotationHub object and then tab complete from the ‘$’ operator. And to see all possible contents of one of these categories you can pass that value in to unique like this:
unique(ah$dataprovider)
## [1] "UCSC"
## [2] "Ensembl"
## [3] "RefNet"
## [4] "Inparanoid8"
## [5] "NHLBI"
## [6] "ChEA"
## [7] "Pazar"
## [8] "NIH Pathway Interaction Database"
## [9] "Haemcode"
## [10] "BroadInstitute"
## [11] "PRIDE"
## [12] "Gencode"
## [13] "CRIBI"
## [14] "Genoscope"
## [15] "MISO, VAST-TOOLS, UCSC"
## [16] "Stanford"
## [17] "dbSNP"
## [18] "BioMart"
## [19] "GeneOntology"
## [20] "KEGG"
## [21] "URGI"
## [22] "EMBL-EBI"
## [23] "MicrosporidiaDB"
## [24] "FungiDB"
## [25] "TriTrypDB"
## [26] "ToxoDB"
## [27] "AmoebaDB"
## [28] "PlasmoDB"
## [29] "PiroplasmaDB"
## [30] "CryptoDB"
## [31] "TrichDB"
## [32] "GiardiaDB"
## [33] "The Gene Ontology Consortium"
## [34] "ENCODE Project"
## [35] "SchistoDB"
## [36] "NCBI/UniProt"
## [37] "GENCODE"
## [38] "http://www.pantherdb.org"
## [39] "RMBase v2.0"
## [40] "snoRNAdb"
## [41] "tRNAdb"
## [42] "NCBI"
## [43] "DrugAge, DrugBank, Broad Institute"
## [44] "DrugAge"
## [45] "DrugBank"
## [46] "Broad Institute"
## [47] "HMDB, EMBL-EBI, EPA"
## [48] "STRING"
## [49] "OMA"
## [50] "OrthoDB"
## [51] "PathBank"
## [52] "EBI/EMBL"
## [53] "WikiPathways"
## [54] "VAST-TOOLS"
## [55] "pyGenomeTracks "
## [56] "NA"
## [57] "UoE"
## [58] "TargetScan,miRTarBase,USCS,ENSEMBL"
## [59] "TargetScan"
## [60] "QuickGO"
## [61] "CIS-BP"
## [62] "CTCFBSDB 2.0"
## [63] "HOCOMOCO v11"
## [64] "JASPAR 2022"
## [65] "Jolma 2013"
## [66] "SwissRegulon"
## [67] "ENCODE SCREEN v3"
## [68] "MassBank"
## [69] "excluderanges"
## [70] "ENCODE"
## [71] "GitHub"
## [72] "Stanford.edu"
## [73] "Publication"
## [74] "CHM13"
## [75] "UCSChub"
## [76] "Google DeepMind"
## [77] "UWashington"
## [78] "Bioconductor"
## [79] "ENCODE cCREs"
## [80] "The Human Phenotype Ontology"
## [81] "MGI"
## [82] "NCBI, WormBase Parasite"
## [83] "GreyListChIP"
## [84] "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/"
## [85] "FANTOM5,DLRP,IUPHAR,HPRD,STRING,SWISSPROT,TREMBL,ENSEMBL,CELLPHONEDB,BADERLAB,SINGLECELLSIGNALR,HOMOLOGENE"
## [86] "NCBI,DBCLS"
One of the most valuable ways in which the data is labeled is according to the kind of R object that will be returned to you.
unique(ah$rdataclass)
## [1] "GRanges" "data.frame"
## [3] "Inparanoid8Db" "TwoBitFile"
## [5] "ChainFile" "SQLiteConnection"
## [7] "biopax" "BigWigFile"
## [9] "AAStringSet" "MSnSet"
## [11] "mzRident" "list"
## [13] "TxDb" "Rle"
## [15] "EnsDb" "VcfFile"
## [17] "igraph" "data.frame, DNAStringSet, GRanges"
## [19] "sqlite" "data.table"
## [21] "character" "SQLite"
## [23] "SQLiteFile" "Tibble"
## [25] "Rda" "FaFile"
## [27] "String" "CompDb"
## [29] "OrgDb"
Once you have identified which sorts of metadata you would like to use to find your data of interest, you can then use the subset or query methods to reduce the size of the hub object to something more manageable. For example you could select only those records where the string ‘GRanges’ was in the metadata. As you can see GRanges are one of the more popular formats for data that comes from the AnnotationHub.
grs <- query(ah, "GRanges")
grs
## AnnotationHub with 30548 records
## # snapshotDate(): 2025-04-08
## # $dataprovider: Ensembl, BroadInstitute, UCSC, Haemcode, FungiDB, Pazar, Tr...
## # $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Danio r...
## # $rdataclass: GRanges, data.frame, DNAStringSet, GRanges
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH5012"]]'
##
## title
## AH5012 | Chromosome Band
## AH5013 | STS Markers
## AH5014 | FISH Clones
## AH5015 | Recomb Rate
## AH5016 | ENCODE Pilot
## ... ...
## AH119513 | T2T.GreyListChIP.STAR_101bp_1000merge
## AH119514 | mm10.GreyListChIP.STAR_36bp_1000merge
## AH119515 | mm10.GreyListChIP.STAR_50bp_1000merge
## AH119516 | mm39.GreyListChIP.STAR_36bp_1000merge
## AH119517 | mm39.GreyListChIP.STAR_50bp_1000merge
Or you can use subsetting to select only exact matches on a specific field:
grs <- ah[ah$rdataclass == "GRanges",]
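The difference between these two approaches is worth a quick illustration. The sketch below uses a small invented data.frame as a stand-in for the hub metadata (it does not call AnnotationHub at all), and it assumes that query() behaves roughly like a pattern search across the metadata fields, which is consistent with the mixed $rdataclass values shown in the query() output above. Exact comparison on a single field is stricter:

```r
# Toy metadata table standing in for mcols(ah); all values are invented.
meta <- data.frame(
    title = c("Chromosome Band", "hg19 to hg38 chain", "Motif table"),
    rdataclass = c("GRanges", "ChainFile", "data.frame, DNAStringSet, GRanges"),
    stringsAsFactors = FALSE
)
rownames(meta) <- c("AH1", "AH2", "AH3")

# Pattern search over every column, loosely analogous to query(ah, "GRanges").
hit_any <- apply(meta, 1, function(row) any(grepl("GRanges", row, fixed = TRUE)))
pattern_ids <- rownames(meta)[hit_any]

# Exact match on a single field, analogous to ah[ah$rdataclass == "GRanges", ].
exact_ids <- rownames(meta)[meta$rdataclass == "GRanges"]

pattern_ids
exact_ids
```

In this toy table the pattern search keeps the record whose rdataclass merely contains the string, while the exact comparison drops it. If you need only true GRanges records from the hub, the exact comparison is the safer choice.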
The subset function is also provided.
orgs <- subset(ah, ah$rdataclass == "OrgDb")
orgs
## AnnotationHub with 1976 records
## # snapshotDate(): 2025-04-08
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, NCBI, WormBase Parasite
## # $species: Escherichia coli, greater Indian_fruit_bat, Zophobas morio, Zoph...
## # $rdataclass: OrgDb
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH119509"]]'
##
## title
## AH119509 | org.Hbacteriophora.eg.db
## AH119520 | org.Ag.eg.db.sqlite
## AH119521 | org.At.tair.db.sqlite
## AH119522 | org.Bt.eg.db.sqlite
## AH119523 | org.Cf.eg.db.sqlite
## ... ...
## AH121492 | org.Drosophila_subpulchrella.eg.sqlite
## AH121493 | org.Chlamydomonas_reinhardtii.eg.sqlite
## AH121494 | org.Chlamydomonas_smithii.eg.sqlite
## AH121495 | org.Puccinia_striiformis_f._sp._tritici.eg.sqlite
## AH121496 | org.Colius_striatus.eg.sqlite
And if you really need access to all the metadata you can extract it as a DataFrame using mcols() like so:
meta <- mcols(ah)
Also, if you are a fan of GUIs, you can use the display method to look at your data in a browser and return selected rows back as a smaller AnnotationHub object like this:
sah <- display(ah)
Calling this method will produce a web-based interface in which you can browse and select records.
Once you have the AnnotationHub object pared down to a reasonable size, and are sure about which records you want to retrieve, then you only need to use the ‘[[’ operator to extract them. Using the ‘[[’ operator, you can extract by numeric index (1,2,3) or by AnnotationHub ID. If you choose to use the former, you simply extract the element that you are interested in. So for our GRanges example, you might just want the first one, like this:
res <- grs[[1]]
## loading from cache
head(res, n=3)
## UCSC track 'cytoBand'
## UCSCData object with 3 ranges and 1 metadata column:
## seqnames ranges strand | name
## <Rle> <IRanges> <Rle> | <character>
## [1] chr1 1-2300000 * | p36.33
## [2] chr1 2300001-5400000 * | p36.32
## [3] chr1 5400001-7200000 * | p36.31
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
Exercise 1: Use the AnnotationHub to extract UCSC data that is from Homo sapiens and also specifically from the hg19 genome. What happens to the hub object as you filter data at each step?
Exercise 2: Now that you have basically narrowed things down to the hg19 annotations from the UCSC genome browser, let's get one of these annotations. Find the oreganno track and save it into a local variable.
At this point you might be wondering: What is this OrgDb object about? OrgDb objects are one member of a family of annotation objects that all represent hidden data through a shared set of methods. So if you look closely at the dog object created below, you can see it contains data for Canis familiaris (taxonomy ID = 9615). You can learn a little more about it by calling the columns method.
dogquery <- query(orgs, c("Canis familiaris", "9615"))
dogquery
## AnnotationHub with 1 record
## # snapshotDate(): 2025-04-08
## # names(): AH119523
## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
## # $species: Canis familiaris
## # $rdataclass: OrgDb
## # $rdatadateadded: 2025-03-11
## # $title: org.Cf.eg.db.sqlite
## # $description: NCBI gene ID based annotations about Canis familiaris
## # $taxonomyid: 9615
## # $genome: NCBI genomes
## # $sourcetype: NCBI/ensembl
## # $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
## # $sourcesize: NA
## # $tags: c("NCBI", "Gene", "Annotation")
## # retrieve record with 'object[["AH119523"]]'
ah_id <- dogquery$ah_id
ah_id
## [1] "AH119523"
dog <- ah[[ah_id]]
## loading from cache
columns(dog)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL"
## [16] "PATH" "PMID" "REFSEQ" "SYMBOL" "UNIPROT"
The columns method gives you a vector of data types that can be retrieved from the object that you call it on. So the above call indicates that there are several different data types that can be retrieved from the dog object.
A very similar method is the keytypes method, which will list all the data types that can also be used as keys.
keytypes(dog)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "ONTOLOGY" "ONTOLOGYALL"
## [16] "PATH" "PMID" "REFSEQ" "SYMBOL" "UNIPROT"
In many cases most of the things that are listed as columns will also come back from a keytypes call, but since these two things are not guaranteed to be identical, we maintain two separate methods.
Now that you can see what kinds of things can be used as keys, you can call the keys method to extract out all the keys of a given key type.
head(keys(dog, keytype="ENTREZID"))
## [1] "399518" "399530" "399544" "399545" "399653" "403152"
This is useful if you need to get all the IDs of a particular kind but the keys method has a few extra arguments that can make it even more flexible. For example, using the keys method you could also extract the gene SYMBOLS that contain “COX” like this:
keys(dog, keytype="SYMBOL", pattern="COX")
## [1] "COX5B" "COX7A2L" "COX8A" "COX15" "COX5A" "COX4I1" "COX6A2"
## [8] "COX20" "COX18" "ACOX1" "COX4I2" "ACOX3" "COX10" "COX17"
## [15] "COX11" "ACOXL" "COX7A1" "COX1" "COX2" "COX3" "COX19"
## [22] "COX7B2" "COX14" "ACOX2" "COX16"
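Notice that the result above includes symbols such as ACOX1, where “COX” appears in the middle of the name. The base R sketch below shows the same effect with grep() on a small hand-picked vector of symbols (a subset of those printed above, plus MSX2 as a non-match). It assumes that the pattern argument of keys() is treated as a regular expression in the same way, so treat the anchored version as something to try rather than a guarantee:

```r
# A few symbols taken from the output above, plus one that should not match.
symbols <- c("COX5B", "ACOX1", "COX17", "ACOXL", "MSX2")

# Unanchored pattern: matches "COX" anywhere in the symbol.
anywhere <- grep("COX", symbols, value = TRUE)

# Anchored pattern: only symbols that start with "COX".
prefix_only <- grep("^COX", symbols, value = TRUE)

anywhere
prefix_only
```

If you only want the cytochrome c oxidase style names, an anchored pattern like this is one way to narrow the match.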
Or if you really needed another keytype, you can use the column argument to extract the Entrez Gene IDs for those gene SYMBOLS that contain the string “COX”:
keys(dog, keytype="ENTREZID", pattern="COX", column="SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
## [1] "474567" "475739" "476040" "477792" "478370" "479623"
## [7] "479780" "480099" "482193" "483322" "485825" "488790"
## [13] "489515" "503668" "609555" "611729" "612614" "804478"
## [19] "804479" "804480" "100685945" "100687434" "100688544" "100855488"
## [25] "119863880"
But often, you will really want to extract other data that matches a particular key or set of keys. For that there are two methods which you can use. The more powerful of these is probably select. Here is how you would look up the gene SYMBOL and REFSEQ ID for a specific Entrez Gene ID.
select(dog, keys="804478", columns=c("SYMBOL","REFSEQ"), keytype="ENTREZID")
## 'select()' returned 1:1 mapping between keys and columns
## ENTREZID SYMBOL REFSEQ
## 1 804478 COX1 NP_008473
When you call it, select will return a data.frame that attempts to fill in matching values for all the columns you requested. However, if you ask select for things that have a many to one relationship to your keys it can result in an expansion of the data object that is returned. For example, watch what happens when we ask for the GO terms for the same entrez gene ID:
select(dog, keys="804478", columns="GO", keytype="ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
## ENTREZID GO EVIDENCE ONTOLOGY
## 1 804478 GO:0004129 IEA MF
## 2 804478 GO:0005743 IEA CC
## 3 804478 GO:0006119 IEA BP
## 4 804478 GO:0009060 IEA BP
## 5 804478 GO:0020037 IEA MF
## 6 804478 GO:0022904 IEA BP
## 7 804478 GO:0045277 IEA CC
## 8 804478 GO:0046872 IEA MF
Because there are several GO terms associated with the gene “804478”, you end up with many rows in the data.frame. This can become problematic if you then ask for several columns that have a many to one relationship to the original key. If you were to do that, not only would the result multiply in size, it would also become really hard to use. A better strategy is to be selective when using select.
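To see why the result multiplies, here is a minimal base R sketch with two invented lookup tables. It uses merge() as a stand-in for the joins that select() performs internally; the identifiers and values are made up and the real implementation may differ in detail:

```r
# Toy lookup tables; the identifiers and values are invented for illustration.
go <- data.frame(ENTREZID = "1", GO = c("GO:A", "GO:B", "GO:C"))
refseq <- data.frame(ENTREZID = "1", REFSEQ = c("NM_1", "NP_1"))

# Joining both tables on the key pairs every GO term with every REFSEQ
# accession, so a single key yields 3 * 2 = 6 rows.
both <- merge(go, refseq, by = "ENTREZID")
nrow(both)
```

With three or four such columns the row count is the product of the individual counts, which is why requesting them in separate select() calls is usually easier to work with.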
Sometimes you might want to look up matching results in a way that is simpler than the data.frame object that select returns. This is especially true when you only want to look up one kind of value per key. For these cases, we recommend that you look at the mapIds method. Let's look at what happens if we request the same basic information as in our recent select call, but instead using the mapIds method:
mapIds(dog, keys="804478", column="GO", keytype="ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
## 804478
## "GO:0004129"
As you can see, the mapIds method allows you to simplify the result that is returned. And by default, mapIds only returns the first matching element for each key. But what if you really need all those GO terms returned when you call mapIds? Well, then you can make use of the mapIds multiVals argument. There are several options for this argument. We have already seen how by default you can return only the ‘first’ element. But you can also return a ‘list’ or ‘CharacterList’ object, or you can ‘filter’ out or return ‘asNA’ any keys that have multiple matches. You can even define your own rule (as a function) and pass that in as an argument to multiVals. Let's look at what happens when you return a list:
mapIds(dog, keys="804478", column="GO", keytype="ENTREZID", multiVals="list")
## 'select()' returned 1:many mapping between keys and columns
## $`804478`
## [1] "GO:0004129" "GO:0005743" "GO:0006119" "GO:0009060" "GO:0020037"
## [6] "GO:0022904" "GO:0045277" "GO:0046872"
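The multiVals options are easier to remember once you see what each one does to a key with several matches. The following base R sketch mimics them on an invented long-format table. It does not call mapIds, the key and value names are hypothetical, and the exact return types of the real method may differ slightly:

```r
# Toy key-to-value pairs in long form, like the rows a select() call returns.
long <- data.frame(
    key = c("g1", "g1", "g1", "g2"),
    value = c("GO:A", "GO:B", "GO:C", "GO:D"),
    stringsAsFactors = FALSE
)
by_key <- split(long$value, long$key)

# Rough equivalent of multiVals = "first": keep one value per key.
first_only <- vapply(by_key, function(v) v[1], character(1))

# Rough equivalent of multiVals = "list": keep every value, grouped by key.
as_list <- by_key

# Rough equivalent of multiVals = "asNA": keys with several matches become NA.
as_na <- vapply(by_key,
                function(v) if (length(v) > 1) NA_character_ else v,
                character(1))

# Rough equivalent of multiVals = "filter": drop keys with several matches.
filtered <- unlist(by_key[lengths(by_key) == 1])
```

Here g1 has three values and g2 has one, so first_only keeps "GO:A" for g1, as_na turns g1 into NA, and filtered contains only g2.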
Now that you know how to extract information from an OrgDb object, you might find it helpful to know that there is a whole family of other AnnotationDb derived objects that you can also use with these same five methods (keytypes(), columns(), keys(), select(), and mapIds()). For example, there are ChipDb objects, InparanoidDb objects and TxDb objects, which contain data about microarray probes, inparanoid homology partners or transcript range information respectively. And there are also more specialized objects like GODb or ReactomeDb objects, which offer access to data from GO and Reactome. In the next section, we will be looking at one of the more popular classes of these objects: the TxDb object.
Exercise 3: Look at the help page for the different columns and keytypes values with: help("SYMBOL"). Now use this information and what we just described to look up the entrez gene and chromosome for the gene symbol “MSX2”.
Exercise 4: In the previous exercise we had to use gene symbols as keys. But in the past this kind of behavior has sometimes been inadvisable because some gene symbols are used as the official symbol for more than one gene. To learn if this is still happening take advantage of the fact that entrez gene ids are uniquely assigned, and extract all of the gene symbols and their associated entrez gene ids from the org.Hs.eg.db package. Then check the symbols for redundancy.
As mentioned before, TxDb objects can be accessed using the standard set of methods: keytypes(), columns(), keys(), select(), and mapIds(). But because these objects contain information about a transcriptome, they are often used to compare range based information to these important features of the genome[3,4]. As a result they also have specialized accessors for extracting out ranges that correspond to important transcriptome characteristics.
Let's start by loading a TxDb object from an annotation package based on the UCSC knownGene track for Homo sapiens (hg19). A common practice when loading these is to shorten the long name to ‘txdb’ (just as a convenience).
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb
## TxDb object:
## # Db type: TxDb
## # Supporting package: GenomicFeatures
## # Data source: UCSC
## # Genome: hg19
## # Organism: Homo sapiens
## # Taxonomy ID: 9606
## # UCSC Table: knownGene
## # Resource URL: http://genome.ucsc.edu/
## # Type of Gene ID: Entrez Gene ID
## # Full dataset: yes
## # miRBase build ID: GRCh37
## # transcript_nrow: 82960
## # exon_nrow: 289969
## # cds_nrow: 237533
## # Db created by: GenomicFeatures package from Bioconductor
## # Creation time: 2015-10-07 18:11:28 +0000 (Wed, 07 Oct 2015)
## # GenomicFeatures version at creation time: 1.21.30
## # RSQLite version at creation time: 1.0.0
## # DBSCHEMAVERSION: 1.1
Just by looking at the TxDb object, we can learn a lot about what data it contains including where the data came from, which build of the UCSC genome it was based on and the last time that the object was updated. One of the most common uses for a TxDb object is to extract various kinds of transcript data out of it. So for example you can extract all the transcripts out of the TxDb as a GRanges object like this:
txs <- transcripts(txdb)
txs
## GRanges object with 5506 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr3 238279-451097 + | 13060 uc003bot.3
## [2] chr3 238279-451097 + | 13061 uc003bou.3
## [3] chr3 239326-290282 + | 13062 uc003bov.2
## [4] chr3 239326-440831 + | 13063 uc003bow.2
## [5] chr3 361366-451097 + | 13064 uc011asi.2
## ... ... ... ... . ... ...
## [5502] chr18 77732867-77748532 - | 65761 uc002lnr.3
## [5503] chr18 77732867-77748532 - | 65762 uc010drf.3
## [5504] chr18 77732867-77793915 - | 65763 uc010drg.3
## [5505] chr18 77915117-78005397 - | 65764 uc002lny.3
## [5506] chr18 77941005-78005397 - | 65765 uc010xfp.2
## -------
## seqinfo: 2 sequences from hg19 genome
Similarly, there are also extractors for exons(), cds(), genes() and promoters(). Which kind of feature you choose to extract just depends on what information you are after. These basic extractors are fine if you only want a flat representation of these data, but many of these features are inherently nested. So instead of extracting a flat GRanges object, you might choose instead to extract a GRangesList object that groups the transcripts by the genes that they are associated with like this:
txby <- transcriptsBy(txdb, by="gene")
txby
## GRangesList object of length 1612:
## $`1000`
## GRanges object with 2 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr18 25530930-25616539 - | 65378 uc010xbn.1
## [2] chr18 25530930-25757445 - | 65379 uc002kwg.2
## -------
## seqinfo: 2 sequences from hg19 genome
##
## $`100009676`
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr3 101395274-101398057 + | 14200 uc003dvg.3
## -------
## seqinfo: 2 sequences from hg19 genome
##
## $`100101467`
## GRanges object with 3 ranges and 2 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr18 32831023-32870196 - | 65418 uc002kyl.3
## [2] chr18 32831023-32870196 - | 65419 uc002kym.3
## [3] chr18 32843361-32870165 - | 65420 uc002kyn.1
## -------
## seqinfo: 2 sequences from hg19 genome
##
## ...
## <1609 more elements>
Just as with the flat extractors, there is a whole family of extractors available depending on what you want to extract and how you want it grouped. They include transcriptsBy(), exonsBy(), cdsBy(), intronsByTranscript(), fiveUTRsByTranscript() and threeUTRsByTranscript().
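If the difference between the flat and the grouped extractors is not yet clear, a base R analogy may help. The sketch below takes three transcript names and their gene IDs from the output above, puts them in a plain data.frame, and groups them with split(). This is only an analogy: the real extractors return GRanges and GRangesList objects with genomic coordinates, not character vectors:

```r
# Flat representation: one row per transcript (values copied from the output above).
tx <- data.frame(
    gene_id = c("1000", "1000", "100009676"),
    tx_name = c("uc010xbn.1", "uc002kwg.2", "uc003dvg.3"),
    stringsAsFactors = FALSE
)

# Grouped representation: one list element per gene, loosely analogous to
# what transcriptsBy(txdb, by = "gene") does for ranges.
by_gene <- split(tx$tx_name, tx$gene_id)

# Number of transcripts per gene.
lengths(by_gene)
```

The list names play the role of the gene IDs that label each element of the GRangesList.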
When dealing with genomic data it is almost inevitable that you will run into problems with the way that different groups have adopted alternate ways of naming chromosomes. This is because almost every major repository has cooked up their own slightly different way of labeling these important features.
To cope with this, the Seqinfo object was invented and is attached to TxDb objects as well as the GenomicRanges extracted from these objects. You can extract it using the seqinfo() method like this:
si <- seqinfo(txdb)
si
## Seqinfo object with 2 sequences from hg19 genome:
## seqnames seqlengths isCircular genome
## chr3 198022430 NA hg19
## chr18 78077248 NA hg19
And since the seqinfo information is also attached to the GRanges objects produced by the TxDb extractors, you can also call seqinfo on the results of those methods like this:
txby <- transcriptsBy(txdb, by="gene")
si <- seqinfo(txby)
The Seqinfo object contains a lot of valuable data about which chromosome features are present, whether they are circular or linear, and how long each one is. It is also something that will be checked against if you try to do an operation like ‘findOverlaps’ to compute overlapping ranges etc. So it’s a valuable way to make sure that the chromosomes and genome are the same for your annotations as the range that you are comparing them to. But sometimes you may have a situation where your annotation object contains data that is comparable to your data object, but where it is simply named with a different naming style. For those cases, there are helpers that you can use to discover what the current name style is for an object. And there is also a setter method to allow you to change the value to something more appropriate. So in the following example, we are going to change the seqlevelsStyle from the ‘UCSC’ to the ‘NCBI’ naming convention (and then back again).
head(seqlevels(txdb))
## [1] "chr3" "chr18"
seqlevelsStyle(txdb)
## [1] "UCSC"
seqlevelsStyle(txdb) <- "NCBI"
head(seqlevels(txdb))
## [1] "3" "18"
## then change it back
seqlevelsStyle(txdb) <- "UCSC"
head(seqlevels(txdb))
## [1] "chr3" "chr18"
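To make the naming problem concrete, the base R sketch below shows why two sets of chromosome names in different styles share nothing until one of them is converted. The conversion here is a deliberately naive, hand-rolled illustration (strip the "chr" prefix and rename the mitochondrion); it is not how seqlevelsStyle() works internally, and for real data the setter shown above is the better tool:

```r
# Chromosome names in two styles; these vectors are invented for illustration.
ucsc_names <- c("chr3", "chr18", "chrM")
ncbi_names <- c("3", "18", "MT")

# Without conversion the two naming styles have no names in common.
shared_before <- intersect(ucsc_names, ncbi_names)

# Naive conversion: drop the "chr" prefix, then special-case the mitochondrion.
converted <- sub("^chr", "", ucsc_names)
converted[converted == "M"] <- "MT"

# After conversion every name lines up.
shared_after <- intersect(converted, ncbi_names)
```

Special cases like the mitochondrial chromosome are the reason a curated mapping is preferable to ad hoc string editing.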
In addition to being able to change the naming style used for an object with seqinfo data, you can also toggle which of the chromosomes are ‘active’ so that the software will ignore certain chromosomes. By default, all of the chromosomes are set to be ‘active’.
head(isActiveSeq(txdb), n=30)
## chr3 chr18
## TRUE TRUE
But sometimes you might wish to ignore some of them. For example, let's suppose that you wanted to ignore the Y chromosome from our txdb. You could do that like so:
isActiveSeq(txdb)["chrY"] <- FALSE
head(isActiveSeq(txdb), n=26)
Exercise 5: Use the accessors for the TxDb.Hsapiens.UCSC.hg19.knownGene package to retrieve the gene id, transcript name and transcript chromosome for all the transcripts. Do this using both the select() method and also using the transcripts() method. What is the difference in the output?
Exercise 6: Load the TxDb.Athaliana.BioMart.plantsmart22 package. This package is not from UCSC and it is based on plantsmart. Now use select or one of the range based accessors to look at the gene ids from this TxDb object. How do they compare to what you saw in the TxDb.Hsapiens.UCSC.hg19.knownGene package?
So what happens if you have data from multiple different annotation objects? For example, what if you had gene SYMBOLS (found in an OrgDb object) and you wanted to easily match those up with known gene transcript names from a UCSC based TxDb object? There is an ideal tool that can help with this kind of problem, and it’s called an src_organism object from the Organism.dplyr package. src_organism objects and their related methods are able to query each of the OrgDb and TxDb resources for you and then merge the results back together in a way that lets you pretend that you only have one source for all your annotations.
library(Organism.dplyr)
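Conceptually, the merging step amounts to joining gene-level and transcript-level tables on a shared gene identifier. The base R sketch below illustrates that idea with two invented tables. The column names and values are hypothetical, and the real src_organism object stores its tables in an SQLite database rather than in data.frames:

```r
# Toy stand-ins: an OrgDb-like table (gene ID to symbol) and a TxDb-like table
# (gene ID to transcript name). All values are invented.
org_like <- data.frame(
    entrez = c("1", "2"),
    symbol = c("GENEA", "GENEB"),
    stringsAsFactors = FALSE
)
tx_like <- data.frame(
    entrez = c("1", "1", "2"),
    tx_name = c("txA.1", "txA.2", "txB.1"),
    stringsAsFactors = FALSE
)

# Joining on the shared gene ID gives one row per symbol and transcript pair.
combined <- merge(org_like, tx_like, by = "entrez")

# Transcript names for a given symbol, looked up as if from a single source.
genea_tx <- combined$tx_name[combined$symbol == "GENEA"]
genea_tx
```

Once the join is done, a question that spans both resources (symbol to transcript name) becomes a single lookup.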
src_organism objects can be created for organisms that have both an OrgDb and a TxDb. To see organisms that can have src_organism objects made, use the function supportedOrganisms():
supported <- supportedOrganisms()
print(supported, n=Inf)
## # A tibble: 21 × 3
## organism OrgDb TxDb
## <chr> <chr> <chr>
## 1 Bos taurus org.Bt.eg.db TxDb.Btaurus.UCSC.bosTau8.refGene
## 2 Caenorhabditis elegans org.Ce.eg.db TxDb.Celegans.UCSC.ce11.refGene
## 3 Caenorhabditis elegans org.Ce.eg.db TxDb.Celegans.UCSC.ce6.ensGene
## 4 Canis familiaris org.Cf.eg.db TxDb.Cfamiliaris.UCSC.canFam3.refGene
## 5 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm3.ensGene
## 6 Drosophila melanogaster org.Dm.eg.db TxDb.Dmelanogaster.UCSC.dm6.ensGene
## 7 Danio rerio org.Dr.eg.db TxDb.Drerio.UCSC.danRer10.refGene
## 8 Gallus gallus org.Gg.eg.db TxDb.Ggallus.UCSC.galGal4.refGene
## 9 Homo sapiens org.Hs.eg.db TxDb.Hsapiens.UCSC.hg18.knownGene
## 10 Homo sapiens org.Hs.eg.db TxDb.Hsapiens.UCSC.hg19.knownGene
## 11 Homo sapiens org.Hs.eg.db TxDb.Hsapiens.UCSC.hg38.knownGene
## 12 Mus musculus org.Mm.eg.db TxDb.Mmusculus.UCSC.mm10.ensGene
## 13 Mus musculus org.Mm.eg.db TxDb.Mmusculus.UCSC.mm10.knownGene
## 14 Mus musculus org.Mm.eg.db TxDb.Mmusculus.UCSC.mm9.knownGene
## 15 Macaca mulatta org.Mmu.eg.db TxDb.Mmulatta.UCSC.rheMac3.refGene
## 16 Macaca mulatta org.Mmu.eg.db TxDb.Mmulatta.UCSC.rheMac8.refGene
## 17 Pan troglodytes org.Pt.eg.db TxDb.Ptroglodytes.UCSC.panTro4.refGene
## 18 Rattus norvegicus org.Rn.eg.db TxDb.Rnorvegicus.UCSC.rn4.ensGene
## 19 Rattus norvegicus org.Rn.eg.db TxDb.Rnorvegicus.UCSC.rn5.refGene
## 20 Rattus norvegicus org.Rn.eg.db TxDb.Rnorvegicus.UCSC.rn6.refGene
## 21 Sus scrofa org.Ss.eg.db TxDb.Sscrofa.UCSC.susScr3.refGene
Notice how there are multiple entries for a single organism (e.g. three for Homo sapiens). There is only one OrgDb per organism, but different TxDbs can be used. To specify a certain version of a TxDb to use, we can use the src_organism() function to create an src_organism object.
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
## creating 'src_organism' database...
src
## src: sqlite 3.47.1 [/tmp/Rtmp6vcSdT/file1ee0eb90a18]
## tbls: id, id_accession, id_go, id_go_all, id_omim_pm, id_protein,
## id_transcript, ranges_cds, ranges_exon, ranges_gene, ranges_tx
We can also create one using the src_ucsc() function. This will create an src_organism object using the most recent TxDb version available:
src <- src_ucsc("Homo sapiens")
src
## src: sqlite 3.47.1 [/tmp/Rtmp6vcSdT/file1ee0eb90a18]
## tbls: id, id_accession, id_go, id_go_all, id_omim_pm, id_protein,
## id_transcript, ranges_cds, ranges_exon, ranges_gene, ranges_tx
The five methods that worked for all of the other Db objects that we have discussed (keytypes(), columns(), keys(), select(), and mapIds()) all work for src_organism objects. Here, we use keytypes() to show which keytypes can be passed to the keytype argument of select().
keytypes(src)
## [1] "accnum" "alias" "cds_chrom" "cds_end" "cds_id"
## [6] "cds_name" "cds_start" "cds_strand" "ensembl" "ensemblprot"
## [11] "ensembltrans" "entrez" "enzyme" "evidence" "evidenceall"
## [16] "exon_chrom" "exon_end" "exon_id" "exon_name" "exon_rank"
## [21] "exon_start" "exon_strand" "gene_chrom" "gene_end" "gene_start"
## [26] "gene_strand" "genename" "go" "goall" "ipi"
## [31] "map" "omim" "ontology" "ontologyall" "pfam"
## [36] "pmid" "prosite" "refseq" "symbol" "tx_chrom"
## [41] "tx_end" "tx_id" "tx_name" "tx_start" "tx_strand"
## [46] "tx_type" "uniprot"
Use columns() to show which values can be passed to the columns argument of select().
columns(src)
## [1] "accnum" "alias" "cds_chrom" "cds_end" "cds_id"
## [6] "cds_name" "cds_start" "cds_strand" "ensembl" "ensemblprot"
## [11] "ensembltrans" "entrez" "enzyme" "evidence" "evidenceall"
## [16] "exon_chrom" "exon_end" "exon_id" "exon_name" "exon_rank"
## [21] "exon_start" "exon_strand" "gene_chrom" "gene_end" "gene_start"
## [26] "gene_strand" "genename" "go" "goall" "ipi"
## [31] "map" "omim" "ontology" "ontologyall" "pfam"
## [36] "pmid" "prosite" "refseq" "symbol" "tx_chrom"
## [41] "tx_end" "tx_id" "tx_name" "tx_start" "tx_strand"
## [46] "tx_type" "uniprot"
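The remaining two methods, keys() and mapIds(), can be sketched in the same way. This is a minimal example that rebuilds the src_organism object so it runs on its own; the Entrez ID "4488" is the one used elsewhere in this walkthrough, and the variable name sym is just an illustrative choice.

```r
library(Organism.dplyr)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

## Rebuild the src_organism object so that this example is self-contained
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

## A few of the available keys of type "symbol"
head(keys(src, keytype = "symbol"))

## Map a single Entrez gene ID to its gene symbol
sym <- mapIds(src, keys = "4488", column = "symbol", keytype = "entrez")
sym
```

As with OrgDb objects, mapIds() is convenient when you want one column back for a set of keys rather than a full table.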
You can now use a src_organism object in the same way that you would use an OrgDb or TxDb object; it behaves like the base objects that it contains:
select(src, keys="4488", columns=c("symbol", "tx_name"), keytype="entrez")
## Joining with `by = join_by(entrez)`
## entrez symbol tx_name
## 1 4488 MSX2 ENST00000239243.7
## 2 4488 MSX2 ENST00000507785.2
## 3 4488 MSX2 ENST00000239243.7
## 4 4488 MSX2 ENST00000507785.2
## 5 4488 MSX2 ENST00000239243.7
## 6 4488 MSX2 ENST00000507785.2
## 7 4488 MSX2 ENST00000239243.7
## 8 4488 MSX2 ENST00000507785.2
## 9 4488 MSX2 ENST00000239243.7
## 10 4488 MSX2 ENST00000507785.2
## 11 4488 MSX2 ENST00000239243.7
## 12 4488 MSX2 ENST00000507785.2
## 13 4488 MSX2 ENST00000239243.7
## 14 4488 MSX2 ENST00000507785.2
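The result above lists the same two transcripts several times. If you only need the distinct rows, one simple option (a sketch, not something the package requires) is to wrap the result in base R's unique():

```r
library(Organism.dplyr)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

## Rebuild the src_organism object so that this example is self-contained
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

## Same query as above, stored so that it can be post-processed
res <- select(src, keys = "4488", columns = c("symbol", "tx_name"),
              keytype = "entrez")

## Keep each entrez/symbol/tx_name combination only once
resUnique <- unique(res)
resUnique
```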
Organism.dplyr also supports numerous genomic extractor functions that allow users to filter based on information contained in the OrgDb and TxDb objects. To see the filters supported by a src_organism object, use supportedFilters():
head(supportedFilters(src))
## filter field
## 1 AccnumFilter accnum
## 2 AliasFilter alias
## 3 CdsChromFilter cds_chrom
## 44 CdsEndFilter cds_end
## 42 CdsIdFilter cds_id
## 4 CdsNameFilter cds_name
The range-based accessors, such as those in GenomicFeatures, also work. There are also "_tbl" functions (e.g. transcripts_tbl()) that return tbl objects instead of GRanges objects. Complex filter statements can be given as input. Here we declare a GRangesFilter and use two accessors with different return types to query transcripts that either have a symbol starting with “SNORD” and fall within the given GRangesFilter, or have the symbol “ADA”:
gr <- GRangesFilter(GenomicRanges::GRanges("chr1:44000000-55000000"))
transcripts(src, filter=~(symbol %startsWith% "SNORD" & gr) | symbol == "ADA")
## GRanges object with 66 ranges and 3 metadata columns:
## seqnames ranges strand | tx_id tx_name
## <Rle> <IRanges> <Rle> | <integer> <character>
## [1] chr1 44775864-44775943 + | 4620 ENST00000581525.1
## [2] chr1 44776490-44776593 + | 4621 ENST00000364043.1
## [3] chr1 44777843-44777912 + | 4624 ENST00000365161.1
## [4] chr1 44778390-44778456 + | 4626 ENST00000384690.1
## [5] chr1 44778390-44778458 + | 4627 ENST00000625943.1
## ... ... ... ... . ... ...
## [62] chr20 44623752-44651678 - | 357210 ENST00000695997.1
## [63] chr20 44623972-44651718 - | 357211 ENST00000696009.1
## [64] chr20 44626323-44651661 - | 357212 ENST00000545776.5
## [65] chr20 44627547-44651720 - | 357213 ENST00000696010.1
## [66] chr20 44636071-44652233 - | 357214 ENST00000535573.1
## symbol
## <character>
## [1] SNORD55
## [2] SNORD46
## [3] SNORD38A
## [4] SNORD38B
## [5] SNORD38B
## ... ...
## [62] ADA
## [63] ADA
## [64] ADA
## [65] ADA
## [66] ADA
## -------
## seqinfo: 711 sequences (1 circular) from hg38 genome
transcripts_tbl(src, filter=~(symbol %startsWith% "SNORD" & gr) | symbol == "ADA")
## # A tibble: 66 × 7
## tx_chrom tx_start tx_end tx_strand tx_id tx_name symbol
## <chr> <int> <int> <chr> <int> <chr> <chr>
## 1 chr1 44775864 44775943 + 4620 ENST00000581525.1 SNORD55
## 2 chr1 44776490 44776593 + 4621 ENST00000364043.1 SNORD46
## 3 chr1 44777843 44777912 + 4624 ENST00000365161.1 SNORD38A
## 4 chr1 44778390 44778456 + 4626 ENST00000384690.1 SNORD38B
## 5 chr1 44778390 44778458 + 4627 ENST00000625943.1 SNORD38B
## 6 chr20 44584896 44651702 - 357154 ENST00000696034.1 ADA
## 7 chr20 44618605 44651745 - 357155 ENST00000537820.2 ADA
## 8 chr20 44618618 44651699 - 357156 ENST00000696003.1 ADA
## 9 chr20 44618625 44651699 - 357157 ENST00000696004.1 ADA
## 10 chr20 44619521 44651678 - 357158 ENST00000695991.1 ADA
## # ℹ 56 more rows
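Because the "_tbl" accessors return a tibble, the result can be passed on to ordinary dplyr verbs. The following sketch uses a simpler single-condition filter and then summarises the result. The dplyr:: prefixes are used only to avoid any ambiguity with functions of the same name in other attached packages.

```r
library(Organism.dplyr)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

## Rebuild the src_organism object so that this example is self-contained
src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

## All transcripts annotated with the symbol "ADA", returned as a tibble
ada <- transcripts_tbl(src, filter = ~symbol == "ADA")

## Count transcripts per chromosome and strand
dplyr::count(ada, tx_chrom, tx_strand)

## Order the transcripts by their start coordinate
dplyr::arrange(ada, tx_start)
```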
Exercise 7: Use the src_organism object to look up the gene symbol, transcript start and chromosome using select(). Then do the same thing using transcripts. You might expect that this call to transcripts will look the same as it did for the TxDb object, but (temporarily) it will not.
Exercise 8: Look at the results from calling the columns method on the src_organism object and compare that to what happens when you call columns on the org.Hs.eg.db object. Then look at a call to columns on the TxDb.Hsapiens.UCSC.hg19.knownGene object.
Exercise 9: Use the src_organism object with the transcripts method to look up the entrez gene IDs for all gene symbols that contain the letter ‘X’.
Another important annotation resource type is a BSgenome package[10]. There are many BSgenome packages in the repository for you to choose from. And you can learn which organisms are already supported by using the available.genomes() function.
head(available.genomes())
## [1] "BSgenome.Alyrata.JGI.v1"
## [2] "BSgenome.Amellifera.BeeBase.assembly4"
## [3] "BSgenome.Amellifera.NCBI.AmelHAv3.1"
## [4] "BSgenome.Amellifera.UCSC.apiMel2"
## [5] "BSgenome.Amellifera.UCSC.apiMel2.masked"
## [6] "BSgenome.Aofficinalis.NCBI.V1"
Unlike the other resources that we have discussed here, these packages are meant to contain sequence data for a specific genome build of an organism. You can load one of these packages in the usual way. And each of them normally has an alias for the primary object that is shorter than the full package name (as a convenience):
ls(2)
## character(0)
Hsapiens
## | BSgenome object for Human
## | - organism: Homo sapiens
## | - provider: UCSC
## | - genome: hg19
## | - release date: June 2013
## | - 298 sequence(s):
## | chr1 chr2 chr3
## | chr4 chr5 chr6
## | chr7 chr8 chr9
## | chr10 chr11 chr12
## | chr13 chr14 chr15
## | ... ... ...
## | chr19_gl949749_alt chr19_gl949750_alt chr19_gl949751_alt
## | chr19_gl949752_alt chr19_gl949753_alt chr20_gl383577_alt
## | chr21_gl383578_alt chr21_gl383579_alt chr21_gl383580_alt
## | chr21_gl383581_alt chr22_gl383582_alt chr22_gl383583_alt
## | chr22_kb663609_alt
## |
## | Tips: call 'seqnames()' on the object to get all the sequence names, call
## | 'seqinfo()' to get the full sequence info, use the '$' or '[[' operator to
## | access a given sequence, see '?BSgenome' for more information.
The getSeq method is a useful way of extracting data from these packages. It takes several arguments, but the important ones are the first two. The first argument specifies the BSgenome object to use and the second argument (names) specifies what data you want back. For example, if you pass a character vector of seqnames for the object, you will get the sequences of those chromosomes back as a DNAStringSet object.
seqNms <- seqnames(Hsapiens)
head(seqNms)
## [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6"
getSeq(Hsapiens, seqNms[1:2])
## DNAStringSet object of length 2:
## width seq names
## [1] 249250621 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr1
## [2] 243199373 NNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNN chr2
If you instead give a GRanges object as the second argument, you get a DNAStringSet that corresponds to those ranges. This can be a powerful way to learn what sequence is present in a particular range. For example, here we extract the sequences for the transcripts of a specific gene of interest:
txby <- transcriptsBy(txdb, by="gene")
geneOfInterest <- txby[["4488"]]
res <- getSeq(Hsapiens, geneOfInterest)
res
Additionally, the Biostrings[11] package has many useful functions for finding a pattern in a string set etc. You may not have noticed when it happened, but the Biostrings package was loaded when you loaded the BSgenome object, so these functions will already be available for you to explore.
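As a small illustration of what those Biostrings functions look like in practice, the sketch below extracts a short sequence and summarises it. The genomic window is arbitrary and was chosen only for illustration; it does not correspond to any gene discussed in this walkthrough.

```r
library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)

## An arbitrary 51 bp window on chr1, used purely as an example
gr <- GRanges("chr1:1000000-1000050")
dna <- getSeq(Hsapiens, gr)

## Length of the extracted sequence
width(dna)

## Fraction of G and C bases in the sequence
letterFrequency(dna, "GC", as.prob = TRUE)

## Reverse complement of the sequence
reverseComplement(dna)

## Number of times the dinucleotide "CG" occurs
vcountPattern("CG", dna)
```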
Exercise 10: Use what you have just learned to extract the sequence for the PTEN gene.
Another great annotation resource is the biomaRt package[5,6,7]. The biomaRt package exposes a huge family of different online annotation resources called marts. Each mart is one of a set of online web resources that follow a convention allowing them to work with this package. Historically these marts were maintained by various projects around the world; however, the majority are now maintained as part of Ensembl, and we’ll focus on that resource here. If you wish to access another BioMart instance, see the biomaRt vignette Using a BioMart other than Ensembl.
The first step in using biomaRt is always to load the package and then decide which “mart” you want to use. Once you have made your decision, you will then use the useEnsembl() method to create a mart object in your R session. Here we are looking at the marts available and then choosing to use one of the most popular marts: the Ensembl “genes” mart.
listEnsembl()
## biomart version
## 1 genes Ensembl Genes 113
## 2 mouse_strains Mouse strains 113
## 3 snps Ensembl Variation 113
## 4 regulation Ensembl Regulation 113
ensembl <- useEnsembl(biomart = "genes")
ensembl
## Object of class 'Mart':
## Using the ENSEMBL_MART_ENSEMBL BioMart database
## No dataset selected.
Each ‘mart’ can contain datasets for multiple different things. In our example here the “genes” mart contains separate datasets for a large number of organisms. So the next step is that you need to decide on a dataset. Once you have chosen one, you will need to specify that dataset using the dataset argument when you call the useEnsembl() constructor method. Here we will point to the dataset for humans.
head(listDatasets(ensembl))
## dataset description
## 1 abrachyrhynchus_gene_ensembl Pink-footed goose genes (ASM259213v1)
## 2 acalliptera_gene_ensembl Eastern happy genes (fAstCal1.3)
## 3 acarolinensis_gene_ensembl Green anole genes (AnoCar2.0v2)
## 4 acchrysaetos_gene_ensembl Golden eagle genes (bAquChr1.2)
## 5 acitrinellus_gene_ensembl Midas cichlid genes (Midas_v5)
## 6 amelanoleuca_gene_ensembl Giant panda genes (ASM200744v2)
## version
## 1 ASM259213v1
## 2 fAstCal1.3
## 3 AnoCar2.0v2
## 4 bAquChr1.2
## 5 Midas_v5
## 6 ASM200744v2
ensembl <- useEnsembl(biomart="genes", dataset="hsapiens_gene_ensembl")
ensembl
## Object of class 'Mart':
## Using the ENSEMBL_MART_ENSEMBL BioMart database
## Using the hsapiens_gene_ensembl dataset
Next we need to think about attributes, values and filters. Let's start with attributes. You can get a listing of the different kinds of attributes from biomaRt by using the listAttributes method:
head(listAttributes(ensembl))
## name description page
## 1 ensembl_gene_id Gene stable ID feature_page
## 2 ensembl_gene_id_version Gene stable ID version feature_page
## 3 ensembl_transcript_id Transcript stable ID feature_page
## 4 ensembl_transcript_id_version Transcript stable ID version feature_page
## 5 ensembl_peptide_id Protein stable ID feature_page
## 6 ensembl_peptide_id_version Protein stable ID version feature_page
And you can see what the values for a particular attribute are by using the getBM method:
head(getBM(attributes="chromosome_name", mart=ensembl))
## chromosome_name
## 1 1
## 2 10
## 3 11
## 4 12
## 5 13
## 6 14
Attributes are the things that you can have returned from biomaRt. They are analogous to what you get when you use the columns method with other objects.
In the biomaRt package, filters are things that can be used with values to restrict or choose what comes back. The ‘values’ here are treated as keys that you are passing in and which you would like to know more information about. In contrast, the filter represents the kind of key that you are searching for. So for example, you might choose a filter name of “chromosome_name” to go with specific value of “1”. Together these two argument values would request whatever attributes matched things on the 1st chromosome. Just as there is an accessor for attributes, there is also an accessor to list all available filters:
head(listFilters(ensembl))
## name description
## 1 chromosome_name Chromosome/scaffold name
## 2 start Start
## 3 end End
## 4 band_start Band Start
## 5 band_end Band End
## 6 marker_start Marker Start
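The full attribute and filter listings are long, so it can help to search them by pattern instead of paging through them. The sketch below uses the searchAttributes() and searchFilters() helpers from biomaRt; the search patterns are only examples, and the call needs a working internet connection to reach Ensembl.

```r
library(biomaRt)

## Recreate the mart object so that this example is self-contained
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

## Attributes whose name or description mentions "entrez"
entrezAttrs <- searchAttributes(mart = ensembl, pattern = "entrez")
entrezAttrs

## Filters whose name or description mentions "chromosome"
chromFilters <- searchFilters(mart = ensembl, pattern = "chromosome")
chromFilters
```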
Now that you know about attributes, values and filters, you can call the getBM() method to put it all together and request specific data from the mart. For example, the following requests gene symbols and NCBI Gene (formerly called ‘entrezgene’) IDs that are found on chromosome 1 of humans:
res <- getBM(attributes = c("hgnc_symbol", "entrezgene_id"),
filters = "chromosome_name",
values = "1",
mart = ensembl)
head(res)
## hgnc_symbol entrezgene_id
## 1 727856
## 2 100287102
## 3 DDX11L1 NA
## 4 WASH7P 653635
## 5 WASH7P NA
## 6 MIR6859-1 102466751
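More than one filter can be combined in a single getBM() call, in which case the values are supplied as a list with one element per filter. The following sketch restricts the query to an arbitrary region of chromosome 1 using the chromosome_name, start and end filters listed earlier; the coordinates are illustrative only.

```r
library(biomaRt)

## Recreate the mart object so that this example is self-contained
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

## Genes on chromosome 1 between 1 Mb and 2 Mb (arbitrary example region)
region <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
                               "start_position", "end_position"),
                filters = c("chromosome_name", "start", "end"),
                values = list("1", 1000000, 2000000),
                mart = ensembl)
head(region)
```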
Of course you may have noticed that a lot of the arguments for getBM are very similar to what you do when working with OrgDb objects. So if it’s your preference you can also use the standard select(), columns(), keytypes() etc methods with mart objects.
head(columns(ensembl))
## [1] "3_utr_end" "3_utr_end" "3_utr_start" "3_utr_start" "3utr"
## [6] "5_utr_end"
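For comparison with getBM(), here is a sketch of the same kind of lookup written with select(). In this interface the keytype plays the role of the filter, the keys are the values, and the columns are the attributes. The gene symbol "MSX2" is the one used elsewhere in this walkthrough.

```r
library(biomaRt)

## Recreate the mart object so that this example is self-contained
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

## Look up the NCBI Gene ID for one HGNC symbol
msx2 <- select(ensembl,
               keys = "MSX2",
               keytype = "hgnc_symbol",
               columns = c("hgnc_symbol", "entrezgene_id"))
msx2
```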
Exercise 11: Pull down GO terms for entrez gene id “1” from human by using the ensembl “hsapiens_gene_ensembl” dataset.
Exercise 12: Now compare the GO terms you just pulled down to the same GO terms from the org.Hs.eg.db package (which you can now retrieve using select()). What differences do you notice? Why do you suspect that is?
By now you are aware that Bioconductor has a lot of annotation resources. But it is still completely impossible to have every annotation resource pre-packaged for every conceivable use. Because of this, almost all annotation objects have special functions that can be called to create those objects (or the packages that load them) from generalized data resources or specific file types. Below is a table with a few of the more popular options.
If you want this | And you have this | Then you could call this to help |
---|---|---|
TxDb | tracks from UCSC | GenomicFeatures::makeTxDbPackageFromUCSC |
TxDb | data from biomaRt | GenomicFeatures::makeTxDbPackageFromBiomart |
TxDb | gff or gtf file | GenomicFeatures::makeTxDbFromGFF |
OrgDb | custom data.frames | AnnotationForge::makeOrgPackage |
OrgDb | valid Taxonomy ID | AnnotationForge::makeOrgPackageFromNCBI |
ChipDb | org package & data.frame | AnnotationForge::makeChipPackage |
BSgenome | fasta or twobit sequence files | BSgenome::forgeBSgenomeDataPkg |
In most cases the output for resource creation functions will be an annotation package that you can install.
Unfortunately there is not enough space to demonstrate how to call each of these functions here. Doing so is fairly straightforward, though, and most of these functions are well documented in their associated manual pages and vignettes[3,4,10,12]. As usual, you can see the help page for any function right inside of R.
help("makeTxDbPackageFromUCSC")
If you plan to make use of these kinds of functions then you should expect to consult the associated documentation first. These functions tend to have a lot of arguments, and most of them also require that their input data meet some fairly specific criteria. Finally, you should know that even after you have succeeded at creating an annotation package, you will also have to use the install.packages() function (with repos = NULL) to install the package source directory that has just been created.
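The installation step can be sketched as follows. The directory name TxDb.Example.pkg is hypothetical and stands in for whatever source directory your package-making function created. The type = "source" argument is included because the directory is a source package.

```r
## Hypothetical path to a package source directory created by one of the
## package-making functions listed in the table above
pkgDir <- "TxDb.Example.pkg"

## Only attempt the installation if the directory actually exists
if (dir.exists(pkgDir)) {
    install.packages(pkgDir, repos = NULL, type = "source")
} else {
    message("No package source directory found at: ", pkgDir)
}
```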
The Bioconductor project represents a very large and active codebase from an active and engaged community. Because of this, you should expect that the software described in this walkthrough will change over time, often in dramatic ways. As an example, the getSeq function that is described in this chapter is expected to undergo a big overhaul in the coming months. When this happens the older function will be deprecated for a full release cycle (6 months) and then labeled as defunct for another release cycle before it is removed. This cycle is in place so that active users can be warned about what is happening and where they should look for the appropriate replacement functionality. But obviously, this system cannot warn end users if they have not been vigilant about updating their software to the latest version. So please take the time to always update your software to the latest version.
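One way to check whether an installation is current is with the BiocManager package. The sketch below reports the Bioconductor version in use and lists packages that are out of date or too new; the final update step is left commented out because it changes the installed library.

```r
## Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## The Bioconductor version that this R installation is using
biocVersion <- BiocManager::version()
biocVersion

## Report packages that are out of date or too new for this version
BiocManager::valid()

## Update all out-of-date packages (uncomment to run)
## BiocManager::install()
```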
To stay abreast of new developments, users are encouraged to explore the Bioconductor website, which contains many current walkthroughs and vignettes. Also visit the support site, where you can ask questions and engage in discussions.
Package versions used in this tutorial:
sessionInfo()
## R Under development (unstable) (2025-03-13 r87965)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] annotation_1.31.1
## [2] TxDb.Athaliana.BioMart.plantsmart22_3.0.1
## [3] biomaRt_2.63.3
## [4] BSgenome.Hsapiens.UCSC.hg19_1.4.3
## [5] BSgenome_1.75.1
## [6] rtracklayer_1.67.1
## [7] BiocIO_1.17.1
## [8] Homo.sapiens_1.3.1
## [9] GO.db_3.21.0
## [10] OrganismDbi_1.49.0
## [11] org.Mm.eg.db_3.21.0
## [12] org.Hs.eg.db_3.21.0
## [13] TxDb.Mmusculus.UCSC.mm10.ensGene_3.4.0
## [14] TxDb.Hsapiens.UCSC.hg38.knownGene_3.21.0
## [15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [16] GenomicFeatures_1.59.1
## [17] AnnotationDbi_1.69.0
## [18] Organism.dplyr_1.35.1
## [19] AnnotationFilter_1.31.0
## [20] dplyr_1.1.4
## [21] AnnotationHub_3.15.0
## [22] BiocFileCache_2.15.1
## [23] dbplyr_2.5.0
## [24] VariantAnnotation_1.53.1
## [25] Rsamtools_2.23.1
## [26] Biostrings_2.75.4
## [27] XVector_0.47.2
## [28] SummarizedExperiment_1.37.0
## [29] Biobase_2.67.0
## [30] GenomicRanges_1.59.1
## [31] GenomeInfoDb_1.43.4
## [32] IRanges_2.41.3
## [33] S4Vectors_0.45.4
## [34] MatrixGenerics_1.19.1
## [35] matrixStats_1.5.0
## [36] BiocGenerics_0.53.6
## [37] generics_0.1.3
## [38] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 bitops_1.0-9 RBGL_1.83.0
## [4] httr2_1.1.2 rlang_1.1.5 magrittr_2.0.3
## [7] compiler_4.6.0 RSQLite_2.3.9 png_0.1-8
## [10] vctrs_0.6.5 txdbmaker_1.3.1 stringr_1.5.1
## [13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0
## [16] utf8_1.2.4 rmarkdown_2.29 graph_1.85.3
## [19] UCSC.utils_1.3.1 purrr_1.0.4 bit_4.6.0
## [22] xfun_0.52 cachem_1.1.0 jsonlite_2.0.0
## [25] progress_1.2.3 blob_1.2.4 DelayedArray_0.33.6
## [28] BiocParallel_1.41.5 parallel_4.6.0 prettyunits_1.2.0
## [31] R6_2.6.1 bslib_0.9.0 stringi_1.8.7
## [34] jquerylib_0.1.4 bookdown_0.42 knitr_1.50
## [37] Matrix_1.7-3 tidyselect_1.2.1 abind_1.4-8
## [40] yaml_2.3.10 codetools_0.2-20 curl_6.2.2
## [43] lattice_0.22-7 tibble_3.2.1 withr_3.0.2
## [46] KEGGREST_1.47.0 evaluate_1.0.3 xml2_1.3.8
## [49] pillar_1.10.2 BiocManager_1.30.25 filelock_1.0.3
## [52] RCurl_1.98-1.17 BiocVersion_3.21.1 hms_1.1.3
## [55] glue_1.8.0 lazyeval_0.2.2 tools_4.6.0
## [58] GenomicAlignments_1.43.0 XML_3.99-0.18 grid_4.6.0
## [61] GenomeInfoDbData_1.2.14 restfulr_0.0.15 cli_3.6.4
## [64] rappdirs_0.3.3 S4Arrays_1.7.3 sass_0.4.9
## [67] digest_0.6.37 SparseArray_1.7.7 rjson_0.2.23
## [70] memoise_2.0.1 htmltools_0.5.8.1 lifecycle_1.0.4
## [73] httr_1.4.7 mime_0.13 bit64_4.6.0-1
Research reported in this chapter was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number U41HG004059 and by the National Cancer Institute of the National Institutes of Health under Award Number U24CA180996. We also want to thank the numerous institutions who produced and maintained the data that is used for generating and updating the annotation resources described here.
[1] Wolfgang Huber, Vincent J Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S Carvalho, Hector Corrada Bravo, Sean Davis, Laurent Gatto, Thomas Girke, Raphael Gottardo, Florian Hahne, Kasper D Hansen, Rafael A Irizarry, Michael Lawrence, Michael I Love, James MacDonald, Valerie Obenchain, Andrzej K Oleś, Hervé Pagès, Alejandro Reyes, Paul Shannon, Gordon K Smyth, Dan Tenenbaum, Levi Waldron & Martin Morgan (2015) Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods 12:115-121
[2] Pages H, Carlson M, Falcon S and Li N. AnnotationDbi: Annotation Database Interface. R package version 1.30.0.
[3] M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar, M. Lawrence. GenomicFeatures: Tools for making and manipulating transcript centric annotations. Version 1.19.38.
[4] Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M and Carey V (2013). Software for Computing and Annotating Genomic Ranges. PLoS Computational Biology, 9. http://dx.doi.org/10.1371/journal.pcbi.1003118, http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118
[5] Steffen Durinck, Wolfgang Huber. biomaRt: Interface to BioMart databases (e.g. Ensembl, COSMIC, Wormbase and Gramene). Version 2.23.5.
[6] Durinck S, Spellman P, Birney E and Huber W (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nature Protocols, 4, pp. 1184-1191.
[7] Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A and Huber W (2005). BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics, 21, pp. 3439-3440.
[8] Morgan M, Carlson M, Tenenbaum D and Arora S. AnnotationHub: Client to access AnnotationHub resources. R package version 2.0.1.
[9] Carlson M, Pages H, Morgan M and Obenchain V. OrganismDbi: Software to enable the smooth interfacing of different database packages. R package version 1.10.0.
[10] Pages H. BSgenome: Infrastructure for Biostrings-based genome data packages. R package version 1.36.0.
[11] Pages H, Aboyoun P, Gentleman R and DebRoy S. Biostrings: String objects representing biological sequences, and matching algorithms. R package version 2.36.0.
[12] Carlson M, and Pages H. AnnotationForge: Code for Building Annotation Database Packages. R package version 1.10.0.
The first thing you need to do is look for things from UCSC:
ahs <- query(ah, "UCSC")
Then you can look for Genome values that match ‘hg19’ and a species that matches ‘Homo sapiens’.
ahs <- subset(ahs, ahs$genome=='hg19')
length(ahs)
## [1] 5908
ahs <- subset(ahs, ahs$species=='Homo sapiens')
length(ahs)
## [1] 5908
You might notice that the last two filtering steps are redundant (in other words, doing the first of them gives the same result as doing both). If this were not the case, we might suspect that there was a problem with the metadata.
This pulls down the oreganno annotations, which are described on the UCSC site as follows: “This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation). For more detailed information on a particular regulatory element, follow the link to ORegAnno from the details page.”
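The search can also be written as a single call, because query() accepts a character vector of patterns and keeps only the records that match all of them. This is a sketch rather than an exact equivalent of the subsetting above: query() matches the patterns against the record metadata in general, not against one specific column, so the number of records returned may differ. The variable names are illustrative.

```r
library(AnnotationHub)

## Create the hub object (this contacts the AnnotationHub server)
ah <- AnnotationHub()

## Records whose metadata match all three patterns
ahsAll <- query(ah, c("UCSC", "hg19", "Homo sapiens"))
length(ahsAll)

## For comparison, the records that match "UCSC" alone
ahsUcsc <- query(ah, "UCSC")
length(ahsUcsc)
```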
ahs <- query(ah, 'oreganno')
ahs
## AnnotationHub with 9 records
## # snapshotDate(): 2025-04-08
## # $dataprovider: Pazar, UCSC
## # $species: Saccharomyces cerevisiae, Homo sapiens, NA
## # $rdataclass: GRanges
## # additional mcols(): taxonomyid, genome, description,
## # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## # rdatapath, sourceurl, sourcetype
## # retrieve records with, e.g., 'object[["AH5087"]]'
##
## title
## AH5087 | ORegAnno
## AH5213 | ORegAnno
## AH7053 | ORegAnno
## AH7061 | ORegAnno
## AH22286 | pazar_ORegAnno_20120522.csv
## AH22287 | pazar_ORegAnno_ENCODEprom_20120522.csv
## AH22288 | pazar_ORegAnno_Erythroid_20120522.csv
## AH22289 | pazar_ORegAnno_STAT1_ChIP_20120522.csv
## AH22290 | pazar_ORegAnno_STAT1_lit_20120522.csv
ahs[1]
## AnnotationHub with 1 record
## # snapshotDate(): 2025-04-08
## # names(): AH5087
## # $dataprovider: UCSC
## # $species: Homo sapiens
## # $rdataclass: GRanges
## # $rdatadateadded: 2013-03-26
## # $title: ORegAnno
## # $description: GRanges object from UCSC track 'ORegAnno'
## # $taxonomyid: 9606
## # $genome: hg19
## # $sourcetype: UCSC track
## # $sourceurl: rtracklayer://hgdownload.cse.ucsc.edu/goldenpath/hg19/database...
## # $sourcesize: NA
## # $tags: c("oreganno", "UCSC", "track", "Gene", "Transcript",
## # "Annotation")
## # retrieve record with 'object[["AH5087"]]'
oreg <- ahs[['AH5087']]
## loading from cache
oreg
## GRanges object with 23118 ranges and 2 metadata columns:
## seqnames ranges strand | name score
## <Rle> <IRanges> <Rle> | <character> <numeric>
## [1] chr1 873499-873849 + | OREG0012989 0
## [2] chr1 886764-887214 + | OREG0012990 0
## [3] chr1 886938-886958 + | OREG0007909 0
## [4] chr1 919400-919950 + | OREG0012991 0
## [5] chr1 919695-919715 + | OREG0007910 0
## ... ... ... ... . ... ...
## [23114] chr7_gl000195_random 1-851 + | OREG0026736 0
## [23115] chr7_gl000195_random 103427-103447 + | OREG0012963 0
## [23116] chr7_gl000195_random 121139-121159 + | OREG0012964 0
## [23117] chr17_gl000204_random 58370-58955 + | OREG0026769 0
## [23118] chr17_gl000205_random 117492-118442 + | OREG0026772 0
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
keys <- "MSX2"
columns <- c("ENTREZID", "CHR")
select(org.Hs.eg.db, keys, columns, keytype="SYMBOL")
## Warning in .deprecatedColsMessage(): Accessing gene location information via 'CHR','CHRLOC','CHRLOCEND' is
## deprecated. Please use a range based accessor like genes(), or select() with
## columns values like TXCHROM and TXSTART on a TxDb or OrganismDb object
## instead.
## 'select()' returned 1:1 mapping between keys and columns
## SYMBOL ENTREZID CHR
## 1 MSX2 4488 5
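The deprecation warning above suggests using a TxDb object or a range-based accessor for location information. A minimal sketch of that alternative, using the Entrez gene ID 4488 returned above and the hg19 TxDb package, might look like this:

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## Chromosome and transcript start for the gene, via select() on the TxDb
loc <- select(txdb, keys = "4488",
              columns = c("TXCHROM", "TXSTART"),
              keytype = "GENEID")
loc

## The same gene retrieved with a range-based accessor
msx2Range <- genes(txdb, filter = list(gene_id = "4488"))
msx2Range
```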
## 1st get all the gene symbols
orgSymbols <- keys(org.Hs.eg.db, keytype="SYMBOL")
## and then use that to get all gene symbols matched to all entrez gene IDs
egr <- select(org.Hs.eg.db, keys=orgSymbols, "ENTREZID", "SYMBOL")
## 'select()' returned 1:many mapping between keys and columns
length(egr$ENTREZID)
## [1] 193430
length(unique(egr$ENTREZID))
## [1] 193430
## VS:
length(egr$SYMBOL)
## [1] 193430
length(unique(egr$SYMBOL))
## [1] 193329
## So lets trap these symbols that are redundant and look more closely...
redund <- egr$SYMBOL
badSymbols <- redund[duplicated(redund)]
select(org.Hs.eg.db, badSymbols, "ENTREZID", "SYMBOL")
## 'select()' returned many:many mapping between keys and columns
## SYMBOL ENTREZID
## 1 HBD 3045
## 2 HBD 100187828
## 3 TEC 7006
## 4 TEC 100124696
## 5 MMD2 221938
## 6 MMD2 100505381
## 7 DEL1P36 100240737
## 8 DEL1P36 123670537
## 9 DEL11P13 100528024
## 10 DEL11P13 107648861
## 11 TRNAV-CAC 107985614
## 12 TRNAV-CAC 107985615
## 13 TRNAE-UUC 107987368
## 14 TRNAE-UUC 124905580
## 15 TRNAE-UUC 124905583
## 16 TRNAE-UUC 124905584
## 17 TRNAE-UUC 124905586
## 18 TRNAE-UUC 124905908
## 19 TRNAE-UUC 107987368
## 20 TRNAE-UUC 124905580
## 21 TRNAE-UUC 124905583
## 22 TRNAE-UUC 124905584
## 23 TRNAE-UUC 124905586
## 24 TRNAE-UUC 124905908
## 25 TRNAE-UUC 107987368
## 26 TRNAE-UUC 124905580
## 27 TRNAE-UUC 124905583
## 28 TRNAE-UUC 124905584
## 29 TRNAE-UUC 124905586
## 30 TRNAE-UUC 124905908
## 31 TRNAE-UUC 107987368
## 32 TRNAE-UUC 124905580
## 33 TRNAE-UUC 124905583
## 34 TRNAE-UUC 124905584
## 35 TRNAE-UUC 124905586
## 36 TRNAE-UUC 124905908
## 37 TRNAE-UUC 107987368
## 38 TRNAE-UUC 124905580
## 39 TRNAE-UUC 124905583
## 40 TRNAE-UUC 124905584
## 41 TRNAE-UUC 124905586
## 42 TRNAE-UUC 124905908
## 43 TRNAA-AGC 124901561
## 44 TRNAA-AGC 124901562
## 45 TRNAA-AGC 124901563
## 46 TRNAA-AGC 124901564
## 47 TRNAA-AGC 124901565
## 48 TRNAA-AGC 124906586
## 49 TRNAA-AGC 124901561
## 50 TRNAA-AGC 124901562
## 51 TRNAA-AGC 124901563
## 52 TRNAA-AGC 124901564
## 53 TRNAA-AGC 124901565
## 54 TRNAA-AGC 124906586
## 55 TRNAA-AGC 124901561
## 56 TRNAA-AGC 124901562
## 57 TRNAA-AGC 124901563
## 58 TRNAA-AGC 124901564
## 59 TRNAA-AGC 124901565
## 60 TRNAA-AGC 124906586
## 61 TRNAA-AGC 124901561
## 62 TRNAA-AGC 124901562
## 63 TRNAA-AGC 124901563
## 64 TRNAA-AGC 124901564
## 65 TRNAA-AGC 124901565
## 66 TRNAA-AGC 124906586
## 67 TRNAA-AGC 124901561
## 68 TRNAA-AGC 124901562
## 69 TRNAA-AGC 124901563
## 70 TRNAA-AGC 124901564
## 71 TRNAA-AGC 124901565
## 72 TRNAA-AGC 124906586
## 73 TRNAG-CCC 124905578
## 74 TRNAG-CCC 124905581
## 75 TRNAG-CCC 124905588
## 76 TRNAG-CCC 124905578
## 77 TRNAG-CCC 124905581
## 78 TRNAG-CCC 124905588
## 79 TRNAN-GUU 124905579
## 80 TRNAN-GUU 124905582
## 81 TRNAN-GUU 124905585
## 82 TRNAN-GUU 124905587
## 83 TRNAN-GUU 124905579
## 84 TRNAN-GUU 124905582
## 85 TRNAN-GUU 124905585
## 86 TRNAN-GUU 124905587
## 87 TRNAN-GUU 124905579
## 88 TRNAN-GUU 124905582
## 89 TRNAN-GUU 124905585
## 90 TRNAN-GUU 124905587
## 91 TRNAG-GCC 124905847
## 92 TRNAG-GCC 124905849
## 93 TRNAG-GCC 124905851
## 94 TRNAG-GCC 124905853
## 95 TRNAG-GCC 124905907
## 96 TRNAG-GCC 124905910
## 97 TRNAG-GCC 124905912
## 98 TRNAG-GCC 124905914
## 99 TRNAG-GCC 124905916
## 100 TRNAG-GCC 124905918
## 101 TRNAG-GCC 124905921
## 102 TRNAG-GCC 124905923
## 103 TRNAG-GCC 124905925
## 104 TRNAG-GCC 124905927
## 105 TRNAG-GCC 124905929
## 106 TRNAG-GCC 124905931
## 107 TRNAG-GCC 124905933
## 108 TRNAG-GCC 124905847
## 109 TRNAG-GCC 124905849
## 110 TRNAG-GCC 124905851
## 111 TRNAG-GCC 124905853
## 112 TRNAG-GCC 124905907
## 113 TRNAG-GCC 124905910
## 114 TRNAG-GCC 124905912
## 115 TRNAG-GCC 124905914
## 116 TRNAG-GCC 124905916
## 117 TRNAG-GCC 124905918
## 118 TRNAG-GCC 124905921
## 119 TRNAG-GCC 124905923
## 120 TRNAG-GCC 124905925
## 121 TRNAG-GCC 124905927
## 122 TRNAG-GCC 124905929
## 123 TRNAG-GCC 124905931
## 124 TRNAG-GCC 124905933
## 125 TRNAG-GCC 124905847
## 126 TRNAG-GCC 124905849
## 127 TRNAG-GCC 124905851
## 128 TRNAG-GCC 124905853
## 129 TRNAG-GCC 124905907
## 130 TRNAG-GCC 124905910
## 131 TRNAG-GCC 124905912
## 132 TRNAG-GCC 124905914
## 133 TRNAG-GCC 124905916
## 134 TRNAG-GCC 124905918
## 135 TRNAG-GCC 124905921
## 136 TRNAG-GCC 124905923
## 137 TRNAG-GCC 124905925
## 138 TRNAG-GCC 124905927
## 139 TRNAG-GCC 124905929
## 140 TRNAG-GCC 124905931
## 141 TRNAG-GCC 124905933
## 142 TRNAG-GCC 124905847
## 143 TRNAG-GCC 124905849
## 144 TRNAG-GCC 124905851
## 145 TRNAG-GCC 124905853
## 146 TRNAG-GCC 124905907
## 147 TRNAG-GCC 124905910
## 148 TRNAG-GCC 124905912
## 149 TRNAG-GCC 124905914
## 150 TRNAG-GCC 124905916
## 151 TRNAG-GCC 124905918
## 152 TRNAG-GCC 124905921
## 153 TRNAG-GCC 124905923
## 154 TRNAG-GCC 124905925
## 155 TRNAG-GCC 124905927
## 156 TRNAG-GCC 124905929
## 157 TRNAG-GCC 124905931
## 158 TRNAG-GCC 124905933
## 159 TRNAG-GCC 124905847
## 160 TRNAG-GCC 124905849
## 161 TRNAG-GCC 124905851
## 162 TRNAG-GCC 124905853
## 163 TRNAG-GCC 124905907
## 164 TRNAG-GCC 124905910
## 165 TRNAG-GCC 124905912
## 166 TRNAG-GCC 124905914
## 167 TRNAG-GCC 124905916
## 168 TRNAG-GCC 124905918
## 169 TRNAG-GCC 124905921
## 170 TRNAG-GCC 124905923
## 171 TRNAG-GCC 124905925
## 172 TRNAG-GCC 124905927
## 173 TRNAG-GCC 124905929
## 174 TRNAG-GCC 124905931
## 175 TRNAG-GCC 124905933
## 176 TRNAG-GCC 124905847
## 177 TRNAG-GCC 124905849
## 178 TRNAG-GCC 124905851
## 179 TRNAG-GCC 124905853
## 180 TRNAG-GCC 124905907
## 181 TRNAG-GCC 124905910
## 182 TRNAG-GCC 124905912
## 183 TRNAG-GCC 124905914
## 184 TRNAG-GCC 124905916
## 185 TRNAG-GCC 124905918
## 186 TRNAG-GCC 124905921
## 187 TRNAG-GCC 124905923
## 188 TRNAG-GCC 124905925
## 189 TRNAG-GCC 124905927
## 190 TRNAG-GCC 124905929
## 191 TRNAG-GCC 124905931
## 192 TRNAG-GCC 124905933
## 193 TRNAG-GCC 124905847
## 194 TRNAG-GCC 124905849
## 195 TRNAG-GCC 124905851
## 196 TRNAG-GCC 124905853
## 197 TRNAG-GCC 124905907
## 198 TRNAG-GCC 124905910
## 199 TRNAG-GCC 124905912
## 200 TRNAG-GCC 124905914
## 201 TRNAG-GCC 124905916
## 202 TRNAG-GCC 124905918
## 203 TRNAG-GCC 124905921
## 204 TRNAG-GCC 124905923
## 205 TRNAG-GCC 124905925
## 206 TRNAG-GCC 124905927
## 207 TRNAG-GCC 124905929
## 208 TRNAG-GCC 124905931
## 209 TRNAG-GCC 124905933
## 210 TRNAG-GCC 124905847
## 211 TRNAG-GCC 124905849
## 212 TRNAG-GCC 124905851
## 213 TRNAG-GCC 124905853
## 214 TRNAG-GCC 124905907
## 215 TRNAG-GCC 124905910
## 216 TRNAG-GCC 124905912
## 217 TRNAG-GCC 124905914
## 218 TRNAG-GCC 124905916
## 219 TRNAG-GCC 124905918
## 220 TRNAG-GCC 124905921
## 221 TRNAG-GCC 124905923
## 222 TRNAG-GCC 124905925
## 223 TRNAG-GCC 124905927
## 224 TRNAG-GCC 124905929
## 225 TRNAG-GCC 124905931
## 226 TRNAG-GCC 124905933
## 227 TRNAG-GCC 124905847
## 228 TRNAG-GCC 124905849
## 229 TRNAG-GCC 124905851
## 230 TRNAG-GCC 124905853
## 231 TRNAG-GCC 124905907
## 232 TRNAG-GCC 124905910
## 233 TRNAG-GCC 124905912
## 234 TRNAG-GCC 124905914
## 235 TRNAG-GCC 124905916
## 236 TRNAG-GCC 124905918
## 237 TRNAG-GCC 124905921
## 238 TRNAG-GCC 124905923
## 239 TRNAG-GCC 124905925
## 240 TRNAG-GCC 124905927
## 241 TRNAG-GCC 124905929
## 242 TRNAG-GCC 124905931
## 243 TRNAG-GCC 124905933
## 244 TRNAG-GCC 124905847
## 245 TRNAG-GCC 124905849
## 246 TRNAG-GCC 124905851
## 247 TRNAG-GCC 124905853
## 248 TRNAG-GCC 124905907
## 249 TRNAG-GCC 124905910
## 250 TRNAG-GCC 124905912
## 251 TRNAG-GCC 124905914
## 252 TRNAG-GCC 124905916
## 253 TRNAG-GCC 124905918
## 254 TRNAG-GCC 124905921
## 255 TRNAG-GCC 124905923
## 256 TRNAG-GCC 124905925
## 257 TRNAG-GCC 124905927
## 258 TRNAG-GCC 124905929
## 259 TRNAG-GCC 124905931
## 260 TRNAG-GCC 124905933
## 261 TRNAG-GCC 124905847
## 262 TRNAG-GCC 124905849
## 263 TRNAG-GCC 124905851
## 264 TRNAG-GCC 124905853
## 265 TRNAG-GCC 124905907
## 266 TRNAG-GCC 124905910
## 267 TRNAG-GCC 124905912
## 268 TRNAG-GCC 124905914
## 269 TRNAG-GCC 124905916
## 270 TRNAG-GCC 124905918
## 271 TRNAG-GCC 124905921
## 272 TRNAG-GCC 124905923
## 273 TRNAG-GCC 124905925
## 274 TRNAG-GCC 124905927
## 275 TRNAG-GCC 124905929
## 276 TRNAG-GCC 124905931
## 277 TRNAG-GCC 124905933
## 278 TRNAG-GCC 124905847
## 279 TRNAG-GCC 124905849
## 280 TRNAG-GCC 124905851
## 281 TRNAG-GCC 124905853
## 282 TRNAG-GCC 124905907
## 283 TRNAG-GCC 124905910
## 284 TRNAG-GCC 124905912
## 285 TRNAG-GCC 124905914
## 286 TRNAG-GCC 124905916
## 287 TRNAG-GCC 124905918
## 288 TRNAG-GCC 124905921
## 289 TRNAG-GCC 124905923
## 290 TRNAG-GCC 124905925
## 291 TRNAG-GCC 124905927
## 292 TRNAG-GCC 124905929
## 293 TRNAG-GCC 124905931
## 294 TRNAG-GCC 124905933
## 295 TRNAG-GCC 124905847
## 296 TRNAG-GCC 124905849
## 297 TRNAG-GCC 124905851
## 298 TRNAG-GCC 124905853
## 299 TRNAG-GCC 124905907
## 300 TRNAG-GCC 124905910
## 301 TRNAG-GCC 124905912
## 302 TRNAG-GCC 124905914
## 303 TRNAG-GCC 124905916
## 304 TRNAG-GCC 124905918
## 305 TRNAG-GCC 124905921
## 306 TRNAG-GCC 124905923
## 307 TRNAG-GCC 124905925
## 308 TRNAG-GCC 124905927
## 309 TRNAG-GCC 124905929
## 310 TRNAG-GCC 124905931
## 311 TRNAG-GCC 124905933
## 312 TRNAG-GCC 124905847
## 313 TRNAG-GCC 124905849
## 314 TRNAG-GCC 124905851
## 315 TRNAG-GCC 124905853
## 316 TRNAG-GCC 124905907
## 317 TRNAG-GCC 124905910
## 318 TRNAG-GCC 124905912
## 319 TRNAG-GCC 124905914
## 320 TRNAG-GCC 124905916
## 321 TRNAG-GCC 124905918
## 322 TRNAG-GCC 124905921
## 323 TRNAG-GCC 124905923
## 324 TRNAG-GCC 124905925
## 325 TRNAG-GCC 124905927
## 326 TRNAG-GCC 124905929
## 327 TRNAG-GCC 124905931
## 328 TRNAG-GCC 124905933
## 329 TRNAG-GCC 124905847
## 330 TRNAG-GCC 124905849
## 331 TRNAG-GCC 124905851
## 332 TRNAG-GCC 124905853
## 333 TRNAG-GCC 124905907
## 334 TRNAG-GCC 124905910
## 335 TRNAG-GCC 124905912
## 336 TRNAG-GCC 124905914
## 337 TRNAG-GCC 124905916
## 338 TRNAG-GCC 124905918
## 339 TRNAG-GCC 124905921
## 340 TRNAG-GCC 124905923
## 341 TRNAG-GCC 124905925
## 342 TRNAG-GCC 124905927
## 343 TRNAG-GCC 124905929
## 344 TRNAG-GCC 124905931
## 345 TRNAG-GCC 124905933
## 346 TRNAG-GCC 124905847
## 347 TRNAG-GCC 124905849
## 348 TRNAG-GCC 124905851
## 349 TRNAG-GCC 124905853
## 350 TRNAG-GCC 124905907
## 351 TRNAG-GCC 124905910
## 352 TRNAG-GCC 124905912
## 353 TRNAG-GCC 124905914
## 354 TRNAG-GCC 124905916
## 355 TRNAG-GCC 124905918
## 356 TRNAG-GCC 124905921
## 357 TRNAG-GCC 124905923
## 358 TRNAG-GCC 124905925
## 359 TRNAG-GCC 124905927
## 360 TRNAG-GCC 124905929
## 361 TRNAG-GCC 124905931
## 362 TRNAG-GCC 124905933
## 363 TRNAL-CAG 124905848
## 364 TRNAL-CAG 124905850
## 365 TRNAL-CAG 124905852
## 366 TRNAL-CAG 124905906
## 367 TRNAL-CAG 124905909
## 368 TRNAL-CAG 124905911
## 369 TRNAL-CAG 124905913
## 370 TRNAL-CAG 124905915
## 371 TRNAL-CAG 124905917
## 372 TRNAL-CAG 124905920
## 373 TRNAL-CAG 124905922
## 374 TRNAL-CAG 124905924
## 375 TRNAL-CAG 124905926
## 376 TRNAL-CAG 124905928
## 377 TRNAL-CAG 124905930
## 378 TRNAL-CAG 124905932
## 379 TRNAL-CAG 124905934
## 380 TRNAL-CAG 124905848
## 381 TRNAL-CAG 124905850
## 382 TRNAL-CAG 124905852
## 383 TRNAL-CAG 124905906
## 384 TRNAL-CAG 124905909
## 385 TRNAL-CAG 124905911
## 386 TRNAL-CAG 124905913
## 387 TRNAL-CAG 124905915
## 388 TRNAL-CAG 124905917
## 389 TRNAL-CAG 124905920
## 390 TRNAL-CAG 124905922
## 391 TRNAL-CAG 124905924
## 392 TRNAL-CAG 124905926
## 393 TRNAL-CAG 124905928
## 394 TRNAL-CAG 124905930
## 395 TRNAL-CAG 124905932
## 396 TRNAL-CAG 124905934
## 397 TRNAL-CAG 124905848
## 398 TRNAL-CAG 124905850
## 399 TRNAL-CAG 124905852
## 400 TRNAL-CAG 124905906
## 401 TRNAL-CAG 124905909
## 402 TRNAL-CAG 124905911
## 403 TRNAL-CAG 124905913
## 404 TRNAL-CAG 124905915
## 405 TRNAL-CAG 124905917
## 406 TRNAL-CAG 124905920
## 407 TRNAL-CAG 124905922
## 408 TRNAL-CAG 124905924
## 409 TRNAL-CAG 124905926
## 410 TRNAL-CAG 124905928
## 411 TRNAL-CAG 124905930
## 412 TRNAL-CAG 124905932
## 413 TRNAL-CAG 124905934
## 414 TRNAL-CAG 124905848
## 415 TRNAL-CAG 124905850
## 416 TRNAL-CAG 124905852
## 417 TRNAL-CAG 124905906
## 418 TRNAL-CAG 124905909
## 419 TRNAL-CAG 124905911
## 420 TRNAL-CAG 124905913
## 421 TRNAL-CAG 124905915
## 422 TRNAL-CAG 124905917
## 423 TRNAL-CAG 124905920
## 424 TRNAL-CAG 124905922
## 425 TRNAL-CAG 124905924
## 426 TRNAL-CAG 124905926
## 427 TRNAL-CAG 124905928
## 428 TRNAL-CAG 124905930
## 429 TRNAL-CAG 124905932
## 430 TRNAL-CAG 124905934
## 431 TRNAL-CAG 124905848
## 432 TRNAL-CAG 124905850
## 433 TRNAL-CAG 124905852
## 434 TRNAL-CAG 124905906
## 435 TRNAL-CAG 124905909
## 436 TRNAL-CAG 124905911
## 437 TRNAL-CAG 124905913
## 438 TRNAL-CAG 124905915
## 439 TRNAL-CAG 124905917
## 440 TRNAL-CAG 124905920
## 441 TRNAL-CAG 124905922
## 442 TRNAL-CAG 124905924
## 443 TRNAL-CAG 124905926
## 444 TRNAL-CAG 124905928
## 445 TRNAL-CAG 124905930
## 446 TRNAL-CAG 124905932
## 447 TRNAL-CAG 124905934
## 448 TRNAL-CAG 124905848
## 449 TRNAL-CAG 124905850
## 450 TRNAL-CAG 124905852
## 451 TRNAL-CAG 124905906
## 452 TRNAL-CAG 124905909
## 453 TRNAL-CAG 124905911
## 454 TRNAL-CAG 124905913
## 455 TRNAL-CAG 124905915
## 456 TRNAL-CAG 124905917
## 457 TRNAL-CAG 124905920
## 458 TRNAL-CAG 124905922
## 459 TRNAL-CAG 124905924
## 460 TRNAL-CAG 124905926
## 461 TRNAL-CAG 124905928
## 462 TRNAL-CAG 124905930
## 463 TRNAL-CAG 124905932
## 464 TRNAL-CAG 124905934
## 465 TRNAL-CAG 124905848
## 466 TRNAL-CAG 124905850
## 467 TRNAL-CAG 124905852
## 468 TRNAL-CAG 124905906
## 469 TRNAL-CAG 124905909
## 470 TRNAL-CAG 124905911
## 471 TRNAL-CAG 124905913
## 472 TRNAL-CAG 124905915
## 473 TRNAL-CAG 124905917
## 474 TRNAL-CAG 124905920
## 475 TRNAL-CAG 124905922
## 476 TRNAL-CAG 124905924
## 477 TRNAL-CAG 124905926
## 478 TRNAL-CAG 124905928
## 479 TRNAL-CAG 124905930
## 480 TRNAL-CAG 124905932
## 481 TRNAL-CAG 124905934
## 482 TRNAL-CAG 124905848
## 483 TRNAL-CAG 124905850
## 484 TRNAL-CAG 124905852
## 485 TRNAL-CAG 124905906
## 486 TRNAL-CAG 124905909
## 487 TRNAL-CAG 124905911
## 488 TRNAL-CAG 124905913
## 489 TRNAL-CAG 124905915
## 490 TRNAL-CAG 124905917
## 491 TRNAL-CAG 124905920
## 492 TRNAL-CAG 124905922
## 493 TRNAL-CAG 124905924
## 494 TRNAL-CAG 124905926
## 495 TRNAL-CAG 124905928
## 496 TRNAL-CAG 124905930
## 497 TRNAL-CAG 124905932
## 498 TRNAL-CAG 124905934
## 499 TRNAL-CAG 124905848
## 500 TRNAL-CAG 124905850
## 501 TRNAL-CAG 124905852
## 502 TRNAL-CAG 124905906
## 503 TRNAL-CAG 124905909
## 504 TRNAL-CAG 124905911
## 505 TRNAL-CAG 124905913
## 506 TRNAL-CAG 124905915
## 507 TRNAL-CAG 124905917
## 508 TRNAL-CAG 124905920
## 509 TRNAL-CAG 124905922
## 510 TRNAL-CAG 124905924
## 511 TRNAL-CAG 124905926
## 512 TRNAL-CAG 124905928
## 513 TRNAL-CAG 124905930
## 514 TRNAL-CAG 124905932
## 515 TRNAL-CAG 124905934
## 516 TRNAL-CAG 124905848
## 517 TRNAL-CAG 124905850
## 518 TRNAL-CAG 124905852
## 519 TRNAL-CAG 124905906
## 520 TRNAL-CAG 124905909
## 521 TRNAL-CAG 124905911
## 522 TRNAL-CAG 124905913
## 523 TRNAL-CAG 124905915
## 524 TRNAL-CAG 124905917
## 525 TRNAL-CAG 124905920
## 526 TRNAL-CAG 124905922
## 527 TRNAL-CAG 124905924
## 528 TRNAL-CAG 124905926
## 529 TRNAL-CAG 124905928
## 530 TRNAL-CAG 124905930
## 531 TRNAL-CAG 124905932
## 532 TRNAL-CAG 124905934
## 533 TRNAL-CAG 124905848
## 534 TRNAL-CAG 124905850
## 535 TRNAL-CAG 124905852
## 536 TRNAL-CAG 124905906
## 537 TRNAL-CAG 124905909
## 538 TRNAL-CAG 124905911
## 539 TRNAL-CAG 124905913
## 540 TRNAL-CAG 124905915
## 541 TRNAL-CAG 124905917
## 542 TRNAL-CAG 124905920
## 543 TRNAL-CAG 124905922
## 544 TRNAL-CAG 124905924
## 545 TRNAL-CAG 124905926
## 546 TRNAL-CAG 124905928
## 547 TRNAL-CAG 124905930
## 548 TRNAL-CAG 124905932
## 549 TRNAL-CAG 124905934
## 550 TRNAL-CAG 124905848
## 551 TRNAL-CAG 124905850
## 552 TRNAL-CAG 124905852
## 553 TRNAL-CAG 124905906
## 554 TRNAL-CAG 124905909
## 555 TRNAL-CAG 124905911
## 556 TRNAL-CAG 124905913
## 557 TRNAL-CAG 124905915
## 558 TRNAL-CAG 124905917
## 559 TRNAL-CAG 124905920
## 560 TRNAL-CAG 124905922
## 561 TRNAL-CAG 124905924
## 562 TRNAL-CAG 124905926
## 563 TRNAL-CAG 124905928
## 564 TRNAL-CAG 124905930
## 565 TRNAL-CAG 124905932
## 566 TRNAL-CAG 124905934
## 567 TRNAL-CAG 124905848
## 568 TRNAL-CAG 124905850
## 569 TRNAL-CAG 124905852
## 570 TRNAL-CAG 124905906
## 571 TRNAL-CAG 124905909
## 572 TRNAL-CAG 124905911
## 573 TRNAL-CAG 124905913
## 574 TRNAL-CAG 124905915
## 575 TRNAL-CAG 124905917
## 576 TRNAL-CAG 124905920
## 577 TRNAL-CAG 124905922
## 578 TRNAL-CAG 124905924
## 579 TRNAL-CAG 124905926
## 580 TRNAL-CAG 124905928
## 581 TRNAL-CAG 124905930
## 582 TRNAL-CAG 124905932
## 583 TRNAL-CAG 124905934
## 584 TRNAL-CAG 124905848
## 585 TRNAL-CAG 124905850
## 586 TRNAL-CAG 124905852
## 587 TRNAL-CAG 124905906
## 588 TRNAL-CAG 124905909
## 589 TRNAL-CAG 124905911
## 590 TRNAL-CAG 124905913
## 591 TRNAL-CAG 124905915
## 592 TRNAL-CAG 124905917
## 593 TRNAL-CAG 124905920
## 594 TRNAL-CAG 124905922
## 595 TRNAL-CAG 124905924
## 596 TRNAL-CAG 124905926
## 597 TRNAL-CAG 124905928
## 598 TRNAL-CAG 124905930
## 599 TRNAL-CAG 124905932
## 600 TRNAL-CAG 124905934
## 601 TRNAL-CAG 124905848
## 602 TRNAL-CAG 124905850
## 603 TRNAL-CAG 124905852
## 604 TRNAL-CAG 124905906
## 605 TRNAL-CAG 124905909
## 606 TRNAL-CAG 124905911
## 607 TRNAL-CAG 124905913
## 608 TRNAL-CAG 124905915
## 609 TRNAL-CAG 124905917
## 610 TRNAL-CAG 124905920
## 611 TRNAL-CAG 124905922
## 612 TRNAL-CAG 124905924
## 613 TRNAL-CAG 124905926
## 614 TRNAL-CAG 124905928
## 615 TRNAL-CAG 124905930
## 616 TRNAL-CAG 124905932
## 617 TRNAL-CAG 124905934
## 618 TRNAL-CAG 124905848
## 619 TRNAL-CAG 124905850
## 620 TRNAL-CAG 124905852
## 621 TRNAL-CAG 124905906
## 622 TRNAL-CAG 124905909
## 623 TRNAL-CAG 124905911
## 624 TRNAL-CAG 124905913
## 625 TRNAL-CAG 124905915
## 626 TRNAL-CAG 124905917
## 627 TRNAL-CAG 124905920
## 628 TRNAL-CAG 124905922
## 629 TRNAL-CAG 124905924
## 630 TRNAL-CAG 124905926
## 631 TRNAL-CAG 124905928
## 632 TRNAL-CAG 124905930
## 633 TRNAL-CAG 124905932
## 634 TRNAL-CAG 124905934
## 635 TRNAD-GUC 124905854
## 636 TRNAD-GUC 124905857
## 637 TRNAD-GUC 124905860
## 638 TRNAD-GUC 124905863
## 639 TRNAD-GUC 124905866
## 640 TRNAD-GUC 124905869
## 641 TRNAD-GUC 124905872
## 642 TRNAD-GUC 124905875
## 643 TRNAD-GUC 124905878
## 644 TRNAD-GUC 124905881
## 645 TRNAD-GUC 124905884
## 646 TRNAD-GUC 124905887
## 647 TRNAD-GUC 124905890
## 648 TRNAD-GUC 124905893
## 649 TRNAD-GUC 124905896
## 650 TRNAD-GUC 124905899
## 651 TRNAD-GUC 124905902
## 652 TRNAD-GUC 124905854
## 653 TRNAD-GUC 124905857
## 654 TRNAD-GUC 124905860
## 655 TRNAD-GUC 124905863
## 656 TRNAD-GUC 124905866
## 657 TRNAD-GUC 124905869
## 658 TRNAD-GUC 124905872
## 659 TRNAD-GUC 124905875
## 660 TRNAD-GUC 124905878
## 661 TRNAD-GUC 124905881
## 662 TRNAD-GUC 124905884
## 663 TRNAD-GUC 124905887
## 664 TRNAD-GUC 124905890
## 665 TRNAD-GUC 124905893
## 666 TRNAD-GUC 124905896
## 667 TRNAD-GUC 124905899
## 668 TRNAD-GUC 124905902
## 669 TRNAD-GUC 124905854
## 670 TRNAD-GUC 124905857
## 671 TRNAD-GUC 124905860
## 672 TRNAD-GUC 124905863
## 673 TRNAD-GUC 124905866
## 674 TRNAD-GUC 124905869
## 675 TRNAD-GUC 124905872
## 676 TRNAD-GUC 124905875
## 677 TRNAD-GUC 124905878
## 678 TRNAD-GUC 124905881
## 679 TRNAD-GUC 124905884
## 680 TRNAD-GUC 124905887
## 681 TRNAD-GUC 124905890
## 682 TRNAD-GUC 124905893
## 683 TRNAD-GUC 124905896
## 684 TRNAD-GUC 124905899
## 685 TRNAD-GUC 124905902
## 686 TRNAD-GUC 124905854
## 687 TRNAD-GUC 124905857
## 688 TRNAD-GUC 124905860
## 689 TRNAD-GUC 124905863
## 690 TRNAD-GUC 124905866
## 691 TRNAD-GUC 124905869
## 692 TRNAD-GUC 124905872
## 693 TRNAD-GUC 124905875
## 694 TRNAD-GUC 124905878
## 695 TRNAD-GUC 124905881
## 696 TRNAD-GUC 124905884
## 697 TRNAD-GUC 124905887
## 698 TRNAD-GUC 124905890
## 699 TRNAD-GUC 124905893
## 700 TRNAD-GUC 124905896
## 701 TRNAD-GUC 124905899
## 702 TRNAD-GUC 124905902
## 703 TRNAD-GUC 124905854
## 704 TRNAD-GUC 124905857
## 705 TRNAD-GUC 124905860
## 706 TRNAD-GUC 124905863
## 707 TRNAD-GUC 124905866
## 708 TRNAD-GUC 124905869
## 709 TRNAD-GUC 124905872
## 710 TRNAD-GUC 124905875
## 711 TRNAD-GUC 124905878
## 712 TRNAD-GUC 124905881
## 713 TRNAD-GUC 124905884
## 714 TRNAD-GUC 124905887
## 715 TRNAD-GUC 124905890
## 716 TRNAD-GUC 124905893
## 717 TRNAD-GUC 124905896
## 718 TRNAD-GUC 124905899
## 719 TRNAD-GUC 124905902
## 720 TRNAD-GUC 124905854
## 721 TRNAD-GUC 124905857
## 722 TRNAD-GUC 124905860
## 723 TRNAD-GUC 124905863
## 724 TRNAD-GUC 124905866
## 725 TRNAD-GUC 124905869
## 726 TRNAD-GUC 124905872
## 727 TRNAD-GUC 124905875
## 728 TRNAD-GUC 124905878
## 729 TRNAD-GUC 124905881
## 730 TRNAD-GUC 124905884
## 731 TRNAD-GUC 124905887
## 732 TRNAD-GUC 124905890
## 733 TRNAD-GUC 124905893
## 734 TRNAD-GUC 124905896
## 735 TRNAD-GUC 124905899
## 736 TRNAD-GUC 124905902
## 737 TRNAD-GUC 124905854
## 738 TRNAD-GUC 124905857
## 739 TRNAD-GUC 124905860
## 740 TRNAD-GUC 124905863
## 741 TRNAD-GUC 124905866
## 742 TRNAD-GUC 124905869
## 743 TRNAD-GUC 124905872
## 744 TRNAD-GUC 124905875
## 745 TRNAD-GUC 124905878
## 746 TRNAD-GUC 124905881
## 747 TRNAD-GUC 124905884
## 748 TRNAD-GUC 124905887
## 749 TRNAD-GUC 124905890
## 750 TRNAD-GUC 124905893
## 751 TRNAD-GUC 124905896
## 752 TRNAD-GUC 124905899
## 753 TRNAD-GUC 124905902
## 754 TRNAD-GUC 124905854
## 755 TRNAD-GUC 124905857
## 756 TRNAD-GUC 124905860
## 757 TRNAD-GUC 124905863
## 758 TRNAD-GUC 124905866
## 759 TRNAD-GUC 124905869
## 760 TRNAD-GUC 124905872
## 761 TRNAD-GUC 124905875
## 762 TRNAD-GUC 124905878
## 763 TRNAD-GUC 124905881
## 764 TRNAD-GUC 124905884
## 765 TRNAD-GUC 124905887
## 766 TRNAD-GUC 124905890
## 767 TRNAD-GUC 124905893
## 768 TRNAD-GUC 124905896
## 769 TRNAD-GUC 124905899
## 770 TRNAD-GUC 124905902
## 771 TRNAD-GUC 124905854
## 772 TRNAD-GUC 124905857
## 773 TRNAD-GUC 124905860
## 774 TRNAD-GUC 124905863
## 775 TRNAD-GUC 124905866
## 776 TRNAD-GUC 124905869
## 777 TRNAD-GUC 124905872
## 778 TRNAD-GUC 124905875
## 779 TRNAD-GUC 124905878
## 780 TRNAD-GUC 124905881
## 781 TRNAD-GUC 124905884
## 782 TRNAD-GUC 124905887
## 783 TRNAD-GUC 124905890
## 784 TRNAD-GUC 124905893
## 785 TRNAD-GUC 124905896
## 786 TRNAD-GUC 124905899
## 787 TRNAD-GUC 124905902
## 788 TRNAD-GUC 124905854
## 789 TRNAD-GUC 124905857
## 790 TRNAD-GUC 124905860
## 791 TRNAD-GUC 124905863
## 792 TRNAD-GUC 124905866
## 793 TRNAD-GUC 124905869
## 794 TRNAD-GUC 124905872
## 795 TRNAD-GUC 124905875
## 796 TRNAD-GUC 124905878
## 797 TRNAD-GUC 124905881
## 798 TRNAD-GUC 124905884
## 799 TRNAD-GUC 124905887
## 800 TRNAD-GUC 124905890
## 801 TRNAD-GUC 124905893
## 802 TRNAD-GUC 124905896
## 803 TRNAD-GUC 124905899
## 804 TRNAD-GUC 124905902
## 805 TRNAD-GUC 124905854
## 806 TRNAD-GUC 124905857
## 807 TRNAD-GUC 124905860
## 808 TRNAD-GUC 124905863
## 809 TRNAD-GUC 124905866
## 810 TRNAD-GUC 124905869
## 811 TRNAD-GUC 124905872
## 812 TRNAD-GUC 124905875
## 813 TRNAD-GUC 124905878
## 814 TRNAD-GUC 124905881
## 815 TRNAD-GUC 124905884
## 816 TRNAD-GUC 124905887
## 817 TRNAD-GUC 124905890
## 818 TRNAD-GUC 124905893
## 819 TRNAD-GUC 124905896
## 820 TRNAD-GUC 124905899
## 821 TRNAD-GUC 124905902
## 822 TRNAD-GUC 124905854
## 823 TRNAD-GUC 124905857
## 824 TRNAD-GUC 124905860
## 825 TRNAD-GUC 124905863
## 826 TRNAD-GUC 124905866
## 827 TRNAD-GUC 124905869
## 828 TRNAD-GUC 124905872
## 829 TRNAD-GUC 124905875
## 830 TRNAD-GUC 124905878
## 831 TRNAD-GUC 124905881
## 832 TRNAD-GUC 124905884
## 833 TRNAD-GUC 124905887
## 834 TRNAD-GUC 124905890
## 835 TRNAD-GUC 124905893
## 836 TRNAD-GUC 124905896
## 837 TRNAD-GUC 124905899
## 838 TRNAD-GUC 124905902
## 839 TRNAD-GUC 124905854
## 840 TRNAD-GUC 124905857
## 841 TRNAD-GUC 124905860
## 842 TRNAD-GUC 124905863
## 843 TRNAD-GUC 124905866
## 844 TRNAD-GUC 124905869
## 845 TRNAD-GUC 124905872
## 846 TRNAD-GUC 124905875
## 847 TRNAD-GUC 124905878
## 848 TRNAD-GUC 124905881
## 849 TRNAD-GUC 124905884
## 850 TRNAD-GUC 124905887
## 851 TRNAD-GUC 124905890
## 852 TRNAD-GUC 124905893
## 853 TRNAD-GUC 124905896
## 854 TRNAD-GUC 124905899
## 855 TRNAD-GUC 124905902
## 856 TRNAD-GUC 124905854
## 857 TRNAD-GUC 124905857
## 858 TRNAD-GUC 124905860
## 859 TRNAD-GUC 124905863
## 860 TRNAD-GUC 124905866
## 861 TRNAD-GUC 124905869
## 862 TRNAD-GUC 124905872
## 863 TRNAD-GUC 124905875
## 864 TRNAD-GUC 124905878
## 865 TRNAD-GUC 124905881
## 866 TRNAD-GUC 124905884
## 867 TRNAD-GUC 124905887
## 868 TRNAD-GUC 124905890
## 869 TRNAD-GUC 124905893
## 870 TRNAD-GUC 124905896
## 871 TRNAD-GUC 124905899
## 872 TRNAD-GUC 124905902
## 873 TRNAD-GUC 124905854
## 874 TRNAD-GUC 124905857
## 875 TRNAD-GUC 124905860
## 876 TRNAD-GUC 124905863
## 877 TRNAD-GUC 124905866
## 878 TRNAD-GUC 124905869
## 879 TRNAD-GUC 124905872
## 880 TRNAD-GUC 124905875
## 881 TRNAD-GUC 124905878
## 882 TRNAD-GUC 124905881
## 883 TRNAD-GUC 124905884
## 884 TRNAD-GUC 124905887
## 885 TRNAD-GUC 124905890
## 886 TRNAD-GUC 124905893
## 887 TRNAD-GUC 124905896
## 888 TRNAD-GUC 124905899
## 889 TRNAD-GUC 124905902
## 890 TRNAD-GUC 124905854
## 891 TRNAD-GUC 124905857
## 892 TRNAD-GUC 124905860
## 893 TRNAD-GUC 124905863
## 894 TRNAD-GUC 124905866
## 895 TRNAD-GUC 124905869
## 896 TRNAD-GUC 124905872
## 897 TRNAD-GUC 124905875
## 898 TRNAD-GUC 124905878
## 899 TRNAD-GUC 124905881
## 900 TRNAD-GUC 124905884
## 901 TRNAD-GUC 124905887
## 902 TRNAD-GUC 124905890
## 903 TRNAD-GUC 124905893
## 904 TRNAD-GUC 124905896
## 905 TRNAD-GUC 124905899
## 906 TRNAD-GUC 124905902
## 907 TRNAE-CUC 124905855
## 908 TRNAE-CUC 124905858
## 909 TRNAE-CUC 124905861
## 910 TRNAE-CUC 124905864
## 911 TRNAE-CUC 124905867
## 912 TRNAE-CUC 124905870
## 913 TRNAE-CUC 124905873
## 914 TRNAE-CUC 124905876
## 915 TRNAE-CUC 124905879
## 916 TRNAE-CUC 124905882
## 917 TRNAE-CUC 124905885
## 918 TRNAE-CUC 124905888
## 919 TRNAE-CUC 124905891
## 920 TRNAE-CUC 124905894
## 921 TRNAE-CUC 124905897
## 922 TRNAE-CUC 124905900
## 923 TRNAE-CUC 124905903
## 924 TRNAE-CUC 124905855
## 925 TRNAE-CUC 124905858
## 926 TRNAE-CUC 124905861
## 927 TRNAE-CUC 124905864
## 928 TRNAE-CUC 124905867
## 929 TRNAE-CUC 124905870
## 930 TRNAE-CUC 124905873
## 931 TRNAE-CUC 124905876
## 932 TRNAE-CUC 124905879
## 933 TRNAE-CUC 124905882
## 934 TRNAE-CUC 124905885
## 935 TRNAE-CUC 124905888
## 936 TRNAE-CUC 124905891
## 937 TRNAE-CUC 124905894
## 938 TRNAE-CUC 124905897
## 939 TRNAE-CUC 124905900
## 940 TRNAE-CUC 124905903
## 941 TRNAE-CUC 124905855
## 942 TRNAE-CUC 124905858
## 943 TRNAE-CUC 124905861
## 944 TRNAE-CUC 124905864
## 945 TRNAE-CUC 124905867
## 946 TRNAE-CUC 124905870
## 947 TRNAE-CUC 124905873
## 948 TRNAE-CUC 124905876
## 949 TRNAE-CUC 124905879
## 950 TRNAE-CUC 124905882
## 951 TRNAE-CUC 124905885
## 952 TRNAE-CUC 124905888
## 953 TRNAE-CUC 124905891
## 954 TRNAE-CUC 124905894
## 955 TRNAE-CUC 124905897
## 956 TRNAE-CUC 124905900
## 957 TRNAE-CUC 124905903
## 958 TRNAE-CUC 124905855
## 959 TRNAE-CUC 124905858
## 960 TRNAE-CUC 124905861
## 961 TRNAE-CUC 124905864
## 962 TRNAE-CUC 124905867
## 963 TRNAE-CUC 124905870
## 964 TRNAE-CUC 124905873
## 965 TRNAE-CUC 124905876
## 966 TRNAE-CUC 124905879
## 967 TRNAE-CUC 124905882
## 968 TRNAE-CUC 124905885
## 969 TRNAE-CUC 124905888
## 970 TRNAE-CUC 124905891
## 971 TRNAE-CUC 124905894
## 972 TRNAE-CUC 124905897
## 973 TRNAE-CUC 124905900
## 974 TRNAE-CUC 124905903
## 975 TRNAE-CUC 124905855
## 976 TRNAE-CUC 124905858
## 977 TRNAE-CUC 124905861
## 978 TRNAE-CUC 124905864
## 979 TRNAE-CUC 124905867
## 980 TRNAE-CUC 124905870
## 981 TRNAE-CUC 124905873
## 982 TRNAE-CUC 124905876
## 983 TRNAE-CUC 124905879
## 984 TRNAE-CUC 124905882
## 985 TRNAE-CUC 124905885
## 986 TRNAE-CUC 124905888
## 987 TRNAE-CUC 124905891
## 988 TRNAE-CUC 124905894
## 989 TRNAE-CUC 124905897
## 990 TRNAE-CUC 124905900
## 991 TRNAE-CUC 124905903
## 992 TRNAE-CUC 124905855
## 993 TRNAE-CUC 124905858
## 994 TRNAE-CUC 124905861
## 995 TRNAE-CUC 124905864
## 996 TRNAE-CUC 124905867
## 997 TRNAE-CUC 124905870
## 998 TRNAE-CUC 124905873
## 999 TRNAE-CUC 124905876
## 1000 TRNAE-CUC 124905879
## 1001 TRNAE-CUC 124905882
## 1002 TRNAE-CUC 124905885
## 1003 TRNAE-CUC 124905888
## 1004 TRNAE-CUC 124905891
## 1005 TRNAE-CUC 124905894
## 1006 TRNAE-CUC 124905897
## 1007 TRNAE-CUC 124905900
## 1008 TRNAE-CUC 124905903
## 1009 TRNAE-CUC 124905855
## 1010 TRNAE-CUC 124905858
## 1011 TRNAE-CUC 124905861
## 1012 TRNAE-CUC 124905864
## 1013 TRNAE-CUC 124905867
## 1014 TRNAE-CUC 124905870
## 1015 TRNAE-CUC 124905873
## 1016 TRNAE-CUC 124905876
## 1017 TRNAE-CUC 124905879
## 1018 TRNAE-CUC 124905882
## 1019 TRNAE-CUC 124905885
## 1020 TRNAE-CUC 124905888
## 1021 TRNAE-CUC 124905891
## 1022 TRNAE-CUC 124905894
## 1023 TRNAE-CUC 124905897
## 1024 TRNAE-CUC 124905900
## 1025 TRNAE-CUC 124905903
## 1026 TRNAE-CUC 124905855
## 1027 TRNAE-CUC 124905858
## 1028 TRNAE-CUC 124905861
## 1029 TRNAE-CUC 124905864
## 1030 TRNAE-CUC 124905867
## 1031 TRNAE-CUC 124905870
## 1032 TRNAE-CUC 124905873
## 1033 TRNAE-CUC 124905876
## 1034 TRNAE-CUC 124905879
## 1035 TRNAE-CUC 124905882
## 1036 TRNAE-CUC 124905885
## 1037 TRNAE-CUC 124905888
## 1038 TRNAE-CUC 124905891
## 1039 TRNAE-CUC 124905894
## 1040 TRNAE-CUC 124905897
## 1041 TRNAE-CUC 124905900
## 1042 TRNAE-CUC 124905903
## 1043 TRNAE-CUC 124905855
## 1044 TRNAE-CUC 124905858
## 1045 TRNAE-CUC 124905861
## 1046 TRNAE-CUC 124905864
## 1047 TRNAE-CUC 124905867
## 1048 TRNAE-CUC 124905870
## 1049 TRNAE-CUC 124905873
## 1050 TRNAE-CUC 124905876
## 1051 TRNAE-CUC 124905879
## 1052 TRNAE-CUC 124905882
## 1053 TRNAE-CUC 124905885
## 1054 TRNAE-CUC 124905888
## 1055 TRNAE-CUC 124905891
## 1056 TRNAE-CUC 124905894
## 1057 TRNAE-CUC 124905897
## 1058 TRNAE-CUC 124905900
## 1059 TRNAE-CUC 124905903
## 1060 TRNAE-CUC 124905855
## 1061 TRNAE-CUC 124905858
## 1062 TRNAE-CUC 124905861
## 1063 TRNAE-CUC 124905864
## 1064 TRNAE-CUC 124905867
## 1065 TRNAE-CUC 124905870
## 1066 TRNAE-CUC 124905873
## 1067 TRNAE-CUC 124905876
## 1068 TRNAE-CUC 124905879
## 1069 TRNAE-CUC 124905882
## 1070 TRNAE-CUC 124905885
## 1071 TRNAE-CUC 124905888
## 1072 TRNAE-CUC 124905891
## 1073 TRNAE-CUC 124905894
## 1074 TRNAE-CUC 124905897
## 1075 TRNAE-CUC 124905900
## 1076 TRNAE-CUC 124905903
## 1077 TRNAE-CUC 124905855
## 1078 TRNAE-CUC 124905858
## 1079 TRNAE-CUC 124905861
## 1080 TRNAE-CUC 124905864
## 1081 TRNAE-CUC 124905867
## 1082 TRNAE-CUC 124905870
## 1083 TRNAE-CUC 124905873
## 1084 TRNAE-CUC 124905876
## 1085 TRNAE-CUC 124905879
## 1086 TRNAE-CUC 124905882
## 1087 TRNAE-CUC 124905885
## 1088 TRNAE-CUC 124905888
## 1089 TRNAE-CUC 124905891
## 1090 TRNAE-CUC 124905894
## 1091 TRNAE-CUC 124905897
## 1092 TRNAE-CUC 124905900
## 1093 TRNAE-CUC 124905903
## 1094 TRNAE-CUC 124905855
## 1095 TRNAE-CUC 124905858
## 1096 TRNAE-CUC 124905861
## 1097 TRNAE-CUC 124905864
## 1098 TRNAE-CUC 124905867
## 1099 TRNAE-CUC 124905870
## 1100 TRNAE-CUC 124905873
## 1101 TRNAE-CUC 124905876
## 1102 TRNAE-CUC 124905879
## 1103 TRNAE-CUC 124905882
## 1104 TRNAE-CUC 124905885
## 1105 TRNAE-CUC 124905888
## 1106 TRNAE-CUC 124905891
## 1107 TRNAE-CUC 124905894
## 1108 TRNAE-CUC 124905897
## 1109 TRNAE-CUC 124905900
## 1110 TRNAE-CUC 124905903
## 1111 TRNAE-CUC 124905855
## 1112 TRNAE-CUC 124905858
## 1113 TRNAE-CUC 124905861
## 1114 TRNAE-CUC 124905864
## 1115 TRNAE-CUC 124905867
## 1116 TRNAE-CUC 124905870
## 1117 TRNAE-CUC 124905873
## 1118 TRNAE-CUC 124905876
## 1119 TRNAE-CUC 124905879
## 1120 TRNAE-CUC 124905882
## 1121 TRNAE-CUC 124905885
## 1122 TRNAE-CUC 124905888
## 1123 TRNAE-CUC 124905891
## 1124 TRNAE-CUC 124905894
## 1125 TRNAE-CUC 124905897
## 1126 TRNAE-CUC 124905900
## 1127 TRNAE-CUC 124905903
## 1128 TRNAE-CUC 124905855
## 1129 TRNAE-CUC 124905858
## 1130 TRNAE-CUC 124905861
## 1131 TRNAE-CUC 124905864
## 1132 TRNAE-CUC 124905867
## 1133 TRNAE-CUC 124905870
## 1134 TRNAE-CUC 124905873
## 1135 TRNAE-CUC 124905876
## 1136 TRNAE-CUC 124905879
## 1137 TRNAE-CUC 124905882
## 1138 TRNAE-CUC 124905885
## 1139 TRNAE-CUC 124905888
## 1140 TRNAE-CUC 124905891
## 1141 TRNAE-CUC 124905894
## 1142 TRNAE-CUC 124905897
## 1143 TRNAE-CUC 124905900
## 1144 TRNAE-CUC 124905903
## 1145 TRNAE-CUC 124905855
## 1146 TRNAE-CUC 124905858
## 1147 TRNAE-CUC 124905861
## 1148 TRNAE-CUC 124905864
## 1149 TRNAE-CUC 124905867
## 1150 TRNAE-CUC 124905870
## 1151 TRNAE-CUC 124905873
## 1152 TRNAE-CUC 124905876
## 1153 TRNAE-CUC 124905879
## 1154 TRNAE-CUC 124905882
## 1155 TRNAE-CUC 124905885
## 1156 TRNAE-CUC 124905888
## 1157 TRNAE-CUC 124905891
## 1158 TRNAE-CUC 124905894
## 1159 TRNAE-CUC 124905897
## 1160 TRNAE-CUC 124905900
## 1161 TRNAE-CUC 124905903
## 1162 TRNAE-CUC 124905855
## 1163 TRNAE-CUC 124905858
## 1164 TRNAE-CUC 124905861
## 1165 TRNAE-CUC 124905864
## 1166 TRNAE-CUC 124905867
## 1167 TRNAE-CUC 124905870
## 1168 TRNAE-CUC 124905873
## 1169 TRNAE-CUC 124905876
## 1170 TRNAE-CUC 124905879
## 1171 TRNAE-CUC 124905882
## 1172 TRNAE-CUC 124905885
## 1173 TRNAE-CUC 124905888
## 1174 TRNAE-CUC 124905891
## 1175 TRNAE-CUC 124905894
## 1176 TRNAE-CUC 124905897
## 1177 TRNAE-CUC 124905900
## 1178 TRNAE-CUC 124905903
## 1179 TRNAG-UCC 124905856
## 1180 TRNAG-UCC 124905859
## 1181 TRNAG-UCC 124905862
## 1182 TRNAG-UCC 124905865
## 1183 TRNAG-UCC 124905868
## 1184 TRNAG-UCC 124905871
## 1185 TRNAG-UCC 124905874
## 1186 TRNAG-UCC 124905877
## 1187 TRNAG-UCC 124905880
## 1188 TRNAG-UCC 124905883
## 1189 TRNAG-UCC 124905886
## 1190 TRNAG-UCC 124905889
## 1191 TRNAG-UCC 124905892
## 1192 TRNAG-UCC 124905895
## 1193 TRNAG-UCC 124905898
## 1194 TRNAG-UCC 124905901
## 1195 TRNAG-UCC 124905904
## 1196 TRNAG-UCC 124905856
## 1197 TRNAG-UCC 124905859
## 1198 TRNAG-UCC 124905862
## 1199 TRNAG-UCC 124905865
## 1200 TRNAG-UCC 124905868
## 1201 TRNAG-UCC 124905871
## 1202 TRNAG-UCC 124905874
## 1203 TRNAG-UCC 124905877
## 1204 TRNAG-UCC 124905880
## 1205 TRNAG-UCC 124905883
## 1206 TRNAG-UCC 124905886
## 1207 TRNAG-UCC 124905889
## 1208 TRNAG-UCC 124905892
## 1209 TRNAG-UCC 124905895
## 1210 TRNAG-UCC 124905898
## 1211 TRNAG-UCC 124905901
## 1212 TRNAG-UCC 124905904
## 1213 TRNAG-UCC 124905856
## 1214 TRNAG-UCC 124905859
## 1215 TRNAG-UCC 124905862
## 1216 TRNAG-UCC 124905865
## 1217 TRNAG-UCC 124905868
## 1218 TRNAG-UCC 124905871
## 1219 TRNAG-UCC 124905874
## 1220 TRNAG-UCC 124905877
## 1221 TRNAG-UCC 124905880
## 1222 TRNAG-UCC 124905883
## 1223 TRNAG-UCC 124905886
## 1224 TRNAG-UCC 124905889
## 1225 TRNAG-UCC 124905892
## 1226 TRNAG-UCC 124905895
## 1227 TRNAG-UCC 124905898
## 1228 TRNAG-UCC 124905901
## 1229 TRNAG-UCC 124905904
## 1230 TRNAG-UCC 124905856
## 1231 TRNAG-UCC 124905859
## 1232 TRNAG-UCC 124905862
## 1233 TRNAG-UCC 124905865
## 1234 TRNAG-UCC 124905868
## 1235 TRNAG-UCC 124905871
## 1236 TRNAG-UCC 124905874
## 1237 TRNAG-UCC 124905877
## 1238 TRNAG-UCC 124905880
## 1239 TRNAG-UCC 124905883
## 1240 TRNAG-UCC 124905886
## 1241 TRNAG-UCC 124905889
## 1242 TRNAG-UCC 124905892
## 1243 TRNAG-UCC 124905895
## 1244 TRNAG-UCC 124905898
## 1245 TRNAG-UCC 124905901
## 1246 TRNAG-UCC 124905904
## 1247 TRNAG-UCC 124905856
## 1248 TRNAG-UCC 124905859
## 1249 TRNAG-UCC 124905862
## 1250 TRNAG-UCC 124905865
## 1251 TRNAG-UCC 124905868
## 1252 TRNAG-UCC 124905871
## 1253 TRNAG-UCC 124905874
## 1254 TRNAG-UCC 124905877
## 1255 TRNAG-UCC 124905880
## 1256 TRNAG-UCC 124905883
## 1257 TRNAG-UCC 124905886
## 1258 TRNAG-UCC 124905889
## 1259 TRNAG-UCC 124905892
## 1260 TRNAG-UCC 124905895
## 1261 TRNAG-UCC 124905898
## 1262 TRNAG-UCC 124905901
## 1263 TRNAG-UCC 124905904
## 1264 TRNAG-UCC 124905856
## 1265 TRNAG-UCC 124905859
## 1266 TRNAG-UCC 124905862
## 1267 TRNAG-UCC 124905865
## 1268 TRNAG-UCC 124905868
## 1269 TRNAG-UCC 124905871
## 1270 TRNAG-UCC 124905874
## 1271 TRNAG-UCC 124905877
## 1272 TRNAG-UCC 124905880
## 1273 TRNAG-UCC 124905883
## 1274 TRNAG-UCC 124905886
## 1275 TRNAG-UCC 124905889
## 1276 TRNAG-UCC 124905892
## 1277 TRNAG-UCC 124905895
## 1278 TRNAG-UCC 124905898
## 1279 TRNAG-UCC 124905901
## 1280 TRNAG-UCC 124905904
## 1281 TRNAG-UCC 124905856
## 1282 TRNAG-UCC 124905859
## 1283 TRNAG-UCC 124905862
## 1284 TRNAG-UCC 124905865
## 1285 TRNAG-UCC 124905868
## 1286 TRNAG-UCC 124905871
## 1287 TRNAG-UCC 124905874
## 1288 TRNAG-UCC 124905877
## 1289 TRNAG-UCC 124905880
## 1290 TRNAG-UCC 124905883
## 1291 TRNAG-UCC 124905886
## 1292 TRNAG-UCC 124905889
## 1293 TRNAG-UCC 124905892
## 1294 TRNAG-UCC 124905895
## 1295 TRNAG-UCC 124905898
## 1296 TRNAG-UCC 124905901
## 1297 TRNAG-UCC 124905904
## 1298 TRNAG-UCC 124905856
## 1299 TRNAG-UCC 124905859
## 1300 TRNAG-UCC 124905862
## 1301 TRNAG-UCC 124905865
## 1302 TRNAG-UCC 124905868
## 1303 TRNAG-UCC 124905871
## 1304 TRNAG-UCC 124905874
## 1305 TRNAG-UCC 124905877
## 1306 TRNAG-UCC 124905880
## 1307 TRNAG-UCC 124905883
## 1308 TRNAG-UCC 124905886
## 1309 TRNAG-UCC 124905889
## 1310 TRNAG-UCC 124905892
## 1311 TRNAG-UCC 124905895
## 1312 TRNAG-UCC 124905898
## 1313 TRNAG-UCC 124905901
## 1314 TRNAG-UCC 124905904
## 1315 TRNAG-UCC 124905856
## 1316 TRNAG-UCC 124905859
## 1317 TRNAG-UCC 124905862
## 1318 TRNAG-UCC 124905865
## 1319 TRNAG-UCC 124905868
## 1320 TRNAG-UCC 124905871
## 1321 TRNAG-UCC 124905874
## 1322 TRNAG-UCC 124905877
## 1323 TRNAG-UCC 124905880
## 1324 TRNAG-UCC 124905883
## 1325 TRNAG-UCC 124905886
## 1326 TRNAG-UCC 124905889
## 1327 TRNAG-UCC 124905892
## 1328 TRNAG-UCC 124905895
## 1329 TRNAG-UCC 124905898
## 1330 TRNAG-UCC 124905901
## 1331 TRNAG-UCC 124905904
## 1332 TRNAG-UCC 124905856
## 1333 TRNAG-UCC 124905859
## 1334 TRNAG-UCC 124905862
## 1335 TRNAG-UCC 124905865
## 1336 TRNAG-UCC 124905868
## 1337 TRNAG-UCC 124905871
## 1338 TRNAG-UCC 124905874
## 1339 TRNAG-UCC 124905877
## 1340 TRNAG-UCC 124905880
## 1341 TRNAG-UCC 124905883
## 1342 TRNAG-UCC 124905886
## 1343 TRNAG-UCC 124905889
## 1344 TRNAG-UCC 124905892
## 1345 TRNAG-UCC 124905895
## 1346 TRNAG-UCC 124905898
## 1347 TRNAG-UCC 124905901
## 1348 TRNAG-UCC 124905904
## 1349 TRNAG-UCC 124905856
## 1350 TRNAG-UCC 124905859
## 1351 TRNAG-UCC 124905862
## 1352 TRNAG-UCC 124905865
## 1353 TRNAG-UCC 124905868
## 1354 TRNAG-UCC 124905871
## 1355 TRNAG-UCC 124905874
## 1356 TRNAG-UCC 124905877
## 1357 TRNAG-UCC 124905880
## 1358 TRNAG-UCC 124905883
## 1359 TRNAG-UCC 124905886
## 1360 TRNAG-UCC 124905889
## 1361 TRNAG-UCC 124905892
## 1362 TRNAG-UCC 124905895
## 1363 TRNAG-UCC 124905898
## 1364 TRNAG-UCC 124905901
## 1365 TRNAG-UCC 124905904
## 1366 TRNAG-UCC 124905856
## 1367 TRNAG-UCC 124905859
## 1368 TRNAG-UCC 124905862
## 1369 TRNAG-UCC 124905865
## 1370 TRNAG-UCC 124905868
## 1371 TRNAG-UCC 124905871
## 1372 TRNAG-UCC 124905874
## 1373 TRNAG-UCC 124905877
## 1374 TRNAG-UCC 124905880
## 1375 TRNAG-UCC 124905883
## 1376 TRNAG-UCC 124905886
## 1377 TRNAG-UCC 124905889
## 1378 TRNAG-UCC 124905892
## 1379 TRNAG-UCC 124905895
## 1380 TRNAG-UCC 124905898
## 1381 TRNAG-UCC 124905901
## 1382 TRNAG-UCC 124905904
## 1383 TRNAG-UCC 124905856
## 1384 TRNAG-UCC 124905859
## 1385 TRNAG-UCC 124905862
## 1386 TRNAG-UCC 124905865
## 1387 TRNAG-UCC 124905868
## 1388 TRNAG-UCC 124905871
## 1389 TRNAG-UCC 124905874
## 1390 TRNAG-UCC 124905877
## 1391 TRNAG-UCC 124905880
## 1392 TRNAG-UCC 124905883
## 1393 TRNAG-UCC 124905886
## 1394 TRNAG-UCC 124905889
## 1395 TRNAG-UCC 124905892
## 1396 TRNAG-UCC 124905895
## 1397 TRNAG-UCC 124905898
## 1398 TRNAG-UCC 124905901
## 1399 TRNAG-UCC 124905904
## 1400 TRNAG-UCC 124905856
## 1401 TRNAG-UCC 124905859
## 1402 TRNAG-UCC 124905862
## 1403 TRNAG-UCC 124905865
## 1404 TRNAG-UCC 124905868
## 1405 TRNAG-UCC 124905871
## 1406 TRNAG-UCC 124905874
## 1407 TRNAG-UCC 124905877
## 1408 TRNAG-UCC 124905880
## 1409 TRNAG-UCC 124905883
## 1410 TRNAG-UCC 124905886
## 1411 TRNAG-UCC 124905889
## 1412 TRNAG-UCC 124905892
## 1413 TRNAG-UCC 124905895
## 1414 TRNAG-UCC 124905898
## 1415 TRNAG-UCC 124905901
## 1416 TRNAG-UCC 124905904
## 1417 TRNAG-UCC 124905856
## 1418 TRNAG-UCC 124905859
## 1419 TRNAG-UCC 124905862
## 1420 TRNAG-UCC 124905865
## 1421 TRNAG-UCC 124905868
## 1422 TRNAG-UCC 124905871
## 1423 TRNAG-UCC 124905874
## 1424 TRNAG-UCC 124905877
## 1425 TRNAG-UCC 124905880
## 1426 TRNAG-UCC 124905883
## 1427 TRNAG-UCC 124905886
## 1428 TRNAG-UCC 124905889
## 1429 TRNAG-UCC 124905892
## 1430 TRNAG-UCC 124905895
## 1431 TRNAG-UCC 124905898
## 1432 TRNAG-UCC 124905901
## 1433 TRNAG-UCC 124905904
## 1434 TRNAG-UCC 124905856
## 1435 TRNAG-UCC 124905859
## 1436 TRNAG-UCC 124905862
## 1437 TRNAG-UCC 124905865
## 1438 TRNAG-UCC 124905868
## 1439 TRNAG-UCC 124905871
## 1440 TRNAG-UCC 124905874
## 1441 TRNAG-UCC 124905877
## 1442 TRNAG-UCC 124905880
## 1443 TRNAG-UCC 124905883
## 1444 TRNAG-UCC 124905886
## 1445 TRNAG-UCC 124905889
## 1446 TRNAG-UCC 124905892
## 1447 TRNAG-UCC 124905895
## 1448 TRNAG-UCC 124905898
## 1449 TRNAG-UCC 124905901
## 1450 TRNAG-UCC 124905904
To retrieve this information using select(), pass every TXID as a key and ask for the columns you need:
res1 <- select(TxDb.Hsapiens.UCSC.hg19.knownGene,
keys(TxDb.Hsapiens.UCSC.hg19.knownGene, keytype="TXID"),
columns=c("GENEID","TXNAME","TXCHROM"), keytype="TXID")
## 'select()' returned 1:1 mapping between keys and columns
head(res1)
## TXID GENEID TXNAME TXCHROM
## 1 1 100287102 uc001aaa.3 chr1
## 2 2 100287102 uc010nxq.1 chr1
## 3 3 100287102 uc010nxr.1 chr1
## 4 4 79501 uc001aal.1 chr1
## 5 5 <NA> uc001aaq.2 chr1
## 6 6 <NA> uc001aar.2 chr1
To do the same thing using transcripts():
res2 <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene,
columns = c("gene_id","tx_name"))
head(res2)
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | gene_id tx_name
## <Rle> <IRanges> <Rle> | <CharacterList> <character>
## [1] chr3 238279-451097 + | 10752 uc003bot.3
## [2] chr3 238279-451097 + | 10752 uc003bou.3
## [3] chr3 239326-290282 + | 10752 uc003bov.2
## [4] chr3 239326-440831 + | 10752 uc003bow.2
## [5] chr3 361366-451097 + | 10752 uc011asi.2
## [6] chr3 577914-887698 + | uc003boy.1
## -------
## seqinfo: 2 sequences from hg19 genome
Notice that in the 2nd case we don’t have to ask for the chromosome, as transcripts() returns a GRanges object, so the chromosome will automatically be returned as part of the object.
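If a plain table is still wanted, the chromosome and coordinates can be read back out of the GRanges object with the standard accessors. The following is a minimal sketch, assuming the TxDb package is installed; `tx` and `tx_df` are illustrative names that do not appear elsewhere in this walkthrough.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Ask only for the transcript name; the ranges carry the chromosome already
tx <- transcripts(txdb, columns = "tx_name")

# seqnames() returns a run-length encoded factor, so expand it with as.character()
tx_df <- data.frame(
    TXNAME  = tx$tx_name,
    TXCHROM = as.character(seqnames(tx)),
    TXSTART = start(tx),
    stringsAsFactors = FALSE
)
head(tx_df)
```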
res <- transcripts(TxDb.Athaliana.BioMart.plantsmart22, columns = c("gene_id"))
You will notice that the gene IDs for this package are TAIR locus IDs and are NOT Entrez Gene IDs like the ones you saw in the TxDb.Hsapiens.UCSC.hg19.knownGene package. It’s important to always pay attention to the kind of gene ID that is being used by the TxDb you are looking at.
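One way to check the ID type before relying on it is to look at the metadata table stored in the TxDb. This is only a sketch: `gene_id_type()` is a hypothetical helper, and the row label "Type of Gene ID" is an assumption based on how TxDb objects print their metadata, so it may need adjusting for a given package.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Hypothetical helper: return the gene ID type recorded in a TxDb's metadata.
# Assumes metadata() yields a data.frame with 'name' and 'value' columns and
# that one row is labelled "Type of Gene ID".
gene_id_type <- function(txdb) {
    md <- metadata(txdb)
    md$value[md$name == "Type of Gene ID"]
}

gene_id_type(TxDb.Hsapiens.UCSC.hg19.knownGene)
```

The same call on TxDb.Athaliana.BioMart.plantsmart22 should make the difference in identifier type explicit, provided that package records the field in the same way.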
keys <- keys(Homo.sapiens, keytype="TXID")
res1 <- select(Homo.sapiens,
keys= keys,
columns=c("SYMBOL","TXSTART","TXCHROM"), keytype="TXID")
head(res1)
To do the same thing using transcripts():
res2 <- transcripts(Homo.sapiens, columns="SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
head(res2)
## GRanges object with 6 ranges and 1 metadata column:
## seqnames ranges strand | SYMBOL
## <Rle> <IRanges> <Rle> | <CharacterList>
## [1] chr3 238279-451097 + | CHL1
## [2] chr3 238279-451097 + | CHL1
## [3] chr3 239326-290282 + | CHL1
## [4] chr3 239326-440831 + | CHL1
## [5] chr3 361366-451097 + | CHL1
## [6] chr3 577914-887698 + | <NA>
## -------
## seqinfo: 2 sequences from hg19 genome
columns(Homo.sapiens)
## [1] "ACCNUM" "ALIAS" "CDSCHROM" "CDSEND" "CDSID"
## [6] "CDSNAME" "CDSSTART" "CDSSTRAND" "DEFINITION" "ENSEMBL"
## [11] "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME" "EVIDENCE"
## [16] "EVIDENCEALL" "EXONCHROM" "EXONEND" "EXONID" "EXONNAME"
## [21] "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID" "GENENAME"
## [26] "GENETYPE" "GO" "GOALL" "GOID" "IPI"
## [31] "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH"
## [36] "PFAM" "PMID" "PROSITE" "REFSEQ" "SYMBOL"
## [41] "TERM" "TXCHROM" "TXEND" "TXID" "TXNAME"
## [46] "TXSTART" "TXSTRAND" "TXTYPE" "UCSCKG" "UNIPROT"
columns(org.Hs.eg.db)
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP"
## [16] "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"
## [21] "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
## [26] "UNIPROT"
columns(TxDb.Hsapiens.UCSC.hg19.knownGene)
## [1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSSTART"
## [6] "CDSSTRAND" "EXONCHROM" "EXONEND" "EXONID" "EXONNAME"
## [11] "EXONRANK" "EXONSTART" "EXONSTRAND" "GENEID" "TXCHROM"
## [16] "TXEND" "TXID" "TXNAME" "TXSTART" "TXSTRAND"
## [21] "TXTYPE"
## You might also want to look at this:
transcripts(Homo.sapiens, columns=c("SYMBOL","CHRLOC"))
## 'select()' returned 1:1 mapping between keys and columns
## GRanges object with 5506 ranges and 1 metadata column:
## seqnames ranges strand | SYMBOL
## <Rle> <IRanges> <Rle> | <CharacterList>
## [1] chr3 238279-451097 + | CHL1
## [2] chr3 238279-451097 + | CHL1
## [3] chr3 239326-290282 + | CHL1
## [4] chr3 239326-440831 + | CHL1
## [5] chr3 361366-451097 + | CHL1
## ... ... ... ... . ...
## [5502] chr18 77732867-77748532 - | TXNL4A
## [5503] chr18 77732867-77748532 - | TXNL4A
## [5504] chr18 77732867-77793915 - | TXNL4A
## [5505] chr18 77915117-78005397 - | PARD6G
## [5506] chr18 77941005-78005397 - | PARD6G
## -------
## seqinfo: 2 sequences from hg19 genome
The key difference is that TXSTART refers to the start of a transcript and originates in the TxDb object from the TxDb.Hsapiens.UCSC.hg19.knownGene package, while CHRLOC refers to the same thing but originates in the OrgDb object from the org.Hs.eg.db package. The point of origin is significant because the TxDb object represents a transcriptome from UCSC, whereas the OrgDb holds primarily gene-centric data that originates at NCBI. The upshot is that CHRLOC will not have as many regions represented as TXSTART, since there has to be an official gene for there to even be a record. The CHRLOC data is also locked in for org.Hs.eg.db as data for hg19, whereas you can swap in a different TxDb object to match the genome you are using (hg18, for example). For these reasons, we strongly recommend using TXSTART instead of CHRLOC. However, CHRLOC still remains in the org packages for historical reasons.
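As an illustration of the recommended route, transcript start positions can be pulled straight from the TxDb for a single gene. This is a minimal sketch that assumes the hg19 TxDb package is installed and uses Entrez Gene ID 5728 (PTEN) as the key; `pten_tx` is an illustrative name. The call is written as `AnnotationDbi::select()` so that it still works if another attached package masks `select`.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Transcript-level coordinates come from the TxDb, keyed by Entrez Gene ID
pten_tx <- AnnotationDbi::select(
    txdb,
    keys    = "5728",
    columns = c("TXNAME", "TXCHROM", "TXSTART", "TXEND"),
    keytype = "GENEID"
)
pten_tx
```

Swapping in a TxDb built for a different genome build changes the coordinates without changing the query.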
To find the keys that match, make use of the pattern and column arguments.
xk = head(keys(Homo.sapiens, keytype="ENTREZID", pattern="X", column="SYMBOL"))
## 'select()' returned 1:1 mapping between keys and columns
xk
## [1] "51" "189" "239" "240" "241" "242"
select() verifies the results:
select(Homo.sapiens, xk, "SYMBOL", "ENTREZID")
## 'select()' returned 1:1 mapping between keys and columns
## ENTREZID SYMBOL
## 1 51 ACOX1
## 2 189 AGXT
## 3 239 ALOX12
## 4 240 ALOX5
## 5 241 ALOX5AP
## 6 242 ALOX12B
## Get the transcript ranges grouped by gene
txby <- transcriptsBy(Homo.sapiens, by="gene")
## look up the entrez ID for the gene symbol 'PTEN'
select(Homo.sapiens, keys='PTEN', columns='ENTREZID', keytype='SYMBOL')
## subset that gene's transcripts
geneOfInterest <- txby[["5728"]]
## extract the sequence
res <- getSeq(Hsapiens, geneOfInterest)
res
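The Entrez ID does not have to be copied by hand from the select() output. The sketch below looks it up with mapIds() and feeds it straight into the subsetting step. It assumes org.Hs.eg.db, the hg19 TxDb and BSgenome.Hsapiens.UCSC.hg19 are installed, and it uses the TxDb directly rather than Homo.sapiens; `pten_id` and `pten_seqs` are illustrative names.

```r
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)

# Transcript ranges grouped by gene; list names are Entrez Gene IDs
txby <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")

# mapIds() returns a named character vector; unname() leaves the bare ID
pten_id <- unname(mapIds(org.Hs.eg.db, keys = "PTEN",
                         column = "ENTREZID", keytype = "SYMBOL"))

# One DNA sequence per transcript range of the gene
pten_seqs <- getSeq(Hsapiens, txby[[pten_id]])
width(pten_seqs)
```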
ensembl <- useEnsembl(biomart = "ensembl", dataset="hsapiens_gene_ensembl")
ids <- c("1")
getBM(attributes=c('go_id', 'entrezgene_id'),
filters = 'entrezgene_id',
values = ids,
mart = ensembl)
## go_id entrezgene_id
## 1 1
## 2 GO:0005576 1
## 3 GO:0005886 1
## 4 GO:0005615 1
## 5 GO:0002764 1
## 6 GO:0070062 1
## 7 GO:0003674 1
## 8 GO:0008150 1
## 9 GO:0072562 1
## 10 GO:0062023 1
## 11 GO:0034774 1
## 12 GO:1904813 1
## 13 GO:0031093 1
ids <- c("1")
select(org.Hs.eg.db, keys=ids, columns="GO", keytype="ENTREZID")
## 'select()' returned 1:many mapping between keys and columns
## ENTREZID GO EVIDENCE ONTOLOGY
## 1 1 GO:0002764 IBA BP
## 2 1 GO:0005576 HDA CC
## 3 1 GO:0005576 IDA CC
## 4 1 GO:0005576 TAS CC
## 5 1 GO:0005615 HDA CC
## 6 1 GO:0005886 IBA CC
## 7 1 GO:0031093 TAS CC
## 8 1 GO:0034774 TAS CC
## 9 1 GO:0062023 HDA CC
## 10 1 GO:0070062 HDA CC
## 11 1 GO:0072562 HDA CC
## 12 1 GO:1904813 TAS CC
When this exercise was written, biomaRt returned a different number of GO terms than org.Hs.eg.db. This may not always be true in the future, as both of these resources are updated. It is expected, however, that the web service (which is updated continuously) will fall in and out of sync with the org.Hs.eg.db package (which is updated twice a year). This is an important difference, as each approach has its own advantages and disadvantages. The advantage of continuous updates is that you always have the very latest annotations, which change frequently for something like GO terms. The advantage of using a package is that the results are frozen to a release of Bioconductor, which can help you get the same answers a few years from now that you get today (reproducibility).
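The gap between the two resources can be inspected with ordinary set operations. The sketch below hard-codes the GO IDs from the two outputs shown above instead of querying either resource, so it reflects only the state at the time this was written; `bm_go` and `org_go` are illustrative names.

```r
# GO IDs as returned by getBM() above (the first row had an empty go_id)
bm_go <- c("", "GO:0005576", "GO:0005886", "GO:0005615", "GO:0002764",
           "GO:0070062", "GO:0003674", "GO:0008150", "GO:0072562",
           "GO:0062023", "GO:0034774", "GO:1904813", "GO:0031093")

# GO IDs as returned by select() on org.Hs.eg.db above (one row per evidence code)
org_go <- c("GO:0002764", "GO:0005576", "GO:0005576", "GO:0005576",
            "GO:0005615", "GO:0005886", "GO:0031093", "GO:0034774",
            "GO:0062023", "GO:0070062", "GO:0072562", "GO:1904813")

# Drop the empty entry and collapse duplicates before comparing
bm_go  <- unique(bm_go[nzchar(bm_go)])
org_go <- unique(org_go)

only_in_biomart <- setdiff(bm_go, org_go)
only_in_orgdb   <- setdiff(org_go, bm_go)

only_in_biomart
only_in_orgdb
```

For these two snapshots the only IDs unique to biomaRt are GO:0003674 and GO:0008150, the root terms for molecular function and biological process, and nothing is unique to org.Hs.eg.db. Rerunning the live queries may give a different picture.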