BgeeDB is a collection of functions to import data from the Bgee database (http://bgee.org/) directly into R, and to facilitate downstream analyses, such as gene set enrichment tests based on the expression of genes in anatomical structures. Bgee provides annotated and processed expression data and expression calls from curated wild-type healthy samples, from human and many other animal species.
The package retrieves the annotation of RNA-seq or Affymetrix experiments integrated into the Bgee database, and downloads into R the quantitative data and expression calls produced by the Bgee pipeline. The package also allows running GO-like enrichment analyses based on anatomical terms, where genes are mapped to anatomical terms by their expression patterns; this relies on the topGO package. This is the same analysis as the TopAnat web service available at (http://bgee.org/?page=top_anat#/), but with more flexibility in the choice of parameters and developmental stages.
In summary, the BgeeDB package allows you to:
1. List the annotation of RNA-seq and microarray data available in the Bgee database
2. Download the processed gene expression data available in the Bgee database
3. Download the gene expression calls and use them to perform TopAnat analyses
The pipeline used to generate Bgee expression data is documented and publicly available at (https://github.com/BgeeDB/bgee_pipeline).
If you find a bug or have any issue using BgeeDB, please file a bug report on our GitHub issue tracker at (https://github.com/BgeeDB/BgeeDB_R/issues).
In R:
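The commands for this step are not reproduced in this text. A minimal sketch, assuming the standard Bioconductor installation route through the BiocManager package (an assumption; the original chunk may differ):

```r
# Assumption: BgeeDB is installed from Bioconductor via BiocManager.
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install BgeeDB, then load it for the current session.
BiocManager::install("BgeeDB")
library(BgeeDB)
```

Installation is needed only once; `library(BgeeDB)` has to be called in every new R session.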
The listBgeeSpecies() function retrieves the species available in the Bgee database, together with the data types available for each species.
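The call that produces the listing is not shown in this text. It was presumably along the following lines; the variable name `species` and the filtering step are illustrative additions, not part of the original.

```r
library(BgeeDB)

# Query the Bgee web service for the species of the current release.
# The result is a data frame with one row per species and one column
# per data type (AFFYMETRIX, EST, IN_SITU, RNA_SEQ).
species <- listBgeeSpecies()

# Illustrative follow-up: keep only the species with Affymetrix data.
# as.logical() is a precaution in case the column is read as text.
species[as.logical(species$AFFYMETRIX), c("ID", "GENUS", "SPECIES_NAME")]
```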
##
## Querying Bgee to get release information...
##
## Building URL to query species in Bgee release 14...
##
## Submitting URL to Bgee webservice... (https://r.bgee.org/bgee14/?page=r_package&action=get_all_species&display_type=tsv&source=BgeeDB_R_package&source_version=2.12.1)
##
## Query to Bgee webservice successful!
## ID GENUS SPECIES_NAME COMMON_NAME AFFYMETRIX EST
## 1 6239 Caenorhabditis elegans nematode TRUE FALSE
## 2 7217 Drosophila ananassae FALSE FALSE
## 3 7227 Drosophila melanogaster fruit fly TRUE TRUE
## 4 7230 Drosophila mojavensis FALSE FALSE
## 5 7237 Drosophila pseudoobscura FALSE FALSE
## 6 7240 Drosophila simulans FALSE FALSE
## 7 7244 Drosophila virilis FALSE FALSE
## 8 7245 Drosophila yakuba FALSE FALSE
## 9 7955 Danio rerio zebrafish TRUE TRUE
## 10 8364 Xenopus tropicalis western clawed frog FALSE TRUE
## 11 9031 Gallus gallus chicken FALSE FALSE
## 12 9258 Ornithorhynchus anatinus platypus FALSE FALSE
## 13 9365 Erinaceus europaeus hedgehog FALSE FALSE
## 14 9544 Macaca mulatta macaque TRUE FALSE
## 15 9593 Gorilla gorilla gorilla FALSE FALSE
## 16 9597 Pan paniscus bonobo FALSE FALSE
## 17 9598 Pan troglodytes chimpanzee FALSE FALSE
## 18 9606 Homo sapiens human TRUE TRUE
## 19 9615 Canis lupus familiaris dog FALSE FALSE
## 20 9685 Felis catus cat FALSE FALSE
## 21 9796 Equus caballus horse FALSE FALSE
## 22 9823 Sus scrofa pig FALSE FALSE
## 23 9913 Bos taurus cattle FALSE FALSE
## 24 9986 Oryctolagus cuniculus rabbit FALSE FALSE
## 25 10090 Mus musculus mouse TRUE TRUE
## 26 10116 Rattus norvegicus rat TRUE FALSE
## 27 10141 Cavia porcellus guinea pig FALSE FALSE
## 28 13616 Monodelphis domestica opossum FALSE FALSE
## 29 28377 Anolis carolinensis green anole FALSE FALSE
## IN_SITU RNA_SEQ
## 1 TRUE TRUE
## 2 FALSE TRUE
## 3 TRUE TRUE
## 4 FALSE TRUE
## 5 FALSE TRUE
## 6 FALSE TRUE
## 7 FALSE TRUE
## 8 FALSE TRUE
## 9 TRUE TRUE
## 10 TRUE TRUE
## 11 FALSE TRUE
## 12 FALSE TRUE
## 13 FALSE TRUE
## 14 FALSE TRUE
## 15 FALSE TRUE
## 16 FALSE TRUE
## 17 FALSE TRUE
## 18 FALSE TRUE
## 19 FALSE TRUE
## 20 FALSE TRUE
## 21 FALSE TRUE
## 22 FALSE TRUE
## 23 FALSE TRUE
## 24 FALSE TRUE
## 25 TRUE TRUE
## 26 FALSE TRUE
## 27 FALSE TRUE
## 28 FALSE TRUE
## 29 FALSE TRUE
It is possible to list all species from a specific release of Bgee with the release argument (see the listBgeeRelease() function), and to order the species according to a specific column with the ordering argument. For example:
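The example call itself is missing from this text. Judging from the log that follows (release 13_2, rows sorted by genus), it was probably close to the sketch below. The release string "13.2" and the column index 2 are inferred from that output, and the variable name `species_13_2` is illustrative.

```r
library(BgeeDB)

# List the species of Bgee release 13.2, ordered by the second
# column of the result (GENUS).
species_13_2 <- listBgeeSpecies(release = "13.2", ordering = 2)
species_13_2
```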
##
## Querying Bgee to get release information...
##
## Building URL to query species in Bgee release 13_2...
##
## Submitting URL to Bgee webservice... (https://r.bgee.org/bgee13/?page=species&display_type=tsv&source=BgeeDB_R_package&source_version=2.12.1)
##
## Query to Bgee webservice successful!
## ID GENUS SPECIES_NAME COMMON_NAME AFFYMETRIX EST IN_SITU
## 17 28377 Anolis carolinensis anolis FALSE FALSE FALSE
## 13 9913 Bos taurus cow FALSE FALSE FALSE
## 1 6239 Caenorhabditis elegans c.elegans TRUE FALSE TRUE
## 3 7955 Danio rerio zebrafish TRUE TRUE TRUE
## 2 7227 Drosophila melanogaster fruitfly TRUE TRUE TRUE
## 5 9031 Gallus gallus chicken FALSE FALSE FALSE
## 8 9593 Gorilla gorilla gorilla FALSE FALSE FALSE
## 11 9606 Homo sapiens human TRUE TRUE FALSE
## 7 9544 Macaca mulatta macaque FALSE FALSE FALSE
## 16 13616 Monodelphis domestica opossum FALSE FALSE FALSE
## 14 10090 Mus musculus mouse TRUE TRUE TRUE
## 6 9258 Ornithorhynchus anatinus platypus FALSE FALSE FALSE
## 9 9597 Pan paniscus bonobo FALSE FALSE FALSE
## 10 9598 Pan troglodytes chimpanzee FALSE FALSE FALSE
## 15 10116 Rattus norvegicus rat FALSE FALSE FALSE
## 12 9823 Sus scrofa pig FALSE FALSE FALSE
## 4 8364 Xenopus tropicalis xenopus FALSE TRUE TRUE
## RNA_SEQ
## 17 TRUE
## 13 TRUE
## 1 TRUE
## 3 FALSE
## 2 FALSE
## 5 TRUE
## 8 TRUE
## 11 TRUE
## 7 TRUE
## 16 TRUE
## 14 TRUE
## 6 TRUE
## 9 TRUE
## 10 TRUE
## 15 TRUE
## 12 TRUE
## 4 TRUE
In the following example we will focus on mouse ("Mus_musculus") RNA-seq data. Species can be specified using their name or their NCBI taxonomy ID. To download RNA-seq data, set the dataType argument to "rna_seq". To download Affymetrix microarray data, set this argument to "affymetrix".
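The object creation step is not reproduced in this text. A sketch of what it likely looks like is given below; the object name `bgee` matches the name used later in the formatData() call.

```r
library(BgeeDB)

# Create a Bgee reference class object for mouse RNA-seq data.
# The species could also be given by its NCBI taxonomy ID
# (10090 for mouse, as listed in the species table).
bgee <- Bgee$new(species = "Mus_musculus", dataType = "rna_seq")
```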
##
## Querying Bgee to get release information...
##
## Building URL to query species in Bgee release 14_0...
##
## Submitting URL to Bgee webservice... (https://r.bgee.org/bgee14/?page=r_package&action=get_all_species&display_type=tsv&source=BgeeDB_R_package&source_version=2.12.1)
##
## Query to Bgee webservice successful!
##
## API key built: b00a311a003fc3d62f337a5b6de2ba24813925b4f891bb347c870f1ecdabf1511a273f71980c3d2d7be31a1ee5404279e812282ec3f6d2908117b52e6274801a
Note 1: It is possible to work with data from a specific release of Bgee by specifying the release argument; see the listBgeeRelease() function.
Note 2: The functions of the package store the downloaded files in a versioned folder, created by default in the working directory. These cached files allow faster subsequent access to the data. The directory where data are stored can be changed with the pathToData argument.
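The two notes can be combined in a single call. The following is only a sketch: the release string "13.2" and the use of `tempdir()` as cache location are illustrative choices, and the names `releases` and `bgee_pinned` are hypothetical.

```r
library(BgeeDB)

# Inspect the releases that can be requested.
releases <- listBgeeRelease()

# Hypothetical variant of the object creation: pin an older release and
# store the cache files in a temporary directory instead of the working
# directory.
bgee_pinned <- Bgee$new(
  species    = "Mus_musculus",
  dataType   = "rna_seq",
  release    = "13.2",
  pathToData = tempdir()
)
```

A temporary directory is discarded at the end of the R session, so a persistent path is a better choice when the cache should be reused.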
The getAnnotation() function outputs the list of RNA-seq experiments and libraries available in Bgee for mouse.
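The corresponding call is not shown in this text. A sketch consistent with the output that follows (two elements, six rows each) is given here; the variable name `annotation_bgee_mouse` is illustrative.

```r
library(BgeeDB)

# Bgee object for mouse RNA-seq data, as created earlier
bgee <- Bgee$new(species = "Mus_musculus", dataType = "rna_seq")

# Download the annotation: a list with two data frames,
# sample.annotation and experiment.annotation.
annotation_bgee_mouse <- getAnnotation(bgee)

# Show the first rows of each data frame.
lapply(annotation_bgee_mouse, head)
```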
##
## Saved annotation files in /tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Mus_musculus_Bgee_14_0 folder.
## $sample.annotation
## Experiment.ID Library.ID Anatomical.entity.ID Anatomical.entity.name
## 1 GSE30617 ERX012344 UBERON:0000948 heart
## 2 GSE30617 ERX012348 UBERON:0000948 heart
## 3 GSE30617 ERX012362 UBERON:0000948 heart
## 4 GSE30617 ERX012363 UBERON:0000948 heart
## 5 GSE30617 ERX012374 UBERON:0000948 heart
## 6 GSE30617 ERX012378 UBERON:0000948 heart
## Stage.ID Stage.name Sex Strain Platform.ID
## 1 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## 2 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## 3 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## 4 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## 5 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## 6 MmusDv:0000052 8 weeks (mouse) <NA> DBAxC57BL/6J Illumina Genome Analyzer IIx
## Library.type Library.orientation TPM.expression.threshold
## 1 paired NA 0.024357
## 2 paired NA 0.042566
## 3 paired NA 0.024598
## 4 paired NA 0.025378
## 5 paired NA 0.027712
## 6 paired NA 0.025381
## FPKM.expression.threshold Read.count Mapped.read.count Min.read.length
## 1 0.023059 30075234 26055677 0
## 2 0.047824 8605668 7347266 0
## 3 0.028740 29498377 25708709 0
## 4 0.024916 31000737 26507831 0
## 5 0.026467 31160789 27397595 0
## 6 0.025119 27824366 23808859 0
## Max.read.length All.genes.percent.present
## 1 0 40.42
## 2 0 25.95
## 3 0 43.40
## 4 0 42.31
## 5 0 40.92
## 6 0 40.36
## Protein.coding.genes.percent.present Intergenic.regions.percent.present
## 1 69.20 14.44
## 2 51.95 6.46
## 3 69.54 17.08
## 4 70.52 15.54
## 5 68.87 14.66
## 6 68.93 14.40
## Run.IDs Data.source
## 1 ERR032229 GEO
## 2 ERR032228 GEO
## 3 ERR032238 GEO
## 4 ERR032227 GEO
## 5 ERR032231 GEO
## 6 ERR032230 GEO
## Data.source.URL
## 1 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012344
## 2 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012348
## 3 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012362
## 4 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012363
## 5 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012374
## 6 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=ERX012378
## Bgee.normalized.data.URL
## 1 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 2 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 3 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 4 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 5 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 6 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## Raw.file.URL
## 1 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012344
## 2 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012348
## 3 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012362
## 4 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012363
## 5 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012374
## 6 https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=ERX012378
##
## $experiment.annotation
## Experiment.ID
## 1 GSE30617
## 2 GSE41637
## 3 GSE30352
## 4 SRP022612
## 5 GSE36026
## 6 GSE43520
## Experiment.name
## 1 [E-MTAB-599] Mouse Transcriptome
## 2 Evolutionary dynamics of gene and isoform regulation in mammalian tissues
## 3 The evolution of gene expression levels in mammalian organs
## 4 RNA-seq of Danio rerio and Mus musculus skin for three different age groups
## 5 RNA-seq from ENCODE/LICR
## 6 The evolution of lncRNA repertoires and expression patterns in tetrapods
## Library.count Condition.count Organ.stage.count Organ.count Stage.count
## 1 36 6 6 6 1
## 2 26 26 9 9 1
## 3 17 17 6 6 1
## 4 15 2 3 1 3
## 5 12 12 12 12 3
## 6 9 7 5 4 2
## Sex.count Strain.count Data.source
## 1 0 1 GEO
## 2 1 6 GEO
## 3 2 2 GEO
## 4 1 2 GEO
## 5 2 1 GEO
## 6 2 2 GEO
## Data.source.URL
## 1 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30617
## 2 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41637
## 3 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30352
## 4 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=SRP022612
## 5 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36026
## 6 http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE43520
## Bgee.normalized.data.URL
## 1 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30617.tsv.tar.gz
## 2 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE41637.tsv.tar.gz
## 3 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE30352.tsv.tar.gz
## 4 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_SRP022612.tsv.tar.gz
## 5 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE36026.tsv.tar.gz
## 6 ftp://ftp.bgee.org/bgee_v14_0/download/processed_expr_values/rna_seq/Mus_musculus/Mus_musculus_RNA-Seq_read_counts_TPM_FPKM_GSE43520.tsv.tar.gz
## Experiment.description
## 1 Sequencing the transcriptome of DBAxC57BL/6J mice. To study the regulation of transcription, splicing and RNA turnover we have sequenced the transcriptomes of tissues collected DBAxC57BL/6J mice.
## 2 Most mammalian genes produce multiple distinct mRNAs through alternative splicing, but the extent of splicing conservation is not clear. To assess tissue-specific transcriptome variation across mammals, we sequenced cDNA from 9 tissues from 4 mammals and one bird in biological triplicate, at unprecedented depth. We find that while tissue-specific gene expression programs are largely conserved, alternative splicing is well conserved in only a subset of tissues and is frequently lineage-specific. Thousands of novel, lineage-specific and conserved alternative exons were identified; widely conserved alternative exons had signatures of binding by MBNL, PTB, RBFOX, STAR and TIA family splicing factors, implicating them as ancestral mammalian splicing regulators. Our data also indicates that alternative splicing is often used to alter protein phosphorylatability, delimiting the scope of kinase signaling.
## 3 Changes in gene expression are thought to underlie many of the phenotypic differences between species. However, large-scale analyses of gene expression evolution were until recently prevented by technological limitations. Here we report the sequencing of polyadenylated RNA from six organs across ten species that represent all major mammalian lineages (placentals, marsupials and monotremes) and birds (the evolutionary outgroup), with the goal of understanding the dynamics of mammalian transcriptome evolution. We show that the rate of gene expression evolution varies among organs, lineages and chromosomes, owing to differences in selective pressures: transcriptome change was slow in nervous tissues and rapid in testes, slower in rodents than in apes and monotremes, and rapid for the X chromosome right after its formation. Although gene expression evolution in mammals was strongly shaped by purifying selection, we identify numerous potentially selectively driven expression switches, which occurred at different rates across lineages and tissues and which probably contributed to the specific organ biology of various mammals. Our transcriptome data provide a valuable resource for functional and evolutionary analyses of mammalian genomes.
## 4 Comparison of temporal gene expression profiles (www.jenage.de) The RNA-seq data comprises 3 age groups: 2, 15 and 30 months for mouse skin; 5, 24 and 42 months for zebrafish skin. Illumina 50bp single-stranded single-read RNA sequencing Overall design: 15 samples for mouse: 5 biological replicates for 2 months, 6 biological replicates for 15 months and 4 biological replicates for 30 months; 20 samples for zebrafish: 9 biological replicates for 5 months, 6 biological replicates for 24 months and 5 biological replicates for 42 months
## 5 Using RNA-Seq (Mortazavi et al., 2008), high-resolution genome-wide maps of the mouse transcriptome across multiple mouse (C57Bl/6) tissues and primary cells were generated.
## 6 To broaden our understanding of lncRNA evolution, we used an extensive RNA-seq dataset to establish lncRNA repertoires and homologous gene families in 11 tetrapod species. We analyzed the poly- adenylated transcriptomes of 8 organs (cortex/whole brain without cerebellum, cerebellum, heart, kidney, liver, placenta, ovary and testis) and 11 species (human, chimpanzee, bonobo, gorilla, orangutan, macaque, mouse, opossum, platypus, chicken and the frog Xenopus tropicalis), which shared a common ancestor ~370 millions of years (MY) ago. Our dataset included 47 strand-specific samples, which allowed us to confirm the orientation of gene predictions and to address the evolution of sense-antisense transcripts. See also GSE43721 (Soumillon et al, Cell Reports, 2013) for three strand-specific samples for mouse brain, liver and testis.
The getData() function downloads processed RNA-seq data from all mouse experiments in Bgee and returns them as a list.
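The download call is missing from this text. The sketch below is consistent with the output that follows (a length of 9, then the first rows of each element); the variable name `data_bgee_mouse` is illustrative.

```r
library(BgeeDB)

# Bgee object for mouse RNA-seq data, as created earlier
bgee <- Bgee$new(species = "Mus_musculus", dataType = "rna_seq")

# Download all mouse RNA-seq experiments: one data frame per experiment.
data_bgee_mouse <- getData(bgee)

# Number of experiments retrieved, then a preview of each one.
length(data_bgee_mouse)
lapply(data_bgee_mouse, head)
```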
## The experiment is not defined. Hence taking all rna_seq experiments available for Mus_musculus.
## Downloading expression data...
## Saved expression data file in/tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Mus_musculus_Bgee_14_0 folder. Now untar file...
## Finished uncompress tar files
## Saving all data in .rds file...
## [1] 9
## [[1]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE43721 SRX219282 single ENSMUSG00000000001 UBERON:0000473
## 2 GSE43721 SRX219282 single ENSMUSG00000000003 UBERON:0000473
## 3 GSE43721 SRX219282 single ENSMUSG00000000028 UBERON:0000473
## 4 GSE43721 SRX219282 single ENSMUSG00000000031 UBERON:0000473
## 5 GSE43721 SRX219282 single ENSMUSG00000000037 UBERON:0000473
## 6 GSE43721 SRX219282 single ENSMUSG00000000049 UBERON:0000473
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## 2 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## 3 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## 4 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## 5 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## 6 testis MmusDv:0000055 11 weeks (mouse) male C57BL/6
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 3077.00000 11.356800 12.591405 present high quality
## 2 0.00000 0.000000 0.000000 absent high quality
## 3 2491.38470 17.577187 19.488013 present high quality
## 4 74.28836 0.748961 0.830381 present high quality
## 5 1659.00031 6.053025 6.711053 present high quality
## 6 4060.00000 54.832049 60.792873 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[2]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE41338 SRX191152 paired ENSMUSG00000000001 UBERON:0000082
## 2 GSE41338 SRX191152 paired ENSMUSG00000000003 UBERON:0000082
## 3 GSE41338 SRX191152 paired ENSMUSG00000000028 UBERON:0000082
## 4 GSE41338 SRX191152 paired ENSMUSG00000000031 UBERON:0000082
## 5 GSE41338 SRX191152 paired ENSMUSG00000000037 UBERON:0000082
## 6 GSE41338 SRX191152 paired ENSMUSG00000000049 UBERON:0000082
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## 2 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## 3 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## 4 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## 5 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## 6 adult mammalian kidney UBERON:0000113 post-juvenile adult stage mixed C57BL/6
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 567 14.501564 16.562366 present high quality
## 2 0 0.000000 0.000000 absent high quality
## 3 21 0.877042 1.001678 present high quality
## 4 2 0.065568 0.074886 absent high quality
## 5 0 0.000000 0.000000 absent high quality
## 6 192 12.685089 14.487754 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[3]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE43520 SRX217693 single ENSMUSG00000000001 UBERON:0000473
## 2 GSE43520 SRX217693 single ENSMUSG00000000003 UBERON:0000473
## 3 GSE43520 SRX217693 single ENSMUSG00000000028 UBERON:0000473
## 4 GSE43520 SRX217693 single ENSMUSG00000000031 UBERON:0000473
## 5 GSE43520 SRX217693 single ENSMUSG00000000037 UBERON:0000473
## 6 GSE43520 SRX217693 single ENSMUSG00000000049 UBERON:0000473
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## 2 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## 3 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## 4 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## 5 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## 6 testis UBERON:0000113 post-juvenile adult stage male C57BL/6
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 1240.0000 11.779760 11.467125 present high quality
## 2 0.0000 0.000000 0.000000 absent high quality
## 3 585.1399 10.046090 9.779467 present high quality
## 4 71.0000 2.325751 2.264026 present high quality
## 5 717.9998 7.448692 7.251004 present high quality
## 6 781.9998 27.757432 27.020750 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[4]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE43013 SRX211656 paired ENSMUSG00000000001 UBERON:0000082
## 2 GSE43013 SRX211656 paired ENSMUSG00000000003 UBERON:0000082
## 3 GSE43013 SRX211656 paired ENSMUSG00000000028 UBERON:0000082
## 4 GSE43013 SRX211656 paired ENSMUSG00000000031 UBERON:0000082
## 5 GSE43013 SRX211656 paired ENSMUSG00000000037 UBERON:0000082
## 6 GSE43013 SRX211656 paired ENSMUSG00000000049 UBERON:0000082
## Anatomical.entity.name Stage.ID Stage.name Sex
## 1 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## 2 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## 3 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## 4 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## 5 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## 6 adult mammalian kidney UBERON:0000113 post-juvenile adult stage male
## Strain Read.count TPM FPKM Detection.flag Detection.quality
## 1 wild-type 1525 46.583577 37.122217 present high quality
## 2 wild-type 0 0.000000 0.000000 absent high quality
## 3 wild-type 20 0.889774 0.709056 present high quality
## 4 wild-type 2 0.269064 0.214416 present high quality
## 5 wild-type 5 0.110121 0.087755 present high quality
## 6 wild-type 136 11.615857 9.256618 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[5]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE36026 SRX147592 single ENSMUSG00000000001 CL:0000057
## 2 GSE36026 SRX147592 single ENSMUSG00000000003 CL:0000057
## 3 GSE36026 SRX147592 single ENSMUSG00000000028 CL:0000057
## 4 GSE36026 SRX147592 single ENSMUSG00000000031 CL:0000057
## 5 GSE36026 SRX147592 single ENSMUSG00000000037 CL:0000057
## 6 GSE36026 SRX147592 single ENSMUSG00000000049 CL:0000057
## Anatomical.entity.name Stage.ID Stage.name Sex Strain Read.count
## 1 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 5274.8400
## 2 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 0.0000
## 3 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 379.9996
## 4 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 78238.7950
## 5 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 102.0000
## 6 fibroblast MmusDv:0000052 8 weeks (mouse) male C57BL/6 4.0000
## TPM FPKM Detection.flag Detection.quality
## 1 26.901318 66.348954 present high quality
## 2 0.000000 0.000000 absent high quality
## 3 3.636568 8.969169 present high quality
## 4 965.590380 2381.515721 present high quality
## 5 0.602173 1.485190 present high quality
## 6 0.109412 0.269851 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[6]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 SRP022612 SRX277661 single ENSMUSG00000000001 UBERON:0000014
## 2 SRP022612 SRX277661 single ENSMUSG00000000003 UBERON:0000014
## 3 SRP022612 SRX277661 single ENSMUSG00000000028 UBERON:0000014
## 4 SRP022612 SRX277661 single ENSMUSG00000000031 UBERON:0000014
## 5 SRP022612 SRX277661 single ENSMUSG00000000037 UBERON:0000014
## 6 SRP022612 SRX277661 single ENSMUSG00000000049 UBERON:0000014
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## 2 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## 3 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## 4 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## 5 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## 6 zone of skin MmusDv:0000062 2 month-old stage (mouse) male C57BL/6
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 5956.0000 67.499246 69.611952 present high quality
## 2 0.0000 0.000000 0.000000 absent high quality
## 3 154.3024 3.071474 3.167611 present high quality
## 4 40562.2100 1047.442212 1080.226848 present high quality
## 5 244.0006 3.770360 3.888371 present high quality
## 6 3.0000 0.218505 0.225344 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[7]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE30352 SRX081914 single ENSMUSG00000000001 UBERON:0000082
## 2 GSE30352 SRX081914 single ENSMUSG00000000003 UBERON:0000082
## 3 GSE30352 SRX081914 single ENSMUSG00000000028 UBERON:0000082
## 4 GSE30352 SRX081914 single ENSMUSG00000000031 UBERON:0000082
## 5 GSE30352 SRX081914 single ENSMUSG00000000037 UBERON:0000082
## 6 GSE30352 SRX081914 single ENSMUSG00000000049 UBERON:0000082
## Anatomical.entity.name Stage.ID Stage.name Sex
## 1 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## 2 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## 3 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## 4 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## 5 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## 6 adult mammalian kidney UBERON:0000113 post-juvenile adult stage female
## Strain Read.count TPM FPKM Detection.flag Detection.quality
## 1 C57BL/6 2870.00000 39.095102 39.816500 present high quality
## 2 C57BL/6 0.00000 0.000000 0.000000 absent high quality
## 3 C57BL/6 50.99998 1.343029 1.367811 present high quality
## 4 C57BL/6 15.00005 0.682456 0.695049 present high quality
## 5 C57BL/6 11.00000 0.253984 0.258671 present high quality
## 6 C57BL/6 2.00000 0.077435 0.078864 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[8]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE41637 SRX196275 paired ENSMUSG00000000001 UBERON:0000082
## 2 GSE41637 SRX196275 paired ENSMUSG00000000003 UBERON:0000082
## 3 GSE41637 SRX196275 paired ENSMUSG00000000028 UBERON:0000082
## 4 GSE41637 SRX196275 paired ENSMUSG00000000031 UBERON:0000082
## 5 GSE41637 SRX196275 paired ENSMUSG00000000037 UBERON:0000082
## 6 GSE41637 SRX196275 paired ENSMUSG00000000049 UBERON:0000082
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## 2 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## 3 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## 4 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## 5 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## 6 adult mammalian kidney UBERON:0000113 post-juvenile adult stage <NA> C57BL/6
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 8990.00000 38.639422 31.498242 present high quality
## 2 0.00000 0.000000 0.000000 absent high quality
## 3 218.00030 1.402498 1.143294 present high quality
## 4 9.00000 0.043174 0.035195 absent high quality
## 5 25.99995 0.123947 0.101040 present high quality
## 6 788.00000 8.513097 6.939742 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
##
## [[9]]
## Experiment.ID Library.ID Library.type Gene.ID Anatomical.entity.ID
## 1 GSE30617 ERX012344 paired ENSMUSG00000000001 UBERON:0000948
## 2 GSE30617 ERX012344 paired ENSMUSG00000000003 UBERON:0000948
## 3 GSE30617 ERX012344 paired ENSMUSG00000000028 UBERON:0000948
## 4 GSE30617 ERX012344 paired ENSMUSG00000000031 UBERON:0000948
## 5 GSE30617 ERX012344 paired ENSMUSG00000000037 UBERON:0000948
## 6 GSE30617 ERX012344 paired ENSMUSG00000000049 UBERON:0000948
## Anatomical.entity.name Stage.ID Stage.name Sex Strain
## 1 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## 2 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## 3 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## 4 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## 5 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## 6 heart MmusDv:0000052 8 weeks (mouse) NA DBAxC57BL/6J
## Read.count TPM FPKM Detection.flag Detection.quality
## 1 1029.00000 14.732067 13.946815 present high quality
## 2 0.00000 0.000000 0.000000 absent high quality
## 3 61.00006 1.938954 1.835604 present high quality
## 4 94.00000 3.661912 3.466724 present high quality
## 5 22.99999 0.212652 0.201317 present high quality
## 6 8.00000 0.669763 0.634063 present high quality
## State.in.Bgee
## 1 Part of a call
## 2 Result excluded, reason: pre-filtering
## 3 Part of a call
## 4 Part of a call
## 5 Part of a call
## 6 Part of a call
The result of the getData() function is, for each experiment, a data frame in which the samples are listed one after the other. Each row corresponds to one gene in one sample, and the expression levels are given as raw read counts, RPKMs (up to Bgee 13.2), TPMs (from Bgee 14.0), or FPKMs (from Bgee 14.0). A detection flag indicates whether the gene is significantly expressed above the background level of expression.
Note: If microarray data are downloaded, rows correspond to probesets, and log2 expression intensities are available instead of read counts/RPKMs/TPMs/FPKMs.
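To make the long layout concrete, here is a small base R sketch on a toy data frame. The values and the identifiers LIB1, LIB2, geneA, geneB and geneC are made up; only the column names mirror those in the output above.

```r
# Toy data in the same long layout: one row per gene and per library.
toy <- data.frame(
  Library.ID     = c("LIB1", "LIB1", "LIB1", "LIB2", "LIB2", "LIB2"),
  Gene.ID        = c("geneA", "geneB", "geneC", "geneA", "geneB", "geneC"),
  TPM            = c(11.4, 0, 17.6, 14.5, 0, 0.9),
  Detection.flag = c("present", "absent", "present",
                     "present", "absent", "present"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the gene is flagged as present.
present <- toy[toy$Detection.flag == "present", ]

# Number of present genes in each library.
present_per_library <- table(present$Library.ID)

# Mean TPM over the present genes of each library.
mean_tpm <- tapply(present$TPM, present$Library.ID, mean)
```

The same subsetting applies to any element of the list returned by getData(), since each element is an ordinary data frame.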
Alternatively, you can download data from a single RNA-seq experiment in Bgee with the experimentId parameter:
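The call is not shown in this text. Based on the log that follows and on the object name used later in the formatData() call, it was presumably similar to this sketch:

```r
library(BgeeDB)

# Bgee object for mouse RNA-seq data, as created earlier
bgee <- Bgee$new(species = "Mus_musculus", dataType = "rna_seq")

# Download the processed data of experiment GSE30617 only.
data_bgee_mouse_gse30617 <- getData(bgee, experimentId = "GSE30617")
```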
## Downloading expression data for the experiment GSE30617...
## Saved expression data file in /tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Mus_musculus_Bgee_14_0 folder. Now untar file...
## Finished uncompress tar files
## Saving all data in .rds file...
It is sometimes easier to work with data organized as a matrix, where rows represent genes or probesets and columns represent different samples. The formatData() function reformats the data into an ExpressionSet object including:
* An expression data matrix, with genes or probesets as rows, and samples as columns (assayData slot). The stats argument determines whether the matrix is filled with read counts, RPKMs (up to Bgee 13.2), FPKMs (from Bgee 14.0), or TPMs (from Bgee 14.0) for RNA-seq data. For microarray data the matrix is filled with log2 expression intensities.
* A data frame listing the samples and their anatomical structure and developmental stage annotation (phenoData slot).
* For microarray data, the mapping from probesets to Ensembl genes (featureData slot).
The callType
argument, if set to “present” or “present high quality”, retains only actively expressed genes or probesets. Genes or probesets that are absent in a given sample are given NA
values.
# use only present calls and fill expression matrix with FPKM values
gene.expression.mouse.fpkm <- formatData(bgee, data_bgee_mouse_gse30617, callType = "present", stats = "fpkm")
##
## Extracting expression data matrix...
## Keeping only present genes.
##
## Extracting features information...
##
## Extracting samples information...
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 47729 features, 36 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: ERX012344 ERX012345 ... ERX012379 (36 total)
## varLabels: Library.ID Anatomical.entity.ID ... Stage.name (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: ENSMUSG00000000001 ENSMUSG00000000003 ...
## ENSMUSG00000109578 (47729 total)
## fvarLabels: Gene.ID
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
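To make the reshaping concrete, here is a minimal base R sketch of the long-to-wide conversion described above. It is not the formatData() implementation: the toy values, the library and gene names, and the column names FPKM and Detection.flag are assumptions chosen for illustration only.

```r
# Toy long-format table (hypothetical values): one row per gene per library
long <- data.frame(
  Library.ID     = rep(c("LIB1", "LIB2"), each = 3),
  Gene.ID        = rep(c("geneA", "geneB", "geneC"), times = 2),
  FPKM           = c(10.5, 0.2, 3.1, 8.0, 4.4, 0.1),
  Detection.flag = c("present", "absent", "present",
                     "present", "present", "absent"),
  stringsAsFactors = FALSE
)

# Keep only present calls: values of absent genes become NA
long$FPKM[long$Detection.flag != "present"] <- NA

# Build an empty genes x samples matrix, then fill it by (gene, library) name
genes <- unique(long$Gene.ID)
libs  <- unique(long$Library.ID)
expr  <- matrix(NA_real_, nrow = length(genes), ncol = length(libs),
                dimnames = list(genes, libs))
expr[cbind(long$Gene.ID, long$Library.ID)] <- long$FPKM

expr
```

Each column of expr is one sample and each row one gene, which is the layout stored in the assayData slot of the ExpressionSet.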
For some documentation on the TopAnat analysis, please refer to our publications, or to the web-tool page (http://bgee.org/?page=top_anat#/).
Similarly to the quantitative data download example above, the first step of a topAnat analysis is to build an object from the Bgee class. For this example, we will focus on zebrafish:
bgee <- Bgee$new(species = "Danio_rerio")
##
## NOTE: You did not specify any data type. The argument dataType will be set to c("rna_seq","affymetrix","est","in_situ") for the next steps.
##
## Querying Bgee to get release information...
##
## NOTE: the file describing Bgee species information for release 14_0 was found in the download directory /tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes. Data will not be redownloaded.
##
## API key built: b00a311a003fc3d62f337a5b6de2ba24813925b4f891bb347c870f1ecdabf1511a273f71980c3d2d7be31a1ee5404279e812282ec3f6d2908117b52e6274801a
Note: We are free to specify any data type of interest using the dataType
argument among rna_seq
, affymetrix
, est
or in_situ
, or even a combination of data types. If nothing is specified, as in the above example, all data types available for the targeted species are used. This is equivalent to specifying dataType=c("rna_seq","affymetrix","est","in_situ")
.
The loadTopAnatData()
function loads a mapping from genes to anatomical structures based on calls of expression in anatomical structures. It also loads the structure of the anatomical ontology and the names of anatomical structures.
myTopAnatData <- loadTopAnatData(bgee)
##
## Building URLs to retrieve organ relationships from Bgee.........
## URL successfully built (https://r.bgee.org/bgee14/?page=r_package&action=get_anat_entity_relations&display_type=tsv&species_list=7955&attr_list=SOURCE_ID&attr_list=TARGET_ID&api_key=b00a311a003fc3d62f337a5b6de2ba24813925b4f891bb347c870f1ecdabf1511a273f71980c3d2d7be31a1ee5404279e812282ec3f6d2908117b52e6274801a&source=BgeeDB_R_package&source_version=2.12.1)
## Submitting URL to Bgee webservice (can be long)
## Got results from Bgee webservice. Files are written in "/tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Danio_rerio_Bgee_14_0"
##
## Building URLs to retrieve organ names from Bgee.................
## URL successfully built (https://r.bgee.org/bgee14/?page=r_package&action=get_anat_entities&display_type=tsv&species_list=7955&attr_list=ID&attr_list=NAME&api_key=b00a311a003fc3d62f337a5b6de2ba24813925b4f891bb347c870f1ecdabf1511a273f71980c3d2d7be31a1ee5404279e812282ec3f6d2908117b52e6274801a&source=BgeeDB_R_package&source_version=2.12.1)
## Submitting URL to Bgee webservice (can be long)
## Got results from Bgee webservice. Files are written in "/tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Danio_rerio_Bgee_14_0"
##
## Building URLs to retrieve mapping of gene to organs from Bgee...
## URL successfully built (https://r.bgee.org/bgee14/?page=r_package&action=get_expression_calls&display_type=tsv&species_list=7955&attr_list=GENE_ID&attr_list=ANAT_ENTITY_ID&api_key=b00a311a003fc3d62f337a5b6de2ba24813925b4f891bb347c870f1ecdabf1511a273f71980c3d2d7be31a1ee5404279e812282ec3f6d2908117b52e6274801a&source=BgeeDB_R_package&source_version=2.12.1&data_qual=SILVER)
## Submitting URL to Bgee webservice (can be long)
## Got results from Bgee webservice. Files are written in "/tmp/Rtmp1QfSwr/Rbuild7bad6aa1d052/BgeeDB/vignettes/Danio_rerio_Bgee_14_0"
##
## Parsing the results.............................................
##
## Adding BGEE:0 as unique root of all terms of the ontology.......
##
## Done.
The stringency on the quality of expression calls can be changed with the confidence
argument. Finally, if you are interested in expression data coming from a particular developmental stage or a group of stages, please specify a Uberon stage ID in the stage
argument.
## Loading silver and gold expression calls from affymetrix data made on embryonic samples only
## This is just given as an example, but is not run in this vignette because only few data are returned
bgee <- Bgee$new(species = "Danio_rerio", dataType="affymetrix")
myTopAnatData <- loadTopAnatData(bgee, stage="UBERON:0000068", confidence="silver")
Note 1: As mentioned above, the downloaded data files are stored in a versioned folder that can be set with the pathToData
argument when creating the Bgee class object (default is the working directory). If you query Bgee again with the exact same parameters, these cached files will be read instead of querying the web-service again. If you plan to reuse the same data for multiple parallel topAnat analyses, it is thus important to rely on these cached files instead of redownloading the data for each analysis. The cached files also make it possible to repeat analyses offline.
Note 2: In releases up to Bgee 13.2, the allowed confidence values were low_quality or high_quality. From Bgee 14.0, the confidence values are gold or silver.
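The caching behaviour described in Note 1 follows a common "read the file if it exists, otherwise fetch and save" pattern. The sketch below illustrates that pattern in base R only. It is not the BgeeDB implementation: cached_fetch(), slow_query() and the file name are hypothetical helpers introduced for this example.

```r
# Hypothetical helper: return the cached result if present, else fetch and store it
cached_fetch <- function(cache_file, fetch_fun) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- fetch_fun()
  saveRDS(result, cache_file)
  result
}

# Hypothetical cache location; start from a clean state for the demonstration
cache_file <- file.path(tempdir(), "toy_bgee_cache.rds")
if (file.exists(cache_file)) file.remove(cache_file)

# Stand-in for a slow web-service query; counts how often it is really called
n_calls <- 0
slow_query <- function() {
  n_calls <<- n_calls + 1
  data.frame(gene = c("geneA", "geneB"), stringsAsFactors = FALSE)
}

first  <- cached_fetch(cache_file, slow_query)  # runs the query and writes the cache
second <- cached_fetch(cache_file, slow_query)  # reads the cache, no new query
n_calls
```

Because the second call is served from disk, the same idea lets several analyses share one download and allows work to continue offline.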
First we need to prepare a list of genes in the foreground and in the background. The input format is the same as the gene list required to build a topGOdata
object in the topGO
package: a vector with background genes as names, and 0 or 1 values depending on whether a gene is in the foreground or not. In this example we will look at genes with an annotated phenotype related to “pectoral fin”. We use the biomaRt
package to retrieve this list of genes. We expect them to be enriched for expression in pectoral fins and related structures. The background list of genes is set to all genes with at least one phenotype annotated in ZFIN, which accounts for biases in which types of genes are more likely to receive phenotype annotation.
# if (!requireNamespace("BiocManager", quietly=TRUE))
# install.packages("BiocManager")
# BiocManager::install("biomaRt")
library(biomaRt)
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="drerio_gene_ensembl", host="mar2016.archive.ensembl.org")
# get the mapping of Ensembl genes to phenotypes. It will correspond to the background genes
universe <- getBM(filters=c("phenotype_source"), values=c("ZFIN"), attributes=c("ensembl_gene_id","phenotype_description"), mart=ensembl)
# select phenotypes related to pectoral fin
phenotypes <- grep("pectoral fin", unique(universe$phenotype_description), value=TRUE)
# Foreground genes are those with an annotated phenotype related to "pectoral fin"
myGenes <- unique(universe$ensembl_gene_id[universe$phenotype_description %in% phenotypes])
# Prepare the gene list vector
geneList <- factor(as.integer(unique(universe$ensembl_gene_id) %in% myGenes))
names(geneList) <- unique(universe$ensembl_gene_id)
summary(geneList)
# Prepare the topGO object
myTopAnatObject <- topAnat(myTopAnatData, geneList)
The above code using the biomaRt
package is not executed in this vignette, to avoid build failures of our package in case of biomaRt downtime. Instead we use a geneList
object saved in the data/
folder that we built using pre-downloaded data.
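The expected input format can also be illustrated without any download. The following sketch builds a gene list vector of the same shape from made-up identifiers; the gene names and the choice of foreground genes are purely hypothetical.

```r
# Hypothetical background (universe) and foreground gene identifiers
universe_ids   <- c("gene1", "gene2", "gene3", "gene4", "gene5")
foreground_ids <- c("gene2", "gene5")

# 1 for foreground genes, 0 for the remaining background genes
toyGeneList <- factor(as.integer(universe_ids %in% foreground_ids))
names(toyGeneList) <- universe_ids

toyGeneList
summary(toyGeneList)
```

The names carry the background, and the factor levels 0 and 1 mark which of those genes belong to the foreground.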
##
## Checking topAnatData object.............
##
## Checking gene list......................
##
## Building most specific Ontology terms... ( 1173 Ontology terms found. )
##
## Building DAG topology................... ( 2035 Ontology terms and 3882 relations. )
##
## Annotating nodes (Can be long).......... ( 3005 genes annotated to the Ontology terms. )
Warning: This can be long, especially if the gene list is large, since the Uberon anatomical ontology is large and expression calls will be propagated through the whole ontology (e.g., expression in the forebrain will also be counted as expression in parent structures such as the brain, nervous system, etc). Consider running a script in batch mode if you have multiple analyses to do.
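The propagation mentioned in this warning can be pictured with a toy ontology. The sketch below is only an illustration in base R, not the code used by topAnat: the three-term ontology, the gene names and the ancestors() helper are assumptions made for the example.

```r
# Toy ontology (hypothetical): each term points to its direct parent(s)
parents <- list(
  forebrain        = "brain",
  brain            = "nervous system",
  "nervous system" = character(0)
)

# Hypothetical helper: collect all ancestors of a term, recursively
ancestors <- function(term) {
  direct <- parents[[term]]
  unique(c(direct, unlist(lapply(direct, ancestors))))
}

# Direct expression calls (hypothetical): gene -> structures where it is expressed
direct_calls <- list(geneA = "forebrain", geneB = "brain")

# Propagate each call to every parent structure
propagated <- lapply(direct_calls, function(terms) {
  unique(c(terms, unlist(lapply(terms, ancestors))))
})

propagated
```

A gene expressed in the forebrain therefore also counts for the brain and the nervous system, which is why annotation grows quickly on a large ontology.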
For this step, see the vignette of the topGO
package for more details, as you have to directly use the tests implemented in the topGO
package, as shown in this example:
results <- runTest(myTopAnatObject, algorithm = 'weight', statistic = 'fisher')
##
## -- Weight Algorithm --
##
## The algorithm is scoring 1005 nontrivial nodes
## parameters:
## test statistic: fisher : ratio
##
## Level 27: 1 nodes to be scored.
##
## Level 26: 1 nodes to be scored.
##
## Level 25: 1 nodes to be scored.
##
## Level 24: 4 nodes to be scored.
##
## Level 23: 4 nodes to be scored.
##
## Level 22: 5 nodes to be scored.
##
## Level 21: 4 nodes to be scored.
##
## Level 20: 8 nodes to be scored.
##
## Level 19: 23 nodes to be scored.
##
## Level 18: 23 nodes to be scored.
##
## Level 17: 27 nodes to be scored.
##
## Level 16: 39 nodes to be scored.
##
## Level 15: 63 nodes to be scored.
##
## Level 14: 63 nodes to be scored.
##
## Level 13: 74 nodes to be scored.
##
## Level 12: 95 nodes to be scored.
##
## Level 11: 119 nodes to be scored.
##
## Level 10: 115 nodes to be scored.
##
## Level 9: 92 nodes to be scored.
##
## Level 8: 75 nodes to be scored.
##
## Level 7: 67 nodes to be scored.
##
## Level 6: 43 nodes to be scored.
##
## Level 5: 27 nodes to be scored.
##
## Level 4: 21 nodes to be scored.
##
## Level 3: 6 nodes to be scored.
##
## Level 2: 4 nodes to be scored.
##
## Level 1: 1 nodes to be scored.
Warning: This can be long because of the size of the ontology. Consider running scripts in batch mode if you have multiple analyses to do.
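For intuition about what is tested for a single term, the sketch below runs a classic one-sided Fisher exact test in base R on the counts reported in this vignette for “pectoral fin” (439 annotated genes, 79 of them in the foreground, 3005 annotated genes in total). The foreground size of 147 is an assumption, back-calculated from the expected counts in the results table. The weight algorithm additionally accounts for the ontology structure, so the p-value of this plain test is not expected to match the one reported by topGO.

```r
n_background <- 3005  # genes annotated to the ontology terms (from the log above)
n_foreground <- 147   # ASSUMPTION: back-calculated from the "expected" column
annotated    <- 439   # genes mapped to "pectoral fin"
significant  <- 79    # foreground genes mapped to "pectoral fin"

# 2 x 2 contingency table: term membership versus foreground membership
contingency <- matrix(
  c(significant,
    n_foreground - significant,
    annotated - significant,
    n_background - n_foreground - (annotated - significant)),
  nrow = 2,
  dimnames = list(term = c("in term", "not in term"),
                  set  = c("foreground", "background only"))
)

# Number of foreground genes expected in the term by chance
expected <- annotated * n_foreground / n_background

# Classic over-representation test (no ontology-aware weighting)
classic <- fisher.test(contingency, alternative = "greater")

contingency
round(expected, 2)
classic$p.value
```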
The makeTable
function filters and formats the test results, and calculates FDR values.
# Display results significant at a 1% FDR threshold
tableOver <- makeTable(myTopAnatData, myTopAnatObject, results, cutoff = 0.01)
##
## Building the results table for the 27 significant terms at FDR threshold of 0.01...
## Ordering results by pValue column in increasing order...
## Done
## organId organName annotated significant
## UBERON:0000151 UBERON:0000151 pectoral fin 439 79
## UBERON:0004357 UBERON:0004357 paired limb/fin bud 198 48
## UBERON:2000040 UBERON:2000040 median fin fold 59 20
## UBERON:0003051 UBERON:0003051 ear vesicle 391 49
## UBERON:0005729 UBERON:0005729 pectoral appendage field 20 11
## UBERON:0004376 UBERON:0004376 fin bone 34 12
## expected foldEnrichment pValue FDR
## UBERON:0000151 21.48 3.677840 1.358300e-27 1.468322e-24
## UBERON:0004357 9.69 4.953560 5.187251e-23 2.803709e-20
## UBERON:2000040 2.89 6.920415 9.370662e-13 3.376562e-10
## UBERON:0003051 19.13 2.561422 5.501734e-11 1.486844e-08
## UBERON:0005729 0.98 11.224490 3.052286e-10 6.599043e-08
## UBERON:0004376 1.66 7.228916 2.603199e-08 4.690096e-06
At the time of building this vignette (June 2018), there were 27 significant anatomical structures. The first term is “pectoral fin”, and the second “paired limb/fin bud”. Other terms in the list, especially those with high enrichment folds, are clearly related to pectoral fins or substructures of fins. This analysis shows that genes with phenotypic effects on pectoral fins are specifically expressed in or next to these structures.
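The derived columns of this table can be checked by hand. The sketch below recomputes foldEnrichment as significant / expected, and the FDR with a Benjamini-Hochberg correction, from the six rows displayed above. The number of tests, 1081, is an assumption: it is back-calculated from the ratio between FDR and pValue of the top term, and we have not verified how makeTable computes it internally.

```r
# Values copied from the six rows of the results table shown above
top <- data.frame(
  organName   = c("pectoral fin", "paired limb/fin bud", "median fin fold",
                  "ear vesicle", "pectoral appendage field", "fin bone"),
  significant = c(79, 48, 20, 49, 11, 12),
  expected    = c(21.48, 9.69, 2.89, 19.13, 0.98, 1.66),
  pValue      = c(1.358300e-27, 5.187251e-23, 9.370662e-13,
                  5.501734e-11, 3.052286e-10, 2.603199e-08),
  stringsAsFactors = FALSE
)

# Fold enrichment: observed foreground genes over those expected by chance
top$foldEnrichment <- top$significant / top$expected

# Benjamini-Hochberg correction over all tested terms
n_tests <- 1081  # ASSUMPTION: back-calculated from FDR / pValue of the first row
top$FDR <- p.adjust(top$pValue, method = "BH", n = n_tests)

top[, c("organName", "foldEnrichment", "FDR")]
```

Under these assumptions, the recomputed values agree with the foldEnrichment and FDR columns of the table to the displayed precision.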
By default results are sorted by p-value, but this can be changed with the ordering
parameter by specifying which column should be used to order the results (preceded by a “-” sign to indicate that ordering should be made in decreasing order). For example, it is often convenient to sort significant structures by decreasing enrichment fold, using ordering = -6
. The full table of results can be obtained using cutoff = 1
.
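To show what ordering by decreasing enrichment fold does to the rows displayed above, here is a small base R sketch on a copy of three columns of that table. It mimics the effect of ordering = -6 with order(); it does not call makeTable, and the toyTable object is introduced only for this illustration.

```r
# Three columns copied from the six rows of the results table shown above
toyTable <- data.frame(
  organName      = c("pectoral fin", "paired limb/fin bud", "median fin fold",
                     "ear vesicle", "pectoral appendage field", "fin bone"),
  foldEnrichment = c(3.677840, 4.953560, 6.920415,
                     2.561422, 11.224490, 7.228916),
  pValue         = c(1.358300e-27, 5.187251e-23, 9.370662e-13,
                     5.501734e-11, 3.052286e-10, 2.603199e-08),
  stringsAsFactors = FALSE
)

# Sort by decreasing fold enrichment (the minus sign reverses the order)
byFold <- toyTable[order(-toyTable$foldEnrichment), ]

byFold$organName
```

With this ordering, small but strongly enriched structures such as “pectoral appendage field” move to the top, ahead of larger terms with smaller p-values.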
Of note, it is possible to retrieve for a particular tissue the significant genes that were mapped to it.
# In order to retrieve significant genes mapped to the term "paired limb/fin bud"
term <- "UBERON:0004357"
termStat(myTopAnatObject, term)
## Annotated Significant Expected
## UBERON:0004357 198 48 9.69
# 198 genes mapped to this term for Bgee 14.0 and Ensembl 84
genesInTerm(myTopAnatObject, term)
## $`UBERON:0004357`
## [1] "ENSDARG00000001057" "ENSDARG00000001785" "ENSDARG00000002445"
## [4] "ENSDARG00000002795" "ENSDARG00000002933" "ENSDARG00000002952"
## [7] "ENSDARG00000003293" "ENSDARG00000003399" "ENSDARG00000004954"
## [10] "ENSDARG00000005479" "ENSDARG00000005645" "ENSDARG00000005762"
## [13] "ENSDARG00000006120" "ENSDARG00000006514" "ENSDARG00000007407"
## [16] "ENSDARG00000007438" "ENSDARG00000007641" "ENSDARG00000008305"
## [19] "ENSDARG00000008886" "ENSDARG00000009534" "ENSDARG00000011027"
## [22] "ENSDARG00000011407" "ENSDARG00000011618" "ENSDARG00000012078"
## [25] "ENSDARG00000012422" "ENSDARG00000012824" "ENSDARG00000013409"
## [28] "ENSDARG00000013853" "ENSDARG00000013881" "ENSDARG00000014091"
## [31] "ENSDARG00000014259" "ENSDARG00000014329" "ENSDARG00000014626"
## [34] "ENSDARG00000014634" "ENSDARG00000014796" "ENSDARG00000015554"
## [37] "ENSDARG00000015674" "ENSDARG00000016022" "ENSDARG00000016454"
## [40] "ENSDARG00000016858" "ENSDARG00000017219" "ENSDARG00000018025"
## [43] "ENSDARG00000018426" "ENSDARG00000018460" "ENSDARG00000018492"
## [46] "ENSDARG00000018693" "ENSDARG00000018902" "ENSDARG00000019260"
## [49] "ENSDARG00000019353" "ENSDARG00000019579" "ENSDARG00000019838"
## [52] "ENSDARG00000019995" "ENSDARG00000020527" "ENSDARG00000021389"
## [55] "ENSDARG00000021442" "ENSDARG00000021938" "ENSDARG00000022280"
## [58] "ENSDARG00000024561" "ENSDARG00000024894" "ENSDARG00000025081"
## [61] "ENSDARG00000025147" "ENSDARG00000025375" "ENSDARG00000025641"
## [64] "ENSDARG00000025891" "ENSDARG00000028071" "ENSDARG00000029764"
## [67] "ENSDARG00000030110" "ENSDARG00000030756" "ENSDARG00000030932"
## [70] "ENSDARG00000031222" "ENSDARG00000031809" "ENSDARG00000031894"
## [73] "ENSDARG00000031952" "ENSDARG00000033327" "ENSDARG00000033616"
## [76] "ENSDARG00000034375" "ENSDARG00000035559" "ENSDARG00000035648"
## [79] "ENSDARG00000036254" "ENSDARG00000036558" "ENSDARG00000037109"
## [82] "ENSDARG00000037556" "ENSDARG00000037675" "ENSDARG00000037677"
## [85] "ENSDARG00000038006" "ENSDARG00000038428" "ENSDARG00000038672"
## [88] "ENSDARG00000038879" "ENSDARG00000038990" "ENSDARG00000038991"
## [91] "ENSDARG00000040534" "ENSDARG00000040764" "ENSDARG00000041430"
## [94] "ENSDARG00000041609" "ENSDARG00000041706" "ENSDARG00000041799"
## [97] "ENSDARG00000042233" "ENSDARG00000042296" "ENSDARG00000043130"
## [100] "ENSDARG00000043559" "ENSDARG00000043923" "ENSDARG00000044511"
## [103] "ENSDARG00000044574" "ENSDARG00000052131" "ENSDARG00000052139"
## [106] "ENSDARG00000052344" "ENSDARG00000052494" "ENSDARG00000052652"
## [109] "ENSDARG00000053479" "ENSDARG00000053493" "ENSDARG00000054026"
## [112] "ENSDARG00000054030" "ENSDARG00000054619" "ENSDARG00000055026"
## [115] "ENSDARG00000055027" "ENSDARG00000055381" "ENSDARG00000055398"
## [118] "ENSDARG00000056995" "ENSDARG00000057830" "ENSDARG00000058115"
## [121] "ENSDARG00000058543" "ENSDARG00000058822" "ENSDARG00000059073"
## [124] "ENSDARG00000059233" "ENSDARG00000059276" "ENSDARG00000059279"
## [127] "ENSDARG00000059437" "ENSDARG00000060397" "ENSDARG00000060808"
## [130] "ENSDARG00000061328" "ENSDARG00000061345" "ENSDARG00000061600"
## [133] "ENSDARG00000062824" "ENSDARG00000068365" "ENSDARG00000068567"
## [136] "ENSDARG00000068732" "ENSDARG00000069105" "ENSDARG00000069473"
## [139] "ENSDARG00000069763" "ENSDARG00000069922" "ENSDARG00000070069"
## [142] "ENSDARG00000070670" "ENSDARG00000071336" "ENSDARG00000071560"
## [145] "ENSDARG00000071699" "ENSDARG00000073814" "ENSDARG00000074378"
## [148] "ENSDARG00000074597" "ENSDARG00000075713" "ENSDARG00000076010"
## [151] "ENSDARG00000076554" "ENSDARG00000076566" "ENSDARG00000076856"
## [154] "ENSDARG00000077121" "ENSDARG00000077353" "ENSDARG00000077473"
## [157] "ENSDARG00000078696" "ENSDARG00000078784" "ENSDARG00000078864"
## [160] "ENSDARG00000079027" "ENSDARG00000079570" "ENSDARG00000079922"
## [163] "ENSDARG00000079964" "ENSDARG00000080453" "ENSDARG00000087196"
## [166] "ENSDARG00000089805" "ENSDARG00000090820" "ENSDARG00000091161"
## [169] "ENSDARG00000092136" "ENSDARG00000092809" "ENSDARG00000095743"
## [172] "ENSDARG00000095859" "ENSDARG00000098359" "ENSDARG00000099088"
## [175] "ENSDARG00000099175" "ENSDARG00000099458" "ENSDARG00000099996"
## [178] "ENSDARG00000100236" "ENSDARG00000100312" "ENSDARG00000100725"
## [181] "ENSDARG00000101076" "ENSDARG00000101199" "ENSDARG00000101209"
## [184] "ENSDARG00000101244" "ENSDARG00000101701" "ENSDARG00000101766"
## [187] "ENSDARG00000101831" "ENSDARG00000102470" "ENSDARG00000102750"
## [190] "ENSDARG00000102824" "ENSDARG00000102995" "ENSDARG00000103432"
## [193] "ENSDARG00000103515" "ENSDARG00000103754" "ENSDARG00000104353"
## [196] "ENSDARG00000104815" "ENSDARG00000105230" "ENSDARG00000105357"
# 48 significant genes mapped to this term for Bgee 14.0 and Ensembl 84
annotated <- genesInTerm(myTopAnatObject, term)[["UBERON:0004357"]]
annotated[annotated %in% sigGenes(myTopAnatObject)]
## [1] "ENSDARG00000002445" "ENSDARG00000002952" "ENSDARG00000003293"
## [4] "ENSDARG00000008305" "ENSDARG00000011407" "ENSDARG00000012824"
## [7] "ENSDARG00000013853" "ENSDARG00000013881" "ENSDARG00000014091"
## [10] "ENSDARG00000018426" "ENSDARG00000018693" "ENSDARG00000018902"
## [13] "ENSDARG00000019260" "ENSDARG00000019353" "ENSDARG00000019838"
## [16] "ENSDARG00000021389" "ENSDARG00000024894" "ENSDARG00000028071"
## [19] "ENSDARG00000030932" "ENSDARG00000031894" "ENSDARG00000036254"
## [22] "ENSDARG00000037677" "ENSDARG00000038006" "ENSDARG00000038672"
## [25] "ENSDARG00000041799" "ENSDARG00000042233" "ENSDARG00000042296"
## [28] "ENSDARG00000043559" "ENSDARG00000043923" "ENSDARG00000053493"
## [31] "ENSDARG00000054619" "ENSDARG00000058543" "ENSDARG00000060397"
## [34] "ENSDARG00000068567" "ENSDARG00000069473" "ENSDARG00000071336"
## [37] "ENSDARG00000073814" "ENSDARG00000076856" "ENSDARG00000077121"
## [40] "ENSDARG00000077353" "ENSDARG00000079027" "ENSDARG00000079570"
## [43] "ENSDARG00000087196" "ENSDARG00000095859" "ENSDARG00000099088"
## [46] "ENSDARG00000100312" "ENSDARG00000101831" "ENSDARG00000105357"
Warning: it is debated whether FDR correction is appropriate on enrichment test results, since tests on different terms of the ontologies are not independent. A nice discussion can be found in the vignette of the topGO
package.