This document offers an introduction and overview of motifbreakR, which allows the biologist to judge whether the sequence surrounding a polymorphism or mutation is a good match to known transcription factor binding sites, and how much information is gained or lost in one allele of the polymorphism relative to another, or in a mutation relative to wildtype. motifbreakR is flexible, offering a choice of three algorithms for interrogating genomes with motifs from public sources: 1) a weighted sum, 2) log-probabilities, and 3) relative entropy. motifbreakR can predict effects for novel variants as well as for variants previously described in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor.
As of version 2.0, motifbreakR can also perform its analysis on indels (small insertions or deletions).
motifbreakR works with position probability matrices (PPM). A PPM is derived as the fractional occurrence of nucleotides A, C, G, and T at each position of a position frequency matrix (PFM). A PFM is simply the tally of each nucleotide at each position across a set of aligned sequences. With a PPM, one can generate probabilities based on the genome or, more practically, create any number of position specific scoring matrices (PSSM), based on the principle that the PPM contains information about the likelihood of observing a particular nucleotide at a particular position of a true transcription factor binding site.
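As a minimal sketch of the PFM, PPM, and PSSM relationship described above, the following base R code converts a small tally matrix into probabilities and then into one possible scoring matrix. The counts, the pseudocount, and the uniform background are illustrative assumptions and are not taken from motifbreakR or MotifDb.

```r
# Hypothetical PFM: nucleotide tallies from 10 aligned sites (illustrative values only)
pfm <- matrix(c(8, 0, 1, 2,   # A
                1, 9, 0, 3,   # C
                1, 0, 9, 2,   # G
                0, 1, 0, 3),  # T
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# PPM: divide each column by its total so that every position sums to 1
ppm <- sweep(pfm, 2, colSums(pfm), "/")

# One possible PSSM: log2 odds against a uniform background.
# The pseudocount is an assumption, added only to avoid log2(0).
pseudocount <- 0.01
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
ppm.adj <- sweep(ppm + pseudocount, 2, colSums(ppm + pseudocount), "/")
pssm <- log2(ppm.adj / background)  # background is recycled down each column

round(ppm, 2)
round(pssm, 2)
```

Positive PSSM entries mark nucleotides that are more likely at that position than under the background, and negative entries mark nucleotides that are less likely.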
This guide includes a brief overview of the processing flow, an example focusing more in depth on the practical aspect of using motifbreakR, and finally a detailed section on the scoring methods employed by the package.
motifbreakR may be used to interrogate SNPs or SNVs for their potential effect on transcription factor binding by examining how the two alleles of the variant affect the binding score of a motif. The basic process is outlined in the figure below.
This straightforward process allows the interrogation of SNPs and SNVs in the context of the different species represented by BSgenome packages (at least 22 different species) and allows the use of the full MotifDb data set, which includes over 4200 motifs across 8 studies and 22 organisms. We have supplemented this with over 2800 additional motifs across four additional studies in humans; see data(encodemotif) [1], data(factorbook) [2], data(hocomoco) [3], and data(homer) [4] for the additional studies that we have included.
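To make the idea of comparing the two alleles of a variant concrete, here is a minimal sketch that scores a single sequence window against a PPM using summed log-probabilities and reports the difference between alleles. The matrix values, the window sequences, and the score.window helper are hypothetical; this is not the package's implementation, which is described in the scoring methods section.

```r
# Hypothetical 4-position PPM (each column sums to 1); values are illustrative
ppm <- matrix(c(0.80, 0.05, 0.10, 0.20,   # A
                0.10, 0.85, 0.05, 0.30,   # C
                0.05, 0.05, 0.80, 0.20,   # G
                0.05, 0.05, 0.05, 0.30),  # T
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: sum of log-probabilities of the observed bases
score.window <- function(seq, ppm) {
  bases <- strsplit(seq, "")[[1]]
  stopifnot(length(bases) == ncol(ppm))
  idx <- cbind(match(bases, rownames(ppm)), seq_along(bases))
  sum(log(ppm[idx]))
}

ref.score <- score.window("ACGT", ppm)  # reference allele C at position 2
alt.score <- score.window("ATGT", ppm)  # alternate allele T at position 2
delta <- alt.score - ref.score          # negative: the alternate allele matches worse
delta
```

Only one window is scored here; a variant can fall at any position of a motif, so a full analysis would need to consider every window that overlaps it.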
In practice, a motifbreakR analysis involves three phases: reading in the variants, scoring them against a set of motifs supplied as a MotifList using your preferred scoring method, and visualizing the results.
This section offers an example of how to use motifbreakR to identify transcription factor binding sites potentially disrupted by 701 SNPs output from a FunciSNP analysis of prostate cancer (PCa) genome-wide association study (GWAS) risk loci. The SNPs are included in this package:
library(motifbreakR)
pca.snps.file <- system.file("extdata", "pca.enhancer.snps", package = "motifbreakR")
pca.snps <- as.character(read.table(pca.snps.file)[,1])
The simplest form of a motifbreakR analysis is summarized as follows:
variants <- snps.from.rsid(rsid = pca.snps,
                           dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                           search.genome = BSgenome.Hsapiens.UCSC.hg19)
motifbreakr.results <- motifbreakR(snpList = variants, pwmList = MotifDb, threshold = 0.9)
plotMB(results = motifbreakr.results, rsid = "rs7837328", effect = "strong")
Let's look at these steps more closely and see how we can customize our analysis.
Variants can be input either as a list of rsIDs or as a .bed file. The main factor determining which you will use is whether your variants have rsIDs that are included in one of the Bioconductor SNPlocs packages. The present selection is seen here:
library(BSgenome)
available.SNPs()
## [1] "SNPlocs.Hsapiens.dbSNP.20101109" "SNPlocs.Hsapiens.dbSNP.20120608"
## [3] "SNPlocs.Hsapiens.dbSNP141.GRCh38" "SNPlocs.Hsapiens.dbSNP142.GRCh37"
## [5] "SNPlocs.Hsapiens.dbSNP144.GRCh37" "SNPlocs.Hsapiens.dbSNP144.GRCh38"
## [7] "SNPlocs.Hsapiens.dbSNP149.GRCh38" "SNPlocs.Hsapiens.dbSNP150.GRCh38"
## [9] "SNPlocs.Hsapiens.dbSNP151.GRCh38" "XtraSNPlocs.Hsapiens.dbSNP141.GRCh38"
## [11] "XtraSNPlocs.Hsapiens.dbSNP144.GRCh37" "XtraSNPlocs.Hsapiens.dbSNP144.GRCh38"
For cases where your rsIDs are not available in a SNPlocs package, or you have novel variants that are not cataloged at all, variants may be entered in BED format as seen below:
snps.file <- system.file("extdata", "snps.bed", package = "motifbreakR")
read.delim(snps.file, header = FALSE)
## V1 V2 V3 V4 V5 V6
## 1 chr2 12581137 12581138 rs10170896 0 +
## 2 chr2 12594017 12594018 chr2:12594018:G:A 0 +
## 3 chr3 192388677 192388678 rs13068005 0 +
## 4 chr4 122361479 122361480 rs12644995 0 +
## 5 chr6 44503245 44503246 chr6:44503246:A:T 0 +
## 6 chr6 44503247 44503248 chr6:44503248:G:C 0 +
## 7 chr6 85232897 85232898 rs4510639 0 +
## 8 chr6 44501872 44501873 rs932680 0 +
Our requirements for the BED file are that it must include chromosome, start, end, name, score, and strand fields, where the name field is required to be in one of two formats: either an rsID that is present in a SNPlocs package, or the form chromosome:position:referenceAllele:alternateAllele, e.g., chr2:12594018:G:A. It is also essential that the fields are TAB separated, not a mixture of tabs and spaces.
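The requirements above can be checked before import. The sketch below, in base R, reads a few tab-separated rows copied from the example and tests the name field against the two accepted formats. The regular expressions and the consistency check are assumptions made for illustration; they are not part of motifbreakR.

```r
# Three rows copied from the example BED file above, fields separated by tabs
bed <- read.delim(text = paste(
  "chr2\t12581137\t12581138\trs10170896\t0\t+",
  "chr2\t12594017\t12594018\tchr2:12594018:G:A\t0\t+",
  "chr6\t44503245\t44503246\tchr6:44503246:A:T\t0\t+",
  sep = "\n"),
  header = FALSE, stringsAsFactors = FALSE,
  col.names = c("chromosome", "start", "end", "name", "score", "strand"))

# Assumed patterns for the two name formats
is.rsid   <- grepl("^rs[0-9]+$", bed$name)
is.custom <- grepl("^[^:]+:[0-9]+:[ACGT]+:[ACGT]+$", bed$name)
bad.rows  <- which(!(is.rsid | is.custom))  # rows whose name fits neither format

# In the example rows, the position in a custom name equals the BED end
# coordinate and the chromosome matches the first field; check the same here
custom <- bed[is.custom, ]
parts  <- do.call(rbind, strsplit(custom$name, ":", fixed = TRUE))
consistent <- parts[, 1] == custom$chromosome &
  as.integer(parts[, 2]) == custom$end

bad.rows
consistent
```

A file that mixes tabs and spaces typically fails earlier, because read.delim then finds the wrong number of fields in the affected rows.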
More to the point here are the two methods for reading in the variants.
We use SNPlocs.Hsapiens.dbSNP142.GRCh37, which holds the SNP locations and alleles defined in dbSNP142, as the source for looking up our rsIDs, and BSgenome.Hsapiens.UCSC.hg19, which holds the reference sequence for UCSC genome build hg19. Additional SNPlocs packages are available from Bioconductor.
library(SNPlocs.Hsapiens.dbSNP142.GRCh37) # dbSNP142 in hg19
library(BSgenome.Hsapiens.UCSC.hg19) # hg19 genome
head(pca.snps)
## [1] "rs1551515" "rs1551513" "rs17762938" "rs4473999" "rs7823297" "rs9656964"
snps.mb <- snps.from.rsid(rsid = pca.snps,
                          dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                          search.genome = BSgenome.Hsapiens.UCSC.hg19)
## Warning in rowids2rowidx(user_rowids, ids, x_rowids, ifnotfound): SNP ids not found: rs78914317, rs75425437, rs114099824, rs79509278, rs74738513
##
## They were dropped.
snps.mb
## GRanges object with 700 ranges and 3 metadata columns:
## seqnames ranges strand | SNP_id REF ALT
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSet>
## rs10007915 chr4 106065308 * | rs10007915 C G
## rs10015716 chr4 95548550 * | rs10015716 G A
## rs10034824 chr4 95524838 * | rs10034824 G T
## rs10056823 chr5 115609454 * | rs10056823 C G
## rs1006140 chr19 38778915 * | rs1006140 A G
## ... ... ... ... . ... ... ...
## rs9901746 chr17 36103149 * | rs9901746 G A
## rs9908087 chr17 69106937 * | rs9908087 T G
## rs991429 chr17 69109773 * | rs991429 G A
## rs9973650 chr2 238380266 * | rs9973650 G A
## rs998071 chr4 95591976 * | rs998071 C G
## -------
## seqinfo: 298 sequences (2 circular) from hg19 genome
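The warning above reports rsIDs that were not found and were dropped. A simple way to recover that list programmatically is a set difference between the requested and returned identifiers. The sketch below uses short stand-in vectors so that it runs without the annotation packages; in a real session, the found vector would presumably come from names(snps.mb), as suggested by the printed row names.

```r
# Stand-in vectors (illustrative subset of the identifiers shown above)
requested <- c("rs1551515", "rs78914317", "rs1551513", "rs75425437")
found     <- c("rs1551513", "rs1551515")  # stands in for names(snps.mb)

# rsIDs that were requested but not returned, in the order requested
dropped <- setdiff(requested, found)
dropped
```

Variants dropped this way can still be analysed by supplying them in BED format with chromosome:position:referenceAllele:alternateAllele names.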
A far greater variety of variants may be read into motifbreakR via the snps.from.file function. In fact, motifbreakR will work with any organism present as a Bioconductor BSgenome package. The listing below shows 98 genome packages (masked versions included) representing 29 species.
library(BSgenome)
genomes <- available.genomes()
length(genomes)
## [1] 98
genomes
## [1] "BSgenome.Alyrata.JGI.v1" "BSgenome.Amellifera.BeeBase.assembly4"
## [3] "BSgenome.Amellifera.UCSC.apiMel2" "BSgenome.Amellifera.UCSC.apiMel2.masked"
## [5] "BSgenome.Aofficinalis.NCBI.V1" "BSgenome.Athaliana.TAIR.04232008"
## [7] "BSgenome.Athaliana.TAIR.TAIR9" "BSgenome.Btaurus.UCSC.bosTau3"
## [9] "BSgenome.Btaurus.UCSC.bosTau3.masked" "BSgenome.Btaurus.UCSC.bosTau4"
## [11] "BSgenome.Btaurus.UCSC.bosTau4.masked" "BSgenome.Btaurus.UCSC.bosTau6"
## [13] "BSgenome.Btaurus.UCSC.bosTau6.masked" "BSgenome.Btaurus.UCSC.bosTau8"
## [15] "BSgenome.Btaurus.UCSC.bosTau9" "BSgenome.Carietinum.NCBI.v1"
## [17] "BSgenome.Celegans.UCSC.ce10" "BSgenome.Celegans.UCSC.ce11"
## [19] "BSgenome.Celegans.UCSC.ce2" "BSgenome.Celegans.UCSC.ce6"
## [21] "BSgenome.Cfamiliaris.UCSC.canFam2" "BSgenome.Cfamiliaris.UCSC.canFam2.masked"
## [23] "BSgenome.Cfamiliaris.UCSC.canFam3" "BSgenome.Cfamiliaris.UCSC.canFam3.masked"
## [25] "BSgenome.Cjacchus.UCSC.calJac3" "BSgenome.Dmelanogaster.UCSC.dm2"
## [27] "BSgenome.Dmelanogaster.UCSC.dm2.masked" "BSgenome.Dmelanogaster.UCSC.dm3"
## [29] "BSgenome.Dmelanogaster.UCSC.dm3.masked" "BSgenome.Dmelanogaster.UCSC.dm6"
## [31] "BSgenome.Drerio.UCSC.danRer10" "BSgenome.Drerio.UCSC.danRer11"
## [33] "BSgenome.Drerio.UCSC.danRer5" "BSgenome.Drerio.UCSC.danRer5.masked"
## [35] "BSgenome.Drerio.UCSC.danRer6" "BSgenome.Drerio.UCSC.danRer6.masked"
## [37] "BSgenome.Drerio.UCSC.danRer7" "BSgenome.Drerio.UCSC.danRer7.masked"
## [39] "BSgenome.Dvirilis.Ensembl.dvircaf1" "BSgenome.Ecoli.NCBI.20080805"
## [41] "BSgenome.Gaculeatus.UCSC.gasAcu1" "BSgenome.Gaculeatus.UCSC.gasAcu1.masked"
## [43] "BSgenome.Ggallus.UCSC.galGal3" "BSgenome.Ggallus.UCSC.galGal3.masked"
## [45] "BSgenome.Ggallus.UCSC.galGal4" "BSgenome.Ggallus.UCSC.galGal4.masked"
## [47] "BSgenome.Ggallus.UCSC.galGal5" "BSgenome.Ggallus.UCSC.galGal6"
## [49] "BSgenome.Hsapiens.1000genomes.hs37d5" "BSgenome.Hsapiens.NCBI.GRCh38"
## [51] "BSgenome.Hsapiens.UCSC.hg17" "BSgenome.Hsapiens.UCSC.hg17.masked"
## [53] "BSgenome.Hsapiens.UCSC.hg18" "BSgenome.Hsapiens.UCSC.hg18.masked"
## [55] "BSgenome.Hsapiens.UCSC.hg19" "BSgenome.Hsapiens.UCSC.hg19.masked"
## [57] "BSgenome.Hsapiens.UCSC.hg38" "BSgenome.Hsapiens.UCSC.hg38.masked"
## [59] "BSgenome.Mdomestica.UCSC.monDom5" "BSgenome.Mfascicularis.NCBI.5.0"
## [61] "BSgenome.Mfuro.UCSC.musFur1" "BSgenome.Mmulatta.UCSC.rheMac10"
## [63] "BSgenome.Mmulatta.UCSC.rheMac2" "BSgenome.Mmulatta.UCSC.rheMac2.masked"
## [65] "BSgenome.Mmulatta.UCSC.rheMac3" "BSgenome.Mmulatta.UCSC.rheMac3.masked"
## [67] "BSgenome.Mmulatta.UCSC.rheMac8" "BSgenome.Mmusculus.UCSC.mm10"
## [69] "BSgenome.Mmusculus.UCSC.mm10.masked" "BSgenome.Mmusculus.UCSC.mm8"
## [71] "BSgenome.Mmusculus.UCSC.mm8.masked" "BSgenome.Mmusculus.UCSC.mm9"
## [73] "BSgenome.Mmusculus.UCSC.mm9.masked" "BSgenome.Osativa.MSU.MSU7"
## [75] "BSgenome.Ptroglodytes.UCSC.panTro2" "BSgenome.Ptroglodytes.UCSC.panTro2.masked"
## [77] "BSgenome.Ptroglodytes.UCSC.panTro3" "BSgenome.Ptroglodytes.UCSC.panTro3.masked"
## [79] "BSgenome.Ptroglodytes.UCSC.panTro5" "BSgenome.Ptroglodytes.UCSC.panTro6"
## [81] "BSgenome.Rnorvegicus.UCSC.rn4" "BSgenome.Rnorvegicus.UCSC.rn4.masked"
## [83] "BSgenome.Rnorvegicus.UCSC.rn5" "BSgenome.Rnorvegicus.UCSC.rn5.masked"
## [85] "BSgenome.Rnorvegicus.UCSC.rn6" "BSgenome.Scerevisiae.UCSC.sacCer1"
## [87] "BSgenome.Scerevisiae.UCSC.sacCer2" "BSgenome.Scerevisiae.UCSC.sacCer3"
## [89] "BSgenome.Sscrofa.UCSC.susScr11" "BSgenome.Sscrofa.UCSC.susScr3"
## [91] "BSgenome.Sscrofa.UCSC.susScr3.masked" "BSgenome.Tgondii.ToxoDB.7.0"
## [93] "BSgenome.Tguttata.UCSC.taeGut1" "BSgenome.Tguttata.UCSC.taeGut1.masked"
## [95] "BSgenome.Tguttata.UCSC.taeGut2" "BSgenome.Vvinifera.URGI.IGGP12Xv0"
## [97] "BSgenome.Vvinifera.URGI.IGGP12Xv2" "BSgenome.Vvinifera.URGI.IGGP8X"
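The package names in this listing follow the pattern BSgenome.Species.Provider.Build, so ordinary string handling is enough to find the builds available for one organism. The sketch below works on a short hand-copied subset of the listing so that it runs without BSgenome installed; in a real session the genomes vector would be the result of available.genomes().

```r
# Hand-copied subset of the listing above (stands in for available.genomes())
genomes <- c("BSgenome.Drerio.UCSC.danRer10",
             "BSgenome.Drerio.UCSC.danRer11",
             "BSgenome.Drerio.UCSC.danRer7",
             "BSgenome.Drerio.UCSC.danRer7.masked",
             "BSgenome.Hsapiens.UCSC.hg19",
             "BSgenome.Mmusculus.UCSC.mm10")

# All zebrafish packages, then only the unmasked ones
drerio   <- grep("^BSgenome\\.Drerio\\.", genomes, value = TRUE)
unmasked <- drerio[!grepl("\\.masked$", drerio)]

# Distinct species: the second dot-separated field of each package name
species <- unique(vapply(strsplit(genomes, ".", fixed = TRUE),
                         `[`, character(1), 2))

unmasked
species
```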
Here we examine two possibilities. In the first case, we have a mixture of rsIDs and our naming scheme that allows for arbitrary variants. In the second, we have a list of variants for the zebrafish Danio rerio, which does not have a SNPlocs package but does have its genome present among the available.genomes().
snps.bed.file <- system.file("extdata", "snps.bed", package = "motifbreakR")
# see the contents
read.table(snps.bed.file, header = FALSE)
## V1 V2 V3 V4 V5 V6
## 1 chr2 12581137 12581138 rs10170896 0 +
## 2 chr2 12594017 12594018 chr2:12594018:G:A 0 +
## 3 chr3 192388677 192388678 rs13068005 0 +
## 4 chr4 122361479 122361480 rs12644995 0 +
## 5 chr6 44503245 44503246 chr6:44503246:A:T 0 +
## 6 chr6 44503247 44503248 chr6:44503248:G:C 0 +
## 7 chr6 85232897 85232898 rs4510639 0 +
## 8 chr6 44501872 44501873 rs932680 0 +
Since some of our SNPs are listed by their rsIDs, we can query those by including a SNPlocs object as an argument to snps.from.file:
library(SNPlocs.Hsapiens.dbSNP142.GRCh37)
# import the BED file
snps.mb.frombed <- snps.from.file(file = snps.bed.file,
                                  dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37,
                                  search.genome = BSgenome.Hsapiens.UCSC.hg19,
                                  format = "bed")
snps.mb.frombed
## Warning message:
## In snps.from.file(file = snps.bed.file, dbSNP = SNPlocs.Hsapiens.dbSNP142.GRCh37:
## 7601289 was found as a match for chr2:12594018:G:A; using entry from dbSNP
## GRanges object with 8 ranges and 4 metadata columns:
## seqnames ranges strand | SNP_id alleles_as_ambig REF
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSet>
## rs10170896 chr2 12581138 + | rs10170896 R G
## rs12644995 chr4 122361480 + | rs12644995 M C
## rs13068005 chr3 192388678 + | rs13068005 R G
## rs4510639 chr6 85232898 + | rs4510639 Y C
## rs932680 chr6 44501873 + | rs932680 K G
## 7601289 chr2 12594018 + | 7601289 R G
## chr6:44503246:A:T chr6 44503246 + | chr6:44503246:A:T W A
## chr6:44503248:G:C chr6 44503248 + | chr6:44503248:G:C S G
## ALT
## <DNAStringSet>
## rs10170896 A
## rs12644995 A
## rs13068005 A
## rs4510639 T
## rs932680 T
## 7601289 A
## chr6:44503246:A:T T
## chr6:44503248:G:C C
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
We also see that one of our custom variants, chr2:12594018:G:A, was already included in dbSNP and was therefore annotated in the output with its dbSNP identifier, rs7601289 (printed as 7601289).
If our BED file includes no rsIDs, then we may omit the dbSNP argument from the function. This example uses variants from Danio rerio:
library(BSgenome.Drerio.UCSC.danRer7)
snps.bedfile.nors <- system.file("extdata", "danRer.bed", package = "motifbreakR")
read.table(snps.bedfile.nors, header = FALSE)
## V1 V2 V3 V4 V5 V6
## 1 chr18 13030932 13030933 chr18:13030933:G:A 0 +
## 2 chr18 30445455 30445456 chr18:30445456:T:A 0 +
## 3 chr5 22065023 22065024 chr5:22065024:A:T 0 +
## 4 chr14 36140941 36140942 chr14:36140942:T:A 0 +
## 5 chr3 16701576 16701577 chr3:16701577:T:A 0 +
## 6 chr14 20887995 20887996 chr14:20887996:G:A 0 +
## 7 chr7 25195449 25195450 chr7:25195450:G:T 0 +
## 8 chr2 59181852 59181853 chr2:59181853:A:G 0 +
## 9 chr3 58162674 58162675 chr3:58162675:C:T 0 +
## 10 chr22 18708824 18708825 chr22:18708825:T:A 0 +
snps.mb.frombed <- snps.from.file(file = snps.bedfile.nors,
                                  search.genome = BSgenome.Drerio.UCSC.danRer7,
                                  format = "bed")
snps.mb.frombed
## GRanges object with 10 ranges and 3 metadata columns:
## seqnames ranges strand | SNP_id REF ALT
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSet>
## chr18:13030933:G:A chr18 13030933 * | chr18:13030933:G:A G A
## chr18:30445456:T:A chr18 30445456 * | chr18:30445456:T:A T A
## chr5:22065024:A:T chr5 22065024 * | chr5:22065024:A:T A T
## chr14:36140942:T:A chr14 36140942 * | chr14:36140942:T:A T A
## chr3:16701577:T:A chr3 16701577 * | chr3:16701577:T:A T A
## chr14:20887996:G:A chr14 20887996 * | chr14:20887996:G:A G A
## chr7:25195450:G:T chr7 25195450 * | chr7:25195450:G:T G T
## chr2:59181853:A:G chr2 59181853 * | chr2:59181853:A:G A G
## chr3:58162675:C:T chr3 58162675 * | chr3:58162675:C:T C T
## chr22:18708825:T:A chr22 18708825 * | chr22:18708825:T:A T A
## -------
## seqinfo: 26 sequences (1 circular) from danRer7 genome
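For novel variants that start out as a plain table of positions and alleles, a BED file in the layout shown above can be assembled with base R. This is a minimal sketch: the novel data frame and its column names are hypothetical, and the two variants are copied from the Danio rerio example. The start coordinate is the position minus one, matching the rows printed above.

```r
# Hypothetical input table of novel SNVs (1-based positions)
novel <- data.frame(chromosome = c("chr18", "chr5"),
                    position   = c(13030933L, 22065024L),
                    ref        = c("G", "A"),
                    alt        = c("A", "T"),
                    stringsAsFactors = FALSE)

# Six BED fields; the name uses chromosome:position:referenceAllele:alternateAllele
bed <- data.frame(chromosome = novel$chromosome,
                  start      = novel$position - 1L,
                  end        = novel$position,
                  name       = paste(novel$chromosome, novel$position,
                                     novel$ref, novel$alt, sep = ":"),
                  score      = 0L,
                  strand     = "+",
                  stringsAsFactors = FALSE)

# Write strictly tab-separated, with no header, quotes, or row names
bed.file <- tempfile(fileext = ".bed")
write.table(bed, bed.file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(bed.file)
```

The resulting file path could then be passed as the file argument of snps.from.file together with the matching search.genome.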
snps.from.file can also take as input a VCF file with SNVs, by using format = "vcf".
As of version 2.0, motifbreakR is able to parse and analyse indels as well as SNVs. The function variants.from.file() allows the import of indels and SNVs simultaneously.
snps.indel.vcf <- system.file("extdata", "chek2.vcf.gz", package = "motifbreakR")
snps.indel <- variants.from.file(file = snps.indel.vcf,
                                 search.genome = BSgenome.Hsapiens.UCSC.hg19,
                                 format = "vcf")
snps.indel
## GRanges object with 1456 ranges and 3 metadata columns:
## seqnames ranges strand | SNP_id REF ALT
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSet>
## rs541513166 chr22 29083808 * | rs541513166 T TA
## rs540410451 chr22 29083826 * | rs540410451 G A
## rs562206743 chr22 29083843 * | rs562206743 A G
## rs529320954 chr22 29083856 * | rs529320954 A G
## rs544216926 chr22 29083913 * | rs544216926 C T
## ... ... ... ... . ... ... ...
## rs539227672 chr22 29137758 * | rs539227672 G A
## rs554107994 chr22 29137761 * | rs554107994 T G
## rs566344661 chr22 29137770 * | rs566344661 C G
## rs536566373 chr22 29137782 * | rs536566373 A G
## rs142541707 chr22 29137790 * | rs142541707 C A
## -------
## seqinfo: 298 sequences (2 circular) from hg19 genome
We can filter to specifically see the indels like this:
snps.indel[nchar(snps.indel$REF) > 1 | nchar(snps.indel$ALT) > 1]
## GRanges object with 66 ranges and 3 metadata columns:
## seqnames ranges strand | SNP_id REF ALT
## <Rle> <IRanges> <Rle> | <character> <DNAStringSet> <DNAStringSet>
## rs541513166 chr22 29083808 * | rs541513166 T TA
## rs552933761 chr22 29086616-29086617 * | rs552933761 CA C
## rs61611714 chr22 29086940-29086941 * | rs61611714 TG T
## rs541631272 chr22 29087474-29087478 * | rs541631272 GAAAT G
## rs537685613 chr22 29089333 * | rs537685613 A AT
## ... ... ... ... . ... ... ...
## rs543703620 chr22 29133462-29133463 * | rs543703620 CT C
## rs113960351 chr22 29135358 * | rs113960351 C CT
## rs17882761 chr22 29136187 * | rs17882761 C CA
## rs547061967 chr22 29136972-29136973 * | rs547061967 CG C
## rs199585274 chr22 29137694-29137695 * | rs199585274 CA C
## -------
## seqinfo: 298 sequences (2 circular) from hg19 genome
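The same allele-length comparison can be taken one step further to label each variant as an SNV, an insertion, or a deletion. The sketch below uses a plain data frame holding four variants copied from the output above, so that it runs without the genome packages; the type and size columns are illustrative additions, not fields produced by motifbreakR.

```r
# Four variants copied from the printed output above
variants <- data.frame(
  SNP_id = c("rs541513166", "rs540410451", "rs552933761", "rs541631272"),
  REF    = c("T", "G", "CA", "GAAAT"),
  ALT    = c("TA", "A", "C", "G"),
  stringsAsFactors = FALSE)

ref.len <- nchar(variants$REF)
alt.len <- nchar(variants$ALT)

# Classify by comparing allele lengths
variants$type <- ifelse(ref.len == 1 & alt.len == 1, "SNV",
                 ifelse(alt.len > ref.len, "insertion",
                 ifelse(alt.len < ref.len, "deletion", "other")))

# Number of bases gained or lost
variants$size <- abs(alt.len - ref.len)
variants
```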
Now that we have our data in the required format, we may continue to the task at hand, and determine which variants modify potential transcription factor binding. An important element of this task is identifying a set of transcription factor binding motifs that we wish to query. Fortunately MotifDb includes a large selection of motifs across multiple species that we can see here:
library(MotifDb)
MotifDb
## MotifDb object of length 10701
## | Created from downloaded public sources: 2013-Aug-30
## | 10701 position frequency matrices from 21 sources:
## | FlyFactorSurvey: 614
## | HOCOMOCOv10: 1066
## | HOCOMOCOv11-core-A: 181
## | HOCOMOCOv11-core-B: 84
## | HOCOMOCOv11-core-C: 135
## | HOCOMOCOv11-secondary-A: 46
## | HOCOMOCOv11-secondary-B: 19
## | HOCOMOCOv11-secondary-C: 13
## | HOCOMOCOv11-secondary-D: 290
## | HOMER: 332
## | JASPAR_2014: 592
## | JASPAR_CORE: 459
## | ScerTF: 196
## | SwissRegulon: 684
## | UniPROBE: 380
## | cisbp_1.02: 874
## | hPDI: 437
## | jaspar2016: 1209
## | jaspar2018: 1564
## | jolma2013: 843
## | stamlab: 683
## | 61 organism/s
## | Hsapiens: 5384
## | Mmusculus: 1411
## | Dmelanogaster: 1287
## | Scerevisiae: 1051
## | Athaliana: 803
## | Celegans: 90
## | other: 675
## Scerevisiae-ScerTF-ABF2-badis
## Scerevisiae-ScerTF-CAT8-badis
## Scerevisiae-ScerTF-CST6-badis
## Scerevisiae-ScerTF-ECM23-badis
## Scerevisiae-ScerTF-EDS1-badis
## ...
## Mmusculus-UniPROBE-Zfp740.UP00022
## Mmusculus-UniPROBE-Zic1.UP00102
## Mmusculus-UniPROBE-Zic2.UP00057
## Mmusculus-UniPROBE-Zic3.UP00006
## Mmusculus-UniPROBE-Zscan4.UP00026
### Here we can see which organisms are available under which sources
### in MotifDb
table(mcols(MotifDb)$organism, mcols(MotifDb)$dataSource)
organism | FlyFactorSurvey | HOCOMOCOv10 | HOCOMOCOv11-core-A | HOCOMOCOv11-core-B | HOCOMOCOv11-core-C | HOCOMOCOv11-secondary-A | HOCOMOCOv11-secondary-B | HOCOMOCOv11-secondary-C | HOCOMOCOv11-secondary-D | HOMER | JASPAR_2014 | JASPAR_CORE | ScerTF | SwissRegulon | UniPROBE | cisbp_1.02 | hPDI | jaspar2016 | jaspar2018 | jolma2013 | stamlab |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acarolinensis | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Amajus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 0 | 0 |
Anidulans | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 |
Apisum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Aterreus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Athaliana | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 5 | 0 | 0 | 0 | 107 | 0 | 191 | 452 | 0 | 0 |
Bdistachyon | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
Celegans | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15 | 5 | 0 | 0 | 2 | 22 | 0 | 23 | 23 | 0 | 0 |
Cparvum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Csativa | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
Ddiscoideum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 0 | 0 |
Dmelanogaster | 614 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131 | 125 | 0 | 0 | 0 | 138 | 0 | 139 | 140 | 0 | 0 |
Drerio | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
Gaculeatus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Gallus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Ggallus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 0 | 0 |
Hcapsulatum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Hroretzi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Hsapiens | 0 | 640 | 181 | 84 | 135 | 46 | 19 | 13 | 290 | 0 | 117 | 66 | 0 | 684 | 2 | 313 | 437 | 442 | 522 | 710 | 683 |
Hvulgare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
Mdomestica | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Mgallopavo | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Mmurinus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Mmusculus | 0 | 426 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 66 | 47 | 0 | 0 | 282 | 132 | 0 | 165 | 160 | 133 | 0 |
Mmusculus;Hsapiens | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 |
Mmusculus;Rnorvegicus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
Mmusculus;Rnorvegicus;Hsapiens | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 |
Mmusculus;Rnorvegicus;Hsapiens;Ocuniculus | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |