1 Introduction

The SomaScan.db package provides extended biological annotations to be used in conjunction with the results of SomaLogic’s SomaScan assay, a fee-for-service proteomic technology platform designed to detect proteins across numerous biological pathways.

This vignette describes how to use the SomaScan.db package to annotate SomaScan data, i.e., to add information to an ADAT file that gives biological context (at the gene level) to the platform’s reagents and their protein targets. SomaScan.db performs annotation by mapping SomaScan reagent IDs (SeqIds) to their corresponding protein(s) and gene(s), as well as to biological pathways (GO, KEGG, etc.) and identifiers from other public data repositories.

SomaScan.db utilizes the same methods and setup as other Bioconductor annotation packages, and therefore the methods should be familiar if you’ve worked with such packages previously.

2 Package Installation

To begin, install the SomaScan.db package from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# Installs the release matching the Bioconductor version in use;
# this avoids depending on the separate 'remotes' package
BiocManager::install("SomaScan.db")

Once installed, the package can be loaded as follows:

library(SomaScan.db)
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
## 

3 Package Overview

Loading this package will expose an annotation object with the same name as the package, SomaScan.db. This object is a SQLite database containing annotation data for the SomaScan assay derived from popular public repositories. Viewing the object will present a metadata table containing information about the annotations and where they were obtained:

SomaScan.db
## SomaDb object:
## | DBSCHEMAVERSION: 2.1
## | Db type: ChipDb
## | Supporting package: SomaScan.db
## | DBSCHEMA: HUMANCHIP_DB
## | ORGANISM: Homo sapiens
## | SPECIES: Human
## | MANUFACTURER: SomaLogic
## | CHIPNAME: SomaScan
## | MANUFACTURERURL: https://somalogic.com/somascan-platform/
## | EGSOURCEDATE: 2022-Sep12
## | EGSOURCENAME: Entrez Gene
## | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | CENTRALID: ENTREZID
## | TAXID: 9606
## | GOSOURCENAME: Gene Ontology
## | GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
## | GOSOURCEDATE: 2022-07-01
## | GOEGSOURCEDATE: 2022-Sep12
## | GOEGSOURCENAME: Entrez Gene
## | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
## | KEGGSOURCENAME: KEGG GENOME
## | KEGGSOURCEURL: ftp://ftp.genome.jp/pub/kegg/genomes
## | KEGGSOURCEDATE: 2011-Mar15
## | GPSOURCENAME: UCSC Genome Bioinformatics (Homo sapiens)
## | GPSOURCEURL: 
## | GPSOURCEDATE: 2022-Aug31
## | ENSOURCEDATE: 2022-Jun28
## | ENSOURCENAME: Ensembl
## | ENSOURCEURL: ftp://ftp.ensembl.org/pub/current_fasta
## | UPSOURCENAME: Uniprot
## | UPSOURCEURL: http://www.UniProt.org/
## | UPSOURCEDATE: Fri Sep 23 16:26:35 2022
## 
## Please see: help('select') for usage information

The same information can be retrieved as a data frame by calling metadata(SomaScan.db).
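
As a minimal sketch of working with that data frame, the snippet below filters the metadata for the source dates. It assumes the table has columns named name and value, which is the usual AnnotationDbi layout but is not stated in this vignette, so check colnames() on your installation first.

```r
library(SomaScan.db)

# Retrieve the metadata table as a data frame
md <- metadata(SomaScan.db)

# Inspect the layout before relying on specific column names
colnames(md)

# Keep only the rows describing source dates
# (assumes columns named "name" and "value")
md[grepl("SOURCEDATE$", md$name), ]
```

Checking these dates is a quick way to see how recent each public resource was when the annotations were built.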

Moving forward, this database object (SomaScan.db) will be used throughout the vignette to retrieve SomaScan annotations and map between identifiers.

For reference, the species information for the database can be retrieved directly with the following methods:

species(SomaScan.db)
## [1] "Homo sapiens"
taxonomyId(SomaScan.db)
## [1] 9606

It’s also possible to pull a more detailed summary of annotations and resource identifiers (aka keys) by calling the package as a function, with the .db extension removed:

SomaScan()
## Quality control information for SomaScan:
## 
## 
## This package has the following mappings:
## 
## SomaScanALIAS2PROBE has 26576 mapped keys (of 258270 keys)
## SomaScanENSEMBL has 7173 mapped keys (of 7267 keys)
## SomaScanENSEMBL2PROBE has 7275 mapped keys (of 40467 keys)
## SomaScanENTREZID has 7178 mapped keys (of 7267 keys)
## SomaScanENZYME has 1311 mapped keys (of 7267 keys)
## SomaScanENZYME2PROBE has 666 mapped keys (of 975 keys)
## SomaScanGENENAME has 7178 mapped keys (of 7267 keys)
## SomaScanGO has 7126 mapped keys (of 7267 keys)
## SomaScanGO2ALLPROBES has 18821 mapped keys (of 22741 keys)
## SomaScanGO2PROBE has 14289 mapped keys (of 18809 keys)
## SomaScanMAP has 7174 mapped keys (of 7267 keys)
## SomaScanOMIM has 6726 mapped keys (of 7267 keys)
## SomaScanPATH has 3089 mapped keys (of 7267 keys)
## SomaScanPATH2PROBE has 227 mapped keys (of 229 keys)
## SomaScanPMID has 7175 mapped keys (of 7267 keys)
## SomaScanPMID2PROBE has 493341 mapped keys (of 778807 keys)
## SomaScanREFSEQ has 7178 mapped keys (of 7267 keys)
## SomaScanSYMBOL has 7178 mapped keys (of 7267 keys)
## SomaScanUNIPROT has 7165 mapped keys (of 7267 keys)
## 
## 
## Additional Information about this package:
## 
## DB schema: HUMANCHIP_DB
## DB schema version: 2.1
## Organism: Homo sapiens
## Date for NCBI data: 2022-Sep12
## Date for GO data: 2022-07-01
## Date for KEGG data: 2011-Mar15
## Date for Golden Path data: 2022-Aug31
## Date for Ensembl data: 2022-Jun28

Note: Keys will be explained in greater detail later in this vignette.

4 Retrieve Annotation Data

The SomaScan.db package has five primary methods that can be used to query the database:

  • keys
  • keytypes
  • columns
  • select
  • mapIds

This vignette will describe how each of these methods can be used to obtain annotation data from SomaScan.db.


4.1 keys method

This annotation package is platform-based, meaning it was built around the unique identifiers of a specific platform (in this case, SomaLogic’s SomaScan platform). Each identifier corresponds to one of the assay’s analytes, so the analyte identifiers (SeqIds) are the primary term used to query the database (aka the “key”).

All keys in the database can be retrieved using keys:

# Short list of primary keys
keys(SomaScan.db) |> head(10L)
##  [1] "10000-28"  "10001-7"   "10003-15"  "10006-25"  "10008-43"  "10010-10" 
##  [7] "10011-65"  "10012-5"   "10014-31"  "10015-119"

Each key retrieved in the output above corresponds to one of the assay’s unique analytes.
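
Because keys returns a plain character vector, it can be used to check whether a set of SeqIds is present in the database before running a query. The sketch below is one way to do that; the vector my_seqids is a hypothetical input, and "99999-1" is a made-up value included only to illustrate a possible miss.

```r
library(SomaScan.db)

# Hypothetical SeqIds from an analysis; "99999-1" is invented for illustration
my_seqids <- c("10000-28", "10001-7", "99999-1")

# All primary keys (SeqIds) in the database
all_keys <- keys(SomaScan.db)

# TRUE where the SeqId exists as a key
in_db <- my_seqids %in% all_keys

my_seqids[in_db]   # SeqIds that can be queried
my_seqids[!in_db]  # SeqIds that are not in this annotation package
```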


4.2 keytypes method

When querying the database, we can also specify the type of key (“keytype”) being used. The keytype refers to the type of identifier that is used to generate a database query. While the database is centered around the SomaLogic SeqId, other identifiers can still be used to query the database.

We can list all identifier types that can be used as query keys using keytypes():

## List all of the supported key types.
keytypes(SomaScan.db)
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
## [11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
## [16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
## [21] "PMID"         "PROBEID"      "PROSITE"      "REFSEQ"       "SYMBOL"      
## [26] "UCSCKG"       "UNIPROT"

Note: the SomaScan assay analyte identifiers (SeqIds) are stored as the “PROBEID” keytype.

A keytype can also be passed to keys, via the keytype argument, to retrieve all identifiers associated with that keytype. The example below retrieves the UniProt IDs in SomaScan.db and shows the first 20:

keys(SomaScan.db, keytype = "UNIPROT") |> head(20L)
##  [1] "P04217"     "V9HWD8"     "P01023"     "P18440"     "Q400J6"    
##  [6] "F5H5R8"     "A4Z6T7"     "P11245"     "A0A024R6P0" "P01011"    
## [11] "P22760"     "A0A024R410" "Q13685"     "C9JEH3"     "F1T0I5"    
## [16] "Q16613"     "P49588"     "P80404"     "X5D8S1"     "B2RUU2"
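
A non-default keytype can also drive a query. As a hedged sketch, the call below starts from a gene symbol and asks for the matching SeqIds, using the keytype argument of select (select itself is described in section 4.4). The symbol "CTSD" is chosen only because it appears in an example later in this vignette.

```r
library(SomaScan.db)

# Look up SeqIds starting from a gene symbol instead of a SeqId
res <- select(SomaScan.db,
              keys    = "CTSD",
              keytype = "SYMBOL",
              columns = "PROBEID")

res
```

More than one SeqId may be returned for a single symbol, since several reagents can target the same protein.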

4.3 columns method

All available external annotations, corresponding to “columns” of the database, can be listed using columns():

columns(SomaScan.db)
##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
## [11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MAP"         
## [16] "OMIM"         "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"        
## [21] "PMID"         "PROBEID"      "PROSITE"      "REFSEQ"       "SYMBOL"      
## [26] "UCSCKG"       "UNIPROT"

Note: the SomaScan assay analyte identifiers (SeqIds) are stored in the “PROBEID” column.

This list may look very similar (or even identical) to the keytypes output. If the two are identical, every column can also be used as a query key. For a more in-depth explanation of what each of these columns contains, consult the manual:

help("OMIM") # Example help call

Each columns entry also has a mapping object that contains the information connecting SeqIds to the annotation’s identifiers. To read further documentation about the object and the resource used to make it, check out the manual page for the mapping itself:

?SomaScanOMIM
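
The mapping objects can also be used directly. The sketch below assumes SomaScanOMIM behaves like a standard AnnotationDbi Bimap, which is how such objects are typically provided in Bioconductor annotation packages; mappedkeys and as.list are AnnotationDbi functions, not specific to SomaScan.db.

```r
library(SomaScan.db)

# SeqIds that have at least one OMIM identifier
mapped_seqids <- mappedkeys(SomaScanOMIM)

# Convert the first few mappings to a named list (SeqId -> OMIM IDs)
omim_list <- as.list(SomaScanOMIM[head(mapped_seqids, 3L)])

omim_list
```

For most workflows select is the more convenient interface, but the list form can be handy when one entry per SeqId is needed.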

4.4 select method

The list of columns returned by columns informs us as to what types of data are available; therefore, the column values can be used to retrieve specific pieces of information from the database. You can think of keys and columns as:

  • Keys: the information you already have (SeqIds/probe IDs), aka rows of the database
  • Columns: data types for which you want to retrieve information, aka columns of the database

The SomaScan.db database can be queried via select, using both the keys and columns.

When selecting columns and keys using the select method, the keys are returned in the left-most column of the output, in the PROBEID column. The results will be in the exact same order as the input keys:

# Randomly select a set of keys
example_keys <- withr::with_seed(101L, sample(keys(SomaScan.db),
                                              size = 5L,
                                              replace = FALSE
))

# Query keys in the database
select(SomaScan.db,
       keys = example_keys,
       columns = c("ENTREZID", "SYMBOL", "GENENAME")
)
## 'select()' returned 1:1 mapping between keys and columns
##     PROBEID ENTREZID  SYMBOL                                    GENENAME
## 1  20564-53     7070    THY1                  Thy-1 cell surface antigen
## 2   5481-16     5921   RASA1                 RAS p21 protein activator 1
## 3 17792-158     7915 ALDH5A1   aldehyde dehydrogenase 5 family member A1
## 4  21760-22     7317    UBA1 ubiquitin like modifier activating enzyme 1
## 5   5508-62     1509    CTSD                                 cathepsin D

Note: The message above (‘select()’ returned 1:1 mapping between keys and columns) will be described in detail in the next section of this vignette.

The data that is returned will always be in the same order as the provided keys. If select cannot find a mapping for a specific key, an NA value will be returned to retain the original query order.

# Inserting a new key that won't be found in the annotations ("TEST")
test_keys <- c(example_keys[1], "TEST")

select(SomaScan.db, keys = test_keys, columns = c("PROBEID", "ENTREZID"))
## 'select()' returned 1:1 mapping between keys and columns
##    PROBEID ENTREZID
## 1 20564-53     7070
## 2     <NA>     <NA>

In the example above, a “PROBEID” and “ENTREZID” value couldn’t be found for the character string “TEST”, so an NA was returned in its place.
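
Since unmapped keys come back as NA rather than being dropped, it can be useful to flag them after a query. The following is a minimal sketch of one way to do this with base R; the variable names are illustrative.

```r
library(SomaScan.db)

# One real SeqId (from the example above) and one made-up key
test_keys <- c("20564-53", "TEST")

res <- select(SomaScan.db, keys = test_keys, columns = "ENTREZID")

# Flag rows where no Entrez Gene ID was found
unmapped <- is.na(res$ENTREZID)

# Keep only the rows with a successful mapping
res[!unmapped, ]
```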

4.4.1 One-to-Many Relationships

When using select, a message indicating the relationship between query keys and column data will be displayed along with the query results. This message will describe one of the three relationships below:

  • 1:1 mapping between keys and columns
  • 1:many mapping between keys and columns
  • many:many mapping between keys and columns

These messages describe the very real possibility that there are multiple identifiers associated with each key in a query. This can cause the number of rows returned by select() to exceed the number of keys used to retrieve the data; this is what is meant by the message “‘select()’ returned 1:many mapping between keys and columns”.

In these cases, you will still see concordance between the order of the provided keys and outputted results rows, but you should be aware that new rows were inserted into the results. This message is not an error, merely an informative notification to the user making it clear that more output rows than input items should be expected. Importantly, this message also does not relay information about the SomaScan menu itself or advice on how to handle many-to-one relationships between SomaScan reagents and their corresponding protein targets; rather, the message is directly related to this package’s select method and how it retrieves information from the database.
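
The row growth can be illustrated without the database at all. The sketch below uses two small, made-up tables (the identifiers are hypothetical and not taken from SomaScan.db) and base R's merge to show how combining two annotations that each have several values per key multiplies the rows for that key. It is only an analogy for the behavior of select, not a description of its internals.

```r
# Toy tables with made-up identifiers (not real SomaScan.db content)
go_tbl <- data.frame(
    PROBEID = "0000-1",
    GO      = c("GO:A", "GO:B", "GO:C")
)
path_tbl <- data.frame(
    PROBEID = "0000-1",
    PATH    = c("P1", "P2")
)

# Joining on the key pairs every GO row with every PATH row
combined <- merge(go_tbl, path_tbl, by = "PROBEID")

nrow(go_tbl)    # 3 rows for one key
nrow(path_tbl)  # 2 rows for one key
nrow(combined)  # 3 * 2 = 6 rows for the same key
```

Each additional multi-valued column multiplies the row count again, which is why large results can appear from a single key.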

Because some columns may return many values for each key, it is generally best practice to retrieve the minimum number of columns needed for a query. Additionally, when retrieving a column that is known to return many values per key, like GO terms (the select warning describes such columns as having a “many to one relationship with the primary key”), it’s best to request that information in its own query, like so:

# Good
select(SomaScan.db, keys = example_keys[3L], columns = "GO")
## 'select()' returned 1:many mapping between keys and columns
##      PROBEID         GO EVIDENCE ONTOLOGY
## 1  17792-158 GO:0004777      IBA       MF
## 2  17792-158 GO:0004777      IDA       MF
## 3  17792-158 GO:0004777      ISS       MF
## 4  17792-158 GO:0005739      HDA       CC
## 5  17792-158 GO:0005739      IBA       CC
## 6  17792-158 GO:0005739      IDA       CC
## 7  17792-158 GO:0005739      ISS       CC
## 8  17792-158 GO:0005759      TAS       CC
## 9  17792-158 GO:0006105      ISS       BP
## 10 17792-158 GO:0006536      ISS       BP
## 11 17792-158 GO:0007417      IMP       BP
## 12 17792-158 GO:0009450      IBA       BP
## 13 17792-158 GO:0009450      IDA       BP
## 14 17792-158 GO:0009450      IEA       BP
## 15 17792-158 GO:0009450      IMP       BP
## 16 17792-158 GO:0009791      IEA       BP
## 17 17792-158 GO:0042135      ISS       BP
## 18 17792-158 GO:0042802      IPI       MF
# Bad
select(SomaScan.db,
       keys = example_keys[3L],
       columns = c("UNIPROT", "ENSEMBL", "GO", "PATH", "IPI")
)
## Warning: You have selected the following columns that can have a many to one
##   relationship with the primary key: UNIPROT, ENSEMBL, GO, PATH, IPI .
##   Because you have selected more than a few such columns there is a
##   risk that this selection may balloon up into a very large result as
##   the number of rows returned multiplies accordingly. To experience
##   smaller/more manageable results and faster retrieval times, you might
##   want to consider selecting these columns separately.
## 'select()' returned 1:many mapping between keys and columns
##       PROBEID UNIPROT         ENSEMBL         GO EVIDENCE ONTOLOGY  PATH
## 1   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 00250
## 2   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 00250
## 3   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 00650
## 4   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 00650
## 5   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 01100
## 6   17792-158  P51649 ENSG00000112294 GO:0004777      IBA       MF 01100
## 7   17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 00250
## 8   17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 00250
## 9   17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 00650
## 10  17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 00650
## 11  17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 01100
## 12  17792-158  P51649 ENSG00000112294 GO:0004777      IDA       MF 01100
## 13  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 00250
## 14  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 00250
## 15  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 00650
## 16  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 00650
## 17  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 01100
## 18  17792-158  P51649 ENSG00000112294 GO:0004777      ISS       MF 01100
## 19  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 00250
## 20  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 00250
## 21  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 00650
## 22  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 00650
## 23  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 01100
## 24  17792-158  P51649 ENSG00000112294 GO:0005739      HDA       CC 01100
## 25  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 00250
## 26  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 00250
## 27  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 00650
## 28  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 00650
## 29  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 01100
## 30  17792-158  P51649 ENSG00000112294 GO:0005739      IBA       CC 01100
## 31  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 00250
## 32  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 00250
## 33  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 00650
## 34  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 00650
## 35  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 01100
## 36  17792-158  P51649 ENSG00000112294 GO:0005739      IDA       CC 01100
## 37  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 00250
## 38  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 00250
## 39  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 00650
## 40  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 00650
## 41  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 01100
## 42  17792-158  P51649 ENSG00000112294 GO:0005739      ISS       CC 01100
## 43  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 00250
## 44  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 00250
## 45  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 00650
## 46  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 00650
## 47  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 01100
## 48  17792-158  P51649 ENSG00000112294 GO:0005759      TAS       CC 01100
## 49  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 00250
## 50  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 00250
## 51  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 00650
## 52  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 00650
## 53  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 01100
## 54  17792-158  P51649 ENSG00000112294 GO:0006105      ISS       BP 01100
## 55  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 00250
## 56  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 00250
## 57  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 00650
## 58  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 00650
## 59  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 01100
## 60  17792-158  P51649 ENSG00000112294 GO:0006536      ISS       BP 01100
## 61  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 00250
## 62  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 00250
## 63  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 00650
## 64  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 00650
## 65  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 01100
## 66  17792-158  P51649 ENSG00000112294 GO:0007417      IMP       BP 01100
## 67  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 00250
## 68  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 00250
## 69  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 00650
## 70  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 00650
## 71  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 01100
## 72  17792-158  P51649 ENSG00000112294 GO:0009450      IBA       BP 01100
## 73  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 00250
## 74  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 00250
## 75  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 00650
## 76  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 00650
## 77  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 01100
## 78  17792-158  P51649 ENSG00000112294 GO:0009450      IDA       BP 01100
## 79  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 00250
## 80  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 00250
## 81  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 00650
## 82  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 00650
## 83  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 01100
## 84  17792-158  P51649 ENSG00000112294 GO:0009450      IEA       BP 01100
## 85  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 00250
## 86  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 00250
## 87  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 00650
## 88  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 00650
## 89  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 01100
## 90  17792-158  P51649 ENSG00000112294 GO:0009450      IMP       BP 01100
## 91  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 00250
## 92  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 00250
## 93  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 00650
## 94  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 00650
## 95  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 01100
## 96  17792-158  P51649 ENSG00000112294 GO:0009791      IEA       BP 01100
## 97  17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 00250
## 98  17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 00250
## 99  17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 00650
## 100 17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 00650
## 101 17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 01100
## 102 17792-158  P51649 ENSG00000112294 GO:0042135      ISS       BP 01100
## 103 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 00250
## 104 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 00250
## 105 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 00650
## 106 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 00650
## 107 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 01100
## 108 17792-158  P51649 ENSG00000112294 GO:0042802      IPI       MF 01100
## 109 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 00250
## 110 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 00250
## 111 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 00650
## 112 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 00650
## 113 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 01100
## 114 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IBA       MF 01100
## 115 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 00250
## 116 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 00250
## 117 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 00650
## 118 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 00650
## 119 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 01100
## 120 17792-158  X5DQN2 ENSG00000112294 GO:0004777      IDA       MF 01100
## 121 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 00250
## 122 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 00250
## 123 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 00650
## 124 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 00650
## 125 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 01100
## 126 17792-158  X5DQN2 ENSG00000112294 GO:0004777      ISS       MF 01100
## 127 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 00250
## 128 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 00250
## 129 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 00650
## 130 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 00650
## 131 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 01100
## 132 17792-158  X5DQN2 ENSG00000112294 GO:0005739      HDA       CC 01100
## 133 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 00250
## 134 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 00250
## 135 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 00650
## 136 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 00650
## 137 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 01100
## 138 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IBA       CC 01100
## 139 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 00250
## 140 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 00250
## 141 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 00650
## 142 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 00650
## 143 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 01100
## 144 17792-158  X5DQN2 ENSG00000112294 GO:0005739      IDA       CC 01100
## 145 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 00250
## 146 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 00250
## 147 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 00650
## 148 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 00650
## 149 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 01100
## 150 17792-158  X5DQN2 ENSG00000112294 GO:0005739      ISS       CC 01100
## 151 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 00250
## 152 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 00250
## 153 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 00650
## 154 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 00650
## 155 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 01100
## 156 17792-158  X5DQN2 ENSG00000112294 GO:0005759      TAS       CC 01100
## 157 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 00250
## 158 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 00250
## 159 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 00650
## 160 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 00650
## 161 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 01100
## 162 17792-158  X5DQN2 ENSG00000112294 GO:0006105      ISS       BP 01100
## 163 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 00250
## 164 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 00250
## 165 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 00650
## 166 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 00650
## 167 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 01100
## 168 17792-158  X5DQN2 ENSG00000112294 GO:0006536      ISS       BP 01100
## 169 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 00250
## 170 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 00250
## 171 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 00650
## 172 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 00650
## 173 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 01100
## 174 17792-158  X5DQN2 ENSG00000112294 GO:0007417      IMP       BP 01100
## 175 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 00250
## 176 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 00250
## 177 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 00650
## 178 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 00650
## 179 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 01100
## 180 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IBA       BP 01100
## 181 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 00250
## 182 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 00250
## 183 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 00650
## 184 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 00650
## 185 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 01100
## 186 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IDA       BP 01100
## 187 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 00250
## 188 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 00250
## 189 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 00650
## 190 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 00650
## 191 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 01100
## 192 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IEA       BP 01100
## 193 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 00250
## 194 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 00250
## 195 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 00650
## 196 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 00650
## 197 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 01100
## 198 17792-158  X5DQN2 ENSG00000112294 GO:0009450      IMP       BP 01100
## 199 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 00250
## 200 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 00250
## 201 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 00650
## 202 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 00650
## 203 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 01100
## 204 17792-158  X5DQN2 ENSG00000112294 GO:0009791      IEA       BP 01100
## 205 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 00250
## 206 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 00250
## 207 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 00650
## 208 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 00650
## 209 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 01100
## 210 17792-158  X5DQN2 ENSG00000112294 GO:0042135      ISS       BP 01100
## 211 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 00250
## 212 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 00250
## 213 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 00650
## 214 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 00650
## 215 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 01100
## 216 17792-158  X5DQN2 ENSG00000112294 GO:0042802      IPI       MF 01100
## 217 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 00250
## 218 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 00250
## 219 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 00650
## 220 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 00650
## 221 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 01100
## 222 17792-158  X5D299 ENSG00000112294 GO:0004777      IBA       MF 01100
## 223 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 00250
## 224 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 00250
## 225 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 00650
## 226 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 00650
## 227 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 01100
## 228 17792-158  X5D299 ENSG00000112294 GO:0004777      IDA       MF 01100
## 229 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 00250
## 230 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 00250
## 231 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 00650
## 232 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 00650
## 233 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 01100
## 234 17792-158  X5D299 ENSG00000112294 GO:0004777      ISS       MF 01100
## 235 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 00250
## 236 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 00250
## 237 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 00650
## 238 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 00650
## 239 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 01100
## 240 17792-158  X5D299 ENSG00000112294 GO:0005739      HDA       CC 01100
## 241 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 00250
## 242 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 00250
## 243 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 00650
## 244 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 00650
## 245 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 01100
## 246 17792-158  X5D299 ENSG00000112294 GO:0005739      IBA       CC 01100
## 247 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 00250
## 248 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 00250
## 249 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 00650
## 250 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 00650
## 251 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 01100
## 252 17792-158  X5D299 ENSG00000112294 GO:0005739      IDA       CC 01100
## 253 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 00250
## 254 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 00250
## 255 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 00650
## 256 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 00650
## 257 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 01100
## 258 17792-158  X5D299 ENSG00000112294 GO:0005739      ISS       CC 01100
## 259 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 00250
## 260 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 00250
## 261 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 00650
## 262 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 00650
## 263 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 01100
## 264 17792-158  X5D299 ENSG00000112294 GO:0005759      TAS       CC 01100
## 265 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 00250
## 266 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 00250
## 267 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 00650
## 268 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 00650
## 269 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 01100
## 270 17792-158  X5D299 ENSG00000112294 GO:0006105      ISS       BP 01100
## 271 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 00250
## 272 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 00250
## 273 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 00650
## 274 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 00650
## 275 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 01100
## 276 17792-158  X5D299 ENSG00000112294 GO:0006536      ISS       BP 01100
## 277 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 00250
## 278 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 00250
## 279 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 00650
## 280 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 00650
## 281 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 01100
## 282 17792-158  X5D299 ENSG00000112294 GO:0007417      IMP       BP 01100
## 283 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 00250
## 284 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 00250
## 285 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 00650
## 286 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 00650
## 287 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 01100
## 288 17792-158  X5D299 ENSG00000112294 GO:0009450      IBA       BP 01100
## 289 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 00250
## 290 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 00250
## 291 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 00650
## 292 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 00650
## 293 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 01100
## 294 17792-158  X5D299 ENSG00000112294 GO:0009450      IDA       BP 01100
## 295 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 00250
## 296 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 00250
## 297 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 00650
## 298 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 00650
## 299 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 01100
## 300 17792-158  X5D299 ENSG00000112294 GO:0009450      IEA       BP 01100
## 301 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 00250
## 302 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 00250
## 303 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 00650
## 304 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 00650
## 305 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 01100
## 306 17792-158  X5D299 ENSG00000112294 GO:0009450      IMP       BP 01100
## 307 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 00250
## 308 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 00250
## 309 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 00650
## 310 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 00650
## 311 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 01100
## 312 17792-158  X5D299 ENSG00000112294 GO:0009791      IEA       BP 01100
## 313 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 00250
## 314 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 00250
## 315 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 00650
## 316 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 00650
## 317 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 01100
## 318 17792-158  X5D299 ENSG00000112294 GO:0042135      ISS       BP 01100
## 319 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 00250
## 320 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 00250
## 321 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 00650
## 322 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 00650
## 323 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 01100
## 324 17792-158  X5D299 ENSG00000112294 GO:0042802      IPI       MF 01100
##             IPI
## 1   IPI00336008
## 2   IPI00019888
## 3   IPI00336008
## 4   IPI00019888
## 5   IPI00336008
## 6   IPI00019888
## 7   IPI00336008
## 8   IPI00019888
## 9   IPI00336008
## 10  IPI00019888
## 11  IPI00336008
## 12  IPI00019888
## 13  IPI00336008
## 14  IPI00019888
## 15  IPI00336008
## 16  IPI00019888
## 17  IPI00336008
## 18  IPI00019888
## 19  IPI00336008
## 20  IPI00019888
## 21  IPI00336008
## 22  IPI00019888
## 23  IPI00336008
## 24  IPI00019888
## 25  IPI00336008
## 26  IPI00019888
## 27  IPI00336008
## 28  IPI00019888
## 29  IPI00336008
## 30  IPI00019888
## 31  IPI00336008
## 32  IPI00019888
## 33  IPI00336008
## 34  IPI00019888
## 35  IPI00336008
## 36  IPI00019888
## 37  IPI00336008
## 38  IPI00019888
## 39  IPI00336008
## 40  IPI00019888
## 41  IPI00336008
## 42  IPI00019888
## 43  IPI00336008
## 44  IPI00019888
## 45  IPI00336008
## 46  IPI00019888
## 47  IPI00336008
## 48  IPI00019888
## 49  IPI00336008
## 50  IPI00019888
## 51  IPI00336008
## 52  IPI00019888
## 53  IPI00336008
## 54  IPI00019888
## 55  IPI00336008
## 56  IPI00019888
## 57  IPI00336008
## 58  IPI00019888
## 59  IPI00336008
## 60  IPI00019888
## 61  IPI00336008
## 62  IPI00019888
## 63  IPI00336008
## 64  IPI00019888
## 65  IPI00336008
## 66  IPI00019888
## 67  IPI00336008
## 68  IPI00019888
## 69  IPI00336008
## 70  IPI00019888
## 71  IPI00336008
## 72  IPI00019888
## 73  IPI00336008
## 74  IPI00019888
## 75  IPI00336008
## 76  IPI00019888
## 77  IPI00336008
## 78  IPI00019888
## 79  IPI00336008
## 80  IPI00019888
## 81  IPI00336008
## 82  IPI00019888
## 83  IPI00336008
## 84  IPI00019888
## 85  IPI00336008
## 86  IPI00019888
## 87  IPI00336008
## 88  IPI00019888
## 89  IPI00336008
## 90  IPI00019888
## 91  IPI00336008
## 92  IPI00019888
## 93  IPI00336008
## 94  IPI00019888
## 95  IPI00336008
## 96  IPI00019888
## 97  IPI00336008
## 98  IPI00019888
## 99  IPI00336008
## 100 IPI00019888
## 101 IPI00336008
## 102 IPI00019888
## 103 IPI00336008
## 104 IPI00019888
## 105 IPI00336008
## 106 IPI00019888
## 107 IPI00336008
## 108 IPI00019888
## 109 IPI00336008
## 110 IPI00019888
## 111 IPI00336008
## 112 IPI00019888
## 113 IPI00336008
## 114 IPI00019888
## 115 IPI00336008
## 116 IPI00019888
## 117 IPI00336008
## 118 IPI00019888
## 119 IPI00336008
## 120 IPI00019888
## 121 IPI00336008
## 122 IPI00019888
## 123 IPI00336008
## 124 IPI00019888
## 125 IPI00336008
## 126 IPI00019888
## 127 IPI00336008
## 128 IPI00019888
## 129 IPI00336008
## 130 IPI00019888
## 131 IPI00336008
## 132 IPI00019888
## 133 IPI00336008
## 134 IPI00019888
## 135 IPI00336008
## 136 IPI00019888
## 137 IPI00336008
## 138 IPI00019888
## 139 IPI00336008
## 140 IPI00019888
## 141 IPI00336008
## 142 IPI00019888
## 143 IPI00336008
## 144 IPI00019888
## 145 IPI00336008
## 146 IPI00019888
## 147 IPI00336008
## 148 IPI00019888
## 149 IPI00336008
## 150 IPI00019888
## 151 IPI00336008
## 152 IPI00019888
## 153 IPI00336008
## 154 IPI00019888
## 155 IPI00336008
## 156 IPI00019888
## 157 IPI00336008
## 158 IPI00019888
## 159 IPI00336008
## 160 IPI00019888
## 161 IPI00336008
## 162 IPI00019888
## 163 IPI00336008
## 164 IPI00019888
## 165 IPI00336008
## 166 IPI00019888
## 167 IPI00336008
## 168 IPI00019888
## 169 IPI00336008
## 170 IPI00019888
## 171 IPI00336008
## 172 IPI00019888
## 173 IPI00336008
## 174 IPI00019888
## 175 IPI00336008
## 176 IPI00019888
## 177 IPI00336008
## 178 IPI00019888
## 179 IPI00336008
## 180 IPI00019888
## 181 IPI00336008
## 182 IPI00019888
## 183 IPI00336008
## 184 IPI00019888
## 185 IPI00336008
## 186 IPI00019888
## 187 IPI00336008
## 188 IPI00019888
## 189 IPI00336008
## 190 IPI00019888
## 191 IPI00336008
## 192 IPI00019888
## 193 IPI00336008
## 194 IPI00019888
## 195 IPI00336008
## 196 IPI00019888
## 197 IPI00336008
## 198 IPI00019888
## 199 IPI00336008
## 200 IPI00019888
## 201 IPI00336008
## 202 IPI00019888
## 203 IPI00336008
## 204 IPI00019888
## 205 IPI00336008
## 206 IPI00019888
## 207 IPI00336008
## 208 IPI00019888
## 209 IPI00336008
## 210 IPI00019888
## 211 IPI00336008
## 212 IPI00019888
## 213 IPI00336008
## 214 IPI00019888
## 215 IPI00336008
## 216 IPI00019888
## 217 IPI00336008
## 218 IPI00019888
## 219 IPI00336008
## 220 IPI00019888
## 221 IPI00336008
## 222 IPI00019888
## 223 IPI00336008
## 224 IPI00019888
## 225 IPI00336008
## 226 IPI00019888
## 227 IPI00336008
## 228 IPI00019888
## 229 IPI00336008
## 230 IPI00019888
## 231 IPI00336008
## 232 IPI00019888
## 233 IPI00336008
## 234 IPI00019888
## 235 IPI00336008
## 236 IPI00019888
## 237 IPI00336008
## 238 IPI00019888
## 239 IPI00336008
## 240 IPI00019888
## 241 IPI00336008
## 242 IPI00019888
## 243 IPI00336008
## 244 IPI00019888
## 245 IPI00336008
## 246 IPI00019888
## 247 IPI00336008
## 248 IPI00019888
## 249 IPI00336008
## 250 IPI00019888
## 251 IPI00336008
## 252 IPI00019888
## 253 IPI00336008
## 254 IPI00019888
## 255 IPI00336008
## 256 IPI00019888
## 257 IPI00336008
## 258 IPI00019888
## 259 IPI00336008
## 260 IPI00019888
## 261 IPI00336008
## 262 IPI00019888
## 263 IPI00336008
## 264 IPI00019888
## 265 IPI00336008
## 266 IPI00019888
## 267 IPI00336008
## 268 IPI00019888
## 269 IPI00336008
## 270 IPI00019888
## 271 IPI00336008
## 272 IPI00019888
## 273 IPI00336008
## 274 IPI00019888
## 275 IPI00336008
## 276 IPI00019888
## 277 IPI00336008
## 278 IPI00019888
## 279 IPI00336008
## 280 IPI00019888
## 281 IPI00336008
## 282 IPI00019888
## 283 IPI00336008
## 284 IPI00019888
## 285 IPI00336008
## 286 IPI00019888
## 287 IPI00336008
## 288 IPI00019888
## 289 IPI00336008
## 290 IPI00019888
## 291 IPI00336008
## 292 IPI00019888
## 293 IPI00336008
## 294 IPI00019888
## 295 IPI00336008
## 296 IPI00019888
## 297 IPI00336008
## 298 IPI00019888
## 299 IPI00336008
## 300 IPI00019888
## 301 IPI00336008
## 302 IPI00019888
## 303 IPI00336008
## 304 IPI00019888
## 305 IPI00336008
## 306 IPI00019888
## 307 IPI00336008
## 308 IPI00019888
## 309 IPI00336008
## 310 IPI00019888
## 311 IPI00336008
## 312 IPI00019888
## 313 IPI00336008
## 314 IPI00019888
## 315 IPI00336008
## 316 IPI00019888
## 317 IPI00336008
## 318 IPI00019888
## 319 IPI00336008
## 320 IPI00019888
## 321 IPI00336008
## 322 IPI00019888
## 323 IPI00336008
## 324 IPI00019888

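The example above illustrates why it is best to request as few columns as possible, especially when working with GO terms: each additional one-to-many column multiplies the number of rows returned, and the query above produced 324 rows.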
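As a rough illustration of where the 324 rows come from, the base R sketch below multiplies out the number of distinct values per column for SeqId 17792-158. The counts are read off the outputs shown in this vignette (3 UniProt IDs, 18 GO/evidence pairs, 3 pathway IDs, 2 IPI IDs). The assumption that the result contains every combination of those values is ours, and the labels used here are placeholders rather than database column names.

```r
# Distinct values per column for SeqId 17792-158, counted from the vignette
# output (placeholder labels; assumed to combine as a full cross product)
n_values <- c(uniprot = 3L, go_evidence = 18L, pathway = 3L, ipi = 2L)

# One row per combination of values
combos <- expand.grid(lapply(n_values, seq_len))
nrow(combos)

# Dropping a single multi-valued column shrinks the result by that factor
nrow(combos) / n_values[["ipi"]]
```

Under these assumptions, leaving out even one multi-valued column roughly halves or thirds the output, which is why narrow queries are easier to work with.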

4.4.2 Specifying Keytype

In SomaScan.db, the default keytype for select is the probe ID. This means that when using a SeqId (aka “PROBEID”) to retrieve annotations, the keytype= argument does not need to be defined, and can be left out of the select call entirely. The default (PROBEID) will be used.

select(SomaScan.db, keys = example_keys, columns = c("ENTREZID", "UNIPROT"))
## 'select()' returned 1:many mapping between keys and columns
##      PROBEID ENTREZID    UNIPROT
## 1   20564-53     7070     B0YJA4
## 2   20564-53     7070     P04216
## 3    5481-16     5921     P20936
## 4    5481-16     5921     Q59GK3
## 5  17792-158     7915     P51649
## 6  17792-158     7915     X5DQN2
## 7  17792-158     7915     X5D299
## 8   21760-22     7317 A0A024R1A3
## 9   21760-22     7317     P22314
## 10   5508-62     1509     P07339
## 11   5508-62     1509     V9HWI3

However, the database can be searched using more than just SeqIds.

For example, you may want to retrieve a list of SeqIds that are associated with a specific gene of interest - let’s use SMAD2 as an example. You can work “backwards” to retrieve the SeqIds associated with SMAD2 by setting the keytype="SYMBOL":

select(SomaScan.db,
       columns = c("PROBEID", "ENTREZID"),
       keys = "SMAD2", keytype = "SYMBOL"
)
## 'select()' returned 1:many mapping between keys and columns
##   SYMBOL   PROBEID ENTREZID
## 1  SMAD2   10364-6     4087
## 2  SMAD2 11353-143     4087

Sometimes, this may appear not to work. Let’s use CASC4 as an example:

select(SomaScan.db,
       columns = c("PROBEID", "ENTREZID"),
       keys = "CASC4", keytype = "SYMBOL"
)
## Error in .testForValidKeys(x, keys, keytype, fks): None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.

The error message above indicates that CASC4 is not a valid key for the “SYMBOL” keytype, which means that no entry for CASC4 was found in the “SYMBOL” column. However, genes can be tricky to search, and in some cases have many common names. The “ALIAS” data type contains the various aliases associated with the gene names found in the “SYMBOL” column, so using keytype="ALIAS" lets us cast a wider net:

select(SomaScan.db,
       columns = c("SYMBOL", "PROBEID", "ENTREZID"),
       keys = "CASC4", keytype = "ALIAS"
)
## 'select()' returned 1:many mapping between keys and columns
##   ALIAS SYMBOL  PROBEID ENTREZID
## 1 CASC4  GOLM2 10613-33   113201
## 2 CASC4  GOLM2  8838-10   113201

This reveals the source of our problem! CASC4 is also known as GOLM2, and GOLM2 is the symbol used in the annotations database. Because of this, searching for CASC4 as a symbol returns no results, but the same query is able to identify an entry when the “ALIAS” column is specified.
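One way to guard against this situation is to try the official symbol first and fall back to aliases only when nothing is found. The sketch below shows that logic in base R on a toy data frame built from the rows printed above. `toy_db` and `lookup_gene` are hypothetical names, not part of SomaScan.db, and the toy table assumes that an official symbol is also listed as its own alias.

```r
# Toy stand-in for the annotation database, retyped from the outputs above
toy_db <- data.frame(
  SYMBOL  = c("SMAD2", "SMAD2", "GOLM2", "GOLM2"),
  ALIAS   = c("SMAD2", "SMAD2", "CASC4", "CASC4"),
  PROBEID = c("10364-6", "11353-143", "10613-33", "8838-10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match on SYMBOL first, then fall back to ALIAS
lookup_gene <- function(gene, db = toy_db) {
  hit <- db[db$SYMBOL == gene, c("SYMBOL", "PROBEID")]
  if (nrow(hit) == 0L) {
    hit <- db[db$ALIAS == gene, c("SYMBOL", "PROBEID")]
  }
  hit
}

smad2 <- lookup_gene("SMAD2")  # found directly as a symbol
casc4 <- lookup_gene("CASC4")  # found only through the alias
casc4
```

Returning the “SYMBOL” column alongside the SeqIds makes it obvious when a query was resolved through an alias rather than the official symbol.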

Additionally, we can see that CASC4/GOLM2 is associated with two SeqIds: 10613-33 and 8838-10. How is this possible, and why does this happen? For more information, see the Advanced Usage Examples vignette (vignette("advanced_usage_examples", package = "SomaScan.db")).

4.4.3 Specifying Menu Version

SomaScan.db contains annotations for all currently available versions of the SomaScan menu by default. However, the menu= argument of select can be used to retrieve analytes from a single, specified menu. For example, if you have SomaScan data from an older menu version, like the 5k menu (also known as V4.0), this argument can be used to retrieve annotations associated with that menu specifically:

all_keys <- keys(SomaScan.db)
menu_5k <- select(SomaScan.db,
                  keys = all_keys, columns = "PROBEID",
                  menu = "5k"
)

head(menu_5k)
##    PROBEID
## 1 10000-28
## 2  10001-7
## 3 10003-15
## 4 10006-25
## 5 10008-43
## 6 10011-65

The SeqIds in the menu_5k data frame represent the 4966 analytes that were available in v4.0 of the SomaScan menu. Every SeqId listed currently has annotations in SomaScan.db; any analyte that is not listed does not currently have annotations available in SomaScan.db.

If the menu= argument is not specified in select, annotations for all available analytes are returned.
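The effect of restricting results to one menu can be pictured as a membership filter on SeqIds. The following base R sketch applies such a filter to a toy data frame. The menu vectors are truncated to a few SeqIds printed elsewhere in this vignette, `toy_menu` and `toy_annots` are hypothetical names, and this is not a description of how the menu= argument is implemented internally.

```r
# Truncated toy menus: three SeqIds present in both menus, plus one SeqId
# ("10010-10") that this vignette shows as present in v4.1 only
toy_menu <- list(
  v4.0 = c("10000-28", "10001-7", "10003-15"),
  v4.1 = c("10000-28", "10001-7", "10003-15", "10010-10")
)

# Toy annotation table with a placeholder annotation column
toy_annots <- data.frame(
  PROBEID = c("10000-28", "10003-15", "10010-10"),
  LABEL   = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose SeqId belongs to the chosen menu
in_v4_0 <- toy_annots[toy_annots$PROBEID %in% toy_menu$v4.0, ]
in_v4_0
```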


4.5 mapIds method

For situations in which you wish only to retrieve one data type from the database, the mapIds method may be cleaner and more streamlined than using select, and can help avoid problems with one-to-many mapping of keys. For example, if you are only interested in the gene symbols associated with a set of SomaScan analytes, they can be retrieved like so:

mapIds(SomaScan.db, keys = example_keys, column = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
##  20564-53   5481-16 17792-158  21760-22   5508-62 
##    "THY1"   "RASA1" "ALDH5A1"    "UBA1"    "CTSD"

mapIds will return a named vector from a single column, while select returns a data frame and can be used to retrieve data from multiple columns.

The primary difference between mapIds and select is how each handles one-to-many mapping, i.e. when the chosen key maps to more than one entry in the selected column. When this occurs, select returns every match as a separate row, whereas mapIds returns only the first value (by default).

Compare the output in the examples below:

# Only 1 GO term per key
mapIds(SomaScan.db, keys = example_keys[3L], column = "GO")
## 'select()' returned 1:many mapping between keys and columns
##    17792-158 
## "GO:0004777"
# All entries for chosen key
select(SomaScan.db, keys = example_keys[3L], columns = "GO")
## 'select()' returned 1:many mapping between keys and columns
##      PROBEID         GO EVIDENCE ONTOLOGY
## 1  17792-158 GO:0004777      IBA       MF
## 2  17792-158 GO:0004777      IDA       MF
## 3  17792-158 GO:0004777      ISS       MF
## 4  17792-158 GO:0005739      HDA       CC
## 5  17792-158 GO:0005739      IBA       CC
## 6  17792-158 GO:0005739      IDA       CC
## 7  17792-158 GO:0005739      ISS       CC
## 8  17792-158 GO:0005759      TAS       CC
## 9  17792-158 GO:0006105      ISS       BP
## 10 17792-158 GO:0006536      ISS       BP
## 11 17792-158 GO:0007417      IMP       BP
## 12 17792-158 GO:0009450      IBA       BP
## 13 17792-158 GO:0009450      IDA       BP
## 14 17792-158 GO:0009450      IEA       BP
## 15 17792-158 GO:0009450      IMP       BP
## 16 17792-158 GO:0009791      IEA       BP
## 17 17792-158 GO:0042135      ISS       BP
## 18 17792-158 GO:0042802      IPI       MF

Note that the message printed by mapIds states that select returned a 1:many mapping between keys and columns, yet only one value was returned for the desired SeqId. This is because the additional mapped values were discarded when the results were converted to a named vector. This may not be a problem for some columns, like “SYMBOL” (typically there is only one gene symbol per gene), but it may present a problem for others (like GO terms or KEGG pathways). Think carefully when using mapIds, or consider specifying the multiVals= argument to indicate what should be done with multi-mapped output.
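Before relying on the default behaviour, it can help to check how many values per key would be dropped. The base R sketch below does this for the PROBEID-to-OMIM pairs shown in the Adding Target Full Names section of this vignette. It works on a retyped toy data frame (`omim_pairs` is a hypothetical name) and only mimics the "keep the first value" idea; it does not call mapIds.

```r
# PROBEID/OMIM pairs retyped from the select() output in this vignette
omim_pairs <- data.frame(
  PROBEID = c("20564-53", "5481-16", "5481-16", "5481-16",
              "17792-158", "17792-158"),
  OMIM    = c("188230", "139150", "605462", "608354", "271980", "610045"),
  stringsAsFactors = FALSE
)

# Group the values by key: one character vector per SeqId
by_key <- split(omim_pairs$OMIM, omim_pairs$PROBEID)

# Keeping only the first value per key gives a named vector
first_only <- vapply(by_key, function(x) x[1L], character(1))

# Number of values per key that the "first" strategy leaves out
n_dropped <- lengths(by_key) - 1L
n_dropped
```

Any key with a non-zero count here is one for which a single returned value would hide additional matches.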

The default behavior of mapIds is to return the first available result:

# The default - returns the first available result
mapIds(SomaScan.db, keys = example_keys[3], column = "GO", multiVals = "first")
## 'select()' returned 1:many mapping between keys and columns
##    17792-158 
## "GO:0004777"

Again, the select message here indicates that, while only 1 value was returned, there were many more GO term matches. All of the matches can be viewed by specifying multiVals="list":

# Returns a list object of results, instead of only returning the first result
mapIds(SomaScan.db, keys = example_keys[3], column = "GO", multiVals = "list")
## 'select()' returned 1:many mapping between keys and columns
## $`17792-158`
##  [1] "GO:0004777" "GO:0004777" "GO:0004777" "GO:0005739" "GO:0005739"
##  [6] "GO:0005739" "GO:0005739" "GO:0005759" "GO:0006105" "GO:0006536"
## [11] "GO:0007417" "GO:0009450" "GO:0009450" "GO:0009450" "GO:0009450"
## [16] "GO:0009791" "GO:0042135" "GO:0042802"
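The list above repeats some GO IDs, which appears to reflect one entry per evidence code (compare the select output earlier in this section). If only the distinct GO IDs are needed, the duplicates can be collapsed. A minimal base R sketch follows, with the 18 IDs retyped from the output above; `go_list` is a hypothetical stand-in for the object returned by mapIds.

```r
# GO IDs for SeqId 17792-158, retyped from the multiVals = "list" output
go_ids <- c(
  rep("GO:0004777", 3), rep("GO:0005739", 4),
  "GO:0005759", "GO:0006105", "GO:0006536", "GO:0007417",
  rep("GO:0009450", 4),
  "GO:0009791", "GO:0042135", "GO:0042802"
)
go_list <- list(`17792-158` = go_ids)

# Collapse repeated IDs within each key
unique_go <- lapply(go_list, unique)

# Compare the number of entries before and after de-duplication
lengths(go_list)
lengths(unique_go)
```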

5 Adding Target Full Names

Because the annotations in this package are compiled from public repositories, information typically found in an ADAT may be missing. For example, in an ADAT file, each SeqId is associated with a protein target, and the name of that target is provided as both an abbreviated symbol (“Target”) and full description (“Target Full Name”). The SomaScan.db package does not contain data from a particular ADAT file; however, it does contain a function to add the full protein target name to any data frame obtained via select.

As an example, we will generate a data frame of Ensembl gene IDs and OMIM IDs:

ensg <- select(SomaScan.db,
               keys = example_keys[1:3L],
               columns = c("ENSEMBL", "OMIM")
)
## 'select()' returned 1:many mapping between keys and columns
ensg
##     PROBEID         ENSEMBL   OMIM
## 1  20564-53 ENSG00000154096 188230
## 2   5481-16 ENSG00000145715 139150
## 3   5481-16 ENSG00000145715 605462
## 4   5481-16 ENSG00000145715 608354
## 5 17792-158 ENSG00000112294 271980
## 6 17792-158 ENSG00000112294 610045

We will now append the Target Full Name to this data frame:

addTargetFullName(ensg)
##     PROBEID                                      TARGETFULLNAME         ENSEMBL
## 1 17792-158 Succinate-semialdehyde dehydrogenase, mitochondrial ENSG00000112294
## 2 17792-158 Succinate-semialdehyde dehydrogenase, mitochondrial ENSG00000112294
## 3  20564-53                         Thy-1 membrane glycoprotein ENSG00000154096
## 4   5481-16                     Ras GTPase-activating protein 1 ENSG00000145715
## 5   5481-16                     Ras GTPase-activating protein 1 ENSG00000145715
## 6   5481-16                     Ras GTPase-activating protein 1 ENSG00000145715
##     OMIM
## 1 271980
## 2 610045
## 3 188230
## 4 139150
## 5 605462
## 6 608354

The full protein target name will be appended to the input data frame, with the Target Full Name (in the “TARGETFULLNAME” column) always added to the right of the “PROBEID” column.
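For readers who want a mental model of the resulting layout, the base R sketch below reproduces the same column order with merge on toy data retyped from the outputs above. It is only an illustration of the result, not the package's implementation of addTargetFullName, and `full_names` and `toy_ensg` are hypothetical names.

```r
# Toy lookup table of target names, retyped from the output above
full_names <- data.frame(
  PROBEID = c("20564-53", "5481-16", "17792-158"),
  TARGETFULLNAME = c(
    "Thy-1 membrane glycoprotein",
    "Ras GTPase-activating protein 1",
    "Succinate-semialdehyde dehydrogenase, mitochondrial"
  ),
  stringsAsFactors = FALSE
)

# Toy copy of the select() result shown above
toy_ensg <- data.frame(
  PROBEID = c("20564-53", "5481-16", "5481-16", "5481-16",
              "17792-158", "17792-158"),
  ENSEMBL = c("ENSG00000154096", "ENSG00000145715", "ENSG00000145715",
              "ENSG00000145715", "ENSG00000112294", "ENSG00000112294"),
  OMIM    = c("188230", "139150", "605462", "608354", "271980", "610045"),
  stringsAsFactors = FALSE
)

# Joining on PROBEID places the name column directly after the key column
annotated <- merge(full_names, toy_ensg, by = "PROBEID")
names(annotated)
```

Note that merge sorts rows by the key column by default, so the row order of the result can differ from that of the input data frame.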

6 Package Objects

In addition to the methods mentioned above, the package provides an R object, somascan_menu, that can be used to retrieve SomaScan analytes from a specific menu version. The object is a list, and each element is a character vector of the SeqIds that were available in the specified menu.

summary(somascan_menu)
##      Length Class  Mode     
## v4.0 4966   -none- character
## v4.1 7267   -none- character
lapply(somascan_menu, head)
## $v4.0
## [1] "10000-28" "10001-7"  "10003-15" "10006-25" "10008-43" "10011-65"
## 
## $v4.1
## [1] "10000-28" "10001-7"  "10003-15" "10006-25" "10008-43" "10010-10"

This object also provides a quick and easy way of comparing the available SomaScan menus:

setdiff(somascan_menu$v4.1, somascan_menu$v4.0) |> head(50L)
##  [1] "10010-10"  "10025-1"   "10039-32"  "10069-2"   "10351-51"  "10354-57" 
##  [7] "10379-19"  "10382-1"   "10398-110" "10420-30"  "10439-57"  "10457-3"  
## [13] "10460-1"   "10463-23"  "10470-34"  "10472-53"  "10473-2"   "10479-18" 
## [19] "10480-33"  "10505-12"  "10528-2"   "10576-7"   "10626-116" "10631-9"  
## [25] "10636-1"   "10670-26"  "10738-11"  "10741-22"  "10743-13"  "10746-24" 
## [31] "10780-10"  "10801-11"  "10819-108" "10855-55"  "10870-32"  "10894-25" 
## [37] "10966-1"   "10967-12"  "10970-3"   "10976-44"  "10980-11"  "11081-1"  
## [43] "11083-23"  "11121-56"  "11150-3"   "11159-14"  "11184-51"  "11200-52" 
## [49] "11203-97"  "11232-46"
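Beyond listing the differing SeqIds, it is often enough to know how many analytes two menus share. The helper below is a small base R sketch of such a summary; `compare_menus` is a hypothetical function, not part of SomaScan.db, and it is demonstrated on truncated toy vectors rather than the full menus.

```r
# Hypothetical helper: count shared and menu-specific SeqIds
compare_menus <- function(old, new) {
  c(
    shared   = length(intersect(old, new)),
    only_old = length(setdiff(old, new)),
    only_new = length(setdiff(new, old))
  )
}

# Truncated toy menus using SeqIds printed in this vignette
toy_menu <- list(
  v4.0 = c("10000-28", "10001-7", "10003-15"),
  v4.1 = c("10000-28", "10001-7", "10003-15", "10010-10")
)

menu_summary <- compare_menus(toy_menu$v4.0, toy_menu$v4.1)
menu_summary
```

With the package loaded, the same helper could be applied to the full vectors, e.g. compare_menus(somascan_menu$v4.0, somascan_menu$v4.1).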

7 Session Info

sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] SomaScan.db_0.99.7   AnnotationDbi_1.64.0 IRanges_2.36.0      
## [4] S4Vectors_0.40.0     Biobase_2.62.0       BiocGenerics_0.48.0 
## [7] withr_2.5.1          BiocStyle_2.30.0    
## 
## loaded via a namespace (and not attached):
##  [1] bit_4.0.5               jsonlite_1.8.7          compiler_4.3.1         
##  [4] BiocManager_1.30.22     crayon_1.5.2            blob_1.2.4             
##  [7] bitops_1.0-7            Biostrings_2.70.0       jquerylib_0.1.4        
## [10] png_0.1-8               yaml_2.3.7              fastmap_1.1.1          
## [13] org.Hs.eg.db_3.18.0     R6_2.5.1                XVector_0.42.0         
## [16] GenomeInfoDb_1.38.0     knitr_1.44              bookdown_0.36          
## [19] GenomeInfoDbData_1.2.11 DBI_1.1.3               bslib_0.5.1            
## [22] rlang_1.1.1             KEGGREST_1.42.0         cachem_1.0.8           
## [25] xfun_0.40               sass_0.4.7              bit64_4.0.5            
## [28] RSQLite_2.3.1           memoise_2.0.1           cli_3.6.1              
## [31] zlibbioc_1.48.0         digest_0.6.33           vctrs_0.6.4            
## [34] evaluate_0.22           RCurl_1.98-1.12         rmarkdown_2.25         
## [37] httr_1.4.7              pkgconfig_2.0.3         tools_4.3.1            
## [40] htmltools_0.5.6.1