A simple introduction to the ImmuneSpaceR package

2017-04-24

This package provides a thin wrapper around Rlabkey and connects to the ImmuneSpace database, making it easier to fetch datasets (gene expression data, HAI, and so forth) from specific studies.

Contents

  1. Configuration
  2. Connections
  3. Datasets
  4. Gene expression
  5. Plots
  6. Cross study connections
  7. sessionInfo

Configuration

To connect to ImmuneSpace, you need a .netrc file in your home directory containing a machine name (the hostname of ImmuneSpace), a login, and a password. See here for more information.

A netrc file may look like this:

machine www.immunespace.org
login myuser@domain.com
password supersecretpassword

Set up your netrc file now

Put it in your home directory. If you type:

ls ~/.netrc

at the command prompt, you should see it there. If it’s not there, create one now. Make sure you have a valid login and password. If you don’t have one, go to ImmuneSpace now and set yourself up with an account.
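
As a rough sketch, the same file can also be written from R. The example below writes to a temporary path (the name netrc_example is made up for illustration) so that it cannot overwrite an existing file; for real use the target would be ~/.netrc, and the login and password are the placeholders from the example above.

```r
# Sketch only: write a netrc-style file from R.
# A temporary path is used here; for real use the target would be "~/.netrc".
netrc_path <- file.path(tempdir(), "netrc_example")

netrc_lines <- c(
  "machine www.immunespace.org",
  "login myuser@domain.com",        # placeholder login from the example above
  "password supersecretpassword"    # placeholder password from the example above
)
writeLines(netrc_lines, netrc_path)

# Restrict the file to the current user, since it holds a password
# (this has only a limited effect on Windows).
Sys.chmod(netrc_path, mode = "0600")

file.exists(netrc_path)
readLines(netrc_path)
```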

Instantiate a connection

We’ll be looking at study SDY269. If you want to use a different study, change that string. The connections have state, so you can instantiate multiple connections to different studies simultaneously.

library(ImmuneSpaceR)
sdy269 <- CreateConnection(study = "SDY269")
sdy269
## Immunespace Connection to study SDY269
## URL: https://www.immunespace.org/Studies/SDY269
## User: unknown_user at not_a_domain.com
## Available datasets
##  cohort_membership
##  pcr
##  elisa
##  fcs_analyzed_result
##  fcs_sample_files
##  hai
##  elispot
##  demographics
##  gene_expression_files
## Expression Matrices
##  TIV_2008
##  LAIV_2008

The call to CreateConnection instantiates the connection. Printing the object shows where it is connected, to which study, and the available datasets and gene expression matrices.

Note that when a script runs on ImmuneSpace itself, variables set in the global environment automatically indicate which study to use, so the study argument can be omitted.
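
Because each connection keeps its own state, two studies can be open side by side. The sketch below assumes a valid .netrc file and network access; SDY404 is used only because its name appears among the expression matrices listed later in this document, so treat the choice of study as an assumption.

```r
library(ImmuneSpaceR)

# Two independent connections; each one tracks its own study.
sdy269 <- CreateConnection(study = "SDY269")
sdy404 <- CreateConnection(study = "SDY404")  # assumed second study, for illustration

# Printing each object lists the datasets available in that study.
sdy269
sdy404
```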

Fetching data sets

We can grab any of the datasets listed in the connection.

sdy269$getDataset("hai")
##      participant_id age_reported gender                      race
##   1:  SUB112829.269           26   Male                     White
##   2:  SUB112870.269           33   Male                     White
##   3:  SUB112873.269           25   Male                     White
##   4:  SUB112832.269           26   Male                     White
##   5:  SUB112856.269           46 Female Black or African American
##  ---                                                             
## 332:  SUB112883.269           23 Female                     Asian
## 333:  SUB112867.269           37   Male                     White
## 334:  SUB112844.269           29 Female                     White
## 335:  SUB112881.269           29 Female Black or African American
## 336:  SUB112879.269           35 Female                     White
##               cohort study_time_collected study_time_collected_unit
##   1: LAIV group 2008                    0                      Days
##   2: LAIV group 2008                    0                      Days
##   3: LAIV group 2008                    0                      Days
##   4:  TIV Group 2008                   28                      Days
##   5:  TIV Group 2008                   28                      Days
##  ---                                                               
## 332: LAIV group 2008                    0                      Days
## 333: LAIV group 2008                   28                      Days
## 334: LAIV group 2008                   28                      Days
## 335: LAIV group 2008                   28                      Days
## 336:  TIV Group 2008                    0                      Days
##                   virus value_reported
##   1:   B/Florida/4/2006             20
##   2:   B/Florida/4/2006              5
##   3:   B/Florida/4/2006              5
##   4: A/Brisbane/59/2007             20
##   5: A/Brisbane/59/2007            160
##  ---                                  
## 332: A/Uruguay/716/2007              5
## 333:   B/Florida/4/2006              5
## 334:   B/Florida/4/2006             20
## 335:   B/Florida/4/2006              5
## 336:  B/Brisbane/3/2007              5

The sdy269 object is an instance of an R5 (reference) class, so it behaves like a true object. Functions such as getDataset are members of the object, hence the $ syntax used to call them.

The first time you retrieve a data set, it will contact the database. The data is cached locally, so the next time you call getDataset on the same dataset, it will retrieve the cached local copy. This is much faster.
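
To make the two ideas above concrete (member functions called with $, and a local cache), here is a toy reference class. It is not the ImmuneSpaceR implementation: ToyConnection, its fields, and the simulated download are all hypothetical and exist only to illustrate the pattern.

```r
library(methods)

# Toy stand-in for a connection object; NOT the ImmuneSpaceR implementation.
ToyConnection <- setRefClass(
  "ToyConnection",
  fields = list(study = "character", cache = "list", fetch_count = "numeric"),
  methods = list(
    getDataset = function(name) {
      # Return the cached copy if this dataset was fetched before.
      if (!is.null(cache[[name]])) {
        return(cache[[name]])
      }
      # Otherwise "download" it (simulated here) and remember the result.
      fetch_count <<- fetch_count + 1
      cache[[name]] <<- data.frame(dataset = name, study = study)
      cache[[name]]
    }
  )
)

toy <- ToyConnection$new(study = "SDY269", cache = list(), fetch_count = 0)
first  <- toy$getDataset("hai")  # simulated fetch
second <- toy$getDataset("hai")  # served from the cache
toy$fetch_count                  # 1: only the first call did a fetch
```

The second call returns the stored copy without repeating the simulated download, which is the behaviour described above for repeated getDataset calls.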

To get only a subset of the data and speed up the download, filters can be passed to getDataset. The filters are created using the makeFilter function of the Rlabkey package.

library(Rlabkey)
myFilter <- makeFilter(c("gender", "EQUAL", "Female"))
hai <- sdy269$getDataset("hai", colFilter = myFilter)

See ?makeFilter for more information on the syntax.
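
Several conditions can be passed to makeFilter at once, in which case a row must satisfy all of them. The sketch below assumes a valid .netrc file and network access, and assumes that study_time_collected can be filtered by name in the same way as gender; the variable names are illustrative.

```r
library(ImmuneSpaceR)
library(Rlabkey)

sdy269 <- CreateConnection(study = "SDY269")

# Each vector is one condition: column name, operator, value.
# Passing several vectors combines them (all conditions must hold).
day28Female <- makeFilter(
  c("gender", "EQUAL", "Female"),
  c("study_time_collected", "EQUAL", "28")
)
hai_subset <- sdy269$getDataset("hai", colFilter = day28Female)
```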

For more information about getDataset’s options, refer to the dedicated vignette.

Fetching expression matrices

We can also grab a gene expression matrix:

sdy269$getGEMatrix("LAIV_2008")
## Downloading matrix..
## Downloading Features..
## Constructing ExpressionSet
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54715 features, 83 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: BS586216 BS586160 ... BS586111 (83 total)
##   varLabels: biosample_accession participant_id ...
##     study_time_collected_unit (5 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1007_PM_s_at 1053_PM_at ... AFFX-r2-TagQ-5_at
##     (54715 total)
##   fvarLabels: FeatureId gene_symbol
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

The object contacts the DB and downloads the matrix file. This is stored and cached locally as a data.table. The next time you access it, it will be much faster since it won’t need to contact the database again.
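
Since the result is an ExpressionSet, its parts can be pulled out with the standard Biobase accessors. This is a minimal sketch that assumes Biobase is installed, a valid .netrc file, and network access; the variable name es is arbitrary.

```r
library(ImmuneSpaceR)
library(Biobase)

sdy269 <- CreateConnection(study = "SDY269")
es <- sdy269$getGEMatrix("LAIV_2008")

dim(exprs(es))    # expression values: features in rows, samples in columns
head(pData(es))   # per-sample annotations (the phenoData shown above)
head(fData(es))   # per-feature annotations: FeatureId and gene_symbol
```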

It is also possible to call this function using multiple matrix names. In this case, all the matrices are downloaded and combined into a single ExpressionSet.

sdy269$getGEMatrix(c("TIV_2008", "LAIV_2008"))
## Downloading matrix..
## Downloading matrix..
## Constructing ExpressionSet
## Constructing ExpressionSet
## Combining ExpressionSets
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 54715 features, 163 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: BS586131 BS586187 ... BS586111 (163 total)
##   varLabels: biosample_accession participant_id ...
##     study_time_collected_unit (5 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 1 2 ... 54715 (54715 total)
##   fvarLabels: FeatureId gene_symbol
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:

Finally, the summary argument lets you download the matrix with gene symbols in place of probe IDs.

gs <- sdy269$getGEMatrix("TIV_2008", summary = TRUE)
## Downloading matrix..
## Constructing ExpressionSet

If the connection was created with verbose = TRUE, some functions will display additional information, such as the valid dataset names.

Quick plots

A quick plot of a data set can be generated using the quick_plot function.

quick_plot automatically chooses the type of plot depending on the selected dataset.

sdy269$quick_plot("hai")

sdy269$quick_plot("elisa")

However, the type argument can be used to manually select from “boxplot”, “heatmap”, “violin” and “line”.
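
For example, the automatic choice can be overridden for the HAI data as sketched below. This assumes a valid .netrc file and network access; whether a given plot type suits a given dataset is not checked here.

```r
library(ImmuneSpaceR)

sdy269 <- CreateConnection(study = "SDY269")

# Request a specific plot type instead of the automatic one.
sdy269$quick_plot("hai", type = "boxplot")
sdy269$quick_plot("hai", type = "violin")
```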

Cross study connections

To fetch data from multiple studies, simply create a connection at the project level.

con <- CreateConnection("")

This will instantiate a connection at the Studies level. Most functions work on cross-study connections just as they do on single studies.

You can get a list of the datasets and gene expression matrices available across all studies.

con
## Immunespace Connection to study Studies
## URL: https://www.immunespace.org/Studies/
## User: unknown_user at not_a_domain.com
## Available datasets
##  fcs_analyzed_result
##  elisa
##  mbaa
##  pcr
##  hai
##  hla_typing
##  neut_ab_titer
##  elispot
##  fcs_control_files
##  fcs_sample_files
##  gene_expression_files
##  demographics
##  cohort_membership
## Expression Matrices
##  TIV_older
##  SDY296_AIRFV_2011
##  LAIV_2008
##  TIV_2008
##  TIV_2007
##  pH1N1_2009
##  TIV_young
##  SDY63_older_PBMC_year1
##  SDY63_young_PBMC_year1
##  SDY404_older_PBMC_year2
##  SDY404_young_PBMC_year2
##  Cohort1_young
##  Cohort2_older
##  TIV_2011
##  Pneumovax23_group1
##  Fluzone_group1
##  Fluzone_group2
##  Pneumovax23_group2
##  Saline_group1
##  Saline_group2
##  VLminus
##  VLplus
##  TIV_2010
##  SDY301_AIRFV_2012

In cross-study connections, getDataset and getGEMatrix will combine the requested datasets or expression matrices. See the dedicated vignettes for more information.

Likewise, quick_plot will plot across studies. Note that in most cases the datasets will contain too many cohorts or subjects, so filtering the data becomes a necessity. The filter argument can be used for this; it takes a filter built with makeFilter, as described in the getDataset section.

plotFilter <- makeFilter(c("cohort", "IN", "TIV 2010;TIV Group 2008"))
con$quick_plot("elispot", filter = plotFilter)

The figure above shows the ELISPOT results for two different years of TIV vaccine cohorts from two different studies.
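
The value given to the IN operator in the filter above is a single string with the cohort names separated by semicolons. When the names are held in a character vector, that string can be assembled with paste; the variable names here are illustrative.

```r
# Cohort names to keep, taken from the filter shown above.
cohorts <- c("TIV 2010", "TIV Group 2008")

# The IN operator expects one semicolon-separated string.
cohort_values <- paste(cohorts, collapse = ";")
cohort_values
# [1] "TIV 2010;TIV Group 2008"

# With Rlabkey loaded, the filter is then built as before:
# plotFilter <- makeFilter(c("cohort", "IN", cohort_values))
```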

sessionInfo()

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Rlabkey_2.1.134    rjson_0.2.15       RCurl_1.95-4.8    
## [4] bitops_1.0-6       ImmuneSpaceR_1.4.0 rmarkdown_1.4     
## [7] knitr_1.15.1      
## 
## loaded via a namespace (and not attached):
##  [1] Biobase_2.36.0      viridis_0.4.0       httr_1.2.1         
##  [4] tidyr_0.6.1         jsonlite_1.4        viridisLite_0.2.0  
##  [7] foreach_1.4.3       gtools_3.5.0        assertthat_0.2.0   
## [10] stats4_3.4.0        yaml_2.1.14         robustbase_0.92-7  
## [13] backports_1.0.5     lattice_0.20-35     digest_0.6.12      
## [16] RColorBrewer_1.1-2  colorspace_1.3-2    htmltools_0.3.5    
## [19] plyr_1.8.4          pheatmap_1.0.8      purrr_0.2.2        
## [22] mvtnorm_1.0-6       scales_0.4.1        gdata_2.17.0       
## [25] whisker_0.3-2       tibble_1.3.0        ggplot2_2.2.1      
## [28] nnet_7.3-12         BiocGenerics_0.22.0 lazyeval_0.2.0     
## [31] magrittr_1.5        mclust_5.2.3        heatmaply_0.9.1    
## [34] evaluate_0.10       MASS_7.3-47         gplots_3.0.1       
## [37] class_7.3-14        tools_3.4.0         registry_0.3       
## [40] data.table_1.10.4   trimcluster_0.1-2   stringr_1.2.0      
## [43] plotly_4.5.6        kernlab_0.9-25      munsell_0.4.3      
## [46] cluster_2.0.6       fpc_2.1-10          compiler_3.4.0     
## [49] caTools_1.17.1      grid_3.4.0          iterators_1.0.8    
## [52] htmlwidgets_0.8     labeling_0.3        base64enc_0.1-3    
## [55] gtable_0.2.0        codetools_0.2-15    flexmix_2.3-13     
## [58] DBI_0.6-1           reshape2_1.4.2      TSP_1.1-5          
## [61] R6_2.2.0            seriation_1.2-1     gridExtra_2.2.1    
## [64] prabclus_2.2-6      dplyr_0.5.0         rprojroot_1.2      
## [67] KernSmooth_2.23-15  dendextend_1.5.2    modeltools_0.2-21  
## [70] stringi_1.1.5       parallel_3.4.0      Rcpp_0.12.10       
## [73] gclus_1.3.1         DEoptimR_1.0-8      diptest_0.75-7