This vignette shows detailed examples of all the functionality of the `getDataset` method.
As explained in the introductory vignette, datasets must be downloaded from `ImmuneSpaceConnection` objects, so we must first instantiate a connection to the study or studies of interest. Throughout this vignette, we will use two connections: one to a single study, and one to all available data.
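A minimal sketch of the two connections, assuming ImmuneSpaceR is installed and your ImmuneSpace credentials are already configured (for example in a `.netrc` file). An empty study name is used to connect to all studies at once.

```r
library(ImmuneSpaceR)

# Connection to a single study
sdy269 <- CreateConnection("SDY269")

# Connection to all available studies (empty study name)
all <- CreateConnection("")
```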
Now that the connections have been instantiated, we can start downloading from them, but first we need to figure out which datasets are available within our chosen studies. Printing a connection will, among other information, list the available datasets. The `listDatasets` method displays only the information we are looking for.
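The two listings that follow (first SDY269, then all studies) can be reproduced with a sketch like this one. The guards only create a connection when none exists in the current session, so the chunk can also run on its own; credentials are assumed to be configured as for any ImmuneSpaceR connection.

```r
library(ImmuneSpaceR)

# Reuse existing connections, or create them if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")
if (!exists("all", inherits = FALSE)) all <- CreateConnection("")

# Datasets and expression matrices available in SDY269
sdy269$listDatasets()

# Datasets and expression matrices available across all studies
all$listDatasets()
```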
```
## datasets
## cohort_membership
## demographics
## elisa
## elispot
## fcs_analyzed_result
## fcs_sample_files
## gene_expression_files
## hai
## pcr
## Expression Matrices
## SDY269_PBMC_TIV_Geo
## SDY269_PBMC_LAIV_Geo
```
```
## datasets
## cohort_membership
## demographics
## elisa
## elispot
## fcs_analyzed_result
## fcs_control_files
## fcs_sample_files
## gene_expression_files
## hai
## hla_typing
## mbaa
## neut_ab_titer
## pcr
## Expression Matrices
## SDY1630_Spleen_AllSubjects
## SDY1630_PBMC_AllSubjects
## SDY1630_LungLymphNode_AllSubjects
## SDY1630_Lung_AllSubjects
## SDY1630_BoneMarrow_AllSubjects
## SDY1529_WholeBlood_HealthyAdults_PreVax_Geo
## SDY1529_WholeBlood_HealthyAdults_PostVax_Geo
## SDY787_Tcell_wPaPboost_Geo
## SDY787_Tcell_aP_Geo
## SDY787_Tcell_aPaPboost_Geo
## SDY1328_WholeBlood_HealthyAldults_Geo
## SDY520_WholeBlood_Young_geo
## SDY520_WholeBlood_Older_Geo
## SDY640_WholeBlood_Young_Geo
## SDY640_WholeBlood_Older_Geo
## SDY1086_PBMC_GroupB_Geo
## SDY1086_PBMC_GroupA_Geo
## SDY1267_PBMC_RRR_Geo
## SDY1267_PBMC_ARR_Geo
## SDY1412_WholeBlood_EPC002_geo
## SDY1256_WholeBlood_EPIC001_geo
## SDY645_WholeBlood_JuvDM
## SDY376_WholeBlood_JDM
## SDY80_PBMC_Cohort2_geo
## SDY299_WholeBlood_HEPISLAV
## SDY180_WholeBlood_Grp2Saline_Geo
## SDY180_WholeBlood_Grp2Pneunomax23_Geo
## SDY180_WholeBlood_Grp2Fluzone_Geo
## SDY180_WholeBlood_Grp1Saline_Geo
## SDY180_WholeBlood_Grp1Pneunomax23_Geo
## SDY180_WholeBlood_Grp1Fluzone_Geo
## SDY1325_WholeBlood_LowIntraMuscularPS_geo
## SDY1294_PBMC_ChineseCohort_Geo
## SDY1119_PBMC_oldHealthy_Geo
## SDY1119_PBMC_oldT2D_Geo
## SDY1119_PBMC_youngT2D_Geo
## SDY1119_PBMC_youngHealthy_Geo
## SDY1289_WholeBlood_MontrealCohort_Geo
## SDY1289_WholeBlood_LausanneCohort_Geo
## SDY1324_PBMC_nonBCGvacc
## SDY1324_PBMC_LatentTB
## SDY1324_PBMC_BCGvacc
## SDY89_WholeBlood_EnergixB
## SDY1370_Bcell_lc16m8_geo
## SDY1370_Bcell_dryvax_geo
## SDY1370_Tcell_lc16m8_geo
## SDY1370_Tcell_dryvax_geo
## SDY1370_PBMC_lc16m8_geo
## SDY1370_PBMC_dryvax_geo
## SDY1368_WholeBlood_Twin_Geo
## SDY1368_WholeBlood_NonTwin_Geo
## SDY67_PBMC_HealthyAdults
## SDY224_PBMC_TIV2010_ImmPort
## SDY888_PBMC_UninfectedEndemicArea_Geo
## SDY888_PBMC_UninfectedNonEndemicArea_Geo
## SDY888_PBMC_InfectedEndemicArea_Geo
## SDY28_PBMC_Dryvax
## SDY34_PBMC_TIV
## SDY34_PBMC_Controls
## SDY305_Other_IDTIV_Geo
## SDY305_Other_TIV_Geo
## SDY112_Other_GroupC
## SDY112_Other_GroupB
## SDY112_Other_GroupA
## SDY315_Other_GroupC_Geo
## SDY315_Other_GroupB_Geo
## SDY315_Other_GroupA_Geo
## SDY406_Other_ILI_Geo
## SDY113_Other_IDTIV_Geo
## SDY113_Other_LAIV_Geo
## SDY113_Other_TIV_Geo
## SDY144_Other_TIV_Geo
## SDY690_PBMC_Energixb
## SDY690_WholeBlood_Energixb
## SDY597_Other_InVitro
## SDY522_Other_LAIV
## SDY387_WholeBlood_NCH2010
## SDY372_WholeBlood_JDM2012
## SDY368_WholeBlood_NCH2013
## SDY364_WholeBlood_NCH2012
## SDY312_Other_GroupC
## SDY312_Other_GroupB
## SDY312_Other_GroupA
## SDY301_Other_AIRFV
## SDY296_WholeBlood_AIRFV
## SDY667_WholeBlood_PSORPPP
## SDY212_WholeBlood_Older_Geo
## SDY212_WholeBlood_Young_Geo
## SDY212_PBMC_Older_Geo
## SDY212_PBMC_Young_geo
## SDY270_PBMC_TIVGroup_Geo
## SDY1373_WholeBlood_highDose_Geo
## SDY1373_WholeBlood_lowDose_Geo
## SDY1364_PBMC_IntraDermal_Geo
## SDY1364_PBMC_IntraMuscular_Geo
## SDY1325_WholeBlood_IntramuscularCRM_Geo
## SDY1325_WholeBlood_IntramuscularPS_Geo
## SDY1325_WholeBlood_SubcutaneousPS_Geo
## SDY1291_PBMC_HealthyHIVUninfected_Geo
## SDY1293_PBMC_Vaccinated_geo
## SDY1293_PBMC_Control_Geo
## SDY1276_WholeBlood_Validation_Geo
## SDY1264_PBMC_Trial2_Geo
## SDY1264_PBMC_Trial1_Geo
## SDY1260_PBMC_MCV4_Geo
## SDY1260_PBMC_MPSV4_Geo
## SDY984_PBMC_Elderly_Geo
## SDY984_PBMC_Young_Geo
## SDY61_PBMC_TIVGrp
## SDY56_PBMC_Older
## SDY56_PBMC_Young
## SDY63_PBMC_Young_Geo
## SDY63_PBMC_Older_Geo
## SDY404_PBMC_Young_Geo
## SDY404_PBMC_Older_Geo
## SDY400_PBMC_Older_Geo
## SDY400_PBMC_Young_Geo
## SDY269_PBMC_TIV_Geo
## SDY269_PBMC_LAIV_Geo
## SDY300_dendriticCell_dcMonoFlu2011
## SDY300_otherCell_dcMonoFlu2011
## SDY162_Macrophage_VLplus
## SDY162_PBMC_VLplus
## SDY162_Macrophage_VLminus
## SDY162_PBMC_VLminus
```
Naturally, `all` contains every dataset available on ImmuneSpace, as it combines all available studies. Additionally, when a connection object is created with `verbose = TRUE`, a call to the `getDataset` method with an invalid dataset name will return the list of valid datasets.
Calling `getDataset` returns the selected dataset as it is displayed on ImmuneSpace.
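A sketch of the call behind the table that follows. It reuses the `sdy269` connection if it exists and creates it otherwise (credentials assumed to be configured).

```r
library(ImmuneSpaceR)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

# Download the HAI dataset (Default view) and show its first rows
hai <- sdy269$getDataset("hai")
head(hai)
```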
```
## participant_id age_reported gender race cohort
## 1: SUB112829.269 26 Male White LAIV group 2008
## 2: SUB112829.269 26 Male White LAIV group 2008
## 3: SUB112829.269 26 Male White LAIV group 2008
## 4: SUB112829.269 26 Male White LAIV group 2008
## 5: SUB112829.269 26 Male White LAIV group 2008
## 6: SUB112829.269 26 Male White LAIV group 2008
## study_time_collected study_time_collected_unit virus
## 1: 0 Days A/South Dakota/06/2007
## 2: 0 Days A/Uruguay/716/2007
## 3: 0 Days B/Florida/4/2006
## 4: 28 Days A/South Dakota/06/2007
## 5: 28 Days A/Uruguay/716/2007
## 6: 28 Days B/Florida/4/2006
## value_preferred
## 1: 40
## 2: 40
## 3: 20
## 4: 40
## 5: 40
## 6: 40
```
Because some datasets, such as flow cytometry results, can contain a large number of rows, the method returns `data.table` objects to improve performance. This is especially important with multi-study connections.
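Because the result is a `data.table`, grouped computations can use `data.table` syntax directly. The minimal sketch below works on a small hypothetical table with HAI-like columns (the participants and titers are made up, not ImmuneSpace data) and computes, for each participant and virus, the ratio of the day 28 titer to the day 0 titer.

```r
library(data.table)

# Hypothetical HAI-like table; values are invented for illustration only
hai_toy <- data.table(
  participant_id = rep(c("SUB_A", "SUB_B"), each = 4),
  virus = rep(c("A/H1N1", "B/Florida"), times = 4),
  study_time_collected = rep(c(0, 0, 28, 28), times = 2),
  value_preferred = c(10, 20, 40, 40, 20, 20, 160, 80)
)

# Ratio of day 28 to day 0 titer, computed once per participant and virus
fold_change <- hai_toy[, .(
  fc = value_preferred[study_time_collected == 28] /
       value_preferred[study_time_collected == 0]
), by = .(participant_id, virus)]

fold_change
```

Grouping with `by` keeps the groups in order of first appearance; `keyby` would sort them instead.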
The datasets can be filtered before download. Filters should be created using `Rlabkey`’s `makeFilter` function. Each filter is composed of three parts:

* A column name or column label
* An operator
* A value, or several values separated by a semicolon
```r
library(Rlabkey)
# Get participants under age of 30
young_filter <- makeFilter(c("age_reported", "LESS_THAN", 30))
# Get a specific list of two participants
pid_filter <- makeFilter(c("participantid", "IN", "SUB112841.269;SUB112834.269"))
```
For a list of available operators, see `?Rlabkey::makeFilter`.
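Several filters can also be passed to a single `makeFilter()` call, one vector per filter. To our understanding, LabKey applies them together (a logical AND), so the sketch below, which reuses the column names from the examples above, would keep only the listed participants who are also under 30.

```r
library(Rlabkey)

# One length-3 vector per filter: column, operator, value(s).
# makeFilter() returns one encoded filter per input vector.
young_and_listed <- makeFilter(
  c("age_reported", "LESS_THAN", "30"),
  c("participantid", "IN", "SUB112841.269;SUB112834.269")
)

young_and_listed
```

The combined object is passed to `colFilter` in the same way as a single filter.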
```r
# HAI data for participants of study SDY269 under the age of 30
hai_young <- sdy269$getDataset("hai", colFilter = young_filter)
# List of participants under the age of 30
demo_young <- all$getDataset("demographics", colFilter = young_filter)
# ELISPOT assay results for two participants
elispot_pid2 <- all$getDataset("elispot", colFilter = pid_filter)
```
Note that filtering is done before download. When performance is a concern, it is faster to filter via the `colFilter` argument than to filter the returned table.
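A sketch comparing the two approaches on the HAI data of SDY269. Both should yield the same rows; the difference is how much data travels over the network. It assumes `age_reported` comes back as a numeric column and that credentials are configured.

```r
library(ImmuneSpaceR)
library(Rlabkey)
library(data.table)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

young_filter <- makeFilter(c("age_reported", "LESS_THAN", 30))

# Filtered on the server: only matching rows are transferred
hai_young_server <- sdy269$getDataset("hai", colFilter = young_filter)

# Filtered in R: the whole table is transferred first, then subset
hai_young_local <- sdy269$getDataset("hai")[age_reported < 30]
```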
Any dataset grid on ImmuneSpace offers the possibility to switch between the ‘Default’ and ‘Full’ views. The Default view contains information that is directly relevant to the user: sample descriptions and results are joined with basic demographics. However, this is not how the data is organized in the database. The ‘Full’ view represents the data as it is stored in ImmPort. Its accession columns are used under the hood for join operations; they will be useful to developers and to users writing reports to be displayed in ImmuneSpace studies.
Screen capture of the button bar of a dataset grid on ImmuneSpace.
The `original_view` argument decides which view is downloaded. If set to `TRUE`, the ‘Full’ view is returned.
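A sketch of the call whose column names are listed next, reusing or creating the `sdy269` connection as before:

```r
library(ImmuneSpaceR)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

# Download the 'Full' view, i.e. the data as stored in the database
hai_full <- sdy269$getDataset("hai", original_view = TRUE)
colnames(hai_full)
```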
```
## [1] "participant_id" "arm_accession"
## [3] "biosample_accession" "expsample_accession"
## [5] "experiment_accession" "study_accession"
## [7] "study_time_collected" "study_time_collected_unit"
## [9] "virus" "value_reported"
## [11] "value_preferred" "unit_reported"
## [13] "unit_preferred"
```
For additional information, refer to the ‘Working with tabular data’ video tutorial.
As explained in the introductory guide, the `ImmuneSpaceConnection` class is an R6 class. This means its objects have fields that are accessed by reference and can therefore be modified without copying the entire object. ImmuneSpaceR uses this feature to store downloaded datasets and expression matrices, so subsequent calls to `getDataset` with the same input will be faster.

See `?R6::R6Class` for more information about the R6 class system.
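The idea can be illustrated without a server. The sketch below defines a hypothetical `ToyConnection` R6 class (not part of ImmuneSpaceR, and not its actual implementation) whose `cache` field is filled on the first request. Because R6 objects are references, a second name bound to the same object sees the cached entry, and no second "download" happens.

```r
library(R6)

# Hypothetical class for illustration only; not ImmuneSpaceR's code
ToyConnection <- R6Class("ToyConnection",
  public = list(
    cache = list(),
    downloads = 0L,
    getData = function(name) {
      if (is.null(self$cache[[name]])) {
        # Stand-in for a slow download: count it and store the result
        self$downloads <- self$downloads + 1L
        self$cache[[name]] <- data.frame(dataset = name, value = 1:3)
      }
      self$cache[[name]]
    }
  )
)

con <- ToyConnection$new()
same_con <- con                     # no copy: both names refer to one object
invisible(con$getData("hai"))       # first call "downloads" and caches
invisible(same_con$getData("hai"))  # served from the cache filled via `con`
con$downloads
```

`con$downloads` is 1 after both calls, since the second request is answered from the shared `cache` field.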
We can see the data currently cached using the `cache` field. This field is not intended for data manipulation; it is displayed here only to explain what gets cached.
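A sketch of how the cache can be inspected. It downloads one dataset first so that the cache is not empty, then looks only at the names of the cached objects.

```r
library(ImmuneSpaceR)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

# Make sure at least one dataset has been downloaded (and cached)
hai <- sdy269$getDataset("hai")

# Inspect only the names of the cached objects; never modify them
names(sdy269$cache)
```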
```
## [1] "GE_matrices" "a686d76d73f361f98814f8a71771a3b4"
## [3] "b1fc871b0317b0835fcd962f804ca1e8" "eca5f7a1d74ad1de545fd45b773f8f68"
## [5] "0518ce2a79b128c0fdf56d816576dd9c"
```
Different views are saved separately.
```
## [1] "GE_matrices" "a686d76d73f361f98814f8a71771a3b4"
## [3] "b1fc871b0317b0835fcd962f804ca1e8" "eca5f7a1d74ad1de545fd45b773f8f68"
## [5] "0518ce2a79b128c0fdf56d816576dd9c" "ef3cb0b3940dde6bf773a36e93651cc4"
```
Because the number of possible filters and filter combinations is unlimited, filtered datasets are not cached.
If, for any reason, a specific dataset needs to be downloaded again, the `reload` argument will clear the cache for that specific `getDataset` call and download the table again.
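A minimal sketch of a forced re-download of the HAI dataset, reusing or creating the `sdy269` connection:

```r
library(ImmuneSpaceR)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

# Ignore any cached copy of the HAI dataset and download it again
hai <- sdy269$getDataset("hai", reload = TRUE)
```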
Finally, it is possible to clear every cached dataset (and expression matrix).
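A sketch, assuming the `clearCache()` method of `ImmuneSpaceConnection` objects; the output that follows shows the names left in the cache afterwards.

```r
library(ImmuneSpaceR)

# Reuse the SDY269 connection, or create it if this chunk runs alone
if (!exists("sdy269", inherits = FALSE)) sdy269 <- CreateConnection("SDY269")

# Remove every cached dataset and expression matrix from the connection
sdy269$clearCache()
names(sdy269$cache)
```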
```
## [1] "GE_matrices"
```
Again, the `cache` field should never be modified manually. When in doubt, simply reload the dataset.
```
## R version 4.3.0 RC (2023-04-13 r84266)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rlabkey_2.10.0 jsonlite_1.8.4 httr_1.4.5
## [4] ImmuneSpaceR_1.28.0 rmarkdown_2.21 knitr_1.42
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 viridisLite_0.4.1 farver_2.1.1
## [4] dplyr_1.1.1 viridis_0.6.2 bitops_1.0-7
## [7] fastmap_1.1.1 TSP_1.2-4 lazyeval_0.2.2
## [10] XML_3.99-0.14 digest_0.6.31 lifecycle_1.0.3
## [13] magrittr_2.0.3 compiler_4.3.0 rlang_1.1.0
## [16] sass_0.4.5 tools_4.3.0 utf8_1.2.3
## [19] yaml_2.3.7 data.table_1.14.8 labeling_0.4.2
## [22] htmlwidgets_1.6.2 curl_5.0.0 RColorBrewer_1.1-3
## [25] KernSmooth_2.23-20 registry_0.5-1 ca_0.71.1
## [28] withr_2.5.0 purrr_1.0.1 RProtoBufLib_2.12.0
## [31] BiocGenerics_0.46.0 grid_4.3.0 stats4_4.3.0
## [34] preprocessCore_1.62.1 fansi_1.0.4 caTools_1.18.2
## [37] colorspace_2.1-0 ggplot2_3.4.2 scales_1.2.1
## [40] gtools_3.9.4 iterators_1.0.14 cli_3.6.1
## [43] ncdfFlow_2.46.0 generics_0.1.3 heatmaply_1.4.2
## [46] cachem_1.0.7 flowCore_2.12.0 zlibbioc_1.46.0
## [49] assertthat_0.2.1 matrixStats_0.63.0 vctrs_0.6.1
## [52] webshot_0.5.4 cytolib_2.12.0 seriation_1.4.2
## [55] S4Vectors_0.38.1 Rgraphviz_2.44.0 dendextend_1.17.1
## [58] foreach_1.5.2 plotly_4.10.1 jquerylib_0.1.4
## [61] tidyr_1.3.0 glue_1.6.2 codetools_0.2-19
## [64] gtable_0.3.3 munsell_0.5.0 tibble_3.2.1
## [67] pillar_1.9.0 htmltools_0.5.5 gplots_3.1.3
## [70] graph_1.78.0 R6_2.5.1 evaluate_0.20
## [73] flowWorkspace_4.12.0 Biobase_2.60.0 highr_0.10
## [76] pheatmap_1.0.12 bslib_0.4.2 Rcpp_1.0.10
## [79] gridExtra_2.3 xfun_0.38 pkgconfig_2.0.3
```