This vignette shows detailed examples for the getGEMatrix() method.
As explained in the introductory vignette, datasets must be
downloaded from ImmuneSpaceConnection objects. We must first
instantiate a connection to the study or studies of interest.
Throughout this vignette, we will use two connections: one to a single
study, and one to all available data.
library(ImmuneSpaceR)
sdy269 <- CreateConnection("SDY269")
all <- CreateConnection("")
Now that the connections have been instantiated, we can start downloading from them. But we need to figure out which processed matrices are available within our chosen studies.
On the ImmuneSpace portal, in the study of interest or at the project level, the Gene expression matrices table will show the available runs.
Printing the connections will, among other information, list the
available datasets. The listDatasets method will display only the
downloadable data. With output = "expression", the datasets won't be
printed.
sdy269$listDatasets()
## datasets
## cohort_membership
## demographics
## elisa
## elispot
## fcs_analyzed_result
## fcs_sample_files
## gene_expression_files
## hai
## pcr
## Expression Matrices
## SDY269_PBMC_TIV_Geo
## SDY269_PBMC_LAIV_Geo
Using output = "expression", we can remove the datasets from the output.
all$listDatasets(output = "expression")
## Expression Matrices
## SDY1630_Spleen_AllSubjects
## SDY1630_PBMC_AllSubjects
## SDY1630_LungLymphNode_AllSubjects
## SDY1630_Lung_AllSubjects
## SDY1630_BoneMarrow_AllSubjects
## SDY1529_WholeBlood_HealthyAdults_PreVax_Geo
## SDY1529_WholeBlood_HealthyAdults_PostVax_Geo
## SDY787_Tcell_wPaPboost_Geo
## SDY787_Tcell_aP_Geo
## SDY787_Tcell_aPaPboost_Geo
## SDY1328_WholeBlood_HealthyAldults_Geo
## SDY520_WholeBlood_Young_geo
## SDY520_WholeBlood_Older_Geo
## SDY640_WholeBlood_Young_Geo
## SDY640_WholeBlood_Older_Geo
## SDY1086_PBMC_GroupB_Geo
## SDY1086_PBMC_GroupA_Geo
## SDY1267_PBMC_RRR_Geo
## SDY1267_PBMC_ARR_Geo
## SDY1412_WholeBlood_EPC002_geo
## SDY1256_WholeBlood_EPIC001_geo
## SDY645_WholeBlood_JuvDM
## SDY376_WholeBlood_JDM
## SDY80_PBMC_Cohort2_geo
## SDY299_WholeBlood_HEPISLAV
## SDY180_WholeBlood_Grp2Saline_Geo
## SDY180_WholeBlood_Grp2Pneunomax23_Geo
## SDY180_WholeBlood_Grp2Fluzone_Geo
## SDY180_WholeBlood_Grp1Saline_Geo
## SDY180_WholeBlood_Grp1Pneunomax23_Geo
## SDY180_WholeBlood_Grp1Fluzone_Geo
## SDY1325_WholeBlood_LowIntraMuscularPS_geo
## SDY1294_PBMC_ChineseCohort_Geo
## SDY1119_PBMC_oldHealthy_Geo
## SDY1119_PBMC_oldT2D_Geo
## SDY1119_PBMC_youngT2D_Geo
## SDY1119_PBMC_youngHealthy_Geo
## SDY1289_WholeBlood_MontrealCohort_Geo
## SDY1289_WholeBlood_LausanneCohort_Geo
## SDY1324_PBMC_nonBCGvacc
## SDY1324_PBMC_LatentTB
## SDY1324_PBMC_BCGvacc
## SDY89_WholeBlood_EnergixB
## SDY1370_Bcell_lc16m8_geo
## SDY1370_Bcell_dryvax_geo
## SDY1370_Tcell_lc16m8_geo
## SDY1370_Tcell_dryvax_geo
## SDY1370_PBMC_lc16m8_geo
## SDY1370_PBMC_dryvax_geo
## SDY1368_WholeBlood_Twin_Geo
## SDY1368_WholeBlood_NonTwin_Geo
## SDY67_PBMC_HealthyAdults
## SDY224_PBMC_TIV2010_ImmPort
## SDY888_PBMC_UninfectedEndemicArea_Geo
## SDY888_PBMC_UninfectedNonEndemicArea_Geo
## SDY888_PBMC_InfectedEndemicArea_Geo
## SDY28_PBMC_Dryvax
## SDY34_PBMC_TIV
## SDY34_PBMC_Controls
## SDY305_Other_IDTIV_Geo
## SDY305_Other_TIV_Geo
## SDY112_Other_GroupC
## SDY112_Other_GroupB
## SDY112_Other_GroupA
## SDY315_Other_GroupC_Geo
## SDY315_Other_GroupB_Geo
## SDY315_Other_GroupA_Geo
## SDY406_Other_ILI_Geo
## SDY113_Other_IDTIV_Geo
## SDY113_Other_LAIV_Geo
## SDY113_Other_TIV_Geo
## SDY144_Other_TIV_Geo
## SDY690_PBMC_Energixb
## SDY690_WholeBlood_Energixb
## SDY597_Other_InVitro
## SDY522_Other_LAIV
## SDY387_WholeBlood_NCH2010
## SDY372_WholeBlood_JDM2012
## SDY368_WholeBlood_NCH2013
## SDY364_WholeBlood_NCH2012
## SDY312_Other_GroupC
## SDY312_Other_GroupB
## SDY312_Other_GroupA
## SDY301_Other_AIRFV
## SDY296_WholeBlood_AIRFV
## SDY667_WholeBlood_PSORPPP
## SDY212_WholeBlood_Older_Geo
## SDY212_WholeBlood_Young_Geo
## SDY212_PBMC_Older_Geo
## SDY212_PBMC_Young_geo
## SDY270_PBMC_TIVGroup_Geo
## SDY1373_WholeBlood_highDose_Geo
## SDY1373_WholeBlood_lowDose_Geo
## SDY1364_PBMC_IntraDermal_Geo
## SDY1364_PBMC_IntraMuscular_Geo
## SDY1325_WholeBlood_IntramuscularCRM_Geo
## SDY1325_WholeBlood_IntramuscularPS_Geo
## SDY1325_WholeBlood_SubcutaneousPS_Geo
## SDY1291_PBMC_HealthyHIVUninfected_Geo
## SDY1293_PBMC_Vaccinated_geo
## SDY1293_PBMC_Control_Geo
## SDY1276_WholeBlood_Validation_Geo
## SDY1264_PBMC_Trial2_Geo
## SDY1264_PBMC_Trial1_Geo
## SDY1260_PBMC_MCV4_Geo
## SDY1260_PBMC_MPSV4_Geo
## SDY984_PBMC_Elderly_Geo
## SDY984_PBMC_Young_Geo
## SDY61_PBMC_TIVGrp
## SDY56_PBMC_Older
## SDY56_PBMC_Young
## SDY63_PBMC_Young_Geo
## SDY63_PBMC_Older_Geo
## SDY404_PBMC_Young_Geo
## SDY404_PBMC_Older_Geo
## SDY400_PBMC_Older_Geo
## SDY400_PBMC_Young_Geo
## SDY269_PBMC_TIV_Geo
## SDY269_PBMC_LAIV_Geo
## SDY300_dendriticCell_dcMonoFlu2011
## SDY300_otherCell_dcMonoFlu2011
## SDY162_Macrophage_VLplus
## SDY162_PBMC_VLplus
## SDY162_Macrophage_VLminus
## SDY162_PBMC_VLminus
Naturally, all contains every processed matrix available on
ImmuneSpace, as it combines all available studies.
The getGEMatrix method will accept any of the run names listed in the
connection.
TIV_2008 <- sdy269$getGEMatrix("SDY269_PBMC_TIV_Geo")
## Downloading matrix..
## Constructing ExpressionSet
TIV_2011 <- all$getGEMatrix(matrixName = "SDY144_Other_TIV_Geo")
## Downloading matrix..
## Constructing ExpressionSet
The matrices are returned as ExpressionSet objects, where the
phenoData slot contains basic demographic information and the
featureData slot shows a mapping of probes to official gene symbols.
TIV_2008
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 16146 features, 80 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: BS586128 BS586240 ... BS586267 (80 total)
## varLabels: participant_id study_time_collected ...
## exposure_process_preferred (8 total)
## varMetadata: labelDescription
## featureData
## featureNames: DDR1 RFC2 ... NUS1P3 (16146 total)
## fvarLabels: FeatureId gene_symbol
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
The cohortType argument can be used in place of the run name
(matrixName). It is a concatenation of "cohort" and "cell type", so
that you can analyze matrices that have been normalized within cell
type. As with run names, the list of valid cohortTypes can be found in
the Gene expression matrices table.
LAIV_2008 <- sdy269$getGEMatrix(cohortType = "LAIV group 2008_PBMC")
## Downloading matrix..
## Constructing ExpressionSet
Note that when cohortType is used, matrixName is ignored.
Expression matrices can be returned with probe names as features (or
rows). However, multiple probes often match the same gene, and merging
experiments from different arrays is impossible at the probe level.
When they are available, setting outputType = "summary" returns the
matrices with gene symbols instead of probes. Setting
annotation = "latest" uses the latest official gene symbols mapped for
each probe; the original mappings from when the matrix was created can
also be retrieved through the same argument.
TIV_2008_sum <- sdy269$getGEMatrix("SDY269_PBMC_TIV_Geo", outputType = "summary", annotation = "latest")
## Returning SDY269_PBMC_TIV_Geo_summary_latest_eset from cache
Probes that do not map to a unique gene are removed and expression is averaged by gene.
TIV_2008_sum
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 16146 features, 80 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: BS586128 BS586240 ... BS586267 (80 total)
## varLabels: participant_id study_time_collected ...
## exposure_process_preferred (8 total)
## varMetadata: labelDescription
## featureData
## featureNames: DDR1 RFC2 ... NUS1P3 (16146 total)
## fvarLabels: FeatureId gene_symbol
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
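The summarization step described above (dropping probes without a unique gene, then averaging by gene) can be sketched in base R. This is only an illustration on made-up numbers, not the code ImmuneSpaceR runs; the objects probe_expr and gene_symbol and the probe and gene names are hypothetical.

```r
# Toy probe-level matrix: 4 probes x 3 samples (all values are made up).
probe_expr <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    10, 20, 30,
    7, 8, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2", "s3"))
)

# Hypothetical probe-to-gene mapping; NA marks a probe with no unique gene.
gene_symbol <- c(p1 = "GENE_A", p2 = "GENE_A", p3 = "GENE_B", p4 = NA)

# Drop probes that do not map to a gene.
keep <- !is.na(gene_symbol)
mapped_expr <- probe_expr[keep, , drop = FALSE]
mapped_genes <- gene_symbol[keep]

# rowsum() adds up rows per gene; dividing by the probe count gives the mean.
gene_sums <- rowsum(mapped_expr, group = mapped_genes)
probe_counts <- as.vector(table(mapped_genes)[rownames(gene_sums)])
gene_expr <- gene_sums / probe_counts

gene_expr
```

The result has one row per gene, which is what makes matrices from different arrays comparable.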
To facilitate analysis across experiments and studies, when multiple
runs or cohorts are specified, getGEMatrix will attempt to combine the
selected expression matrices into a single ExpressionSet.
To avoid returning an empty object, it is usually recommended to use the summarized version of the matrices, thus combining by genes. This is almost always necessary when combining data from multiple studies.
# Within a study
em269 <- sdy269$getGEMatrix(c("SDY269_PBMC_TIV_Geo", "SDY269_PBMC_LAIV_Geo"))
## Returning SDY269_PBMC_TIV_Geo_summary_latest_eset from cache
## Returning SDY269_PBMC_LAIV_Geo_summary_latest_eset from cache
## Combining ExpressionSets
# Combining across studies
TIV_seasons <- all$getGEMatrix(c("SDY269_PBMC_TIV_Geo", "SDY144_Other_TIV_Geo"),
                               outputType = "summary",
                               annotation = "latest")
## Downloading matrix..
## Constructing ExpressionSet
## Returning SDY144_Other_TIV_Geo_summary_latest_eset from cache
## Combining ExpressionSets
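To see why combining at the gene level avoids an empty result, here is a minimal base R sketch that joins two toy matrices on their shared features. It assumes the combination keeps only features common to every matrix; this is an assumption made for illustration, not a description of the package's internals, and all names and values are hypothetical.

```r
# Two toy gene-level matrices from hypothetical studies (values are made up).
study_a <- matrix(1:6, nrow = 3,
                  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("a1", "a2")))
study_b <- matrix(7:12, nrow = 3,
                  dimnames = list(c("GENE_B", "GENE_C", "GENE_D"), c("b1", "b2")))

# Keep only the features present in both matrices.
shared <- intersect(rownames(study_a), rownames(study_b))

# Bind the samples (columns) side by side, aligned on the shared features.
combined <- cbind(study_a[shared, , drop = FALSE],
                  study_b[shared, , drop = FALSE])

# With probe identifiers from different arrays, the overlap can be empty.
probe_level <- intersect(c("probe_1", "probe_2"), c("1007_s_at", "1053_at"))
length(probe_level)
```

Gene symbols give the two studies a common vocabulary, whereas array-specific probe identifiers may share nothing at all.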
As explained in the introductory vignette, the ImmuneSpaceConnection
class is an R6 class. This means its objects have fields accessed by
reference. As a consequence, they can be modified without making a
copy of the entire object. ImmuneSpaceR uses this feature to store
downloaded datasets and expression matrices. Subsequent calls to
getGEMatrix with the same input will be faster.
See ?R6::R6Class for more information about the R6 class system.
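Reference semantics can be illustrated without R6 itself, since R6 objects are built on environments. The sketch below uses a plain environment as a stand-in cache; fetch_matrix, download_count and the run name are hypothetical and only mimic the idea of caching, not ImmuneSpaceR's actual implementation.

```r
# A plain environment standing in for a connection's cache (hypothetical).
cache <- new.env()

# Hypothetical stand-in for a slow download; counts how often it really runs.
download_count <- 0
fetch_matrix <- function(name, cache) {
  if (!is.null(cache[[name]])) {
    return(cache[[name]])                         # cached: no download needed
  }
  download_count <<- download_count + 1
  cache[[name]] <- matrix(0, nrow = 2, ncol = 2)  # placeholder data
  cache[[name]]
}

first <- fetch_matrix("run_1", cache)   # "downloads" and stores the matrix
second <- fetch_matrix("run_1", cache)  # served from the cache

# The function modified `cache` in place, without returning or copying it.
ls(cache)
download_count
```

Because the environment is passed by reference, the second call finds the entry stored by the first, and the download step runs only once.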
We can see a list of already downloaded runs and feature sets in the
cache field. This field is not intended to be used for data
manipulation and is only displayed here to explain what gets cached.
names(sdy269$cache)
## [1] "GE_matrices"
## [2] "SDY269_PBMC_TIV_Geo_sum_latest"
## [3] "featureset_18"
## [4] "SDY269_PBMC_TIV_Geo_summary_latest_eset"
## [5] "SDY269_PBMC_LAIV_Geo_sum_latest"
## [6] "SDY269_PBMC_LAIV_Geo_summary_latest_eset"
If, for any reason, a specific matrix needs to be redownloaded, the
reload argument will clear the cache for that specific getGEMatrix
call and download the file and metadata again.
TIV_2008 <- sdy269$getGEMatrix("SDY269_PBMC_TIV_Geo", reload = TRUE)
## Downloading matrix..
## Constructing ExpressionSet
Finally, it is possible to clear every cached expression matrix (and dataset).
sdy269$clearCache()
Again, the cache field should never be modified manually. When in
doubt, simply reload the expression matrix.
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rlabkey_2.9.0 jsonlite_1.8.4 httr_1.4.4
## [4] ImmuneSpaceR_1.26.1 rmarkdown_2.19 knitr_1.41
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.58.0 viridis_0.6.2 sass_0.4.4
## [4] tidyr_1.2.1 viridisLite_0.4.1 foreach_1.5.2
## [7] gtools_3.9.4 bslib_0.4.1 assertthat_0.2.1
## [10] highr_0.9 stats4_4.2.2 flowWorkspace_4.10.0
## [13] yaml_2.3.6 pillar_1.8.1 glue_1.6.2
## [16] digest_0.6.31 RColorBrewer_1.1-3 colorspace_2.0-3
## [19] preprocessCore_1.60.1 htmltools_0.5.4 XML_3.99-0.13
## [22] pkgconfig_2.0.3 pheatmap_1.0.12 zlibbioc_1.44.0
## [25] purrr_0.3.5 flowCore_2.10.0 scales_1.2.1
## [28] webshot_0.5.4 tibble_3.1.8 farver_2.1.1
## [31] generics_0.1.3 ggplot2_3.4.0 withr_2.5.0
## [34] cachem_1.0.6 BiocGenerics_0.44.0 lazyeval_0.2.2
## [37] cli_3.4.1 magrittr_2.0.3 heatmaply_1.4.0
## [40] evaluate_0.19 fansi_1.0.3 gplots_3.1.3
## [43] graph_1.76.0 registry_0.5-1 tools_4.2.2
## [46] data.table_1.14.6 ncdfFlow_2.44.0 lifecycle_1.0.3
## [49] matrixStats_0.63.0 stringr_1.5.0 plotly_4.10.1
## [52] S4Vectors_0.36.1 munsell_0.5.0 compiler_4.2.2
## [55] jquerylib_0.1.4 ca_0.71.1 caTools_1.18.2
## [58] rlang_1.0.6 grid_4.2.2 iterators_1.0.14
## [61] htmlwidgets_1.6.0 labeling_0.4.2 bitops_1.0-7
## [64] codetools_0.2-18 cytolib_2.10.0 gtable_0.3.1
## [67] DBI_1.1.3 curl_4.3.3 TSP_1.2-1
## [70] R6_2.5.1 RProtoBufLib_2.10.0 seriation_1.4.0
## [73] gridExtra_2.3 dplyr_1.0.10 fastmap_1.1.0
## [76] utf8_1.2.2 KernSmooth_2.23-20 dendextend_1.16.0
## [79] Rgraphviz_2.42.0 stringi_1.7.8 Rcpp_1.0.9
## [82] vctrs_0.5.1 tidyselect_1.2.0 xfun_0.35