1 Loading the data

To retrieve a dataset, we can use the dataset’s corresponding named function <id>(), where <id> is a valid dataset identifier (see ?VectraPolarisData). Below, both the lung and ovarian cancer datasets are loaded this way.

library(VectraPolarisData)
spe_lung <- HumanLungCancerV3()
spe_ovarian <- HumanOvarianCancerVP()

Alternatively, data can be loaded directly from Bioconductor’s ExperimentHub as follows. First, we initialize a hub instance and store the complete list of records in a variable eh. Using query(), we then identify the records made available by the VectraPolarisData package, as well as their accession IDs (EH7311 for the lung cancer data). Finally, we load the data into R via eh[[id]], where id is the accession ID of the record we’d like to load. For example:

library(ExperimentHub)
eh <- ExperimentHub()        # initialize hub instance
q <- query(eh, "VectraPolarisData") # retrieve 'VectraPolarisData' records
id <- q$ah_id[1]             # specify dataset ID to load
spe <- eh[[id]]              # load specified dataset

2 Data Representation

Both the HumanLungCancerV3() and HumanOvarianCancerVP() datasets are stored as SpatialExperiment objects. This allows users to work with the data using Bioconductor methods built for the SingleCellExperiment, SummarizedExperiment, and SpatialExperiment classes. See this ebook for more details on SpatialExperiment. To obtain cell-level tabular data that can be stored in this format, raw multiplex .tiff images were preprocessed, segmented, and cell phenotyped using Inform software from Akoya Biosciences.

The SpatialExperiment class was originally built for spatial transcriptomics data and follows the structure depicted in the schematic below (Righelli et al. 2021):

To adapt this class structure for multiplex imaging data we use slots in the following way:

  • assays slot: intensities, nucleus_intensities, membrane_intensities
  • sample_id slot: contains the image identifier. For HumanOvarianCancerVP this also identifies the subject, because there is one image per subject
  • colData slot: other cell-level characteristics, including summaries of the marker intensities, cell phenotypes, and cell shape characteristics
  • spatialCoordsNames slot: the x- and y-coordinates describing the location of the center point of each cell in the image
  • metadata slot: a data frame of subject-level clinical characteristics
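
To make this mapping concrete, the following sketch builds a tiny SpatialExperiment object by hand and reads each slot back with the standard accessors. It is only an illustration under stated assumptions: every value is invented, the object has two markers and four cells, and the names toy_spe and toy_matrix() are hypothetical. It is not how the package objects were constructed.

```r
library(SpatialExperiment)

# Toy inputs: 2 markers x 4 cells; every value here is invented
markers <- c("cd3", "cd8")
cells <- paste0("cell", 1:4)
toy_matrix <- function(offset) {
  matrix(seq_len(8) + offset, nrow = 2, dimnames = list(markers, cells))
}

toy_spe <- SpatialExperiment(
  # assays: one matrix per intensity type (markers in rows, cells in columns)
  assays = list(
    intensities = toy_matrix(0),
    nucleus_intensities = toy_matrix(10),
    membrane_intensities = toy_matrix(20)
  ),
  # colData: one row per cell, including the image identifier
  colData = DataFrame(
    sample_id = c("img1", "img1", "img2", "img2"),
    cell_id = 1:4,
    cell_x_position = c(10.5, 20.0, 5.2, 8.8),
    cell_y_position = c(3.1, 7.4, 9.9, 1.0),
    row.names = cells
  ),
  # these colData columns supply the spatial coordinates
  spatialCoordsNames = c("cell_x_position", "cell_y_position"),
  # metadata: subject-level table, one row per image/subject
  metadata = list(clinical_data = data.frame(
    sample_id = c("img1", "img2"),
    age_at_diagnosis = c(60, 71)
  ))
)

assayNames(toy_spe)              # names of the intensity assays
unique(toy_spe$sample_id)        # image identifiers
spatialCoords(toy_spe)           # matrix of cell x/y positions
metadata(toy_spe)$clinical_data  # subject-level data frame
```

The same accessors (assayNames(), colData(), spatialCoords(), metadata()) can be applied to spe_lung and spe_ovarian.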

3 Transforming to other data formats

The following code shows how to transform the SpatialExperiment class object to a data.frame class object, if that is preferred for analysis. The example below uses the HumanOvarianCancerVP dataset.

library(dplyr)

## Assay slots: marker intensities (markers in rows, cells in columns)
assays_slot <- assays(spe_ovarian)
intensities_df <- assays_slot$intensities
rownames(intensities_df) <- paste0("total_", rownames(intensities_df))
nucleus_intensities_df <- assays_slot$nucleus_intensities
rownames(nucleus_intensities_df) <- paste0("nucleus_", rownames(nucleus_intensities_df))
membrane_intensities_df <- assays_slot$membrane_intensities
rownames(membrane_intensities_df) <- paste0("membrane_", rownames(membrane_intensities_df))

## colData and spatialCoords: cell-level characteristics and cell positions
colData_df <- colData(spe_ovarian)
spatialCoords_df <- spatialCoords(spe_ovarian)

## Subject-level clinical data
patient_level_df <- metadata(spe_ovarian)$clinical_data

## Combine cell-level data; assays are transposed so that cells are in rows
cell_level_df <- as.data.frame(cbind(colData_df,
                                     spatialCoords_df,
                                     t(intensities_df),
                                     t(nucleus_intensities_df),
                                     t(membrane_intensities_df)))

## Attach the clinical data to every cell by image/subject identifier
ovarian_df <- full_join(patient_level_df, cell_level_df, by = "sample_id")
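
Because a full join keeps unmatched rows from both tables, it can be worth checking the joined result before analysis. The sketch below illustrates this with two small made-up tables rather than the real data. It uses base R's merge(all = TRUE) as a stand-in for full_join(), and the names toy_patients, toy_cells, and toy_joined are hypothetical.

```r
# Toy stand-ins for patient_level_df and cell_level_df (values are invented)
toy_patients <- data.frame(
  sample_id = c("img1", "img2", "img3"),
  age_at_diagnosis = c(60, 71, 55)
)
toy_cells <- data.frame(
  sample_id = c("img1", "img1", "img2"),
  cell_id = 1:3,
  total_cd3 = c(0.2, 1.4, 0.7)
)

# merge(all = TRUE) is base R's full outer join, analogous to dplyr::full_join()
toy_joined <- merge(toy_patients, toy_cells, by = "sample_id", all = TRUE)

# A patient with no cells still appears once, with NA in the cell-level columns
unmatched <- toy_joined$sample_id[is.na(toy_joined$cell_id)]

# Number of cells per image
cells_per_image <- table(toy_cells$sample_id)
```

Applied to the real objects, the analogous check would be to look for rows of ovarian_df with a missing cell_id, which would indicate a subject in the clinical data without any cells.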

4 Citation Info

The objects provided in this package are rich data sources we encourage others to use in their own analyses. If you do include them in your peer-reviewed work, we ask that you cite our package and the original studies.

To cite the VectraPolarisData package, use:

@Manual{VectraPolarisData,
    title = {VectraPolarisData: Vectra Polaris and Vectra 3 multiplex single-cell imaging data},
    author = {Wrobel, J and Ghosh, T},
    year = {2022},
    note = {Bioconductor R package version 1.0},
  }

To cite the HumanLungCancerV3() data in bibtex format, use:

@article{johnson2021cancer,
  title={Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment.},
  author={Johnson, AM and Boland, JM and Wrobel, J and Kleczko, EK and Weiser-Evans, M and Hopp, K and Heasley, L and Clambey, ET and Jordan, K and Nemenoff, RA and others},
  journal={Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer},
  year={2021}
}

To cite the HumanOvarianCancerVP() data, use:

@article{steinhart2021spatial,
  title={The spatial context of tumor-infiltrating immune cells associates with improved ovarian cancer survival},
  author={Steinhart, Benjamin and Jordan, Kimberly R and Bapat, Jaidev and Post, Miriam D and Brubaker, Lindsay W and Bitler, Benjamin G and Wrobel, Julia},
  journal={Molecular Cancer Research},
  volume={19},
  number={12},
  pages={1973--1979},
  year={2021},
  publisher={AACR}
}

5 Data Dictionaries

Detailed tables describing the contents of each dataset are provided below.

5.1 HumanLungCancerV3

In the table below note the following shorthand:

  • [marker] represents one of: cd3, cd8, cd14, cd19, cd68, ck, dapi, hladr
  • [cell region] represents one of: entire_cell, membrane, nucleus

Table 1: data dictionary for HumanLungCancerV3

| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | sample_id | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier, also the patient id for the lung data | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| phenotype_[marker] | colData | cell phenotype label as determined by Inform software | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| gender | metadata | gender | “M”, “F” |
| mhcII_status | metadata | MHCII status, from Johnson et al. 2021 | “low”, “high” |
| age_at_diagnosis | metadata | age at diagnosis | |
| stage_at_diagnosis | metadata | stage of the cancer when image was collected | |
| stage_numeric | metadata | numeric version of stage variable | |
| pack_years | metadata | pack-years of cigarette smoking | |
| survival_days | metadata | time in days from date of diagnosis to date of death or censoring event | |
| survival_status | metadata | did the participant pass away? | 0 = no, 1 = yes |
| cause_of_death | metadata | cause of death | |
| recurrence_or_lung_ca_death | metadata | did the participant have a recurrence or death event? | 0 = no, 1 = yes |
| time_to_recurrence_days | metadata | time in days from date of diagnosis to first recurrent event | |
| adjuvant_therapy | metadata | whether or not the participant received adjuvant therapy | “No”, “Yes” |
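
The [cell region] and [marker] shorthand in Table 1 can be expanded into candidate column names programmatically. The following is a minimal sketch that assumes the templates are filled in exactly as written, joined by underscores. The real column names may differ, and not every combination necessarily exists (for example, a phenotype label for every marker), so the result is best treated as a list of candidates to compare against the data.

```r
# Shorthand values copied from the lists in this section
markers <- c("cd3", "cd8", "cd14", "cd19", "cd68", "ck", "dapi", "hladr")
cell_regions <- c("entire_cell", "membrane", "nucleus")
stats <- c("min", "max", "std_dev", "total")

# Every combination of [cell region] x [marker] x summary statistic
grid <- expand.grid(region = cell_regions, marker = markers, stat = stats,
                    stringsAsFactors = FALSE)
intensity_cols <- paste(grid$region, grid$marker, grid$stat, sep = "_")

# Shape variables depend on the cell region only
shape_vars <- c("area_square_microns", "compactness",
                "minor_axis", "major_axis", "axis_ratio")
shape_cols <- as.vector(outer(cell_regions, shape_vars, paste, sep = "_"))

# Phenotype labels depend on the marker only
phenotype_cols <- paste0("phenotype_", markers)
```

With the lung data loaded, intersect(intensity_cols, colnames(colData(spe_lung))) would show which of these candidate names are actually present.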

5.2 HumanOvarianCancerVP

In the table below note the following shorthand:

  • [marker] represents one of: cd3, cd8, cd19, cd68, ck, dapi, ier3, ki67, pstat3
  • [cell region] represents one of: cytoplasm, membrane, nucleus

Table 2: data dictionary for HumanOvarianCancerVP

| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | sample_id | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| diagnosis | metadata | | |
| primary | metadata | primary tumor from initial diagnosis? | 0 = no, 1 = yes |
| recurrent | metadata | tumor from a recurrent event (not initial diagnosis tumor)? | 0 = no, 1 = yes |
| treatment_effect | metadata | was tumor treated with chemo prior to imaging? | 0 = no, 1 = yes |
| stage | metadata | stage of the cancer when image was collected | I, II, III, IV |
| grade | metadata | grade of cancer severity (nearly all 3) | |
| survival_time | metadata | time in months from date of diagnosis to date of death or censoring event | |
| death | metadata | did the participant pass away? | 0 = no, 1 = yes |
| BRCA_mutation | metadata | does the participant have a BRCA mutation? | 0 = no, 1 = yes |
| age_at_diagnosis | metadata | age at diagnosis | |
| time_to_recurrence | metadata | time in months from date of diagnosis to first recurrent event | |
| parpi_inhibitor | metadata | whether or not the participant received a PARP inhibitor (PARPi) | N = no, Y = yes |
| debulking | metadata | subjective rating of how the tumor removal process went | optimal, suboptimal, interval |

Note: the debulking variable is coded as optimal if the surgeon believes the tumor area was reduced to 1 cm or below; suboptimal if the surgeon was unable to remove a significant amount of tumor for various reasons; and interval if tumor removal came after three cycles of chemo.

6 Session Info

sessionInfo()
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                 VectraPolarisData_1.8.0    
#>  [3] SpatialExperiment_1.14.0    SingleCellExperiment_1.26.0
#>  [5] SummarizedExperiment_1.34.0 Biobase_2.64.0             
#>  [7] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0        
#>  [9] IRanges_2.38.0              S4Vectors_0.42.0           
#> [11] MatrixGenerics_1.16.0       matrixStats_1.3.0          
#> [13] ExperimentHub_2.12.0        AnnotationHub_3.12.0       
#> [15] BiocFileCache_2.12.0        dbplyr_2.5.0               
#> [17] BiocGenerics_0.50.0         BiocStyle_2.32.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.44.0         rjson_0.2.21            xfun_0.43              
#>  [4] bslib_0.7.0             lattice_0.22-6          vctrs_0.6.5            
#>  [7] tools_4.4.0             generics_0.1.3          curl_5.2.1             
#> [10] tibble_3.2.1            fansi_1.0.6             AnnotationDbi_1.66.0   
#> [13] RSQLite_2.3.6           blob_1.2.4              pkgconfig_2.0.3        
#> [16] Matrix_1.7-0            lifecycle_1.0.4         GenomeInfoDbData_1.2.12
#> [19] compiler_4.4.0          Biostrings_2.72.0       htmltools_0.5.8.1      
#> [22] sass_0.4.9              yaml_2.3.8              pillar_1.9.0           
#> [25] crayon_1.5.2            jquerylib_0.1.4         DelayedArray_0.30.0    
#> [28] cachem_1.0.8            magick_2.8.3            abind_1.4-5            
#> [31] mime_0.12               tidyselect_1.2.1        digest_0.6.35          
#> [34] purrr_1.0.2             bookdown_0.39           BiocVersion_3.19.1     
#> [37] grid_4.4.0              fastmap_1.1.1           SparseArray_1.4.0      
#> [40] cli_3.6.2               magrittr_2.0.3          S4Arrays_1.4.0         
#> [43] utf8_1.2.4              withr_3.0.0             filelock_1.0.3         
#> [46] UCSC.utils_1.0.0        rappdirs_0.3.3          bit64_4.0.5            
#> [49] rmarkdown_2.26          XVector_0.44.0          httr_1.4.7             
#> [52] bit_4.0.5               png_0.1-8               memoise_2.0.1          
#> [55] evaluate_0.23           knitr_1.46              rlang_1.1.3            
#> [58] Rcpp_1.0.12             glue_1.7.0              DBI_1.2.2              
#> [61] BiocManager_1.30.22     jsonlite_1.8.8          R6_2.5.1               
#> [64] zlibbioc_1.50.0