This vignette introduces the SpaceTrooper package for
preprocessing and quality control of imaging-based spatial omics data
from platforms such as CosMx, focusing here on a Protein assay.
To install SpaceTrooper, use the following commands:
In this section, we load data from various platforms using the
package’s functions. The goal is to provide a uniform
SpatialExperiment object across all technologies, allowing
for consistent QC analysis.
The functions in SpaceTrooper compute missing metrics as
needed and allow for the inclusion of polygons with the
keep_polygons argument. This stores polygons in the
colData of the SpatialExperiment.
# Load the SpaceTrooper library
library(SpaceTrooper)
# Load CosMx Protein data into a SpatialExperiment (SPE) object
protfolder <- system.file("extdata", "S01_prot", package="SpaceTrooper")
(spe <- readCosmxProteinSPE(protfolder))
## class: SpatialExperiment
## dim: 69 745
## metadata(4): fov_positions fov_dim polygons technology
## assays(1): counts
## rownames(69): 4-1BB B7-H3 ... Ms IgG1 Rb IgG
## rowData names(0):
## colnames(745): f200_c10 f200_c100 ... f201_c96 f201_c98
## colData names(58): fov cellID ... cell sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(1): sample_id
## DataFrame with 745 rows and 58 columns
## fov cellID Area AspectRatio Width Height
## <integer> <integer> <integer> <numeric> <integer> <integer>
## f200_c10 200 10 1596 0.55 58 32
## f200_c100 200 100 5855 0.92 101 93
## f200_c102 200 102 9468 0.89 106 119
## f200_c104 200 104 4888 0.84 90 76
## f200_c106 200 106 5757 0.74 106 78
## ... ... ... ... ... ... ...
## f201_c90 201 90 4425 0.93 83 77
## f201_c92 201 92 9563 0.83 108 130
## f201_c94 201 94 5388 0.91 92 84
## f201_c96 201 96 5587 0.79 76 96
## f201_c98 201 98 76154 0.81 322 396
## Mean.PanCK Max.PanCK Mean.CD68 Max.CD68 Mean.Membrane Max.Membrane
## <integer> <integer> <integer> <integer> <integer> <integer>
## f200_c10 896 1184 350 604 1475 2776
## f200_c100 809 1120 320 636 1918 4380
## f200_c102 556 1948 246 5816 375 3108
## f200_c104 694 7348 272 472 1264 3212
## f200_c106 612 1216 338 1548 527 2248
## ... ... ... ... ... ... ...
## f201_c90 1226 1516 499 696 653 1216
## f201_c92 570 1172 223 404 261 880
## f201_c94 484 1088 208 400 506 1516
## f201_c96 903 1564 395 4584 688 3412
## f201_c98 1047 2608 423 4552 801 3540
## Mean.CD45 Max.CD45 Mean.DAPI Max.DAPI SplitRatioToLocal NucArea
## <integer> <integer> <integer> <integer> <numeric> <integer>
## f200_c10 14940 31860 4901 7524 0.19 1292
## f200_c100 15531 33192 5157 8604 0.00 4696
## f200_c102 284 4976 1710 5544 0.00 3704
## f200_c104 12480 35720 3399 5944 0.00 3032
## f200_c106 268 668 202 320 0.00 0
## ... ... ... ... ... ... ...
## f201_c90 389 808 321 380 0 0
## f201_c92 211 532 172 296 0 0
## f201_c94 241 476 148 228 0 0
## f201_c96 357 5088 964 3736 0 1936
## f201_c98 373 4060 309 2580 0 1508
## NucAspectRatio Circularity Eccentricity Perimeter Solidity
## <numeric> <numeric> <numeric> <integer> <numeric>
## f200_c10 0.67 3.87 0.66 72 22.17
## f200_c100 0.98 0.93 0.56 282 20.76
## f200_c102 0.76 1.02 0.67 342 27.68
## f200_c104 0.97 0.92 0.80 259 18.87
## f200_c106 0.00 0.92 0.63 281 20.49
## ... ... ... ... ... ...
## f201_c90 0.00 0.95 0.69 242 18.29
## f201_c92 0.00 0.92 0.70 361 26.49
## f201_c94 0.00 0.96 0.72 265 20.33
## f201_c96 0.69 1.00 0.68 265 21.08
## f201_c98 0.80 0.74 0.72 1135 67.10
## cell_id X version dualfiles Run_name
## <character> <integer> <character> <character> <character>
## f200_c10 f200_c10 1 v6 ? Run0
## f200_c100 f200_c100 1 v6 ? Run0
## f200_c102 f200_c102 1 v6 ? Run0
## f200_c104 f200_c104 1 v6 ? Run0
## f200_c106 f200_c106 1 v6 ? Run0
## ... ... ... ... ... ...
## f201_c90 f201_c90 1 v6 ? Run0
## f201_c92 f201_c92 1 v6 ? Run0
## f201_c94 f201_c94 1 v6 ? Run0
## f201_c96 f201_c96 1 v6 ? Run0
## f201_c98 f201_c98 1 v6 ? Run0
## Run_Tissue_name ISH.concentration Dash tissue Panel
## <character> <character> <character> <character> <character>
## f200_c10 S0 1nM PILOT tissue WTx
## f200_c100 S0 1nM PILOT tissue WTx
## f200_c102 S0 1nM PILOT tissue WTx
## f200_c104 S0 1nM PILOT tissue WTx
## f200_c106 S0 1nM PILOT tissue WTx
## ... ... ... ... ... ...
## f201_c90 S0 1nM PILOT tissue WTx
## f201_c92 S0 1nM PILOT tissue WTx
## f201_c94 S0 1nM PILOT tissue WTx
## f201_c96 S0 1nM PILOT tissue WTx
## f201_c98 S0 1nM PILOT tissue WTx
## assay_type slide_ID median_RNA RNA_quantile_0.75 RNA_quantile_0.8
## <character> <integer> <numeric> <numeric> <numeric>
## f200_c10 protein 1 24859.4 141202 249217
## f200_c100 protein 1 24859.4 141202 249217
## f200_c102 protein 1 24859.4 141202 249217
## f200_c104 protein 1 24859.4 141202 249217
## f200_c106 protein 1 24859.4 141202 249217
## ... ... ... ... ... ...
## f201_c90 protein 1 11016.9 45561.7 56296.9
## f201_c92 protein 1 11016.9 45561.7 56296.9
## f201_c94 protein 1 11016.9 45561.7 56296.9
## f201_c96 protein 1 11016.9 45561.7 56296.9
## f201_c98 protein 1 11016.9 45561.7 56296.9
## RNA_quantile_0.85 RNA_quantile_0.9 RNA_quantile_0.95
## <numeric> <numeric> <numeric>
## f200_c10 439292 582926 1175591
## f200_c100 439292 582926 1175591
## f200_c102 439292 582926 1175591
## f200_c104 439292 582926 1175591
## f200_c106 439292 582926 1175591
## ... ... ... ...
## f201_c90 140621 208723 361311
## f201_c92 140621 208723 361311
## f201_c94 140621 208723 361311
## f201_c96 140621 208723 361311
## f201_c98 140621 208723 361311
## RNA_quantile_0.99 nCount_RNA nFeature_RNA median_negprobes
## <numeric> <numeric> <integer> <numeric>
## f200_c10 2758078 36035.65 67 6432.11
## f200_c100 2758078 39097.62 67 6432.11
## f200_c102 2758078 9059.81 67 6432.11
## f200_c104 2758078 30541.70 67 6432.11
## f200_c106 2758078 8988.49 67 6432.11
## ... ... ... ... ...
## f201_c90 1033930 6648.66 67 4190.43
## f201_c92 1033930 6344.07 67 4190.43
## f201_c94 1033930 7575.43 67 4190.43
## f201_c96 1033930 7713.20 67 4190.43
## f201_c98 1033930 6502.25 67 4190.43
## negprobes_quantile_0.75 negprobes_quantile_0.8
## <numeric> <numeric>
## f200_c10 7810.51 8086.19
## f200_c100 7810.51 8086.19
## f200_c102 7810.51 8086.19
## f200_c104 7810.51 8086.19
## f200_c106 7810.51 8086.19
## ... ... ...
## f201_c90 5042.51 5212.92
## f201_c92 5042.51 5212.92
## f201_c94 5042.51 5212.92
## f201_c96 5042.51 5212.92
## f201_c98 5042.51 5212.92
## negprobes_quantile_0.85 negprobes_quantile_0.9
## <numeric> <numeric>
## f200_c10 8361.87 8637.55
## f200_c100 8361.87 8637.55
## f200_c102 8361.87 8637.55
## f200_c104 8361.87 8637.55
## f200_c106 8361.87 8637.55
## ... ... ...
## f201_c90 5383.34 5553.76
## f201_c92 5383.34 5553.76
## f201_c94 5383.34 5553.76
## f201_c96 5383.34 5553.76
## f201_c98 5383.34 5553.76
## negprobes_quantile_0.95 negprobes_quantile_0.99 nCount_negprobes
## <numeric> <numeric> <numeric>
## f200_c10 8913.23 9133.77 15.20
## f200_c100 8913.23 9133.77 14.72
## f200_c102 8913.23 9133.77 15.70
## f200_c104 8913.23 9133.77 12.46
## f200_c106 8913.23 9133.77 20.58
## ... ... ... ...
## f201_c90 5724.17 5860.51 23.49
## f201_c92 5724.17 5860.51 14.87
## f201_c94 5724.17 5860.51 13.29
## f201_c96 5724.17 5860.51 21.39
## f201_c98 5724.17 5860.51 21.85
## nFeature_negprobes Area.um2 CenterX_local_px CenterY_local_px
## <integer> <numeric> <integer> <integer>
## f200_c10 2 22.9824 1365 16
## f200_c100 2 84.3120 762 204
## f200_c102 2 136.3392 2837 205
## f200_c104 2 70.3872 1443 202
## f200_c106 2 82.9008 3947 191
## ... ... ... ... ...
## f201_c90 2 63.7200 3576 787
## f201_c92 2 137.7072 128 837
## f201_c94 2 77.5872 2364 844
## f201_c96 2 80.4528 2772 870
## f201_c98 2 1096.6176 1923 1048
## cell sample_id
## <character> <character>
## f200_c10 c_1_200_10 sample01
## f200_c100 c_1_200_100 sample01
## f200_c102 c_1_200_102 sample01
## f200_c104 c_1_200_104 sample01
## f200_c106 c_1_200_106 sample01
## ... ... ...
## f201_c90 c_1_201_90 sample01
## f201_c92 c_1_201_92 sample01
## f201_c94 c_1_201_94 sample01
## f201_c96 c_1_201_96 sample01
## f201_c98 c_1_201_98 sample01
The package offers several functions for spatial data analysis, including quality control and visualization.
This tutorial focuses on CosMx protein data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx.
Additionally, although this has not been tested, the same approach
should extend to Akoya CODEX data, as long as a SpatialExperiment
object is created. Polygons can be loaded later if needed.
The plotCellsFovs function shows a map of the FoVs
within an experiment. This plot is specific to CosMx data and uses cell
centroids.
Please keep in mind that in this specific experiment the
fov_positions and the cell centroid positions were not aligned. An
alignment approach can be found at the end of the
scripts/datacreation.R file.
Because the dataset is a small subset of the original experiment, we can see the identifier of each FoV in black and the centroids of the cells in purple.
When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their identifiers.
The spatialPerCellQC function, inspired by
scater::addPerCellQC, computes additional metrics for each
cell in the SpatialExperiment. It also allows for the
detection of negative control probes, which is crucial for QC.
By default, it removes cells with zero counts; this behaviour can be
controlled with the rmZeros argument.
Here, for transparency, we specify the negProbList for
CosMx protein assays, although the function already includes a set of
the most commonly used negative probes for multiple technologies. Note
that, although the same approach can be applied to CODEX data, no list
of negative probes is provided for this technology, so the user needs
to specify them.
# Perform per-cell quality control checks
spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG"))
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea"
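To make some of the derived columns more concrete, the sketch below recomputes a few of them in base R on a toy count matrix. The formulas are assumptions inferred from the column names (`ctrl_total_ratio` as negative probe counts divided by total counts, `CountArea` as total counts divided by `Area_um`); the toy values and the `counts`, `area_um` and `qc` objects are purely illustrative, and this is not the package's implementation.

```r
# Toy counts: 4 features (2 targets, 2 negative controls) x 3 cells.
# matrix() fills column by column, so each row of numbers below is one cell.
counts <- matrix(
    c(120,  80, 2, 1,
      300, 150, 4, 6,
       10,   5, 3, 2),
    nrow = 4,
    dimnames = list(c("CD45", "PanCK", "Ms IgG1", "Rb IgG"),
                    c("c1", "c2", "c3"))
)
area_um <- c(c1 = 50, c2 = 100, c3 = 25)   # hypothetical cell areas
neg_probes <- c("Ms IgG1", "Rb IgG")       # negative control probes

# Per-cell totals, overall and for the negative probes only
total <- colSums(counts)
control_sum <- colSums(counts[neg_probes, , drop = FALSE])

# Assumed definitions of the derived metrics
qc <- data.frame(
    total = total,
    control_sum = control_sum,
    target_sum = total - control_sum,
    ctrl_total_ratio = control_sum / total,
    CountArea = total / area_um,
    row.names = colnames(counts)
)
qc$log2CountArea <- log2(qc$CountArea)
print(qc)
```

Working on the log2 scale, as the package columns `log2CountArea` and `log2Ctrl_total_ratio` do, makes ratios easier to compare across cells of very different size.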
You can investigate individual metrics by viewing their histograms.
To highlight outliers, use the useFences argument to display the
fences computed by computeSpatialOutlier (see next
chunk).
# Plot a histogram of the counts relative to the cell area in microns
plotMetricHist(spe, metric="log2CountArea")
# Plot a histogram of the proportion of negative probe counts relative to the
# total counts in each cell
plotMetricHist(spe, metric="log2Ctrl_total_ratio")
These plots show, respectively, the distributions of the total counts
(sum), of the area in microns (Area_um), of the
relationship between the counts and the area of each cell
(log2CountArea), and of the proportion of negative
probe counts over the total counts of each cell
(log2Ctrl_total_ratio).
Spatial outlier detection is another critical step in QC. While the flag score addresses some metrics, other outlier detection methods may be needed.
The computeSpatialOutlier function computes the medcouple
statistic on a specified metric (computeBy argument). The
medcouple is specifically designed for asymmetric distributions, and
the function emits a warning message when this requirement is not
satisfied. It can also use scuttle::isOutlier for symmetric
distributions. The method argument supports mc,
scuttle, or both.
This outlier detection approach can be used to decide whether, and which, cells should be discarded based on a single metric.
# Identify spatial outliers based on cell area (Area_um)
spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both")
# Identify spatial outliers based on mean DAPI intensity
spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both")
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc"
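For the symmetric case, the idea behind MAD-based fences can be sketched in base R. This follows the general "median plus or minus a number of MADs" rule in the spirit of scuttle::isOutlier; the `mad_fences` helper, the choice of `nmads = 3` and the toy areas are assumptions made for illustration, and the medcouple-adjusted fences are not reproduced here.

```r
# Fences at median +/- nmads * MAD (hypothetical helper, symmetric rule)
mad_fences <- function(x, nmads = 3) {
    center <- median(x, na.rm = TRUE)
    # mad() returns the scaled MAD (constant 1.4826 by default)
    spread <- mad(x, center = center, na.rm = TRUE)
    c(lower = center - nmads * spread, upper = center + nmads * spread)
}

# Toy cell areas in microns, with two unusually large cells
area_um <- c(60, 64, 70, 72, 77, 80, 83, 84, 136, 1097)

fences <- mad_fences(area_um)
is_outlier <- area_um < fences[["lower"]] | area_um > fences[["upper"]]

fences
area_um[is_outlier]
```

Because the median and the MAD are robust to extreme values, a single very large cell does not widen the fences the way it would with a mean and standard deviation.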
If we computed outliers with the computeSpatialOutlier
function, we can also visualize which fences have been used to create
the filter on the cells.
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc")
# Plot a histogram with fences to identify outliers using the medcouple
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc")
# Plot a histogram with fences to identify outliers using scuttle
plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc")
We visualize the fences computed with the medcouple and scuttle outlier detection approaches to directly inspect their differences and the number of outliers each method detected.
If we want, we can already use these fences to remove the computed outliers.
Next, we use computeQCScore to calculate a flag score
based on previously computed metrics. The quality score combines
transcript counts relative to the cell
area, the aspect ratio of each cell, and its
distance from the FoV border (this last term is used only for
CosMx data).
See the details section of help(computeQCScore) for
more information.
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "QC_score"
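One ingredient of the score, the distance of a cell from the FoV border, can be illustrated with a short base R sketch. It assumes that distances are measured from the cell centroid in local pixel coordinates and that each FoV is a rectangle of known size; the `dist_to_border` helper, the FoV dimensions and the toy coordinates are hypothetical, and the package may compute `dist_border_x`, `dist_border_y` and `dist_border` differently.

```r
# Distance of each centroid from the nearest FoV edge (hypothetical helper)
dist_to_border <- function(x, y, fov_width, fov_height) {
    dist_x <- pmin(x, fov_width - x)     # nearest left/right edge
    dist_y <- pmin(y, fov_height - y)    # nearest top/bottom edge
    data.frame(dist_border_x = dist_x,
               dist_border_y = dist_y,
               dist_border = pmin(dist_x, dist_y))
}

# Toy centroids in a hypothetical 4000 x 4000 pixel FoV
centroids <- data.frame(x = c(1365, 762, 3947), y = c(16, 204, 191))
border <- dist_to_border(centroids$x, centroids$y,
                         fov_width = 4000, fov_height = 4000)
border
```

Cells with a small `dist_border` are the ones most likely to have been cut by the FoV edge, which is why this term is relevant only for technologies that image the tissue in separate FoVs.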
Logical filters can then be computed using
computeQCScoreFlags, which requires thresholds for various
metrics. Currently, the function considers:
* Quality Score (qsThreshold): Cells
with scores below this threshold (default 0.5) are flagged for
exclusion. When the useQSQuantiles argument is set to
TRUE, this value indicates the quantile used for the filtering.
* Quality Score Quantiles
(useQSQuantiles): Option to filter based on
quantiles (default FALSE).
# Compute flags to identify cells for filtering
spe <- computeQCScoreFlags(spe, qsThreshold=0.5)
names(colData(spe))
## [1] "fov" "cellID"
## [3] "Area" "AspectRatio"
## [5] "Width" "Height"
## [7] "Mean.PanCK" "Max.PanCK"
## [9] "Mean.CD68" "Max.CD68"
## [11] "Mean.Membrane" "Max.Membrane"
## [13] "Mean.CD45" "Max.CD45"
## [15] "Mean.DAPI" "Max.DAPI"
## [17] "SplitRatioToLocal" "NucArea"
## [19] "NucAspectRatio" "Circularity"
## [21] "Eccentricity" "Perimeter"
## [23] "Solidity" "cell_id"
## [25] "X" "version"
## [27] "dualfiles" "Run_name"
## [29] "Run_Tissue_name" "ISH.concentration"
## [31] "Dash" "tissue"
## [33] "Panel" "assay_type"
## [35] "slide_ID" "median_RNA"
## [37] "RNA_quantile_0.75" "RNA_quantile_0.8"
## [39] "RNA_quantile_0.85" "RNA_quantile_0.9"
## [41] "RNA_quantile_0.95" "RNA_quantile_0.99"
## [43] "nCount_RNA" "nFeature_RNA"
## [45] "median_negprobes" "negprobes_quantile_0.75"
## [47] "negprobes_quantile_0.8" "negprobes_quantile_0.85"
## [49] "negprobes_quantile_0.9" "negprobes_quantile_0.95"
## [51] "negprobes_quantile_0.99" "nCount_negprobes"
## [53] "nFeature_negprobes" "Area_um"
## [55] "CenterX_local_px" "CenterY_local_px"
## [57] "cell" "sample_id"
## [59] "sum" "detected"
## [61] "subsets_Ms IgG1_sum" "subsets_Ms IgG1_detected"
## [63] "subsets_Ms IgG1_percent" "subsets_Rb IgG_sum"
## [65] "subsets_Rb IgG_detected" "subsets_Rb IgG_percent"
## [67] "total" "control_sum"
## [69] "control_detected" "target_sum"
## [71] "target_detected" "CenterX_global_px"
## [73] "CenterY_global_px" "ctrl_total_ratio"
## [75] "log2Ctrl_total_ratio" "CenterX_global_um"
## [77] "CenterY_global_um" "dist_border_x"
## [79] "dist_border_y" "dist_border"
## [81] "log2AspectRatio" "CountArea"
## [83] "log2CountArea" "Area_um_outlier_mc"
## [85] "Area_um_outlier_sc" "Mean.DAPI_outlier_mc"
## [87] "Mean.DAPI_outlier_sc" "QC_score"
## [89] "low_qcscore"
##
## FALSE TRUE
## 695 50
We detected 50 cells to be removed.
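The two filtering modes can be sketched in base R as follows. Reading the quantile mode as "flag cells whose score falls below the `qsThreshold` quantile of the observed scores" is our interpretation of the argument description, not a statement about the package internals; `flag_low_score` and the toy scores are hypothetical.

```r
# Flag low-quality cells with either a fixed cutoff or a quantile cutoff
flag_low_score <- function(score, qs_threshold = 0.5, use_quantiles = FALSE) {
    cutoff <- if (use_quantiles) {
        # Interpret the threshold as a probability and look up the quantile
        quantile(score, probs = qs_threshold, names = FALSE)
    } else {
        qs_threshold
    }
    score < cutoff
}

qc_score <- c(0.95, 0.91, 0.88, 0.60, 0.42, 0.97, 0.10, 0.85)

# Fixed cutoff at 0.5
fixed_flags <- flag_low_score(qc_score)
# Cutoff at the 25th percentile of the observed scores
quantile_flags <- flag_low_score(qc_score, qs_threshold = 0.25,
                                 use_quantiles = TRUE)

table(fixed_flags)
table(quantile_flags)
```

A fixed cutoff keeps the meaning of the score comparable across samples, while a quantile cutoff always removes roughly the same fraction of cells regardless of overall data quality.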
For other metrics, such as the total counts and the negative
probe ratio, the function computeThresholdFlags
considers:
* Total Counts (totalThreshold):
Minimum count threshold (default 0).
* Negative Probe Ratio
(ctrlTotRatioThreshold): Minimum ratio of negative
probes to total counts (default 0.1).
spe <- computeThresholdFlags(spe, totalThreshold=0,
    ctrlTotRatioThreshold=0.1)
table(spe$threshold_flags)
##
## FALSE
## 745
In this example, we don’t find any cell to be removed.
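As a rough illustration of this kind of threshold-based flagging, the base R sketch below marks a cell when its total counts do not exceed the count threshold or when its negative probe ratio exceeds the ratio threshold. The direction of both comparisons is an assumption on our side, and `flag_thresholds` and the toy values are hypothetical; refer to the function documentation for the exact rule.

```r
# Flag cells failing either threshold (hypothetical helper, assumed rule)
flag_thresholds <- function(total, ctrl_total_ratio,
                            total_threshold = 0, ratio_threshold = 0.1) {
    total <= total_threshold | ctrl_total_ratio > ratio_threshold
}

# Toy per-cell values: the last two cells are deliberately problematic
total <- c(36036, 39098, 9060, 0, 500)
ctrl_total_ratio <- c(0.0004, 0.0004, 0.0017, 0, 0.25)

flags <- flag_thresholds(total, ctrl_total_ratio)
table(flags)
```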
To better understand the quality score values, we first load the polygons, which give a better overview of the characteristics of the cells.
We can load and add polygons to the SPE object using the following
functions. Each technology has its own readPolygons
function to standardize the loaded sf object and handle
different file types.
# Read polygon data associated with cells in the SPE
# the polygon file path is stored in the spe metadata
pols <- readPolygonsCosmx(metadata(spe)$polygons)
# Add the polygon data to the SPE object
spe <- addPolygonsToSPE(spe, pols)
Once the polygons are stored in an sf object within
colData, they can be visualized using functions based on
the ggplot2 library.
The cells are shown on a white background for better visualization.
We can see in yellow and darkviolet that
there are a few cells with extreme values of log2AspectRatio
and Area_um.
We can see that the quality score captures both of these aspects and highlights the cells that are mostly isolated on the FoV border or that show an unusual conformation.
We always recommend being aware of the cell populations expected in the context under study before removing the detected cells.
The plotZoomFovsMap function allows you to visualize a
map of the FoVs with a zoom-in of selected FoVs, colored by the
colour_by argument.
On the left side we see the map of all the FoVs (only the FoV 16 in this case), and on the right the polygons coloured by the quality score. This gives a better view of a specific tissue area within the whole experiment.
In this vignette, we explored the main functionalities of the
SpaceTrooper package for spatial data analysis. The main steps
shown are:
* data and polygons loading for CosMx Protein
* quality control:
    + outlier detection: medcouple and scuttle MAD
    + flag score: a score combining transcript counts, cell area, aspect ratio and distance from the FoV border
* visualization:
    + centroids: with ggplot2
    + polygons: sf + ggplot2
## R version 4.5.2 (2025-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpaceTrooper_1.0.1 SpatialExperiment_1.20.0
## [3] SingleCellExperiment_1.32.0 SummarizedExperiment_1.40.0
## [5] Biobase_2.70.0 GenomicRanges_1.62.1
## [7] Seqinfo_1.0.0 IRanges_2.44.0
## [9] S4Vectors_0.48.0 BiocGenerics_0.56.0
## [11] generics_0.1.4 MatrixGenerics_1.22.0
## [13] matrixStats_1.5.0 BiocStyle_2.38.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.3 gridExtra_2.3
## [3] rlang_1.1.6 magrittr_2.0.4
## [5] scater_1.38.0 e1071_1.7-16
## [7] compiler_4.5.2 DelayedMatrixStats_1.32.0
## [9] SpatialExperimentIO_1.2.0 sfheaders_0.4.5
## [11] vctrs_0.6.5 shape_1.4.6.1
## [13] pkgconfig_2.0.3 fastmap_1.2.0
## [15] backports_1.5.0 magick_2.9.0
## [17] XVector_0.50.0 labeling_0.4.3
## [19] scuttle_1.20.0 rmarkdown_2.30
## [21] ggbeeswarm_0.7.3 purrr_1.2.0
## [23] bit_4.6.0 xfun_0.54
## [25] glmnet_4.1-10 cachem_1.1.0
## [27] beachmat_2.26.0 jsonlite_2.0.0
## [29] rhdf5filters_1.22.0 DelayedArray_0.36.0
## [31] Rhdf5lib_1.32.0 BiocParallel_1.44.0
## [33] irlba_2.3.5.1 broom_1.0.11
## [35] parallel_4.5.2 R6_2.6.1
## [37] bslib_0.9.0 RColorBrewer_1.1-3
## [39] limma_3.66.0 car_3.1-3
## [41] jquerylib_0.1.4 iterators_1.0.14
## [43] Rcpp_1.1.0.8 assertthat_0.2.1
## [45] knitr_1.50 R.utils_2.13.0
## [47] splines_4.5.2 Matrix_1.7-4
## [49] tidyselect_1.2.1 viridis_0.6.5
## [51] abind_1.4-8 yaml_2.3.12
## [53] codetools_0.2-20 lattice_0.22-7
## [55] tibble_3.3.0 withr_3.0.2
## [57] S7_0.2.1 evaluate_1.0.5
## [59] sf_1.0-23 survival_3.8-3
## [61] units_1.0-0 proxy_0.4-28
## [63] pillar_1.11.1 BiocManager_1.30.27
## [65] ggpubr_0.6.2 carData_3.0-5
## [67] KernSmooth_2.23-26 foreach_1.5.2
## [69] ggplot2_4.0.1 sparseMatrixStats_1.22.0
## [71] scales_1.4.0 class_7.3-23
## [73] glue_1.8.0 maketools_1.3.2
## [75] tools_4.5.2 BiocNeighbors_2.4.0
## [77] robustbase_0.99-6 sys_3.4.3
## [79] data.table_1.17.8 ScaledMatrix_1.18.0
## [81] locfit_1.5-9.12 ggsignif_0.6.4
## [83] buildtools_1.0.0 cowplot_1.2.0
## [85] rhdf5_2.54.1 grid_4.5.2
## [87] tidyr_1.3.1 DropletUtils_1.30.0
## [89] edgeR_4.8.1 beeswarm_0.4.0
## [91] BiocSingular_1.26.1 HDF5Array_1.38.0
## [93] vipor_0.4.7 rsvd_1.0.5
## [95] Formula_1.2-5 cli_3.6.5
## [97] viridisLite_0.4.2 S4Arrays_1.10.1
## [99] arrow_22.0.0 dplyr_1.1.4
## [101] DEoptimR_1.1-4 gtable_0.3.6
## [103] R.methodsS3_1.8.2 rstatix_0.7.3
## [105] sass_0.4.10 digest_0.6.39
## [107] classInt_0.4-11 ggrepel_0.9.6
## [109] SparseArray_1.10.7 dqrng_0.4.1
## [111] rjson_0.2.23 farver_2.1.2
## [113] htmltools_0.5.9 R.oo_1.27.1
## [115] lifecycle_1.0.4 h5mread_1.2.1
## [117] statmod_1.5.1 bit64_4.6.0-1