1 Setup

library(MerfishData)
library(ExperimentHub)
library(ggplot2)
library(grid)

2 Data

Spatial transcriptomics protocols based on in situ sequencing or multiplexed RNA fluorescent hybridization can reveal detailed tissue organization. However, distinguishing the boundaries of individual cells in such data is challenging. Current segmentation methods typically approximate cell positions using nuclei stains.

Petukhov et al., 2021, describe Baysor, a segmentation method that optimizes 2D or 3D cell boundaries by considering the joint likelihood of transcriptional composition and cell morphology. Baysor can also perform segmentation based on the detected transcripts alone.

Petukhov et al., 2021, compare the results of Baysor segmentation (mRNA-only) to the results of a deep learning-based segmentation method called Cellpose from Stringer et al., 2021. Cellpose applies a machine learning framework for the segmentation of cell bodies, membranes and nuclei from microscopy images.

Petukhov et al., 2021 apply Baysor and Cellpose to MERFISH data from cryosections of mouse ileum. The MERFISH encoding probe library was designed to target 241 genes, including previously defined markers for the majority of gut cell types.

Def. ileum: the final and longest segment of the small intestine.

Samples were also stained with anti-Na+/K+-ATPase primary antibodies, oligo-labeled secondary antibodies and DAPI. MERFISH measurements across multiple fields of view and nine z planes were performed to provide a volumetric reconstruction of the distribution of the targeted mRNAs, the cell boundaries marked by Na+/K+-ATPase immunofluorescence (IF), and the cell nuclei stained with DAPI.

The data was obtained from the corresponding Dryad data publication.

This vignette demonstrates how to obtain the MERFISH mouse ileum dataset from Petukhov et al., 2021 from Bioconductor’s ExperimentHub.

eh <- ExperimentHub()
query(eh, c("MerfishData", "ileum"))
#> ExperimentHub with 9 records
#> # snapshotDate(): 2023-04-24
#> # $dataprovider: Boston Children's Hospital
#> # $species: Mus musculus
#> # $rdataclass: data.frame, matrix, EBImage
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["EH7543"]]' 
#> 
#>            title                                 
#>   EH7543 | Petukhov2021_ileum_molecules          
#>   EH7544 | Petukhov2021_ileum_dapi               
#>   EH7545 | Petukhov2021_ileum_membrane           
#>   EH7547 | Petukhov2021_ileum_baysor_segmentation
#>   EH7548 | Petukhov2021_ileum_baysor_counts      
#>   EH7549 | Petukhov2021_ileum_baysor_coldata     
#>   EH7550 | Petukhov2021_ileum_baysor_polygons    
#>   EH7551 | Petukhov2021_ileum_cellpose_counts    
#>   EH7552 | Petukhov2021_ileum_cellpose_coldata
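
The record IDs listed above can be kept in a small lookup table, so that later code refers to a resource by its title rather than by a bare ID. The following is a minimal base R sketch: the names `ileum.ids` and `ids_matching` are hypothetical and not part of MerfishData, and the IDs and titles are copied from the query output above.

```r
# Record IDs and titles copied from the query output above
ileum.ids <- c(
  Petukhov2021_ileum_molecules           = "EH7543",
  Petukhov2021_ileum_dapi                = "EH7544",
  Petukhov2021_ileum_membrane            = "EH7545",
  Petukhov2021_ileum_baysor_segmentation = "EH7547",
  Petukhov2021_ileum_baysor_counts       = "EH7548",
  Petukhov2021_ileum_baysor_coldata      = "EH7549",
  Petukhov2021_ileum_baysor_polygons     = "EH7550",
  Petukhov2021_ileum_cellpose_counts     = "EH7551",
  Petukhov2021_ileum_cellpose_coldata    = "EH7552"
)

# Hypothetical helper: return the IDs whose title contains `pattern`
ids_matching <- function(pattern, ids = ileum.ids) {
  ids[grepl(pattern, names(ids), fixed = TRUE)]
}

# All records that belong to the Baysor segmentation
baysor.ids <- ids_matching("baysor")
print(baysor.ids)

# With an ExperimentHub object `eh` as created above, a record could then be
# loaded with, e.g., eh[[ileum.ids[["Petukhov2021_ileum_molecules"]]]]
```

Keeping the titles next to the IDs makes it easier to see which segmentation method a given resource belongs to.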

2.1 Raw data

mRNA molecule data: about 820k detected molecules (one row each) for 241 genes

mol.dat <- eh[["EH7543"]]
dim(mol.dat)
#> [1] 819665     12
head(mol.dat)
#>   molecule_id gene x_pixel y_pixel z_pixel      x_um      y_um z_um area
#> 1           1 Maoa    1705    1271       0 -2935.386 -1218.580  2.5    4
#> 2           2 Maoa    1725    1922       0 -2933.229 -1147.614  2.5    4
#> 3           3 Maoa    1753    1863       0 -2930.104 -1154.062  2.5    5
#> 4           4 Maoa    1760    1865       0 -2929.339 -1153.784  2.5    7
#> 5           5 Maoa    1904     794       0 -2913.718 -1270.474  2.5    6
#> 6           6 Maoa    1915    1430       0 -2912.497 -1201.232  2.5    6
#>   total_magnitude brightness  qc_score
#> 1        420.1126   2.021306 0.9543635
#> 2        269.5874   1.828640 0.9082457
#> 3        501.4615   2.001268 0.9772191
#> 4        639.0364   1.960428 0.9913161
#> 5        519.3154   1.937280 0.9832103
#> 6        842.2258   2.147277 0.9925655
length(unique(mol.dat$gene))
#> [1] 241
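
A molecule table of this shape can be summarized with base R alone, for instance by counting molecules per gene or per z plane, or by filtering on `qc_score`. The sketch below runs on a toy stand-in so that it works without downloading the data: `toy.mol` and its values are made up, `GeneB` and `GeneC` are placeholder gene names, and the cutoff of 0.9 is an arbitrary illustration rather than a threshold recommended by the source. The same calls should apply to `mol.dat`, assuming the columns shown above.

```r
# Toy stand-in for mol.dat with a subset of its columns (values are made up)
toy.mol <- data.frame(
  molecule_id = 1:6,
  gene        = c("Maoa", "Maoa", "GeneB", "GeneB", "GeneB", "GeneC"),
  z_pixel     = c(0, 0, 1, 1, 2, 2),
  qc_score    = c(0.95, 0.91, 0.60, 0.98, 0.99, 0.97)
)

# Number of molecules per gene, most abundant gene first
gene.counts <- sort(table(toy.mol$gene), decreasing = TRUE)
print(gene.counts)

# Number of molecules per z plane
z.counts <- table(toy.mol$z_pixel)
print(z.counts)

# Keep only molecules at or above an illustrative quality cutoff
qc.cutoff <- 0.9
toy.hq <- toy.mol[toy.mol$qc_score >= qc.cutoff, ]
print(nrow(toy.hq))
```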

Image data:

  1. DAPI stain signal:
dapi.img <- eh[["EH7544"]]
dapi.img
#> Image 
#>   colorMode    : Grayscale 
#>   storage.mode : double 
#>   dim          : 5721 9392 9 
#>   frames.total : 9 
#>   frames.render: 9 
#> 
#> imageData(object)[1:5,1:6,1]
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    0    0    0    0    0    0
#> [2,]    0    0    0    0    0    0
#> [3,]    0    0    0    0    0    0
#> [4,]    0    0    0    0    0    0
#> [5,]    0    0    0    0    0    0
plot(dapi.img, all = TRUE)
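
The third dimension of the image (9 frames) corresponds to the nine z planes. Because an EBImage `Image` is built on top of an array, individual planes can be addressed with ordinary array indexing. The sketch below uses a small toy array instead of `dapi.img` so that it runs without the data: `toy.stack` and its values are made up, and the maximum-intensity projection is shown only as one common way to collapse a z stack, not as a step taken in the source.

```r
# Toy stand-in for the image stack: 4 x 5 pixels, 9 z planes (values are made up)
set.seed(1)
toy.stack <- array(runif(4 * 5 * 9), dim = c(4, 5, 9))

# Extract a single z plane (here the first one) as a 2D matrix
plane1 <- toy.stack[, , 1]
print(dim(plane1))

# Maximum-intensity projection: per-pixel maximum across all z planes
max.proj <- apply(toy.stack, c(1, 2), max)
print(dim(max.proj))
```

Under the same assumption, `dapi.img[, , 1]` would select the first z plane of the DAPI image.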