RETROFIT is a statistical method for reference-free deconvolution of spatial transcriptomics data to estimate cell type mixtures. In this Vignette, we will estimate cell type composition of a Colon dataset. We will annotate cell types using marker gene lists as well as single cell reference. We will also reproduce some of the results from the paper for illustration.
Install and load the packages using the following steps:
if(!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("retrofit")
library(retrofit)
First load the Colon ST data from fetal 12 PCW tissue, using the following command:
data("vignetteColonData")
X = vignetteColonData$a3_x
Initialize the required parameters for deconvolution. - iterations: Number of iterations (default = 4000) - L: Number of components required
iterations = 10
L = 20
After initialization, run retrofit on the data (X) as follows:
result = retrofit::decompose(X, L=L, iterations=iterations, verbose=TRUE)
## [1] "iteration: 1 0.14 Seconds"
## [1] "iteration: 2 0.139 Seconds"
## [1] "iteration: 3 0.14 Seconds"
## [1] "iteration: 4 0.138 Seconds"
## [1] "iteration: 5 0.176 Seconds"
## [1] "iteration: 6 0.143 Seconds"
## [1] "iteration: 7 0.141 Seconds"
## [1] "iteration: 8 0.142 Seconds"
## [1] "iteration: 9 0.157 Seconds"
## [1] "iteration: 10 0.141 Seconds"
## [1] "Iteration mean: 0.146 seconds total: 1.456 seconds"
H = result$h
W = result$w
Th = result$th
After deconvolution of ST data, we have our estimates of W (a matrix of order G x L), H (a matrix of order L x S) and Theta (a vector of L components). Here, we are using number of iterations as 20 for demonstration purposes. For reproducing results in the paper, we need to run RETROFIT for iterations = 4000. The whole computation is omitted here due to time complexity (> 10min). We will load the results from 4000 iterations for the rest of the analysis.
H = vignetteColonData$a3_results_4k_iterations$decompose$h
W = vignetteColonData$a3_results_4k_iterations$decompose$w
Th = vignetteColonData$a3_results_4k_iterations$decompose$th
Above results are obtained by running the code below.
iterations = 4000
set.seed(1)
result = retrofit::decompose(x, L=L, iterations=iterations)
Next, we need to annotate the components, to get the proportion of K cell types. We can do this in two ways: (a) using an annotated single cell reference or (b) using the known marker genes.
Here, we will annotate components using single-cell reference. Load the single-cell reference data:
sc_ref = vignetteColonData$sc_ref
This file contains average gene expression values for K=8 cell types. Run the following command to get K cell-type mixtures from the ST data X:
K = ncol(sc_ref) # number of cell types
result = annotateWithCorrelations(sc_ref, K, W, H)
H_sc = result$h
W_sc = result$w
H_sc_prop = result$h_prop
W_sc_prop = result$w_prop
We assign components to the cell type it has maximum correlation with as shown in figure above. Finally, cell-type proportions for each spot in the ST data (X) are stored in H_mark_prop of order K x S. You can verify that the mappings from both methods are largely similar.
W_norm <- W
for(i in 1:nrow(W)){
W_norm[i,]=W[i,]/sum(W[i,])
}
corrplot::corrplot(stats::cor(sc_ref, W_norm),
is.corr=FALSE,
mar=c(0,0,1,0),
col = colorRampPalette(c("white", "deepskyblue", "blue4"))(100),
main="Correlation-based Mapping Matrix")
Here, we will annotate using known marker genes for cell types. Load the marker gene list:
marker_ref = vignetteColonData$marker_ref
This file contains the list of marker genes for K = 8 cell types. Run the following command to get K cell-type mixtures from the ST data X:
K = length(marker_ref) # number of cell types
result = retrofit::annotateWithMarkers(marker_ref, K, W, H)
H_mark = result$h
W_mark = result$w
H_mark_prop = result$h_prop
W_mark_prop = result$w_prop
gene_sums = result$gene_sums
corrplot::corrplot(gene_sums,
is.corr=FALSE,
mar=c(0,0,1,0),
col=colorRampPalette(c("white", "deepskyblue", "blue4"))(100),
main="Marker-based Mapping Matrix")
We assign components to the cell type it has maximum average marker expression in, as shown in figure above. Finally, cell-type proportions for each spot in the ST data (X) are stored in H_est of order K x S.
Hereon, we will be reproducing some of the analysis from the paper using marker-based mapping results.
Proportion of cell-types in this tissue:
rowSums(H_mark_prop)/ncol(H_mark_prop)
## Endothelium Epithelial Fibroblasts Immune Muscle Myo.Meso
## 0.13018273 0.13980586 0.24423669 0.09745911 0.12181790 0.10049303
## Neural Pericytes
## 0.09560567 0.07039902
Load the coordinates for the spots in the tissue and the tissue image:
coords=vignetteColonData$a3_coords
img <- png::readPNG(RCurl::getURLContent("https://user-images.githubusercontent.com/90921267/210159136-96b56551-f414-4b0b-921e-98f05a98c8bc.png"))
t <- grid::rasterGrob(img, width=ggplot2::unit(1,"npc"), height=ggplot2::unit(1,"npc"), interpolate = FALSE)
Find the cell-type with maximum proportion in each spot and plot:
dat=as.data.frame(cbind(coords,t(H_mark_prop)))
colnames(dat)=c(colnames(coords),rownames(H_mark_prop))
df=dat[,-c(1:3)]
df$CellType=NA
for(i in 1:nrow(df)){
df$CellType[i]=names(which.max(df[i,-c(1:2)]))
}
df$CellType=factor(df$CellType,levels=c("Epithelial", "Immune", "Myo.Meso","Muscle",
"Neural", "Endothelium", "Pericytes","Fibroblasts"))
ggplot2::ggplot(df, ggplot2::aes(x=imagerow, y=imagecol)) +
ggplot2::geom_point(size=1.8, shape=21, ggplot2::aes(fill=CellType))+
ggplot2::scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9","#009E73", "#CC79A7",
"#0072B2", "#D55E00" ,"#F0E442"),
labels=c("Epithelial", "Immune","Myofibroblasts/\nMesothelium",
"Muscle","Neural","Endothelium",
"Pericytes","Fibroblasts"))+
ggplot2::xlab("") + ggplot2::ylab("") + ggplot2::theme_classic()+
ggplot2::theme(legend.position = "none",
axis.line = ggplot2::element_blank(),
plot.margin = ggplot2::margin(-0.2, -0.2, 0.2, -1, "cm"),
axis.text.x = ggplot2::element_blank(), axis.text.y = ggplot2::element_blank(),
axis.ticks.x = ggplot2::element_blank(), axis.ticks.y = ggplot2::element_blank())