The tomo-seq technique is based on cryosectioning of tissue and performing RNA-seq on consecutive sections. Unlike common RNA-seq which is performed on independent samples, tomo-seq is performed on consecutive sections from one sample. Therefore tomo-seq data contain spatial information of transcriptome, and it is a good approach to examine gene expression change across an anatomic region.
This vignette will demonstrate the workflow to analyze and visualize tomo-seq data using tomoda. The main purpose of the package it to find anatomic zones with similar transcriptional profiles and spatially expressed genes in a tomo-seq sample. Several visualization functions create easy-to-modify plots are available to help users do this.
At the beginning, we load necessary libraries.
library(SummarizedExperiment)
library(tomoda)
This package contains an examplary dataset geneated from 3 day post cryinjury heart of zebrafish, obtained from GSE74652. The dataset contains the raw read count of 16495 genes across 40 sections. Here we load the dataset and view the first several rows of it.
data(zh.data)
head(zh.data)
#> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17
#> ENSDARG00000000002 1 0 0 0 0 2 0 3 1 1 3 0 0 4 2 7 3
#> ENSDARG00000000018 0 0 0 0 0 2 2 4 1 6 3 2 2 6 1 3 1
#> ENSDARG00000000019 4 0 1 2 1 0 4 1 4 0 6 9 2 9 1 8 3
#> ENSDARG00000000068 1 0 0 0 0 0 2 4 2 1 3 0 1 1 1 2 0
#> ENSDARG00000000069 13 0 1 0 0 1 5 4 5 7 14 8 3 8 2 8 10
#> ENSDARG00000000086 0 0 0 0 0 0 0 0 0 0 0 1 1 2 0 1 0
#> X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32
#> ENSDARG00000000002 3 2 0 7 4 1 0 0 1 3 0 0 0 0 3
#> ENSDARG00000000018 0 1 1 2 1 5 1 0 2 1 5 0 0 0 2
#> ENSDARG00000000019 11 6 2 9 4 12 1 2 6 9 1 4 4 5 7
#> ENSDARG00000000068 0 0 0 0 2 1 0 1 0 0 0 0 0 1 0
#> ENSDARG00000000069 8 5 0 3 2 6 2 4 3 0 0 1 0 1 0
#> ENSDARG00000000086 2 0 1 2 0 2 1 0 0 0 0 1 0 0 0
#> X33 X34 X35 X36 X37 X38 X39 X40
#> ENSDARG00000000002 3 1 5 0 12 3 2 1
#> ENSDARG00000000018 1 1 3 3 6 4 3 7
#> ENSDARG00000000019 13 10 21 0 23 3 13 9
#> ENSDARG00000000068 0 0 3 0 0 0 1 0
#> ENSDARG00000000069 2 0 5 0 7 2 5 3
#> ENSDARG00000000086 0 0 1 0 7 2 0 2
When using your own tomo-seq dataset, try to make your data the same structure as the examplary read count matrix. Each row corresponds to a gene and each row correspond to a section. The row names of the matrix are gene names. Importantly, the columns MUST be ordered according to the spatial sequence of sections.
Now we create an object representing from the raw read count matrix. Genes
expressed in less than 3 sections are filtered out. You can change this
threshold by changing the parameter min.section
of function createTomo
. The
output object is an instance of SummarizedExperiment. If you have
additional information about sections, save them in colData(object)
, a data
frame used to save meta data describing sections.
zh <- createTomo(zh.data)
#> Normalized count matrix is saved in assay 'normalized'.
#> Scaled count matrix is saved in assay 'scaled'.
zh
#> class: SummarizedExperiment
#> dim: 12865 40
#> metadata(0):
#> assays(3): count normalized scaled
#> rownames(12865): ENSDARG00000000002 ENSDARG00000000018 ...
#> ENSDARG00000095236 ENSDARG00000095580
#> rowData names(1): gene
#> colnames(40): X1 X2 ... X39 X40
#> colData names(1): section
If you have a normalized expression matrix rather than raw read count matrix, it can also be used for input.
your_object <- createTomo(matrix.normalized = normalized)
# Replace 'normalized' with your normalized expression matrix.
If you have an existing SummarizedExperiment object, createTomo
also accepts
it as input. Just remember that the object must contain at least one of ‘count’
assay and ‘normalized’ assay.
your_object <- createTomo(se)
# Replace 'se' with a SummarizedExperiment object.
By default, raw read count matrix is normalized and scaled across sections. The
raw read count, normalized read count matrix and scaled read count matrix are
saved in ‘count’, ‘normalized’ and ‘scale’ assays of the object. These matrices
can be accessed using function assay
.
head(assay(zh, 'scaled'), 2)
#> X1 X2 X3 X4 X5
#> ENSDARG00000000002 0.2711605 -0.7486253 -0.7486253 -0.7486253 -0.7486253
#> ENSDARG00000000018 -0.7375048 -0.7375048 -0.7375048 -0.7375048 -0.7375048
#> X6 X7 X8 X9 X10
#> ENSDARG00000000002 4.725422 -0.7486253 1.145541 0.13187095 0.06301435
#> ENSDARG00000000018 3.300955 0.2058376 1.125715 -0.08792157 2.85520219
#> X11 X12 X13 X14 X15
#> ENSDARG00000000002 0.3100379 -0.7486253 -0.7486253 0.7223615 0.2350572
#> ENSDARG00000000018 0.0435205 -0.1800035 0.6251889 0.8903187 -0.3746505
#> X16 X17 X18 X19 X20
#> ENSDARG00000000002 1.2367534 0.4586294 0.3047269 0.2002146 -0.7486253
#> ENSDARG00000000018 -0.1097734 -0.4406221 -0.7375048 -0.3875030 -0.1266124
#> X21 X22 X23 X24 X25
#> ENSDARG00000000002 1.1313419 0.9039949 -0.52762436 -0.7486253 -0.7486253
#> ENSDARG00000000018 -0.3412363 -0.4327010 0.07770893 -0.3048022 -0.7375048
#> X26 X27 X28 X29 X30
#> ENSDARG00000000002 -0.4885558 0.5504044 -0.7486253 -0.7486253 -0.7486253
#> ENSDARG00000000018 -0.3537738 -0.4180532 3.0631010 -0.7375048 -0.7375048
#> X31 X32 X33 X34 X35
#> ENSDARG00000000002 -0.7486253 0.3513916 -0.05306309 -0.3883656 0.01307173
#> ENSDARG00000000018 -0.7375048 -0.1964822 -0.56645521 -0.4717243 -0.40034109
#> X36 X37 X38 X39 X40
#> ENSDARG00000000002 -0.7486253 0.8666370 -0.10033582 -0.3579938 -0.4763123
#> ENSDARG00000000018 0.2580826 -0.1416776 -0.09980686 -0.3052241 0.6687815
During normalization, the library sizes of all sections are set to the median of
all library sizes. They can also be normalized to 1 million counts to obtain
Count Per Million (CPM) value by setting parameter normalize.method = "cpm"
.
zh <- createTomo(zh.data, normalize.method = "cpm")
We do not normalize gene lengths as we will not perform comparision between two genes. If the normalized read count matrix is used as input, this step is skipped.
Then the normalized data is scaled across sections for each gene. The normalized read counts of each gene are subjected to Z score transformation such that they have mean as 0 and standard deviation as 1.
A good start to analyze tomo-seq data is correlation analysis. Here we calculate
the Pearson correlation coefficients between every pair of sections across all
genes and visualize them with a heatmap. Parameter max.cor
defines the maximum
value for the heatmap, and coefficients bigger than it are clipped to it. This
is because diagonal coefficients are 1, usually much bigger than other
coefficients, so clipping them to a smaller value will show other coefficients
more clearly.
corHeatmap(zh, max.cor=0.3)
We would expect that adjacent sections have similar transcriptional profiles and thus have bigger correlation coefficients. Therefore, a pair of adjacent sections with small correlation coefficients should be noted. They may act as borders of two zones with different transcriptional profiles. A border of different zones is usually a border of dark blue and light blue/green/yellow on the heatmap. For example, section X13 and X20 are two borders in this dataset according to the heatmap.
Another method to visualize the similarity of sections is to perform dimensionality reduction. Sections are embedded in a two-dimensional space and plotted as points. similar sections are modeled by nearby points and dissimilar sections are modeled by distant points with high probability.
We first try PCA, a classic linear dimensionality reduction algorithm. We can
see a general trend of bottom-left to upper-right with increasing section
indexes, but it is hard to find clear borders. The embeddings of sections output
by the function are saved in the Tomo object, and you can access them with
colData(object)
.
zh <- runPCA(zh)
#> PC embeddings for sections are saved in column data.
embedPlot(zh, method="PCA")
#> Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps
head(colData(zh))
#> DataFrame with 6 rows and 3 columns
#> section PC1 PC2
#> <character> <numeric> <numeric>
#> X1 X1 0.0612488 -0.282025
#> X2 X2 0.0524679 -0.274253
#> X3 X3 0.0389170 -0.403186
#> X4 X4 0.0779290 -0.323700
#> X5 X5 0.0510980 -0.407276
#> X6 X6 0.0863790 -0.304588
Next we move to two popular non-linear dimensionality reduction algorithm, tSNE and UMAP. These algorithms are designed to learn the underlying manifold of data and project similar sections together in low-dimensional spaces. Users are welcomed to tune the parameter of these algorithm to show better results with custom dataset.
In the examplary dataset, two clusters of sections with a large margin are shown in both tSNE and UMAP embedding plots. According to the labels of sections, we could identify a border at X21 ~ X22.
set.seed(1)
zh <- runTSNE(zh)
#> TSNE embeddings for sections are saved in column data.
embedPlot(zh, method="TSNE")
zh <- runUMAP(zh)
#> UMAP embeddings for sections are saved in column data.
embedPlot(zh, method="UMAP")