spatial_enrich {spatialHeatmap}R Documentation

Identify Spatial Feature-Specifcally Expressed Genes

Description

This functionality is an extension of the spatial heatmap. It identifies spatial feature-specifically expressed genes and thus enables the spatial heatmap to visualize feature-specific profiles. The spatial features include cellular compartments, tissues, organs, etc. The function compares the target feature with all other selected features in a pairwise manner. The genes significantly up- or down-regulated in the target feature across all pairwise comparisons are denoted final target feature-specifcally expressed genes. The underlying methods include edgeR (Robinson et al, 2010), limma (Ritchie et al, 2015), DESeq2 (Love et al, 2014), distinct (Tiberi et al, 2020). The feature-specific genes are first detected with each method and can be summarized across methods.
In addition to feature-specific genes, this function is also able to identify genes specifically expressed in certain condition or in composite factor. The latter is a combination of multiple expermental factors. E.g. the spatiotemporal factor is a combination of feature and time points.

Usage

spatial_enrich(
  data,
  methods = c("edgeR", "limma"),
  norm = "TMM",
  log2.trans.dis = TRUE,
  log2.fc = 1,
  p.adjust = "BH",
  fdr = 0.05,
  aggr = "mean",
  log2.trans.aggr = TRUE
)

Arguments

data

A SummarizedExperiment object, which is returned by sub_data. The colData slot is required to contain at least two columns of "features" and "factors" respectively. The rowData slot can optionally contain a column of discriptions of each gene and the column name should be metadata.

methods

One or more of edgeR, limma, DESeq2, distinct. The default is c('edgeR', 'limma').

norm

The normalization method (TMM, RLE, upperquartile, none) in edgeR. The default is TMM. Details: https://www.rdocumentation.org/packages/edgeR/versions/3.14.0/topics/calcNormFactors

log2.trans.dis

Logical, only applicable when distinct is in methods. The default is TRUE, and the count data is transformed to log-2 scale.

log2.fc

The log2-fold change cutoff. The default is 1.

p.adjust

The method (holm, hochberg, hommel, bonferroni, BH, BY, fdr, none) to adjust p values in multiple hypothesis testing. The default is BH.

fdr

The FDR cutoff. The default is 0.05.

aggr

One of mean (default), median. The method to aggregated replicates in the data frame of feature-specific genes.

log2.trans.aggr

Logical. If TRUE (default), the aggregated data (see aggr) is transformed to log2-scale, included in the returned data frame of feature-specific genes, and would be further used in the spatial heatmaps.

Value

A nested list containing the feature-specific genes summarized across methods within methods.

Author(s)

Jianhai Zhang jianhai.zhang@email.ucr.edu
Dr. Thomas Girke thomas.girke@ucr.edu

References

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9
Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas
Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2018). SummarizedExperiment: SummarizedExperiment container. R package version 1.10.1
Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
Love, M.I., Huber, W., Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 Genome Biology 15(12):550 (2014)
Simone Tiberi and Mark D. Robinson. (2020). distinct: distinct: a method for differential analyses via hierarchical permutation tests. R package version 1.2.0. https://github.com/SimoneTiberi/distinct

Examples

## In the following examples, the toy data come from an RNA-seq analysis on development of 7
## chicken organs under 9 time points (Cardoso-Moreira et al. 2019). For conveninece, it is
## included in this package. The complete raw count data are downloaded using the R package
## ExpressionAtlas (Keays 2019) with the accession number "E-MTAB-6769".   

library(SummarizedExperiment)

## Set up toy data.

# Access toy data. 
cnt.chk <- system.file('extdata/shinyApp/example/count_chicken.txt', package='spatialHeatmap')
count.chk <- read.table(cnt.chk, header=TRUE, row.names=1, sep='\t')
count.chk[1:3, 1:5]

# A targets file describing samples and conditions is required for toy data. It should be made
# based on the experiment design, which is accessible through the accession number 
# "E-MTAB-6769" in the R package ExpressionAtlas. An example targets file is included in this
# package and accessed below. 
# Access the count table. 
cnt.chk <- system.file('extdata/shinyApp/example/count_chicken.txt', package='spatialHeatmap')
count.chk <- read.table(cnt.chk, header=TRUE, row.names=1, sep='\t')
count.chk[1:3, 1:5]
# Access the example targets file. 
tar.chk <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(tar.chk, header=TRUE, row.names=1, sep='\t')
# Every column in toy data corresponds with a row in targets file. 
target.chk[1:5, ]
# Store toy data in "SummarizedExperiment".
se.chk <- SummarizedExperiment(assay=count.chk, colData=target.chk)
# The "rowData" slot can store a data frame of gene metadata, but not required. Only the 
# column named "metadata" will be recognized. 
# Pseudo row metadata.
metadata <- paste0('meta', seq_len(nrow(count.chk))); metadata[1:3]
rowData(se.chk) <- DataFrame(metadata=metadata)

# Subset the data by selected features (brain, heart, kidney) and factors (day10, day12).
data.sub <- sub_data(data=se.chk, feature='organism_part', features=c('brain', 'heart', 
'kidney'), factor='age', factors=c('day10', 'day12'), com.by='feature', target='brain')

## As conventions, raw sequencing count data should be normalized and filtered to
## reduce noise. Since normalization will be performed in spatial enrichment, only filtering
## is required.  

# Filter out genes with low counts and low variance. Genes with counts over 5 in
# at least 10% samples (pOA), and coefficient of variance (CV) between 3.5 and 100 are 
# retained.
data.sub.fil <- filter_data(data=data.sub, sam.factor='organism_part', con.factor='age',
pOA=c(0.1, 5), CV=c(0.7, 100), dir=NULL)
# Identify brain-specifically expressed genes relative to heart and kidney, where day10 and
# day12 are treated as replicates. 
deg.lis <- spatial_enrich(data.sub.fil)
# All up- and down-regulated genes in brain across methods. On the right is the data after
# replicates aggregated, and will be used in the spatial heatmaps.
deg.lis$deg.table[1:3, ]
# Up-regulated genes detected by edgeR.
deg.lis$lis.up.down$up.lis$edgeR.up[1:5]
# The aSVG path.
svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", 
package="spatialHeatmap")
# Plot one brain-specific gene in spatial heatmap.
spatial_hm(svg.path=svg.chk, data=deg.lis$deg.table, ID=deg.lis$deg.table$gene[1], legend.r=1.9, legend.nrow=2, sub.title.size=7, ncol=2, bar.width=0.11)
# Overlap of up-regulated brain-specific genes across methods.
deg_ovl(deg.lis$lis.up.down, type='up', plot='upset')
deg_ovl(deg.lis$lis.up.down, type='up', plot='matrix')
# Overlap of down-regulated brain-specific genes across methods.
deg_ovl(deg.lis$lis.up.down, type='down', plot='upset')
deg_ovl(deg.lis$lis.up.down, type='down', plot='matrix')
# Line graph of gene expression profile.
profile_gene(deg.lis$deg.table[1, ])

[Package spatialHeatmap version 1.99.1 Index]