Contents

1 Getting started

The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.

1.1 Package installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")

Or, until the new bioconductor release:

devtools::install_github("plger/SEtools")

1.2 Example data

To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:

suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition

This is taken from Floriou-Servou et al., Biol Psychiatry 2018.

1.3 Heatmaps

1.3.1 sehm

The sehm function simplifies the generation of heatmaps from SummarizedExperiment. It uses pheatmap, so any argument supported by it can in principle be passed:

g <- c("Egr1", "Nr4a1", "Fos", "Egr2", "Sgk1", "Arc", "Dusp1", "Fosb", "Sik1")
sehm(SE, genes=g)
## Using assay logcpm

sehm(SE, genes=g, scale="row")
## Using assay logcpm

Annotation from the object’s rowData and colData can be plotted simply by specifying the column name (some will be shown by default if found):

sehm(SE, genes=g, scale="row", anno_rows="meanTPM")
## Using assay logcpm

These can also be used to create gaps:

sehm(SE, genes=g, scale="row", anno_rows="meanTPM", gaps_at="Condition")
## Using assay logcpm

The specific assay to use for plotting can be specified with the assayName argument.

1.3.1.1 Row/column ordering

By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering. This can be disabled using sortRowsOn=NULL or cluster_rows=TRUE (to avoid any row reordering and use the order given, use sortRowsOn=NULL, cluster_rows=FALSE). Column clustering is disabled by default, but this can be changed with cluster_cols=TRUE.

It is common to cluster features into groups, and such a clustering can be used simultaneously with row sorting using the toporder argument. For instance:

lfcs <- assays(SE)$logcpm-rowMeans(assays(SE)$logcpm[,which(SE$Condition=="Homecage")])
rowData(SE)$cluster <- as.character(kmeans(lfcs,4)$cluster)
sehm(SE, scale="row", anno_rows="cluster", toporder="cluster", gaps_at="Condition")
## Using assay logcpm

1.3.2 crossHm

Heatmaps from multiple SE can be created either by merging the objects (see below), or using the crossHm function, which uses the ComplexHeatmap pacakge:

crossHm( list(se1=SE, se2=SE), g, 
         anno_colors = list( Condition=c( Homecage="green",
                                          Handling="orange",
                                          Restraint="red",
                                          Swim="blue")
                            )
        )
## Using assay logcpm
## Using assay logcpm

1.3.3 Default arguments

For some arguments (for instance colors), if they are not specified in the function call, SEtools will try to see whether the corresponding global options have been set, before using default colors. This means that if, in the context of a given project, the same colors are repeatedly being used, they can be specified a single time, and all subsequent plots will be affected:

options("SEtools_def_hmcols"=c("white","grey","black"))
ancols <- list( Condition=c( Homecage="green",
                             Handling="orange",
                             Restraint="red",
                             Swim="blue" ) )
options("SEtools_def_anno_colors"=ancols)
sehm(SE, g, do.scale = TRUE)
## Using assay logcpm

At the moment, the following arguments can be set as global options: assayName, hmcols, anno_columns, anno_rows, anno_colors, gaps_at, breaks. Options must be set with the prefix SEtools_def_, followed by the name of the argument. You may use resetAllSEtoolsOptions() to remove all global options relative to the package.


1.4 Merging SEs

se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(3): meanCPM meanTPM cluster
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
##   se2.HC.Swim.5
## colData names(3): Dataset Region Condition

All assays were merged, along with rowData and colData slots.

By default, row z-scores are calculated for each object when merging. This can be prevented with:

se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)

If more than one assay is present, one can specify a different scaling behavior for each assay:

se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))

1.5 Melting SE

To facilitate plotting features with ggplot2, the meltSE function combines assay values along with row/column data:

d <- meltSE(SE, genes=g[1:4])
head(d)
##   feature        sample Region Condition counts    logcpm
## 1    Egr1 HC.Homecage.1     HC  Homecage 1581.0 4.4284969
## 2   Nr4a1 HC.Homecage.1     HC  Homecage  750.0 3.6958917
## 3     Fos HC.Homecage.1     HC  Homecage   91.4 1.7556317
## 4    Egr2 HC.Homecage.1     HC  Homecage   15.1 0.5826999
## 5    Egr1 HC.Homecage.2     HC  Homecage 1423.0 4.4415828
## 6   Nr4a1 HC.Homecage.2     HC  Homecage  841.0 3.9237691
suppressPackageStartupMessages(library(ggplot2))
ggplot(d, aes(Condition, counts)) + geom_violin() + facet_wrap(~feature, scale="free")
An example ggplot created from a melted SE.

Figure 1: An example ggplot created from a melted SE

Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ggplot2_3.2.1               SEtools_1.0.0              
##  [3] SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
##  [5] BiocParallel_1.20.0         matrixStats_0.55.0         
##  [7] Biobase_2.46.0              GenomicRanges_1.38.0       
##  [9] GenomeInfoDb_1.22.0         IRanges_2.20.0             
## [11] S4Vectors_0.24.0            BiocGenerics_0.32.0        
## [13] BiocStyle_2.14.0           
## 
## loaded via a namespace (and not attached):
##  [1] viridis_0.5.1          viridisLite_0.3.0      foreach_1.4.7         
##  [4] gtools_3.8.1           assertthat_0.2.1       highr_0.8             
##  [7] BiocManager_1.30.9     GenomeInfoDbData_1.2.2 yaml_2.2.0            
## [10] pillar_1.4.2           lattice_0.20-38        glue_1.3.1            
## [13] digest_0.6.22          RColorBrewer_1.1-2     XVector_0.26.0        
## [16] colorspace_1.4-1       htmltools_0.4.0        Matrix_1.2-17         
## [19] pkgconfig_2.0.3        pheatmap_1.0.12        GetoptLong_0.1.7      
## [22] bookdown_0.14          zlibbioc_1.32.0        purrr_0.3.3           
## [25] scales_1.0.0           gdata_2.18.0           tibble_2.1.3          
## [28] withr_2.1.2            lazyeval_0.2.2         magrittr_1.5          
## [31] crayon_1.3.4           evaluate_0.14          MASS_7.3-51.4         
## [34] gplots_3.0.1.1         tools_3.6.1            registry_0.5-1        
## [37] data.table_1.12.6      GlobalOptions_0.1.1    ComplexHeatmap_2.2.0  
## [40] stringr_1.4.0          munsell_0.5.0          cluster_2.1.0         
## [43] compiler_3.6.1         caTools_1.17.1.2       rlang_0.4.1           
## [46] grid_3.6.1             RCurl_1.95-4.12        iterators_1.0.12      
## [49] rjson_0.2.20           circlize_0.4.8         labeling_0.3          
## [52] bitops_1.0-6           rmarkdown_1.16         gtable_0.3.0          
## [55] codetools_0.2-16       TSP_1.1-7              R6_2.4.0              
## [58] gridExtra_2.3          seriation_1.2-8        knitr_1.25            
## [61] dplyr_0.8.3            clue_0.3-57            KernSmooth_2.23-16    
## [64] dendextend_1.12.0      shape_1.4.4            stringi_1.4.3         
## [67] Rcpp_1.0.2             png_0.1-7              gclus_1.3.2           
## [70] tidyselect_0.2.5       xfun_0.10