1 Getting started

The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.

1.1 Package installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")

Or, to install the latest development version:

BiocManager::install("plger/SEtools")

Note that the heatmap-related functions have been moved to a standalone package, sechm.

1.2 Example data

To showcase the main functions, we will use an example object that contains a subset of whole-hippocampus RNA-seq data from mice exposed to different stressors:

suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(SEtools)
})
data("SE", package="SEtools")
SE

## class: SummarizedExperiment 
## dim: 100 20 
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition

This dataset is taken from Floriou-Servou et al., Biol Psychiatry 2018.

1.3 Merging and aggregating SEs

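# Split the example object into two halves by column...
se1 <- SE[,1:10]
se2 <- SE[,11:20]
# ...and merge them back; the list names identify each dataset in the output
se3 <- mergeSEs( list(se1=se1, se2=se2) )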
se3

## class: SummarizedExperiment 
## dim: 100 20 
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(2): meanCPM meanTPM
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
##   se2.HC.Swim.5
## colData names(3): Dataset Region Condition

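All assays were merged, along with the rowData and colData slots. As the output shows, column names are prefixed with the name of the originating object, and a Dataset column is added to colData.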

By default, row z-scores are calculated for each object when merging. This can be prevented with:

se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)

If more than one assay is present, one can specify a different scaling behavior for each assay:

se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
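To make the scaling step concrete, here is a minimal base-R sketch of what a row z-score is, using a made-up toy matrix. It only illustrates the concept; it is not taken from the SEtools source, and mergeSEs may handle details (such as missing values or zero-variance rows) differently.

```r
# Toy matrix: 3 features (rows) x 4 samples (columns); values are made up.
m <- matrix(c( 1,  2,  3,  4,
              10, 10, 20, 20,
               5,  7,  9, 11),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4)))

# scale() centres and scales columns, so transpose, scale, and transpose
# back to standardise each row (feature) instead.
row_z <- t(scale(t(m)))

# Each row now has mean 0 and standard deviation 1.
round(rowMeans(row_z), 10)
apply(row_z, 1, sd)
```

Scaling each object separately before merging puts features on a comparable scale across datasets, which is why it can be switched off per assay when raw values such as counts should be preserved.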

1.3.1 Merging by rowData columns

It is also possible to merge by rowData columns, which are specified through the mergeBy argument. Such merges can involve one-to-many and many-to-many mappings, for which two behaviors are possible:

By default, all combinations are reported, so the same feature of one object may appear multiple times in the output when it matches multiple features of another object.
If a function is passed through aggFun, the features of each object are first aggregated by mergeBy using this function, and the aggregated objects are then merged.

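# Assign each feature a random letter as a toy grouping variable
# (no seed is set, so the grouping differs between runs)
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
# Merge on that column, aggregating features that share a letter by their median
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)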

## Aggregating the objects by metafeature

## Merging...

sechm::sechm(se3, features=row.names(se3))
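The two behaviors for non-unique merge keys described above can be illustrated with plain data frames. The sketch below uses base R's merge() and aggregate() on hypothetical toy tables (the names a, b, a_s1 and b_s1 are invented for illustration); it mirrors the idea only and is not how mergeSEs is implemented.

```r
# Hypothetical feature tables for two objects; 'metafeature' plays the role
# of the rowData column passed to mergeBy. All values are made up.
a <- data.frame(metafeature = c("A", "A", "B"), a_s1 = c(1, 3, 10))
b <- data.frame(metafeature = c("A", "B", "B"), b_s1 = c(2, 20, 40))

# Behavior 1: report all combinations. "A" matches 2 x 1 rows and
# "B" matches 1 x 2 rows, so the merged table has 4 rows.
all_combos <- merge(a, b, by = "metafeature")

# Behavior 2: aggregate each table by the key first (here with the median),
# then merge; every metafeature now appears exactly once.
a_agg <- aggregate(a_s1 ~ metafeature, data = a, FUN = median)
b_agg <- aggregate(b_s1 ~ metafeature, data = b, FUN = median)
aggregated <- merge(a_agg, b_agg, by = "metafeature")

nrow(all_combos)
aggregated
```

Aggregating first keeps the output at one row per key, at the cost of collapsing the individual features that shared it.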

1.3.2 Aggregating a SE

A single SE can also be aggregated by using the aggSE function:

se1b <- aggSE(se1, by = "metafeature")

## Aggregation methods for each assay:
## counts: sum; logcpm: expsum

se1b

## class: SummarizedExperiment 
## dim: 24 10 
## metadata(0):
## assays(2): counts logcpm
## rownames(24): A B ... Y Z
## rowData names(0):
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
##   HC.Handling.5
## colData names(2): Region Condition

If the aggregation functions are not specified, aggSE tries to guess suitable ones from the assay names, as reported in the message above (sum for counts, expsum for logcpm).
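As a rough illustration of why different assays call for different aggregation functions, the base-R sketch below sums toy counts within groups using rowsum(). For the log-scale assay it shows one plausible reading of "expsum" (back-transform, sum, re-log); that reading, the base 2, and the absence of a pseudocount are assumptions made here, not the documented definition.

```r
# Toy count matrix: 4 features x 2 samples, with a grouping vector that
# stands in for the rowData column given to 'by'. Values are made up.
counts <- matrix(c(1, 2,
                   3, 4,
                   5, 6,
                   7, 8),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), c("s1", "s2")))
metafeature <- c("A", "B", "A", "B")

# Counts are additive, so summing the rows within each group is natural.
agg_counts <- rowsum(counts, group = metafeature)

# Log-scale values are not additive: summing them directly would multiply
# the underlying quantities. One option (an assumption, see above) is to
# back-transform, sum, and take the log again.
logx <- log2(counts)
agg_log <- log2(rowsum(2^logx, group = metafeature))

agg_counts
agg_log
```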

1.4 Other convenience functions

Calculate an assay of log2 fold changes relative to the controls:

SE <- log2FC(SE, fromAssay="logcpm", controls=SE$Condition=="Homecage")
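The underlying arithmetic can be sketched in base R on a made-up matrix. On a log2 scale, a fold change relative to the controls is a difference from the control summary. Using the mean as that summary is an assumption of this sketch, and log2FC may offer other options or handle non-log input differently.

```r
# Toy log2-scale expression matrix: 2 features x 4 samples; values are made up.
logexpr <- matrix(c(5, 7, 8, 10,
                    3, 3, 2,  4),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB"),
                                  c("ctrl1", "ctrl2", "trt1", "trt2")))
is_control <- c(TRUE, TRUE, FALSE, FALSE)

# Per-feature mean over the control samples.
ctrl_mean <- rowMeans(logexpr[, is_control, drop = FALSE])

# Subtracting a vector of length nrow() from a matrix recycles it down the
# columns, so each row has its own control mean removed.
lfc <- logexpr - ctrl_mean

lfc
```

By construction the control samples of each feature average to zero, so the remaining columns read directly as log2 fold changes against the control baseline.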

Session info

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SEtools_1.12.0              sechm_1.6.0                
##  [3] SummarizedExperiment_1.28.0 Biobase_2.58.0             
##  [5] GenomicRanges_1.50.0        GenomeInfoDb_1.34.0        
##  [7] IRanges_2.32.0              S4Vectors_0.36.0           
##  [9] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
## [11] matrixStats_0.62.0          BiocStyle_2.26.0           
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.16             colorspace_2.0-3       rjson_0.2.21          
##   [4] circlize_0.4.15        XVector_0.38.0         GlobalOptions_0.1.2   
##   [7] clue_0.3-62            bit64_4.0.5            AnnotationDbi_1.60.0  
##  [10] fansi_1.0.3            codetools_0.2-18       splines_4.2.1         
##  [13] doParallel_1.0.17      cachem_1.0.6           geneplotter_1.76.0    
##  [16] knitr_1.40             jsonlite_1.8.3         Cairo_1.6-0           
##  [19] annotate_1.76.0        cluster_2.1.4          png_0.1-7             
##  [22] pheatmap_1.0.12        BiocManager_1.30.19    compiler_4.2.1        
##  [25] httr_1.4.4             assertthat_0.2.1       Matrix_1.5-1          
##  [28] fastmap_1.1.0          limma_3.54.0           cli_3.4.1             
##  [31] htmltools_0.5.3        tools_4.2.1            gtable_0.3.1          
##  [34] glue_1.6.2             GenomeInfoDbData_1.2.9 dplyr_1.0.10          
##  [37] V8_4.2.1               Rcpp_1.0.9             jquerylib_0.1.4       
##  [40] vctrs_0.5.0            Biostrings_2.66.0      nlme_3.1-160          
##  [43] iterators_1.0.14       xfun_0.34              stringr_1.4.1         
##  [46] openxlsx_4.2.5.1       lifecycle_1.0.3        XML_3.99-0.12         
##  [49] ca_0.71.1              edgeR_3.40.0           zlibbioc_1.44.0       
##  [52] scales_1.2.1           TSP_1.2-1              parallel_4.2.1        
##  [55] RColorBrewer_1.1-3     ComplexHeatmap_2.14.0  yaml_2.3.6            
##  [58] curl_4.3.3             memoise_2.0.1          ggplot2_3.3.6         
##  [61] sass_0.4.2             stringi_1.7.8          RSQLite_2.2.18        
##  [64] highr_0.9              randomcoloR_1.1.0.1    genefilter_1.80.0     
##  [67] foreach_1.5.2          seriation_1.4.0        zip_2.2.2             
##  [70] BiocParallel_1.32.0    shape_1.4.6            rlang_1.0.6           
##  [73] pkgconfig_2.0.3        bitops_1.0-7           evaluate_0.17         
##  [76] lattice_0.20-45        bit_4.0.4              tidyselect_1.2.0      
##  [79] magrittr_2.0.3         bookdown_0.29          DESeq2_1.38.0         
##  [82] R6_2.5.1               magick_2.7.3           generics_0.1.3        
##  [85] DelayedArray_0.24.0    DBI_1.1.3              mgcv_1.8-41           
##  [88] pillar_1.8.1           survival_3.4-0         KEGGREST_1.38.0       
##  [91] RCurl_1.98-1.9         tibble_3.1.8           crayon_1.5.2          
##  [94] utf8_1.2.2             rmarkdown_2.17         GetoptLong_1.0.5      
##  [97] locfit_1.5-9.6         grid_4.2.1             sva_3.46.0            
## [100] data.table_1.14.4      blob_1.2.3             digest_0.6.30         
## [103] xtable_1.8-4           munsell_0.5.0          registry_0.5-1        
## [106] bslib_0.4.0

SEtools

1 November 2022
