Showcases the use of SEtools to merge objects of the SummarizedExperiment class.
SEtools 1.12.0
The SEtools package is a set of convenience functions for the Bioconductor class SummarizedExperiment. It facilitates merging, melting, and plotting SummarizedExperiment objects.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SEtools")
NOTE that the heatmap-related functions have been moved to a standalone package, sechm.
Or, to install the latest development version:
BiocManager::install("plger/SEtools")
To showcase the main functions, we will use an example object which contains (a subset of) whole-hippocampus RNAseq of mice after different stressors:
suppressPackageStartupMessages({
library(SummarizedExperiment)
library(SEtools)
})
data("SE", package="SEtools")
SE
## class: SummarizedExperiment
## dim: 100 20
## metadata(0):
## assays(2): counts logcpm
## rownames(100): Egr1 Nr4a1 ... CH36-200G6.4 Bhlhe22
## rowData names(2): meanCPM meanTPM
## colnames(20): HC.Homecage.1 HC.Homecage.2 ... HC.Swim.4 HC.Swim.5
## colData names(2): Region Condition
This is taken from Floriou-Servou et al., Biol Psychiatry 2018.
se1 <- SE[,1:10]
se2 <- SE[,11:20]
se3 <- mergeSEs( list(se1=se1, se2=se2) )
se3
## class: SummarizedExperiment
## dim: 100 20
## metadata(3): se1 se2 anno_colors
## assays(2): counts logcpm
## rownames(100): AC139063.2 Actr6 ... Zfp667 Zfp930
## rowData names(2): meanCPM meanTPM
## colnames(20): se1.HC.Homecage.1 se1.HC.Homecage.2 ... se2.HC.Swim.4
## se2.HC.Swim.5
## colData names(3): Dataset Region Condition
All assays were merged, along with rowData and colData slots.
By default, row z-scores are calculated for each object when merging. This can be prevented with:
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE)
If more than one assay is present, one can specify a different scaling behavior for each assay:
se3 <- mergeSEs( list(se1=se1, se2=se2), use.assays=c("counts", "logcpm"), do.scale=c(FALSE, TRUE))
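To make the default scaling concrete, here is a minimal base-R sketch of per-object row z-scoring on a toy matrix. It assumes the usual definition of a z-score (subtract the row mean, divide by the row standard deviation) and is not the code that mergeSEs runs internally; `row_z` and the toy data are hypothetical names used only for illustration.

```r
# Toy expression matrix: 3 features x 6 samples (hypothetical data)
set.seed(1)
m <- matrix(rnorm(18, mean = 5), nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:6)))

# Hypothetical helper: z-score each row (feature) across samples
row_z <- function(x) t(scale(t(x)))

# Scale each "dataset" separately, then bind the columns together
z1 <- row_z(m[, 1:3])
z2 <- row_z(m[, 4:6])
merged <- cbind(z1, z2)

# Within each dataset, every row now has mean 0 and standard deviation 1
round(rowMeans(z1), 10)
apply(z2, 1, sd)
```

Because each object is scaled on its own, differences in overall level between datasets are removed before the columns are combined, which may or may not be desirable depending on the assay.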
It is also possible to merge by rowData columns, which are specified through the mergeBy argument.
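In this case, one can have one-to-many and many-to-many mappings, in which case two behaviors are possible:
By default, all combinations are reported, so the same feature of one object may appear several times in the output if it matches multiple features of another object.
If a function is passed through aggFun, the features of each object will be aggregated by mergeBy using this function before merging.
rowData(se1)$metafeature <- sample(LETTERS,nrow(se1),replace = TRUE)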
rowData(se2)$metafeature <- sample(LETTERS,nrow(se2),replace = TRUE)
se3 <- mergeSEs( list(se1=se1, se2=se2), do.scale=FALSE, mergeBy="metafeature", aggFun=median)
## Aggregating the objects by metafeature
## Merging...
sechm::sechm(se3, features=row.names(se3))
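The aggregation step can be pictured with a small base-R sketch that collapses the rows of a matrix sharing the same label, using the median as in the call above. This only illustrates the idea on toy data; `agg_rows` is a hypothetical helper, not a function from SEtools, and the package's own implementation may differ.

```r
# Toy matrix: 6 features x 3 samples (hypothetical data, filled column-wise)
m <- matrix(c(1, 3, 5, 2, 4, 6,
              10, 30, 50, 20, 40, 60,
              7, 8, 9, 7, 8, 9),
            nrow = 6,
            dimnames = list(paste0("f", 1:6), paste0("s", 1:3)))
# Grouping label for each feature, playing the role of the 'metafeature' column
grp <- c("A", "A", "A", "B", "B", "B")

# Hypothetical helper: collapse the rows of each group with a summary function
agg_rows <- function(x, by, fun = median) {
  idx <- split(seq_len(nrow(x)), by)
  out <- t(vapply(idx, function(i) apply(x[i, , drop = FALSE], 2, fun),
                  numeric(ncol(x))))
  colnames(out) <- colnames(x)
  out
}

agg <- agg_rows(m, grp)
agg
```

After this step each label occurs once per object, so the subsequent merge is one-to-one.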
A single SE can also be aggregated by using the aggSE function:
se1b <- aggSE(se1, by = "metafeature")
## Aggregation methods for each assay:
## counts: sum; logcpm: expsum
se1b
## class: SummarizedExperiment
## dim: 25 10
## metadata(0):
## assays(2): counts logcpm
## rownames(25): A B ... Y Z
## rowData names(0):
## colnames(10): HC.Homecage.1 HC.Homecage.2 ... HC.Handling.4
## HC.Handling.5
## colData names(2): Region Condition
If the aggregation function(s) are not specified, aggSE will try to guess decent aggregation functions from the assay names.
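The messages above indicate that counts are summed while the log-scale assay uses a method named expsum. As a rough illustration of why a log-scale assay needs different treatment, the base-R sketch below sums counts with `rowsum()` and, for log2 values, sums on the linear scale before taking the log again. The second step is only an assumption about the intent behind expsum; the exact formula used by aggSE (base, pseudocount) is not shown in this document.

```r
# Toy count matrix: 4 features x 2 samples (hypothetical data)
counts <- matrix(c(10, 30, 5, 15,
                   20, 60, 10, 30),
                 nrow = 4,
                 dimnames = list(paste0("f", 1:4), c("s1", "s2")))
grp <- c("A", "A", "B", "B")

# Count-like assay: add up the features within each group
agg_counts <- rowsum(counts, group = grp)

# Log-scale assay: back-transform, sum, then log again
# (assumed log2 scale without pseudocount, for illustration only)
logx <- log2(counts)
agg_log <- log2(rowsum(2^logx, group = grp))

# Both routes agree on this toy example
all.equal(agg_log, log2(agg_counts))
```

Summing the log values directly would instead correspond to multiplying the original values, which is rarely what is wanted when collapsing features.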
Calculate an assay of log2 fold changes relative to the controls:
SE <- log2FC(SE, fromAssay="logcpm", controls=SE$Condition=="Homecage")
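As a minimal sketch of the idea, fold changes on an already log-transformed matrix can be obtained by subtracting a per-feature baseline computed from the control samples. The code below assumes the median of the controls as the baseline and uses toy data; the baseline function and options actually used by log2FC may differ, so consult `?log2FC` for the package's behavior.

```r
# Toy log-scale expression: 2 features x 5 samples (hypothetical data)
logexpr <- matrix(c(4, 4, 5, 6, 7,
                    2, 3, 4, 1, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("g1", "g2"), paste0("s", 1:5)))
condition <- c("Homecage", "Homecage", "Homecage", "Swim", "Swim")
is_ctrl <- condition == "Homecage"

# Per-feature baseline: median across control samples (assumed choice)
baseline <- apply(logexpr[, is_ctrl, drop = FALSE], 1, median)

# Subtracting on the log scale corresponds to a ratio on the linear scale
lfc <- logexpr - baseline
lfc
```

The controls end up centered around zero, and the other samples show their deviation from that baseline.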
Session info:
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SEtools_1.12.0 sechm_1.6.0
## [3] SummarizedExperiment_1.28.0 Biobase_2.58.0
## [5] GenomicRanges_1.50.1 GenomeInfoDb_1.34.2
## [7] IRanges_2.32.0 S4Vectors_0.36.0
## [9] BiocGenerics_0.44.0 MatrixGenerics_1.10.0
## [11] matrixStats_0.62.0 BiocStyle_2.26.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.16 colorspace_2.0-3 rjson_0.2.21
## [4] ellipsis_0.3.2 circlize_0.4.15 XVector_0.38.0
## [7] GlobalOptions_0.1.2 clue_0.3-61 bit64_4.0.5
## [10] AnnotationDbi_1.60.0 fansi_1.0.3 codetools_0.2-18
## [13] splines_4.2.1 doParallel_1.0.17 cachem_1.0.6
## [16] geneplotter_1.76.0 knitr_1.39 jsonlite_1.8.0
## [19] Cairo_1.6-0 annotate_1.76.0 cluster_2.1.3
## [22] png_0.1-7 pheatmap_1.0.12 BiocManager_1.30.18
## [25] compiler_4.2.1 httr_1.4.3 assertthat_0.2.1
## [28] Matrix_1.4-1 fastmap_1.1.0 limma_3.54.0
## [31] cli_3.3.0 htmltools_0.5.2 tools_4.2.1
## [34] gtable_0.3.0 glue_1.6.2 GenomeInfoDbData_1.2.8
## [37] dplyr_1.0.9 V8_4.2.0 Rcpp_1.0.9
## [40] jquerylib_0.1.4 vctrs_0.4.1 Biostrings_2.66.0
## [43] nlme_3.1-158 iterators_1.0.14 xfun_0.31
## [46] stringr_1.4.0 openxlsx_4.2.5 lifecycle_1.0.1
## [49] XML_3.99-0.10 edgeR_3.40.0 zlibbioc_1.44.0
## [52] scales_1.2.0 TSP_1.2-1 parallel_4.2.1
## [55] RColorBrewer_1.1-3 ComplexHeatmap_2.14.0 yaml_2.3.5
## [58] curl_4.3.2 memoise_2.0.1 ggplot2_3.3.6
## [61] sass_0.4.1 stringi_1.7.8 RSQLite_2.2.14
## [64] highr_0.9 randomcoloR_1.1.0.1 genefilter_1.80.0
## [67] foreach_1.5.2 seriation_1.3.5 zip_2.2.0
## [70] BiocParallel_1.32.1 shape_1.4.6 rlang_1.0.4
## [73] pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.15
## [76] lattice_0.20-45 purrr_0.3.4 bit_4.0.4
## [79] tidyselect_1.1.2 magrittr_2.0.3 bookdown_0.27
## [82] DESeq2_1.38.0 R6_2.5.1 magick_2.7.3
## [85] generics_0.1.3 DelayedArray_0.24.0 DBI_1.1.3
## [88] mgcv_1.8-40 pillar_1.7.0 survival_3.3-1
## [91] KEGGREST_1.38.0 RCurl_1.98-1.7 tibble_3.1.7
## [94] crayon_1.5.1 utf8_1.2.2 rmarkdown_2.14
## [97] GetoptLong_1.0.5 locfit_1.5-9.6 grid_4.2.1
## [100] sva_3.46.0 data.table_1.14.2 blob_1.2.3
## [103] digest_0.6.29 xtable_1.8-4 munsell_0.5.0
## [106] registry_0.5-1 bslib_0.3.1