TreeSummarizedExperiment 2.10.0
TreeSummarizedExperiment objects
Multiple TreeSummarizedExperiment (TSE) objects can be combined using rbind or cbind. Here, we create a toy TreeSummarizedExperiment object using makeTSE() (see ?makeTSE). Because the tree in the row or column tree slot is generated randomly by ape::rtree(), set.seed() is used to make the results reproducible.
library(TreeSummarizedExperiment)
set.seed(1)
# TSE: without the column tree
(tse_a <- makeTSE(include.colTree = FALSE))
## class: TreeSummarizedExperiment
## dim: 10 4
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
# combine two TSEs by row
(tse_aa <- rbind(tse_a, tse_a))
## class: TreeSummarizedExperiment
## dim: 20 4
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
The generated tse_aa has 20 rows, twice as many as tse_a. The row tree in tse_aa is the same as that in tse_a.
identical(rowTree(tse_aa), rowTree(tse_a))
## [1] TRUE
If we rbind two TSEs (e.g., tse_a and tse_b) that have different row trees, the resulting TSE (e.g., tse_ab) will have two row trees.
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
# different row trees
identical(rowTree(tse_a), rowTree(tse_b))
## [1] FALSE
# 2 phylo tree(s) in rowTree
(tse_ab <- rbind(tse_a, tse_b))
## class: TreeSummarizedExperiment
## dim: 20 4
## metadata(0):
## assays(1): ''
## rownames(20): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (20 rows)
## rowTree: 2 phylo tree(s) (20 leaves)
## colLinks: NULL
## colTree: NULL
In the row link data, the whichTree column indicates which tree each row is mapped to. For tse_aa, there is only one tree, named phylo. For tse_ab, however, there are two trees (phylo and phylo.1).
rowLinks(tse_aa)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE phylo
## entity2 entity2 alias_2 2 TRUE phylo
## entity3 entity3 alias_3 3 TRUE phylo
## entity4 entity4 alias_4 4 TRUE phylo
## entity5 entity5 alias_5 5 TRUE phylo
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE phylo
## entity7 entity7 alias_7 7 TRUE phylo
## entity8 entity8 alias_8 8 TRUE phylo
## entity9 entity9 alias_9 9 TRUE phylo
## entity10 entity10 alias_10 10 TRUE phylo
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE phylo
## entity2 entity2 alias_2 2 TRUE phylo
## entity3 entity3 alias_3 3 TRUE phylo
## entity4 entity4 alias_4 4 TRUE phylo
## entity5 entity5 alias_5 5 TRUE phylo
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE phylo.1
## entity7 entity7 alias_7 7 TRUE phylo.1
## entity8 entity8 alias_8 8 TRUE phylo.1
## entity9 entity9 alias_9 9 TRUE phylo.1
## entity10 entity10 alias_10 10 TRUE phylo.1
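Because the printed link data is truncated, it can help to tabulate the whichTree column to see how many rows map to each tree. The following is a minimal sketch that rebuilds tse_a and tse_b with the same seeds as above; it uses only rowLinks() and base R's table().

```r
library(TreeSummarizedExperiment)

# Rebuild the two toy TSEs used above
set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
tse_ab <- rbind(tse_a, tse_b)

# Number of rows linked to each row tree
table(rowLinks(tse_ab)$whichTree)
```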
The names of the trees can be accessed using rowTreeNames. If the input TSEs use the same name for their trees, rbind automatically creates valid and unique tree names using make.names. tse_a and tse_b both use phylo as the name of their row trees. In tse_ab, the row tree that originates from tse_b is named phylo.1 instead.
rowTreeNames(tse_aa)
## [1] "phylo"
rowTreeNames(tse_ab)
## [1] "phylo" "phylo.1"
# The original tree names in the input TSEs
rowTreeNames(tse_a)
## [1] "phylo"
rowTreeNames(tse_b)
## [1] "phylo"
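The renaming follows the usual behaviour of base R's make.names() with unique = TRUE. As a small illustration (base R only; how rbind calls make.names internally is not shown here and may differ in detail):

```r
# Two trees that share the same name
tree_names <- c("phylo", "phylo")

# make.names() keeps the first name and appends a numeric suffix to repeats
unique_names <- make.names(tree_names, unique = TRUE)
unique_names
## [1] "phylo"   "phylo.1"
```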
Once the names of the trees are changed, the whichTree column in rowLinks() is updated accordingly.
rowTreeNames(tse_ab) <- paste0("tree", 1:2)
rowLinks(tse_ab)
## LinkDataFrame with 20 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE tree1
## entity2 entity2 alias_2 2 TRUE tree1
## entity3 entity3 alias_3 3 TRUE tree1
## entity4 entity4 alias_4 4 TRUE tree1
## entity5 entity5 alias_5 5 TRUE tree1
## ... ... ... ... ... ...
## entity6 entity6 alias_6 6 TRUE tree2
## entity7 entity7 alias_7 7 TRUE tree2
## entity8 entity8 alias_8 8 TRUE tree2
## entity9 entity9 alias_9 9 TRUE tree2
## entity10 entity10 alias_10 10 TRUE tree2
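One way to confirm that the link data and the tree names stay in sync after renaming is to compare them directly. This is a minimal sketch that rebuilds tse_ab from scratch with the same seeds as above.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)
tse_ab <- rbind(tse_a, tse_b)

# Rename both row trees
rowTreeNames(tse_ab) <- paste0("tree", 1:2)

# Tree names referenced by the row links
unique(rowLinks(tse_ab)$whichTree)

# Every name used in the links should be a current tree name
all(rowLinks(tse_ab)$whichTree %in% rowTreeNames(tse_ab))
```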
To run cbind, TSEs should agree in the row dimension. If the TSEs differ only in the row tree, the row tree and the row link data are dropped, and a warning is issued.
cbind(tse_a, tse_a)
## class: TreeSummarizedExperiment
## dim: 10 8
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (10 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
cbind(tse_a, tse_b)
## Warning in cbind(...): rowTree & rowLinks differ in the provided TSEs.
## rowTree & rowLinks are dropped after 'cbind'
## class: TreeSummarizedExperiment
## dim: 10 8
## metadata(0):
## assays(1): ''
## rownames(10): entity1 entity2 ... entity9 entity10
## rowData names(2): var1 var2
## colnames(8): sample1 sample2 ... sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: NULL
## rowTree: NULL
## colLinks: NULL
## colTree: NULL
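Since the mismatch is reported only as a warning, a script may want to detect it explicitly rather than rely on console output. The sketch below uses base R's withCallingHandlers(); the flag name warned is a hypothetical helper introduced for illustration, not part of the package.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse_a <- makeTSE(include.colTree = FALSE)
set.seed(2)
tse_b <- makeTSE(include.colTree = FALSE)

# Record whether cbind() raised a warning (hypothetical flag name)
warned <- FALSE
res <- withCallingHandlers(
    cbind(tse_a, tse_b),
    warning = function(w) {
        warned <<- TRUE
        invokeRestart("muffleWarning")
    }
)

warned                  # was a warning raised?
is.null(rowTree(res))   # has the row tree been dropped?
dim(res)                # rows unchanged, columns concatenated
```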
We obtain a subset of tse_ab by extracting rows 11:15. These rows are all mapped to the same tree, now named tree2 (phylo.1 before the renaming above), so the rowTree slot of sse has only one tree.
(sse <- tse_ab[11:15, ])
## class: TreeSummarizedExperiment
## dim: 5 4
## metadata(0):
## assays(1): ''
## rownames(5): entity1 entity2 entity3 entity4 entity5
## rowData names(2): var1 var2
## colnames(4): sample1 sample2 sample3 sample4
## colData names(2): ID group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## rowLinks: a LinkDataFrame (5 rows)
## rowTree: 1 phylo tree(s) (10 leaves)
## colLinks: NULL
## colTree: NULL
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity1 entity1 alias_1 1 TRUE tree2
## entity2 entity2 alias_2 2 TRUE tree2
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
[ works not only as a getter but also as a setter, so it can replace a subset of sse.
set.seed(3)
tse_c <- makeTSE(include.colTree = FALSE)
rowTreeNames(tse_c) <- "new_tree"
# the first two rows are from tse_c, and are mapped to 'new_tree'
sse[1:2, ] <- tse_c[5:6, ]
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
The TSE object can also be subset by nodes and/or trees using subsetByNode.
# by tree
sse_a <- subsetByNode(x = sse, whichRowTree = "new_tree")
rowLinks(sse_a)
## LinkDataFrame with 2 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
# by node
sse_b <- subsetByNode(x = sse, rowNode = 5)
rowLinks(sse_b)
## LinkDataFrame with 2 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity5 entity5 alias_5 5 TRUE tree2
# by tree and node
sse_c <- subsetByNode(x = sse, rowNode = 5, whichRowTree = "tree2")
rowLinks(sse_c)
## LinkDataFrame with 1 row and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE tree2
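The selection made by subsetByNode can be mirrored by filtering on the row link data directly, which may help when reasoning about what the function returns. This is a minimal sketch on a fresh single-tree TSE (named tse here for illustration), not a description of how subsetByNode is implemented.

```r
library(TreeSummarizedExperiment)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)

# Subset with the package function
by_node <- subsetByNode(x = tse, rowNode = 5)

# Equivalent manual filter on the link data (single row tree assumed)
links <- rowLinks(tse)
by_hand <- tse[links$nodeNum == 5, ]

# Both approaches keep the same rows
identical(rownames(by_node), rownames(by_hand))
```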
Using colTree as a setter, we can add a column tree to sse, which previously had none.
colTree(sse)
## NULL
library(ape)
set.seed(1)
col_tree <- rtree(ncol(sse))
# To use 'colTree' as a setter, the input tree should have node labels that
# match the column names of the TSE.
col_tree$tip.label <- colnames(sse)
colTree(sse) <- col_tree
colTree(sse)
##
## Phylogenetic tree with 4 tips and 3 internal nodes.
##
## Tip labels:
## sample1, sample2, sample3, sample4
##
## Rooted; includes branch lengths.
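Adding a column tree also creates the column link data, which can be inspected with colLinks(). The following is a minimal sketch on a fresh TSE (named tse here for illustration) that repeats the steps above and then looks at the links.

```r
library(TreeSummarizedExperiment)
library(ape)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)

# No column tree yet
is.null(colTree(tse))

# Build a random tree whose tip labels match the column names
col_tree <- rtree(ncol(tse))
col_tree$tip.label <- colnames(tse)
colTree(tse) <- col_tree

# Each column is now linked to a node of the column tree
colLinks(tse)
```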
sse has two row trees. We can replace one of them with a new tree by specifying the whichTree argument of rowTree.
# the original row links
rowLinks(sse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_5 5 TRUE new_tree
## entity6 entity6 alias_6 6 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
# the new row tree
set.seed(1)
row_tree <- rtree(4)
row_tree$tip.label <- paste0("entity", 5:8)
# replace the tree named as the 'new_tree'
nse <- sse
rowTree(nse, whichTree = "new_tree") <- row_tree
rowLinks(nse)
## LinkDataFrame with 5 rows and 5 columns
## nodeLab nodeLab_alias nodeNum isLeaf whichTree
## <character> <character> <integer> <logical> <character>
## entity5 entity5 alias_1 1 TRUE new_tree
## entity6 entity6 alias_2 2 TRUE new_tree
## entity3 entity3 alias_3 3 TRUE tree2
## entity4 entity4 alias_4 4 TRUE tree2
## entity5 entity5 alias_5 5 TRUE tree2
In the row links, the first two rows now have new values in nodeNum and nodeLab_alias. The name in whichTree is unchanged, but the tree itself has been updated.
# FALSE is expected
identical(rowTree(sse, whichTree = "new_tree"),
rowTree(nse, whichTree = "new_tree"))
## [1] FALSE
# TRUE is expected
identical(rowTree(nse, whichTree = "new_tree"),
row_tree)
## [1] TRUE
If the nodes of the input tree and the rows of the TSE are named differently, users can match rows with nodes via changeTree by providing rowNodeLab.
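As a minimal sketch of this, assume a new tree whose tip labels (the hypothetical names taxon1, taxon2, ...) do not match the row names of the TSE. The vector passed to rowNodeLab gives, for each row in order, the label of the node that row should be mapped to. The object names tse, new_tree, node_lab and tse_new are introduced here for illustration only.

```r
library(TreeSummarizedExperiment)
library(ape)

set.seed(1)
tse <- makeTSE(include.colTree = FALSE)

# A new tree whose tip labels differ from the row names ("entity*")
new_tree <- rtree(nrow(tse))
new_tree$tip.label <- paste0("taxon", seq_len(nrow(tse)))

# One node label per row, in row order
node_lab <- paste0("taxon", seq_len(nrow(tse)))

# Replace the row tree and map rows to nodes via rowNodeLab
tse_new <- changeTree(x = tse, rowTree = new_tree, rowNodeLab = node_lab)

# Row names are kept; nodeLab now refers to the new tree
rowLinks(tse_new)
```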
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.4.4 ggtree_3.10.0
## [3] ape_5.7-1 TreeSummarizedExperiment_2.10.0
## [5] Biostrings_2.70.0 XVector_0.42.0
## [7] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [9] Biobase_2.62.0 GenomicRanges_1.54.0
## [11] GenomeInfoDb_1.38.0 IRanges_2.36.0
## [13] S4Vectors_0.40.0 BiocGenerics_0.48.0
## [15] MatrixGenerics_1.14.0 matrixStats_1.0.0
## [17] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 dplyr_1.1.3 farver_2.1.1
## [4] bitops_1.0-7 fastmap_1.1.1 lazyeval_0.2.2
## [7] RCurl_1.98-1.12 digest_0.6.33 lifecycle_1.0.3
## [10] tidytree_0.4.5 magrittr_2.0.3 compiler_4.3.1
## [13] rlang_1.1.1 sass_0.4.7 tools_4.3.1
## [16] utf8_1.2.4 yaml_2.3.7 knitr_1.44
## [19] S4Arrays_1.2.0 labeling_0.4.3 DelayedArray_0.28.0
## [22] aplot_0.2.2 abind_1.4-5 BiocParallel_1.36.0
## [25] withr_2.5.1 purrr_1.0.2 grid_4.3.1
## [28] fansi_1.0.5 colorspace_2.1-0 scales_1.2.1
## [31] cli_3.6.1 rmarkdown_2.25 crayon_1.5.2
## [34] treeio_1.26.0 generics_0.1.3 cachem_1.0.8
## [37] zlibbioc_1.48.0 parallel_4.3.1 ggplotify_0.1.2
## [40] BiocManager_1.30.22 vctrs_0.6.4 yulab.utils_0.1.0
## [43] Matrix_1.6-1.1 jsonlite_1.8.7 bookdown_0.36
## [46] gridGraphics_0.5-1 patchwork_1.1.3 magick_2.8.1
## [49] jquerylib_0.1.4 tidyr_1.3.0 glue_1.6.2
## [52] codetools_0.2-19 gtable_0.3.4 munsell_0.5.0
## [55] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.6.1
## [58] GenomeInfoDbData_1.2.11 R6_2.5.1 evaluate_0.22
## [61] lattice_0.22-5 memoise_2.0.1 ggfun_0.1.3
## [64] bslib_0.5.1 Rcpp_1.0.11 SparseArray_1.2.0
## [67] nlme_3.1-163 xfun_0.40 fs_1.6.3
## [70] pkgconfig_2.0.3