Brings SummarizedExperiment to the tidyverse!
website: stemangiola.github.io/tidySummarizedExperiment/
tidySummarizedExperiment provides a bridge between Bioconductor SummarizedExperiment [@morgan2020summarized] and the tidyverse [@wickham2019welcome]. It adds an invisible layer that lets you view a Bioconductor SummarizedExperiment object as a tidyverse tibble, and it provides SummarizedExperiment-compatible dplyr, tidyr, ggplot2 and plotly functions. Users get the best of both the Bioconductor and tidyverse worlds.

| SummarizedExperiment-compatible Functions | Description |
|---|---|
| `all` | After all, a tidySummarizedExperiment is a SummarizedExperiment object, just better |

| tidyverse Packages | Description |
|---|---|
| `dplyr` | Almost all `dplyr` APIs, as for any tibble |
| `tidyr` | Almost all `tidyr` APIs, as for any tibble |
| `ggplot2` | `ggplot`, as for any tibble |
| `plotly` | `plot_ly`, as for any tibble |

| Utilities | Description |
|---|---|
| `tidy` | Add the tidySummarizedExperiment invisible layer over a SummarizedExperiment object |
| `as_tibble` | Convert the sample- and transcript-wise information to a `tbl_df` |

if (!requireNamespace("BiocManager", quietly=TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("tidySummarizedExperiment")
From GitHub (development)
devtools::install_github("stemangiola/tidySummarizedExperiment")
Load libraries used in the examples.
library(ggplot2)
library(tidySummarizedExperiment)
tidySummarizedExperiment, the best of both worlds!
This is a SummarizedExperiment object, but it is evaluated as a tibble, so it is fully compatible with both the SummarizedExperiment and tidyverse APIs.
pasilla_tidy <- tidySummarizedExperiment::pasilla %>%
tidy()
It looks like a tibble
pasilla_tidy
## # A tibble abstraction: 102,193 x 5
## sample condition type transcript counts
## <chr> <chr> <chr> <chr> <int>
## 1 untrt1 untreated single_end FBgn0000003 0
## 2 untrt1 untreated single_end FBgn0000008 92
## 3 untrt1 untreated single_end FBgn0000014 5
## 4 untrt1 untreated single_end FBgn0000015 0
## 5 untrt1 untreated single_end FBgn0000017 4664
## 6 untrt1 untreated single_end FBgn0000018 583
## 7 untrt1 untreated single_end FBgn0000022 0
## 8 untrt1 untreated single_end FBgn0000024 10
## 9 untrt1 untreated single_end FBgn0000028 0
## 10 untrt1 untreated single_end FBgn0000032 1446
## # … with 40 more rows
But it is a SummarizedExperiment object after all
Assays(pasilla_tidy)
## An object of class "SimpleAssays"
## Slot "data":
## List of length 1
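As a minimal sketch of that compatibility, the standard SummarizedExperiment accessors can be called on the tidy object. This assumes the package version shown in the session info at the end of this document, where `tidy()` adds the layer; output is omitted because it may differ between versions.

```r
library(SummarizedExperiment)
library(tidySummarizedExperiment)

# Rebuild the object used throughout this document
pasilla_tidy <- tidy(tidySummarizedExperiment::pasilla)

# Standard SummarizedExperiment accessors
dim(pasilla_tidy)          # transcripts x samples
assayNames(pasilla_tidy)   # names of the stored assays
colData(pasilla_tidy)      # sample-level annotation (condition, type)
head(assay(pasilla_tidy))  # first rows of the first assay matrix
```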
We can use tidyverse commands to explore the tidy SummarizedExperiment object.
We can use `slice` to choose rows by position, for example the first row.
pasilla_tidy %>%
slice(1)
## # A tibble abstraction: 1 x 5
## sample condition type transcript counts
## <chr> <chr> <chr> <chr> <int>
## 1 untrt1 untreated single_end FBgn0000003 0
We can use `filter` to choose rows by criteria.
pasilla_tidy %>%
filter(condition == "untreated")
## # A tibble abstraction: 58,396 x 5
## sample condition type transcript counts
## <chr> <chr> <chr> <chr> <int>
## 1 untrt1 untreated single_end FBgn0000003 0
## 2 untrt1 untreated single_end FBgn0000008 92
## 3 untrt1 untreated single_end FBgn0000014 5
## 4 untrt1 untreated single_end FBgn0000015 0
## 5 untrt1 untreated single_end FBgn0000017 4664
## 6 untrt1 untreated single_end FBgn0000018 583
## 7 untrt1 untreated single_end FBgn0000022 0
## 8 untrt1 untreated single_end FBgn0000024 10
## 9 untrt1 untreated single_end FBgn0000028 0
## 10 untrt1 untreated single_end FBgn0000032 1446
## # … with 40 more rows
We can use `select` to choose columns.
pasilla_tidy %>%
select(sample)
## # A tibble: 102,193 x 1
## sample
## <chr>
## 1 untrt1
## 2 untrt1
## 3 untrt1
## 4 untrt1
## 5 untrt1
## 6 untrt1
## 7 untrt1
## 8 untrt1
## 9 untrt1
## 10 untrt1
## # … with 102,183 more rows
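Note that the header of this output reads `A tibble` rather than `A tibble abstraction`. When a plain tibble with all columns is wanted, the `as_tibble` utility listed earlier can be used. The sketch below calls it as `tibble::as_tibble()`, on the assumption that the package registers a method for the tidy object.

```r
library(tidySummarizedExperiment)

# Rebuild the object used throughout this document
pasilla_tidy <- tidy(tidySummarizedExperiment::pasilla)

# Drop the SummarizedExperiment layer and keep a plain tibble
pasilla_tbl <- tibble::as_tibble(pasilla_tidy)

class(pasilla_tbl)  # should include "tbl_df"
nrow(pasilla_tbl)   # one row per sample/transcript pair
```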
We can use `count` to count how many rows we have for each sample.
pasilla_tidy %>%
count(sample)
## # A tibble: 7 x 2
## sample n
## <chr> <int>
## 1 trt1 14599
## 2 trt2 14599
## 3 trt3 14599
## 4 untrt1 14599
## 5 untrt2 14599
## 6 untrt3 14599
## 7 untrt4 14599
We can use `distinct` to see what distinct sample information we have.
pasilla_tidy %>%
distinct(sample, condition, type)
## # A tibble: 7 x 3
## sample condition type
## <chr> <chr> <chr>
## 1 untrt1 untreated single_end
## 2 untrt2 untreated single_end
## 3 untrt3 untreated paired_end
## 4 untrt4 untreated paired_end
## 5 trt1 treated single_end
## 6 trt2 treated paired_end
## 7 trt3 treated paired_end
We could use `rename` to rename a column, for example the `type` column.
pasilla_tidy %>%
rename(sequencing=type)
## # A tibble abstraction: 102,193 x 5
## sample condition sequencing transcript counts
## <chr> <chr> <chr> <chr> <int>
## 1 untrt1 untreated single_end FBgn0000003 0
## 2 untrt1 untreated single_end FBgn0000008 92
## 3 untrt1 untreated single_end FBgn0000014 5
## 4 untrt1 untreated single_end FBgn0000015 0
## 5 untrt1 untreated single_end FBgn0000017 4664
## 6 untrt1 untreated single_end FBgn0000018 583
## 7 untrt1 untreated single_end FBgn0000022 0
## 8 untrt1 untreated single_end FBgn0000024 10
## 9 untrt1 untreated single_end FBgn0000028 0
## 10 untrt1 untreated single_end FBgn0000032 1446
## # … with 40 more rows
We could use `mutate` to create or modify a column. For example, we could overwrite the `type` column so that it contains `single` and `paired` instead of `single_end` and `paired_end`.
pasilla_tidy %>%
mutate(type=gsub("_end", "", type))
## # A tibble abstraction: 102,193 x 5
## sample condition type transcript counts
## <chr> <chr> <chr> <chr> <int>
## 1 untrt1 untreated single FBgn0000003 0
## 2 untrt1 untreated single FBgn0000008 92
## 3 untrt1 untreated single FBgn0000014 5
## 4 untrt1 untreated single FBgn0000015 0
## 5 untrt1 untreated single FBgn0000017 4664
## 6 untrt1 untreated single FBgn0000018 583
## 7 untrt1 untreated single FBgn0000022 0
## 8 untrt1 untreated single FBgn0000024 10
## 9 untrt1 untreated single FBgn0000028 0
## 10 untrt1 untreated single FBgn0000032 1446
## # … with 40 more rows
We could use `unite` to combine multiple columns into a single column.
pasilla_tidy %>%
unite("group", c(condition, type))
## # A tibble abstraction: 102,193 x 4
## sample group transcript counts
## <chr> <chr> <chr> <int>
## 1 untrt1 untreated_single_end FBgn0000003 0
## 2 untrt1 untreated_single_end FBgn0000008 92
## 3 untrt1 untreated_single_end FBgn0000014 5
## 4 untrt1 untreated_single_end FBgn0000015 0
## 5 untrt1 untreated_single_end FBgn0000017 4664
## 6 untrt1 untreated_single_end FBgn0000018 583
## 7 untrt1 untreated_single_end FBgn0000022 0
## 8 untrt1 untreated_single_end FBgn0000024 10
## 9 untrt1 untreated_single_end FBgn0000028 0
## 10 untrt1 untreated_single_end FBgn0000032 1446
## # … with 40 more rows
We can also combine commands with the tidyverse pipe `%>%`.
For example, we could combine `group_by` and `summarise` to get the total counts for each sample.
pasilla_tidy %>%
group_by(sample) %>%
summarise(total_counts=sum(counts))
## # A tibble: 7 x 2
## sample total_counts
## <chr> <int>
## 1 trt1 18670279
## 2 trt2 9571826
## 3 trt3 10343856
## 4 untrt1 13972512
## 5 untrt2 21911438
## 6 untrt3 8358426
## 7 untrt4 9841335
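The same totals can be computed the Bioconductor way, which makes a convenient cross-check. This is a sketch only: it assumes the first assay holds the counts shown above and that the columns are named `sample` and `counts` as in this version of the package.

```r
library(SummarizedExperiment)
library(tidySummarizedExperiment)

# Rebuild the object used throughout this document
pasilla_tidy <- tidy(tidySummarizedExperiment::pasilla)

# Bioconductor route: column sums of the first assay matrix
se_totals <- colSums(assay(pasilla_tidy))

# tidyverse route: the group_by/summarise pipeline shown above
tidy_totals <- pasilla_tidy %>%
    group_by(sample) %>%
    summarise(total_counts=sum(counts))

# Compare the two results sample by sample
all(se_totals[tidy_totals$sample] == tidy_totals$total_counts)
```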
We could combine `group_by`, `mutate` and `filter` to get the transcripts with mean count > 0.
pasilla_tidy %>%
group_by(transcript) %>%
mutate(mean_count=mean(counts)) %>%
filter(mean_count > 0)
## # A tibble: 86,513 x 6
## # Groups: transcript [12,359]
## sample condition type transcript counts mean_count
## <chr> <chr> <chr> <chr> <int> <dbl>
## 1 untrt1 untreated single_end FBgn0000003 0 0.143
## 2 untrt1 untreated single_end FBgn0000008 92 99.6
## 3 untrt1 untreated single_end FBgn0000014 5 1.43
## 4 untrt1 untreated single_end FBgn0000015 0 0.857
## 5 untrt1 untreated single_end FBgn0000017 4664 4672.
## 6 untrt1 untreated single_end FBgn0000018 583 461.
## 7 untrt1 untreated single_end FBgn0000022 0 0.143
## 8 untrt1 untreated single_end FBgn0000024 10 7
## 9 untrt1 untreated single_end FBgn0000028 0 0.429
## 10 untrt1 untreated single_end FBgn0000032 1446 1085.
## # … with 86,503 more rows
my_theme <-
list(
scale_fill_brewer(palette="Set1"),
scale_color_brewer(palette="Set1"),
theme_bw() +
theme(
panel.border=element_blank(),
axis.line=element_line(),
panel.grid.major=element_line(size=0.2),
panel.grid.minor=element_line(size=0.1),
text=element_text(size=12),
legend.position="bottom",
aspect.ratio=1,
strip.background=element_blank(),
axis.title.x=element_text(margin=margin(t=10, r=10, b=10, l=10)),
axis.title.y=element_text(margin=margin(t=10, r=10, b=10, l=10))
)
)
We can treat `pasilla_tidy` as a normal tibble for plotting.
Here we plot the distribution of counts per sample.
pasilla_tidy %>%
tidySummarizedExperiment::ggplot(aes(counts + 1, group=sample, color=`type`)) +
geom_density() +
scale_x_log10() +
my_theme
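Summaries can be plotted in the same way. The sketch below reuses the `group_by`/`summarise` pattern from earlier to draw the total counts per sample as a bar chart. The names `library_sizes` and `p` are illustrative, and `theme_bw()` stands in for `my_theme` to keep the example self-contained.

```r
library(ggplot2)
library(tidySummarizedExperiment)

# Rebuild the object used throughout this document
pasilla_tidy <- tidy(tidySummarizedExperiment::pasilla)

# Total counts per sample, keeping condition for the fill colour
library_sizes <- pasilla_tidy %>%
    group_by(sample, condition) %>%
    summarise(total_counts=sum(counts))

# The summary is passed to ggplot2 like any other data frame
p <- ggplot(library_sizes, aes(sample, total_counts, fill=condition)) +
    geom_col() +
    theme_bw()
p
```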
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] tidySummarizedExperiment_1.0.0 SummarizedExperiment_1.20.0
## [3] Biobase_2.50.0 GenomicRanges_1.42.0
## [5] GenomeInfoDb_1.26.0 IRanges_2.24.0
## [7] S4Vectors_0.28.0 BiocGenerics_0.36.0
## [9] MatrixGenerics_1.2.0 matrixStats_0.57.0
## [11] ggplot2_3.3.2 knitr_1.30
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.18 purrr_0.3.4
## [4] lattice_0.20-41 colorspace_1.4-1 vctrs_0.3.4
## [7] generics_0.0.2 htmltools_0.5.0 viridisLite_0.3.0
## [10] utf8_1.1.4 plotly_4.9.2.1 rlang_0.4.8
## [13] pillar_1.4.6 glue_1.4.2 withr_2.3.0
## [16] RColorBrewer_1.1-2 GenomeInfoDbData_1.2.4 lifecycle_0.2.0
## [19] stringr_1.4.0 zlibbioc_1.36.0 munsell_0.5.0
## [22] gtable_0.3.0 htmlwidgets_1.5.2 evaluate_0.14
## [25] labeling_0.4.2 fansi_0.4.1 highr_0.8
## [28] scales_1.1.1 DelayedArray_0.16.0 jsonlite_1.7.1
## [31] XVector_0.30.0 farver_2.0.3 digest_0.6.27
## [34] stringi_1.5.3 dplyr_1.0.2 grid_4.0.3
## [37] cli_2.1.0 tools_4.0.3 bitops_1.0-6
## [40] magrittr_1.5 RCurl_1.98-1.2 lazyeval_0.2.2
## [43] tibble_3.0.4 crayon_1.3.4 tidyr_1.1.2
## [46] pkgconfig_2.0.3 ellipsis_0.3.1 Matrix_1.2-18
## [49] data.table_1.13.2 assertthat_0.2.1 httr_1.4.2
## [52] R6_2.4.1 compiler_4.0.3