MultiAssayExperiment
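colData - biological units
A DataFrame describing the characteristics of the biological units. In The Cancer Genome Atlas data, for example, the biological units are patients.
Key points: one row per biological unit. The row names (here, patient names) are the identifiers that the sampleMap 'primary' column refers to.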
library(MultiAssayExperiment)
pheno <- DataFrame(id = 1:4, type = c("a", "a", "b", "b"),
                   sex = c("M", "F", "M", "F"),
                   row.names = c("Bob", "Sandy", "Jake", "Lauren"))
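ExperimentList - experiment data
A base list or ExperimentList object containing the experimental datasets for the set of samples collected. A base list is converted into an ExperimentList during construction.
Key points: each dataset is a matrix-like object that supports the subsetting ([), dimnames, and dim methods.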
dataset1 <- matrix(rnorm(20, 5, 1), ncol = 5,
                   dimnames = list(paste0("GENE", 4:1),
                                   paste0("sample", LETTERS[1:5])))
dataset2 <- matrix(rnorm(12, 3, 2), ncol = 3,
                   dimnames = list(paste0("ENST0000", 1:4),
                                   paste0("samp", letters[1:3])))
expList <- list(exp1 = dataset1, exp2 = dataset2)
expList
## $exp1
## sampleA sampleB sampleC sampleD sampleE
## GENE4 3.720379 4.265752 4.162123 4.733160 4.088411
## GENE3 4.611418 5.330292 3.975553 3.765205 3.353615
## GENE2 5.756995 4.737932 5.380220 5.267235 4.974819
## GENE1 4.578920 4.355404 2.792270 3.555878 4.809017
##
## $exp2
## sampa sampb sampc
## ENST00001 6.374843 3.665216509 5.03255396
## ENST00002 2.078270 7.911929690 0.01464498
## ENST00003 3.923976 5.670601129 -2.35608685
## ENST00004 3.141602 -0.009088183 2.64769758
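The key-point requirement can be checked roughly before construction. The helper isMatrixLike() below is hypothetical and is not part of MultiAssayExperiment. It is a minimal base R sketch that tests for two dimensions, column names (which the sampleMap 'colname' entries must match), and a subsetting method that keeps a two-dimensional result.

```r
## Hypothetical helper (not part of MultiAssayExperiment): a rough check that
## an object behaves like a matrix for the purposes of an ExperimentList.
isMatrixLike <- function(x) {
    d <- dim(x)
    # two dimensions are required
    if (length(d) != 2L)
        return(FALSE)
    # column names are needed so that sampleMap 'colname' entries can match
    if (is.null(colnames(x)))
        return(FALSE)
    # subsetting with [ must keep a two-dimensional result
    sub <- x[seq_len(min(1L, d[1L])), seq_len(min(1L, d[2L])), drop = FALSE]
    length(dim(sub)) == 2L
}

m <- matrix(rnorm(20, 5, 1), ncol = 5,
            dimnames = list(paste0("GENE", 4:1), paste0("sample", LETTERS[1:5])))
isMatrixLike(m)                 # TRUE: a named matrix passes
isMatrixLike(as.data.frame(m))  # TRUE: so does a data.frame with column names
isMatrixLike(1:10)              # FALSE: a plain vector has no dim()
```

The check is deliberately loose. Other S4 containers may need their own dim, dimnames, and [ methods to pass it.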
sampleMap - relationship graph
A DataFrame representing the graph of relationships between the experiments (assay column), the biological units (primary column), and the samples (colname column). Helper functions are available for creating a map from a list; see ?listToMap.
Key points:
* relates experimental observations (colnames) to colData
* permits experiment-specific sample naming, missing observations, and replicate observations
map1 <- DataFrame(primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren"),
                  colname = paste0("sample", LETTERS[1:5]))
map2 <- DataFrame(primary = c("Jake", "Sandy", "Lauren"),
                  colname = paste0("samp", letters[1:3]))
sampMap <- listToMap(list(exp1 = map1, exp2 = map2))
sampMap
## DataFrame with 8 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 exp1 Bob sampleA
## 2 exp1 Jake sampleB
## 3 exp1 Sandy sampleC
## 4 exp1 Sandy sampleD
## 5 exp1 Lauren sampleE
## 6 exp2 Jake sampa
## 7 exp2 Sandy sampb
## 8 exp2 Lauren sampc
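The printed map can be read as per-experiment maps stacked on top of each other, with an 'assay' column filled in from the list names. The sketch below reproduces that stacking with base R data.frames. stackMaps() is a hypothetical helper written for illustration; it is not the listToMap source code, which works on DataFrame objects.

```r
## Hypothetical sketch of the stacking step; not the listToMap source code.
stackMaps <- function(maps) {
    pieces <- lapply(names(maps), function(nm) {
        # prefix each per-experiment map with its experiment name
        data.frame(assay = nm, maps[[nm]], stringsAsFactors = FALSE)
    })
    out <- do.call(rbind, pieces)
    # keep experiments in list order, as the printed 'assay' factor does
    out$assay <- factor(out$assay, levels = names(maps))
    rownames(out) <- NULL
    out
}

map1 <- data.frame(primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren"),
                   colname = paste0("sample", LETTERS[1:5]),
                   stringsAsFactors = FALSE)
map2 <- data.frame(primary = c("Jake", "Sandy", "Lauren"),
                   colname = paste0("samp", letters[1:3]),
                   stringsAsFactors = FALSE)
sm <- stackMaps(list(exp1 = map1, exp2 = map2))
sm  # 8 rows: assay, primary, colname
```

Sandy appears twice for exp1 and Bob has no exp2 column. This is how one map table expresses replicates and missing observations.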
The MultiAssayExperiment constructor function can take three arguments:
* experiments - An ExperimentList or list of data
* colData - A DataFrame describing the biological units
* sampleMap - A DataFrame of assay, primary, and colname identifiers
(mae <- MultiAssayExperiment(expList, pheno, sampMap))
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] exp1: matrix with 4 rows and 5 columns
## [2] exp2: matrix with 4 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
Subsetting with [
In the pseudocode below, the subsetting operations act on the following indices:
1. i: experimental data rows
2. j: the primary names or the column names (entered as a list or List)
3. k: assay
multiassayexperiment[i = rownames, j = primary or colnames, k = assay]
Examples:
mae[c("GENE4", "ENST00002"), , ]
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] exp1: matrix with 1 rows and 5 columns
## [2] exp2: matrix with 1 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
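Row subsetting by name applies the same row identifiers to every experiment and keeps whichever rows each experiment contains. This is why each experiment above is left with one row. The base R sketch below illustrates that idea on a plain list of matrices. subsetRowsByName() is hypothetical and is not how MultiAssayExperiment implements [.

```r
## Hypothetical sketch, not MultiAssayExperiment internals: keep only the
## requested row names in every experiment, silently skipping absent ones.
subsetRowsByName <- function(expList, ids) {
    lapply(expList, function(x) x[rownames(x) %in% ids, , drop = FALSE])
}

expList <- list(
    exp1 = matrix(rnorm(20), ncol = 5,
                  dimnames = list(paste0("GENE", 4:1), paste0("sample", LETTERS[1:5]))),
    exp2 = matrix(rnorm(12), ncol = 3,
                  dimnames = list(paste0("ENST0000", 1:4), paste0("samp", letters[1:3])))
)
res <- subsetRowsByName(expList, c("GENE4", "ENST00002"))
lapply(res, dim)  # exp1: 1 x 5, exp2: 1 x 3
```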
mae[, c("Bob", "Jake", "Sandy"), ]
## harmonizing input:
## removing 2 sampleMap rows with 'colname' not in colnames of experiments
## removing 1 colData rownames not in sampleMap 'primary'
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] exp1: matrix with 4 rows and 4 columns
## [2] exp2: matrix with 4 rows and 2 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
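Subsetting by primary names goes through the sampleMap. Each requested biological unit is translated into the matching column names of each experiment. In mae[, c("Bob", "Jake", "Sandy"), ], the two dropped sampleMap rows are Lauren's, which leaves 4 columns in exp1 and 2 in exp2. The base R sketch below shows that lookup. colsForPrimary() is a hypothetical helper and not the package's implementation.

```r
## Hypothetical helper: which columns of each experiment belong to the
## requested primary (biological unit) identifiers?
colsForPrimary <- function(sampleMap, primaries) {
    keep <- sampleMap[sampleMap$primary %in% primaries, , drop = FALSE]
    # one character vector of column names per experiment
    split(keep$colname, keep$assay)
}

sampleMap <- data.frame(
    assay   = factor(rep(c("exp1", "exp2"), c(5, 3))),
    primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren", "Jake", "Sandy", "Lauren"),
    colname = c(paste0("sample", LETTERS[1:5]), paste0("samp", letters[1:3])),
    stringsAsFactors = FALSE
)
colsForPrimary(sampleMap, c("Bob", "Jake", "Sandy"))
# exp1: sampleA sampleB sampleC sampleD; exp2: sampa sampb
```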
mae[, , "exp1"]
## A MultiAssayExperiment object of 1 listed
## experiment with a user-defined name and respective class.
## Containing an ExperimentList class object of length 1:
## [1] exp1: matrix with 4 rows and 5 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
Extraction with [[
The “double bracket” method ([[) is a convenience function for extracting a single element of the MultiAssayExperiment ExperimentList. It avoids the need to write experiments(mae)[[1L]]. For example:
mae[[1L]]
## sampleA sampleB sampleC sampleD sampleE
## GENE4 3.720379 4.265752 4.162123 4.733160 4.088411
## GENE3 4.611418 5.330292 3.975553 3.765205 3.353615
## GENE2 5.756995 4.737932 5.380220 5.267235 4.974819
## GENE1 4.578920 4.355404 2.792270 3.555878 4.809017
This extracts the first experiment in the ExperimentList, in the class in which it was stored (here, a matrix).
The assay and assays methods follow the SummarizedExperiment convention. The assay (singular) method extracts the first element of the ExperimentList and returns a matrix.
assay(mae)
## sampleA sampleB sampleC sampleD sampleE
## GENE4 3.720379 4.265752 4.162123 4.733160 4.088411
## GENE3 4.611418 5.330292 3.975553 3.765205 3.353615
## GENE2 5.756995 4.737932 5.380220 5.267235 4.974819
## GENE1 4.578920 4.355404 2.792270 3.555878 4.809017
The assays (plural) method returns a SimpleList of the data, with each element being a matrix.
assays(mae)
## List of length 2
## names(2): exp1 exp2
Each slot in the MultiAssayExperiment has its own accessor function, as listed in the table below.

Slot | Accessor |
---|---|
ExperimentList | experiments |
colData | colData, $ * |
sampleMap | sampleMap |
metadata | metadata |
* The $ operator on a MultiAssayExperiment returns a single column of colData. For example:
mae$sex
## [1] "M" "F" "M" "F"
longFormat & wideFormat
The longFormat and wideFormat functions reshape and combine the data from all experiments into a single DataFrame, in long or wide format respectively.
longFormat(mae)
## DataFrame with 32 rows and 5 columns
## assay rowname colname value primary
## <Rle> <character> <Rle> <numeric> <Rle>
## 1 exp1 GENE4 sampleA 3.720379 Bob
## 2 exp1 GENE3 sampleA 4.611418 Bob
## 3 exp1 GENE2 sampleA 5.756995 Bob
## 4 exp1 GENE1 sampleA 4.578920 Bob
## 5 exp1 GENE4 sampleB 4.265752 Jake
## ... ... ... ... ... ...
## 28 exp2 ENST00004 sampb -0.009088183 Sandy
## 29 exp2 ENST00001 sampc 5.032553957 Lauren
## 30 exp2 ENST00002 sampc 0.014644979 Lauren
## 31 exp2 ENST00003 sampc -2.356086852 Lauren
## 32 exp2 ENST00004 sampc 2.647697584 Lauren
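The long layout has one row per measured value, labeled with its assay, row name, column name, and the primary unit looked up in the sampleMap. The base R sketch below builds the same kind of table from a plain list of matrices. toLong() is a hypothetical helper, not the longFormat source, and it returns a data.frame rather than a DataFrame.

```r
## Hypothetical sketch of long-format reshaping; not the longFormat source.
toLong <- function(expList, sampleMap) {
    pieces <- lapply(names(expList), function(nm) {
        m <- expList[[nm]]
        data.frame(assay   = nm,
                   # as.vector() reads a matrix column by column, so row names
                   # cycle fastest and column names repeat once per row
                   rowname = rep(rownames(m), times = ncol(m)),
                   colname = rep(colnames(m), each = nrow(m)),
                   value   = as.vector(m),
                   stringsAsFactors = FALSE)
    })
    long <- do.call(rbind, pieces)
    # look up each (assay, colname) pair in the sampleMap to attach 'primary'
    idx <- match(paste(long$assay, long$colname),
                 paste(sampleMap$assay, sampleMap$colname))
    long$primary <- sampleMap$primary[idx]
    long
}

expList <- list(
    exp1 = matrix(rnorm(20, 5, 1), ncol = 5,
                  dimnames = list(paste0("GENE", 4:1), paste0("sample", LETTERS[1:5]))),
    exp2 = matrix(rnorm(12, 3, 2), ncol = 3,
                  dimnames = list(paste0("ENST0000", 1:4), paste0("samp", letters[1:3])))
)
sampleMap <- data.frame(
    assay   = rep(c("exp1", "exp2"), c(5, 3)),
    primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren", "Jake", "Sandy", "Lauren"),
    colname = c(paste0("sample", LETTERS[1:5]), paste0("samp", letters[1:3])),
    stringsAsFactors = FALSE
)
long <- toLong(expList, sampleMap)
dim(long)  # 32 5
```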
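For a wide dataset, use the wideFormat function. Each row is one biological unit (primary) and each column is one assay/feature/sample combination. Combinations that were not measured for a unit are NA.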
wideFormat(mae)[, 1:4]
## DataFrame with 4 rows and 4 columns
## primary exp1_GENE1_sampleA exp1_GENE1_sampleB exp1_GENE1_sampleC
## <factor> <numeric> <numeric> <numeric>
## 1 Bob 4.57892 NA NA
## 2 Jake NA 4.355404 NA
## 3 Lauren NA NA NA
## 4 Sandy NA NA 2.79227
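The wide layout can be produced from a long table by making one row per primary unit and one column per assay_rowname_colname key. This mirrors the printed column names and explains the many NA cells. toWide() is a hypothetical base R sketch, not how wideFormat is implemented. Its input uses made-up toy values (1 to 6).

```r
## Hypothetical sketch of the long-to-wide step; not the wideFormat source.
toWide <- function(long) {
    # one column per assay/feature/sample combination
    key  <- paste(long$assay, long$rowname, long$colname, sep = "_")
    prim <- sort(unique(long$primary))
    keys <- sort(unique(key))
    wide <- matrix(NA_real_, nrow = length(prim), ncol = length(keys),
                   dimnames = list(NULL, keys))
    # fill each measured cell; combinations never observed stay NA
    wide[cbind(match(long$primary, prim), match(key, keys))] <- long$value
    data.frame(primary = prim, wide, check.names = FALSE,
               stringsAsFactors = FALSE)
}

long <- data.frame(
    assay   = "exp1",
    rowname = rep(c("GENE4", "GENE1"), times = 3),
    colname = rep(c("sampleA", "sampleB", "sampleC"), each = 2),
    value   = as.numeric(1:6),  # toy values
    primary = rep(c("Bob", "Jake", "Sandy"), each = 2),
    stringsAsFactors = FALSE
)
w <- toWide(long)
w  # 3 rows (Bob, Jake, Sandy) and 7 columns; each row has 2 non-NA values
```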
c - combine
The c function allows the user to add an experiment to an existing MultiAssayExperiment.
A sampleMap can be provided in order to map colData rows to the new experiment's column names. In the following example, the “exp3” experiment contains repeated measurements for Bob.
(maec1 <- c(x = mae,
            exp3 = matrix(rnorm(10), ncol = 5,
                          dimnames = list(paste0("GENE", c("A", "B")),
                                          paste0("sample", LETTERS[1:5]))),
            sampleMap = DataFrame(assay = "exp3",
                                  primary = c("Bob", "Bob", "Sandy", "Jake", "Lauren"),
                                  colname = paste0("sample", LETTERS[1:5]))
))
## A MultiAssayExperiment object of 3 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 3:
## [1] exp1: matrix with 4 rows and 5 columns
## [2] exp2: matrix with 4 rows and 3 columns
## [3] exp3: matrix with 2 rows and 5 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
sampleMap(maec1)
## DataFrame with 13 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 exp1 Bob sampleA
## 2 exp1 Jake sampleB
## 3 exp1 Sandy sampleC
## 4 exp1 Sandy sampleD
## 5 exp1 Lauren sampleE
## ... ... ... ...
## 9 exp3 Bob sampleA
## 10 exp3 Bob sampleB
## 11 exp3 Sandy sampleC
## 12 exp3 Jake sampleD
## 13 exp3 Lauren sampleE
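The combined map is the original 8 rows plus the 5 rows supplied for exp3. The base R sketch below shows that bookkeeping, together with the sanity check that every new 'primary' is already a colData row name. appendToMap() is a hypothetical helper written for illustration; it is not the c method's implementation.

```r
## Hypothetical sketch: append a new experiment's rows to an existing map
## after checking that its primary IDs are known biological units.
appendToMap <- function(sampleMap, newMap, primaries) {
    # every new 'primary' must already be a colData row name
    unknown <- setdiff(newMap$primary, primaries)
    if (length(unknown) > 0L)
        stop("primary IDs not in colData: ", paste(unknown, collapse = ", "))
    # work with character 'assay' values so rbind() does not trip over factors
    sampleMap$assay <- as.character(sampleMap$assay)
    newMap$assay <- as.character(newMap$assay)
    out <- rbind(sampleMap, newMap)
    # restore the factor, keeping experiments in order of appearance
    out$assay <- factor(out$assay, levels = unique(out$assay))
    rownames(out) <- NULL
    out
}

sampleMap <- data.frame(
    assay   = rep(c("exp1", "exp2"), c(5, 3)),
    primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren", "Jake", "Sandy", "Lauren"),
    colname = c(paste0("sample", LETTERS[1:5]), paste0("samp", letters[1:3])),
    stringsAsFactors = FALSE
)
newMap <- data.frame(assay = "exp3",
                     primary = c("Bob", "Bob", "Sandy", "Jake", "Lauren"),
                     colname = paste0("sample", LETTERS[1:5]),
                     stringsAsFactors = FALSE)
combined <- appendToMap(sampleMap, newMap,
                        primaries = c("Bob", "Sandy", "Jake", "Lauren"))
nrow(combined)  # 13
```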
For convenience, the mapFrom argument allows the user to map from a particular experiment, provided that the order of the colnames is the same. A warning is issued to make the user aware of this assumption.
(maec2 <- c(x = mae,
            exp3 = matrix(rnorm(10), ncol = 5,
                          dimnames = list(paste0("GENE", c("A", "B")),
                                          paste0("sample", LETTERS[1:5]))),
            mapFrom = 1L))
## Warning in .local(x, ...): Assuming column order in the data provided
## matches the order in 'mapFrom' experiment(s) colnames
## A MultiAssayExperiment object of 3 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 3:
## [1] exp1: matrix with 4 rows and 5 columns
## [2] exp2: matrix with 4 rows and 3 columns
## [3] exp3: matrix with 2 rows and 5 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
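The mapFrom shortcut can be pictured as copying the 'primary' column of an existing experiment's map, position by position, onto the new column names. That is exactly the column-order assumption the warning mentions. mapFromExperiment() below is a hypothetical base R sketch of the idea, not the c method's source. Unlike mapFrom = 1L, it identifies the source experiment by name.

```r
## Hypothetical sketch of the mapFrom idea; not the c() method's source.
mapFromExperiment <- function(sampleMap, from, newAssay, newColnames) {
    src <- sampleMap[sampleMap$assay == from, , drop = FALSE]
    if (nrow(src) != length(newColnames))
        stop("'", from, "' has ", nrow(src), " columns but ",
             length(newColnames), " new colnames were given")
    # the same assumption the package warns about: matching column order
    warning("assuming new columns follow the column order of '", from, "'")
    data.frame(assay = newAssay, primary = src$primary,
               colname = newColnames, stringsAsFactors = FALSE)
}

sampleMap <- data.frame(
    assay   = rep(c("exp1", "exp2"), c(5, 3)),
    primary = c("Bob", "Jake", "Sandy", "Sandy", "Lauren", "Jake", "Sandy", "Lauren"),
    colname = c(paste0("sample", LETTERS[1:5]), paste0("samp", letters[1:3])),
    stringsAsFactors = FALSE
)
newMap <- suppressWarnings(
    mapFromExperiment(sampleMap, from = "exp1", newAssay = "exp3",
                      newColnames = paste0("sample", LETTERS[1:5])))
newMap$primary  # "Bob" "Jake" "Sandy" "Sandy" "Lauren"
```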
prepMultiAssay - Constructor function helper
The prepMultiAssay function helps the user diagnose common problems when creating a MultiAssayExperiment object. See ?prepMultiAssay for more details.
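Typical problems are identifiers that do not line up across the three inputs. The base R sketch below reports three such mismatches: experiment columns missing from the sampleMap, sampleMap rows pointing at non-existent columns, and 'primary' values that are not colData row names. diagnoseInputs() is a hypothetical helper loosely in the spirit of prepMultiAssay; it is not its implementation and does not reproduce its output.

```r
## Hypothetical diagnostic helper, NOT the prepMultiAssay implementation:
## report identifiers that do not line up across the three inputs.
diagnoseInputs <- function(expList, colDataRownames, sampleMap) {
    expCols <- unlist(lapply(names(expList), function(nm)
        paste(nm, colnames(expList[[nm]]), sep = ":")), use.names = FALSE)
    mapCols <- paste(sampleMap$assay, sampleMap$colname, sep = ":")
    list(
        # experiment columns that no sampleMap row describes
        unmappedColumns = setdiff(expCols, mapCols),
        # sampleMap rows pointing at columns that do not exist
        danglingMapRows = setdiff(mapCols, expCols),
        # sampleMap 'primary' values missing from colData row names
        unknownPrimary  = setdiff(unique(sampleMap$primary), colDataRownames)
    )
}

expList <- list(
    exp1 = matrix(0, nrow = 2, ncol = 5,
                  dimnames = list(c("GENE1", "GENE2"), paste0("sample", LETTERS[1:5]))),
    exp2 = matrix(0, nrow = 2, ncol = 3,
                  dimnames = list(c("ENST00001", "ENST00002"), paste0("samp", letters[1:3])))
)
## deliberately flawed map: no row for exp1 'sampleE', and 'Lauran' is a typo
badMap <- data.frame(
    assay   = rep(c("exp1", "exp2"), c(4, 3)),
    primary = c("Bob", "Jake", "Sandy", "Sandy", "Jake", "Sandy", "Lauran"),
    colname = c(paste0("sample", LETTERS[1:4]), paste0("samp", letters[1:3])),
    stringsAsFactors = FALSE
)
diagnoseInputs(expList, c("Bob", "Sandy", "Jake", "Lauren"), badMap)
```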
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SummarizedExperiment_1.6.1 DelayedArray_0.2.3
## [3] matrixStats_0.52.2 Biobase_2.36.2
## [5] GenomicRanges_1.28.2 GenomeInfoDb_1.12.0
## [7] IRanges_2.10.1 S4Vectors_0.14.1
## [9] BiocGenerics_0.22.0 MultiAssayExperiment_1.2.1
## [11] BiocStyle_2.4.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10 plyr_1.8.4
## [3] compiler_3.4.0 XVector_0.16.0
## [5] bitops_1.0-6 tools_3.4.0
## [7] zlibbioc_1.22.0 digest_0.6.12
## [9] tibble_1.3.1 gtable_0.2.0
## [11] evaluate_0.10 lattice_0.20-35
## [13] rlang_0.1 Matrix_1.2-10
## [15] shiny_1.0.3 yaml_2.1.14
## [17] gridExtra_2.2.1 GenomeInfoDbData_0.99.0
## [19] UpSetR_1.3.3 stringr_1.2.0
## [21] knitr_1.15.1 rprojroot_1.2
## [23] grid_3.4.0 shinydashboard_0.5.3
## [25] R6_2.2.1 rmarkdown_1.5
## [27] bookdown_0.3 tidyr_0.6.3
## [29] reshape2_1.4.2 ggplot2_2.2.1
## [31] magrittr_1.5 scales_0.4.1
## [33] backports_1.0.5 htmltools_0.3.6
## [35] colorspace_1.3-2 mime_0.5
## [37] xtable_1.8-2 httpuv_1.3.3
## [39] stringi_1.1.5 lazyeval_0.2.0
## [41] munsell_0.4.3 RCurl_1.95-4.8