The MetaGxPancreas package is a compendium of pancreatic cancer datasets. It is publicly available and can be installed from Bioconductor in R version 3.6.0 or higher. Currently, the phenoData for the datasets consists of overall survival status and overall survival time. This survival information is available for 11 of the 15 datasets.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MetaGxPancreas")
First we load the MetaGxPancreas package into the workspace.
library(MetaGxPancreas)
pancreasData <- loadPancreasDatasets()
## snapshotDate(): 2021-05-18
## Filtered out duplicated samples: ICGC_0400, ICGC_0402, GSM388116, GSM388118, GSM388120, GSM388145, GSM299238, GSM299239, GSM299240
duplicates <- pancreasData$duplicates
SEs <- pancreasData$SEs
This will load 15 expression datasets. Users can modify the parameters of the function to exclude datasets that do not meet certain criteria. The available parameters and their defaults are documented in ?loadPancreasDatasets.
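One way to see which parameters the loader accepts is to inspect its formal arguments. The snippet below is a minimal sketch: `listArgs` and `mockLoader` are hypothetical names introduced here for illustration, and the arguments of `mockLoader` are invented stand-ins, not the real parameters of `loadPancreasDatasets()`.

```r
# Hypothetical helper: return the argument names of any function.
listArgs <- function(fn) names(formals(fn))

# Stand-in loader with invented arguments, used only to exercise the helper.
mockLoader <- function(minSamples = 0, dropDuplicates = TRUE) NULL
listArgs(mockLoader)

# With the package installed, the same helper lists the real loader's arguments.
if (requireNamespace("MetaGxPancreas", quietly = TRUE)) {
    print(listArgs(MetaGxPancreas::loadPancreasDatasets))
}
```

The printed names can then be looked up in the help page to decide which filters to change.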
To obtain the number of samples per dataset, run the following:
# Count the samples (columns) in each SummarizedExperiment
numSamples <- vapply(SEs, function(SE) length(colnames(SE)), FUN.VALUE=numeric(1))
sampleNumberByDataset <- data.frame(numSamples=numSamples,
                                    row.names=names(SEs))
# Append a final row holding the total across all datasets
totalNumSamples <- sum(sampleNumberByDataset$numSamples)
sampleNumberByDataset <- rbind(sampleNumberByDataset, totalNumSamples)
rownames(sampleNumberByDataset)[nrow(sampleNumberByDataset)] <- 'Total'
knitr::kable(sampleNumberByDataset)
| | X0 |
|:---|---:|
| Total | 0 |
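The counting step can also be written so that the table keeps its `numSamples` column even when the list of datasets is empty. The sketch below is an alternative, not the vignette's own code: `countSamples` and `mockSEs` are hypothetical names, and plain matrices stand in for the SummarizedExperiment objects, on the assumption that `ncol()` reports the number of samples for both.

```r
# Stand-in for SEs: plain matrices with hypothetical dataset names.
mockSEs <- list(
    datasetA = matrix(0, nrow = 3, ncol = 4),
    datasetB = matrix(0, nrow = 3, ncol = 2)
)

# Hypothetical helper: one row per dataset plus a final 'Total' row.
countSamples <- function(experiments) {
    # ncol() returns the number of columns, i.e. samples, as an integer
    numSamples <- vapply(experiments, ncol, FUN.VALUE = integer(1))
    data.frame(
        numSamples = unname(c(numSamples, sum(numSamples))),
        row.names = c(names(experiments), "Total")
    )
}

countSamples(mockSEs)

# An empty input still yields a well-formed one-row table.
countSamples(list())
```

Building the rows in a single `data.frame()` call avoids appending to a zero-row data frame with `rbind()`, which is a likely source of a changed column name when no datasets are loaded.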
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] MetaGxPancreas_1.12.1 ExperimentHub_2.0.0
## [3] AnnotationHub_3.0.1 BiocFileCache_2.0.0
## [5] dbplyr_2.1.1 SummarizedExperiment_1.22.0
## [7] Biobase_2.52.0 GenomicRanges_1.44.0
## [9] GenomeInfoDb_1.28.0 IRanges_2.26.0
## [11] S4Vectors_0.30.0 BiocGenerics_0.38.0
## [13] MatrixGenerics_1.4.0 matrixStats_0.59.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 lattice_0.20-44
## [3] Biostrings_2.60.1 png_0.1-7
## [5] assertthat_0.2.1 digest_0.6.27
## [7] utf8_1.2.1 mime_0.10
## [9] R6_2.5.0 RSQLite_2.2.7
## [11] evaluate_0.14 highr_0.9
## [13] httr_1.4.2 pillar_1.6.1
## [15] zlibbioc_1.38.0 rlang_0.4.11
## [17] curl_4.3.1 blob_1.2.1
## [19] Matrix_1.3-4 stringr_1.4.0
## [21] RCurl_1.98-1.3 bit_4.0.4
## [23] shiny_1.6.0 DelayedArray_0.18.0
## [25] compiler_4.1.0 httpuv_1.6.1
## [27] xfun_0.24 pkgconfig_2.0.3
## [29] htmltools_0.5.1.1 KEGGREST_1.32.0
## [31] tidyselect_1.1.1 tibble_3.1.2
## [33] GenomeInfoDbData_1.2.6 interactiveDisplayBase_1.30.0
## [35] fansi_0.5.0 withr_2.4.2
## [37] crayon_1.4.1 dplyr_1.0.7
## [39] later_1.2.0 bitops_1.0-7
## [41] rappdirs_0.3.3 grid_4.1.0
## [43] xtable_1.8-4 lifecycle_1.0.0
## [45] DBI_1.1.1 magrittr_2.0.1
## [47] impute_1.66.0 stringi_1.6.2
## [49] cachem_1.0.5 XVector_0.32.0
## [51] promises_1.2.0.1 ellipsis_0.3.2
## [53] filelock_1.0.2 generics_0.1.0
## [55] vctrs_0.3.8 tools_4.1.0
## [57] bit64_4.0.5 glue_1.4.2
## [59] purrr_0.3.4 BiocVersion_3.13.1
## [61] fastmap_1.1.0 yaml_2.2.1
## [63] AnnotationDbi_1.54.1 BiocManager_1.30.16
## [65] memoise_2.0.0 knitr_1.33