library(cBioPortalData)
library(AnVIL)
This document serves as a reporting tool for errors that occur when running our utility functions on the cBioPortal datasets.
cBioPortalData()
Typically, the number of errors encountered via the API is low. Only a handful of datasets error when we apply the utility functions to provide a MultiAssayExperiment data representation.
First, we load the error Rda dataset.
api_errs <- system.file(
"extdata", "api", "err_api_info.rda",
package = "cBioPortalData", mustWork = TRUE
)
load(api_errs)
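Because load() silently creates objects in the calling environment, it can help to check what a file contains before using it. The following is a minimal sketch, not part of the package workflow: demo_file, demo_errs, and demo_env are hypothetical names, and a temporary file stands in for the packaged err_api_info.rda so that the example runs on its own.

```r
# Hypothetical stand-in file so the sketch runs without the package data
demo_file <- tempfile(fileext = ".rda")
demo_errs <- list("some error message" = c("study_a", "study_b"))
save(demo_errs, file = demo_file)

# Load into a fresh environment instead of the global workspace;
# load() invisibly returns the names of the objects it restored
demo_env <- new.env()
loaded <- load(demo_file, envir = demo_env)

loaded                     # names of the objects stored in the file
class(demo_env[[loaded]])  # inspect the object before copying it out
```

Loading into a separate environment avoids overwriting an existing object that happens to share the same name.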
We can now inspect the contents of the data:
class(err_api_info)
## [1] "list"
length(err_api_info)
## [1] 6
lengths(err_api_info)
## Barcodes must start with 'TCGA'
## 2
## group length is 0 but data length > 0
## 1
## Frequency of NA values higher than the cutoff tolerance
## 2
## Inconsistent build numbers found
## 33
## `n` must be a single number, not an integer `NA`.
## 1
## Argument 1 must be a data frame or a named atomic vector.
## 1
There were 6 unique errors during the last build run.
names(err_api_info)
## [1] "Barcodes must start with 'TCGA'"
## [2] "group length is 0 but data length > 0"
## [3] "Frequency of NA values higher than the cutoff tolerance"
## [4] "Inconsistent build numbers found"
## [5] "`n` must be a single number, not an integer `NA`."
## [6] "Argument 1 must be a data frame or a named atomic vector."
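The counts from lengths() can also be ranked to find the most frequent error without reading the output by eye. This is a small sketch using a stand-in list: demo_errs and its study IDs (study_a, study_b, ...) are placeholders, while the error messages are taken from the output above.

```r
# Stand-in list with the same shape as err_api_info; study IDs are placeholders
demo_errs <- list(
  "Barcodes must start with 'TCGA'" = c("study_a", "study_b"),
  "Inconsistent build numbers found" = c("study_c", "study_d", "study_e"),
  "group length is 0 but data length > 0" = "study_f"
)

# Number of affected datasets per error, most frequent first
err_counts <- sort(lengths(demo_errs), decreasing = TRUE)
err_counts

# Message of the most frequent error and total number of failures recorded
most_common <- names(err_counts)[1]
total_failed <- sum(err_counts)
```

The same calls should work unchanged on err_api_info, since it is a named list of character vectors.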
The most common error was "Inconsistent build numbers found", which affected 33 datasets. This is due to annotations from different build numbers that could not be resolved.
To see which datasets (cancer_study_ids) have that error, we can use:
err_api_info[['Inconsistent build numbers found']]
## [1] "msk_ch_2020" "msk_access_2021"
## [3] "mixed_msk_tcga_2021" "mixed_impact_subset_2022"
## [5] "pan_origimed_2020" "prad_msk_stopsack_2021"
## [7] "pancan_pcawg_2020" "prad_pik3r1_msk_2021"
## [9] "skcm_tcga" "stad_tcga"
## [11] "stad_tcga_pub" "skcm_tcga_pan_can_atlas_2018"
## [13] "stad_tcga_pan_can_atlas_2018" "stes_tcga_pub"
## [15] "summit_2018" "cfdna_msk_2019"
## [17] "blca_bcan_hcrn_2022" "nsclc_ctdx_msk_2022"
## [19] "thyroid_mskcc_2016" "skcm_mskcc_2014"
## [21] "tmb_mskcc_2018" "rectal_msk_2019"
## [23] "skcm_tcga_pub_2015" "msk_spectrum_tme_2022"
## [25] "ucec_ccr_cfdna_msk_2022" "paired_bladder_2022"
## [27] "mtnn_msk_2022" "pog570_bcgsc_2020"
## [29] "sarcoma_msk_2023" "bowel_colitis_msk_2022"
## [31] "luad_mskcc_2023_met_organotropism" "coad_silu_2022"
## [33] "paac_msk_jco_2023"
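The list is keyed by error message, so going the other way (from a study ID to its error) takes one extra step. One possible approach, sketched here with a stand-in list, is to flatten the list with stack() from the utils package. The names demo_errs and err_table are illustrative; the study IDs and messages are a small subset of those reported in this document.

```r
# Stand-in for err_api_info, using a subset of the entries in this document
demo_errs <- list(
  "Barcodes must start with 'TCGA'" =
    c("blca_msk_tcga_2020", "nsclc_tcga_broad_2016"),
  "group length is 0 but data length > 0" = "glioma_msk_2018"
)

# stack() turns the named list into a two-column data frame:
# `values` holds the study IDs and `ind` the list names (as a factor)
err_table <- stack(demo_errs)
names(err_table) <- c("cancer_study_id", "error")
err_table$error <- as.character(err_table$error)

# Look up the error recorded for a single study
err_table[err_table$cancer_study_id == "glioma_msk_2018", "error"]
```

A flat table like this is also convenient for filtering or merging with other study metadata.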
We can also look at the full list of errors and the datasets affected by each.
err_api_info
## $`Barcodes must start with 'TCGA'`
## [1] "blca_msk_tcga_2020" "nsclc_tcga_broad_2016"
##
## $`group length is 0 but data length > 0`
## [1] "glioma_msk_2018"
##
## $`Frequency of NA values higher than the cutoff tolerance`
## [1] "mixed_selpercatinib_2020" "ucec_ccr_msk_2022"
##
## $`Inconsistent build numbers found`
## [1] "msk_ch_2020" "msk_access_2021"
## [3] "mixed_msk_tcga_2021" "mixed_impact_subset_2022"
## [5] "pan_origimed_2020" "prad_msk_stopsack_2021"
## [7] "pancan_pcawg_2020" "prad_pik3r1_msk_2021"
## [9] "skcm_tcga" "stad_tcga"
## [11] "stad_tcga_pub" "skcm_tcga_pan_can_atlas_2018"
## [13] "stad_tcga_pan_can_atlas_2018" "stes_tcga_pub"
## [15] "summit_2018" "cfdna_msk_2019"
## [17] "blca_bcan_hcrn_2022" "nsclc_ctdx_msk_2022"
## [19] "thyroid_mskcc_2016" "skcm_mskcc_2014"
## [21] "tmb_mskcc_2018" "rectal_msk_2019"
## [23] "skcm_tcga_pub_2015" "msk_spectrum_tme_2022"
## [25] "ucec_ccr_cfdna_msk_2022" "paired_bladder_2022"
## [27] "mtnn_msk_2022" "pog570_bcgsc_2020"
## [29] "sarcoma_msk_2023" "bowel_colitis_msk_2022"
## [31] "luad_mskcc_2023_met_organotropism" "coad_silu_2022"
## [33] "paac_msk_jco_2023"
##
## $``n` must be a single number, not an integer `NA`.`
## [1] "msk_met_2021"
##
## $`Argument 1 must be a data frame or a named atomic vector.`
## [1] "makeanimpact_ccr_2023"
cBioDataPack()
Now let's look at the errors in the packaged datasets that are used for cBioDataPack:
pack_errs <- system.file(
"extdata", "pack", "err_pack_info.rda",
package = "cBioPortalData", mustWork = TRUE
)
load(pack_errs)
We can do the same for this data:
length(err_pack_info)
## [1] 5
lengths(err_pack_info)
## more columns than column names
## 9
## Frequency of NA values higher than the cutoff tolerance
## 5
## non-character argument
## 2
## invalid class "ExperimentList" object: \n Non-unique names provided
## 2
## 'wget' call had nonzero exit status
## 11
We can get a list of all the errors present:
names(err_pack_info)
## [1] "more columns than column names"
## [2] "Frequency of NA values higher than the cutoff tolerance"
## [3] "non-character argument"
## [4] "invalid class \"ExperimentList\" object: \n Non-unique names provided"
## [5] "'wget' call had nonzero exit status"
And finally the full list of errors:
err_pack_info
## $`more columns than column names`
## [1] "ccrcc_utokyo_2013" "coadread_tcga_pan_can_atlas_2018"
## [3] "gbm_cptac_2021" "ov_tcga_pan_can_atlas_2018"
## [5] "pan_origimed_2020" "sarc_tcga_pan_can_atlas_2018"
## [7] "luad_mskimpact_2021" "mbl_dkfz_2017"
## [9] "brca_tcga_pan_can_atlas_2018"
##
## $`Frequency of NA values higher than the cutoff tolerance`
## [1] "ihch_mskcc_2020" "ihch_msk_2021"
## [3] "mixed_selpercatinib_2020" "mixed_msk_tcga_2021"
## [5] "ucec_ccr_msk_2022"
##
## $`non-character argument`
## [1] "mbn_mdacc_2013" "pcpg_tcga_pub"
##
## $`invalid class "ExperimentList" object: \n Non-unique names provided`
## [1] "stad_tcga_pub" "mpnst_mskcc"
##
## $`'wget' call had nonzero exit status`
## [1] "makeanimpact_ccr_2023" "prad_organoids_msk_2022"
## [3] "mtnn_msk_2022" "sarcoma_msk_2023"
## [5] "bowel_colitis_msk_2022" "bladder_mskcc_2022"
## [7] "paac_msk_jco_2023" "nbl_msk_2023"
## [9] "rms_msk_2023" "gist_msk_2023"
## [11] "egc_trap_ccr_msk_2023"
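Some studies appear in both error lists. A quick way to compare the two is with set operations on the flattened study IDs. The sketch below uses two small stand-in lists, demo_api and demo_pack (hypothetical names), each holding only a few of the entries printed in this document, so its results describe the stand-ins rather than the full data.

```r
# Stand-ins holding a few entries from err_api_info and err_pack_info
demo_api <- list(
  "Inconsistent build numbers found" = c("pan_origimed_2020", "skcm_tcga"),
  "group length is 0 but data length > 0" = "glioma_msk_2018"
)
demo_pack <- list(
  "more columns than column names" = c("pan_origimed_2020", "mbl_dkfz_2017"),
  "non-character argument" = "pcpg_tcga_pub"
)

# Flatten each list to a plain vector of study IDs
api_ids <- unlist(demo_api, use.names = FALSE)
pack_ids <- unlist(demo_pack, use.names = FALSE)

both <- intersect(api_ids, pack_ids)    # fail with both functions
api_only <- setdiff(api_ids, pack_ids)  # fail only via the API
both
api_only
```

Applied to the real err_api_info and err_pack_info objects, the same calls would list the studies that fail through both cBioPortalData() and cBioDataPack().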
sessionInfo()
## R version 4.4.0 alpha (2024-03-27 r86216)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.6.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] survminer_0.4.9 ggpubr_0.6.0
## [3] ggplot2_3.5.0 survival_3.5-8
## [5] cBioPortalData_2.16.0 MultiAssayExperiment_1.30.0
## [7] SummarizedExperiment_1.34.0 Biobase_2.64.0
## [9] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
## [11] IRanges_2.38.0 S4Vectors_0.42.0
## [13] BiocGenerics_0.50.0 MatrixGenerics_1.16.0
## [15] matrixStats_1.2.0 AnVIL_1.16.0
## [17] dplyr_1.1.4 BiocStyle_2.32.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 magrittr_2.0.3
## [3] magick_2.8.3 GenomicFeatures_1.56.0
## [5] farver_2.1.1 rmarkdown_2.26
## [7] BiocIO_1.14.0 zlibbioc_1.50.0
## [9] vctrs_0.6.5 memoise_2.0.1
## [11] Rsamtools_2.20.0 RCurl_1.98-1.14
## [13] rstatix_0.7.2 htmltools_0.5.8
## [15] S4Arrays_1.4.0 BiocBaseUtils_1.6.0
## [17] lambda.r_1.2.4 curl_5.2.1
## [19] broom_1.0.5 SparseArray_1.4.0
## [21] sass_0.4.9 bslib_0.6.2
## [23] htmlwidgets_1.6.4 zoo_1.8-12
## [25] futile.options_1.0.1 cachem_1.0.8
## [27] commonmark_1.9.1 GenomicAlignments_1.40.0
## [29] mime_0.12 lifecycle_1.0.4
## [31] pkgconfig_2.0.3 Matrix_1.7-0
## [33] R6_2.5.1 fastmap_1.1.1
## [35] GenomeInfoDbData_1.2.12 shiny_1.8.1
## [37] digest_0.6.35 colorspace_2.1-0
## [39] RaggedExperiment_1.28.0 AnnotationDbi_1.66.0
## [41] ps_1.7.6 RSQLite_2.3.5
## [43] labeling_0.4.3 filelock_1.0.3
## [45] RTCGAToolbox_2.34.0 km.ci_0.5-6
## [47] fansi_1.0.6 RJSONIO_1.3-1.9
## [49] httr_1.4.7 abind_1.4-5
## [51] compiler_4.4.0 bit64_4.0.5
## [53] withr_3.0.0 backports_1.4.1
## [55] BiocParallel_1.38.0 carData_3.0-5
## [57] DBI_1.2.2 highr_0.10
## [59] ggsignif_0.6.4 rappdirs_0.3.3
## [61] DelayedArray_0.30.0 rjson_0.2.21
## [63] tools_4.4.0 chromote_0.2.0
## [65] httpuv_1.6.15 glue_1.7.0
## [67] restfulr_0.0.15 promises_1.2.1
## [69] gridtext_0.1.5 grid_4.4.0
## [71] generics_0.1.3 gtable_0.3.4
## [73] KMsurv_0.1-5 tzdb_0.4.0
## [75] tidyr_1.3.1 websocket_1.4.1
## [77] data.table_1.15.4 hms_1.1.3
## [79] car_3.1-2 xml2_1.3.6
## [81] utf8_1.2.4 XVector_0.44.0
## [83] markdown_1.12 pillar_1.9.0
## [85] stringr_1.5.1 later_1.3.2
## [87] splines_4.4.0 ggtext_0.1.2
## [89] BiocFileCache_2.12.0 lattice_0.22-6
## [91] rtracklayer_1.64.0 bit_4.0.5
## [93] tidyselect_1.2.1 Biostrings_2.72.0
## [95] miniUI_0.1.1.1 knitr_1.45
## [97] gridExtra_2.3 bookdown_0.38
## [99] futile.logger_1.4.3 xfun_0.43
## [101] DT_0.32 stringi_1.8.3
## [103] UCSC.utils_1.0.0 yaml_2.3.8
## [105] evaluate_0.23 codetools_0.2-19
## [107] tibble_3.2.1 BiocManager_1.30.22
## [109] cli_3.6.2 xtable_1.8-4
## [111] munsell_0.5.0 processx_3.8.4
## [113] jquerylib_0.1.4 survMisc_0.5.6
## [115] Rcpp_1.0.12 GenomicDataCommons_1.28.0
## [117] dbplyr_2.5.0 png_0.1-8
## [119] XML_3.99-0.16.1 rapiclient_0.1.3
## [121] parallel_4.4.0 TCGAutils_1.24.0
## [123] readr_2.1.5 blob_1.2.4
## [125] bitops_1.0-7 scales_1.3.0
## [127] purrr_1.0.2 crayon_1.5.2
## [129] rlang_1.1.3 KEGGREST_1.44.0
## [131] rvest_1.0.4 formatR_1.14