This vignette aims to help developers migrate from the now-defunct cgdsr CRAN
package. Note that the cgdsr package code is shown for comparison only; it is
not guaranteed to work. If you have questions regarding the contents,
please create an issue at the GitHub repository:
https://github.com/waldronlab/cBioPortalData/issues
library(cBioPortalData)
library(cgdsr)
cgds <- CGDS("http://www.cbioportal.org/")
getCancerStudies.CGDS(cgds)
Here we show the default inputs to the cBioPortal function for clarity.
cbio <- cBioPortal(
    hostname = "www.cbioportal.org",
    protocol = "https",
    api. = "/api/api-docs"
)
getStudies(cbio)
## # A tibble: 342 × 13
## name description publicStudy pmid citation groups status importDate
## <chr> <chr> <lgl> <chr> <chr> <chr> <int> <chr>
## 1 Adenoid Cyst… "Whole exo… TRUE 2609… Martelo… "ACYC… 0 2022-03-0…
## 2 Adenoid Cyst… "Multi-Ins… TRUE 3148… Allen e… "ACYC… 0 2022-03-0…
## 3 Adrenocortic… "TCGA Adre… TRUE <NA> <NA> "PUBL… 0 2022-03-0…
## 4 Bladder Canc… "Whole exo… TRUE 2690… Al-Ahma… "" 0 2022-03-0…
## 5 Basal Cell C… "Whole-exo… TRUE 2695… Bonilla… "PUBL… 0 2022-03-0…
## 6 Acute Lympho… "Comprehen… TRUE 2573… Anderss… "PUBL… 0 2022-03-0…
## 7 Ampullary Ca… "Exome seq… TRUE 2680… Gingras… "PUBL… 0 2022-03-0…
## 8 Bladder Urot… "Whole exo… TRUE 2509… Van All… "PUBL… 0 2022-03-0…
## 9 Bladder Canc… "Comprehen… TRUE 2389… Iyer et… "PUBL… 0 2022-03-0…
## 10 Bladder Urot… "Whole-exo… TRUE 2412… Guo et … "PUBL… 0 2022-03-0…
## # … with 332 more rows, and 5 more variables: allSampleCount <int>,
## # readPermission <lgl>, studyId <chr>, cancerTypeId <chr>,
## # referenceGenome <chr>
Note that the studyId column is important for further queries.
head(getStudies(cbio)[["studyId"]])
## [1] "acbc_mskcc_2015" "acc_2019"
## [3] "acc_tcga" "blca_plasmacytoid_mskcc_2016"
## [5] "bcc_unige_2016" "all_stjude_2015"
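Because later calls take a studyId, it can help to narrow the identifier vector before choosing one. The following is a minimal sketch in base R. Here study_ids is a stand-in vector copied from the output above so that the code runs without a network connection; in practice it would be getStudies(cbio)[["studyId"]].

```r
# Stand-in for getStudies(cbio)[["studyId"]]; values copied from the output above.
study_ids <- c(
    "acbc_mskcc_2015", "acc_2019", "acc_tcga",
    "blca_plasmacytoid_mskcc_2016", "bcc_unige_2016", "all_stjude_2015"
)

# Keep only identifiers that contain "tcga" (fixed string, not a regex).
tcga_ids <- grep("tcga", study_ids, fixed = TRUE, value = TRUE)
tcga_ids

# Check that a study of interest is present before querying it.
has_acc_tcga <- "acc_tcga" %in% study_ids
has_acc_tcga
```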
Here we obtain the first case_list_id from cgds and then retrieve the clinical
data with that case list identifier (gbm_tcga_pub_all in this example).
clist1 <-
    getCaseLists.CGDS(cgds, cancerStudy = "gbm_tcga_pub")[1, "case_list_id"]
getClinicalData.CGDS(cgds, clist1)
For the case list identifiers, you can use sampleLists and inspect the
sampleListId column. Here we take the first value as in the example above.
clist1 <- sampleLists(cbio, "gbm_tcga_pub")[1, "sampleListId", drop = TRUE]
clist1
## [1] "gbm_tcga_pub_all"
Note that a sample list ID is not required when using the
fetchAllClinicalDataInStudyUsingPOST internal endpoint. Data for all patients
can be obtained using the clinicalData function.
clinicalData(cbio, "gbm_tcga_pub")
## # A tibble: 206 × 24
## patientId DFS_MONTHS DFS_STATUS KARNOFSKY_PERFORMANC… OS_MONTHS OS_STATUS
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 TCGA-02-0001 4.504109589 1:Recurred 80.0 11.60547… 1:DECEAS…
## 2 TCGA-02-0003 1.315068493 1:Recurred 100.0 4.734246… 1:DECEAS…
## 3 TCGA-02-0004 10.32328767 1:Recurred 80.0 11.34246… 1:DECEAS…
## 4 TCGA-02-0006 9.928767123 1:Recurred 80.0 18.34520… 1:DECEAS…
## 5 TCGA-02-0007 17.03013699 1:Recurred 80.0 23.17808… 1:DECEAS…
## 6 TCGA-02-0009 8.679452055 1:Recurred 80.0 10.58630… 1:DECEAS…
## 7 TCGA-02-0010 11.53972603 1:Recurred 80.0 35.40821… 1:DECEAS…
## 8 TCGA-02-0011 4.734246575 1:Recurred 80.0 20.71232… 1:DECEAS…
## 9 TCGA-02-0014 <NA> <NA> 100.0 82.55342… 1:DECEAS…
## 10 TCGA-02-0015 14.99178082 1:Recurred 80.0 20.61369… 1:DECEAS…
## # … with 196 more rows, and 18 more variables: PRETREATMENT_HISTORY <chr>,
## # PRIOR_GLIOMA <chr>, SAMPLE_COUNT <chr>, SEX <chr>, sampleId <chr>,
## # ACGH_DATA <chr>, CANCER_TYPE <chr>, CANCER_TYPE_DETAILED <chr>,
## # COMPLETE_DATA <chr>, FRACTION_GENOME_ALTERED <chr>, MRNA_DATA <chr>,
## # MUTATION_COUNT <chr>, ONCOTREE_CODE <chr>, SAMPLE_TYPE <chr>,
## # SEQUENCED <chr>, SOMATIC_STATUS <chr>, TMB_NONSYNONYMOUS <chr>,
## # TREATMENT_STATUS <chr>
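Every column in the table above is printed as character. The following is a minimal sketch of one way to coerce columns before analysis. Here clin is a small stand-in data frame built from three rows and three columns of the output above (in practice it would be the result of clinicalData), and the derived column name DFS_EVENT is hypothetical.

```r
# Stand-in for clinicalData(cbio, "gbm_tcga_pub"); values copied from the
# output above.
clin <- data.frame(
    patientId = c("TCGA-02-0001", "TCGA-02-0003", "TCGA-02-0014"),
    DFS_MONTHS = c("4.504109589", "1.315068493", NA),
    DFS_STATUS = c("1:Recurred", "1:Recurred", NA),
    stringsAsFactors = FALSE
)

# Convert the month counts from character to numeric; NA stays NA.
clin$DFS_MONTHS <- as.numeric(clin$DFS_MONTHS)

# Status values look like "<code>:<label>"; keep the integer code before ":".
# DFS_EVENT is a hypothetical column name used only for illustration.
clin$DFS_EVENT <- as.integer(sub(":.*$", "", clin$DFS_STATUS))

str(clin)
```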
But you can still use a different endpoint to obtain data for a single sample:
samplist <- samplesInSampleLists(cbio, clist1)
onesample <- samplist[["gbm_tcga_pub_all"]][1]
onesample
## [1] "TCGA-02-0001-01"
cbio$getAllClinicalDataOfSampleInStudyUsingGET(
    sampleId = onesample, studyId = "gbm_tcga_pub"
)
## Response [https://www.cbioportal.org/api/studies/gbm_tcga_pub/samples/TCGA-02-0001-01/clinical-data]
## Date: 2022-04-26 20:13
## Status: 200
## Content-Type: application/json
## Size: 3.33 kB
There may be other endpoints that you could look into:
searchOps(cbio, "clinical")
## [1] "getAllClinicalAttributesUsingGET"
## [2] "fetchClinicalAttributesUsingPOST"
## [3] "fetchClinicalDataUsingPOST"
## [4] "getAllClinicalAttributesInStudyUsingGET"
## [5] "getClinicalAttributeInStudyUsingGET"
## [6] "getAllClinicalDataInStudyUsingGET"
## [7] "fetchAllClinicalDataInStudyUsingPOST"
## [8] "getAllClinicalDataOfPatientInStudyUsingGET"
## [9] "getAllClinicalDataOfSampleInStudyUsingGET"
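The search result above prints as a character vector, so it can be narrowed further with base R. The following is a minimal sketch. Here ops is a stand-in vector copied from the output above; in practice it would be the value of searchOps(cbio, "clinical").

```r
# Stand-in for searchOps(cbio, "clinical"); values copied from the output above.
ops <- c(
    "getAllClinicalAttributesUsingGET",
    "fetchClinicalAttributesUsingPOST",
    "fetchClinicalDataUsingPOST",
    "getAllClinicalAttributesInStudyUsingGET",
    "getClinicalAttributeInStudyUsingGET",
    "getAllClinicalDataInStudyUsingGET",
    "fetchAllClinicalDataInStudyUsingPOST",
    "getAllClinicalDataOfPatientInStudyUsingGET",
    "getAllClinicalDataOfSampleInStudyUsingGET"
)

# Operations whose names end in "UsingPOST".
post_ops <- grep("UsingPOST$", ops, value = TRUE)
post_ops

# Operations scoped to a single study (names contain "InStudy").
study_ops <- grep("InStudy", ops, fixed = TRUE, value = TRUE)
study_ops
```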
getGeneticProfiles.CGDS(cgds, cancerStudy = "gbm_tcga_pub")
molecularProfiles(cbio, "gbm_tcga_pub")
## # A tibble: 10 × 8
## molecularAlteration… datatype name description showProfileInAn… patientLevel
## <chr> <chr> <chr> <chr> <lgl> <lgl>
## 1 COPY_NUMBER_ALTERAT… DISCRETE Puta… Putative c… TRUE FALSE
## 2 COPY_NUMBER_ALTERAT… DISCRETE Puta… Putative c… TRUE FALSE
## 3 MUTATION_EXTENDED MAF Muta… Mutation d… TRUE FALSE
## 4 METHYLATION CONTINU… Meth… Methylatio… FALSE FALSE
## 5 MRNA_EXPRESSION CONTINU… mRNA… mRNA expre… FALSE FALSE
## 6 MRNA_EXPRESSION Z-SCORE mRNA… 18,698 gen… TRUE FALSE
## 7 MRNA_EXPRESSION Z-SCORE mRNA… Log-transf… TRUE FALSE
## 8 MRNA_EXPRESSION CONTINU… micr… expression… FALSE FALSE
## 9 MRNA_EXPRESSION Z-SCORE micr… microRNA e… FALSE FALSE
## 10 MRNA_EXPRESSION Z-SCORE mRNA… mRNA and m… TRUE FALSE
## # … with 2 more variables: molecularProfileId <chr>, studyId <chr>
Note that we want to pull the molecularProfileId column to use in other
queries.
getProfileData.CGDS(x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_mrna",
    caseList = "gbm_tcga_pub_all"
)
If you only have Hugo symbols, some conversion is currently needed before you
can use the molecularData function directly. First, convert the symbols to
Entrez gene IDs, and then obtain all the samples in the sample list of interest.
genetab <- queryGeneTable(cbio,
    by = "hugoGeneSymbol",
    genes = c("NF1", "TP53", "ABL1")
)
genetab
## # A tibble: 3 × 3
## entrezGeneId hugoGeneSymbol type
## <int> <chr> <chr>
## 1 4763 NF1 protein-coding
## 2 25 ABL1 protein-coding
## 3 7157 TP53 protein-coding
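The rows of the gene table above are not in the same order as the requested symbols. When the order matters, one option is to index the Entrez IDs by symbol. The following is a minimal sketch in base R. Here genetab_demo is a stand-in data frame copied from the output above, and requested and entrez_by_symbol are illustrative names.

```r
# Stand-in for the queryGeneTable() result; values copied from the output above.
genetab_demo <- data.frame(
    entrezGeneId = c(4763L, 25L, 7157L),
    hugoGeneSymbol = c("NF1", "ABL1", "TP53"),
    type = "protein-coding",
    stringsAsFactors = FALSE
)
requested <- c("NF1", "TP53", "ABL1")

# Named lookup vector: names are Hugo symbols, values are Entrez IDs.
entrez_by_symbol <- setNames(
    genetab_demo$entrezGeneId, genetab_demo$hugoGeneSymbol
)

# Entrez IDs in the order the symbols were requested.
ordered_entrez <- entrez_by_symbol[requested]
ordered_entrez

# Symbols that did not come back from the lookup (none in this example).
not_found <- setdiff(requested, genetab_demo$hugoGeneSymbol)
not_found
```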
entrez <- genetab[["entrezGeneId"]]
allsamps <- samplesInSampleLists(cbio, "gbm_tcga_pub_all")
molecularData(cbio, "gbm_tcga_pub_mrna",
    entrezGeneIds = entrez, sampleIds = unlist(allsamps))
## $gbm_tcga_pub_mrna
## # A tibble: 618 × 8
## uniqueSampleKey uniquePatientKey entrezGeneId molecularProfil… sampleId
## <chr> <chr> <int> <chr> <chr>
## 1 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 2 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 3 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 4 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 5 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 6 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 7 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 8 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 9 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 10 VENHQS0wMi0wMDA2LTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## # … with 608 more rows, and 3 more variables: patientId <chr>, studyId <chr>,
## # value <dbl>
Note that this can also be done by using the getDataByGenes function, which
automatically figures out all the sample identifiers in the study:
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)
## $gbm_tcga_pub_mrna
## # A tibble: 618 × 10
## uniqueSampleKey uniquePatientKey entrezGeneId molecularProfil… sampleId
## <chr> <chr> <int> <chr> <chr>
## 1 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 2 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 3 VENHQS0wMi0wMDAxLTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 4 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 5 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 6 VENHQS0wMi0wMDAzLTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 7 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## 8 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 4763 gbm_tcga_pub_mr… TCGA-02…
## 9 VENHQS0wMi0wMDA0LTAx… VENHQS0wMi0wMDA… 7157 gbm_tcga_pub_mr… TCGA-02…
## 10 VENHQS0wMi0wMDA2LTAx… VENHQS0wMi0wMDA… 25 gbm_tcga_pub_mr… TCGA-02…
## # … with 608 more rows, and 5 more variables: patientId <chr>, studyId <chr>,
## # value <dbl>, hugoGeneSymbol <chr>, type <chr>
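The tables above are in long format, with one row per gene and sample. Code that expects a wide samples-by-genes table can reshape the result. The following is a minimal sketch in base R. Here long is a stand-in data frame with only the sampleId, hugoGeneSymbol, and value columns, filled with values for two samples taken from the expression matrix shown at the end of this vignette; in practice it would be one element of the list returned by getDataByGenes.

```r
# Stand-in for getDataByGenes(...)[["gbm_tcga_pub_mrna"]], reduced to the
# three columns needed for reshaping.
long <- data.frame(
    sampleId = rep(c("TCGA-02-0001-01", "TCGA-02-0003-01"), each = 3),
    hugoGeneSymbol = rep(c("ABL1", "NF1", "TP53"), times = 2),
    value = c(
        -0.1744878, -0.2966920, 0.6213171,
        -0.177096729, -0.001066810, 0.006435625
    ),
    stringsAsFactors = FALSE
)

# One row per sample, one column per gene. Each sample/gene pair occurs once
# here, so taking the first value per cell loses nothing.
wide <- tapply(
    long$value,
    list(sampleId = long$sampleId, gene = long$hugoGeneSymbol),
    function(v) v[[1]]
)
wide
```

If a sample and gene pair could occur more than once, the function passed to tapply would need to be an explicit summary (for example mean) instead of taking the first value.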
getMutationData.CGDS(
    x = cgds,
    caseList = "gbm_tcga_pub_all",
    geneticProfile = "gbm_tcga_pub_mutations",
    genes = c("NF1", "TP53", "ABL1")
)
Similar to molecularData, mutation data can be obtained with the mutationData
function or the getDataByGenes function.
mutationData(
    api = cbio,
    molecularProfileIds = "gbm_tcga_pub_mutations",
    entrezGeneIds = entrez,
    sampleIds = unlist(allsamps)
)
## $gbm_tcga_pub_mutations
## # A tibble: 57 × 28
## uniqueSampleKey uniquePatientKey molecularProfil… sampleId patientId
## <chr> <chr> <chr> <chr> <chr>
## 1 VENHQS0wMi0wMDAxLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 2 VENHQS0wMi0wMDAxLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 3 VENHQS0wMi0wMDAzLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 4 VENHQS0wMi0wMDAzLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 5 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 6 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 7 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 8 VENHQS0wMi0wMDExLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 9 VENHQS0wMi0wMDE0LTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 10 VENHQS0wMi0wMDI0LTAxOmd… VENHQS0wMi0wMDI… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## # … with 47 more rows, and 23 more variables: entrezGeneId <int>,
## # studyId <chr>, center <chr>, mutationStatus <chr>, validationStatus <chr>,
## # startPosition <int>, endPosition <int>, referenceAllele <chr>,
## # proteinChange <chr>, mutationType <chr>, functionalImpactScore <chr>,
## # fisValue <dbl>, linkXvar <chr>, linkPdb <chr>, linkMsa <chr>,
## # ncbiBuild <chr>, variantType <chr>, keyword <chr>, chr <chr>,
## # variantAllele <chr>, refseqMrnaId <chr>, proteinPosStart <int>, …
getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mutations"
)
## $gbm_tcga_pub_mutations
## # A tibble: 57 × 30
## uniqueSampleKey uniquePatientKey molecularProfil… sampleId patientId
## <chr> <chr> <chr> <chr> <chr>
## 1 VENHQS0wMi0wMDAxLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 2 VENHQS0wMi0wMDAxLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 3 VENHQS0wMi0wMDAzLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 4 VENHQS0wMi0wMDAzLTAxOmd… VENHQS0wMi0wMDA… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 5 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 6 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 7 VENHQS0wMi0wMDEwLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 8 VENHQS0wMi0wMDExLTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 9 VENHQS0wMi0wMDE0LTAxOmd… VENHQS0wMi0wMDE… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## 10 VENHQS0wMi0wMDI0LTAxOmd… VENHQS0wMi0wMDI… gbm_tcga_pub_mu… TCGA-02… TCGA-02-…
## # … with 47 more rows, and 25 more variables: entrezGeneId <int>,
## # studyId <chr>, center <chr>, mutationStatus <chr>, validationStatus <chr>,
## # startPosition <int>, endPosition <int>, referenceAllele <chr>,
## # proteinChange <chr>, mutationType <chr>, functionalImpactScore <chr>,
## # fisValue <dbl>, linkXvar <chr>, linkPdb <chr>, linkMsa <chr>,
## # ncbiBuild <chr>, variantType <chr>, keyword <chr>, chr <chr>,
## # variantAllele <chr>, refseqMrnaId <chr>, proteinPosStart <int>, …
Note that end users who wish to obtain the data as easily as possible should
use the main cBioPortalData function:
gbm_pub <- cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)
assay(gbm_pub[["gbm_tcga_pub_mrna"]])[, 1:4]
## TCGA-02-0001-01 TCGA-02-0003-01 TCGA-02-0004-01 TCGA-02-0006-01
## ABL1 -0.1744878 -0.177096729 -0.08782114 -0.1733767
## NF1 -0.2966920 -0.001066810 -0.23626512 -0.1691507
## TP53 0.6213171 0.006435625 -0.30507285 0.3967758