Introduction

This vignette aims to help developers migrate from the now-defunct cgdsr CRAN package. Note that the cgdsr package code is shown for comparison only and is not guaranteed to work. If you have questions regarding the contents, please create an issue at the GitHub repository: https://github.com/waldronlab/cBioPortalData/issues

Loading the package

library(cBioPortalData)

Discovering studies

cBioPortalData setup

Here we show the default inputs to the cBioPortal function for clarity.

cbio <- cBioPortal(
    hostname = "www.cbioportal.org",
    protocol = "https",
    api. = "/api/api-docs"
)
getStudies(cbio)
## # A tibble: 365 × 13
##    name    descr…¹ publi…² groups status impor…³ allSa…⁴ readP…⁵ studyId cance…⁶
##    <chr>   <chr>   <lgl>   <chr>   <int> <chr>     <int> <lgl>   <chr>   <chr>  
##  1 Adreno… "TCGA … TRUE    "PUBL…      0 2022-1…      92 TRUE    acc_tc… acc    
##  2 Acute … "Compr… TRUE    "PUBL…      0 2022-1…      93 TRUE    all_st… bll    
##  3 Hypodi… "Whole… TRUE    ""          0 2022-1…      44 TRUE    all_st… myeloid
##  4 Adenoi… "Whole… TRUE    "ACYC…      0 2022-1…      12 TRUE    acbc_m… acbc   
##  5 Adenoi… "Targe… TRUE    "ACYC…      0 2022-1…      28 TRUE    acyc_f… acyc   
##  6 Adenoi… "Whole… TRUE    "ACYC…      0 2022-1…      25 TRUE    acyc_j… acyc   
##  7 Adenoi… "WGS o… TRUE    "ACYC…      0 2022-1…     102 TRUE    acyc_m… acyc   
##  8 Adenoi… "Whole… TRUE    "ACYC"      0 2022-1…      10 TRUE    acyc_m… acyc   
##  9 Adenoi… "Whole… TRUE    "ACYC…      0 2022-1…      24 TRUE    acyc_s… acyc   
## 10 Acute … "Whole… TRUE    "PUBL…      0 2022-1…      73 TRUE    all_st… bll    
## # … with 355 more rows, 3 more variables: referenceGenome <chr>, pmid <chr>,
## #   citation <chr>, and abbreviated variable names ¹​description, ²​publicStudy,
## #   ³​importDate, ⁴​allSampleCount, ⁵​readPermission, ⁶​cancerTypeId

Note that the studyId column is important for further queries.

head(getStudies(cbio)[["studyId"]])
## [1] "acc_tcga"        "all_stjude_2015" "all_stjude_2013" "acbc_mskcc_2015"
## [5] "acyc_fmi_2014"   "acyc_jhu_2016"
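To locate a study without scanning the whole table, the identifiers can be filtered with base R pattern matching. The following is a minimal sketch, not part of the package: the `studies` data frame is a hand-built stand-in for the `getStudies(cbio)` result, holding only the six identifiers printed above, so that the example runs without network access. The `"^acyc"` pattern is an illustrative choice.

```r
# Stand-in for getStudies(cbio); in a live session use:
#   studies <- getStudies(cbio)
studies <- data.frame(
    studyId = c(
        "acc_tcga", "all_stjude_2015", "all_stjude_2013",
        "acbc_mskcc_2015", "acyc_fmi_2014", "acyc_jhu_2016"
    ),
    stringsAsFactors = FALSE
)

# Keep the identifiers that start with "acyc"
acyc_ids <- grep("^acyc", studies[["studyId"]], value = TRUE)
acyc_ids
```

The same `grep` call should work on the real tibble, since `[["studyId"]]` returns a plain character vector.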

cgdsr setup

library(cgdsr)
cgds <- CGDS("http://www.cbioportal.org/")
getCancerStudies.CGDS(cgds)

Obtaining Cases

cBioPortalData (Cases)

Notes

  • Each patient is identified by a patientId.
  • A sampleListId identifies a group of samples (and hence patients) based on profile type.
  • The sampleLists function takes a studyId as input and returns the available sampleListId values.

sampleLists

For the sample list identifiers, you can use sampleLists and inspect the sampleListId column.

samps <- sampleLists(cbio, "gbm_tcga_pub")
samps[, c("category", "name", "sampleListId")]
## # A tibble: 15 × 3
##    category                             name                             sampl…¹
##    <chr>                                <chr>                            <chr>  
##  1 all_cases_in_study                   All samples                      gbm_tc…
##  2 other                                Expression Cluster Classical     gbm_tc…
##  3 all_cases_with_cna_data              Samples with CNA data            gbm_tc…
##  4 all_cases_with_mutation_and_cna_data Samples with mutation and CNA d… gbm_tc…
##  5 all_cases_with_mrna_array_data       Samples with mRNA data (Agilent… gbm_tc…
##  6 other                                Expression Cluster Mesenchymal   gbm_tc…
##  7 all_cases_with_methylation_data      Samples with methylation data    gbm_tc…
##  8 all_cases_with_methylation_data      Samples with methylation data (… gbm_tc…
##  9 all_cases_with_microrna_data         Samples with microRNA data (mic… gbm_tc…
## 10 other                                Expression Cluster Neural        gbm_tc…
## 11 other                                Expression Cluster Proneural     gbm_tc…
## 12 other                                Sequenced, No Hypermutators      gbm_tc…
## 13 other                                Sequenced, Not Treated           gbm_tc…
## 14 other                                Sequenced, Treated               gbm_tc…
## 15 all_cases_with_mutation_data         Samples with mutation data       gbm_tc…
## # … with abbreviated variable name ¹​sampleListId

samples from sampleLists

The samplesInSampleLists function returns the sample identifiers (the counterpart of cgdsr case_ids) directly. The function handles multiple sampleList identifiers.

samplesInSampleLists(
    api = cbio,
    sampleListIds = c(
        "gbm_tcga_pub_expr_classical", "gbm_tcga_pub_expr_mesenchymal"
    )
)
## CharacterList of length 2
## [["gbm_tcga_pub_expr_classical"]] TCGA-02-0001-01 ... TCGA-12-0615-01
## [["gbm_tcga_pub_expr_mesenchymal"]] TCGA-02-0004-01 ... TCGA-12-0620-01

getSampleInfo

To get more information about the samples and their patients, we can query with the getSampleInfo function.

getSampleInfo(api = cbio,  studyId = "gbm_tcga_pub", projection = "SUMMARY")
## # A tibble: 206 × 6
##    uniqueSampleKey                       uniqu…¹ sampl…² sampl…³ patie…⁴ studyId
##    <chr>                                 <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 VENHQS0wMi0wMDAxLTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  2 VENHQS0wMi0wMDAzLTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  3 VENHQS0wMi0wMDA0LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  4 VENHQS0wMi0wMDA2LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  5 VENHQS0wMi0wMDA3LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  6 VENHQS0wMi0wMDA5LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  7 VENHQS0wMi0wMDEwLTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  8 VENHQS0wMi0wMDExLTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
##  9 VENHQS0wMi0wMDE0LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
## 10 VENHQS0wMi0wMDE1LTAxOmdibV90Y2dhX3B1… VENHQS… Primar… TCGA-0… TCGA-0… gbm_tc…
## # … with 196 more rows, and abbreviated variable names ¹​uniquePatientKey,
## #   ²​sampleType, ³​sampleId, ⁴​patientId

cgdsr (Cases)

Notes

  • ‘Cases’ and ‘Patients’ are synonymous.
  • Each patient is identified by a case_id.
  • Queries handle only a single cancerStudy identifier.
  • The case_list_description describes the assays.

getCaseLists and getClinicalData

We take the first case_list_id returned for the study by the cgds object created above (gbm_tcga_pub_all in this example) and retrieve the clinical data for that case list.

clist1 <-
    getCaseLists.CGDS(cgds, cancerStudy = "gbm_tcga_pub")[1, "case_list_id"]

getClinicalData.CGDS(cgds, clist1)

Obtaining Clinical Data

cBioPortalData (Clinical)

All clinical data

Note that a sampleListId is not required when using the fetchAllClinicalDataInStudyUsingPOST internal endpoint. Data for all patients can be obtained using the clinicalData function.

clinicalData(cbio, "gbm_tcga_pub")
## # A tibble: 206 × 24
##    patie…¹ DFS_M…² DFS_S…³ KARNO…⁴ OS_MO…⁵ OS_ST…⁶ PRETR…⁷ PRIOR…⁸ SAMPL…⁹ SEX  
##    <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
##  1 TCGA-0… 4.5041… 1:Recu… 80.0    11.605… 1:DECE… YES     NO      1       Fema…
##  2 TCGA-0… 1.3150… 1:Recu… 100.0   4.7342… 1:DECE… NO      NO      1       Male 
##  3 TCGA-0… 10.323… 1:Recu… 80.0    11.342… 1:DECE… NO      NO      1       Male 
##  4 TCGA-0… 9.9287… 1:Recu… 80.0    18.345… 1:DECE… NO      NO      1       Fema…
##  5 TCGA-0… 17.030… 1:Recu… 80.0    23.178… 1:DECE… YES     NO      1       Fema…
##  6 TCGA-0… 8.6794… 1:Recu… 80.0    10.586… 1:DECE… NO      NO      1       Fema…
##  7 TCGA-0… 11.539… 1:Recu… 80.0    35.408… 1:DECE… YES     NO      1       Fema…
##  8 TCGA-0… 4.7342… 1:Recu… 80.0    20.712… 1:DECE… NO      NO      1       Fema…
##  9 TCGA-0… <NA>    <NA>    100.0   82.553… 1:DECE… NO      NO      1       Male 
## 10 TCGA-0… 14.991… 1:Recu… 80.0    20.613… 1:DECE… NO      NO      1       Male 
## # … with 196 more rows, 14 more variables: sampleId <chr>, ACGH_DATA <chr>,
## #   CANCER_TYPE <chr>, CANCER_TYPE_DETAILED <chr>, COMPLETE_DATA <chr>,
## #   FRACTION_GENOME_ALTERED <chr>, MRNA_DATA <chr>, MUTATION_COUNT <chr>,
## #   ONCOTREE_CODE <chr>, SAMPLE_TYPE <chr>, SEQUENCED <chr>,
## #   SOMATIC_STATUS <chr>, TMB_NONSYNONYMOUS <chr>, TREATMENT_STATUS <chr>, and
## #   abbreviated variable names ¹​patientId, ²​DFS_MONTHS, ³​DFS_STATUS,
## #   ⁴​KARNOFSKY_PERFORMANCE_SCORE, ⁵​OS_MONTHS, ⁶​OS_STATUS, …

By sample data

You can use a different endpoint to obtain data for a single sample. First, obtain a single sampleId with the samplesInSampleLists function.

clist1 <- "gbm_tcga_pub_all"
samplist <- samplesInSampleLists(cbio, clist1)
onesample <- samplist[["gbm_tcga_pub_all"]][1]
onesample
## [1] "TCGA-02-0001-01"

Then we use the API endpoint to retrieve the data. The call returns an httr response object; run httr::content on that object to extract the data.

cbio$getAllClinicalDataOfSampleInStudyUsingGET(
    sampleId = onesample, studyId = "gbm_tcga_pub"
)
## Response [https://www.cbioportal.org/api/studies/gbm_tcga_pub/samples/TCGA-02-0001-01/clinical-data]
##   Date: 2023-01-04 21:24
##   Status: 200
##   Content-Type: application/json
##   Size: 3.31 kB
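As a rough sketch of that extraction step: `httr::content` on a JSON response returns a list with one record per clinical attribute. The commented lines show the live calls. The `parsed` object below is a hypothetical stand-in with made-up values, and `records_to_df` is a hypothetical helper, not part of cBioPortalData. The field names `clinicalAttributeId` and `value` are assumed from the cBioPortal clinical data model.

```r
# In a live session (requires network access):
#   resp <- cbio$getAllClinicalDataOfSampleInStudyUsingGET(
#       sampleId = onesample, studyId = "gbm_tcga_pub"
#   )
#   parsed <- httr::content(resp)

# Hypothetical stand-in for `parsed` (illustrative values only)
parsed <- list(
    list(clinicalAttributeId = "SAMPLE_TYPE", value = "Primary"),
    list(clinicalAttributeId = "SEQUENCED", value = "YES")
)

# Hypothetical helper: collapse the records into a two-column data frame
records_to_df <- function(records) {
    data.frame(
        attribute = vapply(
            records, function(r) r[["clinicalAttributeId"]], character(1)
        ),
        value = vapply(records, function(r) r[["value"]], character(1)),
        stringsAsFactors = FALSE
    )
}

clin_one <- records_to_df(parsed)
clin_one
```

Real records carry additional fields (for example sample and patient identifiers), which this helper simply ignores.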

cgdsr (Clinical)

Notes

  • getClinicalData takes a case_list_id as input; no study identifier is needed because each case list identifier belongs to exactly one study.

getClinicalData

We query clinical data for the gbm_tcga_pub_expr_classical case list identifier which is part of the gbm_tcga_pub study.

getClinicalData.CGDS(x = cgds,
    caseList = "gbm_tcga_pub_expr_classical"
)

Clinical Data Summary

cgdsr allows you to obtain clinical data for a case list subset (54 cases with gbm_tcga_pub_expr_classical), whereas cBioPortalData provides clinical data for all 206 samples in gbm_tcga_pub using the clinicalData function.
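To reproduce a cgdsr-style case list subset, one option is to filter the full clinical table by the sample identifiers of the sample list of interest. This is a minimal sketch under stated assumptions: `clin` and `classical` are small hand-built stand-ins (three sample identifiers that appear elsewhere in this vignette, with the sample list cut down to its first identifier) so that the code runs offline.

```r
# Stand-ins; in a live session use:
#   clin <- clinicalData(cbio, "gbm_tcga_pub")
#   classical <- samplesInSampleLists(
#       cbio, "gbm_tcga_pub_expr_classical"
#   )[[1]]
clin <- data.frame(
    sampleId = c("TCGA-02-0001-01", "TCGA-02-0003-01", "TCGA-02-0004-01"),
    stringsAsFactors = FALSE
)
classical <- c("TCGA-02-0001-01")  # only the first ID of the list

# Keep only the clinical rows whose sample belongs to the sample list
clin_classical <- clin[clin[["sampleId"]] %in% classical, , drop = FALSE]
clin_classical
```

Filtering on the sampleId column keeps the approach independent of how many clinical attributes the study provides.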

You may be interested in other clinical data endpoints. For a list, use the searchOps function.

searchOps(cbio, "clinical")
## [1] "getAllClinicalAttributesUsingGET"          
## [2] "fetchClinicalAttributesUsingPOST"          
## [3] "fetchClinicalDataUsingPOST"                
## [4] "getAllClinicalAttributesInStudyUsingGET"   
## [5] "getClinicalAttributeInStudyUsingGET"       
## [6] "getAllClinicalDataInStudyUsingGET"         
## [7] "fetchAllClinicalDataInStudyUsingPOST"      
## [8] "getAllClinicalDataOfPatientInStudyUsingGET"
## [9] "getAllClinicalDataOfSampleInStudyUsingGET"

Molecular or Genetic Profiles

cBioPortalData (molecularProfiles)

molecularProfiles(api = cbio, studyId = "gbm_tcga_pub")
## # A tibble: 10 × 8
##    molecularAlterationType datat…¹ name  descr…² showP…³ patie…⁴ molec…⁵ studyId
##    <chr>                   <chr>   <chr> <chr>   <lgl>   <lgl>   <chr>   <chr>  
##  1 COPY_NUMBER_ALTERATION  DISCRE… Puta… Putati… TRUE    FALSE   gbm_tc… gbm_tc…
##  2 COPY_NUMBER_ALTERATION  DISCRE… Puta… Putati… TRUE    FALSE   gbm_tc… gbm_tc…
##  3 MUTATION_EXTENDED       MAF     Muta… Mutati… TRUE    FALSE   gbm_tc… gbm_tc…
##  4 METHYLATION             CONTIN… Meth… Methyl… FALSE   FALSE   gbm_tc… gbm_tc…
##  5 MRNA_EXPRESSION         CONTIN… mRNA… mRNA e… FALSE   FALSE   gbm_tc… gbm_tc…
##  6 MRNA_EXPRESSION         Z-SCORE mRNA… 18,698… TRUE    FALSE   gbm_tc… gbm_tc…
##  7 MRNA_EXPRESSION         Z-SCORE mRNA… Log-tr… TRUE    FALSE   gbm_tc… gbm_tc…
##  8 MRNA_EXPRESSION         CONTIN… micr… expres… FALSE   FALSE   gbm_tc… gbm_tc…
##  9 MRNA_EXPRESSION         Z-SCORE micr… microR… FALSE   FALSE   gbm_tc… gbm_tc…
## 10 MRNA_EXPRESSION         Z-SCORE mRNA… mRNA a… TRUE    FALSE   gbm_tc… gbm_tc…
## # … with abbreviated variable names ¹​datatype, ²​description,
## #   ³​showProfileInAnalysisTab, ⁴​patientLevel, ⁵​molecularProfileId

Note that the molecularProfileId column provides the identifiers needed for other queries.
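One way to pick a profile identifier programmatically is to filter on molecularAlterationType. This is a sketch only: the `profiles` data frame is a hand-built stand-in for the `molecularProfiles()` result. Because the printed identifiers above are truncated, the pairing of alteration types with the four profile identifiers used later in this vignette is an assumption based on their names.

```r
# Stand-in; in a live session use:
#   profiles <- molecularProfiles(api = cbio, studyId = "gbm_tcga_pub")
profiles <- data.frame(
    molecularAlterationType = c(
        "COPY_NUMBER_ALTERATION", "MUTATION_EXTENDED",
        "METHYLATION", "MRNA_EXPRESSION"
    ),
    molecularProfileId = c(
        "gbm_tcga_pub_cna_rae", "gbm_tcga_pub_mutations",
        "gbm_tcga_pub_methylation_hm27", "gbm_tcga_pub_mrna"
    ),
    stringsAsFactors = FALSE
)

# Select the identifiers of the copy number profiles
is_cna <- profiles[["molecularAlterationType"]] == "COPY_NUMBER_ALTERATION"
cna_ids <- profiles[["molecularProfileId"]][is_cna]
cna_ids
```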

cgdsr (getGeneticProfiles)

getGeneticProfiles.CGDS(cgds, cancerStudy = "gbm_tcga_pub")

Genomic Profile Data for a set of genes

cBioPortalData (Identify samples and genes)

Currently, if you only have Hugo symbols, some conversion is needed before you can use the molecularData function directly. First, convert the symbols to Entrez gene IDs, and then obtain all the samples in the sample list of interest.

Convert hugoGeneSymbol to entrezGeneId

genetab <- queryGeneTable(cbio,
    by = "hugoGeneSymbol",
    genes = c("NF1", "TP53", "ABL1")
)
genetab
## # A tibble: 3 × 3
##   entrezGeneId hugoGeneSymbol type          
##          <int> <chr>          <chr>         
## 1         4763 NF1            protein-coding
## 2           25 ABL1           protein-coding
## 3         7157 TP53           protein-coding
entrez <- genetab[["entrezGeneId"]]

Obtain all samples in study

allsamps <- samplesInSampleLists(cbio, "gbm_tcga_pub_all")

In the next section, we will show how to use the genes and sample identifiers to obtain the molecular profile data.

cgdsr (Profile Data)

The getProfileData function allows for straightforward retrieval of the molecular profile data with only a case list and genetic profile identifiers.

getProfileData.CGDS(x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_mrna",
    caseList = "gbm_tcga_pub_all"
)

Molecular Data with cBioPortalData

cBioPortalData provides a number of options for retrieving molecular profile data depending on the use case. Note that molecularData is mostly used internally and that the cBioPortalData function is the user-friendly method for downloading such data.

molecularData

We use the translated Entrez identifiers from above.

molecularData(cbio, "gbm_tcga_pub_mrna",
    entrezGeneIds = entrez, sampleIds = unlist(allsamps))
## $gbm_tcga_pub_mrna
## # A tibble: 618 × 8
##    uniqueSampleKey      uniqu…¹ entre…² molec…³ sampl…⁴ patie…⁵ studyId    value
##    <chr>                <chr>     <int> <chr>   <chr>   <chr>   <chr>      <dbl>
##  1 VENHQS0wMi0wMDAxLTA… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.174  
##  2 VENHQS0wMi0wMDAxLTA… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.297  
##  3 VENHQS0wMi0wMDAxLTA… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…  0.621  
##  4 VENHQS0wMi0wMDAzLTA… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.177  
##  5 VENHQS0wMi0wMDAzLTA… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.00107
##  6 VENHQS0wMi0wMDAzLTA… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…  0.00644
##  7 VENHQS0wMi0wMDA0LTA… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.0878 
##  8 VENHQS0wMi0wMDA0LTA… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.236  
##  9 VENHQS0wMi0wMDA0LTA… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.305  
## 10 VENHQS0wMi0wMDA2LTA… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.173  
## # … with 608 more rows, and abbreviated variable names ¹​uniquePatientKey,
## #   ²​entrezGeneId, ³​molecularProfileId, ⁴​sampleId, ⁵​patientId

getDataByGenes

The getDataByGenes function automatically figures out all the sample identifiers in the study and it allows Hugo and Entrez identifiers, as well as genePanelId inputs.

getDataByGenes(
    api =  cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)
## $gbm_tcga_pub_mrna
## # A tibble: 618 × 10
##    uniqueSamp…¹ uniqu…² entre…³ molec…⁴ sampl…⁵ patie…⁶ studyId    value hugoG…⁷
##    <chr>        <chr>     <int> <chr>   <chr>   <chr>   <chr>      <dbl> <chr>  
##  1 VENHQS0wMi0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.174   ABL1   
##  2 VENHQS0wMi0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.297   NF1    
##  3 VENHQS0wMi0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…  0.621   TP53   
##  4 VENHQS0wMi0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.177   ABL1   
##  5 VENHQS0wMi0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.00107 NF1    
##  6 VENHQS0wMi0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…  0.00644 TP53   
##  7 VENHQS0wMi0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.0878  ABL1   
##  8 VENHQS0wMi0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.236   NF1    
##  9 VENHQS0wMi0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.305   TP53   
## 10 VENHQS0wMi0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… -0.173   ABL1   
## # … with 608 more rows, 1 more variable: type <chr>, and abbreviated variable
## #   names ¹​uniqueSampleKey, ²​uniquePatientKey, ³​entrezGeneId,
## #   ⁴​molecularProfileId, ⁵​sampleId, ⁶​patientId, ⁷​hugoGeneSymbol
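Users coming from cgdsr may prefer a wide table with one row per sample and one column per gene. A minimal base R sketch of that reshaping follows. The `long` data frame is a hand-built stand-in for `getDataByGenes(...)[["gbm_tcga_pub_mrna"]]`, restricted to the first two samples and using the rounded values printed above. The sampleId, hugoGeneSymbol and value columns are the only ones it needs.

```r
# Stand-in for the long-format tibble (first two samples, rounded values)
long <- data.frame(
    sampleId = rep(c("TCGA-02-0001-01", "TCGA-02-0003-01"), each = 3),
    hugoGeneSymbol = rep(c("ABL1", "NF1", "TP53"), times = 2),
    value = c(-0.174, -0.297, 0.621, -0.177, -0.00107, 0.00644),
    stringsAsFactors = FALSE
)

# One row per sample, one column per gene; each cell holds a single value
wide <- tapply(
    long[["value"]],
    list(long[["sampleId"]], long[["hugoGeneSymbol"]]),
    function(v) v[1]
)
wide
```

The main cBioPortalData function performs a comparable reshaping internally, with genes as rows and samples as columns in the resulting assay.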

cBioPortalData: the main end-user function

It is important to note that end users who wish to obtain the data as easily as possible should use the main cBioPortalData function:

gbm_pub <- cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"), by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mrna"
)

assay(gbm_pub[["gbm_tcga_pub_mrna"]])[, 1:4]
##      TCGA-02-0001-01 TCGA-02-0003-01 TCGA-02-0004-01 TCGA-02-0006-01
## ABL1      -0.1744878    -0.177096729     -0.08782114      -0.1733767
## NF1       -0.2966920    -0.001066810     -0.23626512      -0.1691507
## TP53       0.6213171     0.006435625     -0.30507285       0.3967758

Mutation Data

cBioPortalData (mutationData)

Similar to molecularData, mutation data can be obtained with the mutationData function or the getDataByGenes function.

mutationData(
    api = cbio,
    molecularProfileIds = "gbm_tcga_pub_mutations",
    entrezGeneIds = entrez,
    sampleIds = unlist(allsamps)
)
## $gbm_tcga_pub_mutations
## # A tibble: 57 × 28
##    uniqueSample…¹ uniqu…² molec…³ sampl…⁴ patie…⁵ entre…⁶ studyId center mutat…⁷
##    <chr>          <chr>   <chr>   <chr>   <chr>     <int> <chr>   <chr>  <chr>  
##  1 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  2 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    4763 gbm_tc… genom… Somatic
##  3 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  4 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  5 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  6 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  7 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    4763 gbm_tc… genom… Somatic
##  8 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  9 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
## 10 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
## # … with 47 more rows, 19 more variables: validationStatus <chr>,
## #   startPosition <int>, endPosition <int>, referenceAllele <chr>,
## #   proteinChange <chr>, mutationType <chr>, functionalImpactScore <chr>,
## #   fisValue <dbl>, linkXvar <chr>, linkPdb <chr>, linkMsa <chr>,
## #   ncbiBuild <chr>, variantType <chr>, keyword <chr>, chr <chr>,
## #   variantAllele <chr>, refseqMrnaId <chr>, proteinPosStart <int>,
## #   proteinPosEnd <int>, and abbreviated variable names ¹​uniqueSampleKey, …

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_mutations"
)
## $gbm_tcga_pub_mutations
## # A tibble: 57 × 30
##    uniqueSample…¹ uniqu…² molec…³ sampl…⁴ patie…⁵ entre…⁶ studyId center mutat…⁷
##    <chr>          <chr>   <chr>   <chr>   <chr>     <int> <chr>   <chr>  <chr>  
##  1 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  2 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    4763 gbm_tc… genom… Somatic
##  3 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  4 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  5 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  6 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  7 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    4763 gbm_tc… genom… Somatic
##  8 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
##  9 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
## 10 VENHQS0wMi0wM… VENHQS… gbm_tc… TCGA-0… TCGA-0…    7157 gbm_tc… genom… Somatic
## # … with 47 more rows, 21 more variables: validationStatus <chr>,
## #   startPosition <int>, endPosition <int>, referenceAllele <chr>,
## #   proteinChange <chr>, mutationType <chr>, functionalImpactScore <chr>,
## #   fisValue <dbl>, linkXvar <chr>, linkPdb <chr>, linkMsa <chr>,
## #   ncbiBuild <chr>, variantType <chr>, keyword <chr>, chr <chr>,
## #   variantAllele <chr>, refseqMrnaId <chr>, proteinPosStart <int>,
## #   proteinPosEnd <int>, hugoGeneSymbol <chr>, type <chr>, and abbreviated …

cgdsr (getMutationData)

getMutationData.CGDS(
    x = cgds,
    caseList = "gbm_tcga_pub_all",
    geneticProfile = "gbm_tcga_pub_mutations",
    genes = c("NF1", "TP53", "ABL1")
)

Copy Number Alteration (CNA)

cBioPortalData (CNA)

Copy Number Alteration data can be obtained with the getDataByGenes function or with the main cBioPortalData function.

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)
## $gbm_tcga_pub_cna_rae
## # A tibble: 609 × 10
##    uniqueS…¹ uniqu…² entre…³ molec…⁴ sampl…⁵ patie…⁶ studyId value hugoG…⁷ type 
##    <chr>     <chr>     <int> <chr>   <chr>   <chr>   <chr>   <int> <chr>   <chr>
##  1 VENHQS0w… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     1 ABL1    prot…
##  2 VENHQS0w… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 NF1     prot…
##  3 VENHQS0w… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 TP53    prot…
##  4 VENHQS0w… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 ABL1    prot…
##  5 VENHQS0w… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 NF1     prot…
##  6 VENHQS0w… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 TP53    prot…
##  7 VENHQS0w… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 ABL1    prot…
##  8 VENHQS0w… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 NF1     prot…
##  9 VENHQS0w… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 TP53    prot…
## 10 VENHQS0w… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc…     0 ABL1    prot…
## # … with 599 more rows, and abbreviated variable names ¹​uniqueSampleKey,
## #   ²​uniquePatientKey, ³​entrezGeneId, ⁴​molecularProfileId, ⁵​sampleId,
## #   ⁶​patientId, ⁷​hugoGeneSymbol

cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_cna_rae"
)
## harmonizing input:
##   removing 3 colData rownames not in sampleMap 'primary'
## A MultiAssayExperiment object of 1 listed
##  experiment with a user-defined name and respective class.
##  Containing an ExperimentList class object of length 1:
##  [1] gbm_tcga_pub_cna_rae: SummarizedExperiment with 3 rows and 203 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files

cgdsr (CNA)

getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_cna_rae",
    caseList = "gbm_tcga_pub_cna"
)

Methylation Data

cBioPortalData (Methylation)

Similar to Copy Number Alteration data, methylation data can be obtained with the getDataByGenes function or with the cBioPortalData function.

getDataByGenes(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)
## $gbm_tcga_pub_methylation_hm27
## # A tibble: 174 × 10
##    unique…¹ uniqu…² entre…³ molec…⁴ sampl…⁵ patie…⁶ studyId  value hugoG…⁷ type 
##    <chr>    <chr>     <int> <chr>   <chr>   <chr>   <chr>    <dbl> <chr>   <chr>
##  1 VENHQS0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.103  ABL1    prot…
##  2 VENHQS0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.112  NF1     prot…
##  3 VENHQS0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.0735 TP53    prot…
##  4 VENHQS0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.202  ABL1    prot…
##  5 VENHQS0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.161  NF1     prot…
##  6 VENHQS0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.152  TP53    prot…
##  7 VENHQS0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.179  ABL1    prot…
##  8 VENHQS0… VENHQS…    4763 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.161  NF1     prot…
##  9 VENHQS0… VENHQS…    7157 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.170  TP53    prot…
## 10 VENHQS0… VENHQS…      25 gbm_tc… TCGA-0… TCGA-0… gbm_tc… 0.176  ABL1    prot…
## # … with 164 more rows, and abbreviated variable names ¹​uniqueSampleKey,
## #   ²​uniquePatientKey, ³​entrezGeneId, ⁴​molecularProfileId, ⁵​sampleId,
## #   ⁶​patientId, ⁷​hugoGeneSymbol

cBioPortalData(
    api = cbio,
    studyId = "gbm_tcga_pub",
    genes = c("NF1", "TP53", "ABL1"),
    by = "hugoGeneSymbol",
    molecularProfileIds = "gbm_tcga_pub_methylation_hm27"
)
## harmonizing input:
##   removing 148 colData rownames not in sampleMap 'primary'
## A MultiAssayExperiment object of 1 listed
##  experiment with a user-defined name and respective class.
##  Containing an ExperimentList class object of length 1:
##  [1] gbm_tcga_pub_methylation_hm27: SummarizedExperiment with 3 rows and 58 columns
## Functionality:
##  experiments() - obtain the ExperimentList instance
##  colData() - the primary/phenotype DataFrame
##  sampleMap() - the sample coordination DataFrame
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment
##  *Format() - convert into a long or wide DataFrame
##  assays() - convert ExperimentList to a SimpleList of matrices
##  exportClass() - save data to flat files

cgdsr (Methylation)

getProfileData.CGDS(
    x = cgds,
    genes = c("NF1", "TP53", "ABL1"),
    geneticProfiles = "gbm_tcga_pub_methylation_hm27",
    caseList = "gbm_tcga_pub_methylation_hm27"
)

sessionInfo

sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] survminer_0.4.9             ggpubr_0.5.0               
##  [3] ggplot2_3.4.0               survival_3.4-0             
##  [5] cBioPortalData_2.10.3       MultiAssayExperiment_1.24.0
##  [7] SummarizedExperiment_1.28.0 Biobase_2.58.0             
##  [9] GenomicRanges_1.50.2        GenomeInfoDb_1.34.6        
## [11] IRanges_2.32.0              S4Vectors_0.36.1           
## [13] BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
## [15] matrixStats_0.63.0          AnVIL_1.10.1               
## [17] dplyr_1.0.10                BiocStyle_2.26.0           
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.4.1           BiocBaseUtils_1.0.0      
##   [3] BiocFileCache_2.6.0       RCircos_1.2.2            
##   [5] splines_4.2.2             BiocParallel_1.32.5      
##   [7] TCGAutils_1.18.0          digest_0.6.31            
##   [9] htmltools_0.5.4           magick_2.7.3             
##  [11] fansi_1.0.3               magrittr_2.0.3           
##  [13] memoise_2.0.1             tzdb_0.3.0               
##  [15] limma_3.54.0              Biostrings_2.66.0        
##  [17] readr_2.1.3               vroom_1.6.0              
##  [19] prettyunits_1.1.1         colorspace_2.0-3         
##  [21] blob_1.2.3                rvest_1.0.3              
##  [23] rappdirs_0.3.3            xfun_0.36                
##  [25] crayon_1.5.2              RCurl_1.98-1.9           
##  [27] jsonlite_1.8.4            RaggedExperiment_1.22.0  
##  [29] zoo_1.8-11                glue_1.6.2               
##  [31] GenomicDataCommons_1.22.0 gtable_0.3.1             
##  [33] zlibbioc_1.44.0           XVector_0.38.0           
##  [35] DelayedArray_0.24.0       car_3.1-1                
##  [37] abind_1.4-5               scales_1.2.1             
##  [39] futile.options_1.0.1      DBI_1.1.3                
##  [41] rstatix_0.7.1             miniUI_0.1.1.1           
##  [43] Rcpp_1.0.9                gridtext_0.1.5           
##  [45] xtable_1.8-4              progress_1.2.2           
##  [47] archive_1.1.5             bit_4.0.5                
##  [49] km.ci_0.5-6               DT_0.26                  
##  [51] htmlwidgets_1.6.0         httr_1.4.4               
##  [53] ellipsis_0.3.2            farver_2.1.1             
##  [55] pkgconfig_2.0.3           XML_3.99-0.13            
##  [57] rapiclient_0.1.3          sass_0.4.4               
##  [59] dbplyr_2.2.1              utf8_1.2.2               
##  [61] RJSONIO_1.3-1.6           labeling_0.4.2           
##  [63] tidyselect_1.2.0          rlang_1.0.6              
##  [65] later_1.3.0               AnnotationDbi_1.60.0     
##  [67] munsell_0.5.0             tools_4.2.2              
##  [69] cachem_1.0.6              cli_3.5.0                
##  [71] generics_0.1.3            RSQLite_2.2.20           
##  [73] broom_1.0.2               evaluate_0.19            
##  [75] stringr_1.5.0             fastmap_1.1.0            
##  [77] yaml_2.3.6                knitr_1.41               
##  [79] bit64_4.0.5               survMisc_0.5.6           
##  [81] purrr_1.0.0               KEGGREST_1.38.0          
##  [83] mime_0.12                 formatR_1.13             
##  [85] xml2_1.3.3                biomaRt_2.54.0           
##  [87] compiler_4.2.2            filelock_1.0.2           
##  [89] curl_4.3.3                png_0.1-8                
##  [91] ggsignif_0.6.4            tibble_3.1.8             
##  [93] bslib_0.4.2               stringi_1.7.8            
##  [95] highr_0.10                futile.logger_1.4.3      
##  [97] GenomicFeatures_1.50.3    lattice_0.20-45          
##  [99] Matrix_1.5-3              commonmark_1.8.1         
## [101] markdown_1.4              KMsurv_0.1-5             
## [103] RTCGAToolbox_2.28.0       vctrs_0.5.1              
## [105] pillar_1.8.1              lifecycle_1.0.3          
## [107] BiocManager_1.30.19       jquerylib_0.1.4          
## [109] data.table_1.14.6         bitops_1.0-7             
## [111] httpuv_1.6.7              rtracklayer_1.58.0       
## [113] R6_2.5.1                  BiocIO_1.8.0             
## [115] bookdown_0.31             promises_1.2.0.1         
## [117] gridExtra_2.3             codetools_0.2-18         
## [119] lambda.r_1.2.4            assertthat_0.2.1         
## [121] rjson_0.2.21              withr_2.5.0              
## [123] GenomicAlignments_1.34.0  Rsamtools_2.14.0         
## [125] GenomeInfoDbData_1.2.9    ggtext_0.1.2             
## [127] parallel_4.2.2            hms_1.1.2                
## [129] grid_4.2.2                tidyr_1.2.1              
## [131] rmarkdown_2.19            carData_3.0-5            
## [133] shiny_1.7.4               restfulr_0.0.15