1 Installation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

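# Install every package that is loaded below, not only curatedTCGAData
BiocManager::install(c("curatedTCGAData", "MultiAssayExperiment", "TCGAutils"))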

Load packages:

library(curatedTCGAData)
library(MultiAssayExperiment)
library(TCGAutils)

2 Downloading datasets

List the available cancer codes and assays in the TCGA data with a dry run:

curatedTCGAData(diseaseCode = "*", assays = "*", dry.run = TRUE)
## Please see the list below for available cohorts and assays
## Available Cancer codes:
##  ACC BLCA BRCA CESC CHOL COAD DLBC ESCA GBM HNSC KICH
##  KIRC KIRP LAML LGG LIHC LUAD LUSC MESO OV PAAD PCPG
##  PRAD READ SARC SKCM STAD TGCT THCA THYM UCEC UCS UVM 
## Available Data Types:
##  CNACGH CNACGH_CGH_hg_244a
##  CNACGH_CGH_hg_415k_g4124a CNASNP CNASeq
##  CNVSNP GISTIC_AllByGene GISTIC_Peaks
##  GISTIC_ThresholdedByGene Methylation
##  Methylation_methyl27 Methylation_methyl450
##  Mutation RNASeq2GeneNorm RNASeqGene RPPAArray
##  mRNAArray mRNAArray_TX_g4502a
##  mRNAArray_TX_g4502a_1
##  mRNAArray_TX_ht_hg_u133a mRNAArray_huex
##  miRNAArray miRNASeqGene

A dry run with a specific disease code and assay pattern lists the files that would be downloaded:

curatedTCGAData(diseaseCode = "COAD", assays = "RPPA*", dry.run = TRUE)
##                      Title DispatchClass
## 96 COAD_RPPAArray-20160128           Rda
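
The assays argument accepts glob-style patterns such as "RPPA*" or "CN*". The following base R sketch shows how such a pattern selects names from the data types listed above. It is an illustration only: match_assays is a hypothetical helper, not a curatedTCGAData function, and the package may resolve patterns differently internally.

```r
# A subset of assay names copied from the "Available Data Types" listing
assay_names <- c("CNACGH", "CNASNP", "CNASeq", "CNVSNP", "GISTIC_Peaks",
                 "Methylation", "Mutation", "RNASeq2GeneNorm", "RPPAArray")

# Hypothetical helper (an assumption for illustration, not a package function)
match_assays <- function(patterns, choices) {
  # utils::glob2rx() turns a glob such as "CN*" into a regular expression
  hits <- lapply(patterns, function(p) {
    grep(utils::glob2rx(p), choices, value = TRUE)
  })
  unique(unlist(hits))
}

rppa_hits <- match_assays("RPPA*", assay_names)
cn_hits <- match_assays(c("CN*", "Mutation"), assay_names)

print(rppa_hits)
print(cn_hits)
```

A pattern only narrows the names to search for; a given cohort returns just the matching assays that actually exist for it.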

2.1 ACC dataset example

(accmae <- curatedTCGAData("ACC", c("CN*", "Mutation"), dry.run = FALSE))
## A MultiAssayExperiment object of 3 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 3: 
##  [1] ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 180 columns 
##  [2] ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 180 columns 
##  [3] ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

Note. For more on working with a MultiAssayExperiment object, see the MultiAssayExperiment vignette.

2.1.1 Subtype information

Some cancer datasets include subtype information in the clinical data provided. This subtype information is stored in the metadata of the colData of the MultiAssayExperiment object. To obtain the subtype variable names, use the getSubtypeMap function from TCGAutils:

head(getSubtypeMap(accmae))
##         ACC_annotations   ACC_subtype
## 1            Patient_ID     patientID
## 2 histological_subtypes     Histology
## 3         mrna_subtypes       C1A/C1B
## 4         mrna_subtypes       mRNA_K4
## 5                  cimp    MethyLevel
## 6     microrna_subtypes miRNA cluster

2.1.2 Typical clinical variables

Another helper function provided by TCGAutils allows users to obtain a set of consistent clinical variable names across several cancer types. Use the getClinicalNames function to obtain a character vector of common clinical variables such as vital status, years to birth, days to death, etc.

head(getClinicalNames("ACC"))
## [1] "years_to_birth"        "vital_status"          "days_to_death"        
## [4] "days_to_last_followup" "tumor_tissue_site"     "pathologic_stage"
colData(accmae)[, getClinicalNames("ACC")][1:5, 1:5]
## DataFrame with 5 rows and 5 columns
##              years_to_birth vital_status days_to_death days_to_last_followup
##                   <integer>    <integer>     <integer>             <integer>
## TCGA-OR-A5J1             58            1          1355                    NA
## TCGA-OR-A5J2             44            1          1677                    NA
## TCGA-OR-A5J3             23            0            NA                  2091
## TCGA-OR-A5J4             23            1           423                    NA
## TCGA-OR-A5J5             30            1           365                    NA
##              tumor_tissue_site
##                    <character>
## TCGA-OR-A5J1           adrenal
## TCGA-OR-A5J2           adrenal
## TCGA-OR-A5J3           adrenal
## TCGA-OR-A5J4           adrenal
## TCGA-OR-A5J5           adrenal
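
These clinical variables are a common starting point for survival analysis. As a minimal sketch, the snippet below rebuilds the five rows shown above as a plain data.frame and derives one follow-up time per patient. It assumes that vital_status equal to 1 marks a death and that days_to_last_followup holds the censoring time; the column names os_days and event are made up for this illustration.

```r
# The five patients printed above, re-entered as a plain data.frame
clin <- data.frame(
  vital_status          = c(1L, 1L, 0L, 1L, 1L),
  days_to_death         = c(1355L, 1677L, NA, 423L, 365L),
  days_to_last_followup = c(NA, NA, 2091L, NA, NA),
  row.names = c("TCGA-OR-A5J1", "TCGA-OR-A5J2", "TCGA-OR-A5J3",
                "TCGA-OR-A5J4", "TCGA-OR-A5J5")
)

# Follow-up time: days to death when observed, otherwise last follow-up
clin$os_days <- with(
  clin,
  ifelse(is.na(days_to_death), days_to_last_followup, days_to_death)
)

# Event indicator (assumption: 1 = deceased, 0 = alive at last follow-up)
clin$event <- clin$vital_status == 1L

print(clin[, c("os_days", "event")])
```

On the full dataset the same expressions can be applied to as.data.frame(colData(accmae)) after checking how vital_status is coded for the cohort in question.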

2.1.3 Samples in Assays

The sampleTables function gives an overview of sample types / codes present in the data:

sampleTables(accmae)
## $`ACC_CNASNP-20160128`
## 
## 01 10 11 
## 90 85  5 
## 
## $`ACC_CNVSNP-20160128`
## 
## 01 10 11 
## 90 85  5 
## 
## $`ACC_Mutation-20160128`
## 
## 01 
## 90
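
The two-digit codes in these tables come from the TCGA barcode, where the sample type occupies characters 14 and 15. The base R sketch below tabulates sample codes for a few made-up barcodes. It only illustrates the barcode layout; the barcodes are hypothetical and this is not necessarily how sampleTables is implemented.

```r
# Hypothetical barcodes in the standard TCGA layout (not real samples)
barcodes <- c(
  "TCGA-OR-A5J1-01A-11D-A29H-01",
  "TCGA-OR-A5J1-10A-01D-A29K-01",
  "TCGA-OR-A5J2-01A-11D-A29H-01",
  "TCGA-OR-A5J2-11A-11D-A29H-01"
)

# Characters 1-12 identify the participant, characters 14-15 the sample type
participant <- substr(barcodes, 1, 12)
sample_code <- substr(barcodes, 14, 15)

# Count samples per type code, as sampleTables() reports for each assay
code_table <- table(sample_code)
print(code_table)
```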

Analyses often compare two groups of samples. To make that separation easier, the splitAssays function from TCGAutils identifies the sample types in each assay and moves each type into its own experiment. By default, every discoverable sample type gets a separate experiment. Here we request only primary solid tumor (code 01) and blood derived normal (code 10) samples, as defined in the sampleTypes reference dataset. Because the mutation assay contains only code 01 samples, splitAssays warns that some sample codes were not found:

sampleTypes[sampleTypes[["Code"]] %in% c("01", "10"), ]
##    Code           Definition Short.Letter.Code
## 1    01  Primary Solid Tumor                TP
## 10   10 Blood Derived Normal                NB
splitAssays(accmae, c("01", "10"))
## Warning: Some 'sampleCodes' not found in assays
## A MultiAssayExperiment object of 5 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 5: 
##  [1] 01_ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 90 columns 
##  [2] 10_ACC_CNASNP-20160128: RaggedExperiment with 79861 rows and 85 columns 
##  [3] 01_ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 90 columns 
##  [4] 10_ACC_CNVSNP-20160128: RaggedExperiment with 21052 rows and 85 columns 
##  [5] 01_ACC_Mutation-20160128: RaggedExperiment with 20166 rows and 90 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices
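
The warning and the count of five experiments both follow from the sampleTables output. As a rough check, the base R sketch below re-enters those counts and works out which requested codes are missing from which assay. It mirrors the bookkeeping only and makes no claim about how splitAssays works internally.

```r
# Sample-type counts copied from the sampleTables() output
counts <- list(
  "ACC_CNASNP-20160128"   = c("01" = 90, "10" = 85, "11" = 5),
  "ACC_CNVSNP-20160128"   = c("01" = 90, "10" = 85, "11" = 5),
  "ACC_Mutation-20160128" = c("01" = 90)
)
requested <- c("01", "10")

# Requested codes that are absent from each assay; keep non-empty results
missing_codes <- lapply(counts, function(x) setdiff(requested, names(x)))
missing_codes <- Filter(length, missing_codes)

# One experiment is created per assay and requested code that is present
n_experiments <- sum(vapply(
  counts,
  function(x) sum(requested %in% names(x)),
  integer(1)
))

print(missing_codes)
print(n_experiments)
```

Only the mutation assay lacks code 10, which accounts for the warning, and the remaining combinations give the five experiments listed in the output.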