RTCGA package

The Cancer Genome Atlas (TCGA) Data Portal provides a platform for researchers to search, download, and analyze data sets generated by TCGA. It contains clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes. The key is to understand genomics to improve cancer care.

The RTCGA package supports downloading and integrating the variety and volume of TCGA data using the patient barcode as a key, which makes the data easier to obtain. This may benefit the development of science and the improvement of patients’ treatment. RTCGA is an open-source R package, available to download from Bioconductor:

source("http://bioconductor.org/biocLite.R")
biocLite("RTCGA")

or from GitHub:

if (!require(devtools)) {
    install.packages("devtools")
    require(devtools)
}
biocLite("MarcinKosinski/RTCGA")

Furthermore, the RTCGA package transforms TCGA data into a form that is convenient to use in the R statistical environment. These data transformations can be part of a statistical analysis pipeline, which RTCGA can make more reproducible.

Use cases and examples are shown in the RTCGA package vignettes:

browseVignettes("RTCGA")

How to download mutations data to obtain the same datasets as in the RTCGA.mutations package?

TCGA data have been released on many dates. To see them all, type:

library(RTCGA)
checkTCGA('Dates')
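
Before fixing a release date, it can help to confirm that it appears among the available ones. The following is a minimal sketch, assuming `checkTCGA('Dates')` returns a character vector of dates in `YYYY-MM-DD` form. The vector below is an illustrative stand-in so the snippet runs offline, and `is_available_release` is a hypothetical helper, not part of RTCGA.

```r
# Stand-in for the result of checkTCGA('Dates'); these values are illustrative only.
available_dates <- c("2015-06-01", "2015-08-21", "2015-11-01")

# Hypothetical helper: TRUE when the requested date is well-formed and listed.
is_available_release <- function(date, dates) {
  parsed <- as.Date(date, format = "%Y-%m-%d")
  !is.na(parsed) && date %in% dates
}

is_available_release("2015-08-21", available_dates)
is_available_release("2015-08-22", available_dates)
```

With a live connection, `available_dates` could be replaced by the actual result of `checkTCGA('Dates')`.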

Version 1.0 of the RTCGA.mutations package contains mutations datasets from 2015-08-21. They were downloaded as follows (the code is mainly copied from http://marcinkosinski.github.io/RTCGA/):

Available cohorts

All cohort names can be checked using:

library(magrittr) # provides %>% if it is not already attached
(cohorts <- infoTCGA() %>%
   rownames() %>%
   sub("-counts", "", x=.))
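
The `sub()` call above strips the `-counts` suffix from the row names so that only the cohort codes remain. A small offline sketch of that step, assuming row names of that shape; `example_rownames` is an illustrative stand-in rather than real `infoTCGA()` output:

```r
# Illustrative row names; the real ones come from rownames(infoTCGA()).
example_rownames <- c("ACC-counts", "BLCA-counts", "BRCA-counts")

# Remove the "-counts" suffix to keep only the cohort codes.
example_cohorts <- sub("-counts", "", x = example_rownames)
print(example_cohorts)
```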

For all cohorts the following code downloads the mutations data.

Downloading tarred files

# dir.create( "data2" )
releaseDate <- "2015-08-21"
sapply( cohorts, function(element){
   tryCatch({
      downloadTCGA( cancerTypes = element,
                    dataSet = "Mutation_Packager_Calls.Level",
                    destDir = "data2",
                    date = releaseDate )},
   error = function(cond){
      cat("Error: Maybe there weren't mutations data for ", element, " cancer.\n")
   })
})
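
The `tryCatch()` wrapper is what lets the loop continue when a cohort has no mutations data. The sketch below shows the same pattern offline. `fake_download` is a hypothetical stand-in for `downloadTCGA()`, the cohort name `MADEUP` is invented to trigger a failure, and returning `NA` from the handler is a choice made here for illustration (the original handler only prints a message).

```r
# Hypothetical stand-in for downloadTCGA(): fails for a cohort without data.
fake_download <- function(cohort) {
  if (cohort == "MADEUP") stop("no mutations data for ", cohort)
  paste0(cohort, ".tar.gz")
}

example_cohorts <- c("ACC", "MADEUP", "BRCA")

# One failing cohort does not stop the remaining downloads.
results <- sapply(example_cohorts, function(element) {
  tryCatch(
    fake_download(element),
    error = function(cond) {
      message("Skipping ", element, ": ", conditionMessage(cond))
      NA_character_
    }
  )
})
print(results)
```

Keeping an `NA` per failed cohort makes it easy to list afterwards which cohorts were skipped, for example with `names(results)[is.na(results)]`.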

Saving mutations data to `RTCGA.mutations` package

list.files( "data2" ) %>%
   grep( x=., pattern ="Mutation", value = TRUE ) %>%
   file.path( "data2", .) %>%
   sapply( function(element){
      readTCGA(element,"mutations") -> mutations_file
     
     ## remove non-ASCII strings:
     for( i in 1:ncol(mutations_file)){
       mutations_file[, i] <- iconv(mutations_file[, i],
                                    "UTF-8", "ASCII", sub="")
     }
     
     which( sapply(cohorts, grep, x = element) == 1 ) %>%
       names -> cohort_name
     
     assign( x = paste0(cohort_name, ".mutations"),
               value = mutations_file,
              envir = .GlobalEnv )

#      save( list = paste0(cohort_name, ".mutations"),
#             file = paste0("data/",
#                      cohort_name,
#                      ".mutations.rda"))
#      rm( list = paste0(cohort_name, ".mutations"),
#          envir = .GlobalEnv )
   })
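
The non-ASCII clean-up inside the loop above can be tried on a toy table. This is a minimal sketch under the assumption that all columns are character vectors; `example_mutations` and its values are made up for illustration and are not TCGA data.

```r
# Toy data frame with one value that contains a non-ASCII character.
example_mutations <- data.frame(
  Hugo_Symbol = c("TP53", "KRAS"),
  Center = c("Kosi\u0144ski Lab", "Broad"),
  stringsAsFactors = FALSE
)

# Drop every character that cannot be represented in ASCII, column by column.
for (i in seq_len(ncol(example_mutations))) {
  example_mutations[, i] <- iconv(example_mutations[, i],
                                  "UTF-8", "ASCII", sub = "")
}
print(example_mutations)
```

Because `sub = ""` deletes rather than transliterates, the affected strings become shorter; that matches the behaviour of the original loop.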


# or save with good compression
devtools::use_data(ACC.mutations, BLCA.mutations, BRCA.mutations,
                   CESC.mutations, CHOL.mutations, COAD.mutations,
                   GBM.mutations, HNSC.mutations, KICH.mutations,
                   KIPAN.mutations, KIRC.mutations, KIRP.mutations,
                   LAML.mutations, LGG.mutations, LIHC.mutations,
                   LUAD.mutations, LUSC.mutations, OV.mutations,
                   PAAD.mutations, PCPG.mutations, PRAD.mutations,
                   READ.mutations, SARC.mutations, SKCM.mutations,
                   STAD.mutations, STES.mutations, TGCT.mutations,
                   THCA.mutations, UCEC.mutations, UCS.mutations,
                   UVM.mutations,
                   overwrite = TRUE, compress = "xz")
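
For comparison, the commented-out `save()` route can also use xz compression, since base R's `save()` accepts `compress = "xz"`. A minimal round-trip sketch; the object `example.mutations` and the temporary file path are illustrative assumptions, not part of the package build.

```r
# Toy object standing in for one cohort's mutations table.
example.mutations <- data.frame(Hugo_Symbol = c("TP53", "KRAS"),
                                stringsAsFactors = FALSE)

# Save by name with xz compression into a temporary location.
rda_path <- file.path(tempdir(), "example.mutations.rda")
save(list = "example.mutations", file = rda_path, compress = "xz")

# Load into a separate environment to check the round trip.
restored <- new.env()
load(rda_path, envir = restored)
identical(restored[["example.mutations"]], example.mutations)
```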

Using `RTCGA` package to download mutations data that are included in `RTCGA.mutations` package

Date of datasets release: 2015-08-21

Marcin Kosiński

2015-10-15
