The NCI-60 cancer cell line panel has been used for several decades as an anti-cancer drug screen. This panel was developed as part of the Developmental Therapeutics Program (DTP, http://dtp.nci.nih.gov/) of the U.S. National Cancer Institute (NCI). Thousands of compounds have been tested on the NCI-60 cell lines, which have been extensively characterized on many platforms for gene and protein expression, copy number, mutation, and other features (Reinhold, et al., 2012). The purpose of the CellMiner project (http://discover.nci.nih.gov/cellminer) has been to integrate data from the multiple platforms used to analyze the NCI-60 and to provide a suite of tools for exploring NCI-60 data.
All data in the rcellminerData package has been retrieved directly from the CellMiner project (http://discover.nci.nih.gov/cellminer) website. Both the data downloaded and the scripts used to generate this data package are contained within the inst/data-raw folder of the package.
source("http://bioconductor.org/biocLite.R")
biocLite("rcellminer")
biocLite("rcellminerData")
Load rcellminer and rcellminerData packages:
library(rcellminer)
library(rcellminerData)
A list of all accessible vignettes and methods is available with the following command.
help.search("rcellminerData")
Specific information about the drug data or the molecular profiling data can also be retrieved:
help("drugData")
help("molData")
The data in rcellminerData exists as two S4 class objects: molData and drugData. molData contains results for molecular assays (e.g. genomics, proteomics) that have been performed on the NCI-60, and drugData contains results for drug response assays (Reinhold, et al., 2012).
molData is an instance of the MolData S4 class, which is composed of 2 slots: eSetList and sampleData. eSetList is a list of eSet objects that can have different dimensions. NOTE: conceptually, a MolData object is similar to an eSet object, but it differs in that the eSet assayData slot requires its matrices to have equal dimensions. The second slot, sampleData, is a MIAxE class instance, but its accessor, getSampleData(), returns a data.frame containing information for each sample. Below are examples of operations that can be performed on the MolData object.
# Get the types of feature data in a MolData object.
names(getAllFeatureData(molData))
## [1] "cop" "exp" "xai" "exo" "mut" "mir" "pro" "mda"
An eSetList member within a MolData object can be referenced directly using the double square bracket operator, as with a normal list; the operation returns an eSet object. In the case of rcellminerData, an ExpressionSet, which is derived from eSet, is returned. Any eSet derived class can potentially be added to the eSetList; adding objects to the eSetList will be described in a later section.
class(molData[["exp"]])
## [1] "ExpressionSet"
## attr(,"package")
## [1] "Biobase"
geneExpMat <- exprs(molData[["exp"]])
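To illustrate that eSetList members can differ in their dimensions, the following sketch counts the features (rows) in each data type. It uses only the accessors shown above and assumes that every eSetList member in rcellminerData is an ExpressionSet, so that exprs() applies to each of them; the variable names dataTypes and featureCounts are illustrative.

```r
library(Biobase)
library(rcellminer)
library(rcellminerData)

# Data type abbreviations stored in the MolData object.
dataTypes <- names(getAllFeatureData(molData))

# Number of features (rows) in each eSet; samples are in the columns.
# Assumption: each member is an ExpressionSet, so exprs() can be used.
featureCounts <- vapply(
  dataTypes,
  function(dataType) nrow(exprs(molData[[dataType]])),
  integer(1)
)
featureCounts
```

The result is a named integer vector with one entry per data type, which makes differences in feature counts across platforms easy to see.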
Sample information about a MolData object can be accessed using getSampleData(), which returns a data.frame. For the NCI-60, we provide information on the tissue of origin for each cell line.
getSampleData(molData)[1:10, "TissueType"]
## [1] "breast" "breast"
## [3] "breast" "breast"
## [5] "breast" "central nervous system"
## [7] "central nervous system" "central nervous system"
## [9] "central nervous system" "central nervous system"
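Because getSampleData() returns an ordinary data.frame, the usual base R tools apply to it. As a small sketch, the number of cell lines per tissue of origin can be tabulated from the TissueType column used above; the names sampleInfo and tissueCounts are illustrative.

```r
library(rcellminer)
library(rcellminerData)

# Sample annotations, one row per cell line.
sampleInfo <- getSampleData(molData)

# Count the cell lines belonging to each tissue of origin.
tissueCounts <- table(sampleInfo$TissueType)
sort(tissueCounts, decreasing = TRUE)
```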
It is possible to add additional datasets into MolData objects, as shown below, where the protein data provided in rcellminerData is copied as “test”. This provides users flexibility for wider usage of the MolData class.
# Add data
molData[["test"]] <- molData[["pro"]]
names(getAllFeatureData(molData))
## [1] "cop" "exp" "xai" "exo" "mut" "mir" "pro" "mda" "test"
Drug activity (response) data is provided in the rcellminerData package for the NCI-60. drugData is an instance of the DrugData S4 class, which is composed of 3 slots: act, repeatAct, and sampleData. Both act (data summarized across multiple repeats) and repeatAct (data for the individual repeats) are activity data slots and are provided as ExpressionSet objects. In the example below, drugActMat has fewer rows than drugRepeatActMat because the data across multiple repeats has been summarized, but it has the same number of columns (samples).
drugActMat <- exprs(getAct(drugData))
dim(drugActMat)
## [1] 20861 60
drugRepeatActMat <- exprs(getRepeatAct(drugData))
dim(drugRepeatActMat)
## [1] 77421 60
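The relationship between the two matrices can be sketched with toy data in base R. This is only an illustration of why summarizing repeats reduces the row count while leaving the columns unchanged. The drug identifiers, the values, and the use of a simple mean are all assumptions made for the example; the summarization procedure actually used for the CellMiner data is not described here and may differ.

```r
# Toy data: 5 repeat experiments (rows) for 2 hypothetical drugs
# across 3 samples (columns).
repeatMat <- matrix(
  c(1, 2, 3,
    3, 4, 5,
    0, 0, 0,
    2, 2, 2,
    4, 4, NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("S1", "S2", "S3"))
)
drugId <- c("drugA", "drugA", "drugB", "drugB", "drugB")

# Average the repeats of each drug, ignoring missing values.
# Assumption: a plain mean stands in for the real summarization method.
summarizeRepeats <- function(mat, ids) {
  t(vapply(
    split(seq_len(nrow(mat)), ids),
    function(rows) colMeans(mat[rows, , drop = FALSE], na.rm = TRUE),
    numeric(ncol(mat))
  ))
}

actMat <- summarizeRepeats(repeatMat, drugId)
dim(repeatMat)  # one row per repeat experiment
dim(actMat)     # one row per drug, same number of columns
```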
rcellminerData provides a large amount of information on the drugs tested on the NCI-60, including structure information and clinical testing status. This data can be extracted into a data.frame as shown below:
drugAnnotDf <- as(featureData(getAct(drugData)), "data.frame")
colnames(drugAnnotDf)
## [1] "NSC" "NAME" "FDA_STATUS"
## [4] "MOA" "PUBCHEM_ID" "TOTAL_EXPS"
## [7] "TOTAL_EXPS_AFTER_QC" "SMILES"
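Once the annotations are in a data.frame, they can be explored with standard data.frame operations. The sketch below tabulates the FDA_STATUS column and previews a few annotation columns. It makes no assumption about which status values occur, and it counts missing values as well in case some compounds have no status recorded.

```r
library(Biobase)
library(rcellminer)
library(rcellminerData)

# Drug annotations, one row per compound in the summarized activity data.
drugAnnotDf <- as(featureData(getAct(drugData)), "data.frame")

# Count compounds by clinical status, keeping missing values visible.
table(drugAnnotDf$FDA_STATUS, useNA = "ifany")

# Preview selected annotation columns.
head(drugAnnotDf[, c("NSC", "NAME", "MOA")])
```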
DrugData objects can contain sample data in the same manner as MolData objects. In the case of rcellminerData, the sample data provided for the drugData object is identical to that provided for the molData object.
identical(getSampleData(molData), getSampleData(drugData))
## [1] TRUE
sessionInfo()
## R version 3.2.0 (2015-04-16)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
## [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] rcellminerData_1.0.0 rcellminer_1.0.0 rcdk_3.3.2
## [4] fingerprint_3.5.2 Biobase_2.28.0 BiocGenerics_0.14.0
## [7] knitr_1.9
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.11.5 xtable_1.7-4 R6_2.0.1
## [4] stringr_0.6.2 caTools_1.17.1 rcdklibs_1.5.8.4
## [7] tools_3.2.0 png_0.1-7 KernSmooth_2.23-14
## [10] htmltools_0.2.6 iterators_1.0.7 gtools_3.4.2
## [13] yaml_2.1.13 digest_0.6.8 RJSONIO_1.3-0
## [16] rJava_0.9-6 shiny_0.11.1 formatR_1.1
## [19] bitops_1.0-6 mime_0.3 evaluate_0.6
## [22] rmarkdown_0.5.1 gdata_2.13.3 gplots_2.16.0
## [25] BiocStyle_1.6.0 httpuv_1.3.2