
1 Introduction

The gDRtestData package includes a curated subset of DepMap Public 24Q4 data, serving multiple purposes:

This vignette describes the included DepMap datasets, their contents, and how to use them.

1.1 Data Source & Citation

Original source: DepMap Portal

Release: DepMap Public 24Q4 (Loaded May 26, 2026)

Citation: DepMap, Broad (2024). DepMap 24Q4 Public. Figshare+. Dataset. https://doi.org/10.25452/figshare.plus.27993248.v1

2 Data Overview

The DepMap 24Q4 release contains new cell models and data from:

The following datasets are included in the gDRtestData package:

| Dataset | Type | Dimensions | Description | Dataset URL |
|---|---|---|---|---|
| Models | Metadata | ~1,000 cell lines | Cell line information and annotations | url |
| CRISPRGeneEffect | Functional | ~1,000 × ~18,000 | CRISPR knockout gene effect scores (integrated via Chronos) | url |
| Expression | Omics | ~1,000 × ~19,000 | Gene expression (log2(TPM + 1), protein-coding genes) | url |
| Mutations (Hotspot) | Somatic | ~1,000 × ~3,000 | Binary matrix of hotspot mutations | url |
| Mutations (Damaging) | Somatic | ~1,000 × ~3,000 | Binary matrix of damaging mutations | url |
| OmicsCNGene | CNV | ~1,000 × ~20,000 | Gene-level copy number estimates | url |
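All of the files above except Models share one layout: cell lines in rows and genes in columns. The sketch below reads such a CSV into a numeric matrix in base R, assuming the first column holds the cell line identifiers. The helper `read_depmap_matrix()` is hypothetical, and the CSV is a small synthetic stand-in written to a temporary file, because this vignette does not state where the files live inside the installed package.

```r
# Hypothetical helper: read a cell-line x gene CSV into a numeric matrix.
# Assumes column 1 holds cell line identifiers (e.g. ModelID) and the
# remaining columns are genes.
read_depmap_matrix <- function(path) {
  df <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  mat <- as.matrix(df[, -1, drop = FALSE])  # drop the identifier column
  rownames(mat) <- df[[1]]                  # cell line IDs become row names
  mat
}

# Synthetic stand-in for a DepMap matrix file (IDs and values invented)
path <- tempfile(fileext = ".csv")
writeLines(c(
  ",7157,4893,1956",
  "ACH-000001,0.1,-0.2,0.05",
  "ACH-000002,-0.8,0.3,NA"
), path)

m <- read_depmap_matrix(path)
dim(m)                   # 2 cell lines x 3 genes
m["ACH-000002", "7157"]  # look up one cell line / gene pair
```

For the full-size files, `data.table::fread()` is usually much faster than `read.csv()`; data.table already appears among the loaded namespaces in the session info below.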

2.1 Data Dictionary

2.1.1 Models (Cell Line Metadata)

file name: Model.csv

| Aspect | Details |
|---|---|
| Rows | Individual cell lines (~1,000 models) |
| Columns | Metadata columns (see below) |
| Values | Cell line annotations and patient information |
| Data Type | Mixed (character, numeric) |
| Interpretation | Comprehensive metadata for each cell line model |
| NA Handling | Missing values indicate information not available for that model |


Column Summary:

  • ModelID: Unique cell line identifier
  • CCLEName: Cell line name from CCLE database
  • CellLineName: Common cell line name
  • TissueOrigin: Tissue type (Human, Mouse, Other)
  • DepmapModelType, OncotreeLineage, OncotreePrimaryDisease, OncotreeSubtype: Cancer classification (from Oncotree)
  • OncotreeCode: Oncotree classification code
  • PrimaryOrMetastasis: Tumor site (Primary/Metastatic/Recurrence)
  • Age: Age at sampling
  • AgeCategory: Age category at time of sampling (Adult/Pediatric/Fetus/Unknown)
  • Sex: Sex at sampling (Female/Male/Unknown)
  • PatientRace: Patient-reported race

Notes:

  • Classification: Oncotree taxonomy for cancer models
  • Quality: Authenticated, high-quality cell lines only
  • Completeness: Some fields may be NA, which indicates that the information is not available
  • Use: Primary reference for cell line metadata; join with other datasets via ModelID
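Because every data matrix is keyed by cell line, ModelID is the natural join key. A minimal sketch, assuming the matrix row names are ModelIDs: it aligns a small synthetic metadata table with a synthetic expression matrix using `match()`, then summarizes one gene by `OncotreeLineage`. All IDs, names and values are invented for illustration.

```r
# Synthetic Model.csv-like metadata (values invented)
models <- data.frame(
  ModelID = c("ACH-000001", "ACH-000002", "ACH-000003"),
  CellLineName = c("LineA", "LineB", "LineC"),
  OncotreeLineage = c("Lung", "Breast", "Lung"),
  stringsAsFactors = FALSE
)

# Synthetic cell-line x gene matrix with ModelIDs as row names
expr <- rbind(
  "ACH-000002" = c("7157" = 5.2, "4893" = 7.8),
  "ACH-000001" = c("7157" = 1.1, "4893" = 0.0)
)

# Keep only cell lines present in both tables, in the same order
shared <- intersect(rownames(expr), models$ModelID)
meta <- models[match(shared, models$ModelID), ]
expr <- expr[shared, , drop = FALSE]

# Mean value of gene 7157 per lineage
lineage_mean <- tapply(expr[, "7157"], meta$OncotreeLineage, mean)
lineage_mean
```

Matching on ModelID rather than relying on row order guards against files whose cell lines are sorted differently.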

2.1.2 Somatic Mutations (Hotspot)

file name: OmicsSomaticMutationsMatrixHotspot.csv

| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Binary (0/1); 1 = hotspot mutation present |
| Definition | Mutations in known cancer hotspots (COSMIC, OncoKB) |
| Sequencing | Whole exome sequencing (WES) |

Note: Hotspot mutations are recurrent mutations at known oncogenic positions.
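Since the hotspot matrix is binary, simple column and row operations answer most questions. The sketch below uses an invented three-line matrix to count how many cell lines carry a hotspot mutation in each gene and to list the lines mutated in one gene; `which()` is used so that any NA entries are skipped instead of producing NA names.

```r
# Synthetic hotspot matrix: 1 = hotspot mutation present (values invented)
hotspot <- cbind(
  "7157" = c(1, 0, 1),
  "3845" = c(0, 0, 1)
)
rownames(hotspot) <- c("ACH-000001", "ACH-000002", "ACH-000003")

# Number of cell lines with a hotspot mutation in each gene
counts <- colSums(hotspot == 1, na.rm = TRUE)
counts

# Cell lines with a hotspot mutation in gene 7157
mutated_7157 <- rownames(hotspot)[which(hotspot[, "7157"] == 1)]
mutated_7157
```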

2.1.3 Somatic Mutations (Damaging)

file name: OmicsSomaticMutationsMatrixDamaging.csv

| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Binary (0/1); 1 = damaging mutation present |
| Definition | Frame-shift, stop-gain, or splice-site mutations |
| Quality | High-confidence damaging variants |

Note: Damaging mutations are predicted loss-of-function mutations (frameshift, nonsense, splice-site, etc.).
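The hotspot and damaging matrices may not cover exactly the same cell lines and genes, so a common step is to restrict both to the shared rows and columns before combining them. A minimal sketch on invented data, assuming both matrices use the same identifiers: it flags a gene as altered when either matrix has a 1. NA entries stay NA unless the other matrix has a 1 at that position.

```r
# Synthetic binary matrices (values invented); they overlap only partially
hotspot <- cbind("7157" = c(1, 0), "3845" = c(0, 0))
rownames(hotspot) <- c("ACH-000001", "ACH-000002")

damaging <- cbind("7157" = c(0, 0, 1), "4893" = c(1, 0, 0))
rownames(damaging) <- c("ACH-000001", "ACH-000002", "ACH-000003")

# Restrict both matrices to shared cell lines and genes
rows <- intersect(rownames(hotspot), rownames(damaging))
genes <- intersect(colnames(hotspot), colnames(damaging))

# Altered = hotspot OR damaging mutation present
any_mut <- (hotspot[rows, genes, drop = FALSE] == 1) |
  (damaging[rows, genes, drop = FALSE] == 1)
storage.mode(any_mut) <- "integer"  # convert TRUE/FALSE back to 1/0
any_mut
```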

2.1.4 CRISPR Gene Effect

file name: CRISPRGeneEffect.csv

| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs (Entrez format) as column names |
| Values | CRISPR knockout gene effect scores |
| Scale | 0 = no knockout effect; -1 = median effect of common essential genes; values typically fall between about -1 and +1 |
| Interpretation | More negative values indicate genes more essential for viability of that cell line |
| NA Handling | Missing values indicate insufficient screen coverage |

Note:

  • Method: Genome-wide CRISPR/Cas9 knockout screens
  • Scale: Gene effect scores, not probabilities of essentiality (DepMap distributes dependency probabilities separately, as CRISPRGeneDependency)
  • Processing: Already normalized and quality-filtered by DepMap
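Turning gene effect scores into dependency calls requires a cutoff and a decision about missing values. The sketch below counts, per gene, how many screened cell lines fall below a cutoff of -0.5; that threshold is a hypothetical choice for illustration, not a DepMap recommendation, and the matrix values are invented. NAs are excluded from both the numerator and the denominator.

```r
# Synthetic gene effect matrix (values invented); NA = insufficient coverage
gene_effect <- cbind(
  "7157" = c(0.10, -0.05, NA),
  "6194" = c(-1.30, -0.90, -1.10),
  "3845" = c(-0.70, 0.02, -0.20)
)
rownames(gene_effect) <- c("ACH-000001", "ACH-000002", "ACH-000003")

# Hypothetical cutoff, chosen only for this example
cutoff <- -0.5

# Per gene: screened lines below the cutoff, and lines screened at all
n_dependent <- colSums(gene_effect < cutoff, na.rm = TRUE)
n_screened <- colSums(!is.na(gene_effect))
frac_dependent <- n_dependent / n_screened
frac_dependent
```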

2.1.5 Gene Expression

file name: OmicsExpressionProteinCodingGenesTPMLogp1.csv

| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs (protein-coding genes only) |
| Values | Expression levels (numeric) |
| Scale | log2(TPM + 1); already log-transformed |
| Range | Typically 0-20 (log2 scale) |
| Quality | RNA-seq from standardized Broad CCL protocols |

Note:

  • Only protein-coding genes included
  • Already log-transformed (TPM + 1 pseudocount)
  • TPM normalization (within each cell line) already applied by DepMap before the log transform
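Because the values are already log2(TPM + 1), applying `log2()` again would be a mistake; when linear-scale TPM is needed, the transform can be inverted instead. A minimal sketch with invented values showing the forward transform and its inverse, 2^x - 1; the two helper function names are hypothetical.

```r
# Forward transform as described for this file: log2(TPM + 1)
to_log2_tpm1 <- function(tpm) log2(tpm + 1)

# Inverse transform: recover linear-scale TPM
from_log2_tpm1 <- function(x) 2^x - 1

tpm <- c(0, 1, 9, 1023)    # invented TPM values
x <- to_log2_tpm1(tpm)     # 0, 1, ~3.32, 10
back <- from_log2_tpm1(x)  # round trip back to TPM
back
```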

2.1.6 Copy Number Variation (CNV)

file name: OmicsCNGene.csv

| Aspect | Details |
|---|---|
| Rows | Cell line identifiers |
| Columns | NCBI gene IDs |
| Values | Numeric (continuous); gene-level CN estimates |
| Scale | Log2 ratio relative to diploid reference (typically -2 to +3) |
| Method | SNP microarray or WES-derived CN calling |
| Interpretation | 0 = diploid (2 copies); < 0 = deletion; > 0 = amplification |
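Under the interpretation stated in the table (0 = diploid, i.e. 2 copies), a value x corresponds to roughly 2 × 2^x copies. The sketch below applies that conversion and a simple three-way call to invented values; the ±0.3 thresholds are hypothetical, and the exact transform used in the file should be confirmed against the DepMap documentation before relying on absolute copy numbers.

```r
# Synthetic gene-level CN values (invented), log2 ratio vs. diploid
cn <- c(GENE_A = 0, GENE_B = -1, GENE_C = 1, GENE_D = 0.1)

# Approximate copies, assuming 0 = 2 copies as stated in the table
copies <- 2 * 2^cn

# Hypothetical thresholds for calling losses and gains
cn_call <- ifelse(cn <= -0.3, "deletion",
                  ifelse(cn >= 0.3, "amplification", "neutral"))
cn_call
```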

3 Important Limitations & Disclaimers

  1. Data Subset: This package includes a curated subset for testing/examples. For comprehensive analyses, download the full DepMap Portal data.

  2. Licensing & Usage: DepMap data is publicly available but has specific usage terms. Verify compliance with your intended use: https://depmap.org/portal/documentation/

  3. Citation: Always cite both DepMap (original source) and the gDRtestData package.

SessionInfo

sessionInfo()
#> R version 4.6.0 RC (2026-04-17 r89917)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.24-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] gDRtestData_1.11.3 BiocStyle_2.41.0  
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39       R6_2.6.1            bookdown_0.46      
#>  [4] fastmap_1.2.0       xfun_0.58           cachem_1.1.0       
#>  [7] knitr_1.51          htmltools_0.5.9     rmarkdown_2.31     
#> [10] lifecycle_1.0.5     cli_3.6.6           sass_0.4.10        
#> [13] data.table_1.18.4   jquerylib_0.1.4     compiler_4.6.0     
#> [16] tools_4.6.0         evaluate_1.0.5      bslib_0.11.0       
#> [19] yaml_2.3.12         otel_0.2.0          BiocManager_1.30.27
#> [22] jsonlite_2.0.0      rlang_1.2.0