1 Introduction

The depmap package aims to provide a reproducible research framework for the cancer dependency data described by Tsherniak, Aviad, et al. “Defining a cancer dependency map.” Cell 170.3 (2017): 564-576. The data found in the depmap package have been formatted to facilitate the use of common R packages such as dplyr and ggplot2. We hope that this package will allow researchers to more easily mine, explore and visualize dependency data taken from the Depmap cancer genomic dependency study.

2 Installation instructions

To install depmap, the BiocManager package (the Bioconductor project's package manager) is required. If BiocManager is not already installed, install it first by running install.packages("BiocManager") within R. This needs to be done only once.

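## install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("depmap")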

The depmap package depends on the ExperimentHub Bioconductor package, which allows the data accessed in this package to be stored in and retrieved from the cloud.

library("depmap")
library("ExperimentHub")

3 Available data

The depmap package currently provides the nine types of dataset described below, each available through ExperimentHub and typically for several depmap releases.

The data found in this R package have been converted from “wide” format .csv files to “long” format .rda files. None of the values taken from the original datasets have been changed, although the columns have been re-arranged. The changes made are described in the Details section shown after querying the relevant dataset.
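
To make the “wide” versus “long” distinction concrete, here is a minimal base R sketch. The table, its cell line and gene names, and its values are invented for illustration; this is not the conversion code used by the package, only an example of the same kind of reshaping.

```r
# Toy "wide" table: one row per cell line, one column per gene.
# All names and values are invented for illustration, not DepMap data.
wide <- data.frame(
  cell_line = c("LINE_A", "LINE_B"),
  GENE1 = c(0.10, -0.20),
  GENE2 = c(NA, 0.30)
)

# Stack the gene columns so that each row holds one
# (cell_line, gene, dependency) observation.
gene_cols <- setdiff(names(wide), "cell_line")
long <- data.frame(
  cell_line  = rep(wide$cell_line, times = length(gene_cols)),
  gene       = rep(gene_cols, each = nrow(wide)),
  dependency = unlist(wide[gene_cols], use.names = FALSE)
)
long
```

In the long layout every value keeps its cell line and gene labels on the same row, which is the shape that dplyr verbs and ggplot2 aesthetics expect.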

## create ExperimentHub query object
eh <- ExperimentHub()
## snapshotDate(): 2022-10-24
query(eh, "depmap")
## ExperimentHub with 82 records
## # snapshotDate(): 2022-10-24
## # $dataprovider: Broad Institute
## # $species: Homo sapiens
## # $rdataclass: tibble
## # additional mcols(): taxonomyid, genome, description,
## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
## #   rdatapath, sourceurl, sourcetype 
## # retrieve records with, e.g., 'object[["EH2260"]]' 
## 
##            title             
##   EH2260 | rnai_19Q1         
##   EH2261 | crispr_19Q1       
##   EH2262 | copyNumber_19Q1   
##   EH2263 | RPPA_19Q1         
##   EH2264 | TPM_19Q1          
##   ...      ...               
##   EH7555 | copyNumber_22Q2   
##   EH7556 | TPM_22Q2          
##   EH7557 | mutationCalls_22Q2
##   EH7558 | metadata_22Q2     
##   EH7559 | achilles_22Q2

Each dataset has an ExperimentHub accession number (e.g. EH2260 refers to the rnai dataset from the 19Q1 release).
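
The titles in the listing above combine a dataset type and a release tag (e.g. rnai_19Q1). As a sketch, assuming this naming pattern holds for every record, the titles can be split to find the newest listed release for each type. The `records` table below is typed in by hand from the ten rows shown above, so it covers only those rows and not all 82 records.

```r
# Accession numbers and titles copied from the listing above (10 of 82 records).
records <- data.frame(
  id = c("EH2260", "EH2261", "EH2262", "EH2263", "EH2264",
         "EH7555", "EH7556", "EH7557", "EH7558", "EH7559"),
  title = c("rnai_19Q1", "crispr_19Q1", "copyNumber_19Q1", "RPPA_19Q1",
            "TPM_19Q1", "copyNumber_22Q2", "TPM_22Q2",
            "mutationCalls_22Q2", "metadata_22Q2", "achilles_22Q2")
)

# Assumed pattern: <type>_<two-digit year>Q<quarter>.
records$type    <- sub("_[0-9]{2}Q[1-4]$", "", records$title)
records$release <- sub("^.*_", "", records$title)

# Release tags such as "19Q1" and "22Q2" sort chronologically as text,
# so the last row per type after sorting is the newest one listed.
records <- records[order(records$type, records$release), ]
latest  <- records[!duplicated(records$type, fromLast = TRUE), ]
latest[, c("type", "release", "id")]
```

For types that appear twice in the hand-typed table (copyNumber and TPM), the 22Q2 accession is selected; the others keep their only listed release.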

3.1 RNA interference knockdown data

The rnai dataset contains the combined genetic dependency data for RNAi-induced gene knockdown for select genes and cancer cell lines. This data corresponds to the D2_combined_genetic_dependency_scores.csv file found in the 22Q2 depmap release and includes 17309 genes, 712 cell lines, 30 primary diseases and 31 lineages.

A specific rnai dataset, such as rnai_19Q1, can be accessed by its EH number.

rnai <- eh[["EH2260"]]
rnai
## # A tibble: 12,324,008 × 6
##    depmap_id  cell_line          gene                gene_name entrez_id depen…¹
##    <chr>      <chr>              <chr>               <chr>     <chr>       <dbl>
##  1 ACH-001270 127399_SOFT_TISSUE A1BG (1)            A1BG      1          NA    
##  2 ACH-001270 127399_SOFT_TISSUE NAT2 (10)           NAT2      10         NA    
##  3 ACH-001270 127399_SOFT_TISSUE ADA (100)           ADA       100        NA    
##  4 ACH-001270 127399_SOFT_TISSUE CDH2 (1000)         CDH2      1000       -0.195
##  5 ACH-001270 127399_SOFT_TISSUE AKT3 (10000)        AKT3      10000      -0.256
##  6 ACH-001270 127399_SOFT_TISSUE MED6 (10001)        MED6      10001      -0.174
##  7 ACH-001270 127399_SOFT_TISSUE NR2E3 (10002)       NR2E3     10002      -0.140
##  8 ACH-001270 127399_SOFT_TISSUE NAALAD2 (10003)     NAALAD2   10003      NA    
##  9 ACH-001270 127399_SOFT_TISSUE DUXB (100033411)    DUXB      100033411  NA    
## 10 ACH-001270 127399_SOFT_TISSUE PDZK1P1 (100034743) PDZK1P1   100034743  NA    
## # … with 12,323,998 more rows, and abbreviated variable name ¹​dependency
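
Many rows have a missing (NA) dependency score, so a common first step is to drop them before ranking. The sketch below rebuilds the first seven rows printed above as a small data frame (it does not download anything) and orders the genes from the most negative score upwards, on the usual reading that a more negative score indicates a stronger dependency.

```r
# First seven rows of the printed rnai_19Q1 tibble (cell line 127399_SOFT_TISSUE).
rnai_rows <- data.frame(
  gene_name  = c("A1BG", "NAT2", "ADA", "CDH2", "AKT3", "MED6", "NR2E3"),
  dependency = c(NA, NA, NA, -0.195, -0.256, -0.174, -0.140)
)

# Keep only genes that were actually scored in this cell line.
scored <- rnai_rows[!is.na(rnai_rows$dependency), ]

# Most negative score first.
scored <- scored[order(scored$dependency), ]
scored
```

The same two steps can be written with dplyr's filter() and arrange() on the full tibble.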

The most recent rnai dataset can be automatically loaded into R by using the depmap_rnai function.

depmap::depmap_rnai()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 12,324,008 × 6
##    gene                cell_line          dependency entrez_id gene_name depma…¹
##    <chr>               <chr>                   <dbl>     <int> <chr>     <chr>  
##  1 A1BG (1)            127399_SOFT_TISSUE     NA             1 A1BG      ACH-00…
##  2 NAT2 (10)           127399_SOFT_TISSUE     NA            10 NAT2      ACH-00…
##  3 ADA (100)           127399_SOFT_TISSUE     NA           100 ADA       ACH-00…
##  4 CDH2 (1000)         127399_SOFT_TISSUE     -0.195      1000 CDH2      ACH-00…
##  5 AKT3 (10000)        127399_SOFT_TISSUE     -0.256     10000 AKT3      ACH-00…
##  6 MED6 (10001)        127399_SOFT_TISSUE     -0.174     10001 MED6      ACH-00…
##  7 NR2E3 (10002)       127399_SOFT_TISSUE     -0.140     10002 NR2E3     ACH-00…
##  8 NAALAD2 (10003)     127399_SOFT_TISSUE     NA         10003 NAALAD2   ACH-00…
##  9 DUXB (100033411)    127399_SOFT_TISSUE     NA     100033411 DUXB      ACH-00…
## 10 PDZK1P1 (100034743) 127399_SOFT_TISSUE     NA     100034743 PDZK1P1   ACH-00…
## # … with 12,323,998 more rows, and abbreviated variable name ¹​depmap_id
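
Comparing the two printouts, entrez_id is a character column in rnai_19Q1 but an integer column in the most recent release, and the gene column encodes both fields as "SYMBOL (ID)". A minimal sketch, assuming every gene label follows that pattern, of recovering both fields and bringing the identifiers to a common type before combining releases:

```r
# Gene labels copied from the printed output above.
gene <- c("A1BG (1)", "CDH2 (1000)", "DUXB (100033411)")

# Symbol: everything before " (".
gene_name <- sub(" \\(.*$", "", gene)

# Entrez ID: the digits inside the trailing parentheses, as integer.
entrez_id <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", gene))

# A character entrez_id column (as in the 19Q1 tibble) converts the same way.
entrez_chr <- c("1", "1000", "100033411")
identical(as.integer(entrez_chr), entrez_id)
```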

3.2 CRISPR-Cas9 knockout data

The crispr dataset contains the batch-corrected, CERES-inferred gene effect scores from CRISPR-Cas9 knockout of select genes in cancer cell lines. This data corresponds to the gene_effect_corrected.csv file from the 22Q2 depmap release. The dataset includes 17634 genes, 558 cell lines, 26 primary diseases and 28 lineages.

A specific crispr dataset, such as crispr_19Q1, can be accessed by its EH number.

crispr <- eh[["EH2261"]]
crispr
## # A tibble: 9,839,772 × 6
##    depmap_id  cell_line                            gene  gene_…¹ entre…² depen…³
##    <chr>      <chr>                                <chr> <chr>   <chr>     <dbl>
##  1 ACH-000004 HEL_HAEMATOPOIETIC_AND_LYMPHOID_TIS… A1BG… A1BG    1        0.135 
##  2 ACH-000005 HEL9217_HAEMATOPOIETIC_AND_LYMPHOID… A1BG… A1BG    1       -0.212 
##  3 ACH-000007 LS513_LARGE_INTESTINE                A1BG… A1BG    1        0.0433
##  4 ACH-000009 C2BBE1_LARGE_INTESTINE               A1BG… A1BG    1        0.0705
##  5 ACH-000011 253J_URINARY_TRACT                   A1BG… A1BG    1        0.191 
##  6 ACH-000012 HCC827_LUNG                          A1BG… A1BG    1       -0.0104
##  7 ACH-000013 ONCODG1_OVARY                        A1BG… A1BG    1        0.0210
##  8 ACH-000014 HS294T_SKIN                          A1BG… A1BG    1        0.113 
##  9 ACH-000015 NCIH1581_LUNG                        A1BG… A1BG    1       -0.0742
## 10 ACH-000017 SKBR3_BREAST                         A1BG… A1BG    1        0.133 
## # … with 9,839,762 more rows, and abbreviated variable names ¹​gene_name,
## #   ²​entrez_id, ³​dependency

The most recent crispr dataset can be automatically loaded into R by using the depmap_crispr function.

depmap::depmap_crispr()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 18,881,196 × 6
##    depmap_id  gene     dependency entrez_id gene_name cell_line                 
##    <chr>      <chr>         <dbl>     <int> <chr>     <chr>                     
##  1 ACH-000001 A1BG (1)    -0.135          1 A1BG      NIHOVCAR3_OVARY           
##  2 ACH-000004 A1BG (1)     0.0819         1 A1BG      HEL_HAEMATOPOIETIC_AND_LY…
##  3 ACH-000005 A1BG (1)    -0.0942         1 A1BG      HEL9217_HAEMATOPOIETIC_AN…
##  4 ACH-000007 A1BG (1)    -0.0115         1 A1BG      LS513_LARGE_INTESTINE     
##  5 ACH-000009 A1BG (1)    -0.0508         1 A1BG      C2BBE1_LARGE_INTESTINE    
##  6 ACH-000011 A1BG (1)     0.0918         1 A1BG      253J_URINARY_TRACT        
##  7 ACH-000012 A1BG (1)    -0.147          1 A1BG      HCC827_LUNG               
##  8 ACH-000013 A1BG (1)    -0.0592         1 A1BG      ONCODG1_OVARY             
##  9 ACH-000014 A1BG (1)    -0.0348         1 A1BG      HS294T_SKIN               
## 10 ACH-000015 A1BG (1)    -0.204          1 A1BG      NCIH1581_LUNG             
## # … with 18,881,186 more rows

3.3 WES copy number data

The copyNumber dataset contains the WES copy number data: the log-fold change in copy number, measured against the baseline copy number, for select genes and cell lines. This dataset corresponds to the public_19Q1_gene_cn.csv file from the 22Q2 depmap release and includes 23299 genes, 1604 cell lines, 38 primary diseases and 33 lineages.

A specific copyNumber dataset, such as copyNumber_19Q1, can be accessed by its EH number.

copyNumber <- eh[["EH2262"]]
copyNumber
## # A tibble: 37,371,596 × 6
##    depmap_id  cell_line                           gene  gene_…¹ entre…² log_co…³
##    <chr>      <chr>                               <chr> <chr>   <chr>      <dbl>
##  1 ACH-000011 253J_URINARY_TRACT                  A1BG… A1BG    1        0.131  
##  2 ACH-000026 253JBV_URINARY_TRACT                A1BG… A1BG    1       -0.237  
##  3 ACH-000086 ACCMESO1_PLEURA                     A1BG… A1BG    1        0.134  
##  4 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID… A1BG… A1BG    1       -0.0208 
##  5 ACH-000838 AMO1_HAEMATOPOIETIC_AND_LYMPHOID_T… A1BG… A1BG    1        0.170  
##  6 ACH-000080 BDCM_HAEMATOPOIETIC_AND_LYMPHOID_T… A1BG… A1BG    1        0.00703
##  7 ACH-000992 BICR18_UPPER_AERODIGESTIVE_TRACT    A1BG… A1BG    1       -0.376  
##  8 ACH-000228 BICR31_UPPER_AERODIGESTIVE_TRACT    A1BG… A1BG    1        1.16   
##  9 ACH-000771 BICR56_UPPER_AERODIGESTIVE_TRACT    A1BG… A1BG    1        0.0197 
## 10 ACH-000415 BICR6_UPPER_AERODIGESTIVE_TRACT     A1BG… A1BG    1        0.280  
## # … with 37,371,586 more rows, and abbreviated variable names ¹​gene_name,
## #   ²​entrez_id, ³​log_copy_number

The most recent copyNumber dataset can be automatically loaded into R by using the depmap_copyNumber function.

depmap::depmap_copyNumber()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 44,799,888 × 6
##    depmap_id  gene            log_copy_number entrez_id gene_name cell_line     
##    <chr>      <chr>                     <dbl>     <int> <chr>     <chr>         
##  1 ACH-000267 DDX11L1 (84771)           1.15      84771 DDX11L1   HDLM2_HAEMATO…
##  2 ACH-001408 DDX11L1 (84771)           1.04      84771 DDX11L1   UMUC14_URINAR…
##  3 ACH-000617 DDX11L1 (84771)           0.762     84771 DDX11L1   OVCAR4_OVARY  
##  4 ACH-002123 DDX11L1 (84771)           1.14      84771 DDX11L1   H2369_PLEURA  
##  5 ACH-000519 DDX11L1 (84771)           1.01      84771 DDX11L1   PEER_HAEMATOP…
##  6 ACH-000750 DDX11L1 (84771)           0.711     84771 DDX11L1   LOXIMVI_SKIN  
##  7 ACH-000544 DDX11L1 (84771)           0.981     84771 DDX11L1   OE21_OESOPHAG…
##  8 ACH-001214 DDX11L1 (84771)           1.05      84771 DDX11L1   U138MG_CENTRA…
##  9 ACH-002223 DDX11L1 (84771)           0.630     84771 DDX11L1   D245MG_CENTRA…
## 10 ACH-000713 DDX11L1 (84771)           0.823     84771 DDX11L1   CAOV3_OVARY   
## # … with 44,799,878 more rows
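
Because every dataset carries a depmap_id column, tables can be combined on it. As a sketch, the two small data frames below are rebuilt by hand from A1BG rows printed above for crispr_19Q1 and copyNumber_19Q1, then joined on depmap_id and gene_name with base R's merge(). Only cell lines present in both inputs survive an inner join.

```r
# A1BG rows copied from the printed crispr_19Q1 output.
crispr_rows <- data.frame(
  depmap_id  = c("ACH-000004", "ACH-000011", "ACH-000012"),
  gene_name  = "A1BG",
  dependency = c(0.135, 0.191, -0.0104)
)

# A1BG rows copied from the printed copyNumber_19Q1 output.
copy_rows <- data.frame(
  depmap_id       = c("ACH-000011", "ACH-000026", "ACH-000557"),
  gene_name       = "A1BG",
  log_copy_number = c(0.131, -0.237, -0.0208)
)

# Inner join: one row per (cell line, gene) found in both tables.
joined <- merge(crispr_rows, copy_rows, by = c("depmap_id", "gene_name"))
joined
```

Here only ACH-000011 appears in both excerpts, so the result has a single row. On the full tibbles, dplyr's inner_join() or left_join() with the same key columns would be the natural equivalent.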

3.4 CCLE Reverse Phase Protein Array data

The RPPA dataset contains the CCLE Reverse Phase Protein Array (RPPA) data, which corresponds to the CCLE_RPPA_20180123.csv file from the 22Q2 depmap release. This dataset includes 214 genes, 899 cell lines, 28 primary diseases and 28 lineages.

A specific RPPA dataset, such as RPPA_19Q1, can be accessed by its EH number.

RPPA <- eh[["EH2263"]]
RPPA
## # A tibble: 192,386 × 4
##    depmap_id  cell_line                                 antibody    expression
##    <chr>      <chr>                                     <chr>            <dbl>
##  1 ACH-000698 DMS53_LUNG                                14-3-3_beta    -0.105 
##  2 ACH-000489 SW1116_LARGE_INTESTINE                    14-3-3_beta     0.359 
##  3 ACH-000431 NCIH1694_LUNG                             14-3-3_beta     0.0287
##  4 ACH-000707 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta     0.120 
##  5 ACH-000509 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta    -0.269 
##  6 ACH-000522 UMUC3_URINARY_TRACT                       14-3-3_beta    -0.171 
##  7 ACH-000613 HOS_BONE                                  14-3-3_beta    -0.0253
##  8 ACH-000829 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta    -0.170 
##  9 ACH-000557 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta     0.0819
## 10 ACH-000614 RVH421_SKIN                               14-3-3_beta     0.222 
## # … with 192,376 more rows

The most recent RPPA dataset can be automatically loaded into R by using the depmap_RPPA function.

depmap::depmap_RPPA()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 192,386 × 4
##    cell_line                                 antibody    expression depmap_id 
##    <chr>                                     <chr>            <dbl> <chr>     
##  1 DMS53_LUNG                                14-3-3_beta    -0.105  ACH-000698
##  2 SW1116_LARGE_INTESTINE                    14-3-3_beta     0.359  ACH-000489
##  3 NCIH1694_LUNG                             14-3-3_beta     0.0287 ACH-000431
##  4 P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta     0.120  ACH-000707
##  5 HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta    -0.269  ACH-000509
##  6 UMUC3_URINARY_TRACT                       14-3-3_beta    -0.171  ACH-000522
##  7 HOS_BONE                                  14-3-3_beta    -0.0253 ACH-000613
##  8 HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE  14-3-3_beta    -0.170  ACH-000829
##  9 AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 14-3-3_beta     0.0819 ACH-000557
## 10 RVH421_SKIN                               14-3-3_beta     0.222  ACH-000614
## # … with 192,376 more rows
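
The cell_line values shown above appear to follow a NAME_TISSUE pattern (for example DMS53_LUNG). As a rough sketch that relies on this assumption, the tissue part can be split off and used to summarise expression per tissue. The data frame holds only the ten rows printed above; the lineage columns of the metadata dataset are the better source for real analyses.

```r
# The ten 14-3-3_beta rows printed above.
rppa_rows <- data.frame(
  cell_line = c("DMS53_LUNG", "SW1116_LARGE_INTESTINE", "NCIH1694_LUNG",
                "P3HR1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE",
                "HUT78_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE",
                "UMUC3_URINARY_TRACT", "HOS_BONE",
                "HUNS1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE",
                "AML193_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE", "RVH421_SKIN"),
  expression = c(-0.105, 0.359, 0.0287, 0.120, -0.269,
                 -0.171, -0.0253, -0.170, 0.0819, 0.222)
)

# Assumption: everything after the first underscore is the tissue label.
rppa_rows$tissue <- sub("^[^_]+_", "", rppa_rows$cell_line)

# Mean expression of the antibody per tissue.
mean_by_tissue <- tapply(rppa_rows$expression, rppa_rows$tissue, mean)
mean_by_tissue
```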

3.5 CCLE RNAseq gene expression data

The TPM dataset contains the CCLE RNAseq gene expression data. It shows expression for protein-coding genes only, on a log2(TPM+1) scale. This data corresponds to the CCLE_depMap_19Q1_TPM.csv file from the 22Q2 depmap release. This dataset includes 55825 genes, 1165 cell lines, 33 primary diseases and 32 lineages.

A specific TPM dataset, such as TPM_19Q1, can be accessed by its EH number.

TPM <- eh[["EH2264"]]
TPM
## # A tibble: 67,360,300 × 6
##    depmap_id  cell_line                            gene  gene_…¹ ensem…² expre…³
##    <chr>      <chr>                                <chr> <chr>   <chr>     <dbl>
##  1 ACH-000956 22RV1_PROSTATE                       TSPA… TSPAN6  ENSG00…   2.65 
##  2 ACH-000948 2313287_STOMACH                      TSPA… TSPAN6  ENSG00…   3.00 
##  3 ACH-000026 253JBV_URINARY_TRACT                 TSPA… TSPAN6  ENSG00…   4.57 
##  4 ACH-000011 253J_URINARY_TRACT                   TSPA… TSPAN6  ENSG00…   4.58 
##  5 ACH-000323 42MGBA_CENTRAL_NERVOUS_SYSTEM        TSPA… TSPAN6  ENSG00…   4.59 
##  6 ACH-000905 5637_URINARY_TRACT                   TSPA… TSPAN6  ENSG00…   5.88 
##  7 ACH-000520 59M_OVARY                            TSPA… TSPAN6  ENSG00…   4.11 
##  8 ACH-000973 639V_URINARY_TRACT                   TSPA… TSPAN6  ENSG00…   5.05 
##  9 ACH-000896 647V_URINARY_TRACT                   TSPA… TSPAN6  ENSG00…   5.94 
## 10 ACH-000070 697_HAEMATOPOIETIC_AND_LYMPHOID_TIS… TSPA… TSPAN6  ENSG00…   0.151
## # … with 67,360,290 more rows, and abbreviated variable names ¹​gene_name,
## #   ²​ensembl_id, ³​expression

The TPM dataset can also be accessed by using the depmap_TPM function.

depmap::depmap_TPM()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 27,024,726 × 6
##    depmap_id  gene          rna_expression entrez_id gene_name cell_line        
##    <chr>      <chr>                  <dbl>     <int> <chr>     <chr>            
##  1 ACH-001113 TSPAN6 (7105)         4.33        7105 TSPAN6    LC1SQSF_LUNG     
##  2 ACH-001289 TSPAN6 (7105)         4.57        7105 TSPAN6    COGAR359_SOFT_TI…
##  3 ACH-001339 TSPAN6 (7105)         3.15        7105 TSPAN6    COLO794_SKIN     
##  4 ACH-001538 TSPAN6 (7105)         5.09        7105 TSPAN6    KKU213_BILIARY_T…
##  5 ACH-000242 TSPAN6 (7105)         6.73        7105 TSPAN6    RT4_URINARY_TRACT
##  6 ACH-000708 TSPAN6 (7105)         4.27        7105 TSPAN6    SNU283_LARGE_INT…
##  7 ACH-000327 TSPAN6 (7105)         3.34        7105 TSPAN6    NCIH1395_LUNG    
##  8 ACH-000233 TSPAN6 (7105)         0.0566      7105 TSPAN6    DEL_HAEMATOPOIET…
##  9 ACH-000461 TSPAN6 (7105)         4.02        7105 TSPAN6    SNU1196_BILIARY_…
## 10 ACH-000705 TSPAN6 (7105)         4.41        7105 TSPAN6    LC1F_LUNG        
## # … with 27,024,716 more rows

3.6 Cancer cell lines

The metadata dataset contains the metadata about all of the cancer cell lines. It corresponds to the depmap_19Q1_cell_lines.csv file found in the 22Q2 depmap release. This dataset includes 1676 cell lines, 38 primary diseases and 33 lineages.

A specific metadata dataset, such as metadata_19Q1, can be accessed by its EH number.

metadata <- eh[["EH2266"]]
metadata
## # A tibble: 1,676 × 9
##    depmap_id  cell_line    aliases cosmi…¹ sange…² prima…³ subty…⁴ gender source
##    <chr>      <chr>        <chr>     <dbl>   <dbl> <chr>   <chr>   <chr>  <chr> 
##  1 ACH-000001 NIHOVCAR3_O… NIH:OV…  905933    2201 Ovaria… Adenoc… Female ATCC  
##  2 ACH-000002 HL60_HAEMAT… HL-60    905938      55 Leukem… Acute … Female ATCC  
##  3 ACH-000003 CACO2_LARGE… CACO2;…      NA      NA Colon/… Colon … -1     <NA>  
##  4 ACH-000004 HEL_HAEMATO… HEL      907053     783 Leukem… Acute … Male   DSMZ  
##  5 ACH-000005 HEL9217_HAE… HEL 92…      NA      NA Leukem… Acute … Male   ATCC  
##  6 ACH-000006 MONOMAC6_HA… MONO-M…  908148    2167 Leukem… Acute … Male   DSMZ  
##  7 ACH-000007 LS513_LARGE… LS513    907795     569 Colon/… Colon … Male   ATCC  
##  8 ACH-000009 C2BBE1_LARG… C2BBe1   910700    2104 Colon/… Colon … Male   ATCC  
##  9 ACH-000010 NCIH2077_LU… NCI-H2…      NA      NA Lung C… Non-Sm… <NA>   <NA>  
## 10 ACH-000011 253J_URINAR… 253J         NA      NA Bladde… Carcin… <NA>   KCLB  
## # … with 1,666 more rows, and abbreviated variable names ¹​cosmic_id,
## #   ²​sanger_id, ³​primary_disease, ⁴​subtype_disease

The most recent metadata dataset can be automatically loaded into R by using the depmap_metadata function.

depmap::depmap_metadata()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 1,840 × 29
##    depmap_id  cell_…¹ strip…² cell_…³ aliases cosmi…⁴ sex   source RRID  WTSI_…⁵
##    <chr>      <chr>   <chr>   <chr>   <chr>     <dbl> <chr> <chr>  <chr>   <dbl>
##  1 ACH-000016 SLR 21  SLR21   SLR21_… <NA>         NA <NA>  Acade… CVCL…      NA
##  2 ACH-000032 MHH-CA… MHHCAL… MHHCAL… <NA>         NA Fema… DSMZ   CVCL…      NA
##  3 ACH-000033 NCI-H1… NCIH18… NCIH18… <NA>         NA Fema… Acade… CVCL…      NA
##  4 ACH-000043 Hs 895… HS895T  HS895T… <NA>         NA Fema… ATCC   CVCL…      NA
##  5 ACH-000049 HEK TE  HEKTE   HEKTE_… <NA>         NA <NA>  Acade… CVCL…      NA
##  6 ACH-000051 TE 617… TE617T  TE617T… <NA>         NA Fema… ATCC   CVCL…      NA
##  7 ACH-000064 SALE    SALE    SALE_L… <NA>         NA Male  Acade… CVCL…      NA
##  8 ACH-000068 REC-1   REC1    REC1_H… <NA>         NA Male  DSMZ   CVCL…      NA
##  9 ACH-000071 <NA>    HS706T  HS706T… <NA>         NA Fema… ATCC   CVCL…      NA
## 10 ACH-000076 NCO2    NCO2    NCO2_H… <NA>         NA Fema… HSRRB  CVCL…      NA
## # … with 1,830 more rows, 19 more variables: sample_collection_site <chr>,
## #   primary_or_metastasis <chr>, primary_disease <chr>, subtype_disease <chr>,
## #   age <chr>, sanger_id <chr>, additional_info <chr>, lineage <chr>,
## #   lineage_subtype <chr>, lineage_sub_subtype <chr>,
## #   lineage_molecular_subtype <chr>, default_growth_pattern <chr>,
## #   model_manipulation <chr>, model_manipulation_details <chr>,
## #   patient_id <chr>, parent_depmap_id <chr>, Cellosaurus_NCIt_disease <chr>, …

3.7 Mutation calls

The mutationCalls dataset contains all merged mutation calls (coding region, germline filtered) found in the depmap dependency study. This dataset corresponds to the depmap_19Q1_mutation_calls.csv file found in the 22Q2 depmap release and includes 19350 genes, 1601 cell lines, 37 primary diseases and 33 lineages.

A specific mutationCalls dataset, such as mutationCalls_19Q1, can be accessed by its EH number.

mutationCalls <- eh[["EH2265"]]
mutationCalls
## # A tibble: 1,243,145 × 35
##    depmap_id  gene_name entrez_id ncbi_…¹ chrom…² start…³ end_pos strand var_c…⁴
##    <chr>      <chr>         <dbl>   <dbl> <chr>     <dbl>   <dbl> <chr>  <chr>  
##  1 ACH-000001 VPS13D        55187      37 1        1.24e7  1.24e7 +      Nonsen…
##  2 ACH-000001 AADACL4      343066      37 1        1.27e7  1.27e7 +      In_Fra…
##  3 ACH-000001 IFNLR1       163702      37 1        2.45e7  2.45e7 +      Silent 
##  4 ACH-000001 TMEM57        55219      37 1        2.58e7  2.58e7 +      Frame_…
##  5 ACH-000001 ZSCAN20        7579      37 1        3.40e7  3.40e7 +      Missen…
##  6 ACH-000001 POU3F1         5453      37 1        3.85e7  3.85e7 +      Missen…
##  7 ACH-000001 MAST2         23139      37 1        4.65e7  4.65e7 +      Silent 
##  8 ACH-000001 GBP4         115361      37 1        8.97e7  8.97e7 +      Silent 
##  9 ACH-000001 VAV3          10451      37 1        1.08e8  1.08e8 +      Splice…
## 10 ACH-000001 NBPF20    100288142      37 1        1.48e8  1.48e8 +      Missen…
## # … with 1,243,135 more rows, 26 more variables: var_type <chr>,
## #   ref_allele <chr>, tumor_seq_allele1 <chr>, dbSNP_RS <chr>,
## #   dbSNP_val_status <chr>, genome_change <chr>, annotation_transcript <chr>,
## #   tumor_sample_barcode <chr>, cDNA_change <chr>, codon_change <chr>,
## #   protein_change <chr>, is_deleterious <lgl>, is_tcga_hotspot <lgl>,
## #   tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>, cosmic_hsCnt <dbl>,
## #   ExAC_AF <dbl>, VA_WES_AC <chr>, CGA_WES_AC <chr>, sanger_WES_AC <chr>, …

The most recent mutationCalls dataset can be automatically loaded into R by using the depmap_mutationCalls function.

depmap::depmap_mutationCalls()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 1,235,466 × 32
##    depmap_id  gene_name entrez_id ncbi_…¹ chrom…² start…³ end_pos strand var_c…⁴
##    <chr>      <chr>         <dbl>   <dbl> <chr>     <dbl>   <dbl> <chr>  <chr>  
##  1 ACH-000001 VPS13D        55187      37 1        1.24e7  1.24e7 +      Nonsen…
##  2 ACH-000001 AADACL4      343066      37 1        1.27e7  1.27e7 +      In_Fra…
##  3 ACH-000001 IFNLR1       163702      37 1        2.45e7  2.45e7 +      Silent 
##  4 ACH-000001 TMEM57        55219      37 1        2.58e7  2.58e7 +      Frame_…
##  5 ACH-000001 ZSCAN20        7579      37 1        3.40e7  3.40e7 +      Missen…
##  6 ACH-000001 POU3F1         5453      37 1        3.85e7  3.85e7 +      Missen…
##  7 ACH-000001 MAST2         23139      37 1        4.65e7  4.65e7 +      Silent 
##  8 ACH-000001 GBP4         115361      37 1        8.97e7  8.97e7 +      Silent 
##  9 ACH-000001 VAV3          10451      37 1        1.08e8  1.08e8 +      Splice…
## 10 ACH-000001 NBPF20    100288142      37 1        1.48e8  1.48e8 +      Missen…
## # … with 1,235,456 more rows, 23 more variables: var_type <chr>,
## #   ref_allele <chr>, alt_allele <chr>, dbSNP_RS <chr>, dbSNP_val_status <chr>,
## #   genome_change <chr>, annotation_trans <chr>, cDNA_change <chr>,
## #   codon_change <chr>, protein_change <chr>, is_deleterious <lgl>,
## #   is_tcga_hotspot <lgl>, tcga_hsCnt <dbl>, is_cosmic_hotspot <lgl>,
## #   cosmic_hsCnt <dbl>, ExAC_AF <dbl>, var_annotation <chr>, CGA_WES_AC <chr>,
## #   HC_AC <chr>, RD_AC <chr>, RNAseq_AC <chr>, sanger_WES_AC <chr>, …

3.8 Drug Sensitivity

The drug_sensitivity dataset contains dependency data for cancer cell lines treated with 4686 compounds. This dataset corresponds to the primary_replicate_collapsed_logfold_change.csv file found in the 22Q2 depmap release and includes 578 cell lines, 23 primary diseases and 25 lineages.

A specific drug_sensitivity dataset, such as drug_sensitivity_19Q3, can be accessed by its EH number.

drug_sensitivity <- eh[["EH3087"]]
drug_sensitivity
## # A tibble: 2,708,508 × 4
##    depmap_id  cell_line             compound                         dependency
##    <chr>      <chr>                 <chr>                                 <dbl>
##  1 ACH-000001 NIHOVCAR3_OVARY       BRD-A00077618-236-07-6::2.5::HTS    -0.0156
##  2 ACH-000007 LS513_LARGE_INTESTINE BRD-A00077618-236-07-6::2.5::HTS    -0.0957
##  3 ACH-000008 A101D_SKIN            BRD-A00077618-236-07-6::2.5::HTS     0.379 
##  4 ACH-000010 NCIH2077_LUNG         BRD-A00077618-236-07-6::2.5::HTS     0.119 
##  5 ACH-000011 253J_URINARY_TRACT    BRD-A00077618-236-07-6::2.5::HTS     0.145 
##  6 ACH-000012 HCC827_LUNG           BRD-A00077618-236-07-6::2.5::HTS     0.103 
##  7 ACH-000013 ONCODG1_OVARY         BRD-A00077618-236-07-6::2.5::HTS     0.353 
##  8 ACH-000014 HS294T_SKIN           BRD-A00077618-236-07-6::2.5::HTS     0.128 
##  9 ACH-000015 NCIH1581_LUNG         BRD-A00077618-236-07-6::2.5::HTS     0.167 
## 10 ACH-000018 T24_URINARY_TRACT     BRD-A00077618-236-07-6::2.5::HTS     0.832 
## # … with 2,708,498 more rows

The most recent drug_sensitivity dataset can be automatically loaded into R by using the depmap_drug_sensitivity function.

depmap::depmap_drug_sensitivity()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 2,708,508 × 14
##    depmap_id  cell_line compo…¹ depen…² broad…³ name   dose scree…⁴ moa   target
##    <chr>      <chr>     <chr>     <dbl> <chr>   <chr> <dbl> <chr>   <chr> <chr> 
##  1 ACH-000001 NIHOVCAR… BRD-A0… -0.0156 BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  2 ACH-000007 LS513_LA… BRD-A0… -0.0957 BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  3 ACH-000008 A101D_SK… BRD-A0…  0.379  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  4 ACH-000010 NCIH2077… BRD-A0…  0.119  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  5 ACH-000011 253J_URI… BRD-A0…  0.145  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  6 ACH-000012 HCC827_L… BRD-A0…  0.103  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  7 ACH-000013 ONCODG1_… BRD-A0…  0.353  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  8 ACH-000014 HS294T_S… BRD-A0…  0.128  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
##  9 ACH-000015 NCIH1581… BRD-A0…  0.167  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
## 10 ACH-000018 T24_URIN… BRD-A0…  0.832  BRD-A0… 8-br…   2.5 HTS     PKA … PRKG1 
## # … with 2,708,498 more rows, 4 more variables: disease_area <chr>,
## #   indication <chr>, smiles <chr>, phase <chr>, and abbreviated variable names
## #   ¹​compound, ²​dependency, ³​broad_id, ⁴​screen_id
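
In the older printout the compound column packs three fields into one string, which the newer tibble also exposes as separate broad_id, dose and screen_id columns. A small sketch, assuming the "::" separator and field order seen in the printed rows, of splitting such a string by hand:

```r
# Compound label copied from the printed output above.
compound <- "BRD-A00077618-236-07-6::2.5::HTS"

# Split on the literal "::" separator (fixed = TRUE avoids regex handling).
parts <- do.call(rbind, strsplit(compound, "::", fixed = TRUE))

parsed <- data.frame(
  broad_id  = parts[, 1],
  dose      = as.numeric(parts[, 2]),
  screen_id = parts[, 3]
)
parsed
```

The same code works unchanged on a whole vector of compound labels, as long as each label has exactly three fields.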

3.9 Proteomic

The proteomic dataset contains normalized quantitative profiling of proteins of cancer cell lines by mass spectrometry. This dataset corresponds to the protein_quant_current_normalized.csv.gz file (https://gygi.med.harvard.edu/sites/gygi.med.harvard.edu/files/documents/protein_quant_current_normalized.csv.gz) found in the 22Q2 depmap release and includes 375 cell lines, 24 primary diseases and 27 lineages.

A specific proteomic dataset, such as proteomic_20Q2, can be accessed by its EH number.

proteomic <- eh[["EH3459"]]
proteomic
## # A tibble: 4,821,390 × 12
##    depmap_id  gene_name entrez_id protein protei…¹ prote…² desc  group…³ uniprot
##    <chr>      <chr>         <dbl> <chr>      <dbl> <chr>   <chr>   <dbl> <chr>  
##  1 ACH-000849 SLC12A2        6558 MDAMB4…  2.11    sp|P55… S12A…       0 S12A2_…
##  2 ACH-000441 SLC12A2        6558 SH4_SK…  0.0705  sp|P55… S12A…       0 S12A2_…
##  3 ACH-000248 SLC12A2        6558 AU565_… -0.464   sp|P55… S12A…       0 S12A2_…
##  4 ACH-000684 SLC12A2        6558 KMRC1_… -0.884   sp|P55… S12A…       0 S12A2_…
##  5 ACH-000856 SLC12A2        6558 CAL51_…  0.789   sp|P55… S12A…       0 S12A2_…
##  6 ACH-000348 SLC12A2        6558 RPMI79… -0.912   sp|P55… S12A…       0 S12A2_…
##  7 ACH-000062 SLC12A2        6558 RERFLC…  0.729   sp|P55… S12A…       0 S12A2_…
##  8 ACH-000650 SLC12A2        6558 IGR37_… -0.658   sp|P55… S12A…       0 S12A2_…
##  9 ACH-000484 SLC12A2        6558 VMRCRC… -1.15    sp|P55… S12A…       0 S12A2_…
## 10 ACH-000625 SLC12A2        6558 HEP3B2…  0.00882 sp|P55… S12A…       0 S12A2_…
## # … with 4,821,380 more rows, 3 more variables: uniprot_acc <chr>, TenPx <chr>,
## #   cell_line <chr>, and abbreviated variable names ¹​protein_expression,
## #   ²​protein_id, ³​group_id

The most recent proteomic dataset can be automatically loaded into R by using the depmap_proteomic function.

depmap::depmap_proteomic()
## snapshotDate(): 2022-10-24
## see ?depmap and browseVignettes('depmap') for documentation
## loading from cache
## # A tibble: 4,821,390 × 12
##    depmap_id  gene_name entrez_id protein protei…¹ prote…² desc  group…³ uniprot
##    <chr>      <chr>         <dbl> <chr>      <dbl> <chr>   <chr>   <dbl> <chr>  
##  1 ACH-000849 SLC12A2        6558 MDAMB4…  2.11    sp|P55… S12A…       0 S12A2_…
##  2 ACH-000441 SLC12A2        6558 SH4_SK…  0.0705  sp|P55… S12A…       0 S12A2_…
##  3 ACH-000248 SLC12A2        6558 AU565_… -0.464   sp|P55… S12A…       0 S12A2_…
##  4 ACH-000684 SLC12A2        6558 KMRC1_… -0.884   sp|P55… S12A…       0 S12A2_…
##  5 ACH-000856 SLC12A2        6558 CAL51_…  0.789   sp|P55… S12A…       0 S12A2_…
##  6 ACH-000348 SLC12A2        6558 RPMI79… -0.912   sp|P55… S12A…       0 S12A2_…
##  7 ACH-000062 SLC12A2        6558 RERFLC…  0.729   sp|P55… S12A…       0 S12A2_…
##  8 ACH-000650 SLC12A2        6558 IGR37_… -0.658   sp|P55… S12A…       0 S12A2_…
##  9 ACH-000484 SLC12A2        6558 VMRCRC… -1.15    sp|P55… S12A…       0 S12A2_…
## 10 ACH-000625 SLC12A2        6558 HEP3B2…  0.00882 sp|P55… S12A…       0 S12A2_…
## # … with 4,821,380 more rows, 3 more variables: uniprot_acc <chr>, TenPx <chr>,
## #   cell_line <chr>, and abbreviated variable names ¹​protein_expression,
## #   ²​protein_id, ³​group_id

4 The Broad Institute data

If desired, the original data from which the depmap package was derived can be downloaded from the Broad Institute website. Instructions on how to download these files, and on how the data were transformed and loaded into the depmap package, can be found in the make_data.R file in ./inst/scripts. (Note that the original uncompressed .csv files are >1.5GB in total and take a moderate amount of time to download remotely.)

5 Session information

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.16-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.16-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ExperimentHub_2.6.0 AnnotationHub_3.6.0 BiocFileCache_2.6.0
## [4] dbplyr_2.2.1        BiocGenerics_0.44.0 depmap_1.12.0      
## [7] dplyr_1.0.10        BiocStyle_2.26.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9                    png_0.1-7                    
##  [3] Biostrings_2.66.0             assertthat_0.2.1             
##  [5] digest_0.6.30                 utf8_1.2.2                   
##  [7] mime_0.12                     GenomeInfoDb_1.34.0          
##  [9] R6_2.5.1                      stats4_4.2.1                 
## [11] RSQLite_2.2.18                evaluate_0.17                
## [13] httr_1.4.4                    pillar_1.8.1                 
## [15] zlibbioc_1.44.0               rlang_1.0.6                  
## [17] curl_4.3.3                    jquerylib_0.1.4              
## [19] blob_1.2.3                    S4Vectors_0.36.0             
## [21] rmarkdown_2.17                stringr_1.4.1                
## [23] RCurl_1.98-1.9                bit_4.0.4                    
## [25] shiny_1.7.3                   compiler_4.2.1               
## [27] httpuv_1.6.6                  xfun_0.34                    
## [29] pkgconfig_2.0.3               htmltools_0.5.3              
## [31] tidyselect_1.2.0              KEGGREST_1.38.0              
## [33] GenomeInfoDbData_1.2.9        tibble_3.1.8                 
## [35] interactiveDisplayBase_1.36.0 bookdown_0.29                
## [37] IRanges_2.32.0                fansi_1.0.3                  
## [39] withr_2.5.0                   crayon_1.5.2                 
## [41] later_1.3.0                   bitops_1.0-7                 
## [43] rappdirs_0.3.3                jsonlite_1.8.3               
## [45] xtable_1.8-4                  lifecycle_1.0.3              
## [47] DBI_1.1.3                     magrittr_2.0.3               
## [49] cli_3.4.1                     stringi_1.7.8                
## [51] cachem_1.0.6                  XVector_0.38.0               
## [53] promises_1.2.0.1              bslib_0.4.0                  
## [55] ellipsis_0.3.2                filelock_1.0.2               
## [57] generics_0.1.3                vctrs_0.5.0                  
## [59] tools_4.2.1                   bit64_4.0.5                  
## [61] Biobase_2.58.0                glue_1.6.2                   
## [63] purrr_0.3.5                   BiocVersion_3.16.0           
## [65] fastmap_1.1.0                 yaml_2.3.6                   
## [67] AnnotationDbi_1.60.0          BiocManager_1.30.19          
## [69] memoise_2.0.1                 knitr_1.40                   
## [71] sass_0.4.2