CTdata 1.2.0
CTdata is the companion package for CTexploreR and provides omics
data to select and characterise cancer testis genes. Data come from
public databases and include expression and methylation values of
genes in normal and tumor samples as well as in tumor cell lines.
Expression in cells treated with a demethylating agent is also
available.
The data are served through the ExperimentHub infrastructure, which
downloads them only once and caches them for further use. Currently
available data are summarised in the table below and detailed in the
next section.
library("CTdata")
DT::datatable(CTdata())
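The caching behaviour described above can also be inspected directly through the hub. The following is a minimal sketch, assuming the ExperimentHub and AnnotationHub packages are installed and the hub server is reachable; `list_ctdata_records()` is a hypothetical helper name introduced here for illustration and is not part of CTdata.

```r
# Sketch only: assumes ExperimentHub (and its dependency AnnotationHub)
# are installed and that the hub server can be reached.
list_ctdata_records <- function() {
    if (!requireNamespace("ExperimentHub", quietly = TRUE) ||
        !requireNamespace("AnnotationHub", quietly = TRUE)) {
        stop("ExperimentHub is required: BiocManager::install('ExperimentHub')")
    }
    eh <- ExperimentHub::ExperimentHub()
    # Keep the hub records whose metadata mention "CTdata"
    AnnotationHub::query(eh, "CTdata")
}

if (interactive()) {
    print(list_ctdata_records())
    # Directory in which downloaded resources are cached
    print(ExperimentHub::getExperimentHubOption("CACHE"))
}
```

The network calls are wrapped in `if (interactive())` so that sourcing the snippet in a non-interactive session does not trigger a download.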
To install the package:
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("CTdata")
To install the package from GitHub:
if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("UCLouvain-CBIO/CTdata")
For details about each dataset, see its manual page.
A SummarizedExperiment object with gene expression data in normal
tissues from the GTEx database:
library("SummarizedExperiment")
GTEX_data()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24359 32
## metadata(0):
## assays(1): TPM
## rownames(24359): ENSG00000243485 ENSG00000237613 ... ENSG00000198695
## ENSG00000198727
## rowData names(3): external_gene_name GTEX_category max_TPM_somatic
## colnames(32): Testis Ovary ... Uterus Vagina
## colData names(0):
A SummarizedExperiment
object with gene expression data in cancer
cell lines from CCLE:
CCLE_data()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24327 1229
## metadata(0):
## assays(1): TPM
## rownames(24327): ENSG00000000003 ENSG00000000005 ... ENSG00000284543
## ENSG00000284546
## rowData names(5): external_gene_name
## percent_of_positive_CCLE_cell_lines
## percent_of_negative_CCLE_cell_lines max_TPM_in_CCLE CCLE_category
## colnames(1229): LC1SQSF COLO794 ... ECC2 A673
## colData names(30): DepMap_ID cell_line_name ... Cellosaurus_issues type
A SummarizedExperiment
object with gene expression values in normal
tissues with or without allowing multimapping:
normal_tissues_multimapping_data()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24359 18
## metadata(0):
## assays(2): TPM_no_multimapping TPM_with_multimapping
## rownames(24359): ENSG00000000003 ENSG00000000005 ... ENSG00000284543
## ENSG00000284546
## rowData names(3): external_gene_name lowly_expressed_in_GTEX
## multimapping_analysis
## colnames(18): adrenal_gland breast_epithelium ... transverse_colon
## upper_lobe_of_left_lung
## colData names(0):
A SummarizedExperiment object containing a differential gene
expression analysis (with RNAseq expression values) in cell lines
treated or not with a demethylating agent (5-Aza-2’-Deoxycytidine):
DAC_treated_cells()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24359 32
## metadata(0):
## assays(1): log1p
## rownames(24359): ENSG00000243485 ENSG00000237613 ... ENSG00000198695
## ENSG00000198727
## rowData names(18): external_gene_name logFC_B2-1 ... padj_TS603 induced
## colnames(32): B2-1_CTL_rep1 B2-1_CTL_rep2 ... TS603_DAC_rep1
## TS603_DAC_rep2
## colData names(9): ref cell ... library lab
As above, with multimapping:
DAC_treated_cells_multimapping()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24359 32
## metadata(0):
## assays(1): log1p
## rownames(24359): ENSG00000243485 ENSG00000237613 ... ENSG00000198695
## ENSG00000198727
## rowData names(18): external_gene_name logFC_B2-1 ... padj_TS603 induced
## colnames(32): B2-1_CTL_rep1 B2-1_CTL_rep2 ... TS603_DAC_rep1
## TS603_DAC_rep2
## colData names(9): ref cell ... library lab
A SummarizedExperiment with gene expression data in TCGA samples
(tumor and peritumoral samples: SKCM, LUAD, LUSC, COAD, ESCA, BRCA
and HNSC):
TCGA_TPM()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 24350 4141
## metadata(0):
## assays(1): TPM
## rownames(24350): ENSG00000000003 ENSG00000000005 ... ENSG00000284543
## ENSG00000284546
## rowData names(19): external_gene_name percent_pos_SKCM ...
## max_TPM_in_TCGA TCGA_category
## colnames(4141): TCGA-EB-A5SF-01A-11R-A311-07
## TCGA-EE-A3J8-06A-11R-A20F-07 ... TCGA-CV-6935-11A-01R-1915-07
## TCGA-CV-7183-01A-11R-2016-07
## colData names(65): patient sample ... CD8_T_cells proliferation_score
A SingleCellExperiment object containing gene expression from a testis
single-cell RNAseq experiment (The adult human testis transcriptional
cell atlas (Guo et al. 2018)):
library("SingleCellExperiment")
testis_sce()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SingleCellExperiment
## dim: 19777 6490
## metadata(0):
## assays(2): counts logcounts
## rownames(19777): FAM87B LINC00115 ... NCF4-AS1 LINC01689
## rowData names(4): external_gene_name percent_pos_testis_germcells
## percent_pos_testis_somatic testis_cell_type
## colnames(6490): Donor2-AAACCTGGTGCCTTGG-1 Donor2-AAACCTGTCAACGGGA-1 ...
## Donor1-TTTGTCAGTGTGCGTC-2 Donor1-TTTGTCATCCAAACTG-2
## colData names(6): nGene nUMI ... Donor sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
A SingleCellExperiment object containing gene expression in different human
cell types based on scRNAseq data obtained from the Human Protein Atlas
(https://www.proteinatlas.org):
scRNAseq_HPA()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SingleCellExperiment
## dim: 20082 66
## metadata(0):
## assays(1): TPM
## rownames(20082): ENSG00000000003 ENSG00000000005 ... ENSG00000288684
## ENSG00000288695
## rowData names(4): external_gene_name max_TPM_in_a_somatic_cell_type
## max_in_germcells_group Higher_in_somatic_cell_type
## colnames(66): Adipocytes T-cells ... Syncytiotrophoblasts Extravillous
## trophoblasts
## colData names(2): Cell_type group
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
With the datasets above, we generated a list of 298 CT genes (see figure below for details).
We used multimapping because many CT genes belong to gene families whose members have identical or nearly identical sequences. This is likely why these genes are not detected in the GTEx database: the GTEx processing pipeline specifies that overlapping intervals between genes are excluded from all genes for counting. Some CT genes can thus only be detected in RNAseq data in which multimapping reads are not discarded.
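The effect of discarding multimapped reads can be illustrated with a toy example in base R. The alignment table, the gene names (`geneA`, `geneA_paralog`, `geneB`) and the `count_reads()` helper are all hypothetical; they do not reproduce the GTEx or CTdata pipelines. Splitting a multimapped read equally between its targets is one possible convention, chosen here only to keep the sketch short.

```r
# Toy alignment table: one row per (read, gene) alignment.
# Reads r1-r3 align equally well to two paralogs with identical sequences;
# reads r4-r5 align uniquely to an unrelated gene.
alignments <- data.frame(
    read = c("r1", "r1", "r2", "r2", "r3", "r3", "r4", "r5"),
    gene = c("geneA", "geneA_paralog", "geneA", "geneA_paralog",
             "geneA", "geneA_paralog", "geneB", "geneB"),
    stringsAsFactors = FALSE
)

count_reads <- function(alignments, keep_multimapped) {
    genes <- sort(unique(alignments$gene))
    # Number of genes each read aligns to, repeated per alignment row
    n_hits <- as.vector(table(alignments$read)[alignments$read])
    if (!keep_multimapped) {
        keep <- n_hits == 1
        alignments <- alignments[keep, , drop = FALSE]
        n_hits <- n_hits[keep]
    }
    # Each alignment weighs 1 / n_hits, so every read counts once in total
    weights <- 1 / n_hits
    vapply(genes, function(g) sum(weights[alignments$gene == g]), numeric(1))
}

count_reads(alignments, keep_multimapped = FALSE)
count_reads(alignments, keep_multimapped = TRUE)
```

When multimapped reads are discarded, both paralogs end up with zero counts even though three reads support them; when they are kept, each paralog receives a share of those reads.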
A RangedSummarizedExperiment
containing methylation of CpGs located
within CT promoters in normal tissues:
CT_methylation_in_tissues()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: RangedSummarizedExperiment
## dim: 51725 14
## metadata(0):
## assays(1): ''
## rownames: NULL
## rowData names(0):
## colnames(14): adipose colon ... thyroid sperm
## colData names(0):
A SummarizedExperiment with the mean promoter methylation of
Cancer-Testis genes in normal tissues:
CT_mean_methylation_in_tissues()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: SummarizedExperiment
## dim: 298 14
## metadata(0):
## assays(1): ''
## rownames(298): TTLL10 TAS1R1 ... RBMY1F RBMY1J
## rowData names(7): ensembl_gene_id CpG_density ... somatic_methylation
## germline_methylation
## colnames(14): adipose colon ... thyroid sperm
## colData names(0):
A RangedSummarizedExperiment with methylation data for CT genes in
TCGA samples (tumor and peritumoral samples: SKCM, LUAD, LUSC, COAD,
ESCA, BRCA and HNSC):
TCGA_CT_methylation()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## class: RangedSummarizedExperiment
## dim: 666 3423
## metadata(0):
## assays(1): methylation
## rownames(666): cg26578072 cg27631599 ... cg16626452 cg22051787
## rowData names(52): address_A address_B ... MASK_extBase MASK_general
## colnames(3423): TCGA-ER-A42L-06A-11D-A24V-05
## TCGA-WE-A8K1-06A-21D-A373-05 ... TCGA-BB-A6UO-01A-12D-A34K-05
## TCGA-IQ-A61O-01A-11D-A30F-05
## colData names(3): samples sample project_id
A matrix
with gene expression correlations in CCLE cancer cell lines:
dim(CCLE_correlation_matrix())
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## [1] 298 24327
CCLE_correlation_matrix()[1:10, 1:5]
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457
## ENSG00000162571 0.103434061 0.001049710 -0.042549895 0.14462034
## ENSG00000173662 -0.071928957 -0.023465128 -0.072986999 0.08297871
## ENSG00000157330 -0.009096577 -0.012775872 -0.013617443 0.01686471
## ENSG00000234593 -0.042559061 -0.022602098 -0.001087715 -0.03065048
## ENSG00000117148 -0.027897133 0.031152498 0.012210680 0.08032635
## ENSG00000131914 0.098018050 0.062352652 0.046643780 0.05024423
## ENSG00000142698 -0.009432254 -0.014082458 0.007391337 0.02454294
## ENSG00000143006 0.018450551 -0.005250837 -0.052121771 0.01223201
## ENSG00000237853 -0.138887798 -0.012717974 -0.037329853 0.05698454
## ENSG00000226088 0.116214009 0.016494255 0.050339175 -0.02698655
## ENSG00000000460
## ENSG00000162571 0.051584992
## ENSG00000173662 0.054997238
## ENSG00000157330 -0.041359090
## ENSG00000234593 -0.014316867
## ENSG00000117148 0.081165649
## ENSG00000131914 0.032435630
## ENSG00000142698 0.008474658
## ENSG00000143006 -0.006112485
## ENSG00000237853 0.005744308
## ENSG00000226088 0.029441251
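One use of such a matrix is to rank genes by their correlation with a given CT gene. The helper below, `top_correlated()`, is a hypothetical name introduced for illustration and is not part of CTdata; it only assumes a numeric matrix with CT genes as rows and all genes as columns, as in the output above. The small matrix reuses the first two rows and five columns printed above, so the sketch runs without downloading anything.

```r
top_correlated <- function(cor_mat, gene, n = 2) {
    stopifnot(gene %in% rownames(cor_mat))
    r <- cor_mat[gene, ]
    # A gene correlates perfectly with itself, so drop it if present
    r <- r[names(r) != gene]
    head(sort(r, decreasing = TRUE), n)
}

# Excerpt copied from the printed output above
excerpt <- matrix(
    c(0.103434061, 0.001049710, -0.042549895, 0.14462034, 0.051584992,
      -0.071928957, -0.023465128, -0.072986999, 0.08297871, 0.054997238),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        c("ENSG00000162571", "ENSG00000173662"),
        c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419",
          "ENSG00000000457", "ENSG00000000460")
    )
)

top_correlated(excerpt, "ENSG00000162571")
```

With the package loaded, the same call on the full data would be `top_correlated(CCLE_correlation_matrix(), "ENSG00000162571", n = 10)`.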
A tibble
with Cancer-Testis (CT) genes and their characteristics:
CT_genes()
## see ?CTdata and browseVignettes('CTdata') for documentation
## loading from cache
## # A tibble: 298 × 36
## ensembl_gene_id external_gene_name family chr strand transcription_start_…¹
## <chr> <chr> <chr> <chr> <int> <int>
## 1 ENSG00000162571 TTLL10 <NA> 1 1 1173880
## 2 ENSG00000173662 TAS1R1 <NA> 1 1 6555307
## 3 ENSG00000157330 CFAP107 <NA> 1 1 12746200
## 4 ENSG00000234593 KAZN-AS1 <NA> 1 -1 14419973
## 5 ENSG00000117148 ACTL8 <NA> 1 1 17755333
## 6 ENSG00000131914 LIN28A <NA> 1 1 26410817
## 7 ENSG00000142698 C1orf94 <NA> 1 1 34176907
## 8 ENSG00000143006 DMRTB1 <NA> 1 1 53459399
## 9 ENSG00000237853 NFIA-AS1 <NA> 1 -1 61253510
## 10 ENSG00000226088 HHLA3-AS1 <NA> 1 -1 70360437
## # ℹ 288 more rows
## # ℹ abbreviated name: ¹transcription_start_site
## # ℹ 30 more variables: X_linked <lgl>, TPM_testis <dbl>, max_TPM_somatic <dbl>,
## # GTEX_category <chr>, lowly_expressed_in_GTEX <lgl>,
## # multimapping_analysis <chr>, testis_specificity <chr>,
## # testis_cell_type <chr>, Higher_in_somatic_cell_type <lgl>,
## # percent_of_positive_CCLE_cell_lines <dbl>, …
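Most objects above index genes by Ensembl identifier, whereas this table holds both the identifier and the gene symbol, so it can serve as a lookup. The sketch below uses a hypothetical helper, `ensembl_to_symbol()`, that is not part of CTdata; it only assumes a data frame with the `ensembl_gene_id` and `external_gene_name` columns shown above. The excerpt copies the first three rows of the printed tibble so the example runs offline.

```r
ensembl_to_symbol <- function(ct_genes, ids) {
    idx <- match(ids, ct_genes$ensembl_gene_id)
    # Identifiers absent from the table yield NA
    setNames(ct_genes$external_gene_name[idx], ids)
}

# Excerpt copied from the printed output above
ct_excerpt <- data.frame(
    ensembl_gene_id = c("ENSG00000162571", "ENSG00000173662",
                        "ENSG00000157330"),
    external_gene_name = c("TTLL10", "TAS1R1", "CFAP107"),
    stringsAsFactors = FALSE
)

ensembl_to_symbol(ct_excerpt, c("ENSG00000173662", "ENSG00000000003"))
```

With the package loaded, `ensembl_to_symbol(CT_genes(), ids)` would apply the same lookup to the full table.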
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [3] Biobase_2.62.0 GenomicRanges_1.54.0
## [5] GenomeInfoDb_1.38.0 IRanges_2.36.0
## [7] S4Vectors_0.40.0 BiocGenerics_0.48.0
## [9] MatrixGenerics_1.14.0 matrixStats_1.0.0
## [11] CTdata_1.2.0 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 dplyr_1.1.3
## [3] blob_1.2.4 filelock_1.0.2
## [5] Biostrings_2.70.0 bitops_1.0-7
## [7] fastmap_1.1.1 RCurl_1.98-1.12
## [9] BiocFileCache_2.10.0 promises_1.2.1
## [11] digest_0.6.33 mime_0.12
## [13] lifecycle_1.0.3 ellipsis_0.3.2
## [15] KEGGREST_1.42.0 interactiveDisplayBase_1.40.0
## [17] RSQLite_2.3.1 magrittr_2.0.3
## [19] compiler_4.3.1 rlang_1.1.1
## [21] sass_0.4.7 tools_4.3.1
## [23] utf8_1.2.4 yaml_2.3.7
## [25] knitr_1.44 S4Arrays_1.2.0
## [27] htmlwidgets_1.6.2 bit_4.0.5
## [29] curl_5.1.0 DelayedArray_0.28.0
## [31] abind_1.4-5 withr_2.5.1
## [33] purrr_1.0.2 grid_4.3.1
## [35] fansi_1.0.5 ExperimentHub_2.10.0
## [37] xtable_1.8-4 cli_3.6.1
## [39] rmarkdown_2.25 crayon_1.5.2
## [41] generics_0.1.3 httr_1.4.7
## [43] DBI_1.1.3 cachem_1.0.8
## [45] zlibbioc_1.48.0 AnnotationDbi_1.64.0
## [47] BiocManager_1.30.22 XVector_0.42.0
## [49] vctrs_0.6.4 Matrix_1.6-1.1
## [51] jsonlite_1.8.7 bookdown_0.36
## [53] bit64_4.0.5 crosstalk_1.2.0
## [55] jquerylib_0.1.4 glue_1.6.2
## [57] DT_0.30 BiocVersion_3.18.0
## [59] later_1.3.1 tibble_3.2.1
## [61] pillar_1.9.0 rappdirs_0.3.3
## [63] htmltools_0.5.6.1 GenomeInfoDbData_1.2.11
## [65] R6_2.5.1 dbplyr_2.3.4
## [67] lattice_0.22-5 evaluate_0.22
## [69] shiny_1.7.5.1 AnnotationHub_3.10.0
## [71] png_0.1-8 memoise_2.0.1
## [73] httpuv_1.6.12 bslib_0.5.1
## [75] Rcpp_1.0.11 SparseArray_1.2.0
## [77] xfun_0.40 pkgconfig_2.0.3