consensusOV is a package for molecular subtyping for ovarian cancer. It is intended for whole-transcriptome gene expression datasets from patients with high-grade serous ovarian carcinoma. This package includes implementations of four previously published subtype classifiers [@konecny2014prognostic; @verhaak2013prognostically; @bentink2012angiogenic; @helland2011deregulation] and a consensus random forest classifier.
The get.subtypes() function is a wrapper for the other package subtyping functions: get.consensus.subtypes(), get.konecny.subtypes(), get.verhaak.subtypes(), get.bentink.subtypes(), and get.helland.subtypes(). It takes as input either a matrix of gene expression values together with a vector of Entrez IDs, or a Biobase::ExpressionSet following the format of MetaGxOvarian [@gendoo2016metagxdata]. If expression.dataset is a matrix, it should be formatted with genes as rows and patients as columns, and entrez.ids should be a vector whose length equals nrow(expression.dataset). The method argument specifies which of the five subtype classifiers to use.
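The input layout described above can be checked before calling a classifier. The following is a minimal sketch in base R, using made-up toy data; `check.subtype.inputs()` is a hypothetical helper written for illustration and is not part of consensusOV.

```r
# Hypothetical toy inputs (not from the package): 4 genes x 3 patients.
set.seed(1)
toy.expression <- matrix(
  rnorm(12), nrow = 4, ncol = 3,
  dimnames = list(
    paste0("geneid.", c("10397", "65108", "8655", "22919")),
    paste0("patient", 1:3)
  )
)
toy.entrez.ids <- c("10397", "65108", "8655", "22919")

# Hypothetical helper: raises an error unless the inputs follow the
# documented layout (numeric matrix, genes as rows, one Entrez ID per row).
check.subtype.inputs <- function(expression.dataset, entrez.ids) {
  stopifnot(
    is.matrix(expression.dataset),
    is.numeric(expression.dataset),
    length(entrez.ids) == nrow(expression.dataset)
  )
  invisible(TRUE)
}

check.subtype.inputs(toy.expression, toy.entrez.ids)
```

A transposed matrix (patients as rows) would usually fail the length check, which is the most common way this layout goes wrong.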
library(consensusOV)
library(Biobase)
library(genefu)
The package contains a subset of the ovarian cancer microarray dataset GSE14764 as example data.
data(GSE14764.eset)
dim(GSE14764.eset)
## Features Samples
## 1175 41
GSE14764.expression.matrix <- exprs(GSE14764.eset)
GSE14764.expression.matrix[1:5,1:5]
## GSE14764_GSM368661 GSE14764_GSM368662 GSE14764_GSM368664
## geneid.10397 10.856712 10.445412 11.976560
## geneid.65108 10.856441 10.312760 12.499419
## geneid.8655 11.518799 11.897707 11.782895
## geneid.22919 8.608944 8.756986 9.170513
## geneid.3925 7.658680 6.698586 7.159795
## GSE14764_GSM368665 GSE14764_GSM368668
## geneid.10397 11.651318 10.907453
## geneid.65108 11.377340 11.088542
## geneid.8655 11.799197 11.958500
## geneid.22919 8.627511 8.849757
## geneid.3925 7.466107 6.566558
GSE14764.entrez.ids <- fData(GSE14764.eset)$EntrezGene.ID
head(GSE14764.entrez.ids)
## [1] "10397" "65108" "8655" "22919" "3925" "1718"
bentink.subtypes <- get.subtypes(GSE14764.eset, method = "Bentink")
bentink.subtypes$Bentink.subtypes
## [1] Angiogenic nonAngiogenic nonAngiogenic Angiogenic Angiogenic
## [6] nonAngiogenic Angiogenic nonAngiogenic nonAngiogenic Angiogenic
## [11] nonAngiogenic nonAngiogenic Angiogenic nonAngiogenic nonAngiogenic
## [16] nonAngiogenic nonAngiogenic Angiogenic nonAngiogenic nonAngiogenic
## [21] Angiogenic Angiogenic Angiogenic nonAngiogenic nonAngiogenic
## [26] Angiogenic nonAngiogenic nonAngiogenic nonAngiogenic nonAngiogenic
## [31] nonAngiogenic Angiogenic nonAngiogenic nonAngiogenic nonAngiogenic
## [36] nonAngiogenic nonAngiogenic Angiogenic nonAngiogenic nonAngiogenic
## [41] nonAngiogenic
## Levels: Angiogenic nonAngiogenic
konecny.subtypes <- get.subtypes(GSE14764.eset, method = "Konecny")
konecny.subtypes$Konecny.subtypes
## [1] C3_profL C1_immL C2_diffL C4_mescL C1_immL C1_immL C4_mescL
## [8] C3_profL C3_profL C1_immL C2_diffL C2_diffL C4_mescL C2_diffL
## [15] C3_profL C2_diffL C1_immL C4_mescL C1_immL C2_diffL C4_mescL
## [22] C4_mescL C4_mescL C1_immL C3_profL C3_profL C2_diffL C2_diffL
## [29] C3_profL C2_diffL C3_profL C1_immL C1_immL C2_diffL C1_immL
## [36] C2_diffL C3_profL C3_profL C2_diffL C3_profL C2_diffL
## Levels: C1_immL C2_diffL C3_profL C4_mescL
helland.subtypes <- get.subtypes(GSE14764.eset, method = "Helland")
helland.subtypes$Helland.subtypes
## [1] C1 C2 C5 C1 C1 C2 C4 C1 C5 C2 C4 C4 C1 C5 C4 C4 C2 C1 C4 C5 C1 C1 C1
## [24] C2 C5 C1 C5 C2 C5 C4 C5 C2 C2 C4 C2 C4 C1 C1 C4 C1 C4
## Levels: C2 C4 C5 C1
verhaak.subtypes <- get.subtypes(GSE14764.eset, method = "Verhaak")
verhaak.subtypes$Verhaak.subtypes
## [1] MES IMR DIF DIF IMR IMR DIF MES PRO IMR DIF DIF MES DIF DIF DIF DIF
## [18] MES DIF PRO MES DIF DIF DIF MES MES DIF IMR MES DIF PRO IMR IMR DIF
## [35] IMR DIF PRO MES DIF MES DIF
## Levels: IMR DIF PRO MES
consensus.subtypes <- get.subtypes(GSE14764.eset, method = "consensusOV")
## [1] "Loading training data"
## [1] "Training Random Forest..."
consensus.subtypes$consensusOV.subtypes
## [1] MES_consensus IMR_consensus PRO_consensus DIF_consensus IMR_consensus
## [6] IMR_consensus DIF_consensus MES_consensus PRO_consensus IMR_consensus
## [11] DIF_consensus DIF_consensus MES_consensus PRO_consensus DIF_consensus
## [16] DIF_consensus IMR_consensus MES_consensus DIF_consensus PRO_consensus
## [21] MES_consensus DIF_consensus MES_consensus IMR_consensus PRO_consensus
## [26] PRO_consensus DIF_consensus IMR_consensus PRO_consensus PRO_consensus
## [31] PRO_consensus IMR_consensus IMR_consensus DIF_consensus IMR_consensus
## [36] IMR_consensus PRO_consensus PRO_consensus PRO_consensus MES_consensus
## [41] DIF_consensus
## Levels: IMR_consensus DIF_consensus PRO_consensus MES_consensus
## Alternatively, e.g.
bentink.subtypes <- get.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids, method = "Bentink")
Each subtyping function returns a list with two elements. The first is a factor of subtype labels. The second holds classifier-specific values: for the Konecny, Helland, Verhaak, and Consensus classifiers, it is a dataframe of subtype-specific scores; for the Bentink classifier, it is the output of the genefu function call.
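Because the first element is a factor, base R tools such as table() apply to it directly. The sketch below uses a mock result list with made-up labels and scores; it only mimics the two-element structure described above (element names follow those used for the Helland classifier in this document) and is not real classifier output.

```r
# Mock result mimicking the two-element structure; all values are made up.
mock.subtypes <- list(
  Helland.subtypes = factor(
    c("C1", "C2", "C5", "C1", "C4"),
    levels = c("C2", "C4", "C5", "C1")
  ),
  subtype.scores = data.frame(
    C2 = c(-0.2, 0.9, 0.1, -0.5, 0.0),
    C4 = c(0.1, -0.3, 0.2, 0.0, 0.8),
    C5 = c(-0.4, 0.0, 0.7, -0.1, 0.1),
    C1 = c(0.6, -0.2, -0.3, 0.9, 0.2)
  )
)

# First element: count patients per subtype label.
label.counts <- table(mock.subtypes[[1]])
print(label.counts)

# Second element: one row per patient, one column per subtype score.
score.dims <- dim(mock.subtypes[[2]])
print(score.dims)
```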
Each classifier can also be called directly through its own function, without the get.subtypes() wrapper:
bentink.subtypes <- get.bentink.subtypes(GSE14764.expression.matrix, GSE14764.entrez.ids)
The Konecny, Helland, Verhaak, and Consensus classifiers produce real-valued subtype scores. These can be used in various ways; here, for example, we compute correlations between corresponding subtype scores.
We can compare the subtype scores between the Verhaak and Helland classifiers:
vest <- verhaak.subtypes$gsva.out
vest <- vest[,c("IMR", "DIF", "PRO", "MES")]
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
as.vector(vest),
rep(colnames(vest), each=nrow(vest)),
as.vector(hest),
rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Verhaak", "vsc", "Helland", "hsc")
## plot
library(ggplot2)
ggplot(dat, aes(Verhaak, Helland)) + geom_point() + facet_wrap(vsc~hsc, nrow = 2, ncol = 2)
Corresponding correlation values are 0.77, 0.79, 0.37, and 0.88.
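One way such per-subtype correlations could be computed is to correlate matching columns of two score matrices whose columns are ordered so that corresponding subtypes line up, as vest and hest are above. This is a sketch on made-up toy matrices; `matched.cor()` is a hypothetical helper, and the use of Pearson correlation is an assumption, since the correlation method behind the reported values is not stated here.

```r
# Toy score matrices (made-up values): 6 patients x 2 subtypes each.
set.seed(42)
scores.a <- matrix(rnorm(12), nrow = 6,
                   dimnames = list(NULL, c("IMR", "DIF")))
# Second matrix is a noisy copy of the first, so matched columns correlate.
scores.b <- scores.a + matrix(rnorm(12, sd = 0.1), nrow = 6)
colnames(scores.b) <- c("C2", "C4")

# Hypothetical helper: correlation between the i-th column of each input.
# method = "pearson" is an assumption, not taken from the package.
matched.cor <- function(a, b, method = "pearson") {
  a <- as.matrix(a)
  b <- as.matrix(b)
  stopifnot(identical(dim(a), dim(b)))
  vapply(
    seq_len(ncol(a)),
    function(i) cor(a[, i], b[, i], method = method),
    numeric(1)
  )
}

round(matched.cor(scores.a, scores.b), 2)
```

Applied to the real objects, the call would take the form `matched.cor(vest, hest)`, provided both have the same dimensions and column order.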
Likewise, we can compare the subtype scores between the Konecny and Helland classifiers:
kost <- konecny.subtypes$spearman.cc.vals
hest <- helland.subtypes$subtype.scores
hest <- hest[, c("C2", "C4", "C5", "C1")]
dat <- data.frame(
as.vector(kost),
rep(colnames(kost), each=nrow(kost)),
as.vector(hest),
rep(colnames(hest), each=nrow(hest)))
colnames(dat) <- c("Konecny", "ksc", "Helland", "hsc")
## plot
ggplot(dat, aes(Konecny, Helland)) + geom_point() + facet_wrap(ksc~hsc, nrow = 2, ncol = 2)
Corresponding correlation values are 0.95, 0.84, 0.7, and 0.95.