CrossICC

Yu Sun

2019-10-29

Citation

If you use CrossICC in your published research, please cite this paper:

Introduction

Unsupervised clustering of high-throughput molecular profiling data is widely adopted for discovering cancer subtypes. However, cancer subtypes derived from a single dataset are usually not applicable across multiple datasets from different platforms. We previously published an iterative clustering algorithm to address this issue (see this paper), but its adoption was hampered by the lack of a software implementation.

In this project, we present CrossICC, an implementation of this method with many new features added to improve the performance of the algorithm. Briefly, CrossICC uses an iterative strategy to derive the optimal gene set and cluster number from the consensus similarity matrix generated by consensus clustering. CrossICC works directly on multiple cross-platform datasets, so no between-dataset normalization is required. The package also provides functions to help users visualize the identified subtypes and evaluate the subtyping performance. In particular, many cancer-related analysis methods are embedded to facilitate the clinical translation of the identified cancer subtypes.
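To make the idea of a consensus similarity matrix more concrete, the following minimal sketch builds one from repeated subsampling and k-means clustering in base R. It is a generic illustration, not CrossICC's internal code: the toy data, the function name consensus_matrix and its parameters (k, reps, frac) are all assumptions introduced here.

```r
# Illustrative only: a toy consensus matrix built with base R k-means.
# This is NOT CrossICC's internal code; all names here are hypothetical.
set.seed(1)

# Toy expression matrix: 20 genes (rows) x 12 samples (columns),
# with the last 6 samples shifted to form a second group.
expr <- matrix(rnorm(20 * 12), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("s", 1:12)))
expr[, 7:12] <- expr[, 7:12] + 3

consensus_matrix <- function(x, k, reps = 50, frac = 0.8) {
  n <- ncol(x)
  together <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  sampled  <- together
  for (i in seq_len(reps)) {
    # Draw a random subset of samples for this run
    idx <- sort(sample(n, size = floor(frac * n)))
    # Cluster the subset (kmeans expects observations in rows)
    cl <- kmeans(t(x[, idx]), centers = k, nstart = 5)$cluster
    # Count how often each pair was drawn together ...
    sampled[idx, idx]  <- sampled[idx, idx] + 1
    # ... and how often the pair landed in the same cluster
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  }
  # Fraction of co-sampled runs in which each pair shared a cluster
  ifelse(sampled > 0, together / sampled, 0)
}

cons <- consensus_matrix(expr, k = 2)
round(cons[1:3, c(1:3, 10:12)], 2)
```

Entries close to 1 mark sample pairs that cluster together in almost every run, and entries close to 0 mark pairs that are consistently separated. A matrix of this kind is the sort of input an iterative procedure can use to judge how stable a candidate gene set and cluster number are.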

Installation

  1. Download & install the package.

To install via Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CrossICC")

The development version is also available from GitHub.

BiocManager::install("bioinformatist/CrossICC")

  2. Load the package into your R session.

library(CrossICC)

Getting data into R

Most clustering tools require users to combine all datasets into a single matrix, whereas CrossICC only needs a list object in R. We also provide a function, CrossICCInput(), for importing multiple files as a list.

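# Replace the empty path with the directory that holds your expression files
files <- list.files(path = "", pattern = "\\.csv$", full.names = TRUE)
CrossICC.input <- CrossICCInput(files)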

NOTE: CrossICCInput() internally calls data.table::fread(), so you never need to specify a separator.
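If you prefer to assemble the input list yourself, the sketch below shows one way to do it with base R only. The file names, the toy values and the layout (genes in rows, samples in columns, gene identifiers in the first column) are assumptions made for illustration, and read.csv() is used here in place of CrossICCInput() purely to keep the example free of dependencies.

```r
# Illustrative only: build a list of expression matrices by hand with base R.
# File names, toy values and the genes-by-samples layout are assumptions.
tmp <- tempdir()

platform_a <- data.frame(gene = c("TP53", "EGFR", "MYC"),
                         s1 = c(5.1, 7.2, 6.3), s2 = c(4.9, 7.8, 6.0))
platform_b <- data.frame(gene = c("TP53", "EGFR", "MYC"),
                         s3 = c(10.4, 12.1, 11.0), s4 = c(9.8, 12.6, 11.3))

# Write one CSV file per platform
write.csv(platform_a, file.path(tmp, "platform_a.csv"), row.names = FALSE)
write.csv(platform_b, file.path(tmp, "platform_b.csv"), row.names = FALSE)

# Collect the files with full paths so they can be read from any working directory
files <- list.files(tmp, pattern = "^platform_.*\\.csv$", full.names = TRUE)

# Read each file into a numeric matrix, using the first column as row names
read_platform <- function(f) {
  df <- read.csv(f, row.names = 1, check.names = FALSE)
  as.matrix(df)
}
input_list <- lapply(files, read_platform)
names(input_list) <- sub("\\.csv$", "", basename(files))

str(input_list)
```

Each element of input_list is one dataset on its own scale; nothing is merged or normalized across platforms at this stage.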

Run CrossICC

CrossICC can be run simply by calling the CrossICC() function with its default parameters. Afterwards, predictor() calculates the correlation between the predictor centroid and the validation centroid, and ssGSEA() derives a GSEA-like ranked matrix from the CrossICC result. We also provide a graphical interface that helps users inspect the CrossICC result in an intuitive way.

An example

To run CrossICC:

library(CrossICC)
data(demo.platforms)
# Set use.shiny = TRUE if you want to launch the Shiny app once CrossICC has finished
CrossICC.object <- CrossICC(demo.platforms, skip.mfs = TRUE, use.shiny = FALSE, overwrite = TRUE, output.dir = tempdir())
## Merging, filtering and scaling are skipped.
## No study names provided or something goes wrong with your study names. Will use auto-generated study names instead.
## Tue Oct 29 23:50:27 2019 -- start iteration: 1
## 352 genes were engaged in this iteration.
## Tue Oct 29 23:50:39 2019 -- start iteration: 2
## 349 genes were engaged in this iteration.
## Tue Oct 29 23:50:51 2019 -- start iteration: 3
## 346 genes were engaged in this iteration.
## Tue Oct 29 23:51:02 2019 -- start iteration: 4
## 345 genes were engaged in this iteration.
## Tue Oct 29 23:51:12 2019 -- start iteration: 5
## 345 genes were engaged in this iteration.
## A CrossICC.object.rds file will be generated in home directory by default.
##       Note that the previous file will be overridden.
## Tue Oct 29 23:51:23 2019 -- Iteration finished! Iteration time for reaching convergence/limit: 4

By default, CrossICC writes an .rds file to your home directory (~/, i.e. $HOME on Linux); the example above passes output.dir = tempdir() instead. The file records the key features (genes), the number of iterations and other information from the analysis in a compressed format. You can launch the Shiny app with this file later.
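An .rds file holds a single serialized R object, so it can be reloaded in a later session with base R's readRDS(). The sketch below demonstrates the round trip on a stand-in list; the file name stand_in.rds and the contents of the list are hypothetical and are not real CrossICC output.

```r
# Illustrative only: save and reload a stand-in object with base R.
# The real file is written by CrossICC(); this list is a hypothetical placeholder.
rds_path <- file.path(tempdir(), "stand_in.rds")

stand_in <- list(gene.signature = c("g1", "g2"),
                 note = "placeholder, not real CrossICC output")
saveRDS(stand_in, rds_path)

# Later, possibly in a new R session: restore the object from disk
restored <- readRDS(rds_path)
names(restored)
```

The same readRDS() call, pointed at the file generated by CrossICC, should give you back the saved result without rerunning the iterations.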

To compare samples according to their pathway information, we provide a way to obtain a GSEA-like ranked eigenvalue matrix:

Mcluster <- paste("K", CrossICC.object$clusters$clusters[[1]], sep = "")
CrossICC.ssGSEA <- ssGSEA(x = demo.platforms[[1]], gene.signature = CrossICC.object$gene.signature, geneset2gene = CrossICC.object$unioned.genesets, cluster = Mcluster)

You can also use CrossICC’s result as a model for clustering new samples, simply by calculating the correlation between the predictor centroid and the validation centroid:

predicted <- predictor(demo.platforms[[1]], CrossICC.object)
## Tue Oct 29 23:51:23 2019 -- Merging multiple probes for one feature 
## Tue Oct 29 23:51:23 2019 -- Removing features with no variance 
## Tue Oct 29 23:51:23 2019 -- Scaling

In this call, demo.platforms[[1]] stands in for your own expression matrix of new samples, and CrossICC.object, the result returned by CrossICC(), serves as the model. The process can take a few minutes.
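The general idea of assigning samples by correlation to cluster centroids can be sketched in a few lines of base R. This is a generic nearest-centroid illustration on simulated data, not the body of predictor(); the names train, train_labels, centroids and new_exprs, as well as the simulated expression patterns, are assumptions introduced here.

```r
# Illustrative only: nearest-centroid assignment by Pearson correlation.
# This is a generic sketch, not the body of CrossICC's predictor().
set.seed(42)
genes <- paste0("g", 1:30)

# Hypothetical training data: 10 samples per cluster, genes in rows
pattern_k1 <- rep(c(2, -2), each = 15)
pattern_k2 <- rep(c(-2, 2), each = 15)
train <- cbind(matrix(rnorm(30 * 10, mean = pattern_k1), nrow = 30),
               matrix(rnorm(30 * 10, mean = pattern_k2), nrow = 30))
rownames(train) <- genes
train_labels <- rep(c("K1", "K2"), each = 10)

# Centroid = mean expression of each gene within a cluster
centroids <- sapply(split(seq_along(train_labels), train_labels),
                    function(idx) rowMeans(train[, idx, drop = FALSE]))

# New samples to classify (two drawn from each pattern)
new_exprs <- cbind(matrix(rnorm(30 * 2, mean = pattern_k1), nrow = 30),
                   matrix(rnorm(30 * 2, mean = pattern_k2), nrow = 30))
dimnames(new_exprs) <- list(genes, paste0("new", 1:4))

# Correlate every new sample with every centroid, then pick the best match
cors <- cor(new_exprs, centroids)  # rows: new samples, columns: clusters
assigned <- colnames(cors)[max.col(cors, ties.method = "first")]
names(assigned) <- rownames(cors)
assigned
```

Because correlation ignores differences in scale and offset between samples, this kind of rule can be applied to data from a platform other than the one used to derive the centroids.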

Need help?

If you have issues or questions, please visit the CrossICC homepage (https://github.com/bioinformatist/CrossICC) first. If you think you have found a bug, please post a reproducible example on the GitHub issue tracker.

sessionInfo()

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] CrossICC_1.0.0 MASS_7.3-51.4 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2                  cluster_2.1.0              
##  [3] knitr_1.25                  magrittr_1.5               
##  [5] ConsensusClusterPlus_1.50.0 BiocGenerics_0.32.0        
##  [7] tidyselect_0.2.5            lattice_0.20-38            
##  [9] R6_2.4.0                    rlang_0.4.1                
## [11] stringr_1.4.0               dplyr_0.8.3                
## [13] tools_3.6.1                 parallel_3.6.1             
## [15] grid_3.6.1                  Biobase_2.46.0             
## [17] data.table_1.12.6           xfun_0.10                  
## [19] htmltools_0.4.0             yaml_2.2.0                 
## [21] digest_0.6.22               assertthat_0.2.1           
## [23] tibble_2.1.3                crayon_1.3.4               
## [25] Matrix_1.2-17               purrr_0.3.3                
## [27] MergeMaid_2.58.0            glue_1.3.1                 
## [29] evaluate_0.14               rmarkdown_1.16             
## [31] limma_3.42.0                stringi_1.4.3              
## [33] pillar_1.4.2                compiler_3.6.1             
## [35] pkgconfig_2.0.3