Contents

This document describes the usage of the functions integrated into the package and is meant to be a reference document for the end-user.

1 Introduction

Independent component analysis (ICA) of omics data can be used for deconvolution of biological signals and technical biases, correction of batch effects, feature engineering for classification and integration of the data [4]. The consICA package allows building robust consensus ICA-based deconvolution of the data as well as running post-hoc biological interpretation of the results. The implementation of parallel computing in the package ensures efficient analysis using modern multicore systems.

2 Installing and loading the package

Load the package with the following command:

if (!requireNamespace("BiocManager", quietly = TRUE)) { 
    install.packages("BiocManager")
}
BiocManager::install("consICA")
library('consICA')
#> 
#> 

3 Example dataset

The usage of the package functions for an independent component deconvolution is shown for an example set of 472 samples and 2454 most variable genes from the skin cutaneous melanoma (SKCM) TCGA cohort. It stored as a SummarizedExperiment object. In addition the data includes metadata such as age, gender, cancer subtype etc. and survival time-to-event data. Use data as

library(SummarizedExperiment, verbose = FALSE)
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
data("samples_data")

samples_data
#> class: SummarizedExperiment 
#> dim: 2454 472 
#> metadata(0):
#> assays(1): X
#> rownames(2454): A2ML1 AACSP1 ... ZNF883 ZP1
#> rowData names(0):
#> colnames(472): 3N.A9WB.Metastatic 3N.A9WC.Metastatic ...
#>   Z2.AA3S.Metastatic Z2.AA3V.Metastatic
#> colData names(103): age_at_initial_pathologic_diagnosis gender ... time
#>   event
# According to our terminology expression matrix X of 2454 genes and 472 samples
X <- data.frame(assay(samples_data))
dim(X)
#> [1] 2454  472
# variables described each sample
Var <- data.frame(colData(samples_data))
dim(Var)
#> [1] 472 103
colnames(Var)[1:20] # print first 20
#>  [1] "age_at_initial_pathologic_diagnosis" "gender"                             
#>  [3] "race"                                "Weight"                             
#>  [5] "Height"                              "BMI"                                
#>  [7] "Ethinicity"                          "Cancer.Type.Detailed"               
#>  [9] "Tumor.location.site"                 "ajcc_pathologic_tumor_stage"        
#> [11] "Cancer.stage.M"                      "Cancer.stage.N"                     
#> [13] "Cancer.stage.T"                      "initial_pathologic_dx_year"         
#> [15] "birth_days_to_initial_diagnosis"     "Year.of.Birth"                      
#> [17] "age.at.last.contact"                 "Year.of.last.contact"               
#> [19] "last_contact_days_to"                "age.at.death"
# survival time and event for each sample
Sur <- data.frame(colData(samples_data))[,c("time", "event")]
Var <- Var[,-which(colnames(Var) != "time" & colnames(Var) != "event")]

4 Consensus independent component analysis

Independent component analysis (ICA) is an unsupervised method of signal deconvolution [3]. ICA decomposes combined gene expression matrix from multiple samples into meaningful signals in the space of genes (metagenes, S) and weights in the space of samples (weight matrix, M).