
library(TDbasedUFEadv)
library(Biobase)
library(RTCGA.rnaseq)
library(TDbasedUFE)
library(MOFAdata)
library(RTCGA.clinical)

1 Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("TDbasedUFEadv")

2 Integrated analyses of two omics profiles

Here is a flowchart showing how the individual functions in TDbasedUFE and TDbasedUFEadv can be combined.

Relationship among functions in TDbasedUFE and TDbasedUFEadv

2.1 When features are shared.

To make use of TDbasedUFE for drug repositioning, we previously proposed (Taguchi 2017a) an integrated analysis of two gene expression profiles, one from drug-treated samples and one from disease samples. First, we prepare two omics profiles, expDrug and expDisease, which represent the gene expression profiles of cell lines treated with various drugs and of disease cell lines, respectively:

Cancer_cell_lines <- list(ACC.rnaseq, BLCA.rnaseq, BRCA.rnaseq, CESC.rnaseq)
Drug_and_Disease <- prepareexpDrugandDisease(Cancer_cell_lines)
expDrug <- Drug_and_Disease$expDrug
expDisease <- Drug_and_Disease$expDisease
rm(Cancer_cell_lines)

expDrug is taken from the RTCGA package and consists of the samples associated with drugs according to (Ding, Zu, and Gu 2016). These files are listed in drug_response.txt, included in Clinical drug responses at http://lifeome.net/supp/drug_response/. expDisease is composed of the files in BRCA.rnaseq that are not included in expDrug (for more details, see the source code of prepareexpDrugandDisease). Then we prepare a tensor as follows:

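# Tensor built from the first 200 features and the first 100 samples of each profile
Z <- prepareTensorfromMatrix(
  exprs(expDrug[seq_len(200), seq_len(100)]),
  exprs(expDisease[seq_len(200), seq_len(100)])
)
# Label every (drug sample, disease sample) pair by pasting the two file IDs
sample <- outer(
  colnames(expDrug)[seq_len(100)],
  colnames(expDisease)[seq_len(100)], function(x, y) {
    paste(x, y)
  }
)
# Bundle the tensor with its sample and feature annotations
Z <- PrepareSummarizedExperimentTensor(
  sample = sample, feature = rownames(expDrug)[seq_len(200)], value = Z
)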

In the above, sample consists of pairs of file IDs taken from expDrug and expDisease. Because the full data cannot be handled owing to memory restrictions, we restricted the analysis to the first two hundred features and the first one hundred samples of each profile (how to deal with the full data sets is introduced below).
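To make the structure of such a tensor concrete, the following base-R sketch builds one from two small toy matrices that share their rows (features). It assumes, following the integrated-analysis approach cited above, that the element for feature i, drug sample j and disease sample k is the product of the two expression values; the toy data and variable names are hypothetical, and the actual construction in prepareTensorfromMatrix should be checked in its source code.

```r
set.seed(1)
# Hypothetical toy profiles sharing 5 features (rows)
x <- matrix(rnorm(5 * 3), nrow = 5)  # stands in for exprs(expDrug): 3 samples
y <- matrix(rnorm(5 * 4), nrow = 5)  # stands in for exprs(expDisease): 4 samples

# Assumed construction: z[i, j, k] = x[i, j] * y[i, k]
z <- array(0, dim = c(nrow(x), ncol(x), ncol(y)))
for (j in seq_len(ncol(x))) {
  for (k in seq_len(ncol(y))) {
    z[, j, k] <- x[, j] * y[, k]
  }
}

# One label per (drug sample, disease sample) pair, analogous to `sample`
pair_labels <- outer(paste0("drug", 1:3), paste0("disease", 1:4), paste)

dim(z)
#> [1] 5 3 4
pair_labels[2, 4]
#> [1] "drug2 disease4"
```

The number of entries grows as features × drug samples × disease samples; even the subset used above gives 200 × 100 × 100 = 2,000,000 values.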

Then HOSVD is applied to the tensor as follows:

HOSVD <- computeHosvd(Z)
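For readers unfamiliar with HOSVD, the following base-R sketch shows the standard algorithm on a small random tensor: each mode is unfolded into a matrix, the left singular vectors of that unfolding become the singular value vectors of the mode, and the core tensor is obtained by projecting onto them. It is an illustration only, not the code that computeHosvd runs; the helper names unfold, mode_product and hosvd_sketch are hypothetical.

```r
# Unfold tensor z along `mode`: rows indexed by that mode, columns by the rest
unfold <- function(z, mode) {
  d <- dim(z)
  perm <- c(mode, seq_along(d)[-mode])
  matrix(aperm(z, perm), nrow = d[mode])
}

# Multiply tensor z along `mode` by matrix m (m has dim(z)[mode] columns)
mode_product <- function(z, m, mode) {
  d <- dim(z)
  perm <- c(mode, seq_along(d)[-mode])
  zn <- m %*% unfold(z, mode)
  d_new <- d
  d_new[mode] <- nrow(m)
  # Fold the result back into an array, then undo the permutation
  aperm(array(zn, dim = d_new[perm]), order(perm))
}

hosvd_sketch <- function(z) {
  n_modes <- length(dim(z))
  # Singular value vectors of each mode = left singular vectors of its unfolding
  U <- lapply(seq_len(n_modes), function(n) svd(unfold(z, n))$u)
  core <- z
  for (n in seq_len(n_modes)) core <- mode_product(core, t(U[[n]]), n)
  list(U = U, core = core)
}

set.seed(2)
z <- array(rnorm(5 * 3 * 4), dim = c(5, 3, 4))
res <- hosvd_sketch(z)

# Reconstruct z from the core tensor and the singular value vectors
z_rec <- res$core
for (n in 1:3) z_rec <- mode_product(z_rec, res$U[[n]], n)
all.equal(z_rec, z)
#> [1] TRUE
```

Because every factor matrix here is square and orthogonal, the reconstruction is exact; in practice only the leading singular value vectors of each mode are inspected.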

Here we try to find whether Cisplatin causes distinct expression (0: cell lines treated with drugs other than Cisplatin; 1: cell lines treated with Cisplatin) and whether expression differs between two classes (1 vs 2) of BRCA (in this case, the two classes have no particular meaning), within the first one hundred samples.

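Cond <- prepareCondDrugandDisease(expDrug)
# One entry per mode: no labels for features, Cisplatin (1) vs other drugs (0)
# for the drug samples, and two arbitrary classes for the BRCA samples
cond <- list(NULL, Cond[, "Cisplatin"][seq_len(100)], rep(1:2, each = 50))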

Then we select the singular value vectors attributed to these objects. Although you can do this in interactive mode (see below) when you try this vignette, here we assume that you have already finished the selection.

input_all <- selectSingularValueVectorLarge(HOSVD,cond,input_all=c(2,9)) #Batch mode

If you prefer to make the selection yourself, you can run it in interactive mode:

input_all <- selectSingularValueVectorLarge(HOSVD,cond)

You will then see "Next", "Prev", and "Select" radio buttons, with which you can perform the selection, as well as a histogram and a standard deviation optimization plot, with which you can verify interactively whether the selection succeeded.
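Whether a singular value vector is attributed to a condition depends on how its components differ between the classes given in cond. As a rough, hypothetical stand-in for the visual check offered in interactive mode (not the criterion implemented in selectSingularValueVectorLarge), one can compare the components of each candidate vector between the two classes with a t-test and pick the vector with the smallest P-value. The vectors below are simulated, with the second one constructed to differ between classes.

```r
set.seed(3)
n <- 20
# Toy class labels: 0 = other drugs, 1 = Cisplatin
cisplatin <- rep(c(0, 1), each = n / 2)

# Three toy singular value vectors (columns); only column 2 depends on the class
v <- cbind(rnorm(n), rnorm(n) + 5 * cisplatin, rnorm(n))

# t-test P-value per vector: do its components differ between the classes?
p <- apply(v, 2, function(vec) t.test(vec ~ cisplatin)$p.value)
best <- which.min(p)
best
```

In the vignette's setting, the columns would be singular value vectors of the drug mode and the labels would come from cond.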

Next, we select the genes whose expression is altered by Cisplatin.

index <- selectFeature(HOSVD,input_all,de=0.05)

You might need to specify a suitable value for de, which is the initial value of the standard deviation.

Then we get the following plot.
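The role of de can be illustrated with a sketch of the criterion used in TD-based unsupervised feature extraction: P-values are attributed to features by assuming that the components of the selected feature singular value vector follow a Gaussian distribution with standard deviation sigma, and sigma is tuned so that the histogram of 1 - P over the non-selected features is as flat as possible. The sketch below uses simulated data, a BH-adjusted threshold of 0.01 (an assumption here) and a plain grid search over sigma instead of an optimizer started at de; it is not the implementation inside selectFeature, and the names feature_p and flatness are hypothetical.

```r
set.seed(4)
# Toy feature singular value vector (hypothetical data):
# 950 "null" features plus 50 features with large components
u <- c(rnorm(950, sd = 0.02), rnorm(50, sd = 0.2))

# P-value per feature, assuming its component is Gaussian with sd sigma
feature_p <- function(u, sigma) {
  pchisq((u / sigma)^2, df = 1, lower.tail = FALSE)
}

# Flatness score: sd of histogram counts of 1 - P over NON-selected features
# (BH-adjusted P > threshold); a smaller score means a flatter histogram
flatness <- function(sigma, u, threshold = 0.01, n_bins = 100) {
  p <- feature_p(u, sigma)
  keep <- p.adjust(p, method = "BH") > threshold
  if (sum(keep) < 10) return(Inf)  # guard against degenerate sigma values
  h <- hist(1 - p[keep], breaks = seq(0, 1, length.out = n_bins + 1),
            plot = FALSE)
  sd(h$counts)
}

# Grid search over candidate standard deviations
sigmas <- seq(0.005, 0.1, by = 0.0005)
scores <- vapply(sigmas, flatness, numeric(1), u = u)
sigma_hat <- sigmas[which.min(scores)]

# Features whose BH-adjusted P-values fall below 0.01 are selected
p_adj <- p.adjust(feature_p(u, sigma_hat), method = "BH")
selected <- which(p_adj < 0.01)
```

A poor starting value can lead an optimizer to a flat-looking but degenerate solution, which is one reason a suitable de may need to be supplied.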

Finally, we list the genes selected as associated with distinct expression.

head(tableFeatures(Z,index))
#>       Feature      p value adjusted p value
#> 4   ACADVL.37 2.233863e-24     4.467726e-22
#> 6     ACLY.47 1.448854e-19     1.448854e-17
#> 1       A2M.2 6.101507e-16     4.067671e-14
#> 3 ABHD2.11057 3.934360e-10     1.967180e-08
#> 2     AARS.16 1.449491e-06     5.797964e-05
#> 5 ACIN1.22985 6.510593e-06     2.170198e-04
rm(Z)
rm(HOSVD)
detach("package:RTCGA.rnaseq")

The methods described here were frequently used by the maintainers in their studies (Taguchi 2017b) (Taguchi 2018) (Taguchi and Turki 2020).

2.1.1 Reduction of required memory using partial summation.

When there is a large number of features, it is impossible to apply HOSVD to the full tensor (which is why we reduced the size of the tensor above). In this case, we instead apply SVD to a matrix generated from the tensor, as follows. In contrast to the above, where only the first two hundred features and the first one hundred samples were included, the following includes all features and all samples, because the partial summation over features reduces the required memory.

# SVD applied to the matrix generated from the tensor by summation over features
SVD <- computeSVD(exprs(expDrug), exprs(expDisease))
# Feature-summed matrix: drug samples x disease samples
Z <- t(exprs(expDrug)) %*% exprs(expDisease)
sample <- outer(
  colnames(expDrug), colnames(expDisease),
  function(x, y) {
    paste(x, y)
  }
)
Z <- PrepareSummarizedExperimentTensor(
  sample = sample,
  feature = rownames(expDrug), value = Z
)
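The memory saving follows from a simple identity: assuming the tensor element is the product x_ij * y_ik of the two expression values, summing the tensor over the feature mode gives exactly the matrix product t(X) %*% Y, so the three-mode array never has to be stored. A small base-R check on hypothetical toy matrices:

```r
set.seed(5)
x <- matrix(rnorm(6 * 3), nrow = 6)  # 6 shared features x 3 drug samples
y <- matrix(rnorm(6 * 4), nrow = 6)  # 6 shared features x 4 disease samples

# Full tensor z[i, j, k] = x[i, j] * y[i, k] (built here only for comparison)
z <- array(0, dim = c(6, 3, 4))
for (j in 1:3) for (k in 1:4) z[, j, k] <- x[, j] * y[, k]

# Summing over the feature mode i gives a 3 x 4 matrix ...
z_sum <- apply(z, c(2, 3), sum)
# ... identical to the matrix product, computed without building z
z_mat <- t(x) %*% y
all.equal(z_sum, z_mat)
#> [1] TRUE
```

For the subset used earlier, the full tensor would hold 200 × 100 × 100 = 2,000,000 values, whereas the summed matrix holds only 100 × 100 = 10,000, and its size no longer depends on the number of features. crossprod(x, y) computes the same product slightly more efficiently.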

Next, we select the singular value vectors attributed to drugs and cell lines, and then identify the features associated with altered expression upon Cisplatin treatment as well as with the distinction between the two classes. Again, all samples of expDrug and expDisease are included.

cond <- list(NULL, Cond[, "Cisplatin"], rep(1:2, each = dim(SVD$SVD$v)[1] / 2))

Next, we select the singular value vectors and optimize the standard deviation in batch mode:

index_all <- selectFeatureRect(SVD,cond,de=c(0.01,0.01),
                               input_all=3) #batch mode
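Because SVD is applied here to the feature-summed matrix, it directly yields singular value vectors only for the drug samples and the disease samples. A natural way to recover vectors for the features, assumed here to mirror the method papers (check the source of selectFeatureRect to confirm), is to project each expression profile onto the selected sample-mode vectors. A toy sketch with simulated data and a hypothetical choice of the first singular value vectors:

```r
set.seed(6)
x <- matrix(rnorm(50 * 6), nrow = 50)  # 50 features x 6 drug samples (toy)
y <- matrix(rnorm(50 * 8), nrow = 50)  # 50 features x 8 disease samples (toy)

# SVD of the feature-summed matrix: s$u for drug samples, s$v for disease samples
s <- svd(t(x) %*% y)
l <- 1  # hypothetical choice of the l-th singular value vectors

# Feature-mode vectors: project each profile onto the sample-mode vector,
# then normalize to unit length
fx <- x %*% s$u[, l]
fx <- fx / sqrt(sum(fx^2))
fy <- y %*% s$v[, l]
fy <- fy / sqrt(sum(fy^2))
```

P-values can then be attributed to the entries of fx and fy in the same way as for the HOSVD-based selection with selectFeature.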