This package contains an updated parallel implementation of UniBic biclustering algorithm for gene expression data [Wang2016]. The algorithm locates trend-preserving biclusters within complex and noisy data and is considered one of the most accurate among biclustering methods.

Since their first application to gene expression data [Cheng2000] biclustering algorithms have gained much popularity [Eren2012]. The reason for this is the ability of the methods to simultaneously detect valid patterns within the data that include only subset of rows and columns.

In this package we provide an efficient parallel improved implementation of UniBic biclustering algorithm. This state-of-the-art algorithm is said to outperform multiple biclustering methods on both synthetic and genetic datasets. Our major contributions are reimplementing the method into more modern C++11 language as well as multiple improvements in code (memory management, code refactoring etc.)

This package provides the following main functions: * `BCUnibic`

/`runibic`

- parallel UniBic for continuous data * `BCUnibicD`

- parallel UniBic for discrete data

The package provides some additional functions: * `pairwiseLCS`

- calculates Longest Common Subsequence (LCS) between two vectors * `calculateLCS`

- calculates LCSes between all pairs of the input dataset (Fibonacci heap or standard sort) * `backtrackLCS`

- recovers LCS from the dynamic programming matrix * `cluster`

- main part of UniBic algorithm (biclusters seeding and expanding) * `unisort`

- returns matrix of indexes based on the increasing order in each row * `discretize`

- performs discretization using Fibonacci heap (sorting method used originally in UniBic)

The package may be installed as follows:

This example shows the basic usage of runibic on synthetic data. We start with loading the libraries.

First we prepare a random matrix.

Then, we run UniBic biclustering algorithm on the dataset. We use a `Biclust`

package wrapper.

We can inspect the results returned by the method. Let’s see how many biclusters were detected.

We could also inspect the rows of a specific bicluster, for example the third one.

Similarly we could check the indexes of columns of the third bicluster.

Visual analyses are very useful in biclustering. Let’s draw the heatmap with the obtained bicluster using `drawHeatmap`

function from `biclust`

package.

We could also present parallel coordinates plot of the bicluster with function `parallelCoordinates`

from `biclust`

package.

Similarly the package could be used to find trends in discrete data. We run `runiDiscretize`

function.

Finally, we run the UniBic algorithm dedicated to work with discrete datasets.

This example presents how to use runibic package on a sample dataset. After loading all required libraries

we apply UniBic biclustering method to the sample yeast dataset from `biclust`

package.

The method has found 92 biclusters. Now we are going to analyze the results. We will start with drawing a heatmap of the first found bicluster using `drawHeatmap`

from `biclust`

package.

Then we may check the result drawing the genes from the bicluster using parallel coordinates plot with `parallelCoordinates`

function from `biclust`

package.

The third application of runibic is using the algorithm on the dataset from an RNA-Seq experiment (`SummarizedExperiment`

). We start with loading necessary packages (`runibic`

, `SummarizedExperiment`

) and load the airway dataset.

We will take only a subset of the data to check if the method can return any patterns.

Let’s see if UniBic detects any pattern in the dataset.

The result could be visualized using `parallelCoordinates`

method from `biclust`

package.

We can also draw a heatmap with the second bicluster.

Another very useful application of the runibic package is analyzing real dataset obtained from Gene Expression Omnibus (GEO). For this task we will use three packages: `getGEO`

from `GEOquery`

to download the data, `GDS2eSet`

from `affy`

to convert the dataset to `ExpressionSet`

and finally `exprs`

to extract gene expression matrix.

We download dataset with Rat peripheral and brain regions from Gene Expression Omnibus.

Now we convert dataset to ExpressionSet and take subset of the dataset.

We perform analysis on first 100 of genes.

Finally, we draw the heatmap with the first bicluster using `drawHeatmap`

function from `biclust`

package.

We load package with QUBIC biclustering algorithm for comparison

Now, we perform biclustering using CC, Bimax, Qubic, Plaid and Unibic:

```
resCC <- biclust::biclust(BicatYeast, method = BCCC())
resBi <- biclust::biclust(BicatYeast, method = BCBimax())
resQub <- biclust::biclust(BicatYeast, method = BCQU())
resPlaid <- biclust::biclust(BicatYeast, method = BCPlaid())
resUni <- biclust::biclust(BicatYeast, method = BCUnibic())
```

The results of clustering could be compared using `showinfo`

from `QUBIC`

package.

The Longest Common Subsequence (LCS) between two vectors is the longest series of the data that is present in both analyzed vectors. Let’s prepare two vectors for the analysis.

You may notice that the values (1,2,4) are contained in both vectors: in vector A: (1,2,x,x,4,x) and in vector B: (x,1,x,2,x,4)

Let’s use the methods provided by the package to calculate the longest common subsequence. We first check which values are common for both vectors using `backtrackLCS`

.

Then we calculate using dynamic programming the matrix for Longest Common Subsequence with `pairwiselLCS`

In our next example we will find the length of the longest common subsequence within the matrix We start with preparing input matrix.

```
A <- matrix(c(11, 17, 12, 10, 8, 9, 19, 15, 18, 13, 14, 7, 4, 6, 16, 2, 3, 1, 5, 20,
17, 1, 8, 15, 5, 10, 2, 12, 9, 7, 3, 14, 11, 4, 6, 16, 20, 13, 19, 18,
15, 8, 17, 12, 18, 14, 19, 11, 16, 20, 10, 13, 6, 3, 7, 9, 1, 2, 5, 4,
15, 12, 16, 9, 19, 17, 10, 18, 11, 20, 8, 13, 2, 5, 7, 14, 1, 3, 4, 6,
15, 10, 9, 6, 13, 19, 7, 18, 16, 17, 14, 4, 3, 1, 2, 20, 12, 5, 11, 8,
1, 7, 4, 3, 2, 6, 8, 13, 5, 9, 12, 11, 16, 15, 17, 10, 19, 20, 14, 18,
10, 5, 3, 9, 2, 11, 6, 13, 8, 1, 7, 4, 16, 14, 15, 12, 18, 17, 20, 19,
10, 5, 1, 12, 8, 11, 7, 13, 6, 4, 3, 2, 18, 14, 15, 9, 17, 16, 20, 19,
9, 6, 3, 10, 1, 12, 7, 13, 8, 2, 5, 4, 16, 14, 15, 11, 19, 17, 20, 18,
12, 8, 1, 3, 2, 11, 4, 14, 9, 7, 10, 5, 16, 13, 15, 6, 18, 17, 20, 19), nrow = 10, byrow = TRUE)
```

We sort data in each row with the function `unisort`

provided by the `runibic`

package.

Now we calculate the length of LCS between each pair of rows using `calculateLCS`

either using Fibonacci Heap or standard sort.

You may notice that depending on the method chosen the results may differ.

We can check the length of the longest common subsequence (e.g. LCS between rows 6 and 7 is equal to 10).

First we create a random matrix

Discretization from UniBic (using original sorting method using Fibonacci Heap) could be applied using `runiDiscretize`

function.

In order to take full advantage of the modularity offered by runibic, we apply UniBic algorithm step by step using standard stable sort. The last example shows how to use cluster function from the package. After creating a matrix we sort values in each row and calculate LCS between all pairs of rows.

```
A <- matrix(c(4, 3, 1, 2, 5, 8, 6, 7, 9, 10, 11, 12), nrow = 4, byrow = TRUE)
iA <- unisort(A)
lcsResults <- calculateLCS(A, useFibHeap=FALSE)
```

Finally, we apply `cluster`

function to inspect the results.

For the original sequential biclustering algorithm please use the following citation:

**Zhenjia Wang, Guojun Li, Robert W. Robinson, Xiuzhen Huang *UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data* Scientific Reports 6, 2016; 23466, doi: https://doi:10.1038/srep23466**

If you use in your work this package with parallel version of UniBic please use the following citation:

**Patryk Orzechowski, Artur Pańszczyk, Xiuzhen Huang, Jason H Moore; runibic: a Bioconductor package for parallel row-based biclustering of gene expression data Bioinformatics, , bty512, https://doi.org/10.1093/bioinformatics/bty512**

BibTex entry:

```
@article{orzechowski2018runibic,
author = {Orzechowski, Patryk and Pańszczyk, Artur and Huang, Xiuzhen and Moore, Jason H},
title = {runibic: a Bioconductor package for parallel row-based biclustering of gene expression data},
journal = {Bioinformatics},
volume = {},
number = {},
pages = {bty512},
year = {2018},
doi = {10.1093/bioinformatics/bty512},
URL = {http://dx.doi.org/10.1093/bioinformatics/bty512},
eprint = {/oup/backfile/content_public/journal/bioinformatics/pap/10.1093_bioinformatics_bty512/4/bty512.pdf}
}
```

- [Cheng2000] Cheng, Yizong, and George M. Church. “Biclustering of expression data.” Ismb. Vol. 8. No. 2000. 2000.
- [Eren2012] Eren, Kemal, et al. “A comparative analysis of biclustering algorithms for gene expression data.” Briefings in bioinformatics 14.3, 2012: 279-292.
- [Wang2016] Wang, Zhenjia, et al. “UniBic: Sequential row-based biclustering algorithm for analysis of gene expression data.” Scientific reports 6 (2016): 23466.