1 Introduction

Methods to find similarities have been developed for several purposes, being Jaccard and Dice similarities the most known. In bioinformatics much of the research on the topic is centered around Gene Ontologies because they provide controlled vocabularies, as part of their mission:

The mission of the GO Consortium is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular and organism-level systems.

However, there is another resource of similarities between genes: metabolic pathways. Metabolic pathways describe the relationship between genes, proteins, lipids and other elements of the cells. A pathway describes, to some extent, the function in which it is involved in the cell. There exists several databases about which gene belong to which pathway. Together with pathways, gene sets related to a function or to a phenotype are a source of information of the genes function. With this package we provide the methods to calculate functional similarities based on this information.

Here we provides functions to calculate functional similarities for pathways, gene sets, genes and clusters of genes.

As it development started aiming to improve clustering of genes by functionality in co-expression networks using WGCNA it also has some functions to combine similarities.

2 Citation

The main article describing the software and its usefulness is currently under writing.

3 Installation

The BioCor package is available at Bioconductor and can be downloaded and installed via biocLite:

source("http://bioconductor.org/biocLite.R")
biocLite("BioCor")

You can install the latest version of BioCor from Github with:

library("devtools")
install_github("llrs/BioCor")

4 Using BioCor

4.1 Preparation

We can load the package and prepare the data for which we want to calculate the similarities:

library("BioCor")
## 
## If you use BioCor in published research, please cite:
## Load libraries with the data of the pathways
library("org.Hs.eg.db")
## Loading required package: AnnotationDbi
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, cbind, colMeans, colSums, colnames, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect,
##     is.unsorted, lapply, lengths, mapply, match, mget, order, paste,
##     pmax, pmax.int, pmin, pmin.int, rank, rbind, rowMeans, rowSums,
##     rownames, sapply, setdiff, sort, table, tapply, union, unique,
##     unsplit, which, which.max, which.min
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## Loading required package: IRanges
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
library("reactome.db")
genesKegg <- as.list(org.Hs.egPATH)
genesReact <- as.list(reactomeEXTID2PATHID)

To avoid having biased data it is important to have all the data about the pathways and genes associated to all pathways for organism under study. Here we assume that we are interested in human pathways. We use this two databases KEGG and Reactome as they are easy to obtain the data. However KEGG database is no longer free for large retrievals therefore it is not longer updated in the Bioconductor annotation packages.

However, one can use any list where the names of the list are the genes and the elements of the list the pathways or groups where the gene belong. One could also read from a GMT file or use GeneSetCollections in addition or instead of those associations from a pathway database and convert it to list using:

library("GSEABase")
paths2Genes <- geneIds(getGmt("/path/to/c3.all.v2.5.symbols.gmt",
                 collectionType=BroadCollection(category="c3"),
                 geneIdType=SymbolIdentifier()))

genes <- unlist(paths2Genes, use.names = FALSE)
pathways <- rep(names(paths2Genes), lengths(paths2Genes))
genes2paths <- split(pathways, genes)

gmt_sim <- mgeneSim(names(genes2paths), genes2paths)

4.2 Pathway similarities

We can compute similarities (Dice similarity, see question 1 of FAQ) between two pathways or between several pathways and combine them, or not:

(paths <- sample(unique(unlist(genesReact)), 2))
## [1] "R-HSA-1834949" "R-CEL-1834949"
pathSim(paths[1], paths[2], genesReact)
## [1] 0

(pathways <- sample(unique(unlist(genesReact)), 10))
##  [1] "R-SSC-2465910" "R-HSA-5658034" "R-SSC-349425"  "R-CEL-196299" 
##  [5] "R-HSA-69613"   "R-XTR-76005"   "R-RNO-425410"  "R-HSA-5689896"
##  [9] "R-DME-390471"  "R-XTR-2173789"
mpathSim(pathways, genesReact)
##               R-SSC-2465910 R-HSA-5658034 R-SSC-349425 R-CEL-196299
## R-SSC-2465910             1             0            0            0
## R-HSA-5658034             0             1            0            0
## R-SSC-349425              0             0            1            0
## R-CEL-196299              0             0            0            1
## R-HSA-69613               0             0            0            0
## R-XTR-76005               0             0            0            0
## R-RNO-425410              0             0            0            0
## R-HSA-5689896             0             0            0            0
## R-DME-390471              0             0            0            0
## R-XTR-2173789             0             0            0            0
##               R-HSA-69613 R-XTR-76005 R-RNO-425410 R-HSA-5689896
## R-SSC-2465910  0.00000000           0            0    0.00000000
## R-HSA-5658034  0.00000000           0            0    0.00000000
## R-SSC-349425   0.00000000           0            0    0.00000000
## R-CEL-196299   0.00000000           0            0    0.00000000
## R-HSA-69613    0.98245614           0            0    0.08510638
## R-XTR-76005    0.00000000           1            0    0.00000000
## R-RNO-425410   0.00000000           0            1    0.00000000
## R-HSA-5689896  0.08510638           0            0    1.00000000
## R-DME-390471   0.00000000           0            0    0.00000000
## R-XTR-2173789  0.00000000           0            0    0.00000000
##               R-DME-390471 R-XTR-2173789
## R-SSC-2465910            0             0
## R-HSA-5658034            0             0
## R-SSC-349425             0             0
## R-CEL-196299             0             0
## R-HSA-69613              0             0
## R-XTR-76005              0             0
## R-RNO-425410             0             0
## R-HSA-5689896            0             0
## R-DME-390471             1             0
## R-XTR-2173789            0             1

When the method to combine the similarities is set to NULL mpathSim returns a matrix of pathway similarities, otherwise it combines the values. In the next section we can see the methods to combine pathway similarities.

4.2.1 Combining values

To combine values we provide a function with several methods:

sim <- mpathSim(pathways, genesReact)
methodsCombineScores <- c("avg", "max", "rcmax", "rcmax.avg", "BMA")
sapply(methodsCombineScores, combineScores, scores = sim)
##       avg       max     rcmax rcmax.avg       BMA 
## 0.1015267 1.0000000 0.1085106 0.9982456 0.9982456

We can also specify the method to combine the similarities in mpathSim, geneSim, mgeneSim, clusterSim, mclusterSim, clusterGeneSim and mclusterGeneSim, argument method. By default the method is set to “max” to combine pathways (except in mpathSim where the default is to show all the pathway similarities) and “BMA” to combine similarities of genes or for cluster analysis. This function is adapted from GOSemSim package.

4.3 Gene similarities

To compare the function of two genes there is the geneSim function and mgeneSim function for several comparisons. In this example we compare the genes BRCA1 and BRCA2 and NAT2, which are the genes 672, 675 and 10 respectively in ENTREZID:

geneSim("672", "675", genesKegg)
## [1] 0.0824295
geneSim("672", "675", genesReact)
## [1] 1

mgeneSim(c("BRCA1" = "672", "BRCA2" = "675", "NAT2" = "10"), genesKegg)
##           BRCA1       BRCA2        NAT2
## BRCA1 1.0000000 0.082429501 0.000000000
## BRCA2 0.0824295 1.000000000 0.008241758
## NAT2  0.0000000 0.008241758 1.000000000
mgeneSim(c("BRCA1" = "672", "BRCA2" = "675", "NAT2" = "10"), genesReact)
##           BRCA1      BRCA2       NAT2
## BRCA1 1.0000000 1.00000000 0.14579350
## BRCA2 1.0000000 1.00000000 0.07011901
## NAT2  0.1457935 0.07011901 1.00000000

Note that for the same genes each database or list provided has different annotations, which result on different similarity scores. In this example BRCA1 has 3 and 13 pathways in KEGG and Reactome respectively and BRCA2 has 1 and 36 pathways in KEGG and Reactome respectively which results on different scores.

4.4 Gene cluster similarities

There are two methods:

Combining all the pathways for each cluster and compare between them.
Calculate the similarity between genes of a cluster and the other cluster.

4.4.1 By pathways similarities

As explained, in this method all the pathways of a cluster are compared with all the pathways of the other cluster. If a method to combine pathways similarities is not provided, all pathway similarities are returned:

clusterSim(c("672", "675"), c("100", "10", "1"), genesKegg)
## [1] 0.04210526
clusterSim(c("672", "675"), c("100", "10", "1"), genesKegg, NULL)
##            00230       01100       05340 00232 00983
## 04120 0.00000000 0.000000000 0.011764706     0     0
## 03440 0.04210526 0.006908463 0.000000000     0     0
## 05200 0.00000000 0.008241758 0.005540166     0     0
## 05212 0.00000000 0.001666667 0.019047619     0     0

clusters <- list(cluster1 = c("672", "675"),
                 cluster2 = c("100", "10", "1"),
                 cluster3 = c("18", "10", "83"))
mclusterSim(clusters, genesKegg, "rcmax.avg")
##             cluster1   cluster2    cluster3
## cluster1 1.000000000 0.01672822 0.002088221
## cluster2 0.016728221 1.00000000 0.497068355
## cluster3 0.002088221 0.49706835 1.000000000
mclusterSim(clusters, genesKegg, "max")
##             cluster1   cluster2    cluster3
## cluster1 1.000000000 0.04210526 0.008241758
## cluster2 0.042105263 1.00000000 1.000000000
## cluster3 0.008241758 1.00000000 1.000000000

4.4.2 By genes similarities

First the similarities between each gene is calculated, then the similarity between each group of genes is calculated.

This method of comparing clusters requires two methods to combine values, the first one to combine pathways similarities and the second one to combine genes similarities. If only one is provided it returns the matrix of similarities of the genes of each cluster:

clusterGeneSim(c("672", "675"), c("100", "10", "1"), genesKegg)
## [1] 0.02605425
clusterGeneSim(c("672", "675"), c("100", "10", "1"), genesKegg, "max")
##            100          10  1
## 672 0.01176471 0.000000000 NA
## 675 0.04210526 0.008241758 NA

mclusterGeneSim(clusters, genesKegg, c("max", "rcmax.avg"))
##             cluster1   cluster2    cluster3
## cluster1 1.000000000 0.02605425 0.006181319
## cluster2 0.026054248 1.00000000 1.000000000
## cluster3 0.006181319 1.00000000 1.000000000
mclusterGeneSim(clusters, genesKegg, c("max", "max"))
##             cluster1   cluster2    cluster3
## cluster1 1.000000000 0.04210526 0.008241758
## cluster2 0.042105263 1.00000000 1.000000000
## cluster3 0.008241758 1.00000000 1.000000000

Note the differences between mclusterGeneSim and mclusterSim in the similarity values of the clusters except if we set method = c("max", "max"), then the similarity between the clusters is the same as clusterSim.

4.5 Merging similarities

If one calculates similarities with KEGG data and Reactome or other input for the same genes or clusters we provide a couple of functions to merge them.

We can set a weight to each similarity input with weighted.sum, multiply them also using a weight for each similarity (with weighted.prod), doing the mean or just adding them up.

kegg <- mgeneSim(c("672", "675", "10"), genesKegg)
react <- mgeneSim(c("672", "675", "10"), genesReact)
## We can sum it adding a weight to each origin
weighted.sum(c(kegg["672", "675"], react["672","675"]), w = c(0.3, 0.7))
## [1] 0.7247289

## Or if we want to perform for all the matrix
## A list of matrices to merge
sim <- list("kegg" = kegg, "react" = react)
similarities(sim, weighted.sum, w = c(0.3, 0.7))
##           672        675         10
## 672 1.0000000 0.72472885 0.10205545
## 675 0.7247289 1.00000000 0.05155583
## 10  0.1020554 0.05155583 1.00000000
similarities(sim, weighted.prod, w = c(0.3, 0.7))
##           672          675           10
## 672 0.2100000 0.0173101952 0.0000000000
## 675 0.0173102 0.2100000000 0.0001213598
## 10  0.0000000 0.0001213598 0.2100000000
similarities(sim, prod)
##           672          675           10
## 672 1.0000000 0.0824295011 0.0000000000
## 675 0.0824295 1.0000000000 0.0005779039
## 10  0.0000000 0.0005779039 1.0000000000
similarities(sim, mean)
##            672        675         10
## 672 1.00000000 0.54121475 0.07289675
## 675 0.54121475 1.00000000 0.03918038
## 10  0.07289675 0.03918038 1.00000000

This functions are similar to weighted.mean, except that first the multiplication by the weights is done and then the NAs are removed.

4.6 Converting similarities

If needed, Jaccard similarity can be calculated from Dice similarity using D2J:

lapply(sim, D2J)
## $kegg
##            672         675          10
## 672 1.00000000 0.042986425 0.000000000
## 675 0.04298643 1.000000000 0.004137931
## 10  0.00000000 0.004137931 1.000000000
## 
## $react
##            672        675         10
## 672 1.00000000 1.00000000 0.07862851
## 675 1.00000000 1.00000000 0.03633333
## 10  0.07862851 0.03633333 1.00000000

Also if one has a Jaccard similarity and wants a Dice similarity, can use the J2D function.

5 High volumes of gene similarities

We can compute the whole similarity of genes in KEGG or Reactome by using :

## Omit those genes without a pathway
nas <- sapply(genesKegg, function(y){all(is.na(y)) | is.null(y)})
genesKegg2 <- genesKegg[!nas]
m <- mgeneSim(names(genesKegg2), genesKegg2, method  = NULL)

It takes around 5 hours in one core but it requires high memory available.

If one doesn’t have such a memory available can compute the similarities by pieces, and then fit it in another matrix with:

sim <- AintoB(m, B)

Usually B is a matrix of size length(genes), see ?AintoB.

6 An example of usage

In this example I show how to use BioCor to analyse a list of genes by functionality.

With a list of genes we are going to see how similar are those genes:

genes.id <- c("10", "15", "16", "18", "2", "9", "52", "3855", "3880", "644", 
              "81327", "9128", "2073", "2893", "5142", "210", "81", 
              "1352", "672", "675")
genes.id <- mapIds(org.Hs.eg.db, keys = genes.id, keytype = "ENTREZID", column = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
genes <- names(genes.id)
names(genes) <- genes.id
sum(lengths(genesReact[genes])) # Check the total number of pathways involved
## [1] 141
react <- mgeneSim(genes, genesReact)
## Warning in mgeneSim(genes, genesReact): Some genes are not in the list you
## provided.
## We remove genes which are not in list (hence the warning):
nan <- genes %in% names(genesReact)
react <- react[nan, nan]
hc <- hclust(as.dist(1 - react))
plot(hc)

Figure 1: Gene clustering by similarities

Now we can see the relationship between the genes. We can group them for a cluster analysis to visualize the relationship between the clusters:

mycl <- cutree(hc, h = 0.2)
clusters <- split(genes[nan], as.factor(mycl))
names(clusters) <- paste0("cluster", seq_len(max(mycl)))
## Remember we can use two methods to compare clusters
sim_clus1 <- mclusterSim(clusters, genesReact)
plot(hclust(as.dist(1 - sim_clus1)))

Figure 2: Clustering using clusterSim

sim_clus2 <- mclusterGeneSim(clusters, genesReact)
plot(hclust(as.dist(1 - sim_clus2)))

Figure 3: Clustering using clusterGeneSim

Each method results in a different dendrogram as we can see on Figure 2 compared to Figure 3.

7 Comparing with GOSemSim

In this section I will compare the functional similarity of BioCor with the closely related package GOSemSim. The genes to measure are extracted from the vignette of GOSemSim 2.0.4.

I will compare the functions geneSim and clusterSim with both datasets from Kegg and Reactome:

geneSim("241", "251", genesKegg, "BMA")
## [1] NA
geneSim("241", "251", genesReact, "BMA")
## [1] 0.04763403
mgeneSim(c("835", "5261","241", "994"), genesKegg, "BMA", round = TRUE)
##      835  5261 241   994
## 835   NA    NA  NA    NA
## 5261  NA 1.000  NA 0.153
## 241   NA    NA  NA    NA
## 994   NA 0.153  NA 1.000
mgeneSim(c("835", "5261","241", "994"), genesReact, "BMA", round = TRUE)
##        835  5261   241   994
## 835  0.923 0.063 0.047 0.135
## 5261 0.063 0.862 0.277 0.067
## 241  0.047 0.277 0.970 0.045
## 994  0.135 0.067 0.045 0.859

We can observe that for the first genes there isn’t information on the pathways of those genes. And in the list of genes we can observe that the values using Reactome information are closer to those from GOSemSim than using KEGG. However, the values of similarity are different from GOSemSim, because the gene 241 has higher similarity with the gene 5261 while in GOSemSim is higher with 845.

If named characters are passed they are used to name the resulting matrix:

genes <- c("CDC45", "MCM10", "CDC20", "NMU", "MMP1")
genese <- mapIds(org.Hs.eg.db, keys = genes, column = "ENTREZID", 
                 keytype = "SYMBOL")
## 'select()' returned 1:1 mapping between keys and columns
mgeneSim(genese, genesReact, "BMA")
##            CDC45     MCM10     CDC20        NMU       MMP1
## CDC45 0.91338505 0.8521854 0.4694304 0.09668248 0.08423069
## MCM10 0.85218535 0.9515952 0.4992280 0.10679222 0.08717160
## CDC20 0.46943036 0.4992280 0.9602670 0.18901308 0.19818130
## NMU   0.09668248 0.1067922 0.1890131 0.96409744 0.13268601
## MMP1  0.08423069 0.0871716 0.1981813 0.13268601 0.86255549
mgeneSim(genese, genesKegg, "BMA")
##            CDC45 MCM10      CDC20 NMU       MMP1
## CDC45 1.00000000    NA 0.62142530  NA 0.08897631
## MCM10         NA    NA         NA  NA         NA
## CDC20 0.62142530    NA 1.00000000  NA 0.08185331
## NMU           NA    NA         NA  NA         NA
## MMP1  0.08897631    NA 0.08185331  NA 1.00000000

In this example the CD45 is closer with MCM10 than any other gene as in GOSemSim, but CDC20 is closer tto MCM10 than to NMU as calculated with GOSemSim.

gs1 <- c("835", "5261","241", "994", "514", "533")
gs2 <- c("578","582", "400", "409", "411")
clusterSim(gs1, gs2, genesReact, "BMA")
## [1] 0.4532398
clusterSim(gs1, gs2, genesKegg, "BMA")
## [1] 0.351473

The similarity between those two clusters is lower than the one calculated with GOSemSim, maybe due to not having data for some genes in the list provided from Biocpkg("reactome.db"). This serves us a reminder of how important is the input data for the results.

8 WGCNA and BioCor

WGCNA uses the correlation of the expression data of several samples to cluster genes. Sometimes, from a biological point of view the interpretation of the resulting modules is difficult, even more when some groups of genes end up not having an enrichment in previously described functions. BioCor was originally thought to be used to overcome this problem: to help clustering genes, not only by correlation but also by functionality.

In order to have groups functionally related, functional similarities can enhance the clustering of genes when combined (See this section) with experimental correlations. The resulting groups will reflect, not only the correlation of the expression provided, but also the functionality known of those genes.

We propose the following steps:

Calculate the similarities for the expression data
Calculate the similarities of the genes in the expression
Combine the similarities
Calculate the adjacency
Identify modules with hierarchical clustering

Here we provide an example on how to use BioCor with WGCNA:

sim is a list where each element is a matrix of similarities between genes Our normalized expression is in the expr variable, a matrix where the samples are in the rows and genes in the columns.

expr.sim <- WGCNA::cor(expr) # or bicor

## Combine the similarities
similarity <- similarities(c(list(exp = expr.sim), sim), mean, na.rm = TRUE)

## Choose the softThreshold
pSFT <- pickSoftThreshold.fromSimilarity(similarity)

## Or any other function we want
adjacency <- adjacency.fromSimilarity(similarity, power = pSFT$powerEstimate)

## Once we have the similarities we can calculate the TOM with TOM
TOM <- TOMsimilarity(adjacency) ## Requires adjacencies despite its name 
dissTOM <- 1 - TOM
geneTree <- hclust(as.dist(dissTOM), method = "average")
## We can use a clustering tool to group the genes
dynamicMods <- cutreeHybrid(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = 30)
moduleColors <- labels2colors(dynamicMods$labels)

Once the modules are identified using the functional similarities of this package and the gene correlations, one can continue with the workflow of WGCNA.

An important aspect in this process is deciding how to combine the similarities and the expression data: - If the functional similarities play a huge role, we will end up having only those genes closely related to the same functions. - If the functional similarities play a low role, it will be similarly to only use WGCNA, and the genes won’t be functionally related.

For these reasons it is better to use weights between 0.5 and 1 for expression if you use weighted.sum or similar functions.

There are several things to take into account when choosing a way to combine: - The size of the grey or 0 modules (those who don’t show a specific pattern) - The number and size of the modules created. - The way the similarities are combined

Violin plots may help to view the differences in size and distribution of the modules across different methods of combining the similarities.

9 FAQ

9.1 How is defined how similar are two pathways?

It uses the Sørensen–Dice index: It is the double of the genes shared by the pathways divided by the number of genes in each pathway.

We can calculate the similarity between two pathways (\(x\), \(w\)) with:

\[Dice(x, w) = \frac{2 |x \cap w|}{|x| + |w|}\]

This is implemented in the diceSim function, which results is similar to Jaccard index:

\[Jaccard(x, w) = \frac{|x \cap w|}{|x \cup w|}\]

Both Jaccard index and Dice index are between 0 and 1 (\([0, 1]\)). To calculate the Jaccard index from the diceSim use the D2J function.

9.2 Why do you use the dice coefficient and not the Jaccard ?

We consider Dice coefficient better than Jaccard because it has higher values for the same comparisons, which reflects that including a gene in a pathway is not easily done.

9.3 How do you combine similarities between several pathways of two genes?

Although the recommend method is the “max” method, (set as default), there are implemented other methods in combineScores of the GOSemSim package which I borrowed. See the Combining values section and the help page of combineScores.

9.5 How to detect which functional relationship is more important between two genes?

If two genes are involved in the same pathways usually they have (to some extent, maybe indirect) interactions. The fact that they share interactions with other genes does not imply that the relationship between these two genes is stronger. To detect which relationship is more important between two genes one could measure other similarities scores and check the stechiometric of the pathways and measure the expression changes and correlation between them.

This could be approached using dynamic simulations but this is out of the scope of the package.

9.7 Why isn’t available a method for calculating GO similarities?

This is covered by the GOSemSim package, you can use it to produce a similarity matrix (i.e. use mgeneSim). You can parallelize it with foreach package or BiocParallel if your list of genes is big.

9.8 I get an error! How do I solve its?

If the error is like this:

Error in FUN(X[[i]], ...) : 
  trying to get slot "geneAnno" from an object of a basic class ("list") with no slots

And you have loaded the GOSemSim library, R is calling the GOSemSim function of the same name. Use BioCor:: to call the function from BioCor (f.ex: BioCor::geneSim)

If the error is not previously described in the support forum, post a question there.

My apologies if you found a bug or an inconsistency between what BioCor should do and what it actually does. Once you checked that it is a bug, please let me know at the issues page of Github.

Session Info

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
## 
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] reactome.db_1.59.1   org.Hs.eg.db_3.4.1   AnnotationDbi_1.38.0
## [4] IRanges_2.10.1       S4Vectors_0.14.1     Biobase_2.36.2      
## [7] BiocGenerics_0.22.0  BioCor_1.0.1         BiocStyle_2.4.0     
## 
## loaded via a namespace (and not attached):
##  [1] graph_1.54.0    Rcpp_0.12.10    knitr_1.15.1    magrittr_1.5   
##  [5] highr_0.6       stringr_1.2.0   tools_3.4.0     DBI_0.6-1      
##  [9] htmltools_0.3.6 yaml_2.1.14     rprojroot_1.2   digest_0.6.12  
## [13] bookdown_0.3    memoise_1.1.0   evaluate_0.10   RSQLite_1.1-2  
## [17] rmarkdown_1.5   stringi_1.1.5   compiler_3.4.0  backports_1.0.5

BioCor

17 May 2017

Abstract

Package

1 Introduction

2 Citation

3 Installation

4 Using BioCor

4.1 Preparation

4.2 Pathway similarities

4.2.1 Combining values

4.3 Gene similarities

4.4 Gene cluster similarities

4.4.1 By pathways similarities

4.4.2 By genes similarities

4.5 Merging similarities

4.6 Converting similarities

5 High volumes of gene similarities

6 An example of usage

7 Comparing with GOSemSim

8 WGCNA and BioCor

9 FAQ

9.1 How is defined how similar are two pathways?

9.2 Why do you use the dice coefficient and not the Jaccard ?

9.3 How do you combine similarities between several pathways of two genes?

9.5 How to detect which functional relationship is more important between two genes?

9.7 Why isn’t available a method for calculating GO similarities?

9.8 I get an error! How do I solve its?

Session Info

BioCor

17 May 2017

Abstract

Package

1 Introduction

2 Citation

3 Installation

4 Using BioCor

4.1 Preparation

4.2 Pathway similarities

4.2.1 Combining values

4.3 Gene similarities

4.4 Gene cluster similarities

4.4.1 By pathways similarities

4.4.2 By genes similarities

4.5 Merging similarities

4.6 Converting similarities

5 High volumes of gene similarities

6 An example of usage

7 Comparing with GOSemSim

8 WGCNA and BioCor

9 FAQ

9.1 How is defined how similar are two pathways?

9.2 Why do you use the dice coefficient and not the Jaccard ?

9.3 How do you combine similarities between several pathways of two genes?

9.4 Why do you recommend using the max method to combine similarities scores for pathways?

9.5 How to detect which functional relationship is more important between two genes?

9.6 How to detect with which genes is my gene of interest related?

9.7 Why isn’t available a method for calculating GO similarities?

9.8 I get an error! How do I solve its?

Session Info