
1 fCI

fCI (f-divergence Cutoff Index) identifies differentially expressed genes (DEGs) by computing the divergence between the distribution of fold changes (expression ratios) for control-control replicate pairs and the distribution for the remaining (non-differential) case-control expression ratios.
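The idea can be illustrated with a minimal base-R sketch. It rests on several assumptions that are not taken from the fCI source. Fold changes are log2 ratios of one replicate over another. A gene counts as a putative DEG when its absolute case/control log2 ratio exceeds log2 of the cutoff. The divergence is the Hellinger distance between Gaussians fitted to the two log-ratio distributions (the default option described in Section 4.8). The "optimal" cutoff is taken to be the one with the smallest divergence. hellinger.normal() and scan.cutoffs() are hypothetical helpers, not fCI functions, and the package's actual selection rule may differ.

```r
# Hellinger distance between normal densities fitted to two samples
# (closed form for univariate Gaussians)
hellinger.normal <- function(x, y) {
  m1 <- mean(x); s1 <- sd(x)
  m2 <- mean(y); s2 <- sd(y)
  bc <- sqrt(2 * s1 * s2 / (s1^2 + s2^2)) *
    exp(-(m1 - m2)^2 / (4 * (s1^2 + s2^2)))
  sqrt(max(0, 1 - bc))  # guard against tiny negative rounding error
}

# For each candidate fold cutoff, genes whose case/control fold change
# exceeds the cutoff are set aside as putative DEGs; the remaining
# case/control log-ratios are compared with the control/control
# (empirical null) log-ratios.
scan.cutoffs <- function(ctrl1, ctrl2, case, cutoffs) {
  null.lr <- log2(ctrl2 / ctrl1)
  case.lr <- log2(case / ctrl1)
  t(sapply(cutoffs, function(fc) {
    keep <- abs(case.lr) <= log2(fc)
    c(cutoff = fc, n.DEGs = sum(!keep),
      divergence = hellinger.normal(null.lr, case.lr[keep]))
  }))
}

# Usage on simulated counts with 50 spiked-in up-regulated genes
set.seed(1)
ctrl1 <- rpois(1000, 50) + 1
ctrl2 <- rpois(1000, 50) + 1
case <- rpois(1000, 50) + 1
case[1:50] <- case[1:50] * 4
res <- scan.cutoffs(ctrl1, ctrl2, case, seq(1.2, 3, by = 0.2))
best <- res[which.min(res[, "divergence"]), ]  # assumed selection rule
```

As the cutoff grows, fewer genes are set aside (n.DEGs never increases). The divergence then reflects how closely the remaining case/control ratios resemble the control/control null.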

fCI provides several advantages over existing methods. First, it performs as well as or better than methods designed specifically for a given experiment type in finding DEGs across diverse data types (both discrete and continuous) from various omics technologies. Second, it fills an urgent need in omics research. Rapidly decreasing sequencing costs have made proteogenomic approaches increasingly common, facilitating multi-dimensional (e.g., proteogenomic) experiments, yet no efficient tools have been developed to find co-regulation and dependencies of DEGs between treatment conditions or developmental stages. Third, fCI does not rely on statistical methods that require a sufficiently large number of replicates to evaluate DEGs; instead, it can effectively identify changes in samples with very few or no replicates.

2 Using fCI

fCI can be run interactively in an R session. Attaching the package also loads its dependencies, as the messages below show:

library(fCI)
## Loading required package: FNN
## Loading required package: psych
## Loading required package: gtools
## 
## Attaching package: 'gtools'
## 
## The following object is masked from 'package:psych':
## 
##     logit
## 
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: rgl
## Loading required package: grid
## Loading required package: VennDiagram
## Loading required package: futile.logger
## 
## Attaching package: 'futile.logger'
## 
## The following object is masked from 'package:gtools':
## 
##     scat
## Warning: replacing previous import by 'gtools::logit' when loading 'fCI'
suppressWarnings(library(fCI))
suppressWarnings(library(psych))

3 Installing fCI

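Install fCI from Bioconductor. On R 3.5 and later this is done with BiocManager; older Bioconductor releases used the now-deprecated biocLite() script instead.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("fCI")
library(fCI)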

3.1 Dependency Checks
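This section does not list the checks themselves. The following is a minimal sketch, assuming the dependencies are the packages reported as "required" when fCI is attached (Section 2). It only reports what is missing and does not reproduce any check performed by fCI itself.

```r
# Packages reported as "required" when fCI is attached
deps <- c("FNN", "psych", "gtools", "zoo", "rgl", "grid",
          "VennDiagram", "futile.logger")
# system.file() returns "" for a package that is not installed,
# so this checks installation without loading the packages
installed <- vapply(deps, function(p) nzchar(system.file(package = p)),
                    logical(1))
missing.deps <- deps[!installed]
if (length(missing.deps) > 0) {
  message("Missing fCI dependencies: ", paste(missing.deps, collapse = ", "))
}
```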

4 Examples of how to use fCI

fCI can be used as follows:

4.1 Environment Setup

4.2 Load the data

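Simulate a toy dataset of random counts for 1043 genes (rows) and 6 samples (columns). The examples below treat columns 1-3 as control replicates and columns 4-6 as case replicates. No random seed is set, so the printed results will differ from run to run.

fci.data=data.frame(matrix(sample(3:100, 1043*6, replace=TRUE), 1043,6))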

4.3 Finding Differentially Expressed Genes

suppressWarnings(library(gtools))
# run all 27 control/case comparisons: controls are columns 1-3, cases are columns 4-6
targets=fCI.call.by.index(c(1,2,3), c(4,5,6), fci.data)
## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00102112 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.00308153 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00205424 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.00156378 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00297927 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.00230616 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00069198 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00142507 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.00090973 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00203507 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.00481389 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00352271 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.00295556 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00489715 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.00399087 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00132181 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00231105 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.00156143 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00080067 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.0017601 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00132457 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.0013013 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00199341 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.0016466 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00034194 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00062632 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.0003692
 head(targets)
##   targets Freq     ratio
## 1       6   15 0.5555556
## 2       7    6 0.2222222
## 3      21    3 0.1111111
## 4      23    3 0.1111111
## 5      25    3 0.1111111
## 6      32    3 0.1111111

4.4 Result Interpretation

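Because fCI is implemented with object-oriented programming, all computations operate on an fCI object, and you can run many variations of the analysis by changing the object's parameters and the options used to compute dysregulated genes. In the table printed by head(targets), each row is a gene identified by its row number (targets). Judging from the values, Freq counts the comparisons in which that gene was called differentially expressed, and ratio is Freq divided by the 27 comparisons (for example, 15/27 = 0.5555556).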
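The following minimal base-R sketch builds a per-gene call-frequency table like the one printed by head(targets). It assumes each comparison yields a vector of DEG row indices. The input list is hypothetical, and this illustrates the summary only, not fCI's internal code.

```r
# Hypothetical DEG row indices, one vector per comparison
deg.calls <- list(c(6, 7, 21), c(6, 23), c(6, 7, 25))
n.comparisons <- length(deg.calls)
# count how many comparisons called each gene (sorted by row index)
freq.table <- table(unlist(deg.calls))
summary.df <- data.frame(targets = as.integer(names(freq.table)),
                         Freq = as.integer(freq.table))
# fraction of comparisons supporting each gene
summary.df$ratio <- summary.df$Freq / n.comparisons
summary.df
```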

4.5 Illustrating the fCI analysis in detail

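fci=new("NPCI")
## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
# supply the data either as an in-memory data frame ...
fci@sample.data.normalized=fci.data
# ... or as a path to a data file (example path; uncomment and edit to use)
# fci@sample.data.file="c://home//fci_data.txt"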

4.6 Assign data to the fCI object

Assign the dataset (here, the simulated fci.data from Section 4.2) to the object:

if(dim(fci.data)[1]>0){
  fci@sample.data.normalized=fci.data
}

After assigning the data, initialize the object fci with initialize(), which removes genes with zero expression values and initializes the object's parameters. Then set the control replicates (fci@wt.index) to two sample column IDs (here 1 and 2), which are used to construct the empirical null distribution, and set the control-case replicates (fci@df.index) to two sample column IDs (here 1 and 4), which are used to construct the control-case distribution.
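The exact zero-filtering rule used by initialize() is not stated here. The following minimal base-R sketch assumes a gene is dropped when it has a zero in any sample, because a zero makes the log fold change infinite or undefined. The toy matrix is hypothetical.

```r
# Toy expression table: 4 genes x 3 samples
expr <- data.frame(s1 = c(10, 0, 5, 8),
                   s2 = c(12, 3, 0, 9),
                   s3 = c(11, 4, 6, 7))
# Assumption: keep only genes with no zero value in any sample
keep <- rowSums(expr == 0) == 0
expr.filtered <- expr[keep, ]
# log2 ratios are now finite for every remaining gene
lr <- log2(expr.filtered$s2 / expr.filtered$s1)
```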

if(dim(fci.data)[1]>0){
    fci=initialize(fci)
    fci@wt.index=c(1,2)
    fci@df.index=c(1,4)
}

Next, run the fCI analysis by calling populate(), compute(), and summarize() on the object in turn:

if(dim(fci.data)[1]>0){
  fci=populate(fci)
  fci=compute(fci)
  fci=summarize(fci)
}

After the divergence is computed, fCI generates the final results and saves them to the ‘result’ field, which includes the total number of differentially expressed genes, the optimal fold cutoff, and the divergence value. You can also view the IDs of the differentially expressed genes (given as row numbers) and figures comparing the empirical-null and control-case distributions.

Because fCI is object-oriented, you can easily change the object's fields, such as the fold cutoff list or the column IDs used for the empirical null distribution (and/or the control-case distribution), to evaluate differential expression in the samples of interest. After making changes, rerun populate(), compute(), and summarize() to obtain the new results.

if(dim(fci.data)[1]>0){
  fci@fold.cutoff.list=list(seq(from=1.2, to=5, by=0.2))  # cutoffs 1.2, 1.4, ..., 5.0
  fci@wt.index=c(2,3)   # empirical null: samples 2 and 3
  fci@df.index=c(2,5)   # control-case: samples 2 and 5
}

4.7 Multi-dimensional data

If the dataset contains multiple control and case replicates, you do not need to form every combination and run fCI on each one individually. Instead, you can call a top-level function, such as find.fci.targets() below, that automatically performs the analysis for the given control column IDs and case (experimental) column IDs. For a dataset with 3 control replicates and 3 case replicates, fCI forms the 3 possible control-control pairs (1-2, 1-3, and 2-3) to construct empirical null ratio distributions, and the 9 control-case pairs (1-4, 1-5, 1-6, 2-4, 2-5, 2-6, 3-4, 3-5, and 3-6) to construct control-case ratio distributions. Each fCI analysis pairs one control-control distribution with one control-case distribution (e.g., 1-2 with 1-5), giving 3 * 9 = 27 unique analyses in this example.
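The combinatorics can be reproduced with base R. This sketch only enumerates the 27 pairings, in the same order as the log printed by find.fci.targets() below. It is not the package's internal code.

```r
controls <- 1:3
cases <- 4:6
# 3 control-control pairs (empirical null): 1-2, 1-3, 2-3
null.pairs <- t(combn(controls, 2))
# 9 control-case pairs; the control index varies fastest, as in the log
case.pairs <- as.matrix(expand.grid(control = controls, case = cases))
# each analysis pairs one null pair with one control-case pair
analyses <- expand.grid(df = seq_len(nrow(case.pairs)),
                        null = seq_len(nrow(null.pairs)))
labels <- sprintf("[ %d %d ] & [ %d %d ]",
                  null.pairs[analyses$null, 1], null.pairs[analyses$null, 2],
                  case.pairs[analyses$df, 1], case.pairs[analyses$df, 2])
length(labels)  # 3 * 9 analyses
```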

fci=new("NPCI")
## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
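# locate the example dataset installed with the package; if it is not found,
# fall back to the source-tree paths used when building this vignette
filename=system.file("extdata", "Supp_Dataset_part_2.txt", package="fCI")
if(!nzchar(filename) && file.exists("../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../inst/extdata/Supp_Dataset_part_2.txt"
}else if(!nzchar(filename) && file.exists("../../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../../inst/extdata/Supp_Dataset_part_2.txt"
}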

if(nchar(filename)>3){
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
                       filename,
                       use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}
## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 3.728e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00017513 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 9.713e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.39e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 4.03e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 4.641e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 5.36e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.307e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 3.249e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 6e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00030072 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 0.00024907 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.459e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 1.401e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.9e-07 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 4.838e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.395e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 0.00010071 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 7.31e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 7.542e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 6.241e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 7.84e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 3.53e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 8.43e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 1.11e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 1.71e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.88e-05
##    targets Freq     ratio
## 1        1   27 1.0000000
## 2        2   27 1.0000000
## 3        3   27 1.0000000
## 4        5   27 1.0000000
## 5        6   27 1.0000000
## 6        7   27 1.0000000
## 7        8   27 1.0000000
## 8        9   27 1.0000000
## 9       10   27 1.0000000
## 10      11   27 1.0000000
## 11      12   27 1.0000000
## 12      13   27 1.0000000
## 13      14   27 1.0000000
## 14      15   27 1.0000000
## 15      17   27 1.0000000
## 16      18   27 1.0000000
## 17      19   27 1.0000000
## 18      20   27 1.0000000
## 19      21   21 0.7777778
## 20      22   24 0.8888889

The results of this batch analysis are identical to those of the corresponding step-by-step analyses. The step-by-step approach, however, gives you more flexibility to change parameters and to plot figures for the samples of interest.

4.8 Normalization

fCI gives users considerable flexibility in performing differential expression analysis. For example, each replicate's gene expression can be normalized by total-library normalization, trimmed normalization, median normalization, and other schemes. In addition, fCI offers two fundamental options for divergence estimation. The first, and the default, is Hellinger distance estimation, which assumes that the log-ratio expression values follow a Gaussian distribution. The second is cross-entropy, which relaxes this assumption.
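For reference, the block below gives the Hellinger distance between densities p and q, together with its standard closed form when both are univariate Gaussians, as the default option assumes for log-ratios. How fCI estimates the means and variances is not described here, so treat this as background rather than the package's exact computation.

```latex
\begin{aligned}
H^2(P, Q) &= 1 - \int \sqrt{p(x)\,q(x)}\,dx \\
H^2\big(\mathcal{N}(\mu_1,\sigma_1^2),\,\mathcal{N}(\mu_2,\sigma_2^2)\big)
  &= 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}}\,
     \exp\!\left(-\frac{(\mu_1-\mu_2)^2}{4(\sigma_1^2+\sigma_2^2)}\right)
\end{aligned}
```

The distance is 0 when the two Gaussians coincide and approaches 1 as they separate.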

library(fCI)
fci=normalization(fci)
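This section does not spell out the exact schemes behind normalization(). As an illustration only, here is a base-R sketch of two of the schemes named above, total-library and median scaling, on a hypothetical count matrix. Trimmed normalization is omitted, and the scaling target (the mean across samples) is an assumption.

```r
# Toy counts: 5 genes x 3 samples; sample 2 is sample 1 doubled, sample 3 halved
counts <- matrix(c(10, 20, 30, 40, 100,
                   20, 40, 60, 80, 200,
                    5, 10, 15, 20,  50), ncol = 3)
# Total-library normalization: rescale each column to the mean library size
lib.size <- colSums(counts)
total.norm <- sweep(counts, 2, lib.size / mean(lib.size), "/")
# Median normalization: rescale each column so its median equals the mean median
col.med <- apply(counts, 2, median)
median.norm <- sweep(counts, 2, col.med / mean(col.med), "/")
```

Because the toy samples differ only by a constant factor, both schemes make the three columns identical.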

4.9 Identify a specific sample pair for differential expression

if(dim(fci@sample.data.normalized)[1]>100 &&
     dim(fci@sample.data.normalized)[2]>3){
  fci@wt.index=c(1,2)   # empirical null: samples 1 and 2
  fci@df.index=c(1,4)   # control-case: samples 1 and 4
  fci@method.option=1
  fci=populate(fci)
  fci=compute(fci)
  fci=summarize(fci)
}

4.10 Time-course analysis

Besides differential expression analysis of transcriptomic and/or proteomic data, fCI enables joint analysis of multi-dimensional data. By multi-dimensional, we mean data generated for multiple related samples, e.g., a time course, different tissues or cell types, or cases where both transcriptomic and proteomic data are available.
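fCI's own joint-analysis routine is not detailed in this section. As a purely illustrative sketch, the code below compares DEG calls made separately in two dimensions, for example transcript and protein. The DEG index vectors are hypothetical, and the Jaccard index is simply one common overlap measure, not necessarily what fCI reports.

```r
# Hypothetical DEG row indices called separately in two dimensions
deg.rna <- c(1, 2, 3, 5, 6, 7, 21)
deg.protein <- c(2, 3, 6, 8, 21, 40)
shared <- intersect(deg.rna, deg.protein)      # co-regulated in both
only.rna <- setdiff(deg.rna, deg.protein)      # changed at RNA level only
only.protein <- setdiff(deg.protein, deg.rna)  # changed at protein level only
# Jaccard index: shared calls relative to all genes called in either dimension
jaccard <- length(shared) / length(union(deg.rna, deg.protein))
```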

if(nchar(filename)>3){
  fci=new("NPCI")
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
             filename, 
             use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}
## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 3.728e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00017513 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 9.713e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.39e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 4.03e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 4.641e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 5.36e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.307e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 3.249e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 6e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00030072 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 0.00024907 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.459e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 1.401e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.9e-07 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 4.838e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.395e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 0.00010071 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 7.31e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 7.542e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 6.241e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 7.84e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 3.53e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 8.43e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 1.11e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 1.71e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.88e-05
##    targets Freq     ratio
## 1        1   27 1.0000000
## 2        2   27 1.0000000
## 3        3   27 1.0000000
## 4        5   27 1.0000000
## 5        6   27 1.0000000
## 6        7   27 1.0000000
## 7        8   27 1.0000000
## 8        9   27 1.0000000
## 9       10   27 1.0000000
## 10      11   27 1.0000000
## 11      12   27 1.0000000
## 12      13   27 1.0000000
## 13      14   27 1.0000000
## 14      15   27 1.0000000
## 15      17   27 1.0000000
## 16      18   27 1.0000000
## 17      19   27 1.0000000
## 18      20   27 1.0000000
## 19      21   21 0.7777778
## 20      22   24 0.8888889