4 Examples on how to use `fCI`

fCI can be used as follows:

4.1 Environment Setup:

Load the example data provided by the fCI package under data/. The data contain gene/protein expression values, with columns representing samples (lanes or replicates) and rows representing genes.
In this example, there are 3 control replicates (columns 1 through 3) and 3 case replicates (columns 4 through 6).

4.2 Load the data

fci.data=data.frame(matrix(sample(3:100, 1043*6, replace=TRUE), 1043,6))
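The call above fills a 1043 x 6 data frame with random integers between 3 and 100, so the numbers reported in the rest of this section will differ from run to run. A minimal sketch, assuming you want a reproducible stand-in dataset, fixes the random seed first and checks the shape and value range. The seed value is arbitrary and is not part of fCI.

```r
# Fix the random seed so that the simulated matrix is identical across runs.
# The seed value (42) is an arbitrary choice, not something fCI requires.
set.seed(42)

n.genes   <- 1043  # rows: genes
n.samples <- 6     # columns: 3 control replicates followed by 3 case replicates

fci.data <- data.frame(
  matrix(sample(3:100, n.genes * n.samples, replace = TRUE), n.genes, n.samples)
)

# Basic sanity checks before handing the data to fCI.
print(dim(fci.data))               # number of genes and samples
print(range(as.matrix(fci.data)))  # all values lie between 3 and 100
```

Because every value is at least 3, this simulated matrix contains no zero-expression genes.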

4.3 Finding Differentially Expressed Genes:

To find differentially expressed genes using columns 1 to 3 as controls and columns 4 to 6 as case replicates, execute the following function call:

 suppressWarnings(library(gtools))
 targets=fCI.call.by.index(c(1,2,3), c(4,5,6), fci.data)

## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00102112 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.00308153 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00205424 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.00156378 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00297927 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.00230616 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00069198 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00142507 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.00090973 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00203507 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.00481389 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00352271 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.00295556 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00489715 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.00399087 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00132181 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00231105 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.00156143 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 10 ; #_DEGs= 72 ; Divergence= 0.00080067 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 10 ; #_DEGs= 66 ; Divergence= 0.0017601 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 10 ; #_DEGs= 79 ; Divergence= 0.00132457 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 10 ; #_DEGs= 67 ; Divergence= 0.0013013 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 10 ; #_DEGs= 54 ; Divergence= 0.00199341 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 10 ; #_DEGs= 57 ; Divergence= 0.0016466 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 10 ; #_DEGs= 63 ; Divergence= 0.00034194 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 10 ; #_DEGs= 58 ; Divergence= 0.00062632 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 10 ; #_DEGs= 62 ; Divergence= 0.0003692

 head(targets)

##   targets Freq     ratio
## 1       6   15 0.5555556
## 2       7    6 0.2222222
## 3      21    3 0.1111111
## 4      23    3 0.1111111
## 5      25    3 0.1111111
## 6      32    3 0.1111111

4.4 Result Interpretation

The output lists the genes (by id) that are differentially expressed, together with their frequency and a ratio that tells how often each gene is found to be differentially expressed across all fCI combinations. In general, the higher the ratio, the more likely the gene is differentially expressed.
For example, a ratio of 75% means that the gene under study is identified as a dysregulated target in 3 out of 4 fCI pairwise analyses.
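The ratio column can be reproduced by hand: the log above lists 27 pairwise analyses, and each ratio is the frequency divided by 27. The sketch below copies the frequencies from the head(targets) output and recomputes the ratios. The 0.5 filtering threshold is an arbitrary illustration, not an fCI default.

```r
# Gene ids and frequencies copied from the head(targets) output above.
targets <- data.frame(
  targets = c(6, 7, 21, 23, 25, 32),
  Freq    = c(15, 6, 3, 3, 3, 3)
)

# The log above lists 27 pairwise fCI analyses.
n.analyses <- 27

# ratio = fraction of analyses in which the gene was called differentially expressed
targets$ratio <- targets$Freq / n.analyses
print(targets)

# Keep genes called in at least half of the analyses
# (the 0.5 threshold is an arbitrary choice for illustration).
confident <- targets[targets$ratio >= 0.5, ]
print(confident)
```

With these numbers only gene 6 (15 of 27 analyses) passes the 0.5 threshold.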

As fCI is coded using object-oriented programming, all computations are based on object manipulation. You can perform very versatile analyses by changing the software parameters and altering the options used in computing the dysregulated genes.

4.5 Illustrate the fCI analysis in details

First, create an object, fci, which holds the gene expression levels and the default model parameters.

  fci=new("NPCI")

## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"

Next, you need to provide gene/protein expression values for subsequent analysis. Method 1: assign an existing data frame (of gene expression values) in the R console directly to the object, for example:

fci@sample.data.normalized=fci.data

Method 2: provide the pathname of the file (a tab-delimited file with row names) that contains the expression values:

fci@sample.data.file="c://home//fci_data.txt"

4.6 Assign data to fCI object

Assign the built-in dataset to the object:

if(dim(fci.data)[1]>0){
  fci@sample.data.normalized=fci.data
}

After you assign the data to the object, you should initialize the object fci (which removes genes with zero expression values and initializes the object parameters). Then assign fci’s control replicates (fci@wt.index) two sample column ids (e.g. 1 and 2) for constructing the empirical null distribution, and assign fci’s control-case replicates (fci@df.index) two sample column ids (e.g. 1 and 4) for the control-case distribution.

if(dim(fci.data)[1]>0){
    fci=initialize(fci)
    fci@wt.index=c(1,2)
    fci@df.index=c(1,4)
}
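To make the two distributions concrete, the following base R sketch computes per-gene log-ratios for a control-control pair (columns 1 and 2) and a control-case pair (columns 1 and 4) on simulated data. This only illustrates the idea of an empirical null versus a control-case ratio distribution. The direction of the ratio, the log base and the variable names are assumptions, and the code does not reproduce fCI's internal computation.

```r
# Simulated expression matrix with the same shape as the example data.
set.seed(1)
expr <- data.frame(matrix(sample(3:100, 1043 * 6, replace = TRUE), 1043, 6))

wt.index <- c(1, 2)  # two control replicates: empirical null
df.index <- c(1, 4)  # one control and one case replicate: control-case

# Per-gene log2 ratio of the second column over the first
# (ratio direction and log base are assumptions made for this sketch).
null.log.ratio <- log2(expr[[wt.index[2]]] / expr[[wt.index[1]]])
case.log.ratio <- log2(expr[[df.index[2]]] / expr[[df.index[1]]])

# Compare the centre and spread of the two log-ratio distributions.
print(summary(null.log.ratio))
print(summary(case.log.ratio))
```

On random data such as this the two distributions look alike. With real data, genes affected by the treatment would be expected to widen the control-case distribution relative to the empirical null.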

Next, perform the formal fCI analysis by calling the following functions on the object.

if(dim(fci.data)[1]>0){
  fci=populate(fci)
  fci=compute(fci)
  fci=summarize(fci)
}

After the divergence is computed, fCI generates the final results and saves them to the ‘result’ field, which includes the total number of differentially expressed genes, the optimal fold cutoff, and the divergence value. In addition, you will be able to see the differentially expressed gene ids (indicated by row numbers) and figures comparing the empirical null and control-case distributions.

Since fCI uses object-oriented programming, you can easily change the fields of the object, such as the fold cutoff list or the column ids for the empirical null distribution (and/or the control-case distribution), to evaluate differential expression on the samples of interest. After changes are made, rerun the populate, compute, and summarize calls shown above to compute the new analysis results.

if(dim(fci.data)[1]>0){
  fci@fold.cutoff.list=list(seq(from=1.2, to=5, by=0.2))
  fci@wt.index=c(2,3)
  fci@df.index=c(2,5)
}
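To get a feel for what a fold cutoff list like the one above spans, the sketch below counts, for each candidate cutoff, how many genes in a simulated control-case pair (columns 2 and 5) change by at least that fold in either direction. This is a plain base R illustration with hypothetical variable names. It is not how fCI selects its optimal cutoff, which is based on the divergence described in this section.

```r
# Simulated expression matrix with the same shape as the example data.
set.seed(1)
expr <- data.frame(matrix(sample(3:100, 1043 * 6, replace = TRUE), 1043, 6))

# The same grid of candidate fold cutoffs as assigned to fci@fold.cutoff.list.
cutoffs <- seq(from = 1.2, to = 5, by = 0.2)

# Fold change between control column 2 and case column 5,
# made symmetric so that up- and down-regulation are treated alike.
fold        <- expr[[5]] / expr[[2]]
fold.change <- pmax(fold, 1 / fold)

# Number of genes whose fold change reaches each cutoff.
n.exceeding <- sapply(cutoffs, function(k) sum(fold.change >= k))
print(data.frame(fold.cutoff = cutoffs, n.genes = n.exceeding))
```

The count can only shrink as the cutoff grows, which is why a finer or wider grid changes how many genes are eligible to be called at each step.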

4.7 multi-dimensional data

If the dataset contains multiple control replicates and case replicates, you don’t need to form these combinations and run fCI on each one individually. Instead, you can invoke a top-level function that automatically performs the analysis on the given control replicate column ids and case/experimental column ids. Given a dataset that contains 3 control replicates and 3 case replicates, fCI will form the 3 possible control-control pairs, namely 1-2, 1-3 and 2-3, to construct the empirical null ratio distributions, and the 9 control-case pairs, namely 1-4, 1-5, 1-6, 2-4, 2-5, 2-6, 3-4, 3-5 and 3-6, to construct the control-case ratio distributions. fCI then picks one distribution from the control-control set and one from the control-case set (e.g. 1-2 and 1-5) to form a valid fCI analysis. There are a total of 3*9=27 unique fCI analyses for this example.
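The counting argument above can be checked with a few lines of base R. The sketch enumerates the control-control pairs with combn() and the control-case pairs with expand.grid(), then crosses the two sets. Variable names are hypothetical, and the code only mirrors the enumeration described in the text rather than fCI's own implementation.

```r
control.ids <- 1:3
case.ids    <- 4:6

# All unordered control-control pairs: 1-2, 1-3, 2-3 (one pair per row).
null.pairs <- t(combn(control.ids, 2))

# All control-case pairs: 1-4, 2-4, 3-4, 1-5, ... (one pair per row).
case.pairs <- as.matrix(expand.grid(control = control.ids, case = case.ids))

# Cross every control-control pair with every control-case pair.
# The first factor varies fastest, so case pairs cycle within each null pair.
analyses <- expand.grid(case.row = seq_len(nrow(case.pairs)),
                        null.row = seq_len(nrow(null.pairs)))

labels <- sprintf("Control IDs: [ %d %d ] & Case IDs: [ %d %d ]",
                  null.pairs[analyses$null.row, 1],
                  null.pairs[analyses$null.row, 2],
                  case.pairs[analyses$case.row, 1],
                  case.pairs[analyses$case.row, 2])

print(nrow(analyses))  # 27
print(head(labels, 3))
```

With 4 control replicates the number of control-control pairs would grow to choose(4, 2) = 6, so the total number of analyses rises quickly with the number of replicates.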

fci=new("NPCI")

## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"

filename=""
if(file.exists("../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../inst/extdata/Supp_Dataset_part_2.txt"
}else if(file.exists("../../inst/extdata/Supp_Dataset_part_2.txt")){
  filename="../../inst/extdata/Supp_Dataset_part_2.txt"
}

if(nchar(filename)>3){
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
                       filename,
                       use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}

## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 3.728e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00017513 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 9.713e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.39e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 4.03e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 4.641e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 5.36e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.307e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 3.249e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 6e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00030072 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 0.00024907 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.459e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 1.401e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.9e-07 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 4.838e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.395e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 0.00010071 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 7.31e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 7.542e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 6.241e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 7.84e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 3.53e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 8.43e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 1.11e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 1.71e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.88e-05

##    targets Freq     ratio
## 1        1   27 1.0000000
## 2        2   27 1.0000000
## 3        3   27 1.0000000
## 4        5   27 1.0000000
## 5        6   27 1.0000000
## 6        7   27 1.0000000
## 7        8   27 1.0000000
## 8        9   27 1.0000000
## 9       10   27 1.0000000
## 10      11   27 1.0000000
## 11      12   27 1.0000000
## 12      13   27 1.0000000
## 13      14   27 1.0000000
## 14      15   27 1.0000000
## 15      17   27 1.0000000
## 16      18   27 1.0000000
## 17      19   27 1.0000000
## 18      20   27 1.0000000
## 19      21   21 0.7777778
## 20      22   24 0.8888889

For the same input data, the results of this analysis are identical to those of the single function call shown in Section 4.3. However, here you have more flexibility in changing the parameters and in printing the figures for the samples of interest.

4.8 Normalization

fCI gives users a lot of flexibility in performing differential expression analysis. For example, users can choose to normalize each replicate’s gene expression based on total-library normalization, trimmed normalization, median normalization and so on. In addition, fCI provides two fundamental options for divergence estimation. The first method is the Hellinger distance estimation (the default option), which assumes that the log-ratio expression values follow a Gaussian distribution. The second method is the cross-entropy, which relaxes this condition.

library(fCI)
fci=normalization(fci)
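As background for the Hellinger option mentioned above, the Hellinger distance between two univariate Gaussian distributions has a closed form that depends only on their means and standard deviations. The sketch below implements that textbook formula in base R. The function name is hypothetical, and this is not claimed to be the exact computation fCI performs internally.

```r
# Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2):
#   H^2 = 1 - sqrt(2*sd1*sd2 / (sd1^2 + sd2^2)) *
#             exp(-(mu1 - mu2)^2 / (4 * (sd1^2 + sd2^2)))
# hellinger.gaussian is a hypothetical helper name, not an fCI function.
hellinger.gaussian <- function(mu1, sd1, mu2, sd2) {
  var.sum <- sd1^2 + sd2^2
  bc <- sqrt(2 * sd1 * sd2 / var.sum) * exp(-(mu1 - mu2)^2 / (4 * var.sum))
  sqrt(pmax(0, 1 - bc))  # guard against tiny negative values from rounding
}

# Identical distributions are at distance 0.
print(hellinger.gaussian(0, 1, 0, 1))

# The distance grows as the means move apart or the spreads differ,
# and it never exceeds 1.
print(hellinger.gaussian(0, 1, 0.5, 1))
print(hellinger.gaussian(0, 1, 0.5, 2))
```

In practice the means and standard deviations would be estimated from the log-ratio values of the empirical null and control-case distributions, for example with mean() and sd().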

4.9 identify a specific sample for differential expression

if(dim(fci@sample.data.normalized)[1]>100 && 
     dim(fci@sample.data.normalized)[2]>3){
  fci@wt.index=c(1,2)
  fci@df.index=c(1,4)
  fci@method.option=1
  fci=populate(fci)
  fci=compute(fci)
  fci=summarize(fci)
}

4.10 Time-course analysis

Besides performing differential expression analysis using transcriptomic and/or proteomic data, fCI enables users to perform joint analysis using multi-dimensional data. By multi-dimensional, we refer to data that have been generated for multiple related samples, e.g. time courses, different tissues or cell types, or cases where both transcriptomic and proteomic data are available.

if(nchar(filename)>3){
  fci=new("NPCI")
  # reuse the path resolved in Section 4.7 instead of a hard-coded location
  fci=find.fci.targets(fci, c(1,2,3), c(4,5,6), 
             filename, 
             use.normalization=FALSE)
  result=show.targets(fci)
  head(result, 20)
}

## [1] "You didn't provide either the input file name or set the sample data \n\t\tfrom the object!"
## Control IDs : [ 1 2 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 3.728e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00017513 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 9.713e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.39e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 4.03e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 4.641e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 5.36e-06 
## Control IDs : [ 1 2 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.307e-05 
## Control IDs : [ 1 2 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 3.249e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 6e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 0.00030072 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 0.00024907 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 1.459e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 1.401e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.9e-07 
## Control IDs : [ 1 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 4.838e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 2.395e-05 
## Control IDs : [ 1 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 0.00010071 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 822 ; Divergence= 7.31e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 817 ; Divergence= 7.542e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 4 ];  Fold_Cutoff= 1.3 ; #_DEGs= 820 ; Divergence= 6.241e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 827 ; Divergence= 7.84e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 825 ; Divergence= 3.53e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 5 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 8.43e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 1 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 826 ; Divergence= 1.11e-05 
## Control IDs : [ 2 3 ] & Case IDs : [ 2 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 828 ; Divergence= 1.71e-06 
## Control IDs : [ 2 3 ] & Case IDs : [ 3 6 ];  Fold_Cutoff= 1.3 ; #_DEGs= 823 ; Divergence= 1.88e-05

##    targets Freq     ratio
## 1        1   27 1.0000000
## 2        2   27 1.0000000
## 3        3   27 1.0000000
## 4        5   27 1.0000000
## 5        6   27 1.0000000
## 6        7   27 1.0000000
## 7        8   27 1.0000000
## 8        9   27 1.0000000
## 9       10   27 1.0000000
## 10      11   27 1.0000000
## 11      12   27 1.0000000
## 12      13   27 1.0000000
## 13      14   27 1.0000000
## 14      15   27 1.0000000
## 15      17   27 1.0000000
## 16      18   27 1.0000000
## 17      19   27 1.0000000
## 18      20   27 1.0000000
## 19      21   21 0.7777778
## 20      22   24 0.8888889

fCI

Shaojun Tang

Contents

1 `fCI`

2 Using `fCI`

3 Installing `fCI`

3.1 Dependency Checks

4 Examples on how to use `fCI`

4.1 Environment Setup:

4.2 Load the data

4.3 Finding Differentially Expressed Genes:

4.4 Result Interpretation

4.5 Illustrate the fCI analysis in details

4.6 Assign data to fCI object

4.7 multi-dimensional data

4.8 Normalization

4.9 identify a specific sample for differential expression

4.10 Time-course analysis
