DEGseq {DEGseq}R Documentation

DEGseq: Identify Differentially Expressed Genes from RNA-seq data

Description

This function is used to identify differentially expressed genes from RNA-seq data. It takes uniquely mapped reads from RNA-seq data for the two samples with a gene annotation as input. So users should map the reads (obtained from sequencing libraries of the samples) to the corresponding genome in advance.

Usage

DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", readLength=32,
       strandInfo=FALSE, refFlat, groupLabel1="group1", groupLabel2="group2",
       method=c("LRT", "CTR", "FET", "MARS", "MATR", "FC"), 
       pValue=1e-3, zScore=4, qValue=1e-3, foldChange=4, thresholdKind=1,
       outputDir="none", normalMethod=c("none", "loess", "median"),
       depthKind=1, replicate1="none", replicate2="none",
       replicateLabel1="replicate1", replicateLabel2="replicate2")

Arguments

mapResultBatch1

vector containing uniquely mapping result files for technical replicates of sample1 (or replicate1 when method="CTR").

mapResultBatch2

vector containing uniquely mapping result files for technical replicates of sample2 (or replicate2 when method="CTR").

fileFormat

file format: "bed" or "eland".
example of "bed" format: chr12 7 38 readID 2 +
example of "eland" format: readID chr12.fa 7 U2 F
Note: The field separator character is TAB. And the files must follow the format as one of the examples.

readLength

the length of the reads (only used if fileFormat="eland").

strandInfo

whether the strand information was retained during the cloning of the cDNAs.

  • "TRUE" : retained,

  • "FALSE": not retained.

refFlat

gene annotation file in UCSC refFlat format.
See http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat.

groupLabel1

label of group1 on the plots.

groupLabel2

label of group2 on the plots.

method

method to identify differentially expressed genes. Possible methods are:

  • "LRT": Likelihood Ratio Test (Marioni et al. 2008),

  • "CTR": Check whether the variation between two Technical Replicates can be explained by the random sampling model (Wang et al. 2009),

  • "FET": Fisher's Exact Test (Joshua et al. 2009),

  • "MARS": MA-plot-based method with Random Sampling model (Wang et al. 2009),

  • "MATR": MA-plot-based method with Technical Replicates (Wang et al. 2009),

  • "FC" : Fold-Change threshold on MA-plot.

pValue

pValue threshold (for the methods: LRT, FET, MARS, MATR).
only used when thresholdKind=1.

zScore

zScore threshold (for the methods: MARS, MATR).
only used when thresholdKind=2.

qValue

qValue threshold (for the methods: LRT, FET, MARS, MATR).
only used when thresholdKind=3 or thresholdKind=4.

thresholdKind

the kind of threshold. Possible kinds are:

  • 1: pValue threshold,

  • 2: zScore threshold,

  • 3: qValue threshold (Benjamini et al. 1995),

  • 4: qValue threshold (Storey et al. 2003),

  • 5: qValue threshold (Storey et al. 2003) and Fold-Change threshold on MA-plot are both required (can be used only when method="MARS").

foldChange

fold change threshold on MA-plot (for the method: FC).

outputDir

the output directory.

normalMethod

the normalization method: "none", "loess", "median" (Yang,Y.H. et al. 2002).
recommend: "none".

depthKind

1: take the total number of reads uniquely mapped to genome as the depth for each replicate,
0: take the total number of reads uniquely mapped to all annotated genes as the depth for each replicate.
We recommend taking depthKind=1, especially when the genes in annotation file are part of all genes.

replicate1

files containing uniquely mapped reads obtained from replicate batch1 (only used when method="MATR").

replicate2

files containing uniquely mapped reads obtained from replicate batch2 (only used when method="MATR").

replicateLabel1

label of replicate batch1 on the plots (only used when method="MATR").

replicateLabel2

label of replicate batch2 on the plots (only used when method="MATR").

References

Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289-300.

Jiang,H. and Wong,W.H. (2009) Statistical inferences for isoform expression in RNA-seq. Bioinformatics, 25, 1026-1032.

Bloom,J.S. et al. (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics, 10, 221.

Marioni,J.C. et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res., 18, 1509-1517.

Storey,J.D. and Tibshirani,R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100, 9440-9445.

Wang,L.K. and et al. (2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data, Bioinformatics 26, 136 - 138.

Yang,Y.H. et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30, e15.

See Also

DEGexp, getGeneExp, readGeneExp, kidneyChr21.bed, liverChr21.bed, refFlatChr21.

Examples

  kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
  liverR1L2  <- system.file("extdata", "liverChr21.bed.txt", package="DEGseq")
  refFlat    <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
  mapResultBatch1 <- c(kidneyR1L1)  ## only use the data from kidneyR1L1 and liverR1L2
  mapResultBatch2 <- c(liverR1L2)
  outputDir <- file.path(tempdir(), "DEGseqExample")
  DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
         outputDir=outputDir, method="LRT")
  cat("outputDir:", outputDir, "\n")

[Package DEGseq version 1.47.0 Index]