RNAsense is a tool to facilitate interpretation of time-resolved RNA-seq data. Typically it compares gene expression time curves for two different experimental conditions, e.g. wild-type and mutant. The aim is to provide basic functions to easily create plots of stage-specific gene sets like in Figure 1C of https://www.ncbi.nlm.nih.gov/pubmed/20212526.
Following the method of the paper, genes are sorted into different groups in two ways. First, wild-type and mutant condition are compared at each time point to get groups of differentially expressed transcripts that are up- or downregulated in the mutant. This is achieved by the function getFC
whose usage is described below. Second, the expression profiles of one experimental condition (typically wild-type) are tested for significant growth or decay. Similar to the idea in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1920252/, a one-step and a zero-step (in fact the mean) function are fitted to the time-resolved data and compared by means of likelihood-ratio test. Thus, genes are sorted into non-overlapping groups by the time point of switch up or down. This step is achieved by the function getStep
.
Finally, the function plotSSGS
analyzes correlations between the outputs of getFC
and getStep
by means of Fisher’s exact test and plots the result in form of a heat map, with time profiles and differential expression groups at y- and x-axis, respectively.
First of all, data has to be provided in the right format. We made use of the S4-class SummarizedExperiment
that gives the possibility to provide additional information for example on covariates for the measurements that themselves are given as a numeric matrix. Here is an example how your data should look like:
## class: SummarizedExperiment
## dim: 15775 40
## metadata(0):
## assays(1): mydata
## rownames: NULL
## rowData names(2): name genename
## colnames(40): WT_2.5_B1 WT_3_B1 ... MZsox_5.5_B1 MZsox_6_B1
## colData names(3): condition time replicate
SummarizedExperiment
provides a constructor to easily bring your data into this format. When constructing your object, make sure to provide genenames in the rowData
argument and information on condition, time point and, if available, replicate identifier, in the colData
argument.
Next, the conditions that should be analyzed are specified and a threshold is provided which is used to exclude genes with expression levels below this threshold for all conditions. This can be useful, if expression levels are in the range of your detection limit. Since RNAseq
makes use of the parallel
package, you may specify a number of cores in order to speed up your computation time.
analyzeConditions <- c("WT", "MZsox")
thCount <- 100
nrcores <- 1
library(SummarizedExperiment)
#if(Sys.info()[[1]]=="Windows"){nrcores <- 1} # use parallelization only on Linux and Mac
mydata <- mydata[seq(1,nrow(mydata), by=4),]
vec2Keep <- which(vapply(1:dim(mydata)[1],function(i)
!Reduce("&",assays(mydata)[[1]][i,]<thCount), c(TRUE)))
mydata <- mydata[vec2Keep,] # threshold is applied
times <- unique(sort(as.numeric(colData(mydata)$time))) # get measurement times from input data
After data preparation, fold change detection can be performed. The function getFC
internally calles functions from the NBPSeq
package to perform fold change analysis for each gene and at each time point. The result is saved in a data.frame
with corresponding p-values.
resultFC <- getFC(dataset = mydata,
myanalyzeConditions = analyzeConditions,
cores = nrcores,
mytimes = times)
head(resultFC)
## name logFoldChange pValue time FCdetect
## 1 enc3 1.0151565 8.171045e-03 2.5 WT>MZsox
## 2 crtap -2.5849625 1.273691e-01 2.5 WT=MZsox
## 3 rpl3 -0.1478379 4.987150e-01 2.5 WT=MZsox
## 4 zic2b -0.5252563 3.605132e-01 2.5 WT=MZsox
## 5 ccn1 -2.5091311 1.099316e-08 2.5 WT<MZsox
## 6 vipas39 0.9668331 1.486301e-02 2.5 WT=MZsox
Note that each gene appears in the data frame as often as the number of time points. The result of the fold change analysis can be visualized as a vulcano plot:
library(ggplot2)
ggplot(subset(resultFC, FCdetect!="none"),
aes(x=logFoldChange, y=-log10(pValue), color=FCdetect)) +
xlab("log2(Fold Change)") + geom_point(shape=20)
Next, the gene expression profiles are analyzed for switches. A switch appears if the profile shows a statistically significant up- or downregulation at a specific time point. If up- or downregulation is detected at multiple time points, the time point with the best likelihood value (for the one-step model) is chosen. The result is saved in a data.frame
with information on whether a switch has been detected, at which time point and with which p-value based on the likelihood ratio test.
resultSwitch <- getSwitch(dataset = mydata,
experimentStepDetection = "WT",
cores = nrcores,
mytimes = times)
head(resultSwitch)
## name genename timepoint switch pvalueSwitch experiment
## 7 NM_001001402 enc3 NA none 0.40773375 WT
## 71 NM_001001406 crtap NA none 0.11729573 WT
## 6 NM_001001590 rpl3 NA none 0.10784956 WT
## 5 NM_001001820 zic2b NA none 0.07115325 WT
## 2 NM_001001826 ccn1 NA none 0.16854030 WT
## 4 NM_001001836 vipas39 4 down 0.04349482 WT
After fold change and switch analysis have been performed, results shall be collected in one and the same data.frame using the combineResults
function. This function basically prepares the results to be handed over to plotSSGS
.
## name genename timepoint switch pvalueSwitch experiment
## 7 NM_001001402 enc3 NA none 0.40773375 WT
## 71 NM_001001406 crtap NA none 0.11729573 WT
## 6 NM_001001590 rpl3 NA none 0.10784956 WT
## 5 NM_001001820 zic2b NA none 0.07115325 WT
## 2 NM_001001826 ccn1 NA none 0.16854030 WT
## 4 NM_001001836 vipas39 4 down 0.04349482 WT
## FCdown FCup
## 7 2.5hpf 3.0hpf 4.0hpf 6.0hpf
## 71 5.0hpf 5.5hpf 6.0hpf
## 6
## 5 3.0hpf 3.5hpf 4.0hpf
## 2 2.5hpf 3.0hpf 3.5hpf 5.0hpf 5.5hpf 6.0hpf
## 4 5.0hpf 5.5hpf 6.0hpf
Finally, the plotSSGS
function performs Fisher’s exact test for each combination of fold change time point and switch time point. Results are plotted as a heat map highlighting combinations with a high significance.
In order to document the result of the analysis, the function outputGeneTables
provides a possibility to automatically output switch and fold change information into table files (.txt).
## [1] "Results written to /tmp/RtmpqJYLbt/Rbuild66cf7a7a6283/RNAsense/vignettes"
The function outputGeneTables
generates five .txt files. Two of them (geneNamelist) contain gene lists with gene name for genes that switch up and down respectively. The other two (genelist) contain exactly the same output but with gene identifiers instead of gene names depending on what you prefer for further analysis. Each column corresponds to a combination of switch time point, fold change direction and time point of fold change. All genes for which fold change was detected at the indicated time point and switch was detected at the indicated time point are listed in the corresponding column. Note that a single gene may appear multiple times. The fifth .txt file (switchList) contains information on detected switches in a different format. The output consists of table with six columns with each row corresponding to one gene. Detected switches are indicated by 1, -1 and 0 for switch up, switch down and no switch, respectively. If a switch was detected, the column timepoint indicated the corresponding time point of switch detection.
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.10-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.10-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggplot2_3.2.1 SummarizedExperiment_1.16.0
## [3] DelayedArray_0.12.0 BiocParallel_1.20.0
## [5] matrixStats_0.55.0 Biobase_2.46.0
## [7] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
## [9] IRanges_2.20.0 S4Vectors_0.24.0
## [11] BiocGenerics_0.32.0 RNAsense_1.0.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 plyr_1.8.4 compiler_3.6.1
## [4] pillar_1.4.2 XVector_0.26.0 bitops_1.0-6
## [7] tools_3.6.1 zlibbioc_1.32.0 digest_0.6.22
## [10] evaluate_0.14 tibble_2.1.3 gtable_0.3.0
## [13] lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.1
## [16] Matrix_1.2-17 yaml_2.2.0 xfun_0.10
## [19] NBPSeq_0.3.0 GenomeInfoDbData_1.2.2 withr_2.1.2
## [22] dplyr_0.8.3 stringr_1.4.0 knitr_1.25
## [25] tidyselect_0.2.5 grid_3.6.1 qvalue_2.18.0
## [28] glue_1.3.1 R6_2.4.0 rmarkdown_1.16
## [31] reshape2_1.4.3 purrr_0.3.3 magrittr_1.5
## [34] splines_3.6.1 scales_1.0.0 htmltools_0.4.0
## [37] assertthat_0.2.1 colorspace_1.4-1 labeling_0.3
## [40] stringi_1.4.3 RCurl_1.95-4.12 lazyeval_0.2.2
## [43] munsell_0.5.0 crayon_1.3.4