0. Installation

Please note that ‘rifi’ is only available for unix based systems. To install this package, start R (>= version “4.2”) and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("rifi")

I. Introduction

The stability or halflife of bacterial transcripts is often estimated using Rifampicin timeseries data. Rifampicin has the special feature that it prevents the initiation of transcprition, but RNA polymerases which are already elongating are unaffected (Campbell et al. 2001). This has the implication that the RNA concentrations of positions downstream of the transcriptional start site appear unchanged until the last polymerase has passed this point. The result is a delayed exponential decay (Chen et al. 2015), which can be fitted by the following model:


\(c(t,n) = \begin{cases} \frac{\alpha}{\lambda} & \quad \text{if } t < \frac{n}{v}\\ \frac{\alpha}{\lambda} \times e^{-\lambda t} & \quad \text{if } t \geq \frac{n}{v} \end{cases}\)

The model (Chen et al. 2015) consists of two phases; the firts phase describes the delay where the transcript concentration is in its steady state defined by the ratio of the synthesis rate \(\alpha\) and the decay constant \(\lambda\) (\(steadystate = \frac{\alpha}{\lambda}\)). The delay is dependent on the distance from the transcriptional start site \(n\) and the transcription velocity \(v\). If the time after the Rifampicin additon is greater than the delay (\(delay = \frac{n}{v}\)) the exponential decay phase starts.

In addition to the standard model, we are using a second model which describes the behaviour at positions were the concentration increases after Rifampicin addition (Figure 1, right panel). This phenomenon can be explained by Rifampicin relievable transcription termination, e.g. through the transcriptional interference (TI) collision mechanism (Shearwin, Callen, and Egan 2005) or termination by short-lived factors such as sRNAs (Wang et al. 2015). In the following we will call this model the ‘TI model’ which consists of three phases:


\(c(t,n) = \begin{cases} \frac{\alpha - \alpha \times \beta}{\lambda} & \quad \text{if } t < \frac{n - n_{term}}{v}\\ \frac{\alpha}{\lambda} - \frac{\alpha \times \beta}{\lambda} \times e ^{-\lambda (t -\frac{n - n_{term}}{v})} & \quad \text{if } \frac{n - n_{term}}{v} < t < \frac{n}{v}\\ (\frac{\alpha}{\lambda} - \frac{\alpha \times \beta}{\lambda} \times e ^{-\lambda (t -\frac{n_{term}}{v})}) \times e^{-\lambda (t-\frac{n}{v})} & \quad \text{if } \frac{n}{v} \leq t \end{cases}\)

The first phase describes again the steady state concentration at a given transcript position, but here the synthesis rate \(\alpha\) is reduced by the TI-termination-factor \(\beta\). We assume a short lived factor responsible for the termination whose synthesis is stopped after rifampicin addition. Thus after the relieve of termination all polymerases that start at the transcriptional start site can reach positions downstream of the former termination site (\(n_{term}\)), the time polymerases need from the position of termination to the position \(n\) is delay for the increase (\(delay_{increase}= \frac{n - n_{term}}{v}\)). After the last polymerase has passed the respective position, the exponential decay phase starts.

‘rifi’ is a tool to do a stability analysis on high-throughput rifampicin data. RNA sequencing and microarray data derived from rifampicin treated bacteria with sufficiently high time resolution can reveal many insights into the mechanics of transcription, RNAP velocity and RNA stability. ‘rifi’ is a tool for the holistic identification of these transcription processes. The core part of the data analysis by rifi is the utilization of one of the two non linear regression models applied on the time series data of each probe (or bin), giving the probe/bin specific delay, decay constant \(\lambda\) and half-life (\(t_\frac{1}{2} = \frac{\ln(2)}{\lambda}\)) (Figure 1, left panel).


**Fit models**. Fits from both models. Left: the two-phase standard fit. Right the TI model fits the increase in intensity. Black dotes represent the average intensity for each timepoint, colored circles indicate the respective replicate.

Figure 1: Fit models
Fits from both models. Left: the two-phase standard fit. Right the TI model fits the increase in intensity. Black dotes represent the average intensity for each timepoint, colored circles indicate the respective replicate.


After the fit of the individual probes/bins, common worklfows usually combine the individual half-life values based on the given genome annotation to get an average for the gene based stability. This procedure can not deal with differences within a given gene, e.g. due to processing sites. ‘rifi’ uses an annotation agnostic approach to get an unbiased estimate of individual transcripts as they actually appear in vivo. probes/bins with equal properties in the extracted values delay, half-life, TI_termination_factor and the given intensity values are combined into segments by dynamic programming (called fragmentation in ‘rifi’), independent of an existing genome annotation (Figure 2). The fragmentation is performed hierarchically.
Initially segments of bins are grouped by regions without significant sequencing depth into position_segments. Those are grouped into delay_fragments by common velocity. Subsequently, each delay-fragment is grouped by similar half-life into half_life_fragments, on which the bins finally are grouped into intensity_fragments by similar intensity. From the fragmentation, many events can be extracted; iTSS (internal transcription start sites), transcription pausing_sites, velocity_changes,processing_sites, partial terminations, as well as instances of Rifampicin relievable transcription termination, e.g. by TI (transcription interference). All data are integrated to give an estimate of continous transcriptional units, i.e. operons. Comprehensive output tables and visualizations of the full genome result and the individual fits for all probes/bins are produced.



**Fragmentation hirarchy**. The hierarchy of the workflow is separated into five steps. At first segments are formed based on the distance to the next data point. The threshold is 200 nucleotides. Next, segments are parted into delay fragments. The delay fragments are parted by the half-life and likewise the half-life fragments are parted by intensity. Groups of TUs are formed based on the distance between the starts and ends of the delay fragments. Finally, within TUs, TI fragments are formed.

Figure 2: Fragmentation hirarchy
The hierarchy of the workflow is separated into five steps. At first segments are formed based on the distance to the next data point. The threshold is 200 nucleotides. Next, segments are parted into delay fragments. The delay fragments are parted by the half-life and likewise the half-life fragments are parted by intensity. Groups of TUs are formed based on the distance between the starts and ends of the delay fragments. Finally, within TUs, TI fragments are formed.


1. Quickstart

If you have your data prepared as described in The Input Data Frame you can use the rifi_wrapper to run ’rifi" with default options. rifi_wrapper conveniently wraps all functions included in rifi. That allows the user to run the whole workflow with one function. If the data contain a background component, e.g. in case of microarray data, take to define a meaningful background intensity.

The functions used are:

check_input
rifi_preprocess
rifi_fit
rifi_penalties
rifi_fragmentation
rifi_stats
rifi_summary
rifi_visualization

For rifi_wrapper you only need to provide the path to a .gff file of the respective genome and the input SummarizedExperiment object. The genome annotation is needed for the visualization and to map fragmented segments to annotated genes for an easier interpretation.

Path = gzfile(system.file("extdata", "gff_e_coli.gff3.gz", package = "rifi"))
wrapper_minimal <-
  rifi_wrapper(
    inp = example_input_e_coli,
    cores = 2,
    gff = Path,
    bg = 0,
    restr = 0.01
  )
}


The wrapper saves the output of each sub-function in a list. Thus each intermediate result can be re-run with custom settings. the object ‘wrapper_minimal’ contains only minimal artificial data to reduce the runtime of the test run.

data(wrapper_minimal)

# list of 6 SummarizedExperiment objects
length(wrapper_minimal)
## [1] 6
#the first intermediate result
wrapper_minimal[[1]]
## class: RangedSummarizedExperiment 
## dim: 4 33 
## metadata(2): replicates timepoints
## assays(1): ''
## rownames: NULL
## rowData names(7): position ID ... flag position_segment
## colnames: NULL
## colData names(2): timepoint replicate


2. The output

A small example output can be loaded with data(summary_minimal). The final output is a SummarizedExperiment object.

All main results are stored as metadata in the SummarizedExperiment object. The first five entries are not of immediate importance.

data(summary_minimal)

# the main results
names(metadata(summary_minimal))   
##  [1] "timepoints"                        "replicates"                       
##  [3] "logbook"                           "logbook_details"                  
##  [5] "annot"                             "dataframe_summary_1"              
##  [7] "dataframe_summary_2"               "dataframe_summary_events"         
##  [9] "dataframe_summary_events_HL_int"   "dataframe_summary_events_ps_itss" 
## [11] "dataframe_summary_events_velocity" "dataframe_summary_TI"

1. bin/probe based results

Entry 6, dataframe_summary_1, contains all fit results for each individual bin/probe and a mapping to the genome annotation. They can be exported to an .csv file using write.csv().

# bin/probe probe based results
head(metadata(summary_minimal)$dataframe_summary_1)
##   ID feature_type gene       locus_tag position strand segment   TU
## 1  1         <NA> <NA>            <NA>       50      +     S_1 TU_1
## 2  2         <NA> <NA>            <NA>      100      +     S_1 TU_1
## 3  3         <NA> <NA>            <NA>      150      +     S_1 TU_1
## 4  4          CDS thrL BW25113_RS00005      200      +     S_1 TU_1
## 5  5          CDS thrL BW25113_RS00005      250      +     S_1 TU_1
## 6  6         <NA> <NA>            <NA>      300      +     S_1 TU_1
##   delay_fragment delay HL_fragment half_life intensity_fragment intensity  flag
## 1            D_1  0.08        Dc_1      1.57                I_1    138.55 _ABG_
## 2            D_1  0.17        Dc_1      1.54                I_1    146.89 _ABG_
## 3            D_1  0.50        Dc_1      1.52                I_1    152.35 _ABG_
## 4            D_1  0.76        Dc_1      1.54                I_1    163.39 _ABG_
## 5            D_1  1.00        Dc_1      1.50                I_1    149.01 _ABG_
## 6            D_1  1.21        Dc_1      1.58                I_1    160.37 _ABG_
##   TI_termination_factor
## 1                    NA
## 2                    NA
## 3                    NA
## 4                    NA
## 5                    NA
## 6                    NA
#export to csv
write.csv(metadata(summary_minimal)$dataframe_summary_1, file="filename.csv")

2. fragment based results

Entry 7, dataframe_summary_2, contains all fit results for each intensity_fragment. Due to the hierachic fragmentation strategy [Figure 2] several intensity_fragment (I_x) can belong to one decay_fragment (Dc_x), one delay_fragment (D_x) and one transcriptional unit (TU_x). The fragment IDs can be used to locate the fragment in the output PDF file. The fragments are mapped to the genome annotation and the mean halflife given in minutes and the mean intensity is given with their standard deviations. Also the estimated velocity in nt/min is given. The table can be exported as an .csv file using write.csv().

# all estimated fragments
metadata(summary_minimal)$dataframe_summary_2
##   feature_type    gene          locus_tag first_position_frg last_position_frg
## 1       NA|CDS NA|thrL NA|BW25113_RS00005                 50               300
## 2          CDS    thrA    BW25113_RS00010                350               600
## 3          CDS    thrA    BW25113_RS00010                650               900
## 4           NA      NA                 NA               1151              1051
## 5           NA      NA                 NA               1001               901
##   strand   TU segment delay_fragment HL_fragment half_life HL_SD HL_SE
## 1      + TU_1     S_1            D_1        Dc_1      1.54  0.03  0.01
## 2      + TU_1     S_1            D_1        Dc_2      3.04  0.04  0.02
## 3      + TU_1     S_1            D_2        Dc_3      3.00  0.05  0.02
## 4      - TU_2     S_2            D_3        Dc_4      2.00  0.02  0.01
## 5      - TU_2     S_2            D_3        Dc_4      2.00  0.02  0.01
##   intensity_fragment intensity intensity_SD intensity_SE velocity
## 1                I_1    151.53         9.12         3.72   143.86
## 2                I_2    300.24         5.75         2.35   143.86
## 3                I_3    446.67         7.36         3.00   417.25
## 4                I_4    205.03         1.78         1.03   201.10
## 5                I_5    102.47         3.88         2.24   201.10
#export to csv
write.csv(metadata(summary_minimal)$dataframe_summary_2, file="filename.csv")

3. Transcription events

Entry 8, dataframe_summary_events, contains all estimated transcriptional events. Events always appear between two transcript fragments.

# all estimated events
metadata(summary_minimal)$dataframe_summary_events
##          event  p_value p_adjusted FC_HL FC_intensity FC_HL_adapted
## 1       iTSS_I       NA         NA    NA           NA            NA
## 2  Termination       NA         NA    NA           NA          0.99
## 3      iTSS_II       NA         NA    NA           NA          1.97
## 4      iTSS_II       NA         NA    NA           NA          0.99
## 5    Int_event       NA         NA    NA         0.98          1.97
## 6    Int_event       NA         NA    NA         0.57          0.99
## 7    Int_event       NA         NA    NA        -1.00          0.99
## 8     HL_event 2.68e-10   4.02e-10  0.98           NA            NA
## 9     HL_event 1.07e-11   3.21e-11 -0.02           NA            NA
## 10    velocity 1.91e-04   1.91e-04    NA           NA            NA
##    FC_HL_FC_intensity event_position velocity_ratio feature_type gene
## 1                  NA            625             NA          CDS thrA
## 2                0.50           1026             NA                  
## 3                1.00            325             NA                  
## 4                1.51            625             NA          CDS thrA
## 5                  NA            325             NA                  
## 6                  NA            625             NA          CDS thrA
## 7                  NA           1026             NA                  
## 8                  NA            325             NA                  
## 9                  NA            625             NA          CDS thrA
## 10                 NA            625            2.9          CDS thrA
##          locus_tag strand   TU             segment_1             segment_2
## 1  BW25113_RS00010      + TU_1          S_1|TU_1|D_1          S_1|TU_1|D_2
## 2                       - TU_2 S_2|TU_2|D_3|Dc_4|I_4 S_2|TU_2|D_3|Dc_4|I_5
## 3                       + TU_1 S_1|TU_1|D_1|Dc_1|I_1 S_1|TU_1|D_1|Dc_2|I_2
## 4  BW25113_RS00010      + TU_1 S_1|TU_1|D_1|Dc_2|I_2 S_1|TU_1|D_1|Dc_3|I_3
## 5                       + TU_1 S_1|TU_1|D_1|Dc_1|I_1 S_1|TU_1|D_1|Dc_1|I_2
## 6  BW25113_RS00010      + TU_1 S_1|TU_1|D_1|Dc_2|I_2 S_1|TU_1|D_1|Dc_2|I_3
## 7                       - TU_2 S_2|TU_2|D_3|Dc_4|I_4 S_2|TU_2|D_3|Dc_4|I_5
## 8                       + TU_1     S_1|TU_1|D_1|Dc_1     S_1|TU_1|D_1|Dc_2
## 9  BW25113_RS00010      + TU_1     S_1|TU_1|D_1|Dc_2     S_1|TU_1|D_1|Dc_3
## 10 BW25113_RS00010      + TU_1          S_1|TU_1|D_1          S_1|TU_1|D_2
##    event_duration gap_fragments features
## 1           -1.27            50        2
## 2              NA            50        3
## 3              NA            50        4
## 4              NA            50        4
## 5              NA            50        2
## 6              NA            50        2
## 7              NA            50        2
## 8              NA            50        2
## 9              NA            50        2
## 10             NA            50        2
#export to csv
write.csv(metadata(summary_minimal)$dataframe_summary_events, file="filename.csv")


Changes in the linear increase of the delay indicate a potential pausing site if there is a sudden delay increase (Figure 3A), a potential internal starting site iTSS_I if there is a sudden decrease in the delay (Figure 3B) and a velocity change of the RNA polymerase if there is a slope change (Figure 3C). The events are statistically evaluated by Ancova (apply_ancova) or a t-test (apply_Ttest_delay).
A fragment border between halflife segments (HL_event) which belong to the same transcriptional unit might indicate a processing site (Figure 3D) and a fragment border between intensity fragments (Int_event) (Figure 3E) can indicate a processing site (Figure 3F), a new transcriptional start site (iTSS_II) (Figure 3G) or an partial termination (Figure 3H), depending on the respective intensity foldchanges and the halflife foldchanges.
Each event is described by is type, its p-Value and adjusted p-Value, the foldchange or event duration, the estimated event position, a mapping to existing annotations and the IDs of the two bordering fragments.