TSRchitect Introduction

R. Taylor Raborn

2020-10-28

R. Taylor Raborn and Volker P. Brendel

Department of Biology, Indiana University

July 10, 2018; updated April 7, 2019; Sept. 17, 2019.

TSRchitect is an R package for analyzing diverse types of high-throughput transcription start site (TSS) profiling datasets. TSRchitect can handle TSS profiling experiments that contain either single-end or paired-end sequence reads, such as CAGE, RAMPAGE, PEAT, STRIPE-seq and others. TSRchitect is an open-source bioinformatics package that is intended to support large-scale, reproducible analysis of TSS profiling data in a broad array of eukaryotic model systems.

Before we can begin, we must first load TSRchitect in our working environment.
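Assuming TSRchitect has already been installed (it is distributed through Bioconductor), attaching it to the current session is a single call:

```r
# Attach TSRchitect so its functions (loadTSSobj, determineTSR, ...) are available
library(TSRchitect)
```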

The TSRchitect User’s Guide is available; it goes through several well-documented examples of TSRchitect using different datasets.

To open the TSRchitect User’s Guide on your machine, enter the following:
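One generic way to do this, shown here as a sketch that relies only on the base R function browseVignettes() rather than on any TSRchitect-specific helper, is to list the vignettes installed with the package and pick the User's Guide from the page that opens:

```r
# Opens a browser page listing every vignette installed with TSRchitect;
# the User's Guide can be selected from that list
browseVignettes(package = "TSRchitect")
```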

Getting started

Now that you have loaded TSRchitect, we can proceed with a small example.

First, we will create a tssObject using loadTSSobj and import our sample .bam files (which are found in extdata/bamFiles). In doing this, we provide sample names using the argument sampleNames and identify which samples are replicates of one another using the argument replicateIDs. We do this in the following manner:

extdata.dir <- system.file("extdata/bamFiles", package="TSRchitect") 

tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
inputDir=extdata.dir, n.cores=1, isPairedBAM=TRUE,
sampleNames=c("sample1-rep1", "sample1-rep2","sample2-rep1",
"sample2-rep2"), replicateIDs=c(1,1,2,2)) #datasets 1-2 and 3-4 are replicates
## ... loadTSSobj ...
## 
## Importing paired-end reads (first reads) ...
## 
## Beginning import of 4 bam files ...
## 
## Importing paired-end reads (last reads) ...
## Done. Alignment data from 4 bam files have been attached to the tssObject.
## -----------------------------------------------------
## 
## Names and replicate IDs were successfully added to the tssObject.
## -----------------------------------------------------
##  Done.

Please note that TSRchitect also accepts input from BED-formatted files. You could replace the lines above with the following equivalents:

extdata.dir <- system.file("extdata/bedFiles", package="TSRchitect") 

tssObjectExample <- loadTSSobj(experimentTitle="Vignette Example",
inputDir=extdata.dir, n.cores=1, isPairedBED=TRUE,
sampleNames=c("sample1-rep1", "sample1-rep2","sample2-rep1",
"sample2-rep2"), replicateIDs=c(1,1,2,2)) #datasets 1-2 and 3-4 are replicates

For convenience, loadTSSobj() may also be called with the argument sampleSheet="ssfile", where ssfile is either a tab-delimited text file or an Excel spreadsheet (with extension .xls or .xlsx). In either case, the first row must contain the header SAMPLE ReplicateID FILE, with the same information described above in the respective columns. If necessary, you can provide input to loadTSSobj() consisting of both .bam and .bed files, but be aware that the .bam files are always loaded into the tssObject before the .bed files, and the internal numbering of the TSS sets follows that order.
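As a rough sketch of what such a tab-delimited sample sheet could look like, the following base R code writes one to a temporary file. The file paths in the FILE column are hypothetical placeholders, not files shipped with the package, and the final loadTSSobj() call is left as a comment because its remaining arguments depend on your data.

```r
# Hypothetical file locations; replace with the paths to your own BAM/BED files
sheet <- data.frame(
  SAMPLE      = c("sample1-rep1", "sample1-rep2", "sample2-rep1", "sample2-rep2"),
  ReplicateID = c(1, 1, 2, 2),
  FILE        = file.path("path/to", c("s1r1.bam", "s1r2.bam", "s2r1.bam", "s2r2.bam")),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file whose first row is the header SAMPLE ReplicateID FILE
ssfile <- file.path(tempdir(), "sampleSheet.txt")
write.table(sheet, file = ssfile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read back the first line to check the header
header <- readLines(ssfile, n = 1)
print(header)

# The sheet would then be supplied through the sampleSheet argument
# (other arguments omitted here; not run):
# loadTSSobj(..., sampleSheet = ssfile)
```

Building the sheet as a data frame first keeps the three required columns in one place and makes it easy to check sample names against replicate IDs before anything is loaded.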

If we wish to see our new tssObject, we simply type its name on the console and hit return, as follows:

tssObjectExample

Now that the .bam files have been imported, we need to retrieve TSSs from the BAM records (using inputToTSS) and then count the tags found at each TSS position (using processTSS). We opt not to run the latter in parallel, which we specify by setting n.cores=1.

tssObjectExample <- inputToTSS(experimentName=tssObjectExample)
## ... inputToTSS ...
## 
## Beginning input to TSS data conversion ...
## Retrieving data from bam file #1...
## Retrieving data from bam file #2...
## Retrieving data from bam file #3...
## Retrieving data from bam file #4...
## Done. TSS data from 4 separate bam files have been successfully
## added to the tssObject.
## ----------------------------------------------------
##  Done.
tssObjectExample <- processTSS(experimentName=tssObjectExample, n.cores=1,
tssSet="all", writeTable=FALSE)
## ... processTSS ...
## 
## ... the TSS expression matrix for dataset 1 has been successfully
## added to the tssObject.
## -----------------------------------------------------
## 
## ... the TSS expression matrix for dataset 2 has been successfully
## added to the tssObject.
## -----------------------------------------------------
## 
## ... the TSS expression matrix for dataset 3 has been successfully
## added to the tssObject.
## -----------------------------------------------------
## 
## ... the TSS expression matrix for dataset 4 has been successfully
## added to the tssObject.
## -----------------------------------------------------
## 
## -----------------------------------------------------
##  Done.

Now that this is complete, we can proceed with identifying TSRs from TSSs. We do this for all 4 of the datasets we imported at once by specifying tssSet="all".

tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
tssSetType="replicates", tssSet="all", tagCountThreshold=25, clustDist=20,
writeTable=FALSE)
## ... determineTSR ...
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## -----------------------------------------------------
##  Done.
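To give some intuition for the tagCountThreshold and clustDist arguments used above, here is a toy sketch of distance-based clustering in base R. It is not TSRchitect's implementation: the positions and tag counts are invented, and the logic simply assumes that TSSs below the tag threshold are ignored and that neighboring TSSs no more than clustDist bases apart belong to the same region.

```r
# Invented TSS positions (one strand, one chromosome) with their tag counts
tss <- data.frame(pos   = c(100, 105, 112, 160, 300),
                  nTAGs = c(40, 60, 30, 50, 5))
tagCountThreshold <- 25
clustDist <- 20

# 1. Discard TSSs supported by fewer tags than the threshold, then sort
kept <- tss[tss$nTAGs >= tagCountThreshold, ]
kept <- kept[order(kept$pos), ]

# 2. Start a new cluster whenever the gap to the previous TSS exceeds clustDist
cluster <- cumsum(c(TRUE, diff(kept$pos) > clustDist))

# 3. Summarize each cluster as a region with its TSS and tag totals
tsrs <- do.call(rbind, lapply(split(kept, cluster), function(x) {
  data.frame(start = min(x$pos), end = max(x$pos),
             nTSSs = nrow(x), nTAGs = sum(x$nTAGs))
}))
print(tsrs)
```

With these toy values, the TSS at position 300 is dropped for having too few tags, the first three positions fall into one region, and position 160 forms a region of its own.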

Next we merge our replicate data (according to the replicate IDs we provided in loadTSSobj) and identify TSRs on these merged samples. We then use addTagCountsToTSR to count the tags from each of our 4 datasets that fall within the TSRs of the selected merged set.

tssObjectExample <- mergeSampleData(experimentName=tssObjectExample, n.cores=1,
tagCountThreshold=1)
## ... mergeSampleData ...
## 
## ... the TSS expression data have been merged
## and added to the tssObject object.
## ------------------------------------------------------
##  Done.
tssObjectExample <- determineTSR(experimentName=tssObjectExample, n.cores=1,
tssSetType="merged", tssSet="all", tagCountThreshold=25, clustDist=20,
writeTable=FALSE)
## ... determineTSR ...
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## ... detTSR ...
## ---------------------------------------------------------
##  Done.
## -----------------------------------------------------
##  Done.
tssObjectExample <- addTagCountsToTSR(experimentName=tssObjectExample,
tsrSetType="merged", tsrSet=1, tagCountThreshold=25, writeTable=FALSE)
## ... addTagCountsToTSR ...
## 
## The merged TSR set for TSS dataset 1 will be written to file TSRsetMerged-1.txt
## in your working directory.
## ---------------------------------------------------------
##  Done.

Now that we have identified TSRs using determineTSR from all replicate and merged datasets, we can select individual results directly. To do this, we use getTSRdata, one of the tssObject accessor methods.

sample_1_1_tsrs <- getTSRdata(experimentName=tssObjectExample,
slotType="replicates", slot=1)

print(sample_1_1_tsrs)
##     seq    start      end strand nTSSs nTAGs tsrPeak tsrWdth tsrTrq tsrSI
## 2 chr22 11974144 11974144      -     1    44    1.00       1   0.00  2.00
## 3 chr22 11974187 11974199      -     3   141    0.47      13   1.32  0.48
##   tsrMSI
## 2   1.00
## 3   0.04

We see that two distinct TSRs were identified from this small example dataset.
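Assuming the returned object behaves like an ordinary data frame, as the printed output suggests, it can be filtered and inspected with base R. The sketch below rebuilds a stand-in from a subset of the columns printed above so that it runs on its own; the 100-tag cutoff is chosen purely for illustration.

```r
# Stand-in for the output printed above; with TSRchitect loaded you would
# use the result of getTSRdata() directly instead
sample_1_1_tsrs <- data.frame(
  seq     = c("chr22", "chr22"),
  start   = c(11974144, 11974187),
  end     = c(11974144, 11974199),
  strand  = c("-", "-"),
  nTSSs   = c(1, 3),
  nTAGs   = c(44, 141),
  tsrWdth = c(1, 13),
  row.names = c("2", "3")
)

# Keep only TSRs supported by at least 100 tags (illustrative cutoff)
strong <- sample_1_1_tsrs[sample_1_1_tsrs$nTAGs >= 100, ]
print(strong)

# In this output, tsrWdth matches end - start + 1
widths <- sample_1_1_tsrs$end - sample_1_1_tsrs$start + 1
print(widths)
```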

Finally, we import an annotation file (containing GENCODE-annotated transcripts) and then associate this annotation with our small set of identified TSRs.

gff3data.dir <- system.file("extdata", package="TSRchitect") 
tssObjectExample <- importAnnotationExternal(experimentName=tssObjectExample,
fileType="gff3",
annotFile=paste(gff3data.dir,"gencode.v19.chr22.transcript.gff3",sep="/"))
## ... importAnnotationExternal ...
## Done. Annotation data have been attached to the tssObject.
## -----------------------------------------------------
##  Done.
tssObjectExample <- addAnnotationToTSR(experimentName=tssObjectExample,
tsrSetType="merged", tsrSet=1, upstreamDist=2000, downstreamDist=500,
feature="transcript", featureColumnID="ID", writeTable=FALSE)
## ... addAnnotationToTSR ...
## 
## The merged TSR set for TSS dataset 1 will be written to file TSRsetMerged-1.txt
## in your working directory.
## Done. GeneIDs have been associated with adjacent TSRs.
## -----------------------------------------------------
##  Done.

Before we end, we choose to save our tssObject in an .RData file to return to at a later time:

save(tssObjectExample, file="tssObjectExample.RData")
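Restoring the object in a later session is done with base R's load(). The sketch below uses a plain list as a stand-in for the tssObject and a temporary file, so that it runs without TSRchitect; in practice you would simply call load("tssObjectExample.RData") from the directory where the file was saved.

```r
# Stand-in object; in practice this is the tssObject created in this vignette
tssObjectExample <- list(title = "Vignette Example")

# Save to a temporary location, then remove the object from the session
rdata.file <- file.path(tempdir(), "tssObjectExample.RData")
save(tssObjectExample, file = rdata.file)
rm(tssObjectExample)

# load() restores the object under its original name and returns that name
restored <- load(rdata.file)
print(restored)
```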

This ends our vignette. Please see the TSRchitect User’s Guide for more extensively documented examples.