1 Introduction

RNASeqRData is a helper package for the vignette of the RNASeqR software package. This vignette describes the criteria for input_files and the process used to extract the mini example data.

2 input_files criteria

input.path.prefix is the parameter that stores the directory location of ‘input_files/’. Users have to prepare an ‘input_files/’ directory before running the RNASeqR package workflow. The criteria for ‘input_files/’ are listed below:

3 Sample definition

The data in this experiment data package originate from NCBI’s Sequence Read Archive entries SRR3396381, SRR3396382, SRR3396384, SRR3396385, SRR3396386, and SRR3396387. These samples are from Saccharomyces cerevisiae. To create mini data for demonstration purposes, reads aligned to the region from 0 to 100000 on chromosome XV were extracted. The detailed steps are explained in the next section. The reference genome and gene annotation files, Saccharomyces_cerevisiae_XV_Ensembl.fa and Saccharomyces_cerevisiae_XV_Ensembl.gtf, were downloaded from iGenomes (Ensembl, R64-1-1).

4 Sample data preparation process

  1. SAMtools builds BAM indexes for the BAM files.
  2. SAMtools extracts the reads in the selected range.
  3. SAMtools sorts the extracted BAM files.
  4. SAMtools writes split FASTQ files.
  5. gzip compresses the FASTQ files.
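
The five steps above can be sketched as command lines for a single sample. This is a minimal illustration, not the exact commands used to build the package: it assumes paired-end reads, a coordinate-sorted input BAM named after the SRA accession, and a sort by read name before FASTQ conversion. The function name, all file names, and the region string "XV:0-100000" are hypothetical choices made for the example. The commands are only built and printed, not executed.

```python
import shlex
from typing import List


def build_samtools_commands(sample: str, region: str = "XV:0-100000") -> List[List[str]]:
    """Return the five command lines for one sample (nothing is executed)."""
    # All file names below are hypothetical and chosen only for illustration.
    src = f"{sample}.bam"                      # assumed coordinate-sorted input BAM
    sub = f"{sample}_XV.bam"                   # reads extracted from the region
    by_name = f"{sample}_XV.namesorted.bam"    # extracted reads sorted by read name
    fq1 = f"{sample}_XV_1.fastq"               # first mates
    fq2 = f"{sample}_XV_2.fastq"               # second mates
    return [
        ["samtools", "index", src],                            # step 1: build BAM index
        ["samtools", "view", "-b", "-o", sub, src, region],    # step 2: extract region
        ["samtools", "sort", "-n", "-o", by_name, sub],        # step 3: sort (by name, an assumption)
        ["samtools", "fastq", "-1", fq1, "-2", fq2, by_name],  # step 4: split FASTQ files
        ["gzip", fq1, fq2],                                    # step 5: compress
    ]


if __name__ == "__main__":
    # Print the commands for one of the accessions named in this vignette.
    for cmd in build_samtools_commands("SRR3396381"):
        print(shlex.join(cmd))
```

To run the commands rather than print them, each list could be passed to subprocess.run(cmd, check=True) on a machine where SAMtools and gzip are installed.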

These steps produce the mini data included in the RNASeqRData package.

5 Session Information

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 17.10
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] png_0.1-7       BiocStyle_2.9.6
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18       bookdown_0.7       digest_0.6.17     
##  [4] rprojroot_1.3-2    backports_1.1.2    magrittr_1.5      
##  [7] evaluate_0.11      stringi_1.2.4      rmarkdown_1.10    
## [10] tools_3.5.0        stringr_1.3.1      xfun_0.3          
## [13] yaml_2.2.0         compiler_3.5.0     BiocManager_1.30.2
## [16] htmltools_0.3.6    knitr_1.20