Introduction to derfinderData

Overview

derfinderData is a small data package with information extracted from BrainSpan (BrainSpan, 2011) for 24 samples, restricted to chromosome 21. Other packages, such as derfinder and derfinderPlot, can use the BigWig files in this package for their examples.

While you could download the data directly from BrainSpan (BrainSpan, 2011), this package is helpful for scenarios where you might encounter difficulties such as the one described in this thread.

Data

The following code builds the phenotype table included in derfinderData. Two structures were selected at random, and 12 samples were chosen for each: 6 fetal samples and 6 from adult individuals. For the fetal samples, the age in post-conception weeks (PCW) is transformed into age in years by

age_in_years = (age_in_PCW - 40) / 52

In other data sets you might want to subtract 42 instead of 40 if some observations have PCW up to 42.
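As a minimal sketch of this transformation, the two helper functions below convert PCW to years and back. The function names and the `term_weeks` argument are hypothetical and are not part of derfinderData; `term_weeks` is the PCW value that maps to age 0, so setting it to 42 gives the alternative mentioned above.

```r
## Convert post-conception weeks (PCW) to age in years.
## `term_weeks` (hypothetical argument) is the PCW value mapped to age 0.
pcw_to_years <- function(pcw, term_weeks = 40) {
    (pcw - term_weeks) / 52
}

## Inverse transformation, useful for checking values in a phenotype table
years_to_pcw <- function(years, term_weeks = 40) {
    years * 52 + term_weeks
}

## Fetal ages are negative because they fall before the 40-week mark
pcw_to_years(c(12, 17, 24))
## [1] -0.5384615 -0.4423077 -0.3076923

## Recover the PCW of the first sample from its age in years
round(years_to_pcw(-0.442307692307693))
## [1] 17
```

Under this convention any sample with a PCW below `term_weeks` gets a negative age, which is what the `Age < 0` test relies on when labeling samples as fetal.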

## Construct brainspanPheno table
brainspanPheno <- data.frame(
    gender = c('F', 'M', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'M', 'F', 'M', 'M', 'F'),
    lab = c('HSB97.AMY', 'HSB92.AMY', 'HSB178.AMY', 'HSB159.AMY', 'HSB153.AMY', 'HSB113.AMY', 'HSB130.AMY', 'HSB136.AMY', 'HSB126.AMY', 'HSB145.AMY', 'HSB123.AMY', 'HSB135.AMY', 'HSB114.A1C', 'HSB103.A1C', 'HSB178.A1C', 'HSB154.A1C', 'HSB150.A1C', 'HSB149.A1C', 'HSB130.A1C', 'HSB136.A1C', 'HSB126.A1C', 'HSB145.A1C', 'HSB123.A1C', 'HSB135.A1C'),
    Age = c(-0.442307692307693, -0.365384615384615, -0.461538461538461, -0.307692307692308, -0.538461538461539, -0.538461538461539, 21, 23, 30, 36, 37, 40, -0.519230769230769, -0.519230769230769, -0.461538461538461, -0.461538461538461, -0.538461538461539, -0.519230769230769, 21, 23, 30, 36, 37, 40)
)
brainspanPheno$structure_acronym <- rep(c('AMY', 'A1C'), each = 12)
brainspanPheno$structure_name <- rep(c('amygdaloid complex', 'primary auditory cortex (core)'), each = 12)
brainspanPheno$file <- paste0('http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/', brainspanPheno$lab, '.bw')
brainspanPheno$group <- factor(ifelse(brainspanPheno$Age < 0, 'fetal', 'adult'), levels = c('fetal', 'adult'))

We can then save the phenotype information, which is included in derfinderData.

## Save pheno table
save(brainspanPheno, file = 'brainspanPheno.RData')

Here is what the data looks like:

library('knitr')

## Explore pheno
p <- brainspanPheno[, !colnames(brainspanPheno) %in% c('structure_acronym', 'structure_name', 'file')]
kable(p, format = 'html', row.names = TRUE)

   gender lab        Age        group
1  F      HSB97.AMY  -0.4423077 fetal
2  M      HSB92.AMY  -0.3653846 fetal
3  M      HSB178.AMY -0.4615385 fetal
4  M      HSB159.AMY -0.3076923 fetal
5  F      HSB153.AMY -0.5384615 fetal
6  F      HSB113.AMY -0.5384615 fetal
7  F      HSB130.AMY 21.0000000 adult
8  M      HSB136.AMY 23.0000000 adult
9  F      HSB126.AMY 30.0000000 adult
10 M      HSB145.AMY 36.0000000 adult
11 M      HSB123.AMY 37.0000000 adult
12 F      HSB135.AMY 40.0000000 adult
13 M      HSB114.A1C -0.5192308 fetal
14 M      HSB103.A1C -0.5192308 fetal
15 M      HSB178.A1C -0.4615385 fetal
16 M      HSB154.A1C -0.4615385 fetal
17 F      HSB150.A1C -0.5384615 fetal
18 F      HSB149.A1C -0.5192308 fetal
19 F      HSB130.A1C 21.0000000 adult
20 M      HSB136.A1C 23.0000000 adult
21 F      HSB126.A1C 30.0000000 adult
22 M      HSB145.A1C 36.0000000 adult
23 M      HSB123.A1C 37.0000000 adult
24 F      HSB135.A1C 40.0000000 adult

We can verify that this is indeed the information included in derfinderData.

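## Rename our newly created pheno data
newPheno <- brainspanPheno

## Load the included data; data() replaces the brainspanPheno object in the
## workspace with the copy shipped in the package, so the comparison below
## is not made against the table we just built
library('derfinderData')
data('brainspanPheno', package = 'derfinderData')

## Verify
identical(newPheno, brainspanPheno)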
## [1] TRUE
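
Because the table built above and the packaged one share the name brainspanPheno, it can help to load a saved copy into its own environment before comparing. The sketch below illustrates the idea with a toy two-row table and a temporary file; the names `toy`, `rdata` and `stored` are hypothetical, and nothing here touches derfinderData itself.

```r
## Toy phenotype table (illustration only, not the packaged data)
toy <- data.frame(
    lab = c('HSB97.AMY', 'HSB130.AMY'),
    Age = c(-23 / 52, 21),
    stringsAsFactors = FALSE
)

## Save it the same way the pheno table is saved
rdata <- tempfile(fileext = '.RData')
save(toy, file = rdata)

## Load the saved copy into a separate environment so that it cannot
## overwrite, or be masked by, the object in the workspace
stored <- new.env()
load(rdata, envir = stored)

## Compare the in-memory object against the stored copy
same <- identical(toy, stored$toy)
same

## Clean up the temporary file
unlink(rdata)
```

The same pattern applies to any `.RData` file: `load()` accepts an `envir` argument, so the stored object can be reached as `stored$toy` without name clashes.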

Using the phenotype information, you can use derfinder to extract the base-level coverage information for chromosome 21 from these samples. Then, you can export the data to BigWig files.

library('derfinder')

## Determine the files to use and fix the names
files <- brainspanPheno$file
names(files) <- gsub('\\.(AMY|A1C)$', '', brainspanPheno$lab)

## Load the data
system.time(fullCovAMY <- fullCoverage(
    files = files[brainspanPheno$structure_acronym == 'AMY'], chrs = 'chr21'))
#user  system elapsed 
#4.505   0.178  37.676 
system.time(fullCovA1C <- fullCoverage(
    files = files[brainspanPheno$structure_acronym == 'A1C'], chrs = 'chr21'))
#user  system elapsed 
#2.968   0.139  27.704
    
## Write BigWig files
dir.create('AMY')
system.time(createBw(fullCovAMY, path = 'AMY', keepGR = FALSE))
#user  system elapsed 
#5.749   0.332   6.045
dir.create('A1C')
system.time(createBw(fullCovA1C, path = 'A1C', keepGR = FALSE))
#user  system elapsed 
#5.025   0.299   5.323 

## Check that 12 files were created in each directory
all(c(length(dir('AMY')), length(dir('A1C'))) == 12)
#TRUE

## Save data for examples running on Windows
save(fullCovAMY, file = 'fullCovAMY.RData')
save(fullCovA1C, file = 'fullCovA1C.RData')
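
The name cleanup in the code above strips the structure suffix from each label, which leaves sample IDs that repeat across the two structures (for example, HSB178 appears in both). The short sketch below, using four labels taken from the phenotype table, shows why the coverage is loaded and exported one structure at a time; the variable names are hypothetical.

```r
## A few labels from the phenotype table
labs <- c('HSB97.AMY', 'HSB178.AMY', 'HSB178.A1C', 'HSB114.A1C')

## Strip the structure suffix; the dot is escaped and the pattern is
## anchored so that only a trailing '.AMY' or '.A1C' is removed
ids <- sub('\\.(AMY|A1C)$', '', labs)

## Keep the suffix as the structure acronym
structure <- sub('^.*\\.', '', labs)

## Sample IDs are not unique across structures ...
anyDuplicated(ids) > 0

## ... but they are unique within each structure
unique_within <- tapply(ids, structure, function(x) anyDuplicated(x) == 0)
unique_within
```

Writing the AMY and A1C files to separate directories therefore avoids two samples with the same ID competing for the same file name.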

These BigWig files are available under extdata as shown below:

## Find AMY BigWigs
dir(system.file('extdata', 'AMY', package = 'derfinderData'))
##  [1] "HSB113.bw" "HSB123.bw" "HSB126.bw" "HSB130.bw" "HSB135.bw"
##  [6] "HSB136.bw" "HSB145.bw" "HSB153.bw" "HSB159.bw" "HSB178.bw"
## [11] "HSB92.bw"  "HSB97.bw"
## Find A1C BigWigs
dir(system.file('extdata', 'A1C', package = 'derfinderData'))
##  [1] "HSB103.bw" "HSB114.bw" "HSB123.bw" "HSB126.bw" "HSB130.bw"
##  [6] "HSB135.bw" "HSB136.bw" "HSB145.bw" "HSB149.bw" "HSB150.bw"
## [11] "HSB154.bw" "HSB178.bw"
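
As a hedged sketch of how another package's example might consume these files, the code below builds a named vector of the AMY BigWig paths and passes it to `fullCoverage()` with the same `files` and `chrs` arguments used earlier. The helper `name_bigwigs()` is hypothetical, and the coverage step only runs when derfinder and derfinderData are installed and the platform is not Windows (the saved `.RData` objects above exist for that case).

```r
## Hypothetical helper: name each BigWig path by its sample ID,
## that is, the file name without the '.bw' extension
name_bigwigs <- function(paths) {
    stats::setNames(paths, sub('\\.bw$', '', basename(paths)))
}

## The helper works on any character vector of paths
names(name_bigwigs(c('AMY/HSB97.bw', 'AMY/HSB113.bw')))
## [1] "HSB97"  "HSB113"

## Only attempt to read coverage when the packages are available
if (requireNamespace('derfinderData', quietly = TRUE) &&
    requireNamespace('derfinder', quietly = TRUE) &&
    .Platform$OS.type != 'windows') {
    amy_dir <- system.file('extdata', 'AMY', package = 'derfinderData')
    amy_files <- name_bigwigs(
        dir(amy_dir, pattern = '\\.bw$', full.names = TRUE)
    )

    ## Base-level coverage for chromosome 21 from the local files
    fullCovAMY <- derfinder::fullCoverage(files = amy_files, chrs = 'chr21')
}
```

Reading from the installed package avoids the download step, which is the main reason these files are bundled.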

Reproducibility

Code for creating the vignette

## Create the vignette
library('knitrBootstrap') 

knitrBootstrapFlag <- packageVersion('knitrBootstrap') < '1.0.0'
if(knitrBootstrapFlag) {
    ## CRAN version
    library('knitrBootstrap')
    system.time(knit_bootstrap('derfinderData.Rmd', chooser=c('boot',
        'code'), show_code = TRUE))
    unlink('derfinderData.md')
} else {
    ## GitHub version
    library('rmarkdown')
    system.time(render('derfinderData.Rmd',
        'knitrBootstrap::bootstrap_document'))
}
## Note: if you prefer the knitr version use:
# library('rmarkdown')
# system.time(render('derfinderData.Rmd', 'html_document'))

## Extract the R code
library('knitr')
knit('derfinderData.Rmd', tangle = TRUE)

## Clean up
file.remove('derfinderDataRef.bib')

Date the vignette was generated.

## [1] "2017-10-31 12:26:00 EDT"

Wallclock time spent generating the vignette.

## Time difference of 1.891 secs

R session information.

## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.2 (2017-09-28)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  C                           
##  tz       posixrules                  
##  date     2017-10-31
## Packages -----------------------------------------------------------------
##  package        * version date       source        
##  backports        1.1.1   2017-09-25 CRAN (R 3.4.2)
##  base           * 3.4.2   2017-10-06 local         
##  bibtex           0.4.2   2017-06-30 CRAN (R 3.4.2)
##  compiler         3.4.2   2017-10-06 local         
##  datasets       * 3.4.2   2017-10-06 local         
##  derfinderData  * 0.112.0 2017-10-31 Bioconductor  
##  devtools       * 1.13.3  2017-08-02 CRAN (R 3.4.2)
##  digest           0.6.12  2017-01-27 CRAN (R 3.4.2)
##  evaluate         0.10.1  2017-06-24 CRAN (R 3.4.2)
##  graphics       * 3.4.2   2017-10-06 local         
##  grDevices      * 3.4.2   2017-10-06 local         
##  highr            0.6     2016-05-09 CRAN (R 3.4.2)
##  htmltools        0.3.6   2017-04-28 CRAN (R 3.4.2)
##  httr             1.3.1   2017-08-20 CRAN (R 3.4.2)
##  jsonlite         1.5     2017-06-01 CRAN (R 3.4.2)
##  knitcitations  * 1.0.8   2017-07-04 CRAN (R 3.4.2)
##  knitr          * 1.17    2017-08-10 CRAN (R 3.4.2)
##  knitrBootstrap * 1.0.1   2017-07-19 CRAN (R 3.4.2)
##  lubridate        1.7.0   2017-10-29 CRAN (R 3.4.2)
##  magrittr         1.5     2014-11-22 CRAN (R 3.4.2)
##  markdown         0.8     2017-04-20 CRAN (R 3.4.2)
##  memoise          1.1.0   2017-04-21 CRAN (R 3.4.2)
##  methods        * 3.4.2   2017-10-06 local         
##  plyr             1.8.4   2016-06-08 CRAN (R 3.4.2)
##  R6               2.2.2   2017-06-17 CRAN (R 3.4.2)
##  Rcpp             0.12.13 2017-09-28 CRAN (R 3.4.2)
##  RefManageR       0.14.20 2017-08-17 CRAN (R 3.4.2)
##  rmarkdown      * 1.6     2017-06-15 CRAN (R 3.4.2)
##  rprojroot        1.2     2017-01-16 CRAN (R 3.4.2)
##  stats          * 3.4.2   2017-10-06 local         
##  stringi          1.1.5   2017-04-07 CRAN (R 3.4.2)
##  stringr          1.2.0   2017-02-18 CRAN (R 3.4.2)
##  tools            3.4.2   2017-10-06 local         
##  utils          * 3.4.2   2017-10-06 local         
##  withr            2.0.0   2017-07-28 CRAN (R 3.4.2)
##  xml2             1.1.1   2017-01-24 CRAN (R 3.4.2)
##  yaml             2.1.14  2016-11-12 CRAN (R 3.4.2)

Bibliography

This vignette was generated using knitrBootstrap (Hester, 2017) with knitr (Xie, 2014) and rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2017) running behind the scenes.

Citations made with knitcitations (Boettiger, 2017).

[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 1.6. 2017. URL: https://CRAN.R-project.org/package=rmarkdown.

[2] C. Boettiger. knitcitations: Citations for 'Knitr' Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[3] BrainSpan. “Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.” 2011. URL: http://developinghumanbrain.org.

[4] J. Hester. knitrBootstrap: 'knitr' Bootstrap Framework. R package version 1.0.1. 2017. URL: https://CRAN.R-project.org/package=knitrBootstrap.

[5] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.