library(GEOfastq)
GEOfastq can be installed from Bioconductor as follows:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOfastq")
The NCBI Gene Expression Omnibus (GEO) offers a convenient interface to explore
high-throughput experimental data such as RNA-seq. GEO deposits RNA-seq
data as sra files to the Sequence Read Archive (SRA), which can be
converted to fastq files using fastq-dump. This conversion
process can be quite slow, so it is usually more convenient to download
the fastq files that the European Nucleotide Archive (ENA) generates for a
GEO accession. GEOfastq crawls GEO to retrieve metadata and
ENA fastq URLs, and then downloads the fastq files.
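To make the idea of an ENA fastq location concrete, the sketch below builds the directory path for a run accession following ENA's published FTP layout, as I understand it: the first six characters of the accession, then (for accessions longer than nine characters) a zero-padded subdirectory made from the trailing digits, then the accession itself. The function name ena_fastq_dir is hypothetical and is not part of GEOfastq, and this is not claimed to match what get_dldir returns.

```r
# Hypothetical helper, not part of GEOfastq: a sketch of ENA's fastq
# directory layout for a single run accession such as 'SRR014242'.
ena_fastq_dir <- function(run) {
  stopifnot(is.character(run), length(run) == 1, nchar(run) >= 9)

  # the first six characters form the top-level directory
  prefix <- substr(run, 1, 6)

  # accessions with exactly six digits (nine characters) have no subdirectory
  n_extra <- nchar(run) - 9
  if (n_extra == 0) {
    return(file.path("vol1/fastq", prefix, run))
  }

  # longer accessions add a subdirectory: the trailing digits, zero-padded to width 3
  suffix <- substr(run, nchar(run) - n_extra + 1, nchar(run))
  sub_dir <- paste0(strrep("0", 3 - nchar(suffix)), suffix)
  file.path("vol1/fastq", prefix, sub_dir, run)
}

ena_fastq_dir("SRR014242")
ena_fastq_dir("SRR1234567")
```

In practice there is no need to build these paths by hand, since the package retrieves them while crawling GEO.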
To get fastq data for a GEO series, we first retrieve the metadata for a GEO accession:
gse_name <- 'GSE133758'
gse_text <- crawl_gse(gse_name)
Next, we extract the sample accessions for this study and retrieve the GEO metadata and ENA fastq URL for one example sample:
gsm_names <- extract_gsms(gse_text)
gsm_name <- gsm_names[182]
srp_meta <- crawl_gsms(gsm_name)
#> 1 GSMs to process
Now that we have retrieved the necessary metadata, we are ready to download the fastq files for this sample:
data_dir <- tempdir()

# example using smaller file
srp_meta <- data.frame(
    run = 'SRR014242',
    row.names = 'SRR014242',
    gsm_name = 'GSM315559',
    ebi_dir = get_dldir('SRR014242'), stringsAsFactors = FALSE)

res <- get_fastqs(srp_meta, data_dir)
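After the download finishes, it can be useful to confirm what actually landed on disk. The following is a minimal sketch using only base R. The helper list_fastqs is hypothetical (not part of GEOfastq), and it assumes the downloaded files keep ENA's .fastq.gz extension and sit somewhere under the download directory.

```r
# Hypothetical helper, not part of GEOfastq: list gzipped fastq files
# found under a directory, with their sizes in megabytes.
list_fastqs <- function(dir) {
  files <- list.files(dir, pattern = "\\.fastq\\.gz$",
                      full.names = TRUE, recursive = TRUE)
  data.frame(
    file = basename(files),
    size_mb = round(file.size(files) / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}

# same directory as used for the download above
data_dir <- tempdir()
list_fastqs(data_dir)
```

An empty data frame here would suggest that the download did not complete or that the files were written elsewhere.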
The following packages and versions were used in the production of this vignette.
#> R version 4.2.1 Patched (2022-07-09 r82577)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_GB/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] GEOfastq_1.6.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.9 knitr_1.40 magrittr_2.0.3 doParallel_1.0.17
#> [5] R6_2.5.1 rlang_1.0.6 fastmap_1.1.0 foreach_1.5.2
#> [9] stringr_1.4.1 plyr_1.8.7 tools_4.2.1 parallel_4.2.1
#> [13] xfun_0.34 cli_3.4.1 jquerylib_0.1.4 htmltools_0.5.3
#> [17] iterators_1.0.14 yaml_2.3.6 digest_0.6.30 bitops_1.0-7
#> [21] sass_0.4.2 codetools_0.2-18 RCurl_1.98-1.9 cachem_1.0.6
#> [25] evaluate_0.17 rmarkdown_2.17 stringi_1.7.8 compiler_4.2.1
#> [29] bslib_0.4.0 jsonlite_1.8.3