version 4.5.1: - changed variant calling tests in accordance with changes made to bam_tally in gmapR version 4.4: - bioc release version bump version 4.2: - bioc release version bump version 4.1.15: - investigating a memory fault produced by tallyVariants() in the test.callVariantsVariantTools.genotype() unit test version: 4.1.13: - fixing NGSPIPE-191: removed mergeReads and peared references ; changed BioC imports ; should build on the BioC server (but cannot right now due to VariantTools::tallyVariants memory issues) version 4.1.11: - fixed NGSPIPE-182: updated the documentation of generateSingleGeneDERs(), highlighting that the function generates DEXSeq-ready exons by disjoining the whole exon set, keeping only the exons of coding regions and keeping only the exons that belong to unique genes. version 4.1.10: - add simple check for duplicate ids to prepocessReads. Note this will only catch dups within a chunk. version 4.1.9: - fixed a small glitch that prevented truncateReads to work with trimReads.trim5 config parameter only (without length truncation) version 4.1.8: - fixed NGSPIPE-167: added 5'-read trimming with the configuration parameter trimReads.trim5 (code, unit test, configuration check code, handling corner cases) version 4.1.7: - fixed the NGSPIPE-150 ticket: cleaned up readRNASeqEnds and removed consolidateByRead version 4.1.6: - fixed a buglet (absence of summary_alignment.tab file) that prevented test.callVariantsVariantTools.genotype() to run - moved the loading of genomic_features in countGenomicFeatures and not countGenomicFeaturesChunk, for speedup - R CMD check ok version 4.1.5: - fixed the NGSPIPE-172 ticket on "variant calling slow on GRCH38". The selftest took about 1.5 h to complete, due to the analyzeVariants() phase of the two test.runPipeline.Exome() tests, each lasting about 0.6 h. This was due to (1) a wrong loading of the huge dbSNP database for avgNborCount filtering (needed only if the avgNborCount keyword is present in the config parameter "analyzeVariants.postFilters". And due to (2) a costly tiling of the genome, which is not required for small datasets (with a number of reads lower than 100e6). The selftest now takes about 34 minutes to complete. version 4.1.4: - added test.readRNASeqEnds.dupmark() to test if readRNASeqEnds() handles properly dups (which is the case; the bug NGSPIPE-180 on "Ignore Dup marked reads during feature counting" is due to the fact that markdup is done on whole bams, while featuring counting is done on chunks). This revision closes NGSPIPE-180. version 4.1.3: - filter out alignments that are marked as dups and QC failed using the proper bam flag before feature counting and coverage version 4.1.2: - activate read quality trimming for GATK-rescaled - move sanger quality at the end of the list of possible qualities, as we do not do quality trimming for real sanger. Current and Recent data from illuma should always come with illumina 1.5 or 1.8 range, so we want to make sure the read triming is triggered. version 4.1.1: - createTmpDir() is now more robust and retries if dir already exists. this is needed as the indel realigner step uses this function to parallelize by chromosomes. For genomes with lots of chromomes/contigs such as GRCh38 (>200) this fixed a rare bug. version 4.1.0: - starting new dev cycle version 3.17.9: - add variant_effect_predictor step version 3.17.8: - make sure chunk_dir points to some non-existant dir version 3.17.7: - VT changed its default Filters, probably due to an error. Until that is resolved, we manually create out Filterset ourselves. version 3.17.6: - de-activate indel Realignment step in mergeLanes again. Doesn't work as expected as chunk_firs are set to in config which throws up indel realignment step. version 3.17.5: - update buildGenomicFeaturesFromTxdb with GNE internal version - add tests version 3.17.4: - add config mergeReads.pear_param passed down to pear binary view peared::pear() version 3.17.3: - activate indel Realignment step in mergeLanes version 3.17.2: - add mergeReads step using pear version 3.17.1 - new dev cycle - fix some missing imports that causes problems when functions re-used in other package version 3.15.7 - use less cores for VT genotyping to use less memory version 3.15.6 - adapt to changes in API of VariantTools BSgenome.Hsapiens.UCSC.hg19 - use GeneGenome from gmapR to create test genomes version 3.15.5 - call genotypes via VariantTools controlled by config value version 3.15.4 - do some cleanup and gc during coverage computation version 3.15.3 - parallelize Indel realignment by chromosome version 3.15.2 - use a binary tree-reduce like algorithm to merge coverages. For a big WGS this reduced runtime from 12h to 2 h. version 3.15.1 - start next dev cycle - remove count_transcripts and count_ncRNA_nongenic from QA version 3.14.1 - 3.14.1 release - fixed the fragmentLength NA bug spotted by Jinfeng version 3.13.20: - use .gz suffix for vcf files to make IGV happy - reactivate genome tiling with larger tiles of 1e8 version 3.13.19: - force coverage to use SimpleIntegerLists if data becomes too dense - disable genome tiling during bam_tallying, as it incures a huge time overhead under some circumstances version 3.13.18: - new convenience function findVariantFile - use genome tiling with VariantTools to keep memory footprint small version 3.13.17: - depend on VariantTools 1.6.1 to get the bamtally memory leak fix version 3.13.16: - version bump to match HTSeqGenie.gne version 3.13.14: - turn off read_min_length check in mergeLanes as due to qual trimming we now expect lanes to differ version 3.13.13: - speed up coverage step for WGS data by switching back to SimpleIntegerList if the coverage becomes too ragged. version 3.13.12: - repurpose the gatk.params config value, for general GATK params. This param will be passed on to all GATK calls. An example would be gatk.params: -L intervals.bed to restrict realignment and variant calling to a certain subset of the genome thus saving huge amounts of time for targeted sequencing approaches with a few hundred genes. version 3.13.12: - now uses TP53Which() (instead of which.rds) for variant calling tests - now builds in BioC version 3.13.11: - now use TxDb.Hsapiens.UCSC.hg19.knownGene's version number in the TP53 genomic feature cache file, to avoid bugs when updating TxDb.Hsapiens.UCSC.hg19.knownGene version 3.13.10: - added an optional which config to limit VariantTools variant calling to certain regions version 3.13.9: - NA version 3.13.8: - fixed the "as.vector(x, mode) : invalid 'mode' argument" bug by adding the "importFrom(BiocGenerics,table)" NAMESPACE directive version 3.13.7 - enable indelRealigner in runPipeline controlled by config value of alignReads.do version 3.13.6 - set gatk.path in options if GATK_PATH env is set to an existing file. This is mostly for allowing the unit tests to know if and where a GATK is installed version 3.13.5: - require bamtally bugfixed version of gmapR version 3.13.4: - fixed bug in generateCountFeaturesPlots() when no genes left after filtering. Practically only happens for bacterial or viral genomes when every gene is hit more than 200 times version 3.13.3: - now saving fragmentLength in summary_coverage.txt - using Jinfeng's weighted mean to estimate fragmentLength - removed the parallelization in saveCoverage (since saveCoverage is used in parallel) version 3.13.2 - activate read trimming for illumina 1.5 quality using the 'B' as qual indicator version 3.13.1 - deactived the unit tests (test.wrap.callVariants, test.wrap.callVariants.parallel, test.wrap.callVariants.rmsk_dbsnp) until VariantAnnotation is fixed, to be able to compile HTSeqGenie on BioC servers version 3.13.0 - change to new dev cycle version number version 3.12.0 - release version version 3.11.20 - add extra config value to turrn off gatk repeat filtering - do not filter GATK variants for repeats by default, as this produces buggy vcf version 3.11.19 - make indel calling default for VT version 3.11.18 -make the summary qa report much more robust. Now runs on single and paired end, Varaint calling is optional. Also works on non human/mouse genomes version 3.11.17 - make sure VRanges are sorted before writing as vcf until Variantannotatin fixes bug version 3.11.16 - fixed the NA bug when the file indicated by template_config is not found, now reports an explicit error message version 3.11.15 - filter GATK variants by overlap with low_complexit regions if bed file given in config version 3.11.14 -fixed bug in VariantCalling where dbsnp whitelist was effectively not passed but lost in the '...' orkus. version 3.11.13 - remove getRDataFromFile and replace with safeGetObject() which is used in a couple of scripts version 3.11.12 - fix bug in sanity check that did not count circular mappings version 3.11.11 - now breaks properly when input FastQ gzipped files are corrupted/truncated version 3.11.10 - vignette updated to build on BioC dev version 3.11.9 - TP53GenomicFeatures() now uses the same temp dir as TP53Genome(), to fix a build issues on BioC servers version 3.11.8 -choice of VT filters configurable via config version 3.11.7 - mergeBams(..., sort=FALSE) in alignReads.R: since bams are already sorted in chunks, merging preserves the order and no re-sorting should is necessary version 3.11.6 - saveCoverage now outputs a summary.coverage summary file version 3.11.5 - directly export coverage from Rle with rtracklayer::export (instead of to make a memory-costly convertsion to a RangedData object) version 3.11.4 - update variant calling code to work with VariantTools 1.3.6 - include Jens' minlength=1 modification to handle reads fully trimmed version 3.11.3 - exports loginfo, logdebug, logwarn, logerror - uses TallyVariantsParam and as(,"VRanges") to fix a bug preventing compilation on BioC - the number of threads used during processChunks is divided by 2 due to an erroneous extra mcparallel(...) step in sclapply/safeExecute - use a maximum of 12 cores in preprocessReads version 3.11.2 - trim lowest quality tail of reads according to Illumina manual. This new quality step precedes the regular quality filtering that is still done. version 3.11.1 - allow to use better filters for variant tools by default, uses standard filters (bad), but if repeat masked track and dbsnp are given in config it uses those. version 3.11.0: - starting the new development branch version 3.10.1: - release version version 3.9.31: -use new VariantTools API to do id verifu in mergeLanes. works for both GATK and VariantTools version 3.9.30: - fix logger bug in mergeLanes version 3.9.29: - make report_pipeline_QA work with new version, since some fields in the summaries are gone or renamed version 3.9.28: - revert recent changes to gatk.R. We now make our own gz and index file, as GATK repeatedly failed on some samples. version 3.9.27: - allow multiple paths in HTSEQGENIE_CONFIG, separated by : version 3.9.26: - removed preschedule=FALSE in wrapGsnap - added vcfStat to produce summary_variants.tab file - changed vcf name from "_variants.vcf.gz" to ".variants.vcf.gz", to stay consistent within the results/ directory version 3.9.25: - VariantTools "analyzeVariants.indels" are OK version 3.9.24 - aloow for new quality score range "GATK-rescaled" from 1-50 (33-83 in ASCII) version 3.9.23 - added the config parameter "analyzeVariants.indels" version 3.9.22 - removed the dependency towards the "logging" package version 3.9.21 - variant calling via GATK version 3.9.20: - preparation to BioC submission - now using detectRRNA.do: FALSE in default-config.txt version 3.9.19: - removed mc.preschedule=FALSE from mergeBAMsAcrossDirs - added the configuration parameter 'analyzeVariants.method' (GATK check has yet to be done) version 3.9.18 - now depends on VariantTools 1.1.13 that fixes the mclapply(mc.preschedule=FALSE) bug version 3.9.17: - added the config parameter 'alignReads.analyzedBam' to control how analyzed.bam are built - removed the former config parameter 'alignReads.analyzed_bamregexp' that could not work on single ends version 3.9.16: - sessionInfo() is not called anymore in writePreprocessAlignReport() during generation of report, to prevent crash when PACKAGES have been updated while the pipeline is running - sclapply() now uses a 'finally' cleanup procedure to kill all threads it has created - added some unit tests to check that no leftover threads are present after sclapply() in different scenarios version 3.9.15: - use low lever variant calling interface from VariantTools. This allows for access to the raw_variants as well as the filtered ones/ - variant calling now included in mergeLanes() version 3.9.14: - added the config parameter "alignReads.use_gmapR_gsnap" to control if gsnap should be called from gmapR or from the PATH - default config parameter "alignReads.use_gmapR_gsnap" is now TRUE - removed the duplicated default config parameters: path.picard_tools, markDuplicates.do - added a check in checkConfig() to stop if some config paramters are duplicated version 3.9.13: - add variant calling using VariantTools (not yet parallelized yet) version 3.9.12: - include markDuplicates into runPipeline(), controlled by markDuplicates.do config version 3.9.11: - refactor setupTestFramework() to allow for injection of TP53 genome template version 3.9.10: - add function to mark duplicates via picard tools version 3.9.9: - fixed detectRRNA code, including bug in wrapGsnap - add test for detectRRNA working on tp53 genome version 3.9.8: - the system command 'samtools' is no used anymore in the code - removed unused functions: indexBAMFiles, filterBam, getReadLengthFromBam, getBamIndexStats - (filterBam will be back in the xenograft module) version 3.9.7: - works with Biobase 2.18.0 (Bioconductor release 2.11) - fixed the "x is not present in the PATH" bogus message - old gmapR stuffs are now gone: parallelized_gsnap, consolidateSAM, consolidateGsnapOutput, consolidateBAm - now use wrapGsnap, to facilitate the transition to the gsnap offered by gmapR - now depends on gmapR (to load TP53Genome()) version 3.9.6: - make remaining tests run with TP53 genome - move detectRRNA tests to HTSeqGenie.gne as they depend on IGIS version 3.9.5: - remove runPipeline tests depending on IGIS. Instead use simple integration test based on TP53 genome. This requried additon of : R/runPipeline.R R/TP53GenomicFeatures.R copied from bioc branch and dependance on gmapR for the TP53Genome version 3.9.4: - converging with the BioC version: adding @internal keyword - configuration parameter version 3.9.3: - minor comments (converging with the BioC version...) - checks OK on module apps/ngs_pipeline/dev version 3.9.2: - removed everything related to calculateJunctionReads, junctionReads (due to the usage of an obsolete newCompressedList in BioC) - checks OK on apps/ngs_pipeline/dev version 3.9.1: - removed everything related to SNVsOmuc, analyzeVariants, variantConcordance (due to gmapR conflict) - renamed CHANGES into NEWS version 3.9.0: - strict copy from 3.8.0 version 3.8.0: - added the configuration parameter "filterQuality.minLength", to remove reads shorter than filterQuality.minLength during preprocessReads(). Default is NULL. - added the configuration parameter "alignReads.analyzed_bamregexp", to specify the regexp to select bam files to build analyzed.bam. Default is "_uniq.*\.bam$". - added the configuration parameter "coverage.do" to enable/disable coverage computation. Default is TRUE. - added the configuration parameter "coverage.maxFragmentLength" to remove long read pairs, as suggested by Thomas when analysing ChIP-Seq. Default is NULL. - the ChIP-Seq config files now have "alignReads.analyzed_bamregexp: concordant_uniq.*\.bam$" and "coverage.maxFragmentLength: 1e4" by default - coverage computation now uses SimpleRleList and should be faster version 3.6.1: - speedup: calculateCoverage() now uses a map/reduce technique to speed up coverage computation - speedup: bamCountUniqueReads() does not scan bam file per chromosome any more - mergeLanes() now supports missing variants or missing countGenomicFeatures - BioC 2.11 fix: using queryLength() instead of nrow() after findOverlaps() version 3.6: - support of IGIS 2.2 - support of R 2.15.0 and Bioconductor 2.10 - support for ChIP-Seq analysis (template configurations and coverage read extension) - support for merging lanes (ngs_merge) - support for variant concordance comparison (ngs_vconcord) - results are different from 3.4.1 (due to the new splice sites of IGIS 2.2 used during alignment and due to the new variant caller) correlation of RNA-Seq RPKM is usally higher than 0.99 between 3.4.1 and 3.6.0. Variant concordance is also typically higher than 0.999 - now computes intronic RNA counts - minor bug fixes version 3.5.17: - mergeLanes (and therefore, ngs_merge) now checks that sample versions are identical before merging - mergeLanes and has an improved interface to include parameters that have to be ignored during checkInputConfigs() version 3.5.16: - fixed the warning message "replacing previous import ‘density’ when loading ‘stats’" (coming from the chipseq package, version 1.6.1 fixes this message) - fixed the gzfile(description) bug caused in calculateJunctionReads, due to the fact that "countGenomicFeatures.gfeatures" was needed when computing calculateJunctionReads() - fixed the GenomeSeq-USA300-config.txt (removed the "path.genomic_features:", which is not need anymore, and added "countGenomicFeatures.do: FALSE") version 3.5.15: - added configuration parameters: analyzeVariants.bin_fraction - new TxDb.*.BioMart.igis 2.2.0 with correct seqlenghts version 3.5.14: - checks OK version 3.5.13: - added ChIP-Seq config files - added configuration parameters: countGenomicFeatures.do, analyzeVariants.do, coverage.extendReads, coverage.fragmentLength - now uses SNVsOmuC 1.0.1 - ngs_merge can now merge only one file (not optimized) - fixed ngs_vconcord (doesn't display the subgraph igraph 0.6 version issue any longer) - added tmp_dir to config. If set will be used to store temporary chunk dirs version 3.5.12: - use min and max_processed_read_length for merge_checks instead of bam file - use lsf_ngs_merge script that worked well in CGP3 version 3.5.11: - added filterBam, to filter bam files based on a logical vector - now writes ... in merged config files - now trim target length at 600 in calculateTargetLengths, fixing a bug reported by Gregory Zynda version 3.5.10: - now building the "intron" track - added the "intron" track in gfeatures-human-IGIS_2.1.0b.RData and gfeatures-mouse-IGIS_2.10b.RData; all other tracks are strictly identical to the ones in gfeatures-human-IGIS_2.1.0.RData and gfeatures-mouse-IGIS_2.1.0.RData, respectively version 3.5.8: - copied HTSeq and RNASeqGenie in HTSeqGenie - checks are OK! version 3.5.6:] - copied HTSeq into HTSeqGenie version 3.5.5 - now works with R 2.15.0/Bioconductor 2.10 (and still works with R 2.14/ngs_pipeline environment) - added ngs_concord script, to compute variant concordance between samples - coverage.RData is now a RangedData object version 3.5.3 - added lsf_ngs_merge and ngs_merge scripts - ngs_pipeline does not crash anymore when fed with bogus arguments version 3.5.2 - mergeLanes now uses safeExecute to save memory between merging steps version 3.5.1 - initPipelineFromSaveDir now updates save_dir - mergeLanes.R does not check for identical quality_encoding any longer - mergePreprocessSummary does not check for identical read length any longer - mergeLanes now accepts config_update version 3.5.0 - bump dev version 3.4.1 - set default config parameter analyzeVariants.use_read_length to FALSE, to prevent the buggy 3*2 Fisher's test used in SNVsOmuC/variantFilter.R to crash the pipeline - overload logdebug, loginfo and logwarn with a try() statement, to prevent errors when concurrent threads are logging at the same time version 3.4 - change gsnap param -E from 4 to 1. This should allow gsnap to find more translocations. version 3.3.11 - get rid of analysis type. At this point the only thing we do differently for Exome vs RNASeq is the call to gsnap. Since that is actually created from the snp, splice and gsnap_param option in the config, we donl;t need this explicite type any more. version 3.3.10 - now creating summary_analyzed_bamstats.tab - reportQA now includes analysed bam stats version 3.3.9 - added computeBamStats, createSummaryAlignment, mergeSummaryAlignment - now summary_alignment are merged (and not recomputed on the merged bams) - new bam statistics in computeBamStats version 3.3.8 - fixed buildSplicesIIT, to build correct IIT splicing file, with checks - generated splices-human-IGIS_2.1.0b and splices-mouse-IGIS_2.1.0b - now using splices-human-IGIS_2.1.0b and splices-mouse-IGIS_2.1.0b in RNA-Seq template configs - deleted bogus splices-human-IGIS_2.1.0 and splices-mouse-IGIS_2.1.0 version 3.3.6 - added input_min_read_length, input_max_read_length, processed_min_read_length, processed_max_read_length in summary_preprocess - removed reportwarning version 3.3.5 - added reportwarning (to log warnings in {save_dir}) - the pipeline now reports a warning if reads are of variable lengths version 3.3.4 - added mergeLanes, to merge lanes - added the unit test: test.mergeLanes - now check during preprocessReads that read are of constant length - added concatListElements, used when building DEXSeq in buildGenomicFeatures - added getReadLengthFromBam - added test.runPipeline.identical320, to test if the results are identical compared to NGS 3.2.0 - the pipeline now fails if reads are of variable length version 3.3.3 - checkConfig now checks for absence of whitespace in: input_file, input_file2, save_dir, prepend_str, alignReads.sam_id - the pipeline now fails if one chunk fails (stop.onfail=TRUE in processChunks) - fixed the empty chunk bug, added the unit test: test.alignReads.sparsechunks - added statCountFeatures(), to compute diverse quantile statistics on read/feature counts version 3.2 - release version version 3.1.5 - num_cores is now 1 by default - alignReads.nbthreads_perchunk is now empty by default - if unspecified, alignReads.nbthreads_perchunk is set to min(4, num_cores) version 3.1.4 - safeUnlink now stops if it can't delete a file version 3.1.3 - disabling ShortRead OPEN_MP, which crashes R when used in combination with mcparallel() - safeExecute now executes an expression in a child thread, to avoid allocating memory in the main thread (this is the cause of memory leaks, since R has a internal hashtable to store strings that keeps growing and is never cleared up) - quality encoding upper limit of "illumina1.5" is now 105 instead of 104, to accomodate Phred-qualities of 41 (instead of 40) version 3.1.2 - added GenomeSeq-USA300-config.txt version 3.1.1 - added parseProgressLog() to parse progress.log files - added gc() in processChunks() to save memory before firing new threads - default.config: num_cores set to 4 and alignReads.nbthreads_perchunk to 4, for performance reasons - now using safeUnlink(), to not follow symlink dirs when deleting files/dirs - checkConfig.template() now first looks in the local directory for a template config file - trimReads can now trim reads of variable lengths and now keeps the input quality encoding version 2.99.39 - now using quality_encoding: sanger, solexa, illumina1.3, illumina1.5, illumina1.8 version 2.99.38 - now uses the config parameter quality_encoding, which can take a value out of: sanger, solexa, illumina13, illumina15, illumina18 - added detectQualityInFASTQFile - added deterministic subsampling test - the argument filname is now optional in writeConfig and writeAudit version 2.99.36 - umask is now set in both initPipelineFromConfig and initPipelineFromSaveDir - now produce an "analyzed.bam" file instead of "main.bam" vesrion 2.99.35 - added config parameters: alignReads.nbthreads_perchunk, alignReads.static_parameters, analysis_type - added ExomeSeq-human-config.txt and ExomeSeq-mouse-config.txt - processChunks now accepts nb.parallel.jobs - DEXSeq OK - new buildAlignerParams that accepts nbthreads_perchunk and alignReads.static_parameters version 2.99.34 - creation of SNP indexes for human and mouse, now used in the RNASeq templates version 2.99.33 - path.gsnap and path.samtools are now gone - alignReads.do, countGenomicFeatures.do, processUniqueMappers.do, calculateJunctions.do are now gone - added checkConfig.tools to check that gsnap, samtools, get-genome and bam_tally version 2.99.32 - now outputs summary_alignment.tab - now uses main.bam in the RNASeqPipeline - added bamCountUniqueReads() version 2.99.31 - renamed processRawFastq by preprocessReads - defined initPipelineFromConfig and initPipelineFromSaveDir - output of preprocessReads is now summary_preprocess.tab - creation of {prepend_str}.main.bam - implemented safeExecute (which now does the memory tracing) instead of logErrorOnFail version 2.99.29 - IGIS 2.1 for human and mouse version 2.99.26: - implemented the subsampler (controlled by subsample_nbreads) - added config parameters: remove_processedfastq and remove_chunkdir - removed mergeChunks - the summaryTable doesn't contain the preprocess_summary information anymore version 2.99.25: - major release! - using IGIS and our internal gsnap version for pipeline 3.0 - output tabulated results for counts - added finally argument to logErrorOnFail (to kill memtracer in case of exceptions) - using a 2 s delay between fired jobs in processChunks (to prevent firing all jobs at the same time, avoiding I/O collisions) - checks OK (except the mouse genome) version 2.99.24: - removed rpkm_old - fused detectNcRNA with countGenomicFeatures - removed config parameters related to detectNcRNA - removed depluralization code - changed config parameters countGenomicFeatures.granges by countGenomicFeatures.gfeatures - count output is now tab files with name, count, width and rpkm - addec config parameters alignReads.extra_parameters - saveWithID now supports tab-separated file format version 2.99.23: - now using new 3.0 gsnap aligner - now using hg19_IGIS21 - the aligner found highqualAdapterContamIn3PrimeEnd:1:1:1:7#0/1 was rRNA-contaminated (which is true, with CIGAR 30C3C3C1): updated test.processRawFastq_single_end() - tests OK except mouse-related tests (genome mm9_IGIS21 has to be built) version 2.99.21: - added config parameter calculateJunctions.do - now stops if all jobs fail - renamed log/ by logs/ - remove chunks/ at the end version 2.99.20: - plot insert lenghts OK - added config parameter: debug.remove_chunkdir - renamed output directory profile/ to log/ - renamed output directory RData/ to results/ - removed fastq_for_aligner12 fields version 2.99.19: - now uses ShortRead 1.13.12 that fixes a FastqSampler bug (that causes random crashes!) - buildShortReadReports works (now subsampling by default 20e6 reads) version 2.99.18: - uses gmapR 0.12.5 to log gsnap system calls version 2.99.17: - new file permissions are now -rw-r--r-- and dir permissions are drwxr-xr-x - detectAdapterContam: cutoff is now 13.87229 (independent of read length, see estimateCutoffs) - detectAdapterContam: now save read names - detectAdapterContam: added mergeDetectAdapterContam - no max_mismatches in the pipeline anymore: using gsnap's defaults or alignReads.max_mismatches if specified version 2.99.16: - removed txdb_info - added config parameter max_mismatches - config is now written in RData/ - preprocessed reads are now merged version 2.99.15: - supports the HTSEQ_CONFIG environment variable to look for template config files - passes samtools path to gmapR - checks the presence of non-empty config parameters - now produces the output directory, with chunks/chunk_%06d, with audit.txt and progress.log in profile/ version 2.99.9: - added config parameters: detectNcRNA.do and detectNcRNA.granges version 2.99.8: - bumped version number to stay in sync with RNASeqGenie version 2.99.7 - preprocess_summary$adapter_contam and preprocess_summary$rRNA_contam_reads are now set to 0 if their modules are disabled, to prevent unexpected behaviors in the final report - added getChunkDirs, mergePreprocessSummary - the merge/ directory is created mergeLanes and not in initPipeline any more version 2.99.6: - uses gmapR 0.12.3 to fix a deadly bug in consolidateSAMFiles() causing random crashes - now consolidateSAMFiles is silent version 2.99.5: - chunked loggs - continue on fail - added logErrorOnFail(), encaspulating tryKeepTraceback and getTraceback - sclapply now passes chunkid as an additional argument - sclapply accepts now a tracer function - added processChunks(), that does sclapply + logErrorOnFail + continue on fail + chunked logs version 2.99.4: - added max_nbchunks for debug purposes - added runTophalf, runPreprocessing, runAlignment - ready to test on new CGP 2011 data! version 2.99.3: - using path-config.txt to store system-dependent paths - renamed resync() by resource(dirname); which can reload any package R directory - implemented checkConfig.countGenomicFeatures() - cleaned up writeAudit() - removed package dependencies: snow and gtools - implemented tests for tryKeepTraceback, writeAudit and minichunks for processRawFastq - implemented mergeProcessRawFastq - new alignReads() that does the parallel/chunking job - new merge/ directory version 2.99.2: - templated configuration using the parameter "template_config" - moved parameters from globals() and HTSeqGenieBase_globals-default.dcf to our configuration file: -- removed globals.R -- new config parameters: path.gsnap, path.samtools, path.gsnap_bin_dir, path.genomic_features, path.gsnap_genomes -- new config parameters: countGenomicFeatures.do, countGenomicFeatures.grange, countGenomicFeatures.txdb_info -- removed config parameter: alignReads.gsnap (now path.gsnap) -- removed HTSeqGenieBase_globals-default.dcf - sclapply now uses the argument max.parallel.jobs in the third position - removed setGenomeFiles() and added config parameter: detectRRNA.rrna_genome version 2.99.1: - version number roll to be ready to release the 3.0.0 version - now used by RNASeqGenie - FastQStreamer.init() and FastQStreamer.getReads() do not need the configuration environment any more - renamed trimReadsList by trimReads, mismatches_per_readwidth by getMismatchesPerReadwidth - getConfig() and getConfig.*() now stops if the parameter is not declared and returns NULL if empty - NAMESPACE does not export all function any longer version 0.0.17: - full stream version - make_dir was renamed in makeDir() - overwrite_save_dir now takes a parameter out of "never", "overwrite" or "erase" - parseDCF doesn't remove empty parameters any longer - added getConfig.nonempty() to check if a paramter is non-empty - alignReads is now in parallel using sclapply version 0.0.16: - transition to the stream version - alignReads now uses a single core and uses "-B 2" mode, sharing genome memory between processes, to save memory - this version is not expected to R CMD check OK version 0.0.15: - save_dir/chunk_%06d/ output - save_dir/progress.log version 0.0.14: - processRawFastqChunks works well (TODO: collect data) - added unit tests for: FastQStreamer.init(), FastQStreamer.getReads(), sclapply() - added initPipeline() (to initialise the pipeline) - preprocessReads() is now independent version 0.0.13: - moved setGenomeFiles in detectRRNA.R - added config parameters: filterQuality.do, alignReads.do - added config parameters: chunk_size, subsample_nbreads - fixed another stupid bug in sclapply() version 0.0.12: - renamed myTry() by tryKeepTraceback() - added the checkConfig.noextraparameters() config check - fixed a stupid bug: sclapply() does not throw chunks when waiting any more - added temporary processRawFastqChunks.R version 0.0.11: - implementation of scProcessReads() for a simple, efficient parallel processing of read chunks version 0.0.10: - restyled code (using " instead of ', using <- instead of =m, added comments) - "align.*" config parameters renamed to "alignReads.*" - config parameter "genome" renamed to "alignReads.genome" - added writeFastQFiles(), to write generic FastQ files - traceMem() is now failsafe - detectRRNA() now accepts lreads as a first argument - added setupTestFramework(), to set up test frameworks version 0.0.9: - the version is now able to process LIB2478_SAM634423_L1.R122! - added traceMem() to track memory peak usage - added the num_cores config parameter. If unspecified the parameter is guessed from the environment variable NCPUS (set by PBS) or by the multicore package - detectAdapterContam.R/detectAdapterContam() is now parallelized using mcProcessReads() version 0.0.8: - added getMemoryUsage() to track memory peak usage - added mcProcessReads() a safe version of mclapply, to process reads in a parallel fashion - added the parameter config: debug.tracemem - added isConfig(), to test the presence of a parameter - getConfig() without parameter now returns the config list - added initLog(), starting logging information with R session info and config parameters - now, filterQuality() uses mcProcessReads() to filter reads in parallel: this should help to process LIB2478_SAM634423 version 0.0.7: - created updateConfig() - now loadConfig() doesn't call checkConfig() any longer - the "local" mode is now activated only and only in interactive sessions - setUpDirs() has been renamed and now accepts the argument overwrite version 0.0.4: - implemented the RUnit test suite version 0.0.3: - renamed HTSeq version 0.0.2: - stricter test.filterQuality() tests version 0.0.1: - initial release