Changes in version 1.6.1 * Updated to Ensembl release 100, GENCODE 34/M25. Changes in version 1.6.0 * Added PLOS Computational Biology citation! :-) * Added function `splitSE` to split one assay of a SummarizedExperiment into multiple assays, each containing features of a given type. * Added a wrapper function makeDGEList() to simplify making a DGEList for use with edgeR. See vignette for example. + tximeta will now make use of EnsDb created and distributed on AnnotationHub, unless useHub=FALSE. Also, a new function retrieveDb() can be called on a SummarizedExperiment to retrieve the underlying TxDb or EnsDb. + tximeta can now use `customMetaInfo` argument to locate a custom metadata information file such as `meta_info.json`, which should contain a tage, `index_seq_hash`, with the SHA-256 hash value of the reference transcripts. * Added `markDuplicateTxps` argument to add `hasDuplicate` and `duplicates` columns to rowData of SummarizedExperiment. One note is that, for efficiency, this argument and `cleanDuplicateTxps` will now share a duplicates CharacterList that is stored in the BiocFileCache, with the name `dups-...`. Therefore, if you have previously used `cleanDuplicateTxps`, you may need to bfcremove() any `dups-...` entries. Summarization to gene level will keep track of `numDupSets` per gene which informs about the number of transcripts sets (equivalence classes by transcript sequence content). + If during the indexing step, user didn't use --gencode for a Gencode transcriptome file, tximeta will deal with this internally now by stripping all characters after the vertical bar `|`, in order to match long transcript names in the `quant.sf` files to the correct transcript names in the GTF. Changes in version 1.5.28 * Added function `splitSE` to split one assay of a SummarizedExperiment into multiple assays, each containing features of a given type. Changes in version 1.5.19 * Added `markDuplicateTxps` argument to add `hasDuplicate` and `duplicates` columns to rowData of SummarizedExperiment. One note is that, for efficiency, this argument and `cleanDuplicateTxps` will now share a duplicates CharacterList that is stored in the BiocFileCache, with the name `dups-...`. Therefore, if you have previously used `cleanDuplicateTxps`, you may need to bfcremove() any `dups-...` entries. Summarization to gene level will keep track of `numDupSets` per gene which informs about the number of transcripts sets (equivalence classes by transcript sequence content). Changes in version 1.5.16 * Added a wrapper function makeDGEList() to simplify making a DGEList for use with edgeR. See vignette for example. Changes in version 1.5.11 + tximeta can now use `customMetaInfo` argument to locate a custom metadata information file such as `meta_info.json`, which should contain a tage, `index_seq_hash`, with the SHA-256 hash value of the reference transcripts. Changes in version 1.5.8 + tximeta will now make use of EnsDb created and distributed on AnnotationHub, unless useHub=FALSE. Also, a new function retrieveDb() can be called on a SummarizedExperiment to retrieve the underlying TxDb or EnsDb. Changes in version 1.5.3 + If during the indexing step, user didn't use --gencode for a Gencode transcriptome file, tximeta will deal with this internally now by stripping all characters after the vertical bar `|`, in order to match long transcript names in the `quant.sf` files to the correct transcript names in the GTF. Changes in version 1.4.0 + tximeta will now pull down RefSeq seqinfo, using the dirname() of the GTF location, and assuming some consistency in the structure of the assembly_report.txt that is located in the same directory. Needs more testing though across releases and organisms. + expanded caching of ranges to exons and genes as well. Exons in particular take a long time to build from TxDb, so this saves quite a lot of time. + new 'addExons' function will add exons to trancript-level summarized experiments, by replacing transcript GRanges with exon-by-transcript GRangesList. Purposely designed only for transcript-level, see note in ?addExons + tximeta now also caches the transcript ranges themselves, rather than just the TxDb. This shaves extra seconds off the tximeta() call! + add 'skipSeqinfo' argument, which avoids attempting to fetch chromosome information (from UCSC) if set to TRUE. Changes in version 1.1.18 + Specifying gene=TRUE in addIds() when rows are transcripts will attempt to use a gene_id column to map the IDs. This usually gives a better mapping rate. Changes in version 1.1.14 + Cut off version number from Ensembl names only (not GENCODE) Changes in version 1.1.13 + Added 'cleanDuplicateTxps' argument, which does a lot of work for the user: it downloads the FASTA from the source, identifies duplicate transcripts (identical cDNA sequence) then looks to see if transcripts that are in the quantification files, but missing from the GTF, could be renamed from the list of duplicate transcripts such that they would be present in the GTF. Changes in version 1.1.11 + Added coding + non-coding combinations of Ensembl transcriptomes to the hash table. Must be in this order: coding, then non-coding. Changes in version 1.1.9 + Added support for dammit de novo transcriptomes. Changes in version 1.1.6 + Added summarizeToGene as a method, to avoid conflicts with tximport. Changes in version 1.1.5 + Added in Charlotte's code to split out GENCODE and Ensembl code for generating transcript ranges.