importAllelicCounts {fishpond} | R Documentation |
Read in Salmon quantification of allelic counts from a
diploid transcriptome. Assumes that diploid transcripts
are marked with the following suffix: an underscore and
a consistent symbol for each of the two alleles,
e.g. ENST123_M and ENST123_P, or ENST123_alt (a1) and ENST123_ref (a2).
There must be exactly two alleles for each reference transcript,
and the --keep-duplicates option should be used in Salmon indexing
to avoid removing transcripts with identical sequence.
The output object has half the number of transcripts,
with the two alleles either stored in a "wide" object,
or as re-named "assays". Note carefully that the symbol
provided to a1 is used as the alternative allele,
and a2 is used as the reference allele
(and therefore "a2" is the reference level of the
allele factor that is returned in the colData).
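The naming convention described above can be checked before running the import. The following is a minimal sketch in base R and is not part of fishpond: the transcript names and the helper check_allele_pairs are hypothetical, and the sketch only illustrates how suffixed names collapse to half as many reference transcripts.

```r
# Hypothetical transcript names from a diploid Salmon index,
# using "_M" and "_P" as the two allele suffixes.
txp_names <- c("ENST123_M", "ENST123_P", "ENST456_M", "ENST456_P")

# Hypothetical helper (not a fishpond function): strip the allele suffix
# and confirm each reference transcript appears exactly once per allele.
check_allele_pairs <- function(txp_names, a1, a2) {
  suffix_pattern <- paste0("_(", a1, "|", a2, ")$")
  if (!all(grepl(suffix_pattern, txp_names))) {
    stop("some transcripts lack an allele suffix")
  }
  base <- sub(suffix_pattern, "", txp_names)
  # Fixing the factor levels keeps a zero count visible when one allele is missing.
  allele <- factor(sub(".*_", "", txp_names), levels = c(a1, a2))
  counts <- table(base, allele)
  if (!all(counts == 1)) {
    stop("each reference transcript needs exactly one a1 and one a2 entry")
  }
  sort(unique(base))
}

ref_txps <- check_allele_pairs(txp_names, a1 = "P", a2 = "M")

# The number of reference transcripts is half the number of diploid transcripts.
length(ref_txps) == length(txp_names) / 2
```

A check like this fails early if a transcript is missing one of its alleles, for example when identical sequences were removed because --keep-duplicates was not used during indexing.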
importAllelicCounts(coldata, a1, a2, format = c("wide", "assays"), tx2gene = NULL, ...)
coldata | a data.frame as used in …
a1 | the symbol for the effect/alternative allele
a2 | the symbol for the reference allele
format | either "wide" or "assays"
tx2gene | optional, a data.frame with first column indicating transcripts, second column indicating genes (or any other transcript grouping). Either this should include the …
... | any arguments to pass to tximeta
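As an illustration of how the arguments fit together, here is a hedged usage sketch. It assumes that coldata follows the tximeta layout (a files column with paths to Salmon quant.sf files and a names column with sample identifiers); the directory, sample names, and condition column are hypothetical. The call is guarded so that it only runs when the packages and the quantification files are actually present.

```r
# Sample table: `files` points to Salmon quant.sf files, `names` holds
# sample identifiers. Paths and sample names here are hypothetical.
samples <- c("sample1", "sample2")
coldata <- data.frame(
  files = file.path("quants", samples, "quant.sf"),
  names = samples,
  condition = factor(c("A", "B"))
)

# Only attempt the import when the packages and files are available.
if (requireNamespace("fishpond", quietly = TRUE) &&
    requireNamespace("tximeta", quietly = TRUE) &&
    all(file.exists(coldata$files))) {
  se <- fishpond::importAllelicCounts(
    coldata,
    a1 = "alt",      # suffix of the effect/alternative allele, e.g. ENST123_alt
    a2 = "ref",      # suffix of the reference allele, e.g. ENST123_ref
    format = "wide"  # or "assays"
  )
}
```

Here tx2gene is left at its default of NULL; a two-column transcript-to-gene data.frame could be supplied instead.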
Requires the tximeta package.
skipMeta=TRUE is used, as it is assumed
the diploid transcriptome does not match any reference
transcript collection. This may change in future iterations
of the function, depending on developments in upstream
software.
a SummarizedExperiment, with allele counts (and other data)
combined into a wide matrix [a2 | a1], or as assays (a1, then a2).
The original strings associated with a1 and a2 are stored in the
metadata of the object, in the alleles list element.
Note the reference level of se$allele will be "a2", and
so comparisons by default will be a1/a2 (alt/ref).
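The wide layout and the reference level can be pictured with a toy example. This is a sketch in base R with made-up counts, not output from importAllelicCounts; the column naming scheme (sample name followed by "-a2" or "-a1") is an assumption made for illustration.

```r
# Toy allelic counts for 2 transcripts x 2 samples (made-up numbers).
a2_counts <- matrix(c(10, 20, 30, 40), nrow = 2,
                    dimnames = list(c("ENST123", "ENST456"), c("s1", "s2")))
a1_counts <- matrix(c(20, 20, 15, 80), nrow = 2,
                    dimnames = dimnames(a2_counts))

# Wide layout [a2 | a1]: reference allele columns first, then alternative.
wide <- cbind(a2_counts, a1_counts)
colnames(wide) <- paste0(rep(colnames(a2_counts), 2), "-",
                         rep(c("a2", "a1"), each = 2))

# Allele factor with "a2" as the reference (first) level.
allele <- factor(rep(c("a2", "a1"), each = 2), levels = c("a2", "a1"))
levels(allele)[1]

# With "a2" as the reference, a fold change reads as a1 / a2 (alt / ref).
log2_ratio <- log2(wide[, allele == "a1"] / wide[, allele == "a2"])
log2_ratio
```

Because "a2" is the first factor level, a model that includes allele as a covariate reports the effect of a1 relative to a2, which matches the alt/ref direction described above.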