transcript2gene {BUSpaRse} | R Documentation |
This function is a shortcut to get the correctly sorted data frame with
transcript IDs and the corresponding gene IDs from Ensembl biomart or Ensembl
transcriptome FASTA files. For a biomart query, it calls tr2g_ensembl and then
sort_tr2g. For FASTA files, it calls tr2g_fasta and then sort_tr2g. Unlike in
tr2g_ensembl and tr2g_fasta, multiple species can be supplied if cells from
different species were sequenced together. This function should only be used
if the kallisto index was built with transcriptomes from Ensembl. Also, if
querying biomart, make sure to set ensembl_version to match the Ensembl
version from which the transcriptomes were downloaded.
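The reordering step described above can be illustrated with a minimal base R sketch. This is not the code used by sort_tr2g; the toy IDs and the names index_order and sort_by_index are hypothetical, and the index order is supplied as an in-memory vector rather than read from kallisto output.

```r
# Toy transcript-to-gene table in arbitrary order (made-up IDs).
tr2g <- data.frame(
  gene = c("ENSG_B.1", "ENSG_A.1", "ENSG_A.1"),
  transcript = c("ENST_3.1", "ENST_1.1", "ENST_2.1"),
  stringsAsFactors = FALSE
)

# Hypothetical order of the transcripts in the kallisto index.
index_order <- c("ENST_1.1", "ENST_2.1", "ENST_3.1")

# Reorder rows so that the transcript column follows the index order.
sort_by_index <- function(tr2g, index_order) {
  idx <- match(index_order, tr2g$transcript)
  if (anyNA(idx)) {
    stop("Some transcripts in the index are missing from tr2g.")
  }
  sorted <- tr2g[idx, , drop = FALSE]
  rownames(sorted) <- NULL
  sorted
}

tr2g_sorted <- sort_by_index(tr2g, index_order)
```

Stopping on a missing transcript is a design choice of this sketch: a silent NA row would shift every later row relative to the index.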
transcript2gene(species, fasta_file, kallisto_out_path, type = "vertebrate", verbose = TRUE, ...)
species |
A character vector of Latin names of species present in this scRNA-seq dataset. This is used to retrieve Ensembl information from biomart. |
fasta_file |
Character vector of paths to the transcriptome FASTA files
used to build the kallisto index. Exactly one of |
kallisto_out_path |
Path to the |
type |
A character vector indicating the type of each species. Each
element must be one of "vertebrate", "metazoa", "plant", "fungus", or
"protist". If the length is 1, this type is used for all species specified
here. Can be missing if |
verbose |
Whether to display progress. Defaults to |
... |
Other arguments passed to |
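The rule that a length-1 type applies to every species can be sketched in base R as follows. The helper resolve_types is hypothetical and only illustrates the recycling and validation described for the type argument; it is not taken from the package, and the species pairing is purely illustrative.

```r
# Hypothetical helper: validate `type` and expand it to one entry per species.
resolve_types <- function(species, type = "vertebrate") {
  allowed <- c("vertebrate", "metazoa", "plant", "fungus", "protist")
  bad <- setdiff(type, allowed)
  if (length(bad) > 0) {
    stop("Unknown type(s): ", paste(bad, collapse = ", "))
  }
  if (length(type) == 1L) {
    # A single type is reused for all species.
    type <- rep(type, length(species))
  } else if (length(type) != length(species)) {
    stop("type must have length 1 or the same length as species.")
  }
  names(type) <- species
  type
}

# One type for both species.
types_one <- resolve_types(c("Homo sapiens", "Mus musculus"))

# One type per species (illustrative pairing only).
types_mixed <- resolve_types(
  c("Homo sapiens", "Aedes aegypti"),
  type = c("vertebrate", "metazoa")
)
```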
A data frame with two columns, gene and transcript, with Ensembl gene and
transcript IDs (with version number), in the same order as in the
transcriptome index used in kallisto.
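Because the returned IDs carry version numbers, matching them against a table keyed by unversioned IDs would require removing the suffix first. A minimal base R sketch, assuming the version is a trailing dot followed by digits; the helper strip_version and the toy table are hypothetical, and the IDs only illustrate the format:

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix, if present.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Toy table in the shape described above; IDs and versions are illustrative.
tr2g_toy <- data.frame(
  gene = c("ENSMUSG00000000001.4", "ENSMUSG00000000003.15"),
  transcript = c("ENSMUST00000000001.4", "ENSMUST00000000003.13"),
  stringsAsFactors = FALSE
)

# Keep the versioned column intact and add an unversioned copy.
tr2g_toy$gene_unversioned <- strip_version(tr2g_toy$gene)
```

Adding a new column rather than overwriting gene keeps the versioned IDs available.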
Other functions to retrieve transcript and gene info: sort_tr2g, tr2g_EnsDb,
tr2g_TxDb, tr2g_ensembl, tr2g_fasta, tr2g_gff3, tr2g_gtf
library(BUSpaRse)
# Download dataset already in BUS format
library(TENxBUSData)
TENxBUSData(".", dataset = "retina")
# Query biomart for mouse and sort the result by the kallisto index order
tr2g <- transcript2gene("Mus musculus", type = "vertebrate",
  ensembl_version = 94, kallisto_out_path = "./out_retina")