transcript2gene {BUSpaRse}    R Documentation

Map Ensembl transcript ID to gene ID

Description

This function is a shortcut to get a correctly sorted data frame of transcript IDs and their corresponding gene IDs, either from Ensembl biomart or from Ensembl transcriptome FASTA files. For a biomart query, it calls tr2g_ensembl and then sort_tr2g. For FASTA files, it calls tr2g_fasta and then sort_tr2g. Unlike tr2g_ensembl and tr2g_fasta, it accepts multiple species, which is useful when cells from different species were sequenced together. Use this function only if the kallisto index was built from Ensembl transcriptomes. If querying biomart, also make sure to set ensembl_version to the Ensembl version from which the transcriptomes were downloaded.
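The sorting step can be pictured with a minimal base R sketch. It assumes that kallisto bus writes a transcripts.txt file (one transcript ID per line, in index order) to the output directory, and that sorting amounts to matching the table against that file. The helper reorder_tr2g and the toy IDs are hypothetical; this is an illustration of the idea, not the package's sort_tr2g implementation.

```r
# Hypothetical helper illustrating the idea behind sort_tr2g:
# reorder a transcript-to-gene table to follow the kallisto index order.
reorder_tr2g <- function(tr2g, kallisto_out_path) {
  # Assumption: transcripts.txt lists transcript IDs in index order
  idx_tx <- readLines(file.path(kallisto_out_path, "transcripts.txt"))
  ord <- match(idx_tx, tr2g$transcript)
  if (anyNA(ord)) {
    stop("Some transcripts in the index are missing from tr2g.")
  }
  out <- tr2g[ord, c("gene", "transcript")]
  rownames(out) <- NULL
  out
}

# Toy example with made-up IDs written to a temporary directory
out_dir <- tempfile("kb_out")
dir.create(out_dir)
writeLines(c("ENSMUST0003.1", "ENSMUST0001.1", "ENSMUST0002.1"),
           file.path(out_dir, "transcripts.txt"))
tr2g <- data.frame(
  gene = c("ENSMUSG0001.1", "ENSMUSG0001.1", "ENSMUSG0002.1"),
  transcript = c("ENSMUST0001.1", "ENSMUST0002.1", "ENSMUST0003.1"),
  stringsAsFactors = FALSE
)
sorted <- reorder_tr2g(tr2g, out_dir)
sorted$transcript
```

Matching against the index file, rather than sorting alphabetically, is what keeps row i of the table aligned with transcript i in the kallisto output.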

Usage

transcript2gene(species, fasta_file, kallisto_out_path,
  type = "vertebrate", verbose = TRUE, ...)

Arguments

species

A character vector of Latin names of species present in this scRNA-seq dataset. This is used to retrieve Ensembl information from biomart.

fasta_file

Character vector of paths to the transcriptome FASTA files used to build the kallisto index. Exactly one of species and fasta_file must be supplied; the other should be missing.

kallisto_out_path

Path to the kallisto bus output directory.

type

A character vector indicating the type of each species. Each element must be one of "vertebrate", "metazoa", "plant", "fungus", or "protist". If it has length 1, this type is used for all species specified. Can be missing if fasta_file is specified.
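The rules for type can be sketched as a small validator. The helper resolve_types is hypothetical and not part of BUSpaRse; it also assumes that a type vector longer than 1 must have one element per species, which the text above implies but does not state outright.

```r
# Hypothetical validator mirroring the documented rules for `type`
resolve_types <- function(species, type = "vertebrate") {
  allowed <- c("vertebrate", "metazoa", "plant", "fungus", "protist")
  bad <- setdiff(type, allowed)
  if (length(bad) > 0) {
    stop("Invalid type(s): ", paste(bad, collapse = ", "))
  }
  if (length(type) == 1) {
    # A single type is recycled for every species
    type <- rep(type, length(species))
  } else if (length(type) != length(species)) {
    # Assumption: otherwise one type per species is required
    stop("type must have length 1 or the same length as species.")
  }
  setNames(type, species)
}

r1 <- resolve_types(c("Homo sapiens", "Mus musculus"))
r2 <- resolve_types(c("Mus musculus", "Arabidopsis thaliana"),
                    type = c("vertebrate", "plant"))
```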

verbose

Whether to display progress. Defaults to TRUE.

...

Other arguments passed to tr2g_ensembl, such as other_attrs, ensembl_version, and arguments passed to useEnsembl. If fasta_file is supplied instead of species, these are extra arguments passed to tr2g_fasta, such as use_transcript_version and use_gene_version.

Value

A data frame with two columns, gene and transcript, containing Ensembl gene and transcript IDs (with version numbers), in the same order as in the transcriptome index used by kallisto.

See Also

Other functions to retrieve transcript and gene info: sort_tr2g, tr2g_EnsDb, tr2g_TxDb, tr2g_ensembl, tr2g_fasta, tr2g_gff3, tr2g_gtf

Examples

# Download dataset already in BUS format
library(BUSpaRse)
library(TENxBUSData)
TENxBUSData(".", dataset = "retina")
tr2g <- transcript2gene("Mus musculus", type = "vertebrate",
  ensembl_version = 94, kallisto_out_path = "./out_retina")

[Package BUSpaRse version 1.0.0 Index]