tr2g_EnsDb {BUSpaRse} | R Documentation |
Bioconductor provides Ensembl genome annotation in AnnotationHub; older versions of Ensembl annotation can be obtained from packages like EnsDb.Hsapiens.v86. tr2g_EnsDb() extracts transcript and gene information from such an EnsDb object. This is an alternative to querying Ensembl with biomart; Ensembl's server seems to be less stable than Bioconductor's. However, Ensembl biomart offers more information and more species than AnnotationHub.
tr2g_EnsDb( ensdb, Genome = NULL, get_transcriptome = TRUE, out_path = ".", write_tr2g = TRUE, other_attrs = NULL, use_gene_name = TRUE, use_transcript_version = TRUE, use_gene_version = TRUE, transcript_biotype_col = "TXBIOTYPE", gene_biotype_col = "GENEBIOTYPE", transcript_biotype_use = "all", gene_biotype_use = "all", chrs_only = TRUE, compress_fa = FALSE, overwrite = FALSE )
ensdb |
An EnsDb object, such as one obtained from AnnotationHub. |
Genome |
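Either a BSgenome or an XStringSet object of genomic sequences, from which the transcriptome is extracted when get_transcriptome = TRUE. |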
get_transcriptome |
Logical, whether to extract the transcriptome from the genome using the annotation. If filtering biotypes or chromosomes, the filtered annotation is used to extract the transcriptome. Defaults to TRUE. |
out_path |
Directory to save the outputs written to disk. If this directory does not exist, then it will be created. Defaults to the current working directory. |
write_tr2g |
Logical, whether to write tr2g to disk. If TRUE, then a file tr2g.tsv will be written into out_path. |
other_attrs |
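Character vector. Other attributes to get from the EnsDb, such as gene and transcript biotypes. Use columns() on the EnsDb to see which attributes are available. |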
use_gene_name |
Logical, whether to get gene names. |
use_transcript_version |
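Logical, whether to include the version number in the Ensembl transcript ID. To decide, check whether version numbers are included in the transcripts.txt file in the kallisto output directory: if they are, transcript version numbers must be included here as well; if not, they must be left out. |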
use_gene_version |
Logical, whether to include version number in the Ensembl gene ID. Unlike transcript version number, it's up to you whether to include gene version number. |
transcript_biotype_col |
Character vector of length 1. Tag in the EnsDb for transcript biotype; defaults to "TXBIOTYPE". |
gene_biotype_col |
Character vector of length 1. Tag in the EnsDb for gene biotype; defaults to "GENEBIOTYPE". |
transcript_biotype_use |
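Character, can be "all" or a vector of transcript biotypes to be used. Transcript biotypes aren't entirely the same as gene biotypes. For instance, in Ensembl annotation, retained_intron is only a transcript biotype, not a gene biotype. See ensembl_tx_biotypes for all available transcript biotypes from Ensembl. |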
gene_biotype_use |
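Character, can be "all", "cellranger", or a vector of gene biotypes to be used. If "cellranger", then the biotypes used by Cell Ranger's reference are used. See cellranger_biotypes for the gene biotypes the Cell Ranger reference uses, and ensembl_gene_biotypes for all available gene biotypes from Ensembl. |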
chrs_only |
Logical, whether to include chromosomes only, since GTF and GFF files can contain annotations for scaffolds that are not incorporated into chromosomes. This will also exclude haplotypes. Defaults to TRUE. |
compress_fa |
Logical, whether to compress the output fasta file. If TRUE, the fasta file will be gzipped. Defaults to FALSE. |
overwrite |
Logical, whether to overwrite existing files that have the same names as the outputs written to disk. |
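Because use_transcript_version has to agree with the transcript IDs kallisto reports, it can help to check those IDs programmatically before calling tr2g_EnsDb(). The following is a minimal base-R sketch, assuming the IDs are stored one per line in a transcripts.txt file and that Ensembl versions appear as a trailing ".N" suffix. The helpers has_versions() and strip_version() are hypothetical illustrations, not part of BUSpaRse, and the IDs are illustrative only.

```r
# Hypothetical helpers, not part of BUSpaRse.
# TRUE if every ID ends in a ".<digits>" Ensembl version suffix.
has_versions <- function(ids) {
  all(grepl("\\.[0-9]+$", ids))
}

# Drop a trailing ".<digits>" version suffix from Ensembl IDs.
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Simulate a kallisto transcripts.txt with illustrative IDs.
tx_file <- tempfile(fileext = ".txt")
writeLines(c("ENST00000456328.2", "ENST00000450305.2"), tx_file)

tx_ids <- readLines(tx_file)

# Decide the value to pass as use_transcript_version.
use_version <- has_versions(tx_ids)
print(use_version)

# Versionless IDs, e.g. for comparison with an unversioned tr2g table.
print(strip_version(tx_ids))
```

The same check applies to gene IDs when deciding on use_gene_version, although there the choice is up to you.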
A data frame with at least 2 columns: gene for gene ID, transcript for transcript ID, and optionally gene_name for gene names. If other_attrs has been specified, then those will also be columns in the data frame returned.
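A common use of the returned data frame is collapsing transcript-level quantities to gene level. The minimal base-R sketch below uses made-up toy data: the gene, transcript and gene_name columns follow the Value description, while transcript_biotype stands in for an attribute requested through other_attrs. That column name is a hypothetical assumption, so check names() on the real result before relying on it.

```r
# Toy tr2g table; IDs, names and counts are made up for illustration.
tr2g <- data.frame(
  gene = c("g1", "g1", "g2"),
  transcript = c("t1", "t2", "t3"),
  gene_name = c("GeneA", "GeneA", "GeneB"),
  # Hypothetical extra column standing in for an other_attrs attribute.
  transcript_biotype = c("protein_coding", "retained_intron", "protein_coding"),
  stringsAsFactors = FALSE
)

# Transcript-level counts, named by transcript ID.
tx_counts <- c(t1 = 5, t2 = 3, t3 = 7)

# Optionally keep only protein-coding transcripts.
keep <- tr2g[tr2g$transcript_biotype == "protein_coding", ]

# Look up each counted transcript's gene; unmatched transcripts get NA.
genes <- keep$gene[match(names(tx_counts), keep$transcript)]
in_table <- !is.na(genes)

# Sum counts per gene; rows are sorted gene IDs.
gene_counts <- rowsum(tx_counts[in_table], genes[in_table])

# Attach gene names for readability.
gene_labels <- keep$gene_name[match(rownames(gene_counts), keep$gene)]
print(gene_counts)
print(gene_labels)
```

Here t2 is dropped by the biotype filter, so only t1 contributes to g1.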
ensembl_gene_biotypes, ensembl_tx_biotypes, cellranger_biotypes
Other functions to retrieve transcript and gene info: sort_tr2g(), tr2g_TxDb(), tr2g_ensembl(), tr2g_fasta(), tr2g_gff3(), tr2g_gtf(), transcript2gene()
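library(BUSpaRse)
library(EnsDb.Hsapiens.v86)
# Return the tr2g data frame only: no transcriptome extraction, nothing written to disk
tr2g_EnsDb(EnsDb.Hsapiens.v86, get_transcriptome = FALSE, write_tr2g = FALSE,
           use_transcript_version = FALSE, use_gene_version = FALSE)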