getGenomeAndAnnotation {ORFik} | R Documentation |
Creates an R transcript database (TxDb object) from the annotation.
It also indexes the genome for you.
If you misspelled something or the run crashed, delete the wrong files and
run again.
Set remake = TRUE to do it all over again.
getGenomeAndAnnotation(
  organism,
  output.dir,
  db = "ensembl",
  GTF = TRUE,
  genome = TRUE,
  phix = FALSE,
  ncRNA = "",
  tRNA = "",
  rRNA = "",
  gunzip = TRUE,
  remake = FALSE,
  assembly_type = "primary_assembly"
)
organism |
scientific name of the organism, e.g. Homo sapiens, Danio rerio, Mus musculus, etc. |
output.dir |
directory to save downloaded data |
db |
database to use for genome and GTF, advised default: "ensembl" (will contain haplotypes, large file!). Alternatives: "refseq" (primary assembly) and "genbank" (mix) |
GTF |
logical, default: TRUE, download the GTF of the organism specified
in the "organism" argument. If FALSE, check if the downloaded
file already exists. If you want to use a custom GTF from your hard drive,
set GTF = FALSE,
and assign: annotation["gtf"] = "path/to/gtf.gtf" |
genome |
logical, default: TRUE, download the genome of the organism
specified in the "organism" argument. If FALSE, check if the downloaded
file already exists. If you want to use a custom genome from your hard drive,
set genome = FALSE,
and assign: annotation["genome"] = "path/to/genome.fasta" |
phix |
logical, default FALSE, download the phiX sequence to filter out with. Only use for Illumina sequencing data. PhiX is used in Illumina sequencers for sequencing quality control. Genome is: refseq, Escherichia virus phiX174 |
ncRNA |
character, default "" (no download), a contaminant genome: the ncRNA sequence to filter out with, downloaded from the NONCODE online server. Alternatives: "auto" or a manual assignment like "human". If "auto", it will try to find the ncRNA file from the organism (Homo sapiens -> human, etc.). "auto" will not work for all organisms; in that case you must specify the name used by NONCODE. If not "auto" or "", it must be a character vector with the species common name, not the scientific name (Homo sapiens is human, Rattus norvegicus is rat, etc.). If you cannot find the common name, see: http://www.noncode.org/download.php/ |
tRNA |
character, default "" (not used), if not "" it must be a character vector with a valid path to a fasta file of mature tRNAs on your disk, to remove as contaminants. Find and download the tRNAs you want at: http://gtrnadb.ucsc.edu/, or run tRNAscan-SE on your genome. |
rRNA |
character, default "" (not used), if not "" it must be a character vector with a valid path to a fasta file of mature rRNAs on your disk, to remove as contaminants. Find and download the rRNAs you want at: https://www.arb-silva.de/ |
gunzip |
logical, default TRUE, uncompress downloaded files that are zipped when downloaded, should be TRUE! |
remake |
logical, default: FALSE, if TRUE remake everything specified |
assembly_type |
a character string specifying which assembly type
the genome should be retrieved from (ensembl only, otherwise this argument is ignored).
Default is "primary_assembly" |
If you want to use a custom genome or GTF from your hard drive, assign it
after you run this function, like this:
annotation <- getGenomeAndAnnotation(organism, output.dir, GTF = FALSE, genome = FALSE)
annotation["genome"] = "path/to/genome.fasta"
annotation["gtf"] = "path/to/gtf.gtf"
a character vector of paths to the downloaded genome and GTF, and additional contaminants if used.
Other STAR: STAR.align.folder(), STAR.align.single(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), install.fastp()
output.dir <- "/Bio_data/references/zebrafish"
#getGenomeAndAnnotation("Danio rerio", output.dir)
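A call that also requests contaminants might look like the sketch below. The argument names and values are taken from the usage and argument descriptions above; the output directory and the `run_download` switch are hypothetical and only there so the example does not start a large download by accident.

```r
# Arguments as documented above; the output directory is a placeholder.
args <- list(
  organism = "Homo sapiens",
  output.dir = "/Bio_data/references/human",
  db = "ensembl",
  phix = TRUE,     # only relevant for Illumina sequencing data
  ncRNA = "auto"   # documented mapping: Homo sapiens -> human
)

# The call downloads large files, so it only runs when explicitly enabled
# and when the ORFik package is installed.
run_download <- FALSE
if (run_download && requireNamespace("ORFik", quietly = TRUE)) {
  annotation <- do.call(ORFik::getGenomeAndAnnotation, args)
  print(annotation)
}
```

If a previous run crashed or used a misspelled organism, adding `remake = TRUE` to the list would redo everything, as described at the top of this page.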