This function parses the output of the seqbuster tool so that isomiRs/miRNAs can be analyzed across samples in different groups, including characterization, differential expression and clustering. It creates an isomiRs::IsomirDataSeq object.

IsomirDataSeqFromFiles(files, coldata, rate = 0.2, canonicalAdd = TRUE,
  uniqueMism = TRUE, uniqueHits = FALSE, design = ~1L,
  minHits = 1L, header = TRUE, skip = 0, quiet = TRUE, ...)

IsomirDataSeqFromRawData(rawdata, coldata, design = ~1L, pct = 0.1,
  n_snv = 1, whitelist = NULL, ...)

Arguments

files

files with the output of the seqbuster tool

coldata

data frame containing groups for each sample

rate

minimum counts fraction to consider a mismatch a real mutation

canonicalAdd

boolean; if TRUE, only keep A/T non-template additions. All non-template nucleotides at the 3' end will be removed if they contain C/G nts.

uniqueMism

boolean; if TRUE, only keep mutations that have a unique hit to one miRNA molecule. For instance, if the sequence maps to two different miRNAs, it is removed.

uniqueHits

boolean indicating whether to filter out ambiguous sequences.

design

a formula to pass to DESeq2::DESeqDataSet

minHits

Minimum number of reads in the sample to consider it in the final matrix.

header

boolean indicating whether the files contain headers

skip

number of lines to skip at the start of each file when reading it

quiet

boolean indicating whether to suppress messages while reading files. Default TRUE.

...

arguments provided to SummarizedExperiment, including rowData.

rawdata

data.frame stored in the metadata slot of an IsomirDataSeq object.

pct

numeric used to remove isomiRs with an importance lower than this value. Importance is calculated by dividing the isomiR count by the total counts of the miRNA to which it maps.

n_snv

numeric used to remove isomiRs with more than this number of single nucleotide variants (indels are counted here).

whitelist

character vector with sequences to keep even if the filtering step would have removed them. They have to match the seq column in the table.
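
The pct and whitelist filters described above can be sketched in base R. This is only an illustration of the arithmetic, not the package's internal code: the toy table and the column names mir and count are assumptions made for the example (only the seq column is named in this page).

```r
# Hypothetical toy table: one row per isomiR sequence.
# Column names `mir` and `count` are assumptions for illustration.
raw <- data.frame(
  mir   = c("miR-A", "miR-A", "miR-A", "miR-B", "miR-B"),
  seq   = c("s1", "s2", "s3", "s4", "s5"),
  count = c(90, 8, 2, 50, 50),
  stringsAsFactors = FALSE
)

# Importance = isomiR count / total counts of the miRNA it maps to.
total <- ave(raw$count, raw$mir, FUN = sum)
raw$importance <- raw$count / total

pct <- 0.1            # same default as in the usage above
whitelist <- c("s3")  # sequences to keep regardless of importance

# Keep isomiRs at or above the threshold, plus whitelisted sequences.
keep <- raw$importance >= pct | raw$seq %in% whitelist
filtered <- raw[keep, ]
filtered
```

In this toy case s2 (importance 0.08) is dropped, while s3 (importance 0.02) survives only because it is whitelisted.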

Value

IsomirDataSeq class object.

Details

This function parses the output of http://seqcluster.readthedocs.org/mirna_annotation.html for each sample to create a count matrix for isomiRs, miRNAs or isomiRs grouped in types (i.e., all sequences with variations at 5' but ignoring any other type). It creates an isomiRs::IsomirDataSeq object (see the documentation of that class for example usage) to allow visualization, queries, differential expression analysis and clustering. To create the isomiRs::IsomirDataSeq object, it parses the isomiR files and generates an initial matrix containing all isomiRs detected among the samples. It also creates a summary for each isomiR type (trimming, addition and substitution) to visualize the general distribution of isomiRs.
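
As a rough illustration of what "an initial matrix having all isomiRs detected among samples" means, the sketch below merges two per-sample tables into one count matrix in base R and then collapses it to miRNA level. The tables, the isomiR naming scheme and the column names are hypothetical; the package's own parsing and naming may differ.

```r
# Hypothetical per-sample tables; isomiR labels are made up for illustration.
s1 <- data.frame(isomir = c("miR-A", "miR-A;iso_5p:+1", "miR-B"),
                 count  = c(100, 10, 40), stringsAsFactors = FALSE)
s2 <- data.frame(isomir = c("miR-A", "miR-B", "miR-B;iso_3p:-1"),
                 count  = c(80, 55, 5), stringsAsFactors = FALSE)
samples <- list(f1 = s1, f2 = s2)

# Union of all isomiRs seen in any sample.
all_ids <- sort(unique(unlist(lapply(samples, function(x) x$isomir))))

# One column per sample; isomiRs absent from a sample get a count of 0.
counts <- sapply(samples, function(x) {
  v <- setNames(numeric(length(all_ids)), all_ids)
  v[x$isomir] <- x$count
  v
})

# Collapse to miRNA level by summing rows that share the same miRNA name.
mirna <- rowsum(counts, group = sub(";.*", "", rownames(counts)))
counts
mirna
```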

Examples

path <- system.file("extra", package="isomiRs")
fn_list <- list.files(path, full.names = TRUE)
de <- data.frame(row.names=c("f1" , "f2"), condition = c("newborn", "newborn"))
ids <- IsomirDataSeqFromFiles(fn_list, coldata=de)
head(counts(ids))
IsomirDataSeqFromRawData(metadata(ids)[["rawData"]], de)