getCountsByPositions {BRGenomics} | R Documentation |
Get the sum of the signal in dataset.gr that overlaps each position within each range in regions.gr. If binning is used (i.e. positions are wider than 1 bp), any function can be used to summarize the signal overlapping each bin. For a description of the critical difference between expand_ranges = FALSE and expand_ranges = TRUE, see getCountsByRegions.
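The idea of summarizing binned signal with an arbitrary function can be illustrated in base R. This is a minimal sketch of the concept only, not the BRGenomics implementation: `bin_signal` is a hypothetical helper, the per-base counts are made up, and the sketch assumes the region width is an exact multiple of the bin size.

```r
# Hypothetical helper (not part of BRGenomics): summarize a per-base
# signal vector in fixed-size bins using an arbitrary function.
bin_signal <- function(signal, binsize = 1, FUN = sum) {
  # assumption: the region width is an exact multiple of binsize
  stopifnot(length(signal) %% binsize == 0)
  # fill column-wise: one column per bin, one row per base within the bin
  bins <- matrix(signal, nrow = binsize)
  apply(bins, 2, FUN)
}

signal <- c(0, 2, 1, 0, 0, 3, 0, 0, 4, 1, 0, 0)  # made-up per-base counts

by_sum  <- bin_signal(signal, binsize = 4)              # 3 3 5
by_mean <- bin_signal(signal, binsize = 4, FUN = mean)  # 0.75 0.75 1.25
```

With `binsize = 1` each bin holds a single base, so summing returns the per-base signal unchanged.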
getCountsByPositions(
  dataset.gr,
  regions.gr,
  binsize = 1,
  FUN = sum,
  simplify.multi.widths = c("error", "list", "pad 0", "pad NA"),
  field = "score",
  NF = NULL,
  blacklist = NULL,
  NA_blacklisted = FALSE,
  melt = FALSE,
  expand_ranges = FALSE,
  ncores = getOption("mc.cores", 2L)
)
dataset.gr |
A GRanges object in which signal is contained in metadata (typically in the "score" field), or a named list of such GRanges objects. |
regions.gr |
A GRanges object containing regions of interest. |
binsize |
Size of bins (in bp) to use for counting within each range of
|
FUN |
If |
simplify.multi.widths |
A string indicating the output format if the
ranges in |
field |
The metadata field of |
NF |
An optional normalization factor by which to multiply the counts.
If given, |
blacklist |
An optional GRanges object containing regions that should be excluded from signal counting. |
NA_blacklisted |
A logical indicating if NA values should be returned
for blacklisted regions. By default, signal in the blacklisted sites is
ignored, i.e. the reads are excluded. If |
melt |
A logical indicating if the count matrices should be melted. If
set to |
expand_ranges |
Logical indicating if ranges in |
ncores |
Multiple cores will only be used if |
If the widths of all ranges in regions.gr are equal, a matrix is returned that contains a row for each region of interest, and a column for each position (each base if binsize = 1) within each region. If dataset.gr is a list, a parallel list is returned containing a matrix for each input dataset.

If the input regions.gr contains ranges of varying widths, setting simplify.multi.widths = "list" will output a list of variable-length vectors, with each vector corresponding to an individual input region. If simplify.multi.widths = "pad 0" or "pad NA", the output is a matrix containing a row for each range in regions.gr, but the number of columns is determined by the largest range in regions.gr. For each region of interest, columns that correspond to positions outside of the input range are set, depending on the argument, to 0 or NA.
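The relationship between the "list" output and the padded matrix outputs can be pictured with a small base R sketch. This mimics the output shapes described above and is not the package's own code: `pad_counts` is a hypothetical helper, the counts are made up, and placing the padding at the end of each row is an assumption of this sketch.

```r
# Hypothetical helper (not part of BRGenomics): pad per-region count
# vectors of differing lengths into a single matrix.
pad_counts <- function(counts_list, pad = 0) {
  ncol_out <- max(lengths(counts_list))  # widest region sets the column count
  padded <- lapply(counts_list, function(x) {
    # assumption: positions outside the region are appended at the end
    c(x, rep(pad, ncol_out - length(x)))
  })
  do.call(rbind, padded)  # one row per region
}

# made-up counts for three regions of widths 3, 2 and 4
counts_list <- list(c(1, 0, 2), c(0, 5), c(3, 0, 0, 1))

mat_0  <- pad_counts(counts_list, pad = 0)   # analogous to "pad 0"
mat_na <- pad_counts(counts_list, pad = NA)  # analogous to "pad NA"
```

Padding with NA keeps positions outside a region distinguishable from true zero counts, which matters if column-wise summaries are later computed with `na.rm = TRUE`.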
Mike DeBerardine
getCountsByRegions, metaSubsample
data("PROseq") # load included PROseq data
data("txs_dm6_chr4") # load included transcripts

#--------------------------------------------------#
# counts from 0 to 50 bp after the TSS
#--------------------------------------------------#

txs_pr <- promoters(txs_dm6_chr4, 0, 50) # first 50 bases
countsmat <- getCountsByPositions(PROseq, txs_pr)
countsmat[10:15, 41:50] # show only 41-50 bp after TSS

#--------------------------------------------------#
# redo with 10 bp bins from 0 to 100
#--------------------------------------------------#

# column 5 is sums of rows shown above

txs_pr <- promoters(txs_dm6_chr4, 0, 100)
countsmat <- getCountsByPositions(PROseq, txs_pr, binsize = 10)
countsmat[10:15, ]

#--------------------------------------------------#
# same as the above, but with the average signal in each bin
#--------------------------------------------------#

countsmat <- getCountsByPositions(PROseq, txs_pr, binsize = 10, FUN = mean)
countsmat[10:15, ]

#--------------------------------------------------#
# standard deviation of signal in each bin
#--------------------------------------------------#

countsmat <- getCountsByPositions(PROseq, txs_pr, binsize = 10, FUN = sd)
round(countsmat[10:15, ], 1)