bind_rows {tidySummarizedExperiment} | R Documentation |
filter() retains the rows where the conditions you provide are TRUE. Note that, unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
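The sketch below shows the NA behaviour using dplyr (assumed ≥ 1.0) on a small made-up data frame. The data frame and its columns are hypothetical, and a plain data frame stands in for a tidySummarizedExperiment object.

```r
library(dplyr)

# Hypothetical toy data: the second value of x is missing
df <- data.frame(id = 1:3, x = c(5, NA, 1))

# The condition x > 2 evaluates to TRUE, NA, FALSE
base_out  <- df[df$x > 2, ]          # base [ returns an all-NA row for the NA
dplyr_out <- df %>% filter(x > 2)    # filter() drops the NA row

nrow(base_out)
nrow(dplyr_out)
```

Here base_out has two rows, and the second one is all NA. dplyr_out has only the row with id 1.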
Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.
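The following is a minimal sketch of grouping and ungrouping with dplyr on a hypothetical plain data frame. It assumes the tidySummarizedExperiment methods follow the same semantics, which this sketch does not check.

```r
library(dplyr)

# Hypothetical toy data with one grouping variable g
df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 10))

grouped <- df %>% group_by(g)
group_vars(grouped)   # the grouping variable(s)
n_groups(grouped)     # number of groups

# Inside a grouped tbl, mean(v) is computed separately for each group
with_mean <- grouped %>% mutate(group_mean = mean(v))

# ungroup() removes the grouping; later operations see the whole table
plain <- ungroup(with_mean)
group_vars(plain)
```

After this runs, with_mean$group_mean is 1.5, 1.5, 10, and group_vars(plain) is an empty character vector.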
summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
summarise() and summarize() are synonyms.
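To illustrate the output shape (one row per group, one column per grouping variable plus one per summary), here is a small sketch using dplyr on hypothetical data.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 10))

# One row per group: columns g, total, n
by_group <- df %>% group_by(g) %>% summarise(total = sum(v), n = n())

# No grouping variables: a single row summarising all observations
overall <- df %>% summarise(total = sum(v))

# summarize() is the same function under another name
same <- df %>% summarize(total = sum(v))
```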
mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.
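The sketch below shows the three rules on a hypothetical data frame, using dplyr: mutate() keeps columns, transmute() drops them, and NULL removes a column. It also relies on mutate() evaluating its expressions in order, so the new column c is computed from the original b before b is overwritten.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(a = 1:3, b = c(10, 20, 30))

# mutate(): adds c, then overwrites b; a and b are kept in place
m <- df %>% mutate(c = a + b, b = b / 10)

# transmute(): only the expressions given survive
tm <- df %>% transmute(c = a + b)

# Setting a variable to NULL removes it
r <- df %>% mutate(b = NULL)
```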
Rename individual variables using new_name = old_name syntax.
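Here is a minimal example of the new_name = old_name form, using dplyr on a hypothetical data frame whose column names echo the commented-out rename(cond = condition) example.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(condition = c("treated", "untreated"), counts = c(5, 7))

# new_name = old_name: the column keeps its position and values
renamed <- df %>% rename(cond = condition)
names(renamed)
```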
See this repository for alternative ways to perform row-wise operations.
slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:
slice_head() and slice_tail() select the first or last rows.
slice_sample() randomly selects rows.
slice_min() and slice_max() select the rows with the lowest or highest values, respectively, of a variable.
If .data is a grouped_df, the operation will be performed on each group, so that (e.g.) slice_head(df, n = 5) will select the first five rows in each group.
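The slice helpers can be sketched on a hypothetical plain data frame with dplyr (assumed ≥ 1.0, where slice_head(), slice_min() and slice_max() exist).

```r
library(dplyr)

# Hypothetical toy data: two groups, a and b
df <- data.frame(g = c("a", "a", "a", "b", "b"), v = c(3, 1, 2, 9, 8))

dup     <- slice(df, c(1, 1, 3))   # select (and duplicate) rows by position
dropped <- slice(df, -1)           # negative positions remove rows
head2   <- slice_head(df, n = 2)   # first two rows
lowest  <- slice_min(df, v, n = 1) # row with the lowest v
highest <- slice_max(df, v, n = 1) # row with the highest v

# On a grouped data frame the helper runs once per group
per_group <- df %>% group_by(g) %>% slice_head(n = 1)
```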
Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.
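The following sketch covers a name range, a predicate, and renaming while selecting, on a hypothetical data frame. It assumes dplyr ≥ 1.0, where a predicate such as is.numeric is wrapped in the tidyselect helper where().

```r
library(dplyr)

# Hypothetical one-row data frame with mixed column types
df <- data.frame(a = 1, b = "x", c = 2.5, d = TRUE, e = 3L, f = "y")

range_cols <- df %>% select(a:c)                 # a, b, c
num_cols   <- df %>% select(where(is.numeric))   # a, c, e (d is logical)
renamed    <- df %>% select(first = a, f)        # rename while selecting
```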
sample_n() and sample_frac() have been superseded in favour of slice_sample(). While they will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternative.
These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac():
The connection to slice() was not obvious.
The name of the first argument, tbl, is inconsistent with other single table verbs which use .data.
The size argument uses tidy evaluation, which is surprising and undocumented.
It was easier to remove the deprecated .env argument.
The ... argument was in a suboptimal position.
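This sketch maps the superseded calls onto slice_sample(), whose n and prop arguments are the "two mutually exclusive arguments" mentioned above. It uses dplyr on a hypothetical 100-row data frame. The seed only makes the sketch reproducible.

```r
library(dplyr)

# Hypothetical data: 100 rows
df <- data.frame(id = 1:100)
set.seed(1)

# Superseded forms
old_n    <- sample_n(df, 10)
old_frac <- sample_frac(df, 0.1)

# Preferred: one function, with either n or prop
new_n    <- slice_sample(df, n = 10)
new_frac <- slice_sample(df, prop = 0.1)

# Sampling with replacement (e.g. a bootstrap resample)
boot <- slice_sample(df, n = 100, replace = TRUE)
```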
pull() is similar to $. It's mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output.
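Here is a minimal comparison of $ and pull(), including the optional naming, on a hypothetical data frame. It assumes dplyr ≥ 1.0, which added the name argument.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(feature = c("g1", "g2"), counts = c(5, 7))

v1 <- df$counts
v2 <- df %>% pull(counts)
v3 <- df %>% pull(-1)                      # negative position: last column
named <- df %>% pull(counts, name = feature)  # values named by feature
```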
bind_rows(..., .id = NULL, add.cell.ids = NULL)
bind_cols(..., .id = NULL)
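As a sketch of what .id does, here is dplyr's bind_rows() on hypothetical plain data frames. The tidySummarizedExperiment method binds SummarizedExperiment objects, and its add.cell.ids argument is not exercised here.

```r
library(dplyr)

# Hypothetical inputs
a <- data.frame(x = 1:2)
b <- data.frame(x = 3L)

# Named arguments become the labels in the .id column
stacked <- bind_rows(first = a, second = b, .id = "source")

# bind_cols() places data frames side by side (same number of rows)
side <- bind_cols(a, data.frame(y = c("p", "q")))
```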
Argument | Description |
... | For use by methods. |
.id | Data frame identifier. When |
add.cell.ids | from SummarizedExperiment 3.0 A character vector of length(x=c(x, y)). Appends the corresponding values to the start of each object's cell names. |
.keep_all | If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. (See dplyr) |
.preserve | when |
.add | When This argument was previously called |
.drop | When |
data | Input data frame. |
x | tbls to join. (See dplyr) |
y | tbls to join. (See dplyr) |
by | A character vector of variables to join by. (See dplyr) |
copy | If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. (See dplyr) |
suffix | If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. (See dplyr) |
tbl | A data.frame. |
size | < |
replace | Sample with or without replacement? |
weight | < |
.env | DEPRECATED. |
.data | A tidySummarizedExperiment object or any data frame |
name | An optional parameter that specifies the column to be used as names for a named vector. Specified in a similar manner as |
dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.
rowwise() is used for the results of do() when you create list-variables. It is also useful to support arbitrary complex operations that need to be applied to each row.
Currently, rowwise grouping only works with data frames. Its main impact is to allow you to work with list-variables in summarise() and mutate() without having to use [[1]]. This makes summarise() on a rowwise tbl effectively equivalent to plyr::ldply().
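The sketch below shows the "without having to use [[1]]" point on a hypothetical data frame with a list-column. It assumes dplyr ≥ 1.0, where each list-column element is exposed directly inside a rowwise mutate().

```r
library(dplyr)

# Hypothetical data with a list-column
df <- data.frame(id = 1:2)
df$vals <- list(1:3, 4:10)

# Per row, vals is the vector stored in that cell, not a length-1 list
res <- df %>%
  rowwise() %>%
  mutate(n_vals = length(vals), total = sum(vals))
```

Without rowwise(), length(vals) would return the length of the whole list-column, which is 2.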
Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
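Here is a minimal sketch of the filter() plus row_number() equivalent, run on an in-memory hypothetical data frame. On an actual database backend you would typically also need to define an ordering, because, as noted, the rows have no intrinsic order.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(id = c(10, 20, 30, 40))

by_slice  <- df %>% slice(2:3)
by_filter <- df %>% filter(row_number() %in% 2:3)
```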
A tidySummarizedExperiment object
An object of the same type as .data.
Rows are a subset of the input, but appear in the same order.
Columns are not modified.
The number of groups may be reduced (if .preserve is not TRUE).
Data frame attributes are preserved.
A grouped data frame, unless the combination of ... and .add yields an empty set of grouping columns, in which case a regular (ungrouped) data frame is returned.
An object usually of the same type as .data.
The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
If .data is grouped by more than one variable, the output will be another grouped_df with the right-most group removed.
If .data is grouped by one variable, or is not grouped, the output will be a tibble.
Data frame attributes are not preserved, because summarise() fundamentally creates a new data frame.
An object of the same type as .data.
For mutate():
Rows are not affected.
Existing columns will be preserved unless explicitly modified.
New columns will be added to the right of existing columns.
Columns given value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
For transmute():
Rows are not affected.
Apart from grouping variables, existing columns will be removed unless explicitly kept.
Column order matches order of expressions.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
An object of the same type as .data.
Rows are not affected.
Column names are changed; column order is preserved.
Data frame attributes are preserved.
Groups are updated to reflect new names.
A tbl
A tbl
A tidySummarizedExperiment object
A tidySummarizedExperiment object
A tidySummarizedExperiment object
A tidySummarizedExperiment object
An object of the same type as .data. The output has the following properties:
Each row may appear 0, 1, or many times in the output.
Columns are not modified.
Groups are not modified.
Data frame attributes are preserved.
An object of the same type as .data. The output has the following properties:
Rows are not affected.
Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if the new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.
A tidySummarizedExperiment object
A vector the same size as .data.
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
The former keeps rows with mass greater than the global average, whereas the latter keeps rows with mass greater than the gender average.
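The ungrouped versus grouped comparison described above can be sketched with a small made-up data frame. The gender and mass columns mirror the text, but the values are hypothetical.

```r
library(dplyr)

# Hypothetical data: global mean mass is 72.5; group means are 55 (f) and 90 (m)
df <- data.frame(gender = c("f", "f", "m", "m"), mass = c(50, 60, 80, 100))

# Ungrouped: compare each row with the global mean
global <- df %>% filter(mass > mean(mass, na.rm = TRUE))

# Grouped: compare each row with its own gender's mean
per_gender <- df %>%
  group_by(gender) %>%
  filter(mass > mean(mass, na.rm = TRUE))
```

global keeps masses 80 and 100. per_gender keeps 60 and 100.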
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
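Here is a minimal sketch of the same contrast for mutate(), centring a hypothetical mass column on the global mean and on each group's mean. The data are made up.

```r
library(dplyr)

# Hypothetical data: global mean 72.5; group means 55 (f) and 90 (m)
df <- data.frame(gender = c("f", "f", "m", "m"), mass = c(50, 60, 80, 100))

# Ungrouped: mean(mass) is the global mean
ungrouped <- df %>% mutate(mass_centred = mass - mean(mass))

# Grouped: mean(mass) is recomputed within each gender
grouped <- df %>%
  group_by(gender) %>%
  mutate(mass_centred = mass - mean(mass))
```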
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
slice(): no methods found.
slice_head(): no methods found.
slice_tail(): no methods found.
slice_min(): no methods found.
slice_max(): no methods found.
slice_sample(): no methods found.
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: no methods found.
Range: min(), max(), quantile()
Count: n(), n_distinct()
The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.
This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
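The following sketch shows both sides of this behaviour on a hypothetical data frame with the data frame backend (dplyr). A summary can reuse an earlier summary, but reusing a column's name hides the original column from later expressions.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

# m is created and then reused within the same summary
res <- df %>% group_by(g) %>% summarise(m = mean(v), m_sq = m^2)

# Name clash: after v = mean(v), the later sd(v) sees the single summary
# value rather than the original column, so it returns NA
over <- df %>% summarise(v = mean(v), v_sd = sd(v))
```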
Use the three scoped variants (rename_all(), rename_if(), rename_at()) to rename a set of variables with a function.
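Here is a minimal sketch of the scoped variants on a hypothetical data frame. For comparison it includes rename_with(), which dplyr ≥ 1.0 provides as the newer way to rename with a function.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(condition = c("treated", "untreated"), counts = c(5, 7))

upper_all <- df %>% rename_all(toupper)                 # every column
upper_num <- df %>% rename_if(is.numeric, toupper)      # numeric columns only
upper_at  <- df %>% rename_at(vars(counts), toupper)    # named columns only

# Newer equivalent of rename_if(is.numeric, toupper)
with_fn <- df %>% rename_with(toupper, where(is.numeric))
```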
filter_all(), filter_if() and filter_at().
library(dplyr)
library(tidySummarizedExperiment)
`%>%` <- magrittr::`%>%`

tidySummarizedExperiment::pasilla %>% distinct(sample)

tidySummarizedExperiment::pasilla %>% filter(sample == "untrt1")
# Learn more in ?dplyr_tidy_eval

tidySummarizedExperiment::pasilla %>% group_by(sample)

tidySummarizedExperiment::pasilla %>% summarise(mean(counts))

tidySummarizedExperiment::pasilla %>% mutate(logcounts = log2(counts))

# tidySummarizedExperiment::pasilla %>%
#     rename(cond = condition)

tt <- tidySummarizedExperiment::pasilla
tt %>% left_join(tt %>% distinct(condition) %>% mutate(new_column = 1:2))
tt %>% inner_join(tt %>% distinct(condition) %>% mutate(new_column = 1:2) %>% slice(1))
tt %>% right_join(tt %>% distinct(condition) %>% mutate(new_column = 1:2) %>% slice(1))
tt %>% full_join(tibble::tibble(condition = "treated", dose = 10))

tidySummarizedExperiment::pasilla %>% slice(1)

tidySummarizedExperiment::pasilla %>% select(sample, feature, counts)

tidySummarizedExperiment::pasilla %>% sample_n(50)
tidySummarizedExperiment::pasilla %>% sample_frac(0.1)

tidySummarizedExperiment::pasilla %>% pull(feature)