bind_rows {tidySummarizedExperiment}    R Documentation

distinct

Description

filter() retains the rows where the conditions you provide are TRUE. Note that, unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
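The NA behaviour can be sketched on a plain data frame. This is a minimal illustration that assumes dplyr is installed; the object df and its columns are made up for the example and are not part of tidySummarizedExperiment.

```r
library(dplyr)

# Hypothetical toy data: one value of x is missing
df <- data.frame(id = 1:4, x = c(5, NA, 1, 8))

# filter() keeps only rows where the condition is TRUE,
# so the row where x is NA is dropped
kept_filter <- df %>% filter(x > 2)

# Base subsetting with [ returns a row of NAs where the condition is NA
kept_base <- df[df$x > 2, ]

nrow(kept_filter)
nrow(kept_base)
```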

Most data operations are done on groups defined by variables. group_by() takes an existing tbl and converts it into a grouped tbl where operations are performed "by group". ungroup() removes grouping.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

summarise() and summarize() are synonyms.
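As a small sketch of how grouping changes the shape of a summary, the following uses a plain data frame with dplyr (an assumption; the sample and counts columns here are toy values, not the pasilla data).

```r
library(dplyr)

# Hypothetical toy data with two samples
df <- data.frame(
  sample = c("a", "a", "b", "b"),
  counts = c(10, 20, 30, 50)
)

# Grouped: one output row per value of the grouping variable
by_sample <- df %>% group_by(sample)
per_sample <- by_sample %>% summarise(mean_counts = mean(counts))

# Ungrouped: a single row summarising all observations
overall <- by_sample %>% ungroup() %>% summarise(mean_counts = mean(counts))

per_sample
overall
```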

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.
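The differences between these behaviours can be sketched as follows, assuming dplyr and a made-up data frame (feature and counts are illustrative column names).

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(feature = c("g1", "g2"), counts = c(8, 32))

# mutate() keeps existing columns and appends the new one
added <- df %>% mutate(logcounts = log2(counts))

# transmute() keeps only the columns it creates
only_new <- df %>% transmute(logcounts = log2(counts))

# Reusing an existing name overwrites that column
overwritten <- df %>% mutate(counts = counts / 2)

# Setting a column to NULL removes it
dropped <- df %>% mutate(feature = NULL)
```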

Rename individual variables using new_name=old_name syntax.
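A minimal sketch of the new_name=old_name syntax, assuming dplyr and a made-up data frame:

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(
  sample = c("s1", "s2"),
  condition = c("treated", "untreated")
)

# The new name goes on the left, the existing name on the right
renamed <- df %>% rename(cond = condition)

names(renamed)
```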

See this repository for alternative ways to perform row-wise operations.

slice() lets you index rows by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases, such as slice_head().

If .data is a grouped_df, the operation will be performed on each group, so that (e.g.) slice_head(df, n=5) will select the first five rows in each group.
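The row operations described above might look like this on a plain data frame. This is a sketch that assumes dplyr 1.0.0 or later (for slice_head()); the data are made up.

```r
library(dplyr)

# Hypothetical toy data with two groups
df <- data.frame(group = c("a", "a", "a", "b", "b"), value = 1:5)

selected   <- df %>% slice(2:3)       # select rows 2 and 3
removed    <- df %>% slice(-1)        # remove the first row
duplicated <- df %>% slice(c(1, 1))   # duplicate the first row

# On a grouped data frame the helper is applied within each group
first_two <- df %>% group_by(group) %>% slice_head(n = 2)
```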

Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.
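A short sketch of the selection mini-language, assuming dplyr 1.0.0 or later and a made-up data frame; where() is used here to wrap the predicate function.

```r
library(dplyr)

# Hypothetical toy data with columns of different types
df <- data.frame(
  a = 1:2,
  b = c("x", "y"),
  c = c(0.5, 1.5),
  d = c(TRUE, FALSE)
)

range_cols   <- df %>% select(a:c)                 # all columns from a to c
renamed_cols <- df %>% select(first = a, c)        # select and rename
numeric_cols <- df %>% select(where(is.numeric))   # select by property
```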

[Superseded] sample_n() and sample_frac() have been superseded in favour of slice_sample(). While they will not be deprecated in the near future, retirement means that we will only perform critical bug fixes, so we recommend moving to the newer alternative.

These functions were superseded because we realised it was more convenient to have two mutually exclusive arguments to one function, rather than two separate functions. This also made it possible to clean up a few other smaller design issues with sample_n()/sample_frac().
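The two mutually exclusive arguments of slice_sample() are n and prop. The sketch below assumes dplyr 1.0.0 or later and uses a made-up data frame; whether every slice helper is supported on tidySummarizedExperiment objects in this package version is not confirmed here.

```r
library(dplyr)

set.seed(1)  # make the random sample reproducible

# Hypothetical toy data
df <- data.frame(id = 1:100)

# Counterpart of sample_n(df, 10)
by_count <- df %>% slice_sample(n = 10)

# Counterpart of sample_frac(df, 0.1)
by_fraction <- df %>% slice_sample(prop = 0.1)
```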

pull() is similar to $. It's mostly useful because it looks a little nicer in pipes, it also works with remote data frames, and it can optionally name the output.
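A minimal comparison of pull() and $, assuming dplyr and a made-up data frame:

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(feature = c("g1", "g2"), counts = c(8, 32))

# Equivalent to df$counts, but reads naturally at the end of a pipe
plain <- df %>% pull(counts)

# Optionally take the names of the output from another column
named <- df %>% pull(counts, name = feature)
```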

Usage

bind_rows(..., .id = NULL, add.cell.ids = NULL)

bind_cols(..., .id = NULL)

Arguments

...

For use by methods.

.id

Data frame identifier.

When .id is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.

add.cell.ids

From SummarizedExperiment 3.0: a character vector of length(x=c(x, y)). Appends the corresponding values to the start of each object's cell names.

.keep_all

If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values. (See dplyr)

.preserve

When FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise it is kept as is.

.add

When FALSE, the default, group_by() will override existing groups. To add to the existing groups, use .add=TRUE.

This argument was previously called add, but that prevented creating a new grouping variable called add, and conflicts with our naming conventions.

.drop

When .drop=TRUE, empty groups are dropped. See group_by_drop_default() for what the default value is for this argument.

data

Input data frame.

x

tbls to join. (See dplyr)

y

tbls to join. (See dplyr)

by

A character vector of variables to join by. (See dplyr)

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. (See dplyr)

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. (See dplyr)

tbl

A data.frame.

size

<tidy-select> For sample_n(), the number of rows to select. For sample_frac(), the fraction of rows to select. If tbl is grouped, size applies to each group.

replace

Sample with or without replacement?

weight

<tidy-select> Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

.env

DEPRECATED.

.data

A tidySummarizedExperiment object or any data frame

name

An optional parameter that specifies the column to be used as names for a named vector. Specified in a similar manner as var.
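As a rough illustration of the .id, .add, by and suffix arguments, the following sketch uses plain data frames with dplyr rather than tidySummarizedExperiment objects. All object and column names are made up for the example.

```r
library(dplyr)

# .id: label each row with the data frame it came from
first  <- data.frame(x = c(1, 2))
second <- data.frame(x = 3)
labelled <- bind_rows(a = first, b = second, .id = "source")   # labels from argument names
numbered <- bind_rows(list(first, second), .id = "source")     # no names: numeric sequence

# .add: add to the existing grouping instead of overriding it
df <- data.frame(
  sample = c("s1", "s1", "s2"),
  condition = c("t", "u", "t"),
  counts = c(1, 2, 3)
)
by_sample  <- df %>% group_by(sample)
overridden <- by_sample %>% group_by(condition)
extended   <- by_sample %>% group_by(condition, .add = TRUE)

# by and suffix: disambiguate non-joined columns that share a name
x <- data.frame(condition = c("treated", "untreated"), dose = c(10, 0))
y <- data.frame(condition = c("treated", "control"), dose = c(12, 1))
joined <- left_join(x, y, by = "condition", suffix = c("_x", "_y"))
```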

Details

dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.

rowwise() is used for the results of do() when you create list-variables. It is also useful to support arbitrary complex operations that need to be applied to each row.

Currently, rowwise grouping only works with data frames. Its main impact is to allow you to work with list-variables in summarise() and mutate() without having to use [[1]]. This makes summarise() on a rowwise tbl effectively equivalent to plyr::ldply().

Slice does not work with relational databases because they have no intrinsic notion of row order. If you want to perform the equivalent operation, use filter() and row_number().
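On a local data frame, the filter() and row_number() formulation gives the same rows as slice(). The sketch below assumes dplyr and a made-up data frame.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(id = c("a", "b", "c", "d"))

# Positional selection with slice()
sliced <- df %>% slice(2:3)

# The same selection expressed with filter() and row_number()
filtered <- df %>% filter(row_number() %in% 2:3)
```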

Value

A tidySummarizedExperiment object

An object of the same type as .data.

A grouped data frame, unless the combination of ... and .add yields an empty set of grouping columns, in which case a regular (ungrouped) data frame is returned.

An object usually of the same type as .data.

For mutate() and transmute(): an object of the same type as .data.

A tbl

A vector the same size as .data.

Useful filter functions

Grouped tibbles

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:

The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.
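The comparison between ungrouped and grouped filtering can be sketched as follows. The data frame people and its values are made up for the example; only the column names mass and gender follow the text above.

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(
  gender = c("f", "f", "m", "m"),
  mass = c(50, 70, 80, 120)
)

# Ungrouped: compare each mass with the global average (80)
above_global <- people %>%
  filter(mass > mean(mass, na.rm = TRUE))

# Grouped: compare each mass with the average of its own gender
# (60 for "f", 100 for "m")
above_gender <- people %>%
  group_by(gender) %>%
  filter(mass > mean(mass, na.rm = TRUE))
```

With these values the ungrouped call keeps only the row with mass 120, while the grouped call keeps the rows with mass 70 and 120.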

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
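The same contrast for mutate() can be sketched with a made-up data frame (dplyr assumed; the column mass_centered is a hypothetical name chosen for the example).

```r
library(dplyr)

# Hypothetical toy data
people <- data.frame(
  gender = c("f", "f", "m", "m"),
  mass = c(50, 70, 80, 120)
)

# Ungrouped: centre on the global mean (80)
centered_global <- people %>%
  mutate(mass_centered = mass - mean(mass, na.rm = TRUE))

# Grouped: centre on the mean of each gender (60 and 100)
centered_gender <- people %>%
  group_by(gender) %>%
  mutate(mass_centered = mass - mean(mass, na.rm = TRUE))
```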

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages:

These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: no methods found.

Useful functions

Backend variations

The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.

This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
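A small sketch of this behaviour on the data frame backend, assuming dplyr; the data and summary names are made up for the example.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(x = c(1, 2, 3, 6))

# A summary variable can be reused later in the same call
reused <- df %>% summarise(avg = mean(x), avg_sq = avg^2)

# Reusing the name x overwrites the column, so sd() sees the single
# summarised value rather than the original data and returns NA
overwritten <- df %>% summarise(x = mean(x), spread = sd(x))

# Using a new name keeps the original column available
safe <- df %>% summarise(mean_x = mean(x), spread = sd(x))
```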

Useful mutate functions

Scoped selection and renaming

Use the three scoped variants (rename_all(), rename_if(), rename_at()) to rename a set of variables with a function.
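The three scoped variants can be sketched on a plain data frame as follows. dplyr is assumed and the data are made up; in recent dplyr releases these variants are superseded but still available.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(sample = c("s1", "s2"), counts = c(8, 32))

# Apply a function to every column name
all_upper <- df %>% rename_all(toupper)

# Apply it only to columns that satisfy a predicate
numeric_upper <- df %>% rename_if(is.numeric, toupper)

# Apply it only to the columns listed in vars()
chosen_upper <- df %>% rename_at(vars(sample), toupper)
```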

See Also

filter_all(), filter_if() and filter_at().

Examples


# Attach the package so that the verbs below have methods for
# SummarizedExperiment objects, and make the pipe available
library(tidySummarizedExperiment)
`%>%` <- magrittr::`%>%`

# Example dataset shipped with the package
tt <- tidySummarizedExperiment::pasilla

# distinct(): unique values of a column
tt %>%
    distinct(sample)

# filter(): keep the rows of a single sample
tt %>%
    filter(sample == "untrt1")

# group_by(): learn more in ?dplyr_tidy_eval
tt %>%
    group_by(sample)

# summarise(): one summary value over all rows
tt %>%
    summarise(mean(counts))

# mutate(): add a derived column
tt %>%
    mutate(logcounts=log2(counts))

# rename(): new_name=old_name
# tt %>%
#     rename(cond=condition)

# Joins against a small table derived from the data
tt %>% left_join(tt %>% distinct(condition) %>% mutate(new_column=1:2))
tt %>% inner_join(tt %>% distinct(condition) %>% mutate(new_column=1:2) %>% slice(1))
tt %>% right_join(tt %>% distinct(condition) %>% mutate(new_column=1:2) %>% slice(1))
tt %>% full_join(tibble::tibble(condition="treated", dose=10))

# slice(): select rows by position
tt %>%
    slice(1)

# select(): keep a subset of columns
tt %>%
    select(sample, feature, counts)

# sample_n() and sample_frac(): random subsets of rows
tt %>%
    sample_n(50)
tt %>%
    sample_frac(0.1)

# pull(): extract a single column as a vector
tt %>%
    pull(feature)

[Package tidySummarizedExperiment version 1.4.0 Index]