unnest {tidySummarizedExperiment}    R Documentation

unnest

Description

Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.

pivot_longer() "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is pivot_wider().

Learn more in vignette("pivot").

pivot_wider() "widens" data, increasing the number of columns and decreasing the number of rows. The inverse transformation is pivot_longer().

Learn more in vignette("pivot").

unite() is a convenience function that pastes multiple columns together into one.

Given either a regular expression or a vector of character positions, separate() turns a single character column into multiple columns.

Arguments

keep_empty

See tidyr::unnest

ptype

See tidyr::unnest

.drop

See tidyr::unnest

.id

See tidyr::unnest

.sep

See tidyr::unnest

.preserve

See tidyr::unnest

.data

A tbl. (See tidyr)

.names_sep

See ?tidyr::nest

into

Names of new variables to create as character vector. Use NA to omit the variable in the output.

regex

a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
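As a minimal sketch of how into and regex line up, assuming tidyr and tibble are installed: the toy tibble below is hypothetical (its type column only mirrors the pasilla example), and a plain tibble stands in for a tidySummarizedExperiment object.

```r
library(tidyr)

# Hypothetical toy data, not taken from the package.
df <- tibble::tibble(type = c("single_end", "paired_end", NA, "other"))

# One capturing group in `regex`, so `into` holds exactly one name.
res <- extract(df, type, into = "sequencing", regex = "([a-z]*)_end")

# Rows that are NA or do not match the pattern come back as NA.
res$sequencing
```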

convert

If TRUE, will run type.convert() with as.is=TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.
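The effect of convert, including the handling of the string "NA", can be seen on a small hypothetical tibble (a sketch assuming tidyr and tibble are installed, not the package's own data):

```r
library(tidyr)

# Hypothetical values: the second entry contains the literal string "NA".
df <- tibble::tibble(x = c("n1", "nNA"))

# Without convert the new column stays character.
as_chr <- extract(df, x, into = "num", regex = "n(.*)")

# With convert = TRUE the column is passed through type.convert():
# "1" becomes an integer and the string "NA" becomes a missing value.
as_int <- extract(df, x, into = "num", regex = "n(.*)", convert = TRUE)

class(as_chr$num)
class(as_int$num)
```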

cols

<tidy-select> Columns to pivot into longer format.

names_to

A string specifying the name of the column to create from the data stored in the column names of data.

Can be a character vector, creating multiple columns, if names_sep or names_pattern is provided. In this case, there are two special values you can take advantage of:

  • NA will discard that component of the name.

  • .value indicates that component of the name defines the name of the column containing the cell values, overriding values_to.
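The two special values can be illustrated with a sketch on hypothetical data (column names x_1, y_1 and so on are invented for the example; tidyr and tibble are assumed to be installed):

```r
library(tidyr)

# Hypothetical wide data: two measurements (x, y), two replicates each.
df <- tibble::tibble(
  id  = 1:2,
  x_1 = c(1, 2), x_2 = c(3, 4),
  y_1 = c(5, 6), y_2 = c(7, 8)
)

# ".value": the first name component ("x" or "y") becomes the output
# column name; the second component is stored in `rep`.
long <- pivot_longer(df, cols = -id,
                     names_to = c(".value", "rep"), names_sep = "_")

# NA: the first name component is discarded, only `rep` is kept.
long_x <- pivot_longer(df[c("id", "x_1", "x_2")], cols = -id,
                       names_to = c(NA, "rep"), names_sep = "_",
                       values_to = "x")
```

With ".value" the values_to argument is not needed, because the value column names come from the original column names.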

names_sep, names_pattern

If names_to contains multiple values, these arguments control how the column name is broken up.

names_sep takes the same specification as separate(), and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).

names_pattern takes the same specification as extract(), a regular expression containing matching groups (()).

If these arguments do not give you enough control, use pivot_longer_spec() to create a spec object and process manually as needed.
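For column names without a separator, names_pattern can do the splitting. The following is a sketch on a hypothetical tibble (wk1, wk2 are invented names), assuming tidyr and tibble are installed:

```r
library(tidyr)

# Hypothetical data with names made of a prefix and a number.
df <- tibble::tibble(id = 1:2, wk1 = c(5, 6), wk2 = c(7, 8))

# Two capturing groups, so names_to holds two names.
long <- pivot_longer(df, cols = -id,
                     names_to = c("unit", "index"),
                     names_pattern = "([a-z]+)([0-9]+)",
                     values_to = "value")

long$unit   # "wk" in every row
long$index  # "1" "2" "1" "2", still character
```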

names_repair

What happens if the output has invalid column names? The default, "check_unique", is to error if the columns are duplicated. Use "minimal" to allow duplicates in the output, or "unique" to de-duplicate by adding numeric suffixes. See vctrs::vec_as_names() for more options.

values_to

A string specifying the name of the column to create from the data stored in cell values. If names_to is a character containing the special .value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.

values_drop_na

If TRUE, will drop rows that contain only NAs in the values_to column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in data were created by its structure.

names_transform, values_transform

A list of column name-function pairs. Use these arguments if you need to change the type of specific columns. For example, names_transform=list(week=as.integer) would convert a character week variable to an integer.
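A sketch of names_transform on hypothetical data whose column names are week numbers (assuming tidyr and tibble are installed):

```r
library(tidyr)

# Hypothetical data: the non-syntactic names `1` and `2` are week numbers.
df <- tibble::tibble(id = 1:2, `1` = c(5, 6), `2` = c(7, 8))

# By default the new `week` column would be character;
# names_transform converts it to integer.
long <- pivot_longer(df, cols = -id,
                     names_to = "week", values_to = "value",
                     names_transform = list(week = as.integer))

class(long$week)
```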

names_ptypes, values_ptypes

A list of column name-prototype pairs. A prototype (or ptype for short) is a zero-length vector (like integer() or numeric()) that defines the type, class, and attributes of a vector. Use these arguments to confirm that the created columns are the types that you expect.

If not specified, the type of the columns generated from names_to will be character, and the type of the variables generated from values_to will be the common type of the input columns used to generate them.

id_cols

<tidy-select> A set of columns that uniquely identifies each observation. Defaults to all columns in data except for the columns specified in names_from and values_from. Typically used when you have redundant variables, i.e. variables whose values are perfectly correlated with existing variables.

names_from, values_from

<tidy-select> A pair of arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).

If values_from contains multiple columns, the name of each values_from column will be added to the front of the output column name.

names_sep

If names_from or values_from contains multiple variables, this will be used to join their values together into a single string to use as a column name.

names_prefix

String added to the start of every variable name. This is particularly useful if names_from is a numeric vector and you want to create syntactic variable names.

names_glue

Instead of names_sep and names_prefix, you can supply a glue specification that uses the names_from columns (and special .value) to create custom column names.
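As an illustration of names_prefix and names_glue, here is a sketch on a hypothetical long tibble (the feature and counts names only echo the pasilla example; tidyr and tibble are assumed to be installed):

```r
library(tidyr)

# Hypothetical long data: one row per id/feature pair.
df <- tibble::tibble(
  id      = c(1, 1, 2, 2),
  feature = c("g1", "g2", "g1", "g2"),
  counts  = c(10, 20, 30, 40)
)

# names_prefix puts a fixed string in front of every new column name.
prefixed <- pivot_wider(df, names_from = feature, values_from = counts,
                        names_prefix = "counts_")

# names_glue builds each name from a glue template instead.
glued <- pivot_wider(df, names_from = feature, values_from = counts,
                     names_glue = "{feature}_counts")

names(prefixed)
names(glued)
```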

names_sort

Should the column names be sorted? If FALSE, the default, column names are ordered by first appearance.

values_fill

Optionally, a (scalar) value that specifies what each value should be filled in with when missing.

This can be a named list if you want to apply different fill values to different value columns.
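A sketch of values_fill in its named-list form, on hypothetical data where one id/feature combination is absent (assuming tidyr and tibble are installed):

```r
library(tidyr)

# Hypothetical data: id 2 has no row for feature "g2".
df <- tibble::tibble(
  id      = c(1, 1, 2),
  feature = c("g1", "g2", "g1"),
  counts  = c(10, 20, 30)
)

# Without values_fill the missing cell is NA.
with_na <- pivot_wider(df, names_from = feature, values_from = counts)

# The list names refer to the values_from columns.
filled <- pivot_wider(df, names_from = feature, values_from = counts,
                      values_fill = list(counts = 0))

with_na$g2
filled$g2
```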

values_fn

Optionally, a function applied to the value in each cell in the output. You will typically use this when the combination of id_cols and value column does not uniquely identify an observation.

This can be a named list if you want to apply different aggregations to different value columns.
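When several rows map to the same output cell, values_fn decides how they are combined. The sketch below sums duplicates in a hypothetical tibble (assuming tidyr and tibble are installed; sum is just one possible choice):

```r
library(tidyr)

# Hypothetical data: id 1 has two rows for feature "g1".
df <- tibble::tibble(
  id      = c(1, 1, 1),
  feature = c("g1", "g1", "g2"),
  counts  = c(1, 2, 3)
)

# The list names refer to the values_from columns.
wide <- pivot_wider(df, names_from = feature, values_from = counts,
                    values_fn = list(counts = sum))

wide$g1  # the two duplicate rows are summed
wide$g2
```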

data

A data frame.

col

The name of the new column, as a string or symbol.

This argument is passed by expression and supports quasiquotation (you can unquote strings and symbols). The name is captured from the expression with rlang::ensym() (note that this kind of interface where symbols do not represent actual objects is now discouraged in the tidyverse; we support it here for backward compatibility).

...

<tidy-select> Columns to unite

na.rm

If TRUE, missing values will be removed prior to uniting each value.

remove

If TRUE, remove input columns from output data frame.
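The na.rm and remove arguments of unite() can be compared on a small hypothetical tibble (a sketch assuming tidyr and tibble are installed; columns a and b are invented):

```r
library(tidyr)

# Hypothetical character columns with one missing value.
df <- tibble::tibble(a = c("x", NA), b = c("y", "z"))

# Default: NA is pasted as the string "NA".
keep_na <- unite(df, "group", a, b)

# na.rm = TRUE drops missing values before pasting.
drop_na <- unite(df, "group", a, b, na.rm = TRUE)

# remove = FALSE keeps the input columns next to the new one.
kept <- unite(df, "group", a, b, na.rm = TRUE, remove = FALSE)

keep_na$group
drop_na$group
names(kept)
```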

sep

Separator between columns.

If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

If numeric, sep is interpreted as character positions to split at. Positive values start at 1 at the far-left of the string; negative values start at -1 at the far-right of the string. The length of sep should be one less than into.
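The character and numeric forms of sep behave as in this sketch, which uses hypothetical strings and assumes tidyr and tibble are installed:

```r
library(tidyr)

# Hypothetical strings to split.
by_regex <- separate(tibble::tibble(x = "cond.type"), x,
                     into = c("a", "b"))            # default sep: non-alphanumeric run
from_left <- separate(tibble::tibble(x = "abcd"), x,
                      into = c("a", "b"), sep = 2)  # split after the 2nd character
from_right <- separate(tibble::tibble(x = "abcd"), x,
                       into = c("a", "b"), sep = -1) # split before the last character

by_regex
from_left
from_right
```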

extra

If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:

  • "warn" (the default): emit a warning and drop extra values.

  • "drop": drop any extra values without a warning.

  • "merge": only splits at most length(into) times
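The options for extra can be compared on a hypothetical string with one piece too many (a sketch assuming tidyr and tibble are installed):

```r
library(tidyr)

# Hypothetical input: three pieces, but only two target columns.
df <- tibble::tibble(x = "a_b_c")

# "drop": the third piece is discarded silently.
dropped <- separate(df, x, into = c("p", "q"), sep = "_", extra = "drop")

# "merge": splitting stops early, so the last column keeps the remainder.
merged <- separate(df, x, into = c("p", "q"), sep = "_", extra = "merge")

dropped$q
merged$q
```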

fill

If sep is a character vector, this controls what happens when there are not enough pieces. There are three valid options:

  • "warn" (the default): emit a warning and fill from the right

  • "right": fill with missing values on the right

  • "left": fill with missing values on the left
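The fill options differ only in which side receives the missing values, as this sketch on hypothetical strings shows (assuming tidyr and tibble are installed):

```r
library(tidyr)

# Hypothetical input: the second string has only one piece.
df <- tibble::tibble(x = c("a_b", "c"))

# "right": the single piece goes to the first column.
right <- separate(df, x, into = c("p", "q"), sep = "_", fill = "right")

# "left": the single piece goes to the last column.
left <- separate(df, x, into = c("p", "q"), sep = "_", fill = "left")

right
left
```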

Details

pivot_longer() is an updated approach to gather(), designed to be both simpler to use and to handle more use cases. We recommend you use pivot_longer() for new code; gather() isn't going away but is no longer under active development.

pivot_wider() is an updated approach to spread(), designed to be both simpler to use and to handle more use cases. We recommend you use pivot_wider() for new code; spread() isn't going away but is no longer under active development.

Value

A tidySummarizedExperiment object or a tibble, depending on the input.

See Also

separate() to split up by a separator.

pivot_wider_spec() to pivot "by hand" with a data frame that defines a pivoting specification.

separate(), the complement.

unite(), the complement; extract(), which uses regular expression capturing groups.

Examples


# Attach the package (assumed to make %>% and the verbs below available)
library(tidySummarizedExperiment)
library(dplyr)

# nest() the data by condition, then unnest() it again
tidySummarizedExperiment::pasilla %>%
    nest(data=-condition) %>%
    unnest(data)

# nest() on its own
tidySummarizedExperiment::pasilla %>%
    nest(data=-condition)

# extract() the part of `type` before "_end" into a new column
tidySummarizedExperiment::pasilla %>%
    extract(type, into="sequencing", regex="([a-z]*)_end", convert=TRUE)

# pivot_longer(): see vignette("pivot") for examples and explanation
tidySummarizedExperiment::pasilla %>%
    pivot_longer(c(condition, type), names_to="name", values_to="value")

# pivot_wider(): see vignette("pivot") for examples and explanation
tidySummarizedExperiment::pasilla %>%
    pivot_wider(names_from=feature, values_from=counts)

# unite() condition and type into a single column
un <- tidySummarizedExperiment::pasilla %>%
    unite("group", c(condition, type))
un

# separate() the united column back into its two parts
un %>% separate(col=group, into=c("condition", "type"))

[Package tidySummarizedExperiment version 1.4.0 Index]