experimental.pivot.pivot_wider

experimental.pivot.pivot_wider(__data, id_cols=None, id_expand=False, names_from='name', names_prefix='', names_sep='_', names_glue=None, names_sort=None, names_vary='fastest', names_expand=False, names_repair='check_unique', values_from='value', values_fill=None, values_fn=None, unused_fn=None)

Pivot data from long to wide format.

This function splits a column, putting the pieces side-by-side based on an index.

Parameters

Name	Description	Default
`__data`	The input data.	required
`id_cols`	A selection of columns that uniquely identify each observation.	`None`
`id_expand`	Whether to ensure each unique combination of id_cols is a row in the data before pivoting, using `expand()`. This results in more rows. When True, this also sorts the final result by the `id_cols`.	`False`
`names_from`	A pair fo arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).	`'name'`
`values_from`	A pair fo arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from).	`'name'`
`names_prefix`	String added to the start of every variable name.	`''`
`names_sep`	If names_from or values_from contains multiple values, this will be used to join their values together into a single string to use as a column name.	`'_'`
`names_glue`	Instead of names_sep and names_prefix, supply a string template that uses the names_from columns (and a special .value variable) to create custom column names.	`None`
`names_sort`	Should the column names be sorted? The default is False, which results in column names ordered by first appearance.	`None`
`names_vary`	Option specifying how columns are ordered when names_from and values_from both identify new columns. “fastest” varies names_from fastest, while “slowest” varies names_from slowest.	`'fastest'`
`names_expand`	Whether to ensure all combinations of names_from columns are in the result using the `expand()` function. This results in more columns in the output.	`False`
`names_repair`	Strategy for fixing of invalid column names. “minimal” leaves them as is. “check_unique” raises an error if there are duplicate names. “unique” de-duplicates names by appending “___{position}” to them.	`'check_unique'`
`values_fill`	A scalar value used to fill in any missing values. Alternatively, a dictionary mapping column names to fill values.	`None`
`values_fn`	An optional function to apply to each cell of the output. This is useful when each cell would contain multiple values. E.g. values_fn=“max” would calculate the max value.	`None`
`unused_fn`	Not implemented.	`None`

Examples

>>> from siuba import _

>>> df = pd.DataFrame(
...    {"id": ["a", "b", "a"], "name": ["x", "x", "y"], "value": [1, 2, 3]}
... )
>>> df
  id name  value
0  a    x      1
1  b    x      2
2  a    y      3

>>> pivot_wider(df, names_from=_.name, values_from=_.value)
  id    x    y
0  a  1.0  3.0
1  b  2.0  NaN

>>> pivot_wider(df, names_from=_.name, values_from=_.value, values_fill=0)
  id  x  y
0  a  1  3
1  b  2  0

>>> many = pd.DataFrame({
...    "id": [1, 1, 2, 2],
...    "var": ["one", "two", "one", "two"],
...    "x": [1, 2, 3, 4],
...    "y": [6, 7, 8, 9]
... })
>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y])
   id  x_one  x_two  y_one  y_two
0   1      1      2      6      7
1   2      3      4      8      9

>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_vary="slowest")
   id  x_one  y_one  x_two  y_two
0   1      1      6      2      7
1   2      3      8      4      9

>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_sep=".")
   id  x.one  x.two  y.one  y.two
0   1      1      2      6      7
1   2      3      4      8      9

>>> glue = "{variable}_X_{value}"
>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_glue=glue)
   id  x_X_one  x_X_two  y_X_one  y_X_two
0   1        1        2        6        7
1   2        3        4        8        9

>>> from siuba.data import warpbreaks
>>> warpbreaks.head()
   breaks wool tension
0      26    A       L
1      30    A       L
2      54    A       L
3      25    A       L
4      70    A       L

>>> pivot_wider(warpbreaks, names_from=_.wool, values_from=_.breaks, values_fn="mean")
  tension          A          B
0       H  24.555556  18.777778
1       L  44.555556  28.222222
2       M  24.000000  28.777778