experimental.pivot.pivot_wider
experimental.pivot.pivot_wider(__data, id_cols=None, id_expand=False, names_from='name', names_prefix='', names_sep='_', names_glue=None, names_sort=None, names_vary='fastest', names_expand=False, names_repair='check_unique', values_from='value', values_fill=None, values_fn=None, unused_fn=None)
Pivot data from long to wide format.
This function splits a column, putting the pieces side-by-side based on an index.
Parameters
Name | Type | Description | Default |
---|---|---|---|
__data |
The input data. | required | |
id_cols |
A selection of columns that uniquely identify each observation. | None |
|
id_expand |
Whether to ensure each unique combination of id_cols is a row in the data before pivoting, using expand() . This results in more rows. When True, this also sorts the final result by the id_cols . |
False |
|
names_from |
A pair fo arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from). | 'name' |
|
values_from |
A pair fo arguments describing which column (or columns) to get the name of the output column (names_from), and which column (or columns) to get the cell values from (values_from). | 'name' |
|
names_prefix |
String added to the start of every variable name. | '' |
|
names_sep |
If names_from or values_from contains multiple values, this will be used to join their values together into a single string to use as a column name. | '_' |
|
names_glue |
Instead of names_sep and names_prefix, supply a string template that uses the names_from columns (and a special .value variable) to create custom column names. | None |
|
names_sort |
Should the column names be sorted? The default is False, which results in column names ordered by first appearance. | None |
|
names_vary |
Option specifying how columns are ordered when names_from and values_from both identify new columns. “fastest” varies names_from fastest, while “slowest” varies names_from slowest. | 'fastest' |
|
names_expand |
Whether to ensure all combinations of names_from columns are in the result using the expand() function. This results in more columns in the output. |
False |
|
names_repair |
Strategy for fixing of invalid column names. “minimal” leaves them as is. “check_unique” raises an error if there are duplicate names. “unique” de-duplicates names by appending “___{position}” to them. | 'check_unique' |
|
values_fill |
A scalar value used to fill in any missing values. Alternatively, a dictionary mapping column names to fill values. | None |
|
values_fn |
An optional function to apply to each cell of the output. This is useful when each cell would contain multiple values. E.g. values_fn=“max” would calculate the max value. | None |
|
unused_fn |
Not implemented. | None |
Examples
>>> from siuba import _
>>> df = pd.DataFrame(
"id": ["a", "b", "a"], "name": ["x", "x", "y"], "value": [1, 2, 3]}
... {
... )>>> df
id name value
0 a x 1
1 b x 2
2 a y 3
>>> pivot_wider(df, names_from=_.name, values_from=_.value)
id x y
0 a 1.0 3.0
1 b 2.0 NaN
>>> pivot_wider(df, names_from=_.name, values_from=_.value, values_fill=0)
id x y
0 a 1 3
1 b 2 0
>>> many = pd.DataFrame({
"id": [1, 1, 2, 2],
... "var": ["one", "two", "one", "two"],
... "x": [1, 2, 3, 4],
... "y": [6, 7, 8, 9]
...
... })>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y])
id x_one x_two y_one y_two
0 1 1 2 6 7
1 2 3 4 8 9
>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_vary="slowest")
id x_one y_one x_two y_two
0 1 1 6 2 7
1 2 3 8 4 9
>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_sep=".")
id x.one x.two y.one y.two
0 1 1 2 6 7
1 2 3 4 8 9
>>> glue = "{variable}_X_{value}"
>>> pivot_wider(many, names_from=_.var, values_from=_[_.x, _.y], names_glue=glue)
id x_X_one x_X_two y_X_one y_X_two
0 1 1 2 6 7
1 2 3 4 8 9
>>> from siuba.data import warpbreaks
>>> warpbreaks.head()
breaks wool tension0 26 A L
1 30 A L
2 54 A L
3 25 A L
4 70 A L
>>> pivot_wider(warpbreaks, names_from=_.wool, values_from=_.breaks, values_fn="mean")
tension A B0 H 24.555556 18.777778
1 L 44.555556 28.222222
2 M 24.000000 28.777778