experimental.pivot.pivot_longer

experimental.pivot.pivot_longer(__data, *cols, *, names_to='name', names_prefix=None, names_sep=None, names_pattern=None, names_ptypes=None, names_repair='check_unique', values_to='value', values_drop_na=False, values_ptypes=None, values_transform=None)

Pivot data from wide to long format.

This function stacks columns of data, turning them into rows.

Parameters

Name Type Description Default
__data The input data. required
*cols Columns to pivot into longer format. This uses tidyselect (e.g. _\[_.some_col, _.another_col\]). ()
names_to Union[str, Tuple[str, …]] A list specifying the new column or columns to create from the information stored in the column names of data specified by cols. 'name'
names_prefix Optional[str] A regular expression to strip off from the start of column selected by *cols. None
names_sep Optional[str] If names_to is a list of name parts, this is a separater the name is split on. This is the same as the sep argument in the separate() function. None
names_pattern Optional[str] If names_to is a list of name parts, this is a pattern to extract parts This is the same as the regex argument in the extract() function. None
names_ptypes Optional[Tuple] Not implemented. None
values_ptypes Optional[Tuple] Not implemented. None
names_transform TODO required
names_repair str Strategy for fixing of invalid column names. “minimal” leaves them as is. “check_unique” raises an error if there are duplicate names. “unique” de-duplicates names by appending “___{position}” to them. 'check_unique'
values_to str A string specifying the name of the column created to hold the stacked values of the selected *cols. If names_to is a list with the entry “.value”, then this argument is ignored. 'value'

Examples

>>> from siuba import _
>>> df = pd.DataFrame({"id": [1, 2], "x": [5, 6], "y": [7, 8]})
>>> pivot_longer(df, ~_.id, names_to="variable", values_to="number")
   id variable  number
0   1        x       5
0   1        y       7
1   2        x       6
1   2        y       8
>>> weeks = pd.DataFrame({"id": [1], "year": [2020], "wk1": [5], "wk2": [6]})
>>> pivot_longer(weeks, _.startswith("wk"), names_to="week", names_prefix="wk")
   id  year week  value
0   1  2020    1      5
0   1  2020    2      6
>>> df2 = pd.DataFrame({"id": [1], "a_x1": [2], "b_x2": [3], "a_y1": [4]})
>>> names = ["condition", "group", "number"]
>>> pat = "(.*)_(.)(.*)"
>>> pivot_longer(df2, _["a_x1":"a_y1"], names_to = names, names_pattern = pat)
   id condition group number  value
0   1         a     x      1      2
0   1         b     x      2      3
0   1         a     y      1      4
>>> names = ["x1", "x2", "y1", "y2"]
>>> wide = pd.DataFrame({
...    "x1": [1, 11], "x2": [2, 22], "y1": [3, 33], "y2": [4, 44]
... })
>>> pivot_longer(wide, _[:], names_to = [".value", "set"], names_pattern = "(.)(.)")
  set   x   y
0   1   1   3
0   2   2   4
1   1  11  33
1   2  22  44