dply.forcats
Functions for working with categorical column data.
Functions
| Name | Description |
|---|---|
| fct_collapse | Return copy of fct with categories renamed. Optionally group all others. |
| fct_infreq | Return a copy of fct, with categories ordered by frequency (largest first). |
| fct_inorder | Return a copy of fct, with categories ordered by when they first appear. |
| fct_lump | Return a copy of fct with categories lumped together. |
| fct_recode | Return copy of fct with renamed categories. |
| fct_reorder | Return copy of fct, with categories reordered according to values in x. |
| fct_rev | Return a copy of fct with category level order reversed. |
fct_collapse
dply.forcats.fct_collapse(fct, recat, group_other=None)
Return copy of fct with categories renamed. Optionally group all others.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| recat | | Dictionary of form {new_cat_name: old_cat_name}. old_cat_name may be a list of existing categories, to be given the same name. | required |
| group_other | | An optional string, specifying what all other categories should be named. This will always be the last category level in the result. | None |
Notes
The resulting levels are ordered by the earliest level replaced. For example, if the first and last levels are both renamed to “c”, then “c” becomes the first level.
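The ordering rule can be sketched in plain Python. The helper below, collapse_levels, is hypothetical and is not part of dply.forcats. It only mimics the level order described in this note, assuming levels are visited in their original order and that group_other, when given, is appended last.

```python
def collapse_levels(levels, recat, group_other=None):
    """Hypothetical helper: return the level order after collapsing.

    Illustrates the ordering rule only; it is not the library's implementation.
    """
    # Map each old level to its new name; a value may be one level or a list.
    old_to_new = {}
    for new_name, old in recat.items():
        old_names = old if isinstance(old, (list, tuple)) else [old]
        for old_name in old_names:
            old_to_new[old_name] = new_name

    out = []
    has_other = False
    for level in levels:
        if level in old_to_new:
            name = old_to_new[level]
        elif group_other is not None:
            # Levels not named in recat are grouped and placed last.
            has_other = True
            continue
        else:
            name = level
        # A new name takes the position of the earliest level it replaces.
        if name not in out:
            out.append(name)
    if has_other:
        out.append(group_other)
    return out


# Renaming the first and last levels to "c" puts "c" first.
print(collapse_levels(["a", "b", "c"], {"c": ["a", "c"]}))          # ['c', 'b']
print(collapse_levels(["a", "b", "c"], {"x": "a"}, "others"))       # ['x', 'others']
```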
Examples
>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'})
['x', 'b', 'c']
Categories (3, object): ['x', 'b', 'c']
>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'}, group_other = 'others')
['x', 'others', 'others']
Categories (2, object): ['x', 'others']
>>> fct_collapse(['a', 'b', 'c'], {'ab': ['a', 'b']})
['ab', 'ab', 'c']
Categories (2, object): ['ab', 'c']
>>> fct_collapse(['a', 'b', None], {'a': ['b']})
['a', 'a', NaN]
Categories (1, object): ['a']
fct_infreq
dply.forcats.fct_infreq(fct, ordered=None)
Return a copy of fct, with categories ordered by frequency (largest first).
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | list-like | A pandas Series, Categorical, or list-like object. | required |
| ordered | bool | Whether to return an ordered categorical. By default, a Categorical input’s ordered setting is respected. Use this to override it. | None |
See Also
fct_inorder: Order categories by when they’re first observed.
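The difference between the two orderings can be sketched without pandas. Both helpers below are hypothetical names used for illustration, not functions in dply.forcats, and ties in frequency are assumed to fall back to first-seen order, which the source does not specify.

```python
from collections import Counter


def levels_by_frequency(values):
    """Hypothetical helper: levels ordered by count, largest first."""
    # most_common() sorts by count; equal counts keep first-seen order.
    return [level for level, _ in Counter(values).most_common()]


def levels_in_order(values):
    """Hypothetical helper: levels ordered by first appearance."""
    # dict preserves insertion order, so this drops repeats but keeps order.
    return list(dict.fromkeys(values))


values = ["b", "a", "a"]
print(levels_by_frequency(values))  # ['a', 'b']
print(levels_in_order(values))      # ['b', 'a']
```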
Examples
>>> fct_infreq(["c", "a", "c", "c", "a", "b"])
['c', 'a', 'c', 'c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']
fct_inorder
dply.forcats.fct_inorder(fct, ordered=None)
Return a copy of fct, with categories ordered by when they first appear.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | list-like | A pandas Series, Categorical, or list-like object. | required |
| ordered | bool | Whether to return an ordered categorical. By default, a Categorical input’s ordered setting is respected. Use this to override it. | None |
See Also
fct_infreq: Order categories by value frequency count.
Examples
>>> fct = pd.Categorical(["c", "a", "b"])
>>> fct
['c', 'a', 'b']
Categories (3, object): ['a', 'b', 'c']
Note that above the categories are sorted alphabetically. Use fct_inorder to keep the categories in first-observed order.
>>> fct_inorder(fct)
['c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']
fct_inorder also accepts pd.Series and list objects:
>>> fct_inorder(["z", "a"])
['z', 'a']
Categories (2, object): ['z', 'a']
By default, the ordered setting of categoricals is respected. Use the ordered parameter to override it.
>>> fct2 = pd.Categorical(["z", "a", "b"], ordered=True)
>>> fct_inorder(fct2)
['z', 'a', 'b']
Categories (3, object): ['z' < 'a' < 'b']
>>> fct_inorder(fct2, ordered=False)
['z', 'a', 'b']
Categories (3, object): ['z', 'a', 'b']
fct_lump
dply.forcats.fct_lump(fct, n=None, prop=None, w=None, other_level='Other', ties=None)
Return a copy of fct with categories lumped together.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| n | | Number of categories to keep. | None |
| prop | | (not implemented) keep categories that occur prop proportion of the time. | None |
| w | | Array of weights corresponding to each value in fct. | None |
| other_level | | Name for all lumped together levels. | 'Other' |
| ties | | (not implemented) method to use in the case of ties. | None |
Notes
Currently, one of n and prop must be specified.
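As a rough illustration of how n, w, and other_level interact, here is a minimal sketch in plain Python. The helper lump_top_n is hypothetical and not part of dply.forcats. It covers only the n case, and its tie handling (first-seen level wins) is an assumption, since the ties parameter is documented as not implemented.

```python
from collections import Counter


def lump_top_n(values, n, w=None, other_level="Other"):
    """Hypothetical helper: keep the n heaviest levels, lump the rest."""
    # Without weights, every value counts once.
    weights = [1] * len(values) if w is None else w

    # Sum the weight observed for each level.
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight

    # Levels to keep; everything else is renamed to other_level.
    keep = {level for level, _ in totals.most_common(n)}
    return [value if value in keep else other_level for value in values]


print(lump_top_n(["a", "a", "b", "c"], n=1))
# ['a', 'a', 'Other', 'Other']
print(lump_top_n(["a", "a", "b", "c"], n=1, w=[1, 1, 5, 1]))
# ['Other', 'Other', 'b', 'Other']
```

With weights, "b" outweighs "a" (5 against 2), so it is the level that survives.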
Examples
>>> fct_lump(['a', 'a', 'b', 'c'], n = 1)
['a', 'a', 'Other', 'Other']
Categories (2, object): ['a', 'Other']
>>> fct_lump(['a', 'a', 'b', 'b', 'c', 'd'], prop = .2)
['a', 'a', 'b', 'b', 'Other', 'Other']
Categories (3, object): ['a', 'b', 'Other']
fct_recode
dply.forcats.fct_recode(fct, recat=None, **kwargs)
Return copy of fct with renamed categories.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
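| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| recat | | Dictionary of form {new_name: old_name}; an alternative to passing keyword arguments. | None |
| **kwargs | | Arguments of form new_name = old_name. | {} |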
Examples
>>> cat = ['a', 'b', 'c']
>>> fct_recode(cat, z = 'c')
['a', 'b', 'z']
Categories (3, object): ['a', 'b', 'z']
>>> fct_recode(cat, x = ['a', 'b'])
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
>>> fct_recode(cat, {"x": ['a', 'b']})
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
fct_reorder
dply.forcats.fct_reorder(fct, x, func=np.median, desc=False)
Return copy of fct, with categories reordered according to values in x.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| x | | Values used to reorder categorical. Must be same length as fct. | required |
| func | | Function run over all values within a level of the categorical. | np.median |
| desc | | Whether to sort in descending order. | False |
Notes
NaN categories can’t be ordered. When func returns NaN, sorting is always done with NaNs last.
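The NaN-last behaviour can be sketched in plain Python. The helper reorder_levels is hypothetical and not the library's implementation. It uses statistics.median in place of np.median so the example needs no third-party packages, and it returns only the resulting level order.

```python
import math
from statistics import median


def reorder_levels(fct, x, func=median, desc=False):
    """Hypothetical helper: order levels by func applied to each level's x values."""
    # Collect the x values that belong to each level, in first-seen order.
    groups = {}
    for level, value in zip(fct, x):
        groups.setdefault(level, []).append(value)

    summary = {level: func(values) for level, values in groups.items()}

    def is_nan(level):
        result = summary[level]
        return isinstance(result, float) and math.isnan(result)

    # Sort only the levels with a usable summary, then append NaN levels
    # last, whichever direction was requested.
    sortable = [level for level in summary if not is_nan(level)]
    ordered = sorted(sortable, key=summary.get, reverse=desc)
    return ordered + [level for level in summary if is_nan(level)]


print(reorder_levels(["a", "a", "b"], [4, 3, 2]))
# ['b', 'a']
print(reorder_levels(["a", "b", "c"], [float("nan"), 2, 1], desc=True))
# ['b', 'c', 'a']
```

In the second call, level "a" summarises to NaN, so it stays last even though the sort is descending.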
Examples
>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2])
['a', 'a', 'b']
Categories (2, object): ['b', 'a']
>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2], desc = True)
['a', 'a', 'b']
Categories (2, object): ['a', 'b']
>>> fct_reorder(['x', 'x', 'y'], [4, 0, 2], np.max)
['x', 'x', 'y']
Categories (2, object): ['y', 'x']
fct_rev
dply.forcats.fct_rev(fct)
Return a copy of fct with category level order reversed.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
Examples
>>> fct = pd.Categorical(["a", "b", "c"])
>>> fct
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> fct_rev(fct)
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']
Note that this function can also accept a list.
>>> fct_rev(["a", "b", "c"])
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']