dply.forcats
Functions for working with categorical column data.
Functions
Name | Description |
---|---|
fct_collapse | Return copy of fct with categories renamed. Optionally group all others. |
fct_infreq | Return a copy of fct, with categories ordered by frequency (largest first). |
fct_inorder | Return a copy of fct, with categories ordered by when they first appear. |
fct_lump | Return a copy of fct with categories lumped together. |
fct_recode | Return copy of fct with renamed categories. |
fct_reorder | Return copy of fct, with categories reordered according to values in x. |
fct_rev | Return a copy of fct with category level order reversed. |
fct_collapse
dply.forcats.fct_collapse(fct, recat, group_other=None)
Return copy of fct with categories renamed. Optionally group all others.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | | A pandas.Categorical, or array(-like) used to create one. | required |
recat | | Dictionary of form {new_cat_name: old_cat_name}. old_cat_name may be a list of existing categories, to be given the same name. | required |
group_other | | An optional string, specifying what all other categories should be named. This will always be the last category level in the result. | None |
Notes
The resulting levels are ordered according to the earliest level replaced. For example, if the first and last levels are both renamed to “c”, then “c” becomes the first level.
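The ordering rule in this note can be illustrated without pandas. The following is a minimal pure-Python sketch under stated assumptions, not the library's implementation: `collapse_levels` is a hypothetical helper that only computes the resulting level order from a list of existing levels.

```python
def collapse_levels(levels, recat, group_other=None):
    """Hypothetical helper: compute the level order after collapsing."""
    # Invert {new_name: old_name(s)} into {old_name: new_name}.
    old_to_new = {}
    for new, olds in recat.items():
        for old in ([olds] if isinstance(olds, str) else olds):
            old_to_new[old] = new

    result = []
    for level in levels:
        if level in old_to_new:
            new = old_to_new[level]
        elif group_other is not None:
            continue  # unmatched levels are grouped and placed last
        else:
            new = level
        # A new name takes the position of the earliest level it replaces.
        if new not in result:
            result.append(new)

    # The grouped level, if any level fell into it, always comes last.
    if group_other is not None and any(l not in old_to_new for l in levels):
        result.append(group_other)
    return result


levels = ["a", "b", "c"]
first_last = collapse_levels(levels, {"c": ["a", "c"]})
with_other = collapse_levels(levels, {"x": "a"}, group_other="others")
print(first_last)  # ['c', 'b']
print(with_other)  # ['x', 'others']
```

Renaming the first and last levels to "c" puts "c" first because it inherits the position of "a", the earliest level it replaces.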
Examples
>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'})
['x', 'b', 'c']
Categories (3, object): ['x', 'b', 'c']
>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'}, group_other = 'others')
['x', 'others', 'others']
Categories (2, object): ['x', 'others']
>>> fct_collapse(['a', 'b', 'c'], {'ab': ['a', 'b']})
['ab', 'ab', 'c']
Categories (2, object): ['ab', 'c']
>>> fct_collapse(['a', 'b', None], {'a': ['b']})
['a', 'a', NaN]
Categories (1, object): ['a']
fct_infreq
dply.forcats.fct_infreq(fct, ordered=None)
Return a copy of fct, with categories ordered by frequency (largest first).
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | list-like | A pandas Series, Categorical, or list-like object | required |
ordered | bool | Whether to return an ordered categorical. By default, a Categorical input’s ordered setting is respected. Use this to override it. | None |
See Also
fct_inorder: Order categories by when they’re first observed.
Examples
>>> fct_infreq(["c", "a", "c", "c", "a", "b"])
['c', 'a', 'c', 'c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']
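Ordering by frequency can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: `levels_by_frequency` is a hypothetical helper, and its tie-breaking (first seen wins) is an assumption that the source does not document.

```python
from collections import Counter


def levels_by_frequency(values):
    """Hypothetical helper: levels sorted by count, largest first."""
    # Missing values (None) never become a level.
    counts = Counter(v for v in values if v is not None)
    # most_common() sorts by count, descending; equal counts keep
    # first-seen order (an assumption about ties, not documented above).
    return [level for level, _ in counts.most_common()]


freq_levels = levels_by_frequency(["c", "a", "c", "c", "a", "b"])
print(freq_levels)  # ['c', 'a', 'b']
```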
fct_inorder
dply.forcats.fct_inorder(fct, ordered=None)
Return a copy of fct, with categories ordered by when they first appear.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | list-like | A pandas Series, Categorical, or list-like object | required |
ordered | bool | Whether to return an ordered categorical. By default, a Categorical input’s ordered setting is respected. Use this to override it. | None |
See Also
fct_infreq: Order categories by value frequency count.
Examples
>>> fct = pd.Categorical(["c", "a", "b"])
>>> fct
['c', 'a', 'b']
Categories (3, object): ['a', 'b', 'c']
Note that the categories above are sorted alphabetically. Use fct_inorder to keep the categories in first-observed order.
>>> fct_inorder(fct)
['c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']
fct_inorder also accepts pd.Series and list objects:
>>> fct_inorder(["z", "a"])
['z', 'a']
Categories (2, object): ['z', 'a']
By default, the ordered setting of categoricals is respected. Use the ordered parameter to override it.
>>> fct2 = pd.Categorical(["z", "a", "b"], ordered=True)
>>> fct_inorder(fct2)
['z', 'a', 'b']
Categories (3, object): ['z' < 'a' < 'b']
>>> fct_inorder(fct2, ordered=False)
['z', 'a', 'b']
Categories (3, object): ['z', 'a', 'b']
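The difference between alphabetical and first-observed level order can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation; `levels_in_order` is a hypothetical helper.

```python
def levels_in_order(values):
    """Hypothetical helper: unique values in first-appearance order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(v for v in values if v is not None))


values = ["c", "a", "b", "a"]
alphabetical = sorted(set(values))       # the sorted order shown for pd.Categorical
first_observed = levels_in_order(values)
print(alphabetical)    # ['a', 'b', 'c']
print(first_observed)  # ['c', 'a', 'b']
```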
fct_lump
dply.forcats.fct_lump(fct, n=None, prop=None, w=None, other_level='Other', ties=None)
Return a copy of fct with categories lumped together.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | | A pandas.Categorical, or array(-like) used to create one. | required |
n | | Number of categories to keep. | None |
prop | | (not implemented) keep categories that occur prop proportion of the time. | None |
w | | Array of weights corresponding to each value in fct. | None |
other_level | | Name for all lumped together levels. | 'Other' |
ties | | (not implemented) method to use in the case of ties. | None |
Notes
Currently, one of n and prop must be specified.
Examples
>>> fct_lump(['a', 'a', 'b', 'c'], n = 1)
['a', 'a', 'Other', 'Other']
Categories (2, object): ['a', 'Other']
>>> fct_lump(['a', 'a', 'b', 'b', 'c', 'd'], prop = .2)
['a', 'a', 'b', 'b', 'Other', 'Other']
Categories (3, object): ['a', 'b', 'Other']
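How n and w interact can be sketched in plain Python. This is a minimal sketch and not the library's implementation: `lump_top_n` is a hypothetical helper, it covers only the n case, and its handling of ties (first seen wins) is an assumption.

```python
from collections import defaultdict


def lump_top_n(values, n, w=None, other_level="Other"):
    """Hypothetical helper: keep the n heaviest values, lump the rest."""
    # Without weights, every value counts once.
    weights = [1] * len(values) if w is None else w

    # Sum the weight observed for each distinct value.
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight

    # Keep the n values with the largest totals (ties: first seen wins).
    keep = set(sorted(totals, key=totals.get, reverse=True)[:n])
    return [v if v in keep else other_level for v in values]


unweighted = lump_top_n(["a", "a", "b", "c"], n=1)
weighted = lump_top_n(["a", "a", "b", "c"], n=1, w=[1, 1, 5, 1])
print(unweighted)  # ['a', 'a', 'Other', 'Other']
print(weighted)    # ['Other', 'Other', 'b', 'Other']
```

With weights, "b" is kept even though it appears only once, because its total weight (5) exceeds that of "a" (2).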
fct_recode
dply.forcats.fct_recode(fct, recat=None, **kwargs)
Return copy of fct with renamed categories.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | | A pandas.Categorical, or array(-like) used to create one. | required |
**kwargs | | Arguments of form new_name = old_name. | {} |
Examples
>>> cat = ['a', 'b', 'c']
>>> fct_recode(cat, z = 'c')
['a', 'b', 'z']
Categories (3, object): ['a', 'b', 'z']
>>> fct_recode(cat, x = ['a', 'b'])
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
>>> fct_recode(cat, {"x": ['a', 'b']})
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
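The last two examples give the same result because the keyword form and the dictionary form describe the same {new_name: old_name} mapping. A minimal pure-Python sketch of that equivalence, where `recode_values` is a hypothetical helper and not the library's implementation:

```python
def recode_values(values, recat=None, **kwargs):
    """Hypothetical helper: rename values using {new_name: old_name} pairs."""
    # Merge the dictionary form and the keyword form into one mapping.
    new_to_old = {**(recat or {}), **kwargs}

    # Invert to {old_name: new_name}; old_name may be a string or a list.
    old_to_new = {}
    for new, olds in new_to_old.items():
        for old in ([olds] if isinstance(olds, str) else olds):
            old_to_new[old] = new

    # Values without a new name are left unchanged.
    return [old_to_new.get(v, v) for v in values]


cat = ["a", "b", "c"]
by_keyword = recode_values(cat, x=["a", "b"])
by_dict = recode_values(cat, {"x": ["a", "b"]})
print(by_keyword)  # ['x', 'x', 'c']
print(by_dict)     # ['x', 'x', 'c']
```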
fct_reorder
dply.forcats.fct_reorder(fct, x, func=np.median, desc=False)
Return copy of fct, with categories reordered according to values in x.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | | A pandas.Categorical, or array(-like) used to create one. | required |
x | | Values used to reorder categorical. Must be same length as fct. | required |
func | | Function run over all values within a level of the categorical. | np.median |
desc | | Whether to sort in descending order. | False |
Notes
NaN categories can’t be ordered. When func returns NaN, sorting is always done with NaNs last.
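The reordering rule, including the NaN-last behavior described in this note, can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `reorder_levels` is a hypothetical helper, and `statistics.median` stands in for `np.median`.

```python
import math
from statistics import median


def reorder_levels(values, x, func=median, desc=False):
    """Hypothetical helper: order levels by func applied to their x values."""
    # Collect the x values that belong to each level.
    groups = {}
    for value, number in zip(values, x):
        groups.setdefault(value, []).append(number)

    # Summarize each level with func.
    summary = {level: func(nums) for level, nums in groups.items()}

    # Levels whose summary is NaN go last, whatever the sort direction.
    valid = [lvl for lvl in summary if not math.isnan(summary[lvl])]
    missing = [lvl for lvl in summary if math.isnan(summary[lvl])]
    valid.sort(key=summary.get, reverse=desc)
    return valid + missing


ascending = reorder_levels(["a", "a", "b"], [4, 3, 2])
descending = reorder_levels(["a", "a", "b"], [4, 3, 2], desc=True)
nan_last = reorder_levels(["a", "b", "c"], [float("nan"), 2, 1], desc=True)
print(ascending)   # ['b', 'a']
print(descending)  # ['a', 'b']
print(nan_last)    # ['b', 'c', 'a']
```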
Examples
>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2])
['a', 'a', 'b']
Categories (2, object): ['b', 'a']
>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2], desc = True)
['a', 'a', 'b']
Categories (2, object): ['a', 'b']
>>> fct_reorder(['x', 'x', 'y'], [4, 0, 2], np.max)
['x', 'x', 'y']
Categories (2, object): ['y', 'x']
fct_rev
dply.forcats.fct_rev(fct)
Return a copy of fct with category level order reversed.
Parameters
Name | Type | Description | Default |
---|---|---|---|
fct | | A pandas.Categorical, or array(-like) used to create one. | required |
Examples
>>> fct = pd.Categorical(["a", "b", "c"])
>>> fct
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> fct_rev(fct)
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']
Note that this function can also accept a list.
>>> fct_rev(["a", "b", "c"])
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']