dply.forcats

Functions for working with categorical column data.

Functions

| Name | Description |
|---|---|
| fct_collapse | Return a copy of fct with categories renamed. Optionally group all others. |
| fct_infreq | Return a copy of fct, with categories ordered by frequency (largest first). |
| fct_inorder | Return a copy of fct, with categories ordered by when they first appear. |
| fct_lump | Return a copy of fct with categories lumped together. |
| fct_recode | Return a copy of fct with renamed categories. |
| fct_reorder | Return a copy of fct, with categories reordered according to values in x. |
| fct_rev | Return a copy of fct with category level order reversed. |

fct_collapse

dply.forcats.fct_collapse(fct, recat, group_other=None)

Return a copy of fct with categories renamed. Optionally group all others.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| recat | | Dictionary of form {new_cat_name: old_cat_name}. old_cat_name may be a list of existing categories, to be given the same name. | required |
| group_other | | An optional string naming the level that all other categories are grouped into. This level is always the last category level in the result. | None |

Notes

The resulting levels are ordered according to the earliest level replaced. For example, if the first and last levels are both renamed to "c", then "c" is the first level.
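
The ordering rule in the note above can be sketched in plain Python. The helper collapse_levels is hypothetical: it works on a list of level names rather than a pandas.Categorical and is not the dply implementation. The choice to append the group_other level only when at least one level was left unmapped is an assumption.

```python
def collapse_levels(levels, recat, group_other=None):
    """Hypothetical helper: new level order after collapsing `levels`."""
    # Invert {new_name: old_name or [old_names]} into {old_name: new_name}.
    old_to_new = {}
    for new, olds in recat.items():
        if isinstance(olds, str):
            olds = [olds]
        for old in olds:
            old_to_new[old] = new

    result = []
    for level in levels:
        if level in old_to_new:
            new = old_to_new[level]
        elif group_other is not None:
            continue  # unmapped levels are grouped; the group goes last
        else:
            new = level  # unmapped levels keep their own name
        # A new name takes the slot of the earliest level mapped to it.
        if new not in result:
            result.append(new)

    # Assumption: the catch-all level is only added if something was grouped.
    if group_other is not None and any(lvl not in old_to_new for lvl in levels):
        result.append(group_other)
    return result
```

For instance, collapse_levels(["a", "b", "z"], {"c": ["a", "z"]}) places "c" first, because "a" is the earliest level renamed to it.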

Examples

>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'})
['x', 'b', 'c']
Categories (3, object): ['x', 'b', 'c']
>>> fct_collapse(['a', 'b', 'c'], {'x': 'a'}, group_other = 'others')
['x', 'others', 'others']
Categories (2, object): ['x', 'others']
>>> fct_collapse(['a', 'b', 'c'], {'ab': ['a', 'b']})
['ab', 'ab', 'c']
Categories (2, object): ['ab', 'c']
>>> fct_collapse(['a', 'b', None], {'a': ['b']})
['a', 'a', NaN]
Categories (1, object): ['a']

fct_infreq

dply.forcats.fct_infreq(fct, ordered=None)

Return a copy of fct, with categories ordered by frequency (largest first).

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | list-like | A pandas Series, Categorical, or list-like object. | required |
| ordered | bool | Whether to return an ordered categorical. By default, a Categorical input's ordered setting is respected; use this to override it. | None |

See Also

fct_inorder: Order categories by when they’re first observed.

Examples

>>> fct_infreq(["c", "a", "c", "c", "a", "b"])
['c', 'a', 'c', 'c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']
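
The frequency ordering can be sketched with collections.Counter. The helper infreq_levels is hypothetical and only computes the level order; it treats None as missing, and breaking ties by first appearance is an assumption, not documented fct_infreq behavior.

```python
from collections import Counter


def infreq_levels(values):
    """Hypothetical helper: distinct non-missing values, most frequent first."""
    counts = Counter(v for v in values if v is not None)
    # Counter keeps first-insertion order and sorted() is stable, so equally
    # frequent values stay in first-seen order (an assumed tie rule).
    return sorted(counts, key=lambda v: -counts[v])
```

Passing the result as the categories argument of pandas.Categorical would give a categorical whose levels follow this order.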

fct_inorder

dply.forcats.fct_inorder(fct, ordered=None)

Return a copy of fct, with categories ordered by when they first appear.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | list-like | A pandas Series, Categorical, or list-like object. | required |
| ordered | bool | Whether to return an ordered categorical. By default, a Categorical input's ordered setting is respected; use this to override it. | None |

See Also

fct_infreq: Order categories by value frequency count.

Examples

>>> import pandas as pd
>>> fct = pd.Categorical(["c", "a", "b"])
>>> fct
['c', 'a', 'b']
Categories (3, object): ['a', 'b', 'c']

Note that in the output above, the categories are sorted alphabetically. Use fct_inorder to keep the categories in first-observed order.

>>> fct_inorder(fct)
['c', 'a', 'b']
Categories (3, object): ['c', 'a', 'b']

fct_inorder also accepts pd.Series and list objects:

>>> fct_inorder(["z", "a"])
['z', 'a']
Categories (2, object): ['z', 'a']

By default, the ordered setting of categoricals is respected. Use the ordered parameter to override it.

>>> fct2 = pd.Categorical(["z", "a", "b"], ordered=True)
>>> fct_inorder(fct2)
['z', 'a', 'b']
Categories (3, object): ['z' < 'a' < 'b']
>>> fct_inorder(fct2, ordered=False)
['z', 'a', 'b']
Categories (3, object): ['z', 'a', 'b']
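
First-observed order comes down to de-duplicating while remembering position. The helper inorder_levels below is a hypothetical sketch of that idea in plain Python, treating None as missing; it is not the dply implementation and ignores the ordered setting.

```python
def inorder_levels(values):
    """Hypothetical helper: distinct non-missing values in first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so fromkeys drops
    # duplicates while preserving where each value first appeared.
    return list(dict.fromkeys(v for v in values if v is not None))
```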

fct_lump

dply.forcats.fct_lump(fct, n=None, prop=None, w=None, other_level='Other', ties=None)

Return a copy of fct with categories lumped together.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| n | | Number of categories to keep. | None |
| prop | | (not implemented) Keep categories that occur prop proportion of the time. | None |
| w | | Array of weights corresponding to each value in fct. | None |
| other_level | | Name for all lumped together levels. | 'Other' |
| ties | | (not implemented) Method to use in the case of ties. | None |

Notes

Currently, one of n or prop must be specified.

Examples

>>> fct_lump(['a', 'a', 'b', 'c'], n = 1)
['a', 'a', 'Other', 'Other']
Categories (2, object): ['a', 'Other']
>>> fct_lump(['a', 'a', 'b', 'b', 'c', 'd'], prop = .2)
['a', 'a', 'b', 'b', 'Other', 'Other']
Categories (3, object): ['a', 'b', 'Other']
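
Lumping with n amounts to summing (optionally weighted) counts per level, keeping the n largest, and renaming everything else. The helper lump_values is a hypothetical sketch of that n-based rule only; it does not handle prop, and its tie-breaking (first-seen level wins) is an assumption.

```python
from collections import Counter


def lump_values(values, n, w=None, other_level="Other"):
    """Hypothetical helper: keep the n heaviest levels, lump the rest."""
    if w is None:
        w = [1] * len(values)  # unweighted: every value counts once
    totals = Counter()
    for value, weight in zip(values, w):
        totals[value] += weight
    # most_common(n) returns the n levels with the largest totals;
    # equal totals keep first-seen order, an assumed tie rule.
    keep = {level for level, _ in totals.most_common(n)}
    return [v if v in keep else other_level for v in values]
```

With weights, a rare value can still be kept: lump_values(["a", "b", "b"], n=1, w=[5, 1, 1]) keeps "a" and lumps both "b" values.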

fct_recode

dply.forcats.fct_recode(fct, recat=None, **kwargs)

Return a copy of fct with renamed categories.

Parameters

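| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| recat | | Dictionary of form {new_name: old_name}, an alternative to passing keyword arguments. old_name may be a list of existing categories. | None |
| **kwargs | | Arguments of form new_name = old_name. | {} |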

Examples

>>> cat = ['a', 'b', 'c']
>>> fct_recode(cat, z = 'c')
['a', 'b', 'z']
Categories (3, object): ['a', 'b', 'z']
>>> fct_recode(cat, x = ['a', 'b'])
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
>>> fct_recode(cat, {"x": ['a', 'b']})
['x', 'x', 'c']
Categories (2, object): ['x', 'c']
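
The value renaming shown above can be sketched by inverting the new_name = old_name pairs into an old-to-new lookup. The helper recode_values is hypothetical, returns a plain list rather than a Categorical, and merging recat with keyword arguments is an assumption.

```python
def recode_values(values, recat=None, **kwargs):
    """Hypothetical helper: rename values via new_name=old_name pairs."""
    # Assumption: a positional dict and keyword arguments are merged.
    pairs = {**(recat or {}), **kwargs}
    old_to_new = {}
    for new, olds in pairs.items():
        # A single old name is treated like a one-element list.
        for old in ([olds] if isinstance(olds, str) else olds):
            old_to_new[old] = new
    # Values with no rename pass through unchanged.
    return [old_to_new.get(v, v) for v in values]
```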

fct_reorder

dply.forcats.fct_reorder(fct, x, func=np.median, desc=False)

Return a copy of fct, with categories reordered according to values in x.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |
| x | | Values used to reorder the categorical. Must be the same length as fct. | required |
| func | | Function run over all values within a level of the categorical. | np.median |
| desc | | Whether to sort in descending order. | False |

Notes

NaN categories can’t be ordered. When func returns NaN, sorting is always done with NaNs last.

Examples

>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2])
['a', 'a', 'b']
Categories (2, object): ['b', 'a']
>>> fct_reorder(['a', 'a', 'b'], [4, 3, 2], desc = True)
['a', 'a', 'b']
Categories (2, object): ['a', 'b']
>>> import numpy as np
>>> fct_reorder(['x', 'x', 'y'], [4, 0, 2], np.max)
['x', 'x', 'y']
Categories (2, object): ['y', 'x']
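
The reordering rule, including the Notes on NaN, can be sketched as: group x by level, summarize each group with func, sort the summaries, and keep NaN summaries at the end even when desc is set. The helper reorder_levels is hypothetical and uses statistics.median in place of np.median so it needs only the standard library; it returns the level order, not a Categorical.

```python
import math
import statistics


def reorder_levels(fct, x, func=statistics.median, desc=False):
    """Hypothetical helper: order levels by func applied to their x values."""
    groups = {}
    for level, value in zip(fct, x):
        groups.setdefault(level, []).append(value)
    summary = {level: func(vals) for level, vals in groups.items()}

    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    ordered = [lvl for lvl in summary if not is_nan(summary[lvl])]
    nan_levels = [lvl for lvl in summary if is_nan(summary[lvl])]
    ordered.sort(key=lambda lvl: summary[lvl], reverse=desc)
    # NaN summaries always go last, regardless of desc.
    return ordered + nan_levels
```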

fct_rev

dply.forcats.fct_rev(fct)

Return a copy of fct with category level order reversed.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| fct | | A pandas.Categorical, or array(-like) used to create one. | required |

Examples

>>> import pandas as pd
>>> fct = pd.Categorical(["a", "b", "c"])
>>> fct
['a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
>>> fct_rev(fct)
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']

Note that this function can also accept a list.

>>> fct_rev(["a", "b", "c"])
['a', 'b', 'c']
Categories (3, object): ['c', 'b', 'a']