distinct
distinct(__data, *args, _keep_all=False, **kwargs)
Keep only distinct (unique) rows from a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
__data | | The input data. | required |
*args | | Columns to use when determining which rows are unique. | () |
_keep_all | | Whether to keep all columns of the original data, not just *args. | False |
**kwargs | | If specified, these arguments are passed to the verb mutate(), and the resulting columns are then used in distinct(). | {} |
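The **kwargs behavior described above (compute columns as mutate() would, then deduplicate on them) can be sketched in plain pandas. This is only a rough equivalent under stated assumptions, not siuba's implementation: the toy data frame and the column name `heavy` are hypothetical, and the sketch assumes the default _keep_all=False, so only the column used for uniqueness is returned.

```python
import pandas as pd

# Hypothetical toy data (not the penguins dataset)
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3750, 4200, 5000, 4100],
})

# Step 1: compute the new column, as mutate(heavy = ...) would
with_heavy = df.assign(heavy=df["body_mass_g"] > 4000)

# Step 2: keep the first row for each unique value of that column,
# then return only the column used to decide uniqueness
result = (
    with_heavy
    .drop_duplicates(subset=["heavy"])
    .reset_index(drop=True)[["heavy"]]
)

print(result)
```

In siuba terms, this roughly corresponds to a call such as `df >> distinct(heavy = _.body_mass_g > 4000)`.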
See Also
count: keep distinct rows, and count the number of observations in each.
Examples
>>> from siuba import _, distinct, select
>>> from siuba.data import penguins
>>> penguins >> distinct(_.species, _.island)
     species     island
0     Adelie  Torgersen
1     Adelie     Biscoe
2     Adelie      Dream
3     Gentoo     Biscoe
4  Chinstrap      Dream
Use _keep_all=True to keep all columns in each distinct row. This lets you peek at the values in the first row found for each unique combination.
>>> small_penguins = penguins >> select(_[:4])
>>> small_penguins >> distinct(_.species, _keep_all = True)
     species     island  bill_length_mm  bill_depth_mm
0     Adelie  Torgersen            39.1           18.7
1     Gentoo     Biscoe            46.1           13.2
2  Chinstrap      Dream            46.5           17.9
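For readers more familiar with pandas, the difference between the default and _keep_all=True can be approximated with drop_duplicates. This is a sketch of the semantics only, assuming distinct keeps the first row it encounters for each unique value; the small data frame below is hypothetical toy data, and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical toy data with repeated species values
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo"],
    "island": ["Torgersen", "Biscoe", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, 37.8, 46.1, 46.5, 50.0],
})

# Roughly distinct(_.species): only the species column survives
only_keys = (
    df.drop_duplicates(subset=["species"])
    .reset_index(drop=True)[["species"]]
)

# Roughly distinct(_.species, _keep_all=True):
# the first row per species, with every original column kept
first_rows = (
    df.drop_duplicates(subset=["species"], keep="first")
    .reset_index(drop=True)
)

print(only_keys)
print(first_rows)
```

Both results have one row per species; they differ only in which columns are returned.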