distinct

distinct(__data, *args, *, _keep_all=False, **kwargs)

Keep only distinct (unique) rows from a table.

Parameters

Name	Type	Description	Default
`__data`		The input data.	required
`*args`		Columns to use when determining which rows are unique.	`()`
`_keep_all`		Whether to keep all columns of the original data, not just *args.	`False`
`**kwargs`		If specified, arguments passed to the verb mutate(), and then being used in distinct().	`{}`

See Also

count: keep distinct rows, and count their number of observations.

Examples

>>> from siuba import _, distinct, select
>>> from siuba.data import penguins

>>> penguins >> distinct(_.species, _.island)
     species     island
0     Adelie  Torgersen
1     Adelie     Biscoe
2     Adelie      Dream
3     Gentoo     Biscoe
4  Chinstrap      Dream

Use _keep_all=True, to keep all columns in each distinct row. This lets you peak at the values of the first unique row.

>>> small_penguins = penguins >> select(_[:4])
>>> small_penguins >> distinct(_.species, _keep_all = True)
     species     island  bill_length_mm  bill_depth_mm
0     Adelie  Torgersen            39.1           18.7
1     Gentoo     Biscoe            46.1           13.2
2  Chinstrap      Dream            46.5           17.9