from siuba.data import penguins
from siuba import _, summarize, group_by, if_else, transmute, case_when
penguins
       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
  0     Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
  1     Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
...        ...        ...             ...            ...                ...          ...     ...   ...
342  Chinstrap      Dream            50.8           19.0              210.0       4100.0    male  2009
343  Chinstrap      Dream            50.2           18.7              198.0       3775.0  female  2009

344 rows × 8 columns
if_else for two cases
Use if_else() when the result depends on only two cases, such as whether a condition is True or False. It is similar to a Python if else statement, but it is applied to each value in a column.
Basics
if_else(penguins.bill_length_mm > 40, "long", "short")
0 short
1 short
...
342 long
343 long
Length: 344, dtype: object
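To make the element-wise behavior concrete, here is a minimal plain-Python sketch of the same idea. The helper name if_else_sketch is hypothetical and is not part of siuba; it only illustrates the "pick one of two values per element" logic, not how siuba implements it.

```python
# Plain-Python sketch of element-wise if/else. Not siuba's implementation.
def if_else_sketch(condition, true_value, false_value):
    """Return true_value where the condition holds, else false_value, per element."""
    return [true_value if flag else false_value for flag in condition]


# Bill lengths from the four rows visible in the penguins preview.
bill_length_mm = [39.1, 39.5, 50.8, 50.2]

# Build the condition first, then choose a label for each element.
labels = if_else_sketch([x > 40 for x in bill_length_mm], "long", "short")
print(labels)  # ['short', 'short', 'long', 'long']
```

The labels match the first and last two rows of the output above.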
Use in a verb
transmute(
    penguins,
    bill_length = if_else(_.bill_length_mm > 40, "long", "short")
)
     bill_length
  0        short
  1        short
...          ...
342         long
343         long

344 rows × 1 columns
case_when for many cases
The case_when() function is a more general version of if_else(). It lets you check as many cases as you want and map each one to a resulting value.
Basics
case_when(penguins, {
    _.bill_depth_mm <= 18: "short",
    _.bill_depth_mm <= 19: "medium",
    _.bill_depth_mm > 19: "long"
})
0 medium
1 short
...
342 medium
343 medium
Length: 344, dtype: object
Use in a verb
# also works
penguins >> case_when({ ... })
Set default when no match
Use True as the final case to set a value when no other case matches.
case_when(penguins, {
    _.bill_depth_mm.between(18, 19): "medium",
    True: "OTHER"
})
0 medium
1 OTHER
...
342 medium
343 medium
Length: 344, dtype: object
Note that this works because, for each value, case_when checks for the first matching condition. The final True condition always matches, so it catches every value that no earlier condition matched.
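The first-match rule can be sketched in plain Python. The helper case_when_sketch below is hypothetical and is not siuba's implementation; it takes an ordered list of (condition, result) pairs, where each condition is either a function of one value or the literal True. Returning None when nothing matches is an assumption of this sketch, not a statement about siuba.

```python
# Plain-Python sketch of first-match-wins case logic. Not siuba's implementation.
def case_when_sketch(values, cases):
    """Label each value with the result of the first case whose condition matches."""
    results = []
    for value in values:
        label = None  # assumption of this sketch: no match leaves None
        for condition, result in cases:
            # A literal True always matches; otherwise call the predicate.
            if condition is True or (callable(condition) and condition(value)):
                label = result
                break  # stop at the first match; later cases are ignored
        results.append(label)
    return results


# Bill depths from the four rows visible in the penguins preview.
bill_depth_mm = [18.7, 17.4, 19.0, 18.7]

# Order matters: 17.4 satisfies both "<= 18" and "<= 19", but "<= 18" comes first.
ordered = case_when_sketch(bill_depth_mm, [
    (lambda d: d <= 18, "short"),
    (lambda d: d <= 19, "medium"),
    (lambda d: d > 19, "long"),
])
print(ordered)  # ['medium', 'short', 'medium', 'medium']

# A trailing True acts as the default for values no earlier case matched.
with_default = case_when_sketch(bill_depth_mm, [
    (lambda d: 18 <= d <= 19, "medium"),
    (True, "OTHER"),
])
print(with_default)  # ['medium', 'OTHER', 'medium', 'medium']
```

Because the loop breaks on the first match, putting True anywhere but last would hide every case after it.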