from siuba import _, across, Fx, group_by, mutate, summarize, filter, arrange
from siuba.data import mtcars
Across column apply
Use the across() function to apply the same transformation to multiple columns.
Basic use
mtcars >> mutate(across(_["mpg", "hp"], Fx - Fx.mean(), names="demeaned_{col}"))
 | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | demeaned_mpg | demeaned_hp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 0.909375 | -36.6875 |
1 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 0.909375 | -36.6875 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
30 | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 | -5.090625 | 188.3125 |
31 | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 | 1.309375 | -37.6875 |
32 rows × 13 columns
Note three important pieces in the code above:
- select: _["mpg", "hp"] chooses the columns to transform.
- transform: Fx - Fx.mean() is the transformation, where Fx stands for the column being operated on.
- rename: names= is an optional argument specifying how to name the result. The {col} in "demeaned_{col}" gets replaced with the column name.
Selecting columns
Any selection that can be passed to select() can also be used in across(). Note that you can use _[...] to combine selections.
mtcars >> summarize(across(_[_.startswith("m"), _.endswith("p")], Fx.mean()))
 | mpg | disp | hp |
---|---|---|---|
0 | 20.090625 | 230.721875 | 146.6875 |
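The combined selection can be pictured as a union of name predicates: a column is kept if it matches any of them. The plain-Python sketch below works on that assumption, using the mtcars column names from the first output table. It keeps columns in table order, which is consistent with the result above but is not a statement about siuba's internals.

```python
# Column names of mtcars, as listed in the first output table.
columns = ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]

# Stand-ins for _.startswith("m") and _.endswith("p").
predicates = [
    lambda name: name.startswith("m"),
    lambda name: name.endswith("p"),
]

# Keep a column when at least one predicate matches its name.
selected = [name for name in columns if any(p(name) for p in predicates)]
print(selected)  # ['mpg', 'disp', 'hp']
```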
Passing multiple transformations
mtcars >> summarize(across(_["mpg", "hp"], {"avg": Fx.mean(), "std": Fx.std()}))
 | mpg_avg | mpg_std | hp_avg | hp_std |
---|---|---|---|---|
0 | 20.090625 | 6.026948 | 146.6875 | 68.562868 |
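Judging from the output above, each result is named by joining the column name and the dictionary key with an underscore. The sketch below reproduces that naming in plain Python. The "{col}_{fn}" pattern is inferred from the column names shown, not taken from siuba's source, and the data is only the four mtcars rows printed earlier, so the values will not match the full-data summary. statistics.stdev computes the sample standard deviation.

```python
from statistics import mean, stdev

# Rows 0, 1, 30 and 31 of mtcars as printed earlier, stored column-wise.
table = {"mpg": [21.0, 21.0, 15.0, 21.4], "hp": [110, 110, 335, 109]}

# Dictionary keys become the suffix of each result name.
funcs = {"avg": mean, "std": stdev}

# Outer loop over columns, inner loop over functions.
summary = {
    f"{col}_{fn_name}": fn(table[col])
    for col in ["mpg", "hp"]
    for fn_name, fn in funcs.items()
}
print(list(summary))  # ['mpg_avg', 'mpg_std', 'hp_avg', 'hp_std']
```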
With grouped data
mtcars >> group_by(_.cyl) >> summarize(across(_[_.mpg, _.hp], Fx.mean()))
 | cyl | mpg | hp |
---|---|---|---|
0 | 4 | 26.663636 | 82.636364 |
1 | 6 | 19.742857 | 122.285714 |
2 | 8 | 15.100000 | 209.214286 |
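With grouped data, the transformation runs once per group rather than once over the whole table. A minimal plain-Python sketch of that split-apply-combine step, again using only the four mtcars rows printed earlier (so the group means differ from the full-data output above):

```python
from collections import defaultdict
from statistics import mean

# Rows 0, 1, 30 and 31 of mtcars, restricted to the columns used here.
rows = [
    {"cyl": 6, "mpg": 21.0, "hp": 110},
    {"cyl": 6, "mpg": 21.0, "hp": 110},
    {"cyl": 8, "mpg": 15.0, "hp": 335},
    {"cyl": 4, "mpg": 21.4, "hp": 109},
]

# Split: collect the rows belonging to each cyl value.
groups = defaultdict(list)
for row in rows:
    groups[row["cyl"]].append(row)

# Apply and combine: one output row per group, one mean per selected column.
summary = []
for cyl, members in sorted(groups.items()):
    out = {"cyl": cyl}
    for col in ["mpg", "hp"]:
        out[col] = mean([r[col] for r in members])
    summary.append(out)

print([row["cyl"] for row in summary])  # [4, 6, 8]
```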