from siuba import _, across, Fx, group_by, mutate, summarize, filter, arrange
from siuba.data import mtcars

Across column apply
Use the across() function to apply the same transformation to multiple columns.
Basic use
mtcars >> mutate(across(_["mpg", "hp"], Fx - Fx.mean(), names="demeaned_{col}"))

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | demeaned_mpg | demeaned_hp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 0.909375 | -36.6875 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 0.909375 | -36.6875 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30 | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 | -5.090625 | 188.3125 |
| 31 | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 | 1.309375 | -37.6875 |
32 rows × 13 columns
Note three important pieces in the code above:
- select: _["mpg", "hp"] chooses the columns to transform.
- transform: Fx - Fx.mean() is the transformation, where Fx stands for the column being operated on.
- rename: names= is an optional argument specifying how to name the result. The {col} in "demeaned_{col}" gets replaced with the column name.
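The three pieces above can be modeled in plain Python. This is only a conceptual sketch, not siuba's implementation: apply_across is a hypothetical helper, the toy values are illustrative, and it assumes the names template is filled in the way Python's str.format would fill it.

```python
from statistics import mean

# Toy data standing in for two data frame columns (illustrative values only).
columns = {"mpg": [21.0, 22.8, 18.7], "hp": [110, 93, 175]}


def demean(values):
    """Plain-Python analogue of Fx - Fx.mean()."""
    avg = mean(values)
    return [v - avg for v in values]


def apply_across(data, selected, transform, names="{col}"):
    """Hypothetical helper: select columns, transform each, rename the result."""
    return {names.format(col=col): transform(data[col]) for col in selected}


result = apply_across(columns, ["mpg", "hp"], demean, names="demeaned_{col}")
print(list(result))  # ['demeaned_mpg', 'demeaned_hp']
```

Each selected column is passed through the same function, and the template decides the name of the output column.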
Selecting columns
Any selection that can be passed to select() can also be used in across(). Use _[...] to combine several selections into one.
mtcars >> summarize(across(_[_.startswith("m"), _.endswith("p")], Fx.mean()))

| | mpg | disp | hp |
|---|---|---|---|
| 0 | 20.090625 | 230.721875 | 146.6875 |
Passing multiple transformations
mtcars >> summarize(across(_["mpg", "hp"], {"avg": Fx.mean(), "std": Fx.std()}))

| | mpg_avg | mpg_std | hp_avg | hp_std |
|---|---|---|---|---|
| 0 | 20.090625 | 6.026948 | 146.6875 | 68.562868 |
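The output above suggests that each dictionary key is appended to the column name, giving one result per column and transformation pair. A plain-Python sketch of that expansion follows. The "{col}_{fn}" pattern is inferred from the column names shown in the table, and the toy values are illustrative, not taken from mtcars.

```python
from statistics import mean, stdev

# Toy data standing in for two data frame columns (illustrative values only).
columns = {"mpg": [21.0, 22.8, 18.7], "hp": [110, 93, 175]}

# Dictionary keys label the transformations, as in {"avg": ..., "std": ...}.
transforms = {"avg": mean, "std": stdev}

# One entry per (column, transformation) pair, columns varying slowest.
summary = {
    f"{col}_{fn_name}": fn(columns[col])
    for col in ["mpg", "hp"]
    for fn_name, fn in transforms.items()
}
print(list(summary))  # ['mpg_avg', 'mpg_std', 'hp_avg', 'hp_std']
```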
With grouped data
mtcars >> group_by(_.cyl) >> summarize(across(_[_.mpg, _.hp], Fx.mean()))

| | cyl | mpg | hp |
|---|---|---|---|
| 0 | 4 | 26.663636 | 82.636364 |
| 1 | 6 | 19.742857 | 122.285714 |
| 2 | 8 | 15.100000 | 209.214286 |
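With grouped data, the result has one row per group, which implies the transformation is evaluated separately within each group. A rough plain-Python model of that behavior is sketched below. The rows are toy values, and the grouping logic is an illustration rather than how siuba or pandas performs it.

```python
from collections import defaultdict
from statistics import mean

# Toy rows standing in for a data frame (illustrative values only).
rows = [
    {"cyl": 4, "mpg": 22.8, "hp": 93},
    {"cyl": 6, "mpg": 21.0, "hp": 110},
    {"cyl": 4, "mpg": 24.4, "hp": 62},
    {"cyl": 6, "mpg": 21.4, "hp": 110},
]

# Split the rows by the grouping key.
groups = defaultdict(list)
for row in rows:
    groups[row["cyl"]].append(row)

# Apply the same transformation (the mean) to each selected column, per group.
summary = [
    {"cyl": key, **{col: mean(r[col] for r in members) for col in ["mpg", "hp"]}}
    for key, members in sorted(groups.items())
]

for line in summary:
    print(line)
```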