Across column apply

Use the across() function to apply the same transformation to multiple columns.

from siuba import _, across, Fx, group_by, mutate, summarize, filter, arrange
from siuba.data import mtcars

Basic use

mtcars >> mutate(across(_["mpg", "hp"], Fx - Fx.mean(), names="demeaned_{col}"))
      mpg  cyl   disp   hp  drat     wt   qsec   vs   am  gear  carb  demeaned_mpg  demeaned_hp
  0  21.0    6  160.0  110  3.90  2.620  16.46    0    1     4     4      0.909375     -36.6875
  1  21.0    6  160.0  110  3.90  2.875  17.02    0    1     4     4      0.909375     -36.6875
...   ...  ...    ...  ...   ...    ...    ...  ...  ...   ...   ...           ...          ...
 30  15.0    8  301.0  335  3.54  3.570  14.60    0    1     5     8     -5.090625     188.3125
 31  21.4    4  121.0  109  4.11  2.780  18.60    1    1     4     2      1.309375     -37.6875

32 rows × 13 columns

Note three important pieces in the code above:

  • select: _["mpg", "hp"] chooses the columns to transform.
  • transform: Fx - Fx.mean() is the transformation, where Fx stands for the column being operated on.
  • rename: names= is an optional argument, specifying how to name the result. The {col} in "demeaned_{col}" gets replaced with the column name.
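
The three pieces can be pictured with a small plain-Python model. This is only a conceptual sketch under stated assumptions, not how siuba implements across(): the dictionary-of-lists table, the across_sketch helper, and the demean function are hypothetical names introduced for illustration.

```python
from statistics import mean

# Hypothetical stand-in for a data frame: column name -> list of values.
table = {"mpg": [21.0, 22.8, 18.7], "cyl": [6, 4, 8], "hp": [110, 93, 175]}


def across_sketch(data, columns, transform, names="{col}"):
    """Apply `transform` to each selected column; name results with `names`."""
    return {names.format(col=col): transform(data[col]) for col in columns}


def demean(values):
    """Plain-Python counterpart of Fx - Fx.mean()."""
    center = mean(values)
    return [value - center for value in values]


# select: ["mpg", "hp"]; transform: demean; rename: "demeaned_{col}"
result = across_sketch(table, ["mpg", "hp"], demean, names="demeaned_{col}")
print(list(result))           # ['demeaned_mpg', 'demeaned_hp']
print(result["demeaned_hp"])  # [-16, -33, 49]
```

The unselected cyl column is never passed to the transformation, and the {col} placeholder is filled in once per selected column.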

Selecting columns

Any selection that can be passed to select() can also be used in across(). Use _[...] to combine several selections.

mtcars >> summarize(across(_[_.startswith("m"), _.endswith("p")], Fx.mean()))
         mpg        disp        hp
0  20.090625  230.721875  146.6875
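
As a rough picture of the combined selection above, a column is kept when it matches at least one of the selectors. The sketch below mimics that with plain Python over a list of column names. It is an illustration based on the output shown, not siuba's selection code, and select_any is a hypothetical helper.

```python
# Column names of mtcars, as they appear in the output earlier on this page.
columns = ["mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]


def select_any(names, *predicates):
    """Keep each name that satisfies at least one predicate (hypothetical helper)."""
    return [name for name in names if any(pred(name) for pred in predicates)]


selected = select_any(
    columns,
    lambda name: name.startswith("m"),  # counterpart of _.startswith("m")
    lambda name: name.endswith("p"),    # counterpart of _.endswith("p")
)
print(selected)  # ['mpg', 'disp', 'hp']
```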

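Passing multiple transformations

Pass a dictionary of transformations to apply each one to every selected column. Each key is appended to the column name in the result.
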
mtcars >> summarize(across(_["mpg", "hp"], {"avg": Fx.mean(), "std": Fx.std()}))
     mpg_avg   mpg_std    hp_avg     hp_std
0  20.090625  6.026948  146.6875  68.562868
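
The output above suggests that every column is paired with every dictionary entry, and that each result is named from the column and the key joined by an underscore. A minimal plain-Python sketch of that pairing follows, assuming a hypothetical dictionary-of-lists table in place of a data frame. It models the naming pattern only, not siuba's internals.

```python
from statistics import mean, stdev

# Hypothetical stand-in for a data frame: column name -> list of values.
table = {"mpg": [21.0, 22.8, 18.7], "hp": [110, 93, 175]}

# Keys label the transformations, like {"avg": Fx.mean(), "std": Fx.std()}.
# statistics.stdev is the sample standard deviation, as is pandas' default std().
transforms = {"avg": mean, "std": stdev}

# One result per (column, transformation) pair, named "<column>_<key>".
summary = {
    f"{col}_{key}": func(table[col])
    for col in ["mpg", "hp"]
    for key, func in transforms.items()
}
print(list(summary))  # ['mpg_avg', 'mpg_std', 'hp_avg', 'hp_std']
```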

With grouped data

On grouped data, the transformation is computed separately within each group.

mtcars >> group_by(_.cyl) >> summarize(across(_[_.mpg, _.hp], Fx.mean()))
   cyl        mpg          hp
0    4  26.663636   82.636364
1    6  19.742857  122.285714
2    8  15.100000  209.214286
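
The grouped result can be read as a split-then-apply computation: rows are split by cyl, and the mean of each selected column is taken inside every group. The plain-Python sketch below illustrates that reading on a few hypothetical rows. It is not siuba's implementation, and the rows and variable names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (cyl, mpg, hp).
rows = [(6, 21.0, 110), (4, 22.8, 93), (6, 21.4, 110), (4, 24.4, 62)]

# Split: collect mpg and hp values separately for each cyl group.
groups = defaultdict(lambda: {"mpg": [], "hp": []})
for cyl, mpg, hp in rows:
    groups[cyl]["mpg"].append(mpg)
    groups[cyl]["hp"].append(hp)

# Apply: take the mean of every selected column within each group.
summary = {
    cyl: {col: mean(values) for col, values in cols.items()}
    for cyl, cols in sorted(groups.items())
}
print(summary[6]["hp"])  # 110
```

Each group yields one row of the summary, which matches the one-row-per-cyl shape of the output above.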