Summarize to aggregate

The summarize() function creates new columns in your table based on an aggregation. An aggregation takes data and reduces it to a single number, such as a mean or a maximum. When applied to grouped data, summarize() returns one row per group.

from siuba.data import mtcars
from siuba import _, summarize, group_by, select

mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
0 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
1 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
... ... ... ... ... ... ... ... ... ... ... ...
30 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
31 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

32 rows × 11 columns

Summarize over all rows

Without group_by(), summarize() aggregates over the whole table, so all 32 rows collapse into a single row.

mtcars >> summarize(avg_mpg = _.mpg.mean())
avg_mpg
0 20.090625
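Conceptually, an ungrouped summarize applies each aggregation to entire columns and returns one row. The sketch below illustrates that idea in plain Python with no siuba or pandas involved. It is not siuba's implementation: `summarize_all` is a hypothetical helper, and the data is the first six rows of mtcars, reduced to the mpg and cyl columns.

```python
from statistics import mean

# First six rows of mtcars, keeping only mpg and cyl
rows = [
    {"mpg": 21.0, "cyl": 6},
    {"mpg": 21.0, "cyl": 6},
    {"mpg": 22.8, "cyl": 4},
    {"mpg": 21.4, "cyl": 6},
    {"mpg": 18.7, "cyl": 8},
    {"mpg": 18.1, "cyl": 6},
]

def summarize_all(rows, **aggs):
    """Hypothetical helper: run every aggregation over all rows, return a single row."""
    return [{name: agg(rows) for name, agg in aggs.items()}]

# Each aggregation receives the full list of rows and reduces it to one value
result = summarize_all(rows, avg_mpg=lambda rs: mean(r["mpg"] for r in rs))
print(result)
```

However many rows go in, the result is a list holding exactly one row, with one key per aggregation.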

Summarize over groups

Use group_by() to split the data into groups. summarize() then applies each aggregation within every group and combines the results into a single table.
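The split-apply-combine pattern behind group_by() followed by summarize() can be sketched in plain Python. This is an illustration under stated assumptions, not how siuba works internally. `summarize_by` is a hypothetical helper, and the data is again the first six rows of mtcars (mpg and cyl only).

```python
from statistics import mean

# First six rows of mtcars, keeping only mpg and cyl
rows = [
    {"mpg": 21.0, "cyl": 6},
    {"mpg": 21.0, "cyl": 6},
    {"mpg": 22.8, "cyl": 4},
    {"mpg": 21.4, "cyl": 6},
    {"mpg": 18.7, "cyl": 8},
    {"mpg": 18.1, "cyl": 6},
]

def summarize_by(rows, key, **aggs):
    """Hypothetical helper: split rows by `key`, aggregate each group, combine."""
    # Split: collect the rows that share each key value
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    out = []
    for k in sorted(groups):
        group = groups[k]
        # Apply: every aggregation reduces the group's rows to one value
        summary = {key: k}
        summary.update({name: agg(group) for name, agg in aggs.items()})
        # Combine: one output row per group
        out.append(summary)
    return out

result = summarize_by(
    rows,
    "cyl",
    avg=lambda g: mean(r["mpg"] for r in g),
    range=lambda g: max(r["mpg"] for r in g) - min(r["mpg"] for r in g),
)
for row in result:
    print(row)
```

These six rows hold three distinct cyl values, so the sketch returns three rows. The same holds for the full table.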

(mtcars
  >> group_by(_.cyl)
  >> summarize(
       avg = _.mpg.mean(),
       range = _.mpg.max() - _.mpg.min(),
       avg_per_cyl = (_.mpg / _.cyl).mean()
  )
)
cyl avg range avg_per_cyl
0 4 26.663636 12.5 6.665909
1 6 19.742857 3.6 3.290476
2 8 15.100000 8.8 1.887500

Note that cyl has 3 unique values (4, 6, and 8), so the resulting table has 3 rows, one per group.
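Inside each group, cyl is constant. The avg_per_cyl column is therefore just avg divided by that group's cyl value. Writing $g$ for a group with $n_g$ rows and cylinder count $c_g$:

```latex
\text{avg\_per\_cyl}_g
  = \frac{1}{n_g}\sum_{i \in g} \frac{\text{mpg}_i}{c_g}
  = \frac{1}{c_g}\cdot\frac{1}{n_g}\sum_{i \in g} \text{mpg}_i
  = \frac{\text{avg}_g}{c_g}
```

Take the 4-cylinder group as an example: 26.663636 / 4 ≈ 6.665909, which matches the table above.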