Summarize to aggregate

The summarize() function creates new columns in your table, based on an aggregation. An aggregation takes data and reduces it to a single number. When applied to grouped data, this function returns one row per group.

from siuba.data import mtcars
from siuba import _, summarize, group_by, select

Summarize over all rows

# Average mpg across every row of the table
mtcars >> summarize(avg_mpg = _.mpg.mean())

# The full input table, for reference
mtcars

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
0	21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
1	21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
...	...	...	...	...	...	...	...	...	...	...	...
30	15.0	8	301.0	335	3.54	3.570	14.60	0	1	5	8
31	21.4	4	121.0	109	4.11	2.780	18.60	1	1	4	2

32 rows × 11 columns
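
As a rough illustration of what an ungrouped aggregation produces, the sketch below uses plain pandas rather than siuba. It assumes a small hand-built DataFrame holding the mpg and cyl values of four mtcars rows printed above; the name cars_sample is hypothetical. It shows that mean() collapses a column to one value, so a summary built from it has a single row, while an element-wise operation keeps one value per input row.

```python
import pandas as pd

# Hypothetical sample: mpg and cyl from four of the mtcars rows printed above.
cars_sample = pd.DataFrame({
    "mpg": [21.0, 21.0, 15.0, 21.4],
    "cyl": [6, 6, 8, 4],
})

# An aggregation reduces the whole column to a single number...
avg_mpg = cars_sample["mpg"].mean()

# ...so a summary built from it has exactly one row.
# Plain-pandas stand-in for summarize(avg_mpg = _.mpg.mean()); this mirrors
# the shape of the result and is not a description of siuba's internals.
summary = pd.DataFrame({"avg_mpg": [avg_mpg]})

# By contrast, an element-wise operation keeps one value per input row.
doubled = cars_sample["mpg"] * 2

print(summary.shape)  # (rows, columns) of the summary
print(len(doubled))   # same length as the input
```

Because the sample holds only four rows, its average differs from the average over the full 32-row table.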

Summarize over groups

Use group_by() to split the data into groups, apply an aggregation to each group, and then combine the results into one table.

(mtcars
  >> group_by(_.cyl)
  >> summarize(
       avg = _.mpg.mean(),
       range = _.mpg.max() - _.mpg.min(),
       avg_per_cyl = (_.mpg / _.cyl).mean()
  )
)

	cyl	avg	range	avg_per_cyl
0	4	26.663636	12.5	6.665909
1	6	19.742857	3.6	3.290476
2	8	15.100000	8.8	1.887500

Note that cyl has 3 unique values (4, 6, and 8), so the resulting table has 3 rows, one per group.
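
The split, apply, combine pattern can also be sketched in plain pandas, which may help when checking a grouped summary by hand. This is an assumed rough equivalent, not siuba's implementation. It uses a hypothetical four-row sample (cars_sample) built from mtcars rows printed earlier, so its numbers will not match the full-table output above.

```python
import pandas as pd

# Hypothetical sample: mpg and cyl from four of the mtcars rows printed earlier.
cars_sample = pd.DataFrame({
    "mpg": [21.0, 21.0, 15.0, 21.4],
    "cyl": [6, 6, 8, 4],
})

# Split by cyl, aggregate mpg within each group, and combine into one table.
per_cyl = (
    cars_sample
    .groupby("cyl", as_index=False)
    .agg(
        avg=("mpg", "mean"),
        range=("mpg", lambda s: s.max() - s.min()),
    )
)

# One output row per unique cyl value in the sample.
n_groups = cars_sample["cyl"].nunique()
print(len(per_cyl) == n_groups)
print(per_cyl)
```

Each keyword passed to agg() names an output column, much like the keyword arguments to summarize().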