# π§ TODO: expose verb_dispatch on siu module
import pandas as pd
from siuba.siu.dispatchers import verb_dispatch
from siuba.data import mtcars
Custom table verbs
You may be wondering how a siuba function, like mutate()
, could work on a SQL database. This is because these functions are defined using a technique called single dispatch. This approach allows you to define class-specific versions of a function.
Create a new verb with verb_dispatch()
Use the verb_dispatch()
decorator to create a new table verb function.
The code below creates a function called head(), with an implementation that works specifically on a DataFrame.
# DataFrame version of function ---
@verb_dispatch(pd.DataFrame)
def head(__data, n = 5):
return __data.head(n)
2) head(mtcars,
mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
1 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Register a SQL translation
We can define a SQL specific version, that acts on a SqlAlchemy Table by registering a new translation of our function head
, with the @head.register
decorator.
from siuba.sql import LazyTbl
@head.register
def _head_sql(__data: LazyTbl, n = 5):
= __data.last_select.limit(n)
new_select
return __data.append_op(new_select)
Here is the function being run on a mock postgresql backend:
from siuba.data import cars_sql
from siuba import show_query
= cars_sql >> head(3) >> show_query() q
SELECT cars.cyl, cars.mpg, cars.hp
FROM cars
LIMIT 3
Grouped data
Since single dispatch functions define how to execute an action for a specific class of data, it allows siuba to handle grouped data in two ways:
- pandas - register dispatchers for its special grouped data classes (
DataFrameGroupBy
,SeriesGroupBy
). - SQL - use a single class for grouped and ungrouped data, with grouping info as an attribute (
siuba.sql.LazyTbl
).
For example, here is a simple verb that calculates the number of rows in a grouped DataFrame.
from pandas.core.groupby import DataFrameGroupBy
@verb_dispatch(DataFrameGroupBy)
def size(__data):
return __data.size()
'cyl')) size(mtcars.groupby(
cyl
4 11
6 7
8 14
dtype: int64
Why not use methods?
Why use singledispatch rather than create a class method like mtcars.head()
?
There are two big benefits:
- Anyone can define and package a function. Using it is just a matter of importing it. With a method, you need to somehow put it onto the class representing your data. You end up with 300+ methods on a class.
- Your function might do something that is not the classβs core responsibility. In this case, it should not be part of the class definition.
See the post Single dispatch for democratizing data science tools for more.