Pandas backend

import pandas as pd
pd.set_option("display.max_rows", 5)

The pandas backend has two key responsibilities:

  1. Implementing functions that enable complex and fast expressions over grouped data.
  2. Implementing verb behavior over pandas’ DataFrameGroupBy (e.g. a grouped mutate).

While this might seem like a lot of work, it mostly comes down to a few simple strategies that reuse logic already present in the pandas library.
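As a rough illustration of reusing existing pandas logic, the sketch below performs a grouped mutate with plain pandas. The toy data frame and its column names are made up for this example, and the code is not siuba's implementation.

```python
import pandas as pd

# toy data, invented for illustration
df = pd.DataFrame({"g": [0, 0, 1], "hp": [100.0, 200.0, 300.0]})
g_df = df.groupby("g")

# transform broadcasts each group's mean back onto the rows of that group
group_mean = g_df["hp"].transform("mean")

# the "mutate" step: add a column computed from the grouped data
result = df.assign(hp_centered=df["hp"] - group_mean)
```

The grouping, the per-group aggregation, and the alignment back to the original rows are all handled by pandas here, so a backend only needs to decide which of these pieces to call.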

The strategy can be described as follows:

As a final note, while the SQL backend uses a custom backend class (LazyTbl), the backend for table verbs in this case is simply pandas’ DataFrameGroupBy class itself.

Column op translation

from siuba.ops import mean
from siuba.data import mtcars

# equivalent to mtcars.hp.mean()
mean(mtcars.hp)
146.6875
import pandas as pd
from siuba.ops.utils import operation

mean2 = operation("mean", "Agg", 1)

# Series implementation just calls the Series' mean method
mean2.register(
    pd.Series,
    lambda self, *args, **kwargs: self.mean(*args, **kwargs)
)
<function __main__.<lambda>(self, *args, **kwargs)>
mean2(mtcars.hp)
146.6875
mean2.operation.name
'mean'

The purpose of the .operation data is to make it easy to generate new translations for functions. For example, if we want to translate pandas’ ser.str.upper() method, then it helps to know it uses the str accessor.
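To make the idea of metadata-driven translation concrete, here is a minimal sketch built on the standard library's functools.singledispatch rather than on siuba's own helpers. The names make_operation and register_series_default, and the metadata fields kind, arity, and accessor, are hypothetical and chosen only for this illustration.

```python
from functools import singledispatch
from types import SimpleNamespace

import pandas as pd


def make_operation(name, kind, arity, accessor=None):
    """Hypothetical helper: a dispatch function carrying metadata."""
    @singledispatch
    def op(self, *args, **kwargs):
        raise NotImplementedError(f"{name} is not implemented for {type(self)}")

    # attach the metadata so translations can be generated from it later
    op.operation = SimpleNamespace(
        name=name, kind=kind, arity=arity, accessor=accessor
    )
    return op


def register_series_default(op):
    """Hypothetical helper: build a Series implementation from the metadata."""
    meta = op.operation

    def impl(self, *args, **kwargs):
        # go through the accessor (e.g. ser.str) when the metadata names one
        target = getattr(self, meta.accessor) if meta.accessor else self
        return getattr(target, meta.name)(*args, **kwargs)

    op.register(pd.Series, impl)
    return op


upper_sketch = register_series_default(
    make_operation("upper", "elementwise", 1, accessor="str")
)
upper_result = upper_sketch(pd.Series(["a", "b"]))
```

Because the accessor is stored next to the method name, the Series implementation is generated rather than written by hand.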

Using an existing translation

from siuba.experimental.pd_groups.translate import method_el_op

df = pd.DataFrame({
    "x": ["a", "b", "c"],
    "g": [0, 0, 1]
})

# notice method_el_op uses some details from .operation
upper = method_el_op("upper", is_property = False, accessor = "str")
lower = method_el_op("lower", is_property = False, accessor = "str")

g_df = df.groupby("g")

res = upper(g_df.x)

# note: .obj is how you "ungroup" a SeriesGroupBy
res.obj
0    A
1    B
2    C
Name: x, dtype: object
# convert to uppercase and back to lowercase
# equivalent to df.x.str.upper().str.lower()
res2 = lower(upper(g_df.x))
res2.obj
0    a
1    b
2    c
Name: x, dtype: object
isinstance(res, pd.core.groupby.SeriesGroupBy)
True
lower(upper(g_df.x))
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f32e8195b20>

See the internals of functions like method_el_op for details.
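For readers who want a feel for what such a translation might do, the following is a simplified sketch using only pandas. It is not the code inside method_el_op: the function name elementwise_sketch is hypothetical, and regrouping by group number via ngroup() is an assumption made to keep the example short (it replaces the original group keys with integer group ids).

```python
import pandas as pd


def elementwise_sketch(g_ser, name, accessor=None):
    """Hypothetical helper: apply a Series method and return grouped data."""
    ser = g_ser.obj  # "ungroup" the SeriesGroupBy

    # go through the accessor (e.g. ser.str) if one is given
    target = getattr(ser, accessor) if accessor else ser
    result = getattr(target, name)()

    # regroup: ngroup() labels every row with the number of its group
    return result.groupby(g_ser.ngroup())


df = pd.DataFrame({"x": ["a", "b", "c"], "g": [0, 0, 1]})
out = elementwise_sketch(df.groupby("g").x, "upper", accessor="str")
```

Since the operation is elementwise, it can run once over the whole ungrouped Series instead of once per group, and the grouping is reattached afterwards so that further operations can be chained.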

New verb implementations

As with other backends, verbs use single dispatch to register new backend implementations.

from pandas import DataFrame
from pandas.core.groupby import DataFrameGroupBy
from siuba.dply.verbs import singledispatch2

@singledispatch2(DataFrame)
def my_verb(__data):
    print("Running default.")
    
# register grouped implementation ----
@my_verb.register(DataFrameGroupBy)
def _my_verb_gdf(__data):
    print("Running grouped!")


# test it out ----
from siuba.data import mtcars

my_verb(mtcars.groupby("cyl"))
Running grouped!
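The same dispatch pattern can be sketched with the standard library's functools.singledispatch, this time with implementations that return data rather than print. The verb name n_rows and the toy data frame are hypothetical, and the example does not rely on siuba's singledispatch2.

```python
from functools import singledispatch

import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


@singledispatch
def n_rows(data):
    # fallback for types without a registered implementation
    raise TypeError(f"Unsupported type: {type(data)}")


@n_rows.register(pd.DataFrame)
def _n_rows_df(data):
    # ungrouped: a single row holding the total row count
    return pd.DataFrame({"n": [len(data)]})


@n_rows.register(DataFrameGroupBy)
def _n_rows_gdf(data):
    # grouped: one row per group, reusing pandas' own size() logic
    return data.size().reset_index(name="n")


toy = pd.DataFrame({"g": [0, 0, 1], "x": [1, 2, 3]})
ungrouped_counts = n_rows(toy)
grouped_counts = n_rows(toy.groupby("g"))
```

DataFrameGroupBy is not a subclass of DataFrame, so the two registrations never shadow each other and the grouped case must be registered explicitly.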