Backend Examples 🚧

from siuba import _, group_by, filter, show_query, tbl
from siuba.sql import LazyTbl
from siuba.data import cars

from sqlalchemy import create_engine

Demo query

We’ll use the following lazy pipe to demonstrate each backend. It groups by the cyl column, then filters by mpg.

filter_mpg = (
    _
    >> group_by(_.cyl)
    >> filter(_.mpg < _.mpg.mean())
)

For example, here is the pipe called on pandas data.

cars >> filter_mpg

(grouped data frame)

cyl mpg hp
2 4 22.8 93
5 6 18.1 105
... ... ... ...
30 8 15.0 335
31 4 21.4 109

16 rows × 3 columns

duckdb

engine = create_engine("duckdb:///:memory:")

tbl_cars_duckdb = tbl(engine, "cars", cars)
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/duckdb_engine/__init__.py:229: DuckDBEngineWarning: duckdb-engine doesn't yet support reflection on indices
  warnings.warn(
tbl_filtered = tbl_cars_duckdb >> filter_mpg
tbl_filtered
# Source: lazy query
# DB Conn: Engine(duckdb:///:memory:)
# Preview:
cyl mpg hp
0 4 22.8 93
1 4 24.4 62
2 4 22.8 95
3 4 21.5 97
4 4 26.0 91

# .. may have more rows

q = tbl_filtered >> show_query(simplify=True)
SELECT anon_1.cyl, anon_1.mpg, anon_1.hp 
FROM (SELECT *, cars_1.mpg < avg(cars_1.mpg) OVER (PARTITION BY cars_1.cyl) AS win1 
FROM cars AS cars_1) AS anon_1 
WHERE anon_1.win1

sqlite

engine = create_engine("sqlite:///:memory:")
cars.to_sql("cars", engine, index=False)

tbl_cars_sqlite = tbl(engine, "cars")
tbl_filtered = tbl_cars_sqlite >> filter_mpg
tbl_filtered
/home/runner/work/siuba.org/siuba.org/src/siuba/siuba/sql/utils.py:85: SAWarning: Class AggOver will not make use of SQL compilation caching as it does not set the 'inherit_cache' attribute to ``True``.  This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions.  Set this attribute to True if this object can make use of the cache key generated by the superclass.  Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
  return self.connectable.execute(*args, **kwargs)
# Source: lazy query
# DB Conn: Engine(sqlite:///:memory:)
# Preview:
cyl mpg hp
0 4 22.8 93
1 4 24.4 62
2 4 22.8 95
3 4 21.5 97
4 4 26.0 91

# .. may have more rows

q = tbl_filtered >> show_query(simplify=True)
SELECT anon_1.cyl, anon_1.mpg, anon_1.hp 
FROM (SELECT *, cars_1.mpg < avg(cars_1.mpg) OVER (PARTITION BY cars_1.cyl) AS win1 
FROM cars AS cars_1) AS anon_1 
WHERE anon_1.win1 = 1

bigquery

bigquery is thoroughly tested, but needs to be added to this page!

mysql

mysql is thoroughly tested, but needs to be added to this page!

postgresql

postgresql is thoroughly tested, but needs to be added to this page!

snowflake

snowflake is thoroughly tested, but needs to be added to this page!