from siuba import _, group_by, filter, show_query, tbl
from siuba.sql import LazyTbl
from siuba.data import cars
from sqlalchemy import create_engine
Backend Examples 🚧
Demo query
We’ll use the following lazy pipe to demonstrate each backend. It groups by the cyl
column, then filters by mpg
.
= (
filter_mpg
_>> group_by(_.cyl)
>> filter(_.mpg < _.mpg.mean())
)
For example, here is the pipe called on pandas data.
>> filter_mpg cars
(grouped data frame)
cyl | mpg | hp | |
---|---|---|---|
2 | 4 | 22.8 | 93 |
5 | 6 | 18.1 | 105 |
... | ... | ... | ... |
30 | 8 | 15.0 | 335 |
31 | 4 | 21.4 | 109 |
16 rows × 3 columns
duckdb
= create_engine("duckdb:///:memory:")
engine
= tbl(engine, "cars", cars) tbl_cars_duckdb
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/duckdb_engine/__init__.py:229: DuckDBEngineWarning: duckdb-engine doesn't yet support reflection on indices
warnings.warn(
= tbl_cars_duckdb >> filter_mpg
tbl_filtered tbl_filtered
# Source: lazy query # DB Conn: Engine(duckdb:///:memory:) # Preview:
cyl | mpg | hp | |
---|---|---|---|
0 | 4 | 22.8 | 93 |
1 | 4 | 24.4 | 62 |
2 | 4 | 22.8 | 95 |
3 | 4 | 21.5 | 97 |
4 | 4 | 26.0 | 91 |
# .. may have more rows
= tbl_filtered >> show_query(simplify=True) q
SELECT anon_1.cyl, anon_1.mpg, anon_1.hp
FROM (SELECT *, cars_1.mpg < avg(cars_1.mpg) OVER (PARTITION BY cars_1.cyl) AS win1
FROM cars AS cars_1) AS anon_1
WHERE anon_1.win1
sqlite
= create_engine("sqlite:///:memory:")
engine "cars", engine, index=False)
cars.to_sql(
= tbl(engine, "cars") tbl_cars_sqlite
= tbl_cars_sqlite >> filter_mpg
tbl_filtered tbl_filtered
/home/runner/work/siuba.org/siuba.org/src/siuba/siuba/sql/utils.py:85: SAWarning: Class AggOver will not make use of SQL compilation caching as it does not set the 'inherit_cache' attribute to ``True``. This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions. Set this attribute to True if this object can make use of the cache key generated by the superclass. Alternatively, this attribute may be set to False which will disable this warning. (Background on this error at: https://sqlalche.me/e/14/cprf)
return self.connectable.execute(*args, **kwargs)
# Source: lazy query # DB Conn: Engine(sqlite:///:memory:) # Preview:
cyl | mpg | hp | |
---|---|---|---|
0 | 4 | 22.8 | 93 |
1 | 4 | 24.4 | 62 |
2 | 4 | 22.8 | 95 |
3 | 4 | 21.5 | 97 |
4 | 4 | 26.0 | 91 |
# .. may have more rows
= tbl_filtered >> show_query(simplify=True) q
SELECT anon_1.cyl, anon_1.mpg, anon_1.hp
FROM (SELECT *, cars_1.mpg < avg(cars_1.mpg) OVER (PARTITION BY cars_1.cyl) AS win1
FROM cars AS cars_1) AS anon_1
WHERE anon_1.win1 = 1
bigquery
bigquery is thoroughly tested, but needs to be added to this page!
mysql
mysql is thoroughly tested, but needs to be added to this page!
postgresql
postgresql is thoroughly tested, but needs to be added to this page!
snowflake
snowflake is thoroughly tested, but needs to be added to this page!