Up to this point weβve covered lazy expressions (_), and using table verbs. A major benefit of these two approaches is that they allow us to change how siuba behaves depending on the data source on which it is operating.
Setup
For these examples we first set up a sqlite database, with an mtcars table.
from sqlalchemy import create_enginefrom siuba.sql import LazyTblfrom siuba import _, group_by, summarize, show_query, collect from siuba.data import mtcars# copy in to sqlite, using the pandas .to_sql() methodengine = create_engine("sqlite:///:memory:")mtcars.to_sql("mtcars", engine, if_exists ="replace")
32
Accessing tables
Use the LazyTbl class to connect to a SQL table. Printing the table will show a preview of the first few rows.
# Create a lazy SQL DataFrametbl_mtcars = LazyTbl(engine, "mtcars")tbl_mtcars
# Source: lazy query
# DB Conn: Engine(sqlite:///:memory:)
# Preview:
index
mpg
cyl
disp
hp
drat
wt
qsec
vs
am
gear
carb
0
0
21.0
6
160.0
110
3.90
2.620
16.46
0
1
4
4
1
1
21.0
6
160.0
110
3.90
2.875
17.02
0
1
4
4
2
2
22.8
4
108.0
93
3.85
2.320
18.61
1
1
4
1
3
3
21.4
6
258.0
110
3.08
3.215
19.44
1
0
3
1
4
4
18.7
8
360.0
175
3.15
3.440
17.02
0
0
3
2
# .. may have more rows
Notice that we defined the variable tbl_mtcars to refer to the mtcars table in the database. When we print tbl_mtcars it shows a preview of the underlying data, along with some notes about the database being used: # DB Conn: Engine(sqlite:///:memory:).
Basic analysis
You donβt need to change your analysis code to run it on a SQL table. For example, the code below groups and summarizes the data.