DuckDB with SQLAlchemy #380
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##   new-data-input     #380      +/- ##
==========================================
+ Coverage     71.35%   71.41%   +0.06%
==========================================
  Files            44       44
  Lines          5892     5892
  Branches       1163     1163
==========================================
+ Hits           4204     4208      +4
+ Misses         1368     1364      -4
  Partials        320      320
Funnily enough, I actually find the SQL table schemas more readable than the SQLAlchemy typed classes, but I get your point about more complex tables.

Re: input creation, that reminds me: one thing we will want to do is the reverse process of populating the database with data from the xarrays, and then creating CSV files from the database. This is so we can essentially convert input files from the old format to the new format (i.e. old format files -> xarrays -> database -> new format files). We should have a go at doing this before scaling up the database, as it may influence our approach.
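The conversion chain described above (old format files -> xarrays -> database -> new format files) can be sketched end to end. This is a toy illustration with invented column names, using the stdlib sqlite3 and csv modules as stand-ins for the xarray and DuckDB stages:

```python
import csv
import io
import sqlite3

# Stage 1: parse an "old format" file into rows (stand-in for the xarray step).
# Column names here are invented for the example.
old_file = io.StringIO("ProcessName,cap_act\nCOAL_PLANT,31.536\n")
rows = list(csv.DictReader(old_file))

# Stage 2: populate the database (sqlite3 stands in for DuckDB).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE processes (name TEXT PRIMARY KEY, cap2act REAL)")
con.executemany(
    "INSERT INTO processes VALUES (?, ?)",
    [(r["ProcessName"], float(r["cap_act"])) for r in rows],
)

# Stage 3: dump the table back out as a "new format" CSV file.
new_file = io.StringIO()
writer = csv.writer(new_file)
writer.writerow(["name", "cap2act"])
writer.writerows(con.execute("SELECT name, cap2act FROM processes"))
print(new_file.getvalue())
```

In the real pipeline the middle stage would go through the SQLAlchemy-defined schema, so a file that does not fit the schema fails loudly at stage 2 rather than producing a malformed output file.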
Description
This PR builds on #379 and prototypes a partial adoption of SQLAlchemy in combination with DuckDB. It attempts to get the best of both worlds by using SQLAlchemy to define the database schema whilst still leveraging DuckDB's ability to work with CSV files and easily spit out NumPy arrays.
The main point of using SQLAlchemy here is to cut down on the amount of SQL that needs to be written and to define the schema via the Python declarative interface. With a generic CSV reader function, the ingest then comes down to two simple lines of SQL. The advantage of this may be more apparent in a more mature implementation, which would require more complex SQL under the DuckDB-only approach.
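As a rough illustration of that split, here is a minimal sketch: the schema lives in a SQLAlchemy declarative class, while the CSV ingest stays as a couple of lines of DuckDB SQL. The table and column names are invented for the example, and an in-memory SQLite engine stands in for the DuckDB connection the PR would use:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Commodity(Base):
    """Hypothetical table -- names invented for illustration."""
    __tablename__ = "commodities"
    id = sa.Column(sa.String, primary_key=True)
    unit = sa.Column(sa.String, nullable=False)

# The PR would connect to DuckDB; an in-memory SQLite engine stands in here.
engine = sa.create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# With the table defined above, the ingest reduces to two lines of SQL
# (DuckDB syntax, shown as a comment since it won't run on SQLite):
#   INSERT INTO commodities
#   SELECT * FROM read_csv_auto('commodities.csv');

print(sa.inspect(engine).get_table_names())
```

The declarative class replaces a hand-written CREATE TABLE statement, while DuckDB's read_csv_auto keeps the CSV handling out of Python entirely.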
One very nice side benefit of having the schema in SQLAlchemy would be on the input creation side. At the moment the only real approach is to hack together a bunch of CSV files and deal with errors as you try to read them in. Instead, you could use the SQLAlchemy classes to populate a database and then dump it into CSV files, which could be quite powerful in the case of large or complex input datasets.
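That input-creation flow might look something like the following sketch (the class and column names are invented, and SQLite again stands in for DuckDB): inputs go in through the mapped classes, so schema violations surface at insert time rather than at read time, and the final step dumps the table to CSV.

```python
import csv
import io
import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Process(Base):
    """Hypothetical input table -- names invented for illustration."""
    __tablename__ = "processes"
    name = sa.Column(sa.String, primary_key=True)
    capacity = sa.Column(sa.Float, nullable=False)

engine = sa.create_engine("sqlite:///:memory:")  # DuckDB in practice
Base.metadata.create_all(engine)

# Populate through the mapped classes instead of hand-editing CSV files.
with Session(engine) as session:
    session.add_all([Process(name="COAL_PLANT", capacity=500.0)])
    session.commit()

# Dump the table into a CSV file in the new input format.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "capacity"])
with engine.connect() as conn:
    writer.writerows(conn.execute(sa.text("SELECT name, capacity FROM processes")))
print(buf.getvalue())
```

With DuckDB the final step could instead be a single COPY ... TO 'file.csv' statement, avoiding the csv module round trip.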
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include the PR #) - note the reverse order of PR #s.
Key checklist
$ python -m pytest
$ python -m sphinx -b html docs docs/build
Further checks