DuckDB with SQLAlchemy #380
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##   new-data-input     #380      +/- ##
==========================================
+ Coverage     71.35%   71.41%   +0.06%
==========================================
  Files            44       44
  Lines          5892     5892
  Branches       1163     1163
==========================================
+ Hits           4204     4208      +4
+ Misses         1368     1364      -4
  Partials        320      320
Funnily enough, I actually find the SQL table schemas more readable than the SQLAlchemy typed classes, but I get your point about more complex tables.

Re: input creation, that reminds me: one thing we will want to do is the reverse process of populating the database with data from the xarrays, and then creating CSV files from the database. This is so we can essentially convert input files from the old format to the new format (i.e. old format files -> xarrays -> database -> new format files). We should have a go at doing this before scaling up the database, as it may influence our approach.
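The conversion chain described above (old format files -> xarrays -> database -> new format files) can be sketched end to end. This is a toy illustration with invented column names, using the stdlib sqlite3 and csv modules as stand-ins for the xarray and DuckDB stages:

```python
import csv
import io
import sqlite3

# Stage 1: parse an "old format" file into rows (stand-in for the xarray step).
# Column names here are invented for the example.
old_file = io.StringIO("ProcessName,cap_act\nCOAL_PLANT,31.536\n")
rows = list(csv.DictReader(old_file))

# Stage 2: populate the database (sqlite3 stands in for DuckDB).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE processes (name TEXT PRIMARY KEY, cap2act REAL)")
con.executemany(
    "INSERT INTO processes VALUES (?, ?)",
    [(r["ProcessName"], float(r["cap_act"])) for r in rows],
)

# Stage 3: dump the table back out as a "new format" CSV file.
new_file = io.StringIO()
writer = csv.writer(new_file)
writer.writerow(["name", "cap2act"])
writer.writerows(con.execute("SELECT name, cap2act FROM processes"))
print(new_file.getvalue())
```

In the real pipeline the middle stage would go through the SQLAlchemy-defined schema, so a file that does not fit the schema fails loudly at stage 2 rather than producing a malformed output file.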
Description
This PR builds on #379 and prototypes a partial adoption of SQLAlchemy in combination with DuckDB. It attempts to get the best of both worlds by using SQLAlchemy to define the database schema whilst still leveraging DuckDB's ability to work with CSV files and easily spit out NumPy arrays.
The main point of using SQLAlchemy here is to cut down on the amount of SQL that needs to be written and to define the schema via the Python declarative interface. With a generic CSV reader function, the ingest then comes down to two simple lines of SQL. The advantage of this may be more apparent in a more mature implementation, which would require more complex SQL under the DuckDB-only approach.
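As a rough illustration of that split, here is a minimal sketch: the schema lives in a SQLAlchemy declarative class, while the CSV ingest stays as a couple of lines of DuckDB SQL. The table and column names are invented for the example, and an in-memory SQLite engine stands in for the DuckDB connection the PR would use:

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Commodity(Base):
    """Hypothetical table -- names invented for illustration."""
    __tablename__ = "commodities"
    id = sa.Column(sa.String, primary_key=True)
    unit = sa.Column(sa.String, nullable=False)

# The PR would connect to DuckDB; an in-memory SQLite engine stands in here.
engine = sa.create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# With the table defined above, the ingest reduces to two lines of SQL
# (DuckDB syntax, shown as a comment since it won't run on SQLite):
#   INSERT INTO commodities
#   SELECT * FROM read_csv_auto('commodities.csv');

print(sa.inspect(engine).get_table_names())
```

The declarative class replaces a hand-written CREATE TABLE statement, while DuckDB's read_csv_auto keeps the CSV handling out of Python entirely.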
One very nice side benefit of having the schema in SQLAlchemy would be on the input creation side. At the moment the only real approach is to hack together a bunch of CSV files and deal with errors as you try to read them in. Instead, you could use the SQLAlchemy classes to populate a database and then dump it into CSV files, which could be quite powerful in the case of large or complex input datasets.
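That input-creation flow might look something like the following sketch (the class and column names are invented, and SQLite again stands in for DuckDB): inputs go in through the mapped classes, so schema violations surface at insert time rather than at read time, and the final step dumps the table to CSV.

```python
import csv
import io
import sqlalchemy as sa
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Process(Base):
    """Hypothetical input table -- names invented for illustration."""
    __tablename__ = "processes"
    name = sa.Column(sa.String, primary_key=True)
    capacity = sa.Column(sa.Float, nullable=False)

engine = sa.create_engine("sqlite:///:memory:")  # DuckDB in practice
Base.metadata.create_all(engine)

# Populate through the mapped classes instead of hand-editing CSV files.
with Session(engine) as session:
    session.add_all([Process(name="COAL_PLANT", capacity=500.0)])
    session.commit()

# Dump the table into a CSV file in the new input format.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "capacity"])
with engine.connect() as conn:
    writer.writerows(conn.execute(sa.text("SELECT name, capacity FROM processes")))
print(buf.getvalue())
```

With DuckDB the final step could instead be a single COPY ... TO 'file.csv' statement, avoiding the csv module round trip.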
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include the PR #) - note the reverse order of PR #s.
Key checklist
$ python -m pytest
$ python -m sphinx -b html docs docs/build
Further checks