Member

@kylebarron kylebarron commented Sep 10, 2025

This is pretty hacky, especially because it blocks execution to fetch the schema from the remote file...

from geodatafusion import new_flatgeobuf, register_all

from datafusion import SessionContext

path = "/Users/kyle/Downloads/countries.fgb"
test = new_flatgeobuf(path)

ctx = SessionContext()
register_all(ctx)
ctx.register_table_provider("countries", test)

sql = "SELECT * FROM countries;"
df = ctx.sql(sql)
df.show()

Unfortunately, this crashes the kernel 😕


I think I saw in some issue that @timsaucer said the ABI changed between DF 49 and 50? Maybe that's why it's crashing? I'm using datafusion Python v49, but geodatafusion here is on DF 50 (via a git patch) because the rest of the crate has already updated to the latest Arrow.

@github-actions github-actions bot added the feat label Sep 10, 2025
@kylebarron kylebarron marked this pull request as draft September 10, 2025 01:32
@timsaucer

I found the error and I was able to get this code snippet to run. The problem is that there is an intermediate version of datafusion being used. There was an update to the FFI definition that breaks from 48 -> 49 and it looks like this DF version is before that change. So you could probably run your code with df-python 48.

I'll put up a PR targeting this branch to show the changes I needed to upgrade DF.

@timsaucer

Oh, and the breaking change from 49->50 only impacts aggregate UDFs, so you should be able to test using DF-python 49. Alternatively you can install a wheel from my pre-release branch: https://github.com/apache/datafusion-python/actions/runs/17613542711
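
Since the crash comes down to which DataFusion version each side was built against, a coarse guard on the Python side could fail early instead of crashing the kernel. This is only a sketch: `major_version` and `check_datafusion_major` are hypothetical helpers, not part of datafusion-python or geodatafusion, and a matching major version does not prove the FFI layouts match, especially when the Rust side is pinned to an intermediate git commit.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading major component of a version string like '49.0.0'."""
    return int(version_string.split(".")[0])


def check_datafusion_major(expected_major: int, package: str = "datafusion") -> bool:
    """Return True if the installed package's major version equals expected_major.

    Hypothetical helper: the expected major is an assumption the caller supplies,
    based on the DataFusion version the Rust extension was compiled against.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        # Package is not installed at all, so it cannot match.
        return False
    return major_version(installed) == expected_major
```

Calling something like `check_datafusion_major(49)` before `ctx.register_table_provider(...)` would at least turn an obvious mismatch into a readable error rather than a segfault.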

@kylebarron
Member Author

> There was an update to the FFI definition that breaks from 48 -> 49 and it looks like this DF version is before that change. So you could probably run your code with df-python 48.

Huh... The Rust side is pinned to apache/datafusion@fa1f8c1, which is the upgrade to Arrow 56 after the datafusion 49 release. I can move the datafusion pin later if something changed after that?

@kylebarron
Member Author

> I found the error and I was able to get this code snippet to run.

How did you get it to run? With datafusion Python 48?

> Alternatively you can install a wheel from my pre-release branch: apache/datafusion-python/actions/runs/17613542711

I don't think there's a way to install that from a pyproject.toml in CI though?
