Feature: query with Apache Arrow result #134
Thanks @levakin for the PR. Could you please have a look at https://github.com/marcboeker/go-duckdb/actions/runs/7060722450/job/19233036222?pr=134
@marcboeker thanks, I fixed the issue. It was happening due to different memory allocation behaviour on Linux and macOS.
Before the fix
@levakin I would prefer to separate the standard driver from additional features like Arrow support or the DuckDB appender API. The DuckDB Appender implementation has a method called With Have you seen this part of the PR, which does exactly what I'm proposing?
@marcboeker Yes, I can create such an interface. I thought it would be a better idea to follow the same library design as https://github.com/jackc/pgx: a native Postgres interface (without any additional interface initializations like Appender) plus a stdlib wrapper provided as a separate package.
@levakin As we already have the Appender separated from the main driver, I would prefer a separated Arrow implementation as well. Maybe it would make sense to get in contact with @phillipleblanc as he has already built an Arrow integration. The goal should be to have one implementation shipped with go-duckdb that satisfies all parties.
@catkins mentioned in #75 that it might make sense to have the Arrow interface as a separate module - so people that didn't need the Arrow interface wouldn't need to pull in the Arrow dependencies (since it can be quite heavyweight). That also meshes with @marcboeker's desire to have the Arrow interface separated from the main driver - but I'll defer to Marc on what he wants to do there.
The approach @levakin took with using prepared statements is better than my implementation in https://github.com/spicehq/go-duckdb/blob/master/connection.go#L36 - which has the drawback of the context not being respected and needing to execute the query synchronously.
What I can do is work on a separate module that wraps EDIT: Actually I'm not sure if the separate module approach will work - both implementations need access to internal functions/state to work properly.
if they're in the same tree, could
I'll be back soon from the holidays and will implement the proposed changes. I don't think a separate repo would be convenient to use, but a separate module is a good idea.
@phillipleblanc I'm fine with including it in the go-duckdb module, but maybe not tightly integrated into the core driver but instead as an add-on like the Appender. That keeps the main driver interface clean but also does not require a second module/dependency. I'd be happy to merge such a variant.
@marcboeker please check again. I've separated most of the new logic in Separating Arrow in a different module would at least require exposing Statements methods. If we can tolerate the Arrow dependency in one module, it would be easier to maintain IMHO.
Hey @levakin, sorry for the delay and thank you for the extensive refactoring. I've also refactored the code to keep Next, we should validate the Arrow-specific code for any memory leaks (which can easily occur in a CGO environment) and check for correctness/optimizations. As I have no experience with Apache Arrow, I need to understand it first.
@marcboeker Thanks for the review. Keeping C code in Let's discuss the approach with Also I think users should be encouraged to use Are you sure we need to provide both Query and QueryContext methods in new interfaces?
AFAIK the QueryContext method in the standard library was added after the context package was developed, and backwards compatibility is the only reason the context argument was not added to the Query method itself. So it will be safe to provide a Query method accepting a context argument from the beginning.
@marcboeker Do you think having working examples in the README is a bad idea or should it be addressed in a separate PR?
Changing a function signature, even if it does not break things, is unusual in Go without bumping the major release number. Normally you initialize a new Arrow connection with
which works perfectly fine. In most cases you would not need a
The idea behind the Connector was to have a convenient way of running an init func, that for example installs necessary plugins.
I'm fine with removing it. My idea was to keep the interface in sync with the SQL driver and remove
What do you mean by "working examples" in the README? Are they broken? Regarding the Connector: why should it not be possible to close it? Line 107 in 495b2cf
Could you please elaborate a bit?
Added and defined structures for Arrow schema and Arrow array in a new file "arrow.h". This allows Arrow-compatible communication and data sharing, which is memory-efficient. Implemented Arrow interface in the connection.go file. Also updated the 'duckdb_test.go' file to include tests for the Arrow interface.
Renames the function QueryArrowContext to QueryArrow. This simplifies the function name, since 'Context' in the name isn't required and doesn't add any meaning.
- conn structure was made unexported
- NewConnector was renamed to OpenConnector to be consistent with the sql package and now returns the ConnectorCloser interface to expose the Close method
- appender: NewAppenderFromConn receives a connection of type any to be compatible with the sql.Conn.Raw method.
Tests and docs were adapted as well.
@marcboeker ok, then let's keep the same interface. However, changing to the 'any' type won't break backwards compatibility, so a major version won't be needed.
I agree with you, it's very convenient to have init for every new connection.
I guess the sql package won't remove Query until Go 2, which is unlikely. So I suggest we leave only Query(ctx, ...) or QueryContext(ctx, ...). Which option do you prefer?
Well, it's kinda possible, but too hacky.
And a type assertion is needed to get access to the Close method.
I meant that you can't just copy them to your IDE and expect them to run as is. With the changes I proposed, I had advantages like syntax highlighting, autoformatting, etc., as with normal code.
Let me give a more concrete example regarding connector.Close.
Here connector is opened and Database is opened internally. The bug here is that connector is never closed. Unless it's passed to
Here's part of the sql package code:
    func (db *DB) Close() error {
        ....
        if c, ok := db.connector.(io.Closer); ok {
            err1 := c.Close()
            if err1 != nil {
                err = err1
            }
        }
We end up with two options: leave it as is and work with the database only through the standard sql package, or expose the connector's Close method to the outer world. The latter can be done by returning a ConnectorCloser interface. In that case we should also ensure the Connector is fine with multiple Close calls.
Okay, then let's remove Regarding the Connector: Thanks for the detailed explanation. You probably refer to this issue? golang/go#41790 I would suggest first merging the Arrow feature and then taking care of implementing the
Okay, got it. What do you think of keeping the samples in the README short and concise and adding two new examples as Go files for the Arrow and Appender to
You are right about the issue. Here's the check: https://go-review.googlesource.com/c/go/+/258360/6/src/database/sql/sql.go @marcboeker Yeah, sure. I can file a PR and let's resolve it separately.
I would appreciate some help with it :)
Great, thanks!
I've done some simplifications as And one test case didn't return anything as there were no rows in the DB. I've added some test data. If you don't see any issues, I would suggest merging it.
I don't see any impact. However type casting C pointers is cumbersome and we can improve later if needed. Thanks! Let's merge it @marcboeker 👍🏻
This PR implements the Connection method QueryArrowContext to get a query result in Apache Arrow format.
The Connection type is now exported and can be used in a similar way to the pgx driver. See example here.
Partly implements #75; some methods are still missing.