Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve C Data Interface and Add Integration Testing Entrypoints #5080

Merged
merged 10 commits into from Nov 20, 2023

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Nov 15, 2023

Which issue does this PR close?

Closes #4867.

Rationale for this change

Allow Rust to participate in integration testing for the C Data Interface, as explained here:
https://arrow.apache.org/docs/dev/format/Integration.html#example-c-data-interface

What changes are included in this PR?

Add C-compatible entrypoints for the C Data Interface integration testing machinery.

Add support for exporting and importing Intervals over the C Data Interface.

Allow importing from a FFI_ArrowArray and an existing DataType.

Fix handling of null_count field in the C ArrowArray struct.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 15, 2023
@pitrou
Copy link
Member Author

pitrou commented Nov 15, 2023

For now I have tested this locally against the C++ implementation, using these changes to the Archery machinery:
apache/arrow@main...pitrou:arrow:rust-cdata-integration

Comment on lines 208 to 209
let imported_schema = unsafe { std::ptr::replace(c_schema, FFI_ArrowSchema::empty()) };
let imported_schema = Schema::try_from(&imported_schema)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern would probably benefit from an API function. It's easy to get it wrong.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, left some comments

arrow-integration-testing/src/lib.rs Show resolved Hide resolved
arrow-integration-testing/src/lib.rs Outdated Show resolved Hide resolved
@pitrou
Copy link
Member Author

pitrou commented Nov 16, 2023

Ok, I've now checked this works properly against C++, Go, Java and C#. Will mark ready for review.

@pitrou pitrou marked this pull request as ready for review November 16, 2023 10:47
@pitrou
Copy link
Member Author

pitrou commented Nov 16, 2023

@tustvold Is it possible to query the amount of memory allocated by Arrow Rust? It is not mandatory but allows checking for memory leaks during integration testing.

@tustvold
Copy link
Contributor

tustvold commented Nov 16, 2023

Not easily afaik, you could register a global allocator that supports this, but nothing out of the box. Rust's custom allocators are not yet stable

@pitrou
Copy link
Member Author

pitrou commented Nov 17, 2023

Not easily afaik, you could register a global allocator that supports this, but nothing out of the box. Rust's custom allocators are not yet stable

Ok, I see. We can revisit this later.

@pitrou
Copy link
Member Author

pitrou commented Nov 17, 2023

@tustvold What would you think of adding a macro like this to ease error reporting?
https://gist.github.com/pitrou/698144c6277d913448398c996aaac3eb

@tustvold
Copy link
Contributor

What would you think of adding a macro like this to ease error reporting?

No objections from me, although FYI #2725 tracks the broader issue that the error variants in arrow-rs are not particularly meaningful.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, mostly just minor nits, although I think we should probably make from_ffi and from_ffi_and_data_type unsafe before merging

Schema::new(fields).with_metadata(schema.metadata().clone())
}

struct PartialArrowFile {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could get a doc comment explaining how this differs from ArrowFile? Perhaps LazyArrowFile might be a better name??

Alternatively I wonder if we could just make ArrowFile lazy, and have member functions for decoding a batch to RecordBatch by index or something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed to LazyArrowFile, which indeed sounds better. As for ArrowFile, it's marked public, is it ok to change its API?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This crate isn't published, so you can go wild 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I did the suggested changes.

arrow/src/ffi.rs Outdated Show resolved Hide resolved
arrow/src/ffi.rs Outdated Show resolved Hide resolved
arrow/src/ffi.rs Outdated Show resolved Hide resolved
@@ -357,9 +357,11 @@ impl Iterator for ArrowArrayStreamReader {
}

let schema_ref = self.schema();
// NOTE: this parses the FFI_ArrowSchema again on each iterator call;
// should probably use from_ffi_and_data_type() instead.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a separate issue should be opened for this. @tustvold

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5103

@pitrou
Copy link
Member Author

pitrou commented Nov 20, 2023

I think I addressed all review comments now.

@tustvold tustvold added the api-change Changes to the arrow API label Nov 20, 2023
@tustvold tustvold changed the title Add C Data Interface integration testing entrypoints Improve C Data Interface and Add Integration Testing Entrypoints Nov 20, 2023
@tustvold tustvold merged commit b724849 into apache:master Nov 20, 2023
26 checks passed
@tustvold
Copy link
Contributor

Thanks once again 👍

@pitrou pitrou deleted the cdata-integration-testing branch November 20, 2023 16:58
@pitrou pitrou restored the cdata-integration-testing branch November 20, 2023 16:58
@pitrou pitrou deleted the cdata-integration-testing branch November 20, 2023 16:58
@pitrou pitrou restored the cdata-integration-testing branch November 20, 2023 16:58
@pitrou pitrou deleted the cdata-integration-testing branch November 20, 2023 16:58
pitrou added a commit to apache/arrow that referenced this pull request Nov 21, 2023
…n Rust (#38799)

### Rationale for this change

Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side:
apache/arrow-rs#5080

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: #38798

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ting on Rust (apache#38799)

### Rationale for this change

Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side:
apache/arrow-rs#5080

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* Closes: apache#38798

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Michael-J-Ward added a commit to Michael-J-Ward/dora that referenced this pull request Apr 16, 2024
Arrow needs to be upgraded alongside pyo3 because they both link to python.

Arrow's upgrade to pyo3 is in master but not yet released as of v51.
apache/arrow-rs#5566

NOTE: Arrow marked `ffi::from_ffi` unsafe.
apache/arrow-rs#5080
Michael-J-Ward added a commit to Michael-J-Ward/dora that referenced this pull request Apr 18, 2024
Arrow needs to be upgraded alongside pyo3 because they both link to python.

Arrow's upgrade to pyo3 is in master but not yet released as of v51.
apache/arrow-rs#5566

NOTE: Arrow marked `ffi::from_ffi` unsafe.
apache/arrow-rs#5080
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable C Data Integration Tests
4 participants