Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python][C++] C Data Interface incorrect validate failures #14875

Closed
zeroshade opened this issue Dec 7, 2022 · 3 comments · Fixed by #14814
Closed

[Python][C++] C Data Interface incorrect validate failures #14875

zeroshade opened this issue Dec 7, 2022 · 3 comments · Fixed by #14814
Assignees
Milestone

Comments

@zeroshade
Copy link
Member

Describe the bug, including details regarding any error messages, version, and platform.

Spinning off from #14814:

When testing round trips of empty arrays between Python and Go using the C Data Interface, I found an issue with the binary and string data type arrays.

The data types: pa.binary(), pa.large_binary(), pa.string(), pa.large_string() all throw an error when calling validate(full=True) after the _import_from_c that contained a null value data buffer:

Traceback (most recent call last):
  File "/home/zeroshade/Projects/GitHub/arrow/go/arrow/cdata/test/test_export_to_cgo.py", line 218, in test
    b.validate(full=True)
  File "pyarrow/array.pxi", line 1501, in pyarrow.lib.Array.validate
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value data buffer is null

Following up from #14805 clarifying that buffers can be null in a 0-length array. My guess here is rather than the offsets buffer, the issue is the second data buffer which would contain the actual binary/utf-8 data if the array had a length >0. But that's just a theory, I haven't confirmed it.

Component(s)

C++, Python

@pitrou
Copy link
Member

pitrou commented Dec 8, 2022

Hmm, I'm not sure it's a good idea to relax the validation check after all.
There's no guarantee that a null data pointer on binary arrays would work correctly with all our internal routines.
Rather than that, it sounds safer to turn the null pointer into a non-null 0-size buffer at the C Data Interface boundary.

@zeroshade
Copy link
Member Author

Would this still be something that needs to be fixed / changed in the PyArrow / C++ libs? Or should the Go lib be changed to not provide a null pointer in this situation?

@pitrou
Copy link
Member

pitrou commented Dec 8, 2022

I should fix it in #14814.

@raulcd raulcd added the Priority: Blocker Marks a blocker for the release label Jan 17, 2023
@raulcd raulcd added this to the 11.0.0 milestone Jan 17, 2023
@pitrou pitrou modified the milestones: 11.0.0, 12.0.0 Jan 17, 2023
pitrou added a commit that referenced this issue Jan 17, 2023
…14814)

The C data interface may expose null data pointers for zero-sized buffers.
Make sure that all buffer pointers remain non-null internally.

Followup to GH-14805

* Closes: #14875

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@raulcd raulcd modified the milestones: 12.0.0, 11.0.0 Jan 18, 2023
raulcd pushed a commit that referenced this issue Jan 18, 2023
…14814)

The C data interface may expose null data pointers for zero-sized buffers.
Make sure that all buffer pointers remain non-null internally.

Followup to GH-14805

* Closes: #14875

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants