pyarrow crash on conversion of Pandas dataframe -> arrow with Decimal column #1888

Closed
robambalu opened this issue Apr 12, 2018 · 3 comments

Comments

@robambalu

I was attempting to move some data from a db -> dataframe -> arrow table. Some columns came in from the db as Decimal types, and some elements were None.
This appears to cause a crash in the release build of pyarrow, and a proper error + std::abort in the debug build. The numpy -> arrow conversion deduces that the column is a decimal type, but then ConvertDecimals barfs on the None value. Needless to say, if possible, I would prefer to get a Python exception rather than a crash / abort.
Relevant Stack:
#0 0x00007ffff712b1d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff712c8c8 in abort () from /lib64/libc.so.6
#2 0x00007fffeb521fda in arrow::internal::CerrLog::~CerrLog (this=0x7fffffffaaa0, __in_chrg=) at /home/ra7293/arrow/cpp/src/arrow/util/logging.h:112
#3 0x00007fffeb1256d0 in arrow::py::internal::DecimalMetadata::Update (this=0x7fffffffaf00, object=0x88e760 <_Py_NoneStruct>) at /home/ra7293/arrow/cpp/src/arrow/python/helpers.cc:270
#4 0x00007fffeb131a36 in arrow::py::NumPyConverter::ConvertDecimals (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:789
#5 0x00007fffeb1376d2 in arrow::py::NumPyConverter::ConvertObjectsInfer (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1090
#6 0x00007fffeb138aae in arrow::py::NumPyConverter::ConvertObjects (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1176
#7 0x00007fffeb1313a7 in arrow::py::NumPyConverter::Convert (this=0x7fffffffb860) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:547
#8 0x00007fffeb13cbf0 in arrow::py::NdarrayToArrow (pool=0x7fffebbfa240 arrow::default_memory_pool()::default_memory_pool_, ao=0x7ffff0968030, mo=0x88e760 <_Py_NoneStruct>,
use_pandas_null_sentinels=true, type=..., out=0x7fffffffb9e0) at /home/ra7293/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1725
#9 0x00007fffebcca52b in __pyx_f_7pyarrow_3lib__ndarray_to_array (__pyx_v_values=0x7ffff0968030, __pyx_v_mask=0x88e760 <_Py_NoneStruct>, __pyx_v_type=0x88e760 <_Py_NoneStruct>,

Quick repro:
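import pyarrow as pa
import pandas as pd
from decimal import Decimal

# Object column that mixes Decimal values with None
df = pd.DataFrame({"test": [None, Decimal(1.0), Decimal(2.0), None]})
print(df, df["test"])
# Crashes in the release build, aborts in the debug build
pa.Table.from_pandas(df)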
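
For illustration only, here is a pure-Python sketch of the behavior I would expect from the inference step: walk the column, skip None, and raise a normal Python exception on anything that is not a Decimal. The function name infer_decimal_metadata is hypothetical and this is not the Arrow implementation; it only assumes that precision and scale can be derived from Decimal.as_tuple().

```python
from decimal import Decimal


def infer_decimal_metadata(values):
    """Return (precision, scale) wide enough for every Decimal in values.

    Hypothetical helper, not part of pyarrow. None entries are treated as
    nulls and skipped; a column containing only None yields (0, 0).
    """
    max_int_digits = 0
    max_scale = 0
    for value in values:
        if value is None:
            continue  # nulls carry no precision/scale information
        if not isinstance(value, Decimal):
            # Raise a regular exception instead of aborting the process
            raise TypeError(
                f"expected Decimal or None, got {type(value).__name__}"
            )
        _sign, digits, exponent = value.as_tuple()
        if not isinstance(exponent, int):
            # as_tuple() reports 'n', 'N' or 'F' for NaN / Infinity
            raise ValueError("NaN and Infinity are not handled in this sketch")
        scale = max(-exponent, 0)                    # digits after the point
        int_digits = max(len(digits) + exponent, 0)  # digits before the point
        max_scale = max(max_scale, scale)
        max_int_digits = max(max_int_digits, int_digits)
    return max_int_digits + max_scale, max_scale


column = [None, Decimal(1.0), Decimal(2.0), None]
precision, scale = infer_decimal_metadata(column)
print(precision, scale)
```

The point of the sketch is only that None can be skipped during inference and that bad input surfaces as a TypeError the caller can catch.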

@cpcloud
Contributor

cpcloud commented Apr 12, 2018

This is fixed in #1878, with regression tests (e.g.: https://github.com/apache/arrow/pull/1878/files#diff-9819c8ade833fc019ee222c043ed0334R1332)

@cpcloud cpcloud closed this as completed Apr 12, 2018
@cpcloud
Contributor

cpcloud commented Apr 12, 2018

Thanks for the report!

@robambalu
Author

robambalu commented Apr 15, 2018 via email
