Add warning for pyarrow<14.0.1 usage #10622
Builds are failing because I'm raising this warning... At least we got confirmation that it works :) I'll add this warning to the ignore list (BTW this will trigger as soon as people import |
It looks like we still need to ignore |
Thanks for handling this @fjetter. A couple of thoughts:
- Having the warning at the `import dask.dataframe as dd` level seems a little heavy-handed. IIRC we only use `pyarrow` in a few places (i.e. Parquet / ORC / `engine="pyarrow"` in `read_csv` and auto-inferring pyarrow strings). Can we just emit this warning in those cases?
- Does adding `pyarrow-hotfix` as a core `dask` dependency have any downside? It's a small, pure Python package with no dependencies (not even `pyarrow`). That would ensure that no matter what, if you're using a new version of Dask, you're not impacted by the issue.
Also cc @rjzamora for visibility
I don't feel like this is worth the effort, and considering this is a security risk, I'm fine with being a little noisy. It's very easy for people to avoid this by upgrading.
See above
Beyond this, this shouldn't have any implications |
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
PyArrow has a vulnerability that allows arbitrary code execution via malicious payloads embedded in Parquet, Feather or Arrow IPC files. Dask users are potentially affected by this through our Parquet reader.
The hotfix package raises an error if a file being read has the potential to be harmful. This is not ideal, but I believe it is still the best approach for most users.
We can install this package whenever somebody installs `dask[complete]` (we should do the same for our conda package) and raise appropriate warnings. Increasing the minimum version to 14.0.1 was also brought up in another context for P2P shuffling, so going through this deprecation cycle is beneficial for us either way.
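A rough sketch of "install the hotfix and raise appropriate warnings otherwise", assuming (as the `pyarrow-hotfix` package documents) that simply importing the `pyarrow_hotfix` module applies the fix. The function `apply_pyarrow_hotfix`, its `module_name` parameter (present only so the fallback path can be exercised) and the warning text are hypothetical, not Dask's actual implementation.

```python
import importlib
import warnings


def apply_pyarrow_hotfix(module_name: str = "pyarrow_hotfix") -> bool:
    """Hypothetical helper: import the hotfix module if it is available.

    Importing the module is what applies the fix. Returns True on success;
    otherwise emits a warning and returns False.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        warnings.warn(
            "pyarrow-hotfix is not installed; reading untrusted Parquet, "
            "Feather or Arrow IPC files with pyarrow<14.0.1 may be unsafe. "
            "Install pyarrow-hotfix or upgrade to pyarrow>=14.0.1.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

If `pyarrow-hotfix` is pulled in by `dask[complete]`, the import succeeds for most users and the warning only reaches those on a minimal install with an old pyarrow.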
Context