Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning for pyarrow<14.0.1 usage #10622

Merged
merged 7 commits into from
Nov 10, 2023
Merged

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Nov 9, 2023

Pyarrow has a vulnerability to arbitrary code execution embedded in parquet, feather or arrow IPC. Dask users are potentially affected by this through our parquet reader.

The hotfix package raises if the read file has the potential to be harmful which is not ideal but I believe this is still the best approach for most users.
We can install this package whenever somebody installs dask[complete] (we should do the same for our conda package) and raise appropriate warnings.
Increasing the minimal version to 14.0.1 was also brought up in another context for P2P shuffling so going through this deprecation cycle is beneficial for us either way.

Context

@fjetter
Copy link
Member Author

fjetter commented Nov 9, 2023

Builds are failing because I'm raising this warning... At least we got confirmation that works :)

I'll add this warning to the ignore list

(BTW this will trigger as soon as people import dask.dataframe)

@hendrikmakait
Copy link
Member

It looks like we still need to ignore UserWarning: You are using pyarrow version 9.0.0 which is known to be insecure. See https://www.cve.org/CVERecord?id=CVE-2023-47248 for further details. Please upgrade to pyarrow>=14.0.1 or install pyarrow-hotfix to patch your current version..

@jrbourbeau jrbourbeau mentioned this pull request Nov 9, 2023
4 tasks
pyproject.toml Outdated Show resolved Hide resolved
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling this @fjetter. A couple of thoughts:

  • Having the warning at the import dask.dataframe as dd level seems a little heavy handed. IIRC we only use pyarrow in a few place (i.e. Parquet / ORC / engine="pyarrow" in read_csv and auto-inferring pyarrow strings). Can we just emit this warning in those cases?
  • Does adding pyarrow-hotfix as a core dask dependency have any downside? It's a small, pure Python package with no dependencies (not even pyarrow). That would ensure that no matter what, if you're using a new version of Dask, you're not impacted by the issue.

Also cc @rjzamora for visibility

@fjetter
Copy link
Member Author

fjetter commented Nov 10, 2023

Can we just emit this warning in those cases?

I don't feel like this is worth the effort and considering this is a security risk, I'm fine with being a little noisy. It's very easy for people to avoid this by upgrading.

That would ensure that no matter what, if you're using a new version of Dask, you're not impacted by the issue.

See above

The hotfix package raises if the read file has the potential to be harmful which is not ideal but I believe this is still the best approach for most users.

beyond this, this shouldn't have any implications

fjetter and others added 2 commits November 10, 2023 09:09
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
@jrbourbeau jrbourbeau changed the title Add pyarrow_hotfix Add warning for pyarrow<14.0.1 usage Nov 10, 2023
@jrbourbeau jrbourbeau merged commit 972352a into dask:main Nov 10, 2023
25 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants