Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX propagate configuration to workers in parallel #25363

Merged
merged 33 commits into from Jan 20, 2023

Conversation

glemaitre
Copy link
Member

closes #25242
closes #25239
closes #25290

This is an alternative to #25290. The issue in #25290 is that we change the public API for 1.2.1. Making the change in a private _delayed is not really possible since we would warn our user or developer to use a private function.

This PR proposes to overload Parallel and propagate the config using the thread that calls Parallel. It only requires changing the import without changing the Parallel or delayed calls.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach the best. Here is a first pass of feedback on phrasing.

sklearn/utils/fixes.py Outdated Show resolved Hide resolved
sklearn/utils/fixes.py Outdated Show resolved Hide resolved
sklearn/utils/fixes.py Outdated Show resolved Hide resolved
sklearn/utils/fixes.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_fixes.py Outdated Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I do like this option a bit more than the other ones.

sklearn/utils/fixes.py Outdated Show resolved Hide resolved
glemaitre and others added 17 commits January 12, 2023 11:30
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…ed attributes (scikit-learn#24882)

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…atch (scikit-learn#25297)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the generated API doc for the sklearn.fixes module I see that we also expose utils.parallel_backend and utils.register_parallel_backend.

I believe that we should also deprecated those and move them to sklearn.utils.parallel for consistency.

We should also probably included versionchanged or versionchanged info in the docstring of sklearn.utils.parallel.delayed and versionadded for .Parallel.

sklearn/utils/parallel.py Outdated Show resolved Hide resolved
sklearn/utils/parallel.py Show resolved Hide resolved
sklearn/utils/parallel.py Outdated Show resolved Hide resolved
sklearn/utils/parallel.py Show resolved Hide resolved
sklearn/utils/tests/test_parallel.py Outdated Show resolved Hide resolved
@ogrisel
Copy link
Member

ogrisel commented Jan 18, 2023

Other than the comments above, LGTM by the way. Thanks very much for this work @glemaitre!

@thomasjpfan
Copy link
Member

Reading the generated API doc for the sklearn.fixes module I see that we also expose utils.parallel_backend and utils.register_parallel_backend.

I believe that we should also deprecated those and move them to sklearn.utils.parallel for consistency.

Does the following comment still apply?

# Do not deprecate parallel_backend and register_parallel_backend as they are
# needed to tune `scikit-learn` behavior and have different effect if called
# from the vendored version or or the site-package version. The other are
# utilities that are independent of scikit-learn so they are not part of
# scikit-learn public API.
parallel_backend = _joblib.parallel_backend
register_parallel_backend = _joblib.register_parallel_backend

I do not think we need to provide parallel_backend or register_parallel_backend anywhere if they are already in joblib.

glemaitre and others added 2 commits January 18, 2023 17:44
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ogrisel
Copy link
Member

ogrisel commented Jan 18, 2023

I do not think we need to provide parallel_backend or register_parallel_backend anywhere if they are already in joblib.

Good catch, I should have checked that. I would be in favor of deprecating those two in favor of the matching functions from joblib.

@glemaitre
Copy link
Member Author

Good catch, I should have checked that. I would be in favor of deprecating those two in favor of the matching functions from joblib.

I can do that in a subsequent PR since it is independent of this PR.

@glemaitre
Copy link
Member Author

I think that I addressed the last comments. I will now open a PR for the deprecation of the joblib helper functions.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! +1 when CI is green.

@ogrisel
Copy link
Member

ogrisel commented Jan 18, 2023

Hum, sphinx chokes in the ..rubric:: directive used in the Parallel.__doc__ attribute:

https://github.com/scikit-learn/scikit-learn/actions/runs/3951111594/jobs/6764541521#step:3:2324

Maybe we could instead do a simple docstring for sklearn.utils.parallel.Parallel that only present how it is different from joblib.Parallel (config propagation) and refer the reader to the docstring the parent class the rest.

@ogrisel
Copy link
Member

ogrisel commented Jan 19, 2023

Green!

@thomasjpfan any final comment?

@glemaitre
Copy link
Member Author

I swear it was green before :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment about the changelog, otherwise LGTM.

@@ -105,6 +115,10 @@ Changelog
boolean. The type is maintained, instead of converting to `float64.`
:pr:`25147` by :user:`Tim Head <betatim>`.

- |API| :func:`utils.fixes.delayed` is deprecated in 1.2.1 and will be removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a changelog entry to introduce utils.parallel.Parallel as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a new entry I think we can just expand this entry as suggested below to keep related changes together:

doc/whats_new/v1.2.rst Outdated Show resolved Hide resolved
glemaitre and others added 2 commits January 20, 2023 11:06
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@thomasjpfan thomasjpfan added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Jan 20, 2023
@thomasjpfan thomasjpfan merged commit a7cd0ca into scikit-learn:main Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ColumnTransformers don't honor set_config(transform_output="pandas") when multiprocessing with n_jobs>1
8 participants