Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX EmptyRequest.get defaults to Bunch of METHODS #28371

Conversation

eddiebergman
Copy link
Contributor

@eddiebergman eddiebergman commented Feb 6, 2024

Reference Issues/PRs

Fixes #28370

What does this implement/fix? Explain your changes.

This makes the process_routing behave equally if sklearn.config(enable_metadata_routing=True) or False. Please see the reference issue for more information.

Any other comments?

I could not run all tests locally, I am hoping to rely on the automated test runners. The changes attempted were trying to be minimal.

One other comment would be to change get() to check explicitly on default is None rather than implicit falsyness as things like get(name, default=[]) would not work as it stands.

Copy link

github-actions bot commented Feb 6, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2ca2aa7. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Could you please add a test here? Something that the reproducer in the issue is triggering.

@adrinjalali adrinjalali added this to the 1.4.1 milestone Feb 6, 2024
@eddiebergman
Copy link
Contributor Author

eddiebergman commented Feb 6, 2024

Added the test to test_metadata_routing.py as I didn't know where it best fit. Please advise how test should change if needs be.

@adrinjalali
Copy link
Member

@ogrisel codecov failing here.

@StefanieSenger
Copy link
Contributor

Also, with sklearn.set_config(enable_metadata_routing=True), even more scoring methods would crash, that popped up to me somewhere in the doctests.

Namely score methods called from within:

  • doc/modules/permutation_importance.rst
  • doc/modules/cross_validation.rst
  • doc/modules/ensemble.rst
  • doc/modules/model_evaluation.rst
  • doc/tutorial/statistical_inference/model_selection.rst

The Fix from this PR will also fix these issues.

@eddiebergman
Copy link
Contributor Author

Test moved, unsure what to do about code-coverage of this change

@ogrisel
Copy link
Member

ogrisel commented Feb 6, 2024

The codecov failures (502 error followed by 503) are probably related to:

Hopefully, this will improve and is probably unrelated to token rotation.

I found about those by browsing https://status.codecov.com/ . It was empty yesterday when we first started to experience failures on most of our CI builds.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

sklearn/tests/test_metadata_routing.py Show resolved Hide resolved
@adrinjalali
Copy link
Member

@OmarManzoor @glemaitre should be a quick review, and fixes doc builds.

@glemaitre glemaitre changed the title fix(routing): EmptyRequest.get defaults to Bunch of METHODS FIX EmptyRequest.get defaults to Bunch of METHODS Feb 8, 2024
Comment on lines 1494 to 1495

def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing():
@pytest.mark.parametrize("enable_metadata_routing", [True, False])
def test_metadata_routing_multimetric_metadata_routing(enable_metadata_routing):

Comment on lines 1505 to 1509
with config_context(enable_metadata_routing=True):
multimetric_scorer(estimator, X, y)

with config_context(enable_metadata_routing=False):
multimetric_scorer(estimator, X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with config_context(enable_metadata_routing=True):
multimetric_scorer(estimator, X, y)
with config_context(enable_metadata_routing=False):
multimetric_scorer(estimator, X, y)
with config_context(enable_metadata_routing=enable_metadata_routing):
multimetric_scorer(estimator, X, y)

if not default:
return Bunch(**{method: dict() for method in METHODS})

return default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov is not happy here. I need to figure out when is it the case that default=None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali I assume that we should be able to cover this one because it would be equivalent to call e.g.

routed_params = _process_routing(self, "score", **kwargs)
routed_params.get("score", default="default")

I don't where is the best place to test this. This looks like a metadata routing test to me.

@glemaitre glemaitre requested review from glemaitre and adrinjalali and removed request for glemaitre February 8, 2024 18:02
Comment on lines 261 to 262
# This would fail due to use of `if not default` instead of `if default is None`
# assert routed_params.get(method, default=[]) == []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please advise on this part. I raised it earlier in the PR but there's been no comment there.

It's hard to find exact usage without types but by searching for .get( across the repo and I found a few places there is params.get("x", {}). It's hard to tell if params is a dict or an EmptyRequest.

It might not be a problem yet but I feel like this could silently cause hard to debug issues in the future, especically in cases where you expected a {} but instead got a Bunch(**{method: {} for method in METHODS}).

if len(params.get("x", {})) == 0:
    # Can never get here, the `Bunch` was returned, not the suggested default of `{}`

First recommendation is use a sentinel value to indicate nothing was passed in. Similar to how more_itertools works with defaults. This allows things to work like so:

default_bunch = params.get("x")
none_value = params.get("x", default=None)
list_value = params.get("x", default=[])

Second recommendation if you do not wish to introduce a sentinel pattern is just to use an explicit if default is None check instead of implicit falsyness. However this might not work as expected:

default_bunch = params.get("x")
dict_value = params.get("x", default={})  # This gives back what was expected

if params.get("x", default=None) is None:
   # This can never happen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the most intuitive approach here is the sentinel value. Basically, not passing anything will always return a Bunch. Setting default will return the type of default.

Such semantic is not surprising and expected. Right now, having None returning a Bunch is indeed surprising.

I don't know what @adrinjalali thinks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the sentinel value approach in the meantime, happy to revert it if @adrinjalali thinks this should not be done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior should be like a dictionary, when passed the default and the key doesn't exist, we return the default. In this case, I wonder if we should ignore default completely (there only to immitate dict), and always return the empty routing list. Afterall, the whole point of this class is to return an empty routed_params object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed at removing the default param. However, we would need to change the pattern:

params.get("fit", default={})

that is used in the pipeline for instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't remove it, it just always return empty and ignores default. The default needs to be there to mimic a dict().get


# However, with a default, should return that instead.
assert routed_params.get(method, default="default") == "default"
assert routed_params.get(method, default=[]) == []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is failing but I don't think that we are using the sentinel in the file so it looks normal :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, never actually committed that change. I'm doing this PR in between little bits of free time. Apologies for the mini mistakes

@eddiebergman
Copy link
Contributor Author

So to clarify, revert back to previous behaviour and use the following?

if not default:
   return Bunch(...)

@adrinjalali
Copy link
Member

I think we can even ignore the default here completely.

@eddiebergman
Copy link
Contributor Author

I think we can even ignore the default here completely.

Alright done!

@glemaitre glemaitre self-requested a review February 12, 2024 18:28
Comment on lines 242 to 261
def test_process_routing_empty_params_get_with_default():
empty_params = {}
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)

# Behaviour should be an empty dictionary returned for each method when retrieved.
for method in METHODS:
params_for_method = routed_params[method]

# An empty dictionary for each method
assert isinstance(params_for_method, dict)
assert set(params_for_method.keys()) == set(METHODS)

# No default to `get` should be equivalent to the default
default_params_for_method = routed_params.get(method)
assert default_params_for_method == params_for_method

# Default to `get` is ignored and equivalent to the default
default_params_for_method = routed_params.get(method, default="default")
assert default_params_for_method == params_for_method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's parametrize the test.

Suggested change
def test_process_routing_empty_params_get_with_default():
empty_params = {}
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)
# Behaviour should be an empty dictionary returned for each method when retrieved.
for method in METHODS:
params_for_method = routed_params[method]
# An empty dictionary for each method
assert isinstance(params_for_method, dict)
assert set(params_for_method.keys()) == set(METHODS)
# No default to `get` should be equivalent to the default
default_params_for_method = routed_params.get(method)
assert default_params_for_method == params_for_method
# Default to `get` is ignored and equivalent to the default
default_params_for_method = routed_params.get(method, default="default")
assert default_params_for_method == params_for_method
@pytest.mark.parametrize("method", METHODS)
@pytest.mark.parametrize("default", [None, "default"])
def test_process_routing_empty_params_get_default(method, default):
"""Check that `default` parameter of `params.get` is ignored and always returns
an EmptyRequest object."""
empty_params = {}
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)
params_for_method = routed_params[method]
assert isinstance(params_for_method, dict)
assert set(params_for_method.keys()) == set(METHODS)
# The default parameter is expected to be ignored
default_params_for_method = routed_params.get(method, default=default)
assert default_params_for_method == params_for_method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, good suggestion!

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM then.

@adrinjalali adrinjalali merged commit 3b32067 into scikit-learn:main Feb 13, 2024
30 checks passed
@glemaitre
Copy link
Member

Nice Thanks @eddiebergman

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024
glemaitre pushed a commit that referenced this pull request Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug, 1.5 nightly] set_config(enable_metadata_routing=True) broken by #28256
5 participants