New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX EmptyRequest.get
defaults to Bunch
of METHODS
#28371
FIX EmptyRequest.get
defaults to Bunch
of METHODS
#28371
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Could you please add a test here? Something that the reproducer in the issue is triggering.
`enable_metadata_routing`
Added the test to |
@ogrisel codecov failing here. |
Also, with Namely score methods called from within:
The Fix from this PR will also fix these issues. |
Test moved, unsure what to do about code-coverage of this change |
The codecov failures (502 error followed by 503) are probably related to: Hopefully, this will improve and is probably unrelated to token rotation. I found about those by browsing https://status.codecov.com/ . It was empty yesterday when we first started to experience failures on most of our CI builds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
@OmarManzoor @glemaitre should be a quick review, and fixes doc builds. |
EmptyRequest.get
defaults to Bunch
of METHODS
EmptyRequest.get
defaults to Bunch
of METHODS
|
||
def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing(): | |
@pytest.mark.parametrize("enable_metadata_routing", [True, False]) | |
def test_metadata_routing_multimetric_metadata_routing(enable_metadata_routing): |
with config_context(enable_metadata_routing=True): | ||
multimetric_scorer(estimator, X, y) | ||
|
||
with config_context(enable_metadata_routing=False): | ||
multimetric_scorer(estimator, X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with config_context(enable_metadata_routing=True): | |
multimetric_scorer(estimator, X, y) | |
with config_context(enable_metadata_routing=False): | |
multimetric_scorer(estimator, X, y) | |
with config_context(enable_metadata_routing=enable_metadata_routing): | |
multimetric_scorer(estimator, X, y) |
sklearn/utils/_metadata_requests.py
Outdated
if not default: | ||
return Bunch(**{method: dict() for method in METHODS}) | ||
|
||
return default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codecov is not happy here. I need to figure out when is it the case that default=None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrinjalali I assume that we should be able to cover this one because it would be equivalent to call e.g.
routed_params = _process_routing(self, "score", **kwargs)
routed_params.get("score", default="default")
I don't where is the best place to test this. This looks like a metadata routing test to me.
# This would fail due to use of `if not default` instead of `if default is None` | ||
# assert routed_params.get(method, default=[]) == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please advise on this part. I raised it earlier in the PR but there's been no comment there.
It's hard to find exact usage without types but by searching for .get(
across the repo and I found a few places there is params.get("x", {})
. It's hard to tell if params
is a dict
or an EmptyRequest
.
It might not be a problem yet but I feel like this could silently cause hard to debug issues in the future, especically in cases where you expected a {}
but instead got a Bunch(**{method: {} for method in METHODS})
.
if len(params.get("x", {})) == 0:
# Can never get here, the `Bunch` was returned, not the suggested default of `{}`
First recommendation is use a sentinel value to indicate nothing was passed in. Similar to how more_itertools
works with defaults. This allows things to work like so:
default_bunch = params.get("x")
none_value = params.get("x", default=None)
list_value = params.get("x", default=[])
Second recommendation if you do not wish to introduce a sentinel pattern is just to use an explicit if default is None
check instead of implicit falsyness. However this might not work as expected:
default_bunch = params.get("x")
dict_value = params.get("x", default={}) # This gives back what was expected
if params.get("x", default=None) is None:
# This can never happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the most intuitive approach here is the sentinel value. Basically, not passing anything will always return a Bunch
. Setting default
will return the type of default
.
Such semantic is not surprising and expected. Right now, having None
returning a Bunch is indeed surprising.
I don't know what @adrinjalali thinks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the sentinel value approach in the meantime, happy to revert it if @adrinjalali thinks this should not be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior should be like a dictionary, when passed the default and the key doesn't exist, we return the default. In this case, I wonder if we should ignore default completely (there only to immitate dict
), and always return the empty routing list. Afterall, the whole point of this class is to return an empty routed_params object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed at removing the default
param. However, we would need to change the pattern:
params.get("fit", default={})
that is used in the pipeline for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't remove it, it just always return empty and ignores default
. The default
needs to be there to mimic a dict().get
|
||
# However, with a default, should return that instead. | ||
assert routed_params.get(method, default="default") == "default" | ||
assert routed_params.get(method, default=[]) == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is failing but I don't think that we are using the sentinel in the file so it looks normal :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, never actually committed that change. I'm doing this PR in between little bits of free time. Apologies for the mini mistakes
So to clarify, revert back to previous behaviour and use the following? if not default:
return Bunch(...) |
I think we can even ignore the default here completely. |
Alright done! |
def test_process_routing_empty_params_get_with_default(): | ||
empty_params = {} | ||
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params) | ||
|
||
# Behaviour should be an empty dictionary returned for each method when retrieved. | ||
for method in METHODS: | ||
params_for_method = routed_params[method] | ||
|
||
# An empty dictionary for each method | ||
assert isinstance(params_for_method, dict) | ||
assert set(params_for_method.keys()) == set(METHODS) | ||
|
||
# No default to `get` should be equivalent to the default | ||
default_params_for_method = routed_params.get(method) | ||
assert default_params_for_method == params_for_method | ||
|
||
# Default to `get` is ignored and equivalent to the default | ||
default_params_for_method = routed_params.get(method, default="default") | ||
assert default_params_for_method == params_for_method | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's parametrize the test.
def test_process_routing_empty_params_get_with_default(): | |
empty_params = {} | |
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params) | |
# Behaviour should be an empty dictionary returned for each method when retrieved. | |
for method in METHODS: | |
params_for_method = routed_params[method] | |
# An empty dictionary for each method | |
assert isinstance(params_for_method, dict) | |
assert set(params_for_method.keys()) == set(METHODS) | |
# No default to `get` should be equivalent to the default | |
default_params_for_method = routed_params.get(method) | |
assert default_params_for_method == params_for_method | |
# Default to `get` is ignored and equivalent to the default | |
default_params_for_method = routed_params.get(method, default="default") | |
assert default_params_for_method == params_for_method | |
@pytest.mark.parametrize("method", METHODS) | |
@pytest.mark.parametrize("default", [None, "default"]) | |
def test_process_routing_empty_params_get_default(method, default): | |
"""Check that `default` parameter of `params.get` is ignored and always returns | |
an EmptyRequest object.""" | |
empty_params = {} | |
routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params) | |
params_for_method = routed_params[method] | |
assert isinstance(params_for_method, dict) | |
assert set(params_for_method.keys()) == set(METHODS) | |
# The default parameter is expected to be ignored | |
default_params_for_method = routed_params.get(method, default=default) | |
assert default_params_for_method == params_for_method | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, good suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM then.
Nice Thanks @eddiebergman |
Reference Issues/PRs
Fixes #28370
What does this implement/fix? Explain your changes.
This makes the
process_routing
behave equally ifsklearn.config(enable_metadata_routing=True)
orFalse
. Please see the reference issue for more information.Any other comments?
I could not run all tests locally, I am hoping to rely on the automated test runners. The changes attempted were trying to be minimal.
One other comment would be to change
get()
to check explicitly ondefault is None
rather than implicit falsyness as things likeget(name, default=[])
would not work as it stands.