FIX `EmptyRequest.get` defaults to `Bunch` of `METHODS` #28371

eddiebergman · 2024-02-06T12:06:07Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This makes `process_routing` behave the same whether `enable_metadata_routing` is set to `True` or `False` in the scikit-learn config. Please see the referenced issue for more information.

Any other comments?

I could not run all the tests locally, so I am relying on the automated test runners. I tried to keep the changes minimal.

One other suggestion: change `get()` to check explicitly for `default is None` rather than relying on implicit falsiness, since calls like `get(name, default=[])` do not work as it stands.
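To illustrate the concern, here is a minimal sketch of the two checks side by side. It is not the scikit-learn implementation: `FALLBACK`, `get_falsy_check`, and `get_none_check` are hypothetical names used only for this example.

```python
# Hypothetical stand-in for the fallback object; not scikit-learn's actual code.
FALLBACK = {"fit": {}, "score": {}}


def get_falsy_check(default=None):
    # Implicit falsiness: [], {}, "" and 0 are all treated as "no default given".
    if not default:
        return FALLBACK
    return default


def get_none_check(default=None):
    # Explicit check: only None means "no default given".
    if default is None:
        return FALLBACK
    return default


print(get_falsy_check(default=[]) is FALLBACK)  # True: the [] is silently dropped
print(get_none_check(default=[]))  # []
```

With the falsy check, a caller who passes an empty list or dict as the default never gets it back.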

github-actions · 2024-02-06T12:07:30Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2ca2aa7. Link to the linter CI: here

adrinjalali

Thanks.

Could you please add a test here? Something along the lines of what the reproducer in the issue triggers.

eddiebergman · 2024-02-06T12:47:27Z

Added the test to test_metadata_routing.py as I didn't know where it best fit. Please advise how the test should change, if need be.

adrinjalali · 2024-02-06T13:44:54Z

@ogrisel codecov failing here.

sklearn/tests/test_metadata_routing.py

StefanieSenger · 2024-02-06T14:51:40Z

Also, with sklearn.set_config(enable_metadata_routing=True), even more scoring methods would crash; I noticed this in the doctests.

Namely score methods called from within:

doc/modules/permutation_importance.rst
doc/modules/cross_validation.rst
doc/modules/ensemble.rst
doc/modules/model_evaluation.rst
doc/tutorial/statistical_inference/model_selection.rst

The fix in this PR will also resolve these issues.

eddiebergman · 2024-02-06T16:29:07Z

Test moved. I'm unsure what to do about the code coverage of this change.

ogrisel · 2024-02-06T17:21:34Z

The codecov failures (502 error followed by 503) are probably related to:

Hopefully, this will improve and is probably unrelated to token rotation.

I found out about those by browsing https://status.codecov.com/. It was empty yesterday when we first started to experience failures on most of our CI builds.

adrinjalali

Otherwise LGTM.

sklearn/tests/test_metadata_routing.py

adrinjalali · 2024-02-07T12:06:49Z

@OmarManzoor @glemaitre should be a quick review, and fixes doc builds.

glemaitre · 2024-02-08T17:59:55Z

sklearn/metrics/tests/test_score_objects.py

+
+def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing():


Suggested change

-def test_metadata_routing_multimetric_without_metadata_works_with_and_without_routing():
+@pytest.mark.parametrize("enable_metadata_routing", [True, False])
+def test_metadata_routing_multimetric_metadata_routing(enable_metadata_routing):

glemaitre · 2024-02-08T18:00:15Z

sklearn/metrics/tests/test_score_objects.py

+    with config_context(enable_metadata_routing=True):
+        multimetric_scorer(estimator, X, y)
+
+    with config_context(enable_metadata_routing=False):
+        multimetric_scorer(estimator, X, y)


Suggested change

-    with config_context(enable_metadata_routing=True):
-        multimetric_scorer(estimator, X, y)
-
-    with config_context(enable_metadata_routing=False):
-        multimetric_scorer(estimator, X, y)
+    with config_context(enable_metadata_routing=enable_metadata_routing):
+        multimetric_scorer(estimator, X, y)

glemaitre · 2024-02-08T18:02:20Z

sklearn/utils/_metadata_requests.py

+                if not default:
+                    return Bunch(**{method: dict() for method in METHODS})
+
+                return default


codecov is not happy here. I need to figure out in which cases `default=None`.

@adrinjalali I assume that we should be able to cover this one, because it would be equivalent to calling, e.g.:

routed_params = _process_routing(self, "score", **kwargs)
routed_params.get("score", default="default")

I don't know where the best place to test this is. This looks like a metadata routing test to me.

eddiebergman · 2024-02-09T09:00:24Z

sklearn/tests/test_metadata_routing.py

+        # This would fail due to use of `if not default` instead of `if default is None`
+        # assert routed_params.get(method, default=[]) == []


Please advise on this part. I raised it earlier in the PR but there's been no comment there.

It's hard to find exact usages without type annotations, but by searching for `.get(` across the repo I found a few places that call `params.get("x", {})`. It's hard to tell whether `params` is a dict or an `EmptyRequest`.

It might not be a problem yet, but I feel this could silently cause hard-to-debug issues in the future, especially in cases where you expect a `{}` but instead get a `Bunch(**{method: {} for method in METHODS})`.

if len(params.get("x", {})) == 0:
    # Can never get here, the `Bunch` was returned, not the suggested default of `{}`
    ...

My first recommendation is to use a sentinel value to indicate that nothing was passed in, similar to how more_itertools handles defaults. This allows things to work like so:

default_bunch = params.get("x")
none_value = params.get("x", default=None)
list_value = params.get("x", default=[])

My second recommendation, if you do not wish to introduce a sentinel pattern, is to use an explicit `if default is None` check instead of implicit falsiness. However, this might not work as expected:

default_bunch = params.get("x")
dict_value = params.get("x", default={})  # This gives back what was expected
if params.get("x", default=None) is None:
    # This can never happen
    ...

IMO, the most intuitive approach here is the sentinel value. Basically, not passing anything will always return a `Bunch`, and setting `default` will return that default.

Such semantics are expected and unsurprising. Right now, having `None` return a `Bunch` is indeed surprising.

I don't know what @adrinjalali thinks?

I implemented the sentinel value approach in the meantime, happy to revert it if @adrinjalali thinks this should not be done.
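For readers unfamiliar with the pattern, the sentinel approach discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code committed in this PR: `_MISSING`, `EmptyParams`, and the shortened `METHODS` list are hypothetical, and a plain dict stands in for `Bunch`.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
_MISSING = object()

# Illustrative subset only; the real METHODS list in scikit-learn is longer.
METHODS = ["fit", "predict", "score"]


class EmptyParams:
    """Toy stand-in for an empty routed-params object (assumed name)."""

    def get(self, name, default=_MISSING):
        # Only fall back to the empty mapping when no default was passed at all.
        if default is _MISSING:
            return {method: {} for method in METHODS}
        # Any explicit default, including None, [] or {}, is returned as given.
        return default


params = EmptyParams()
default_bunch = params.get("x")
none_value = params.get("x", default=None)
list_value = params.get("x", default=[])
```

Unlike the `default is None` check, this distinguishes "nothing was passed" from an explicit `default=None`.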

The behavior should be like a dictionary: when passed the default and the key doesn't exist, we return the default. In this case, I wonder if we should ignore `default` completely (it is there only to imitate dict) and always return the empty routing list. After all, the whole point of this class is to return an empty routed_params object.

I'm not opposed to removing the `default` param. However, we would need to change the pattern:

params.get("fit", default={})

that is used in the pipeline for instance.

I wouldn't remove it; `get` would just always return the empty object and ignore `default`. The `default` parameter needs to be there to mimic `dict().get`.
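A rough sketch of that behaviour, for illustration only: the class names `AttrDict` and `AlwaysEmptyParams` and the shortened `METHODS` list are assumptions made for this example, and `AttrDict` is a minimal stand-in for `sklearn.utils.Bunch` so that the snippet has no dependencies.

```python
# Illustrative subset only; the real METHODS list in scikit-learn is longer.
METHODS = ["fit", "predict", "score"]


class AttrDict(dict):
    """Minimal stand-in for Bunch: a dict that also allows attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


class AlwaysEmptyParams:
    """Toy object whose `get` accepts `default` only to mirror `dict.get`."""

    def get(self, name, default=None):
        # `default` is deliberately ignored: every lookup yields empty params.
        return AttrDict({method: {} for method in METHODS})

    def __getitem__(self, name):
        return self.get(name)


routed = AlwaysEmptyParams()
result = routed.get("fit", default="default")
print(result == routed["fit"])  # True: the default was ignored
```

One consequence, as raised earlier in this thread: a check such as `len(params.get("x", {})) == 0` is never true for such an object, because the returned mapping always has one key per method.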

glemaitre · 2024-02-12T09:46:15Z

sklearn/tests/test_metadata_routing.py

+
+        # However, with a default, should return that instead.
+        assert routed_params.get(method, default="default") == "default"
+        assert routed_params.get(method, default=[]) == []


The test is failing, but I don't think we are using the sentinel in the file yet, so that looks expected :)

Whoops, I never actually committed that change. I'm doing this PR in between little bits of free time. Apologies for the small mistakes.

eddiebergman · 2024-02-12T16:57:20Z

So to clarify, revert back to previous behaviour and use the following?

if not default:
    return Bunch(...)

adrinjalali · 2024-02-12T17:00:09Z

I think we can even ignore the default here completely.

eddiebergman · 2024-02-12T17:17:46Z

> I think we can even ignore the default here completely.

Alright done!

glemaitre · 2024-02-12T18:34:04Z

sklearn/tests/test_metadata_routing.py

+def test_process_routing_empty_params_get_with_default():
+    empty_params = {}
+    routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)
+
+    # Behaviour should be an empty dictionary returned for each method when retrieved.
+    for method in METHODS:
+        params_for_method = routed_params[method]
+
+        # An empty dictionary for each method
+        assert isinstance(params_for_method, dict)
+        assert set(params_for_method.keys()) == set(METHODS)
+
+        # No default to `get` should be equivalent to the default
+        default_params_for_method = routed_params.get(method)
+        assert default_params_for_method == params_for_method
+
+        # Default to `get` is ignored and equivalent to the default
+        default_params_for_method = routed_params.get(method, default="default")
+        assert default_params_for_method == params_for_method
+


Let's parametrize the test.

Suggested change

-def test_process_routing_empty_params_get_with_default():
-    empty_params = {}
-    routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)
-
-    # Behaviour should be an empty dictionary returned for each method when retrieved.
-    for method in METHODS:
-        params_for_method = routed_params[method]
-
-        # An empty dictionary for each method
-        assert isinstance(params_for_method, dict)
-        assert set(params_for_method.keys()) == set(METHODS)
-
-        # No default to `get` should be equivalent to the default
-        default_params_for_method = routed_params.get(method)
-        assert default_params_for_method == params_for_method
-
-        # Default to `get` is ignored and equivalent to the default
-        default_params_for_method = routed_params.get(method, default="default")
-        assert default_params_for_method == params_for_method
+@pytest.mark.parametrize("method", METHODS)
+@pytest.mark.parametrize("default", [None, "default"])
+def test_process_routing_empty_params_get_default(method, default):
+    """Check that `default` parameter of `params.get` is ignored and always returns
+    an EmptyRequest object."""
+    empty_params = {}
+    routed_params = process_routing(ConsumingClassifier(), "fit", **empty_params)
+
+    params_for_method = routed_params[method]
+    assert isinstance(params_for_method, dict)
+    assert set(params_for_method.keys()) == set(METHODS)
+
+    # The default parameter is expected to be ignored
+    default_params_for_method = routed_params.get(method, default=default)
+    assert default_params_for_method == params_for_method

Done, good suggestion!

glemaitre

Otherwise LGTM then.

glemaitre · 2024-02-13T11:40:03Z

Nice, thanks @eddiebergman

fix(routing): EmptyRequest.get defaults to Bunch of METHODS

5e94bd1

github-actions bot added the module:utils label Feb 6, 2024

eddiebergman mentioned this pull request Feb 6, 2024

[Bug, 1.5 nightly] set_config(enable_metadata_routing=True) broken by #28256 #28370

Closed

adrinjalali reviewed Feb 6, 2024

adrinjalali added this to the 1.4.1 milestone Feb 6, 2024

test: Preventitive regression with _MultimetricScorer and `enable_metadata_routing`

e56b4d8

eddiebergman requested a review from adrinjalali February 6, 2024 12:50

adrinjalali added the No Changelog Needed label Feb 6, 2024

adrinjalali reviewed Feb 6, 2024

sklearn/tests/test_metadata_routing.py (outdated)

test: Move test to test_score_objects.py

6da316e

eddiebergman requested a review from adrinjalali February 6, 2024 16:28

doc: Make test name and description clearer about what's tested

2aa6c16

adrinjalali approved these changes Feb 6, 2024

sklearn/tests/test_metadata_routing.py

fix: Reset changes to test_metadata_routing

940e53d

glemaitre changed the title ~~fix(routing): EmptyRequest.get defaults to Bunch of METHODS~~ FIX EmptyRequest.get defaults to Bunch of METHODS Feb 8, 2024

glemaitre reviewed Feb 8, 2024

glemaitre requested review from glemaitre and adrinjalali and removed request for glemaitre February 8, 2024 18:02

eddiebergman added 2 commits February 9, 2024 09:48

test: Parametrize test

8ed5630

test: Add test for get() behaviour on returned value of `process_routing`

4bcaab6

eddiebergman commented Feb 9, 2024

eddiebergman requested a review from glemaitre February 9, 2024 09:01

eddiebergman added 2 commits February 9, 2024 16:08

style: Fix linting with ruff

6610d7b

test: Fixup

d32ca3a

glemaitre mentioned this pull request Feb 10, 2024

DOC improve metadata routing example #27357

Merged

fix(routing): Use sentinel for EmptyRequest.get

d500ae5

glemaitre reviewed Feb 12, 2024

fix: Use the sentinel value... whoops

974c559

fix(metadata): EmptyRequest ignores default in get

adf4eff

glemaitre self-requested a review February 12, 2024 18:28

glemaitre reviewed Feb 12, 2024

glemaitre approved these changes Feb 12, 2024

test: Parametrized test

2ca2aa7

eddiebergman requested a review from glemaitre February 13, 2024 08:37

adrinjalali approved these changes Feb 13, 2024

adrinjalali merged commit 3b32067 into scikit-learn:main Feb 13, 2024
30 checks passed

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 13, 2024

FIX EmptyRequest.get defaults to Bunch of METHODS (scikit-learn…

504ac9a

…#28371)

glemaitre pushed a commit that referenced this pull request Feb 13, 2024

FIX EmptyRequest.get defaults to Bunch of METHODS (#28371)

b2e231e
