Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX Raise an error when min_samples_split=1 in trees #25744

Merged
merged 4 commits into from Mar 6, 2023

Conversation

jeremiedbb
Copy link
Member

Fixes #25481

Fixes a regression introduced by the new param validations. This PR introduces a new type for the Interval constraint: "real_not_int". It's useful when parameters have a behavior depending on whether the param is an int or a float.

@thomasjpfan I didn't follow exactly your suggestion from #25481 (comment) because there can only be 1 situation where invalid_type is meaningful, i.e real but not int (int but not real is already excluded by definition). I felt that making it a third type option was more natural. Let me know what you think.

I think we should use this type wherever a parameter has 2 Interval constraints, one for ints and one for reals. It would make constraints more disjoints and would prevent more situations like this one. I plan to do that in a separate PR though.

@jeremiedbb jeremiedbb added To backport PR merged in master that need a backport to a release branch defined based on the milestone. Validation related to input validation labels Mar 2, 2023
@jeremiedbb jeremiedbb added this to the 1.2.2 milestone Mar 2, 2023
@@ -364,7 +364,7 @@ class Interval(_Constraint):

Parameters
----------
type : {numbers.Integral, numbers.Real}
type : {numbers.Integral, numbers.Real, "real_not_int"}
The set of numbers in which to set the interval.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be good to have a description of these internally in the docstring? Just for community-devs that aren't familiar w/ what each of these are intended to mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a description of the new option.

A short description of all the constraints and what they represent can be found here https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_param_validation.py#L28

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

I was just mentioning mainly the real_not_int option, not the others.

Copy link
Contributor

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense to me and I'm glad this error is fixed since it actually came up for me before.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, introducing "real_not_int" to Interval is cleaner than adding another keyword parameter.

sklearn/utils/_param_validation.py Show resolved Hide resolved
"closed must be either 'left', 'right', 'both' or 'neither'. "
f"Got {self.closed} instead."
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the removal of @validate_params, does self.left and self.right needs to be validated here?

Copy link
Member Author

@jeremiedbb jeremiedbb Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I added the missing validation

@glemaitre
Copy link
Member

@jeremiedbb do you think that we should introduce this validation wherever we should in this PR or let it for another one?

@jeremiedbb
Copy link
Member Author

@glemaitre I already answered in the PR description :)

I plan to do that in a separate PR though.

@glemaitre
Copy link
Member

@glemaitre I already answered in the PR description :)

I should read the description of PRs :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit 86548df into scikit-learn:main Mar 6, 2023
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:tree module:utils To backport PR merged in master that need a backport to a release branch defined based on the milestone. Validation related to input validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting min_samples_split=1 in DecisionTreeClassifier does not raise exception
4 participants