Release highlights for 1.3 #26526

Merged
jeremiedbb merged 11 commits into scikit-learn:main on Jun 29, 2023

jeremiedbb
Member

As usual, let's start with a few highlights and add more if needed in subsequent PRs.
For now I put the TargetEncoder, HDBSCAN, and missing values support in trees.

Do not hesitate to edit.

Something we should add is some news about metadata routing, but I wasn't sure how to write that. @adrinjalali would you mind adding a section (here or in a separate PR)?

@jeremiedbb jeremiedbb added this to the 1.3 milestone Jun 7, 2023
Member Author

jeremiedbb commented Jun 7, 2023

Other features that I'm thinking about are

  • Grouping infrequent categories in OrdinalEncoder
  • Gamma deviance in HGBRegressor

Do you think they should end up in the highlights?

@adrinjalali
Member

@jeremiedbb added metadata routing.

Member

@thomasjpfan thomasjpfan left a comment

I think this is a good starting point to iterate from. LGTM

non_noisy_labels = hdbscan.labels_[hdbscan.labels_ != -1]
print(f"number of clusters found: {len(np.unique(non_noisy_labels))}")

v_measure_score(true_labels[hdbscan.labels_ != -1], non_noisy_labels)
Member

Nit:

Suggested change
- v_measure_score(true_labels[hdbscan.labels_ != -1], non_noisy_labels)
+ print(
+     "V-measure:", v_measure_score(true_labels[hdbscan.labels_ != -1], non_noisy_labels)
+ )

tree.predict(X)

# %%
# Metadata Routing
Member

I would place this at the top.

Member

lorentzenchr commented Jun 23, 2023

> Other features that I'm thinking about are
>
>   • Grouping infrequent categories in OrdinalEncoder
>   • Gamma deviance in HGBRegressor

From a practitioner’s perspective, pricing actuaries in particular, gamma deviance HGBTs are a big deal. The PR itself was pretty small, but it was built on a pile of work on the common loss functions.
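
To make concrete what "gamma deviance" measures, here is a minimal sketch in plain Python, assuming the standard unit gamma deviance 2 * (log(mu / y) + y / mu - 1) averaged over samples. The helper name `gamma_deviance_sketch` and the claim amounts are hypothetical; this is not scikit-learn's implementation.

```python
import math


def gamma_deviance_sketch(y_true, y_pred):
    """Mean gamma deviance (hypothetical helper, for illustration only)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if y <= 0 or mu <= 0:
            raise ValueError("gamma deviance needs strictly positive values")
        # Unit deviance depends only on the ratio y / mu.
        total += 2.0 * (math.log(mu / y) + y / mu - 1.0)
    return total / len(y_true)


# Made-up claim amounts and predictions.
claims = [120.0, 80.0, 300.0]
preds = [100.0, 100.0, 250.0]

print(gamma_deviance_sketch(claims, claims))  # perfect predictions give zero
print(gamma_deviance_sketch(claims, preds))
# Rescaling targets and predictions together leaves the deviance unchanged.
print(gamma_deviance_sketch([10 * y for y in claims], [10 * p for p in preds]))
```

Because the deviance depends only on the ratio of target to prediction, it penalizes relative rather than absolute errors, which is one reason it suits strictly positive, right-skewed targets such as claim amounts.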

Member Author

jeremiedbb commented Jun 26, 2023

@lorentzenchr would you mind adding a small section for the Gamma deviance?


github-actions bot commented Jun 26, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a7b9264. Link to the linter CI: here

Comment on lines 48 to 51
# By performing :class:`cluster.DBSCAN` over varying epsilon values
# :class:`cluster.HDBSCAN` finds clusters of varying densities making it more robust to
# parameter selection than :class:`cluster.DBSCAN`. More details in the
# :ref:`User Guide <hdbscan>`.
Contributor

@Micky774 Micky774 Jun 26, 2023

The wording is a bit difficult here only because HDBSCAN doesn't solve exactly the DBSCAN problem for multiple epsilon, but rather a modified problem that solves for the same solutions locally. I think this is a fair-enough compromise between correctness and readability. Open to thoughts though :)

Suggested change
- # By performing :class:`cluster.DBSCAN` over varying epsilon values
- # :class:`cluster.HDBSCAN` finds clusters of varying densities making it more robust to
- # parameter selection than :class:`cluster.DBSCAN`. More details in the
- # :ref:`User Guide <hdbscan>`.
+ # By performing a modified version of :class:`cluster.DBSCAN` over multiple epsilon
+ # values simultaneously, :class:`cluster.HDBSCAN` finds clusters of varying densities
+ # making it more robust to parameter selection than :class:`cluster.DBSCAN`.
+ # More details in the :ref:`User Guide <hdbscan>`.

@lorentzenchr
Member

> @lorentzenchr would you mind adding a small section for the Gamma deviance?

Until when?
Something like https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_0_23_0.html#generalized-linear-models-and-poisson-loss-for-gradient-boosting will do, with rng.poisson replaced by rng.gamma.

Member

@ogrisel ogrisel left a comment

LGTM!

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Member

@glemaitre glemaitre left a comment

LGTM on my side.

@jeremiedbb jeremiedbb merged commit 5dd5811 into scikit-learn:main Jun 29, 2023
25 of 26 checks passed
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Jun 29, 2023
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Member

lorentzenchr commented Jun 29, 2023

@jeremiedbb Thank you! 🚀

jeremiedbb added a commit that referenced this pull request Jun 29, 2023
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

7 participants