Add RepeatedStratifiedGroupKFold #24247

Open
arvkevi opened this issue Aug 24, 2022 · 1 comment · May be fixed by #24227

Comments

arvkevi commented Aug 24, 2022

Describe the workflow you want to enable

Building on the conversation in #13621 and the work already done in #18649, I'd like to add an implementation of RepeatedStratifiedGroupKFold.

Describe your proposed solution

See the implementation in #24227. RepeatedStratifiedGroupKFold could then be used as follows:

  >>> import numpy as np
  >>> from sklearn.model_selection import RepeatedStratifiedGroupKFold
  >>> X = np.random.randn(10, 1)
  >>> y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
  >>> groups = np.array([1, 1, 2, 2, 2, 3, 4, 4, 5, 5])
  >>> rsgkf = RepeatedStratifiedGroupKFold(n_splits=3, n_repeats=2, random_state=42)
  >>> for train_idxs, test_idxs in rsgkf.split(X, y, groups):
  ...     # print the group assignment for the train/test indices
  ...     print("TRAIN:", groups[train_idxs], "TEST:", groups[test_idxs])
  ...     X_train, X_test = X[train_idxs], X[test_idxs]
  ...     y_train, y_test = y[train_idxs], y[test_idxs]
  TRAIN: [2 2 2 4 4 5 5] TEST: [1 1 3]
  TRAIN: [1 1 3 4 4 5 5] TEST: [2 2 2]
  TRAIN: [1 1 2 2 2 3] TEST: [4 4 5 5]
  TRAIN: [1 1 4 4 5 5] TEST: [2 2 2 3]
  TRAIN: [2 2 2 3 4 4 5 5] TEST: [1 1]
  TRAIN: [1 1 2 2 2 3] TEST: [4 4 5 5]

Describe alternatives you've considered, if relevant

No response

Additional context

No response

arvkevi added the "Needs Triage" and "New Feature" labels on Aug 24, 2022

thomasjpfan (Member) commented Sep 9, 2022

From the triaging meeting, we think this would be a good inclusion. Although there are already many splitter classes in the API (https://scikit-learn.org/stable/modules/classes.html#splitter-classes), adding a new splitter class for each "repeated" variant is the current status quo.

Note that it may take some time for a maintainer to review your PR.

thomasjpfan removed the "Needs Triage" label on Sep 9, 2022
arvkevi changed the title from "Add RepeatedStratifiedGroupKFold" to "[MRG] Add RepeatedStratifiedGroupKFold" on Oct 1, 2023
arvkevi changed the title from "[MRG] Add RepeatedStratifiedGroupKFold" to "Add RepeatedStratifiedGroupKFold" on Oct 2, 2023