Grouped and Stratified K-Fold CV #18618

Closed
gkiar opened this issue Oct 14, 2020 · 1 comment

@gkiar
gkiar commented Oct 14, 2020

Hi!

I'm trying to prepare dataset splits for a problem I'm working on, and would ultimately like a hybrid of Stratified K-Fold and Grouped K-Fold. Is there a way to accomplish this with logic already built into sklearn? If not, where would be the right place to add it, and do you have any suggestions for how to get started before I give it a go?

For a bit more context, my dataset is a considerably larger version of the following structure:

```python
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]  # you can think of this as a sample-id
y = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]  # this is some trait of the samples
```
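In the toy structure above, every group carries a single `y` value. Under that assumption (which may not hold for the full dataset), one option with existing tools is to stratify over *groups* rather than samples using `StratifiedKFold`, then expand each fold back to the sample indices its groups own. A minimal sketch; the helper `group_level_stratified_folds` is hypothetical and raises if any group mixes labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def group_level_stratified_folds(y, groups, n_splits=2, random_state=0):
    """Hypothetical helper: stratify over groups, assuming y is constant within each group."""
    y, groups = np.asarray(y), np.asarray(groups)
    unique_groups, first_idx = np.unique(groups, return_index=True)
    # Assumption check: every group must carry exactly one label.
    for g in unique_groups:
        if len(np.unique(y[groups == g])) != 1:
            raise ValueError(f"group {g} has more than one y value")
    group_labels = y[first_idx]  # one label per group

    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    folds = []
    # Split the list of groups; X is a dummy column because only the labels matter here.
    for train_g, test_g in skf.split(np.zeros((len(unique_groups), 1)), group_labels):
        # Map each side's groups back to the sample indices they own.
        train_idx = np.flatnonzero(np.isin(groups, unique_groups[train_g]))
        test_idx = np.flatnonzero(np.isin(groups, unique_groups[test_g]))
        folds.append((train_idx, test_idx))
    return folds


groups = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
folds = group_level_stratified_folds(y, groups, n_splits=2)
```

Because whole groups are the unit being split, no group can land on both sides; the trade-off is that balance is only as fine-grained as the groups themselves.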

The objective is to balance y across the train and test sets while keeping each group on only one side of every split. Thanks in advance!

@NicolasHug
Member

Closing as a duplicate of #13621.
