Grouped and Stratified K-Fold CV #18618

gkiar · 2020-10-14T14:32:09Z

Hi!

I'm trying to prepare dataset splits for a problem I'm working on, and would ultimately like a hybrid of Stratified K-Fold and Grouped K-Fold. Is there a way to accomplish this using logic already built into sklearn? If not, where would be the right place to add it, and do you have any suggestions for how to get started before I give it a go?

For a bit more context, my dataset is a considerably larger version of the following structure:

groups = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]  # you can think of this as a sample-id
y = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]  # this is some trait of the samples

The objective would be to balance y across the train and test sets while ensuring that each group appears on only one side (train or test) of every split. Thanks in advance!
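For reference, scikit-learn 1.0 (released after this thread) added `sklearn.model_selection.StratifiedGroupKFold`, which targets exactly this combination. The code below does not use it. It is a minimal, dependency-free sketch of one way to get the same behaviour with a greedy heuristic. Its internal algorithm is not scikit-learn's, and the helper names `assign_group_folds` and `iter_group_stratified_splits` are hypothetical. The approach assigns whole groups to folds, largest first, and places each group in the fold that keeps per-fold class counts closest to an even share.

```python
from collections import Counter, defaultdict


def assign_group_folds(y, groups, n_splits=2):
    """Return a fold id per sample so that every group lands in exactly one
    fold and per-fold class counts stay close to an even share.

    Hypothetical helper: a greedy heuristic, not scikit-learn's algorithm.
    """
    # Class counts per group, e.g. {1: Counter({1: 3}), 3: Counter({2: 3})}
    group_counts = defaultdict(Counter)
    for label, g in zip(y, groups):
        group_counts[g][label] += 1

    total = Counter(y)
    classes = sorted(total)
    fold_counts = [Counter() for _ in range(n_splits)]
    fold_of_group = {}

    def imbalance(folds):
        # Sum of squared deviations from the ideal per-fold class count
        return sum((fc[c] - total[c] / n_splits) ** 2 for fc in folds for c in classes)

    # Place the largest groups first; ties broken by group id for determinism
    order = sorted(group_counts, key=lambda g: (-sum(group_counts[g].values()), str(g)))
    for g in order:
        candidates = []
        for f in range(n_splits):
            # Imbalance if group g were added to fold f (no mutation here)
            trial = [fc + group_counts[g] if k == f else fc
                     for k, fc in enumerate(fold_counts)]
            candidates.append((imbalance(trial), f))
        best = min(candidates)[1]  # lowest imbalance, then lowest fold id
        fold_counts[best] += group_counts[g]
        fold_of_group[g] = best

    return [fold_of_group[g] for g in groups]


def iter_group_stratified_splits(y, groups, n_splits=2):
    """Yield (train_indices, test_indices), one pair per fold."""
    fold_ids = assign_group_folds(y, groups, n_splits)
    for f in range(n_splits):
        test = [i for i, fid in enumerate(fold_ids) if fid == f]
        train = [i for i, fid in enumerate(fold_ids) if fid != f]
        yield train, test


groups = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
y = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
for train, test in iter_group_stratified_splits(y, groups, n_splits=2):
    test_groups = sorted({groups[i] for i in test})
    test_labels = sorted(Counter(y[i] for i in test).items())
    print("test groups:", test_groups, "test label counts:", test_labels)
# test groups: [1, 3] test label counts: [(1, 3), (2, 3)]
# test groups: [2, 4] test label counts: [(1, 3), (2, 3)]
```

Because a group can never be split across folds, stratification is only approximate in general. The greedy placement works well when there are many small groups and can drift when a few groups dominate the dataset.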


NicolasHug · 2020-10-14T15:15:51Z

Closing as a duplicate of #13621

NicolasHug closed this as completed Oct 14, 2020
