Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/ngboost col sample lt 1 #3294

Merged
merged 1 commit into from
Oct 3, 2023

Conversation

CloseChoice
Copy link
Collaborator

@CloseChoice CloseChoice commented Sep 30, 2023

Overview

Closes #3261

Description of the changes proposed in this pull request:
This PR only solves the problem if we have just 1 estimator. So for this script:

import shap 
import ngboost as ngb
import numpy as np

X, y = shap.datasets.california(n_points=500)
model = ngb.NGBRegressor(n_estimators=1, col_sample = 0.9).fit(X, y)

predicted = model.predict(X)

# explain the model's predictions using SHAP values
explainer = shap.TreeExplainer(model, model_output=0)

explanation = explainer(X_predict_explain)

# check that SHAP values sum to model output
assert (
    np.abs(explanation.values.sum(1) + explanation.base_values - predicted).max()
    < 1e-5
)

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

Here is a breakdown of the issue: If we use ngboost with col_sample < 1 some columns of X get dropped, the rest is reordered, see here. Unfortunately, this might be different for each estimator. Which means, we would need to rewrite the shap_values function or predict function, not quite sure though.

@thatlittleboy
Copy link
Collaborator

Thanks for the PR @CloseChoice , I've not yet taken a close look at this, but can you add your script as a test that reproduces the error so we make sure we don't regress next time?

Maybe Connor's PR #3262 might be of inspiration as well.

@CloseChoice
Copy link
Collaborator Author

@thatlittleboy so I found a better fix that also fixes the problem for n estimators. The problem was, that due to the reordering of columns in ngboost, the predictions of the ngboost model do not match the predictions that we get from the treeensemble model, that we directly construct from the ngboost trees. I fixed this by reordering the features correctly. I also added a test that checks that the predictions are always identical

@CloseChoice CloseChoice marked this pull request as ready for review October 2, 2023 22:19
@connortann connortann added bug Indicates an unexpected problem or unintended behaviour enhancement Indicates new feature requests labels Oct 3, 2023
Copy link
Collaborator

@connortann connortann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great PR - thanks for the fix and the test!

@connortann connortann merged commit 4885feb into shap:master Oct 3, 2023
13 checks passed
@connortann connortann removed the enhancement Indicates new feature requests label Oct 3, 2023
@CloseChoice CloseChoice deleted the fix/ngboost-col-sample-lt-1 branch October 3, 2023 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behaviour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: test_ngboost fails with col_sample < 1
3 participants