[MRG+1] Isotonic calibration #1176
Conversation
X_train, y_train = X[:n_samples], y[:n_samples]
X_train_oob, y_train_oob = X[:n_samples / 2], y[:n_samples / 2]
X_oob, y_oob = X[n_samples / 2:], y[n_samples / 2:]
X_test, y_test = X[n_samples:], y[n_samples:]
You should use sklearn.cross_validation.train_test_split
to transform this block into two one-liners:
http://scikit-learn.org/dev/modules/generated/sklearn.cross_validation.train_test_split.html
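For reference, a sketch of that two-one-liner version; it uses the modern `sklearn.model_selection` import path (at the time of this PR the function lived in `sklearn.cross_validation`), and the toy `X`, `y` arrays are stand-ins for the example's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# One split for train/test, then a second split of the training part
# to carve out the out-of-bag calibration set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
X_fit, X_oob, y_fit, y_oob = train_test_split(
    X_train, y_train, test_size=0.5, random_state=0)
```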
"Brier score" seems to be the accepted name so I'm inclined to keep it that way. IIRC, in "Transforming Classifier Scores into Accurate Multiclass Probability Estimates" (KDD 2002), they study several methods and conclude that one-vs-rest is the most practical solution.
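A minimal sketch of the one-vs-rest reduction that comment refers to; `OneVsRestClassifier` and `LogisticRegression` here are illustrative stand-ins for whatever base estimator is being calibrated:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary problem per class; in the multiclass case predict_proba
# renormalizes the per-class estimates so each row sums to one.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
proba = ovr.predict_proba(X)
```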
hum. I am sure it will pass the consistency brigade :)
ok so IsotonicCalibrator should use OneVsRestClassifier internally and
I guess
About the Brier score not being a sklearn-consistent score (higher == worse in this case): I don't really know what would be best, as the sklearn naming convention is indeed conflicting with this official name. http://en.wikipedia.org/wiki/Brier_score I would be +0 for keeping the
def calibration_plot(y_true, y_prob, bins=5, verbose=0):
    """Compute true and predicted probabilities to be used
    for a calibration plot.
PEP 257
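For context, the binning logic such a `calibration_plot` helper computes can be sketched as follows. This is an illustration rather than the PR's actual implementation; the function that eventually shipped is `sklearn.calibration.calibration_curve`:

```python
import numpy as np

def calibration_plot(y_true, y_prob, bins=5):
    """Per-bin mean of true outcomes vs. predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Bin edges over [0, 1]; the tiny epsilon keeps prob == 1.0 in the last bin.
    edges = np.linspace(0.0, 1.0 + 1e-8, bins + 1)
    ids = np.digitize(y_prob, edges) - 1
    prob_true, prob_pred = [], []
    for b in range(bins):
        mask = ids == b
        if mask.any():  # skip empty bins
            prob_true.append(y_true[mask].mean())
            prob_pred.append(y_prob[mask].mean())
    return np.array(prob_true), np.array(prob_pred)
```

A perfectly calibrated classifier yields points on the diagonal `prob_true == prob_pred`.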
Indeed, in their experimental results they use OvR. I think it's a good idea to fit an IR to each decision score.
Can we call it brier? Or brier_error, as it's a mean squared error?
One solution would be to introduce decorators to label some functions as scores and others as losses. Of course, this is out of the scope of this PR... (This idea would also be useful to define whether a metric accepts predicted labels or predicted scores, cf. the AUC issue.) Another solution is to introduce a function. So adding a note to the documentation as @ogrisel suggested seems like a good temporary solution to me.
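As merged, the metric kept the loss naming: `sklearn.metrics.brier_score_loss`, i.e. the mean squared error between predicted probabilities and binary outcomes:

```python
from sklearn.metrics import brier_score_loss

y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]

# mean((y_prob - y_true) ** 2): lower is better, consistent with the
# *_loss naming convention discussed above.
loss = brier_score_loss(y_true, y_prob)
```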
Some quick comments ...
Hi Paolo, thanks for this valuable feedback. I won't work on this in the next few
+1
If you use more than one fold, how do you combine the results?
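One possible answer, and the one `CalibratedClassifierCV` eventually implemented: fit one calibrated classifier per fold and average their predicted probabilities at prediction time. A toy sketch with made-up per-fold outputs:

```python
import numpy as np

# Hypothetical predicted probabilities for two test samples, one array
# per calibrated classifier (one classifier per CV fold).
fold_probas = [
    np.array([0.2, 0.9]),
    np.array([0.4, 0.7]),
    np.array([0.3, 0.8]),
]

# Combine folds by simple averaging.
combined = np.mean(fold_probas, axis=0)
```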
def __init__(self, estimator):
    self.estimator = estimator

def fit(self, X, y, X_oob, y_oob):
The above is not consistent with our API. So, a CV object passed to the constructor is a good idea in any case.
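For reference, the API that was eventually merged follows this suggestion: the CV strategy goes in the constructor and `fit` takes only `X` and `y`. A sketch using the released `sklearn.calibration` module (GaussianNB is just an example base estimator):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)

# cv=3 generates the calibration folds internally -- no explicit
# X_oob / y_oob arguments in fit, keeping the estimator API consistent.
clf = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X)
```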
Rebased on master + addressed some comments. I needed a coding break... ;) If you feel like playing with it, please do...
You should provide a default value for the
I've rebased and cleaned up the example. I'm open to discussion regarding API and multiclass handling. I don't know how to take care of the OOB data to estimate the probas.
@mblondel any chance you can provide feedback on this? I'll get back to it tomorrow morning.
y : array-like, shape = [n_samples]
    Target values.

X_oob : array-like, shape = [n_samples, n_features]
How about generating the calibration data with a cv object (parameter of the constructor)?
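A sketch of how a cv object passed to the constructor could generate the calibration data internally; `StratifiedKFold` is an assumption here, any CV splitter would do:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = np.tile([0, 1], 15)

# Each split yields a part to fit the base estimator on and a held-out
# part playing the role of (X_oob, y_oob) for fitting the calibrator.
cv = StratifiedKFold(n_splits=3)
splits = [(fit_idx, oob_idx) for fit_idx, oob_idx in cv.split(X, y)]
```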
Changes Unknown when pulling bd727f3 on agramfort:isotonic_calibration into * on scikit-learn:master*.
y_prob : array, shape = [n_samples]
    Probabilities of the positive class.

bins: int
space before ":" ;)
Force-pushed 346e75e to 1a2a9ae.
Alright, I squashed it now into 5 generic commits (calibration module, brier score, tests, examples, narrative doc). I've also added @mblondel to the list of authors in calibration.py.
I pushed a couple of commits (cosmit + coverage improvement). The coverage could still be slightly improved; currently we have 95% coverage.
Great, thank you very much!
Indeed, there are a couple of exceptions that should be covered by additional https://coveralls.io/builds/1947924/source?filename=sklearn%2Fcalibration.py It should be easy to raise the coverage close to 99% on that file.
Force-pushed 838b06e to db13132.
I've added some assert_raises; coverage of calibration.py should now be effectively 100%.
Thanks. I think this is OK for merge. @agramfort @mblondel any more comments?
+1 for merge on my side.
Just to confirm: do we really want ... Other than that, +1 as well!
Let's make sigmoid_calibration private indeed.
+1 for a private
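For readers landing here: sigmoid (Platt) calibration fits two scalars mapping decision scores to probabilities. A rough sketch of the idea, not the merged `_sigmoid_calibration` helper (which also applies Platt's target smoothing and uses a dedicated solver); the function name and optimizer choice here are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(scores, y):
    """Fit a, b in P(y=1 | f) = 1 / (1 + exp(a * f + b))."""
    f = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)

    def nll(ab):
        # Clip the linear term to avoid overflow and log(0).
        z = np.clip(ab[0] * f + ab[1], -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(z))
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    return minimize(nll, x0=np.array([-1.0, 0.0]), method="Nelder-Mead").x

# Toy data: negative scores -> class 0, positive scores -> class 1.
a, b = fit_sigmoid([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```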
    return label_binarize(y_true, labels)[:, 0]


def brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None):
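The `label_binarize` call above reduces string or arbitrary labels to a 0/1 column before computing the loss; a quick standalone illustration using the public `sklearn.preprocessing` API:

```python
from sklearn.preprocessing import label_binarize

# With two classes, label_binarize yields a single column that is 1
# where the label equals the second entry of `classes` -- here "spam".
col = label_binarize(["spam", "ham", "spam"], classes=["ham", "spam"])[:, 0]
```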
Can you test this function using the common tests in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/tests/test_common.py ?
Done (these tests are pretty nice; I didn't know they existed). I had to modify test_invariance_string_vs_numbers_labels() slightly so that pos_label is also set for THRESHOLDED_METRICS.
Oops, I had already merged; I will cherry-pick this.
Done! 🍻 Thank you very much everyone!
yeah 🍻 !!!
🍻 indeed people. Congratulations!
Thanks for merging! 🍻
Congrats! When was this effort started again? Two? Three years ago? :)
plt.tight_layout()

# Plot calibration curve for Gaussian Naive Bayes
plot_calibration_curve(GaussianNB(), "Naive Bayes", 1)
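`plot_calibration_curve` is a helper defined in the example script; the underlying reliability-diagram data can be computed with `sklearn.calibration.calibration_curve`, sketched here on toy predictions:

```python
import numpy as np
from sklearn.calibration import calibration_curve

y_true = np.array([0, 0, 0, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# Two bins: [0, 0.5) and [0.5, 1]. A well-calibrated model produces
# points on the diagonal prob_true == prob_pred.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=2)
```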
This raises a warning, as GaussianNB doesn't support sample_weight. If this is expected behavior, I think the warning should be caught.
Do we really want examples to raise warnings? ;) I thought the use of a classifier that doesn't have sample_weight was intentional.
Is it me, or is there no use of sample_weight in this example?
I thought it was used internally in the calibration. But you are right, it shouldn't warn if it is not used. Will investigate!
Calibration module with Platt and isotonic calibration.
A few issues: