IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

mjbommar · 2015-01-30T00:00:36Z

Per conversation in issue #2507, IsotonicRegression appears to have regressed due to commit a9ea55f.

This IPython notebook demonstrates the failure on HEAD.

I tested the following two commits with the notebook:

d255866: no difference, SUCCESS
a9ea55f: difference, FAILURE

In other words, I think we can blame the switch for interp1d from "linear" to "slinear"; first thought is that 1-d spline "slinear" matrix formulation is ill-posed for x-ties, whereas the piecewise "linear" implementation is unaffected?

Small additional note: confirmed failure with test case where x-values are all non-zero, e.g., [1, 1, 2, 3, 4] instead of [0, 0, 1, 2, 3], so x=0 isn't part of the cause.

The text was updated successfully, but these errors were encountered:

mjbommar · 2015-01-30T00:00:54Z

Including @ragv and @amueller.

…ion re: issue scikit-learn#4184

raghavrv · 2015-01-30T12:50:24Z

Ah extremely sorry that I broke that :/
Will see ASAP what can be done to keep slinear intact and still avoid this issue...

note to self: need NRTs after fix.

raghavrv · 2015-01-30T12:50:37Z

thanks for the report!

mjbommar · 2015-01-30T13:12:30Z

No worries, @ragv! I have a simple pull request for a basic x-ties unit test in PR #4185, which shows the failure in Travis.

mjbommar · 2015-01-30T13:42:28Z

As a workaround, I tested using scipy.interpolate.UnivariateSpline as follows and got acceptable stuff:

# Sample with x ties
data_with_ties = pandas.DataFrame()
data_with_ties["feature"] = [1, 1, 2, 3]
data_with_ties["target"] = [0.1, 0.05, 0.15, 0.2]

sp1 = scipy.interpolate.UnivariateSpline(data_with_ties["feature"],
                                   data_with_ties["target"],
                                   k=1,
                                   ext=3)
sp1(data_with_ties["feature"])
array([ 0.07727273,  0.07727273,  0.14090909,  0.20454545])

vs.

sp2 = scipy.interpolate.interp1d(data_with_ties["feature"],
                                data_with_ties["target"],
                                kind="slinear")
sp2(data_with_ties["feature"])

array([ 0.  ,  0.  ,  0.15,  0.2 ])

amueller · 2015-01-30T18:43:56Z

At d255866
I get

AssertionError: 
Arrays are not equal

(mismatch 14.2857142857%)
 x: array([ 0.,  1.,  1.,  3.,  4.,  5.,  6.])
 y: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

amueller · 2015-01-30T18:51:32Z

I get the same failure in 0.15.2

amueller · 2015-01-30T18:52:00Z

(that is using your regression tests, should have posted in the PR I guess)

mjbommar · 2015-01-30T19:34:04Z

Hm, I had only tested 0.15.2 with the simple duplicate zero minimum value case. It goes deeper!

mjbommar · 2015-01-31T16:25:40Z

Just updated PR #4185 to include comparison to R's isotone with gpava(ties="primary"). Discussion there on ties= methods is probably one of the first questions to answer...

ghost · 2015-02-09T11:34:25Z

Possibly related, also in sklearn 0.15.2

Given dataset X

nmf = sklearn.decomposition.nmf.NMF(n_components=3, random_state=long(1))
A = nmf.fit_transform(X)
B = nmf.fit(X).transform(X)

Provides:

>>> A[0]
[  1.76089236e-32   2.60906610e-05   3.82422063e-06]
>>> B[0]
[  3.87175453e-33   2.60907250e-05   3.79032621e-06]

Although not huge differences, the two approaches to fit>transform are no longer equivalent

amueller · 2015-02-10T16:44:23Z

@FixLaurens not really related to the issue here. I am think this outcome is fine, as the result is quite close. You can't entirely guarantee exactly similar results from numeric optimization. We could try to get closer in NMF but I'm not sure this is worth the extra computation.
Did this cause any issues in your application?

…ion re: issue scikit-learn#4184

…ion re: issue scikit-learn#4184 Expanding tests to include ties at both x_min and x_max Updating unit test to include reference data against R's isotone gpava() with ties=primary Adding R and isotone package versions for reproducibility/documentation Removing double space in docstring Combining tests for fit and transform with ties; fixing spelling error

* tag '0.16b1': (1589 commits) 0.16.X branching, version 0.16b1 Fix scikit-learn#4351. Rendering of docs in MinMaxScaler. Fix rebase conflict MAINT use canonical PEP-440 dev version consistently Adding fix for issue scikit-learn#4297, isotonic infinite loop DOC deprecate random_state for DBSCAN FIX/TST boundary cases in dbscan (closes scikit-learn#4073) Do not shuffle in DBSCAN (warn if `random_state` is used). Update docstring predict_proba() Update documentation of predict_proba in tree module add scipy2013 tutorial links to presentations on website. TST boundary handling in LSHForest.radius_neighbors ENH improve docstrings and test for radius_neighbors models use a pipeline for pre-processing feature selection, as per best practise DOC remove unnecessary backticks in CONTRIBUTING. ENH no need for tie breaking jitter in calibration Implement "secondary" tie strategy in isotonic. Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184 MAINT fix typo pyagm -> pygamg in SkipTest STYLE trailing spaces ...

* releases: (1589 commits) 0.16.X branching, version 0.16b1 Fix scikit-learn#4351. Rendering of docs in MinMaxScaler. Fix rebase conflict MAINT use canonical PEP-440 dev version consistently Adding fix for issue scikit-learn#4297, isotonic infinite loop DOC deprecate random_state for DBSCAN FIX/TST boundary cases in dbscan (closes scikit-learn#4073) Do not shuffle in DBSCAN (warn if `random_state` is used). Update docstring predict_proba() Update documentation of predict_proba in tree module add scipy2013 tutorial links to presentations on website. TST boundary handling in LSHForest.radius_neighbors ENH improve docstrings and test for radius_neighbors models use a pipeline for pre-processing feature selection, as per best practise DOC remove unnecessary backticks in CONTRIBUTING. ENH no need for tie breaking jitter in calibration Implement "secondary" tie strategy in isotonic. Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184 MAINT fix typo pyagm -> pygamg in SkipTest STYLE trailing spaces ... Conflicts: sklearn/externals/joblib/__init__.py sklearn/externals/joblib/numpy_pickle.py sklearn/externals/joblib/parallel.py sklearn/externals/joblib/pool.py

* dfsg: (1589 commits) 0.16.X branching, version 0.16b1 Fix scikit-learn#4351. Rendering of docs in MinMaxScaler. Fix rebase conflict MAINT use canonical PEP-440 dev version consistently Adding fix for issue scikit-learn#4297, isotonic infinite loop DOC deprecate random_state for DBSCAN FIX/TST boundary cases in dbscan (closes scikit-learn#4073) Do not shuffle in DBSCAN (warn if `random_state` is used). Update docstring predict_proba() Update documentation of predict_proba in tree module add scipy2013 tutorial links to presentations on website. TST boundary handling in LSHForest.radius_neighbors ENH improve docstrings and test for radius_neighbors models use a pipeline for pre-processing feature selection, as per best practise DOC remove unnecessary backticks in CONTRIBUTING. ENH no need for tie breaking jitter in calibration Implement "secondary" tie strategy in isotonic. Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184 MAINT fix typo pyagm -> pygamg in SkipTest STYLE trailing spaces ...

…polation to linear)

mjbommar added a commit to mjbommar/scikit-learn that referenced this issue Jan 30, 2015

Adding unit test to cover ties/duplicate x values in Isotonic Regress…

a70094d

…ion re: issue scikit-learn#4184

mjbommar mentioned this issue Jan 30, 2015

Adding unit test to cover ties/duplicate x values in Isotonic Regression... #4185

Closed

amueller added the Bug label Jan 30, 2015

amueller added this to the 0.16 milestone Jan 30, 2015

amueller mentioned this issue Feb 27, 2015

[MRG+1] Isotonic regression duplicate fixes #4302

Merged

amueller pushed a commit to amueller/scikit-learn that referenced this issue Feb 27, 2015

Adding unit test to cover ties/duplicate x values in Isotonic Regress…

e14fafe

…ion re: issue scikit-learn#4184

ogrisel closed this as completed in #4302 Mar 6, 2015

mjbommar referenced this issue in ogrisel/scikit-learn Feb 18, 2016

much faster isotonic regression prediction (involved re-setting inter…

d472a6a

…polation to linear)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

mjbommar commented Jan 30, 2015

mjbommar commented Jan 30, 2015

raghavrv commented Jan 30, 2015

raghavrv commented Jan 30, 2015

mjbommar commented Jan 30, 2015

mjbommar commented Jan 30, 2015

amueller commented Jan 30, 2015

amueller commented Jan 30, 2015

amueller commented Jan 30, 2015

mjbommar commented Jan 30, 2015

mjbommar commented Jan 31, 2015

ghost commented Feb 9, 2015

amueller commented Feb 10, 2015

IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

Comments

mjbommar commented Jan 30, 2015

mjbommar commented Jan 30, 2015

raghavrv commented Jan 30, 2015

raghavrv commented Jan 30, 2015

mjbommar commented Jan 30, 2015

mjbommar commented Jan 30, 2015

amueller commented Jan 30, 2015

amueller commented Jan 30, 2015

amueller commented Jan 30, 2015

mjbommar commented Jan 30, 2015

mjbommar commented Jan 31, 2015

ghost commented Feb 9, 2015

amueller commented Feb 10, 2015