Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IsotonicRegression results differ between fit/transform and fit_transform with ties in X #4184

Closed
mjbommar opened this issue Jan 30, 2015 · 12 comments · Fixed by #4302
Closed
Labels
Milestone

Comments

@mjbommar
Copy link
Contributor

Per conversation in issue #2507, IsotonicRegression appears to have regressed due to commit a9ea55f.

This IPython notebook demonstrates the failure on HEAD.

I tested the following two commits with the notebook:

  • d255866: no difference, SUCCESS

  • a9ea55f: difference, FAILURE

    In other words, I think we can blame the switch for interp1d from "linear" to "slinear"; first thought is that 1-d spline "slinear" matrix formulation is ill-posed for x-ties, whereas the piecewise "linear" implementation is unaffected?

Small additional note: confirmed failure with test case where x-values are all non-zero, e.g., [1, 1, 2, 3, 4] instead of [0, 0, 1, 2, 3], so x=0 isn't part of the cause.

@mjbommar
Copy link
Contributor Author

Including @ragv and @amueller.

@raghavrv
Copy link
Member

Ah extremely sorry that I broke that :/
Will see ASAP what can be done to keep slinear intact and still avoid this issue...

note to self: need NRTs after fix.

@raghavrv
Copy link
Member

thanks for the report!

@mjbommar
Copy link
Contributor Author

No worries, @ragv! I have a simple pull request for a basic x-ties unit test in PR #4185, which shows the failure in Travis.

@mjbommar
Copy link
Contributor Author

As a workaround, I tested using scipy.interpolate.UnivariateSpline as follows and got acceptable stuff:

# Sample with x ties
data_with_ties = pandas.DataFrame()
data_with_ties["feature"] = [1, 1, 2, 3]
data_with_ties["target"] = [0.1, 0.05, 0.15, 0.2]

sp1 = scipy.interpolate.UnivariateSpline(data_with_ties["feature"],
                                   data_with_ties["target"],
                                   k=1,
                                   ext=3)
sp1(data_with_ties["feature"])
array([ 0.07727273,  0.07727273,  0.14090909,  0.20454545])

vs.

sp2 = scipy.interpolate.interp1d(data_with_ties["feature"],
                                data_with_ties["target"],
                                kind="slinear")
sp2(data_with_ties["feature"])

array([ 0.  ,  0.  ,  0.15,  0.2 ])

@amueller
Copy link
Member

At d255866
I get

AssertionError: 
Arrays are not equal

(mismatch 14.2857142857%)
 x: array([ 0.,  1.,  1.,  3.,  4.,  5.,  6.])
 y: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

@amueller
Copy link
Member

I get the same failure in 0.15.2

@amueller
Copy link
Member

(that is using your regression tests, should have posted in the PR I guess)

@mjbommar
Copy link
Contributor Author

Hm, I had only tested 0.15.2 with the simple duplicate zero minimum value case. It goes deeper!

@mjbommar
Copy link
Contributor Author

Just updated PR #4185 to include comparison to R's isotone with gpava(ties="primary"). Discussion there on ties= methods is probably one of the first questions to answer...

@ghost
Copy link

ghost commented Feb 9, 2015

Possibly related, also in sklearn 0.15.2

Given dataset X

nmf = sklearn.decomposition.nmf.NMF(n_components=3, random_state=long(1))
A = nmf.fit_transform(X)
B = nmf.fit(X).transform(X)

Provides:

>>> A[0]
[  1.76089236e-32   2.60906610e-05   3.82422063e-06]
>>> B[0]
[  3.87175453e-33   2.60907250e-05   3.79032621e-06]

Although not huge differences, the two approaches to fit>transform are no longer equivalent

@amueller
Copy link
Member

@FixLaurens not really related to the issue here. I am think this outcome is fine, as the result is quite close. You can't entirely guarantee exactly similar results from numeric optimization. We could try to get closer in NMF but I'm not sure this is worth the extra computation.
Did this cause any issues in your application?

amueller pushed a commit to amueller/scikit-learn that referenced this issue Feb 27, 2015
amueller pushed a commit to amueller/scikit-learn that referenced this issue Mar 1, 2015
…ion re: issue scikit-learn#4184

Expanding tests to include ties at both x_min and x_max

Updating unit test to include reference data against R's isotone gpava() with ties=primary

Adding R and isotone package versions for reproducibility/documentation

Removing double space in docstring

Combining tests for fit and transform with ties; fixing spelling error
amueller pushed a commit to amueller/scikit-learn that referenced this issue Mar 1, 2015
…ion re: issue scikit-learn#4184

Expanding tests to include ties at both x_min and x_max

Updating unit test to include reference data against R's isotone gpava() with ties=primary

Adding R and isotone package versions for reproducibility/documentation

Removing double space in docstring

Combining tests for fit and transform with ties; fixing spelling error
amueller pushed a commit to amueller/scikit-learn that referenced this issue Mar 4, 2015
…ion re: issue scikit-learn#4184

Expanding tests to include ties at both x_min and x_max

Updating unit test to include reference data against R's isotone gpava() with ties=primary

Adding R and isotone package versions for reproducibility/documentation

Removing double space in docstring

Combining tests for fit and transform with ties; fixing spelling error
rasbt pushed a commit to rasbt/scikit-learn that referenced this issue Apr 6, 2015
…ion re: issue scikit-learn#4184

Expanding tests to include ties at both x_min and x_max

Updating unit test to include reference data against R's isotone gpava() with ties=primary

Adding R and isotone package versions for reproducibility/documentation

Removing double space in docstring

Combining tests for fit and transform with ties; fixing spelling error
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 11, 2015
* tag '0.16b1': (1589 commits)
  0.16.X branching, version 0.16b1
  Fix scikit-learn#4351. Rendering of docs in MinMaxScaler.
  Fix rebase conflict
  MAINT use canonical PEP-440 dev version consistently
  Adding fix for issue scikit-learn#4297, isotonic infinite loop
  DOC deprecate random_state for DBSCAN
  FIX/TST boundary cases in dbscan (closes scikit-learn#4073)
  Do not shuffle in DBSCAN (warn if `random_state` is used).
  Update docstring predict_proba()
  Update documentation of predict_proba in tree module
  add scipy2013 tutorial links to presentations on website.
  TST boundary handling in LSHForest.radius_neighbors
  ENH improve docstrings and test for radius_neighbors models
  use a pipeline for pre-processing feature selection, as per best practise
  DOC remove unnecessary backticks in CONTRIBUTING.
  ENH no need for tie breaking jitter in calibration
  Implement "secondary" tie strategy in isotonic.
  Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184
  MAINT fix typo pyagm -> pygamg in SkipTest
  STYLE trailing spaces
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 11, 2015
* releases: (1589 commits)
  0.16.X branching, version 0.16b1
  Fix scikit-learn#4351. Rendering of docs in MinMaxScaler.
  Fix rebase conflict
  MAINT use canonical PEP-440 dev version consistently
  Adding fix for issue scikit-learn#4297, isotonic infinite loop
  DOC deprecate random_state for DBSCAN
  FIX/TST boundary cases in dbscan (closes scikit-learn#4073)
  Do not shuffle in DBSCAN (warn if `random_state` is used).
  Update docstring predict_proba()
  Update documentation of predict_proba in tree module
  add scipy2013 tutorial links to presentations on website.
  TST boundary handling in LSHForest.radius_neighbors
  ENH improve docstrings and test for radius_neighbors models
  use a pipeline for pre-processing feature selection, as per best practise
  DOC remove unnecessary backticks in CONTRIBUTING.
  ENH no need for tie breaking jitter in calibration
  Implement "secondary" tie strategy in isotonic.
  Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184
  MAINT fix typo pyagm -> pygamg in SkipTest
  STYLE trailing spaces
  ...

Conflicts:
	sklearn/externals/joblib/__init__.py
	sklearn/externals/joblib/numpy_pickle.py
	sklearn/externals/joblib/parallel.py
	sklearn/externals/joblib/pool.py
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this issue Jul 11, 2015
* dfsg: (1589 commits)
  0.16.X branching, version 0.16b1
  Fix scikit-learn#4351. Rendering of docs in MinMaxScaler.
  Fix rebase conflict
  MAINT use canonical PEP-440 dev version consistently
  Adding fix for issue scikit-learn#4297, isotonic infinite loop
  DOC deprecate random_state for DBSCAN
  FIX/TST boundary cases in dbscan (closes scikit-learn#4073)
  Do not shuffle in DBSCAN (warn if `random_state` is used).
  Update docstring predict_proba()
  Update documentation of predict_proba in tree module
  add scipy2013 tutorial links to presentations on website.
  TST boundary handling in LSHForest.radius_neighbors
  ENH improve docstrings and test for radius_neighbors models
  use a pipeline for pre-processing feature selection, as per best practise
  DOC remove unnecessary backticks in CONTRIBUTING.
  ENH no need for tie breaking jitter in calibration
  Implement "secondary" tie strategy in isotonic.
  Adding unit test to cover ties/duplicate x values in Isotonic Regression re: issue scikit-learn#4184
  MAINT fix typo pyagm -> pygamg in SkipTest
  STYLE trailing spaces
  ...
mjbommar referenced this issue in ogrisel/scikit-learn Feb 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants