Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST Extend tests for scipy.sparse/*array in sklearn/manifold/tests/test_spectral_embedding #27240

Merged
merged 26 commits into from
Nov 4, 2023

Conversation

Charlie-XIAO
Copy link
Contributor

Towards #27090.

@github-actions
Copy link

github-actions bot commented Aug 30, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bb9c6b5. Link to the linter CI: here

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

sklearn/manifold/_spectral_embedding.py Show resolved Hide resolved
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Sep 12, 2023

Otherwise LGTM.

I think there are still some failing tests, and I'm looking into it. Previously I did not have pyamg installed so I overlooked those errors.

@glemaitre glemaitre self-requested a review September 12, 2023 15:56
@glemaitre
Copy link
Member

I don't see the traceback but I assume that somewhere we have if issparse(X) and the process in this branch was returning a np.matrix. In the previous PR, that I reviewed, I just check that you can replace if isspmatrix(X) because in the case of the sparse arrays, the operation will return a NumPy array that is what is expected as well for the dense case.

I have to go back to some previous PR to check if I omit such changes.

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Sep 12, 2023

This case seems to be a bit different, I got:

>       laplacian, dd = csgraph_laplacian(
            adjacency, normed=norm_laplacian, return_diag=True
        )

sklearn\manifold\_spectral_embedding.py:279:

>       lap, d = create_lap(
            csgraph,
            normed=normed,
            axis=degree_axis,
            copy=copy,
            form=form,
            dtype=dtype,
            symmetrized=symmetrized,
        )

D:\Downloads\mambaforge\envs\sklearn-env\lib\site-packages\scipy\sparse\csgraph\_laplacian.py:354:

>       w = m.sum(axis=axis).getA1() - m.diagonal()
E       AttributeError: 'numpy.ndarray' object has no attribute 'getA1'

D:\Downloads\mambaforge\envs\sklearn-env\lib\site-packages\scipy\sparse\csgraph\_laplacian.py:459: AttributeError

This seems to imply that scipy.sparse.csgraph.laplacian does not accept sparse arrays as input. I can indeed do something like

if sparse.issparse(adjacency) and not sparse.isspmatrix(adjacency):
    adjacency = adjacency.toarray()

But this does not seem to be desired. Somehow I need a method to convert xxx_array to xxx_matrix but I read through scipy sparse array documentation and did not seem to find one? Or is there a better workaround? @glemaitre

@glemaitre
Copy link
Member

Actually, there is a bug with isspmatrix for SciPy < 1.11 where it returns True even for sparse arrays. So we cannot do what I propose. I will check this PR accordingdly.

@glemaitre
Copy link
Member

But this does not seem to be desired. Somehow I need a method to convert xxx_array to xxx_matrix but I read through scipy sparse array documentation and did not seem to find one? Or is there a better workaround?

Indeed, we can convert it by calling the *_matrix() constructor on the array. We just need to be sure which type of sparse format we have. We can make it a temporary fix.

I will open an issue/PR in SciPy to fix it.

@glemaitre
Copy link
Member

OK. So the bug has been solved in scipy/scipy#19156
One potential thing that we could do here is to vendor the file for sometimes as we did in the past.

Would you mind to copy paste the _laplacian.py file from scipy main branch into sklearn/externals folder. Then, we need to edit sklearn.utils.fixes where we will import laplacian from SciPy only if the version is > 1.11, otherwise we import the version from externals. Then, we need to change the import in the _spectral_embedding.py to import from sklearn.utils.fixes import laplacian instead from the SciPy import.

@glemaitre glemaitre removed their request for review September 13, 2023 09:53
@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Sep 13, 2023

Sure, I don't have time at this moment but will revisit in a day.

@glemaitre glemaitre self-requested a review September 14, 2023 07:15
@Charlie-XIAO
Copy link
Contributor Author

@glemaitre Do we need to write some comment at the beginning of the _laplacian.py file to mention that this is vendored from scipy?

@glemaitre
Copy link
Member

Do we need to write some comment at the beginning of the _laplacian.py file to mention that this is vendored from scipy?

I will do that and add an additional entry in the changelog acknowledging the non-support for 64-bits indices in SpectralEmbedding with the AMG solver.

@glemaitre
Copy link
Member

I checked that merging #27372 and solve the issue in this PR.

@lesteve
Copy link
Member

lesteve commented Sep 15, 2023

Looking at the failure, it seems like this has been fixed in scipy 1.11.2 scipy/scipy#18644. It looks like the nomkl environment can only get scipy 1.11.1 because it is using the defaults conda channel and scipy 1.11.2 is not available yet on the defaults conda channel ...

@glemaitre
Copy link
Member

@lesteve But it means that we don't support SciPy from 1.8 to 1.11.1.

@lesteve
Copy link
Member

lesteve commented Sep 15, 2023

Maybe we can find a way to convey that sparse array support is experimental for now?

@ogrisel
Copy link
Member

ogrisel commented Sep 15, 2023

Thanks @jeromedockes for your analysis, I opened scipy/scipy#19246 upstream.

@ogrisel
Copy link
Member

ogrisel commented Sep 15, 2023

Maybe we can find a way to convey that sparse array support is experimental for now?

I would be fine stating in the changelog that the support of scipy sparse arrays in scikit-learn estimators requires the latest release of scipy, or even the future 1.12.0 (even if scipy sparse arrays were introduced in 1.8).

@jeromedockes
Copy link
Contributor

Thanks @jeromedockes for your analysis, I opened scipy/scipy#19246 upstream.

oops I think @glemaitre had already opened one

@glemaitre
Copy link
Member

OK so we merged #27372, we should be able to go forward with this PR now. @Charlie-XIAO would you mind resolving the conflicts?

@Charlie-XIAO
Copy link
Contributor Author

OK so we merged #27372, we should be able to go forward with this PR now. @Charlie-XIAO would you mind resolving the conflicts?

Sure, having followed the progress of that issue for long~

@Charlie-XIAO
Copy link
Contributor Author

Strange that codecov is complaining about fixes.py, though I've already added # type: ignore # noqa. Should I do something else to resolve this @glemaitre?

sklearn/utils/fixes.py Outdated Show resolved Hide resolved
@Charlie-XIAO
Copy link
Contributor Author

Done, I think things are looking good so far. @glemaitre

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We will need another reviewer on this.

@glemaitre glemaitre added the Waiting for Second Reviewer First reviewer is done, need a second one! label Oct 31, 2023
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @Charlie-XIAO!

A few comments and suggestions regarding mentioning and integrating the recent support of int64 in SciPy graph traversal routine; otherwise LGTM.

doc/whats_new/v1.4.rst Outdated Show resolved Hide resolved
Comment on lines 2 to 4
This file is a copy of the scipy.sparse.csgraph._laplacian module from SciPy 1.12

Laplacian of a compressed-sparse graph
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate the following in this header

  • the license of this code
  • the reason for vendoring the code
  • potential modifications

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the license and reason for vendoring the code. There is no modification, and I have provided the PR link in scipy that solved the issue.

sklearn/manifold/tests/test_spectral_embedding.py Outdated Show resolved Hide resolved
sklearn/manifold/tests/test_spectral_embedding.py Outdated Show resolved Hide resolved
sklearn/manifold/_spectral_embedding.py Outdated Show resolved Hide resolved
@Charlie-XIAO
Copy link
Contributor Author

@jjerphan Sorry for my delayed response. I have addressed your suggestions, so would you mind taking another look?

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jjerphan jjerphan merged commit f284ef2 into scikit-learn:main Nov 4, 2023
27 checks passed
@Charlie-XIAO Charlie-XIAO deleted the tst_sp_spectral_embedding branch November 11, 2023 06:37
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…/test_spectral_embedding` (scikit-learn#27240)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:manifold Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants