set_config(transform_output="pandas") causes error in Isomap #27579

Closed
evo-nlueck opened this issue Oct 13, 2023 · 5 comments · Fixed by #27583

evo-nlueck commented Oct 13, 2023

Describe the bug

I am getting an error when using the awesome set_config(transform_output="pandas") in combination with Isomap. The error is "AttributeError: 'DataFrame' object has no attribute 'dtype'". My temporary workaround is to switch back to the default config.

I am working with version 1.3.1 (latest I could find).

Steps/Code to Reproduce

import numpy as np
import pandas as pd
from sklearn.manifold import Isomap
from sklearn import set_config

# generate random data
n_rows = 500
n_cols = 30
X = pd.DataFrame(
    data=np.random.random((n_rows, n_cols)),
    columns=[f"x{i}" for i in range(n_cols)]
)

# this works
set_config(transform_output="default")
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

# this fails
set_config(transform_output="pandas")
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

Expected Results

No error is thrown and transformed results are returned.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\manifold\_isomap.py", line 383, in fit_transform
    self._fit_transform(X)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\manifold\_isomap.py", line 309, in _fit_transform
    self.embedding_ = self.kernel_pca_.fit_transform(G)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 469, in fit_transform
    self.fit(X, **params)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 437, in fit
    self._fit_transform(K)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 348, in _fit_transform
    self.eigenvalues_, self.eigenvectors_ = eigsh(
  File "...\Anaconda3\envs\...\lib\site-packages\scipy\sparse\linalg\_eigen\arpack\arpack.py", line 1556, in eigsh
    if np.issubdtype(A.dtype, np.complexfloating):
  File "...\Anaconda3\envs\...\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'dtype'

Versions

System:
    python: 3.9.15 (main, Nov  4 2022, 16:35:55) [MSC v.1916 64 bit (AMD64)]
executable: ...\Anaconda3\envs\...\python.exe
   machine: Windows-10-10.0.22621-SP0

Python dependencies:
      sklearn: 1.3.1
          pip: 22.2.2
   setuptools: 65.5.0
        numpy: 1.21.5
        scipy: 1.9.3
       Cython: None
       pandas: 1.4.4
   matplotlib: 3.5.3
       joblib: 1.1.1
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: ...\Anaconda3\envs\...\Lib\site-packages\sklearn\.libs\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 8

       filepath: ...\Anaconda3\envs\...\Library\bin\mkl_rt.1.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2021.4-Product
    num_threads: 4
threading_layer: intel

evo-nlueck added the Bug and Needs Triage labels Oct 13, 2023

hvsesha commented Oct 13, 2023

Hi,

Try this:

from sklearn.preprocessing import StandardScaler

# Request pandas output from the scaler only
scaler = StandardScaler().set_output(transform="pandas")

scaler.fit(X)
X_test_scaled = scaler.transform(X)
X_test_scaled.head()
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

It should work.

evo-nlueck (Author) commented

I'm afraid it does not work for me. Assuming you mean to call fit_transform with X_test_scaled and not X, I get the same error when the config is set to transform_output="pandas". It only works with X_test_scaled when I change the config back to default, but then the returned value is a numpy array again and not a pandas DataFrame.

glemaitre (Member) commented

It is due to the fact that we don't force the output of the internal kernel centerer in the KernelPCA used to build the embedding.

The fix would be to set the output of the kernel centerer to "default".

I will make a PR:

/take

glemaitre removed the Needs Triage label Oct 13, 2023
glemaitre (Member) commented

A minimal reproducer is the following:

import sklearn
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA

X, _ = load_iris(as_frame=True, return_X_y=True)
with sklearn.config_context(transform_output="pandas"):
    KernelPCA(eigen_solver="arpack").fit_transform(X)
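
The last frames of the traceback in the report show eigsh reading A.dtype, which a DataFrame does not provide. The sketch below, using a small made-up matrix, shows that attribute mismatch in isolation and that converting to a NumPy array restores the dtype check.

```python
import numpy as np
import pandas as pd

# Made-up matrix standing in for the centered kernel
K = pd.DataFrame(np.eye(3))

# A DataFrame exposes per-column `dtypes`, but no single `dtype`
has_dtype = hasattr(K, "dtype")
has_dtypes = hasattr(K, "dtypes")

# After conversion, the check performed in the traceback works
K_array = K.to_numpy()
is_complex = np.issubdtype(K_array.dtype, np.complexfloating)

print(has_dtype, has_dtypes, is_complex)
```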

glemaitre (Member) commented

#27583 should solve the issue and will be in the next release.
