set_config(transform_output="pandas") causes error in Isomap #27579

evo-nlueck · 2023-10-13T06:11:23Z (edited by glemaitre)

Describe the bug

I am getting an error when using the awesome set_config(transform_output="pandas") in combination with Isomap. The error is "AttributeError: 'DataFrame' object has no attribute 'dtype'", so my temporary workaround is to switch back to the default config.

I am working with version 1.3.1 (the latest I could find).

Steps/Code to Reproduce

import numpy as np
import pandas as pd
from sklearn.manifold import Isomap
from sklearn import set_config

# generate random data
n_rows = 500
n_cols = 30
X = pd.DataFrame(
    data=np.random.random((n_rows, n_cols)),
    columns=[f"x{i}" for i in range(n_cols)]
)

# this works
set_config(transform_output="default")
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

# this fails
set_config(transform_output="pandas")
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

Expected Results

No error is thrown and transformed results are returned.

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\manifold\_isomap.py", line 383, in fit_transform
    self._fit_transform(X)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\manifold\_isomap.py", line 309, in _fit_transform
    self.embedding_ = self.kernel_pca_.fit_transform(G)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\utils\_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 469, in fit_transform
    self.fit(X, **params)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 437, in fit
    self._fit_transform(K)
  File "...\Anaconda3\envs\...\lib\site-packages\sklearn\decomposition\_kernel_pca.py", line 348, in _fit_transform
    self.eigenvalues_, self.eigenvectors_ = eigsh(
  File "...\Anaconda3\envs\...\lib\site-packages\scipy\sparse\linalg\_eigen\arpack\arpack.py", line 1556, in eigsh
    if np.issubdtype(A.dtype, np.complexfloating):
  File "...\Anaconda3\envs\...\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'dtype'
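The last frame shows where it actually breaks: SciPy's eigsh reads A.dtype on its input, an attribute that a NumPy array has but a pandas DataFrame does not (a DataFrame only exposes per-column dtypes). A small illustration, independent of scikit-learn (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The same small symmetric matrix, once as a DataFrame and once as an ndarray
K_df = pd.DataFrame(np.eye(3))
K_np = K_df.to_numpy()

# A DataFrame has one dtype per column but no single .dtype attribute,
# which is what scipy.sparse.linalg.eigsh accesses on its input
print(hasattr(K_df, "dtype"))  # False
print(K_df.dtypes.tolist())    # one dtype per column

# The underlying NumPy array does have .dtype
print(K_np.dtype)              # float64
```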

Versions

System:
    python: 3.9.15 (main, Nov  4 2022, 16:35:55) [MSC v.1916 64 bit (AMD64)]
executable: ...\Anaconda3\envs\...\python.exe
   machine: Windows-10-10.0.22621-SP0

Python dependencies:
      sklearn: 1.3.1
          pip: 22.2.2
   setuptools: 65.5.0
        numpy: 1.21.5
        scipy: 1.9.3
       Cython: None
       pandas: 1.4.4
   matplotlib: 3.5.3
       joblib: 1.1.1
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: ...\Anaconda3\envs\...\Lib\site-packages\sklearn\.libs\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 8

       filepath: ...\Anaconda3\envs\...\Library\bin\mkl_rt.1.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2021.4-Product
    num_threads: 4
threading_layer: intel

The text was updated successfully, but these errors were encountered:

hvsesha · 2023-10-13T06:56:25Z

Hi

# try this
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().set_output(transform="pandas")

scaler.fit(X)
X_test_scaled = scaler.transform(X)
X_test_scaled.head()
Isomap(n_neighbors=5, n_components=2, p=1).fit_transform(X)

It should work

evo-nlueck · 2023-10-13T07:24:24Z

I'm afraid it does not work for me. Assuming you mean to call fit_transform with X_test_scaled and not X, I get the same error when the config is set to transform_output="pandas". Only when I change it back to default it works with X_test_scaled. But then the returned value is a numpy array again and not a pandas DataFrame.

glemaitre · 2023-10-13T11:26:28Z

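This is because we don't force the output of the internal kernel centerer (KernelCenterer) in the KernelPCA that Isomap uses to build the embedding. With transform_output="pandas", the centered kernel is therefore returned as a DataFrame and handed to SciPy's eigsh (the "arpack" eigensolver), which fails when it accesses A.dtype.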

The fix would be to set the output of the kernel centerer to "default".

I will make a PR:

/take

glemaitre · 2023-10-13T11:34:39Z

A minimal reproducer is the following:

import sklearn
from sklearn.datasets import load_iris
from sklearn.decomposition import KernelPCA

X, _ = load_iris(as_frame=True, return_X_y=True)
with sklearn.config_context(transform_output="pandas"):
    KernelPCA(eigen_solver="arpack").fit_transform(X)

glemaitre · 2023-10-13T11:43:30Z

#27583 should solve the issue and will be in the next release.

evo-nlueck added the Bug and Needs Triage labels Oct 13, 2023

glemaitre removed the Needs Triage label Oct 13, 2023

glemaitre mentioned this issue Oct 13, 2023

FIX make sure that KernelPCA works with pandas output and arpack solver #27583 (merged)

lesteve closed this as completed in #27583 Oct 16, 2023
