[FEA] Add Pandas 2.0 support #5466

Closed
vkhodygo opened this issue Jun 12, 2023 · 10 comments
Labels
0 - Backlog In queue waiting for assignment feature request New feature or request

Comments

@vkhodygo

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I wish I could use cuML to do [...]

Describe the solution you'd like
v2.0 has been officially released and it should be supported as well.

Describe alternatives you've considered
cudf is based on PyArrow too, but it doesn't fit the bill I'm afraid.

Additional context
Add any other context, code examples, or references to existing implementations about the feature request here.

@vkhodygo vkhodygo added ? - Needs Triage Need team to review and classify feature request New feature or request labels Jun 12, 2023
@dantegd dantegd added 0 - Backlog In queue waiting for assignment and removed ? - Needs Triage Need team to review and classify labels Jun 26, 2023
@dantegd
Member

dantegd commented Jun 26, 2023

We haven't tested with pandas 2.0 yet, but in general most things in cuML should work, since all we do with pandas inputs is transfer them to the GPU. I will do some testing and debug any issues for the current development version (23.08).

@vkhodygo
Author

@dantegd I'm mostly interested in support for PyArrow data types. Employing metadata such as this makes imputation and data handling much easier and more robust.

@niklashoelter

niklashoelter commented Aug 1, 2023

Hi everyone, I also tried using cuml with pandas 2.0.3.

I already get an error when trying to import cuml:

Traceback (most recent call last):
  File "/home/user/script.py", line 11, in <module>
    import cuml
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/__init__.py", line 17, in <module>
    from cuml.internals.base import Base, UniversalBase
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/__init__.py", line 17, in <module>
    from cuml.internals.base_helpers import BaseMetaClass, _tags_class_and_instance
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/base_helpers.py", line 20, in <module>
    from cuml.internals.api_decorators import (
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 24, in <module>
    from cuml.internals import input_utils as iu
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/input_utils.py", line 19, in <module>
    from cuml.internals.array import CumlArray
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/array.py", line 22, in <module>
    from cuml.internals.global_settings import GlobalSettings
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/global_settings.py", line 20, in <module>
    from cuml.internals.device_type import DeviceType
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/device_type.py", line 19, in <module>
    from cuml.internals.mem_type import MemoryType
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/mem_type.py", line 22, in <module>
    cudf = gpu_only_import("cudf")
  File "/opt/miniconda/lib/python3.10/site-packages/cuml/internals/safe_imports.py", line 356, in gpu_only_import
    return importlib.import_module(module)
  File "/opt/miniconda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/miniconda/lib/python3.10/site-packages/cudf/__init__.py", line 19, in <module>
    from cudf import api, core, datasets, testing
  File "/opt/miniconda/lib/python3.10/site-packages/cudf/api/__init__.py", line 3, in <module>
    from cudf.api import extensions, types
  File "/opt/miniconda/lib/python3.10/site-packages/cudf/api/types.py", line 464, in <module>
    is_extension_type = pd_types.is_extension_type
AttributeError: module 'pandas.api.types' has no attribute 'is_extension_type'. Did you mean: 'is_extension_array_dtype'?

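Seems like this is related to 'is_extension_type', which was deprecated in pandas 1.x and is no longer available in pandas 2.0 (the error message suggests 'is_extension_array_dtype' instead):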
https://pandas.pydata.org/pandas-docs/version/1.4.0/reference/api/pandas.api.types.is_extension_type.html
https://pandas.pydata.org/pandas-docs/version/1.0.3/reference/api/pandas.api.types.is_extension_array_dtype.html

Unfortunately, I'm limited to the (stable) pip version of cuml (23.06) because I only have CUDA 12 available on my server, so I can't tell whether it works with the nightly 23.08a.
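
The traceback above boils down to a single attribute lookup on `pandas.api.types` that fails at import time. A minimal sketch of a defensive lookup is given below; it is not what cudf actually does, and the helper `resolve_first` and the stand-in namespace `pd_types_v2` are hypothetical names introduced only for illustration. Note also that `is_extension_array_dtype` is the name suggested by the error message, not necessarily a drop-in replacement for `is_extension_type`.

```python
from types import SimpleNamespace


def resolve_first(namespace, *names):
    """Return the first attribute listed in `names` that exists on `namespace`."""
    for name in names:
        if hasattr(namespace, name):
            return getattr(namespace, name)
    raise AttributeError(f"none of {names} found on {namespace!r}")


def _stub_is_extension_array_dtype(obj):
    # Placeholder body; the real pandas function inspects dtypes.
    return False


# Hypothetical stand-in for `pandas.api.types` under pandas >= 2.0:
# only the newer name is present, matching the AttributeError above.
pd_types_v2 = SimpleNamespace(is_extension_array_dtype=_stub_is_extension_array_dtype)

# Prefer the old name if it exists, otherwise fall back to the newer one,
# instead of failing with AttributeError at import time.
checker = resolve_first(pd_types_v2, "is_extension_type", "is_extension_array_dtype")
```

With real pandas installed, the same call could be made against `pandas.api.types` instead of the stand-in. Whether the fallback is semantically acceptable depends on how the caller uses the result.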

@chengarthur

I ran into this problem as well.

@vkhodygo
Author

@dantegd Any progress in this direction?

@loftusa

loftusa commented Feb 22, 2024

@dantegd Hi, I have now hit this problem independently in two separate environments, which means it should probably be extremely high priority for you guys; it is likely the primary thing bottlenecking cudf.pandas adoption.

(most people will not want to use the extension if the first thing that happens when they try it is an error, if they are using pandas >= 2.0, which most people are)

@vyasr
Contributor

vyasr commented Feb 22, 2024

The primary limitation on cuml supporting pandas 2 is its transitive dependency on cudf. cudf 24.04 will support pandas 2, so cuml should work with pandas 2 as well in the next release.

@dantegd not sure if you want to close this issue now that rapidsai/cudf#14916 is merged or if you want to wait until the 24.04 release.
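
For readers trying to tell whether their environment is affected, the version relationship described in this thread can be sketched as a small check. This is only an illustration under the assumption stated above (pandas 2.x needs cudf 24.04 or later); the function names are hypothetical and not part of cudf or cuml.

```python
import re
from importlib import metadata


def major_minor(version):
    """Parse the leading 'MAJOR.MINOR' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_incompatible(pandas_version, cudf_version):
    """Assumption taken from this thread: pandas >= 2.0 with cudf < 24.04 fails."""
    return major_minor(pandas_version) >= (2, 0) and major_minor(cudf_version) < (24, 4)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The combination reported earlier in this thread: pandas 2.0.3 with the 23.06 release.
reported_combo_flagged = likely_incompatible("2.0.3", "23.06")
```

`installed_version("pandas")` can supply the first argument in a live environment. The distribution name for cudf depends on how it was installed (pip wheels may use a CUDA-suffixed name), so that lookup is left to the reader.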

@krunolp

krunolp commented Mar 13, 2024

Was there any progress? I am experiencing the same issue

@quasiben
Member

With 24.04 out I think this can be closed. @dantegd / @krunolp please re-open if you feel otherwise

@vkhodygo
Author

vkhodygo commented May 2, 2024

Much appreciated.

8 participants