Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Installed versions of pandas and SQLAlchemy are incompatible #1261

Closed
zaneselvans opened this issue May 29, 2023 · 4 comments
Closed

Installed versions of pandas and SQLAlchemy are incompatible #1261

zaneselvans opened this issue May 29, 2023 · 4 comments
Labels
bug bug & failures with existing packages help wanted

Comments

@zaneselvans
Copy link

馃悰 Bug

The default python environment on Kaggle includes incompatible versions of pandas and SQLAlchemy. Namely, SQLAlchemy is 2.x while pandas is 1.5.x, and there is no intention to ever support SQLAlchemy 2.x with pandas 1.5.x See this discussion

To Reproduce

import pandas as pd
import sqlalchemy as sa

print(f"{pd.__version__=}")
print(f"{sa.__version__=}")

engine = sa.create_engine("sqlite://")
df = pd.DataFrame(
    {
        "first": [1, 2, 3],
        "second": ["a", "b", "c"],
    }
)
df.to_sql("test_table", con=engine, index=False)
read_df = pd.read_sql("test_table", con=engine)
pd.testing.assert_frame_equal(df, read_df)
pd.__version__='1.5.3'
sa.__version__='2.0.12'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[19], line 15
      8 df = pd.DataFrame(
      9     {
     10         "first": [1, 2, 3],
     11         "second": ["a", "b", "c"],
     12     }
     13 )
     14 df.to_sql("test_table", con=engine)
---> 15 pd.read_sql("test_table", con=engine)

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:581, in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    579 if _is_table_name:
    580     pandas_sql.meta.reflect(bind=pandas_sql.connectable, only=[sql])
--> 581     return pandas_sql.read_table(
    582         sql,
    583         index_col=index_col,
    584         coerce_float=coerce_float,
    585         parse_dates=parse_dates,
    586         columns=columns,
    587         chunksize=chunksize,
    588     )
    589 else:
    590     return pandas_sql.read_query(
    591         sql,
    592         index_col=index_col,
   (...)
    596         chunksize=chunksize,
    597     )

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1460, in SQLDatabase.read_table(self, table_name, index_col, coerce_float, parse_dates, columns, schema, chunksize)
   1417 """
   1418 Read SQL database table into a DataFrame.
   1419 
   (...)
   1457 
   1458 """
   1459 table = SQLTable(table_name, self, index=index_col, schema=schema)
-> 1460 return table.read(
   1461     coerce_float=coerce_float,
   1462     parse_dates=parse_dates,
   1463     columns=columns,
   1464     chunksize=chunksize,
   1465 )

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1003, in SQLTable.read(self, coerce_float, parse_dates, columns, chunksize)
   1001 else:
   1002     sql_select = select(self.table)
-> 1003 result = self.pd_sql.execute(sql_select)
   1004 column_names = result.keys()
   1006 if chunksize is not None:

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1405, in SQLDatabase.execute(self, *args, **kwargs)
   1403 def execute(self, *args, **kwargs):
   1404     """Simple passthrough to SQLAlchemy connectable"""
-> 1405     return self.connectable.execution_options().execute(*args, **kwargs)

AttributeError: 'OptionEngine' object has no attribute 'execute'

Expected behavior

If the pandas and SQLAlchemy versions are compatible, the code above should print out the package versions and the assertion should succeed.

For reasons I don't understand, running

!pip install "SQLAlchemy<2"

or

!mamba install --yes "sqlalchemy<2"

don't seem to affect the version of the package that's installed, and so don't fix the incompatibility.

@zaneselvans zaneselvans added bug bug & failures with existing packages help wanted labels May 29, 2023
@djherbis
Copy link
Contributor

Interesting, it looks like we don't explicitly install sqlalchemy, it must be being installed by one of our other packages in the image. We also don't pin pandas so our pandas version is dependent currently on what works with the other packages we do specify in the image.

We can keep an eye on this though and see if a newer version of the image upgrades pandas and hopefully fixes this.

@zaneselvans
Copy link
Author

In the pandas 1.5.3 tag it looks like they are requiring sqlalchemy<1.4.46 in the environment.yml and requirements-dev.txt, but I'm not seeing where the required versions for "extras" in the deployed package are defined.

But it would seem weird if the package didn't also have those version requirements embedded within it, which makes it seem weird that conda is allowing these incompatible versions of pandas and sqlalchemy to be installed together.

Unfortunately this means one can't easily read data out of SQLite DBs into pandas, e.g. this dataset.

@djherbis
Copy link
Contributor

@zaneselvans Can you just fork an older Kaggle notebook in the meantime and wait for a pandas fix that we can hopefully upgrade to in a future image release?

@calderjo
Copy link
Contributor

Hi we are aiming to close out old bugs...

since this post, pandas and sqlalchemy in our image have been updated (2.2.1 and 2.0.25 respectively), reproducing the script above no longer produces an error:

Screenshot 2024-05-30 at 11 19 44鈥疉M

it look to be self-resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug bug & failures with existing packages help wanted
Projects
None yet
Development

No branches or pull requests

3 participants