Installed versions of pandas and SQLAlchemy are incompatible #1261

zaneselvans · 2023-05-29T23:51:25Z

🐛 Bug

The default python environment on Kaggle includes incompatible versions of pandas and SQLAlchemy. Namely, SQLAlchemy is 2.x while pandas is 1.5.x, and there is no intention to ever support SQLAlchemy 2.x with pandas 1.5.x See this discussion

To Reproduce

import pandas as pd
import sqlalchemy as sa

print(f"{pd.__version__=}")
print(f"{sa.__version__=}")

engine = sa.create_engine("sqlite://")
df = pd.DataFrame(
    {
        "first": [1, 2, 3],
        "second": ["a", "b", "c"],
    }
)
df.to_sql("test_table", con=engine, index=False)
read_df = pd.read_sql("test_table", con=engine)
pd.testing.assert_frame_equal(df, read_df)

pd.__version__='1.5.3'
sa.__version__='2.0.12'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[19], line 15
      8 df = pd.DataFrame(
      9     {
     10         "first": [1, 2, 3],
     11         "second": ["a", "b", "c"],
     12     }
     13 )
     14 df.to_sql("test_table", con=engine)
---> 15 pd.read_sql("test_table", con=engine)

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:581, in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    579 if _is_table_name:
    580     pandas_sql.meta.reflect(bind=pandas_sql.connectable, only=[sql])
--> 581     return pandas_sql.read_table(
    582         sql,
    583         index_col=index_col,
    584         coerce_float=coerce_float,
    585         parse_dates=parse_dates,
    586         columns=columns,
    587         chunksize=chunksize,
    588     )
    589 else:
    590     return pandas_sql.read_query(
    591         sql,
    592         index_col=index_col,
   (...)
    596         chunksize=chunksize,
    597     )

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1460, in SQLDatabase.read_table(self, table_name, index_col, coerce_float, parse_dates, columns, schema, chunksize)
   1417 """
   1418 Read SQL database table into a DataFrame.
   1419 
   (...)
   1457 
   1458 """
   1459 table = SQLTable(table_name, self, index=index_col, schema=schema)
-> 1460 return table.read(
   1461     coerce_float=coerce_float,
   1462     parse_dates=parse_dates,
   1463     columns=columns,
   1464     chunksize=chunksize,
   1465 )

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1003, in SQLTable.read(self, coerce_float, parse_dates, columns, chunksize)
   1001 else:
   1002     sql_select = select(self.table)
-> 1003 result = self.pd_sql.execute(sql_select)
   1004 column_names = result.keys()
   1006 if chunksize is not None:

File /opt/conda/lib/python3.10/site-packages/pandas/io/sql.py:1405, in SQLDatabase.execute(self, *args, **kwargs)
   1403 def execute(self, *args, **kwargs):
   1404     """Simple passthrough to SQLAlchemy connectable"""
-> 1405     return self.connectable.execution_options().execute(*args, **kwargs)

AttributeError: 'OptionEngine' object has no attribute 'execute'

Expected behavior

If the pandas and SQLAlchemy versions are compatible, the code above should print out the package versions and the assertion should succeed.

For reasons I don't understand, running

!pip install "SQLAlchemy<2"

or

!mamba install --yes "sqlalchemy<2"

don't seem to affect the version of the package that's installed, and so don't fix the incompatibility.

The text was updated successfully, but these errors were encountered:

djherbis · 2023-05-30T19:08:18Z

Interesting, it looks like we don't explicitly install sqlalchemy, it must be being installed by one of our other packages in the image. We also don't pin pandas so our pandas version is dependent currently on what works with the other packages we do specify in the image.

We can keep an eye on this though and see if a newer version of the image upgrades pandas and hopefully fixes this.

zaneselvans · 2023-05-30T20:59:41Z

In the pandas 1.5.3 tag it looks like they are requiring sqlalchemy<1.4.46 in the environment.yml and requirements-dev.txt, but I'm not seeing where the required versions for "extras" in the deployed package are defined.

But it would seem weird if the package didn't also have those version requirements embedded within it, which makes it seem weird that conda is allowing these incompatible versions of pandas and sqlalchemy to be installed together.

Unfortunately this means one can't easily read data out of SQLite DBs into pandas, e.g. this dataset.

djherbis · 2023-05-31T15:43:37Z

@zaneselvans Can you just fork an older Kaggle notebook in the meantime and wait for a pandas fix that we can hopefully upgrade to in a future image release?

calderjo · 2024-05-30T18:29:10Z

Hi we are aiming to close out old bugs...

since this post, pandas and sqlalchemy in our image have been updated (2.2.1 and 2.0.25 respectively), reproducing the script above no longer produces an error:

it look to be self-resolved.

zaneselvans added bug bug & failures with existing packages help wanted labels May 29, 2023

calderjo closed this as completed May 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Installed versions of pandas and SQLAlchemy are incompatible #1261

Installed versions of pandas and SQLAlchemy are incompatible #1261

zaneselvans commented May 29, 2023

djherbis commented May 30, 2023

zaneselvans commented May 30, 2023

djherbis commented May 31, 2023

calderjo commented May 30, 2024

Installed versions of pandas and SQLAlchemy are incompatible #1261

Installed versions of pandas and SQLAlchemy are incompatible #1261

Comments

zaneselvans commented May 29, 2023

🐛 Bug

To Reproduce

Expected behavior

djherbis commented May 30, 2023

zaneselvans commented May 30, 2023

djherbis commented May 31, 2023

calderjo commented May 30, 2024