Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: LAST_INSERT_ID behaviour differs between MySQL and Vitess #15696

Open
arthurschreiber opened this issue Apr 11, 2024 · 2 comments · May be fixed by #15699
Open

Bug Report: LAST_INSERT_ID behaviour differs between MySQL and Vitess #15696

arthurschreiber opened this issue Apr 11, 2024 · 2 comments · May be fixed by #15699

Comments

@arthurschreiber
Copy link
Contributor

Overview of the Issue

LAST_INSERT_ID behaviour differs between MySQL and Vitess for INSERT ... ON DUPLICATE KEY UPDATE ... queries.

From my testing, MySQL 8.0 behaves like this:

  • If no rows are inserted or updated, LAST_INSERT_ID will be 0.
  • If no rows are inserted, but some rows are updated, LAST_INSERT_ID will be equal to the AUTO_INCREMENT column's value of the last row that was updated. "Last row updated" is based on the order of VALUES tuples in the INSERT statements.
  • If at least one row is inserted, LAST_INSERT_ID will be equal to the highest inserted AUTO_INCREMENT column's value.

Inserts into tables that use a sequence for auto increment value generation in Vitess (v16+, haven't tested earlier version) will always return the highest reserved sequence value for the auto increment column, whether an actual insert happened or not. This breaks applications that depend on the behaviour of MySQL.

Reproduction Steps

N/A

Binary Version

v16 and later

Operating System and Environment details

N/A

Log Fragments

N/A
@arthurschreiber arthurschreiber added Type: Bug Needs Triage This issue needs to be correctly labelled and triaged Component: Query Serving and removed Needs Triage This issue needs to be correctly labelled and triaged labels Apr 11, 2024
@arthurschreiber arthurschreiber self-assigned this Apr 11, 2024
@arthurschreiber
Copy link
Contributor Author

arthurschreiber commented Apr 11, 2024

Thinking about this more, I actually don't believe Vitess can match MySQLs behaviour in all of the cases described above.

When we send an INSERT ... ON DUPLICATE KEY UPDATE ... statement down to a shard, we can't rely on whatever last_insert_id value comes back. The reason for that is that the autoIncrement column defined in the VSchema might either not be flagged as AUTO_INCREMENT in the MySQL schema, or might even be a different column than the one defined as AUTO_INCREMENT in the MySQL schema.

So the only information we can use is the rows_affected value sent back from MySQL. But that does not really help us a lot either, as we can only infer from it whether:

  • no rows were modified (rows_affected=0)
  • only one row was inserted (rows_affected=1)
  • all rows were updated (rows_affected=2*n, where n is the number of rows we tried to insert)
  • a mix of rows was either inserted or updated, or a few inserts were made without updates, or a few updates were made without inserts (rows_affected between 1 and 2*n)

We can handle the case where no rows were changed correctly, by setting last_insert_id to 0.

For all the cases where there's at least one insert, this is also correctly handled. last_insert_id is set to the first generated/inserted auto increment value. (This is the current behaviour already implemented by Vitess).

For cases where there's no inserts but some updates, we have no way to get ahold of the column values for the columns defined as autoIncrement in the VSchema. We also don't get enough information back from MySQL to understand which row might have been the last one that got updated to run an additional query to fetch the auto increment column value. We also can't use LAST_INSERT_ID in the ON DUPLICATE KEY UPDATE ... part of the query to "fetch" the value of the auto increment column, because that would be executed independent of whether the row was actually modified or not.

@arthurschreiber
Copy link
Contributor Author

We can handle the case where no rows were changed correctly, by setting last_insert_id to 0.

I think this will only work on connections that don't set the CLIENT_FOUND_ROWS flag, because if the flag is set, rows_affected can never be 0. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant