Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLErrorCodesFactory.getErrorCodes(DataSource) returns empty error codes if access to transactional connection fails #25681

Closed
ebarlas opened this issue Sep 3, 2020 · 5 comments
Assignees
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) status: backported An issue that has been backported to maintenance branches type: bug A general bug
Milestone

Comments

@ebarlas
Copy link

ebarlas commented Sep 3, 2020

The SQLErrorCodes getErrorCodes(DataSource dataSource) method of the org.springframework.jdbc.support.SQLErrorCodesFactory class is responsible for producing a SQLErrorCodes instance with a repository of vendor-specific SQLException error codes that are neatly categorized for Spring JDBC exception mapping.

For example, duplicateKeyCodes contains codes that will be mapped to DuplicateKeyException.

This is a great convenience for applications since they can work with a comprehensive and intuitive hierarchy of exceptions rather than vendor-specific error codes.

Unfortunately, getErrorCodes has fallback logic that can have dramatic and damaging side effects.

If a SQLErrorCodes instance is not yet cached for a particular DataSource then getErrorCodes creates one. It does so by first obtaining the "database product name" for the DataSource. Next it looks in a repository of error codes, keyed by database product name, for the target SQLErrorCodes instance. Lastly, that result is added to the cache by DataSource.

Unfortunately, obtaining the database product name is an operation on a JDBC connection and has many failure modes. For example, the thread-bound JDBC connection might have been closed for some reason.

Bizarrely, getErrorCodes responds to such exceptions by simply producing an empty SQLErrorCodes with no mapping at all. That's the best it can do if it must return an SQLErrorCodes instance since it doesn't know the database engine. Unfortunately, this changes the expected behavior of upstream components like JdbcTemplate, which caches the SQLErrorCodes in an SQLErrorCodeSQLExceptionTranslator. In other words, the exception corrupts the JdbcTemplate exception translator and the conveniences mentioned above go out the window!

Consider the following code snippet in a long-running application:

try {
    jdbcTemplate.update(...);
} catch (DuplicateKeyException e) {
    // handle unique constraint violation
}

Suppose a SQLException occurs due to a unique constraint violation with MySQL vendor code 1062. The execution path above executes, resulting in a SQLErrorCodes instance that has code 1062 in the duplicateKeyCodes bucket. Spring JDBC translates that to a DuplicateKeyException and all is well!

However, suppose instead that the error path above occurs resulting in an empty SQLErrorCodes. Now code 1062 isn't represented at all and the exception translator has to employ fallback logic that results in a DataIntegrityViolationException. The exception is not handled by the catch block and mayhem ensues.

This unpredictable behavior strikes me as extremely dangerous behavior. In my case, it surfed in the production environment and had quite a noticeable impact.

Fortunately, a quick workaround is available by simply setting the exception translator and database product name up front. However, I didn't realize that until it was too late.

The issue can be trivially reproduced using a debug session in which the connection is closed. See attached screenshots.

Screen Shot 2020-09-02 at 8 43 38 PM
Screen Shot 2020-09-02 at 8 43 54 PM
Screen Shot 2020-09-02 at 8 44 11 PM
Screen Shot 2020-09-02 at 8 44 17 PM
Screen Shot 2020-09-02 at 8 44 50 PM
Screen Shot 2020-09-02 at 8 47 48 PM
Screen Shot 2020-09-02 at 8 46 42 PM

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Sep 3, 2020
@jhoeller jhoeller self-assigned this Sep 3, 2020
@jhoeller jhoeller added in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Sep 3, 2020
@jhoeller jhoeller added this to the 5.2.9 milestone Sep 3, 2020
@jhoeller
Copy link
Contributor

jhoeller commented Sep 3, 2020

Good point, while an empty SQLErrorCodes instance would not be cached in SQLErrorCodesFactory in such a scenario, it would indeed be cached in SQLErrorCodeSQLExceptionTranslator. I've addressed this with two measures:

  1. SQLErrorCodeSQLExceptionTranslator calls SQLErrorCodesFactory with a new resolveErrorCodes method now where it receives null for the "could not initialize DatabaseMetaData" case, going without error codes for the current task but attempting reinitialization on the next translation task. This should cover the startup-time initialization case (where the database may not be ready yet) and also unfortunate on-demand initialization where the current Connection is not usable anymore.

  2. In case of JdbcUtils.extractDatabaseMetaData access failing, it subsequently attempts to obtain the DatabaseMetaData from a fresh Connection in case the current Connection was transaction-bound. This should specifically cover the current Connection not being usable anymore.

That said, I nevertheless wonder why a transaction-bound Connection would be prematurely closed when exception translation is being attempted. Was there a specific code path involved that possibly translates exceptions at a bad time? Or was it a transaction timeout maybe?

@ebarlas
Copy link
Author

ebarlas commented Sep 3, 2020

Thank you for prioritizing this!

Yes, very precise timing is required to exploit this race condition. This occurred in a high-traffic application server where there are plenty of opportunities to do just that due to the continuous stream of transactions. I've included part of the actual stack trace from production logs. This can actually occur quite routinely. In our cloud deployment landscape, application servers talk to database servers across availability zones and network hiccups do occur. Additionally, many database servers terminate pre-existing sessions during the momentary application of a configuration or policy change.

org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
	at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:342)
	at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:366)
	at org.springframework.jdbc.support.SQLErrorCodesFactory.getErrorCodes(SQLErrorCodesFactory.java:212)
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.setDataSource(SQLErrorCodeSQLExceptionTranslator.java:134)
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.<init>(SQLErrorCodeSQLExceptionTranslator.java:97)
	at org.springframework.jdbc.support.JdbcAccessor.getExceptionTranslator(JdbcAccessor.java:105)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:654)
...
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
	at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:342)
	at org.springframework.jdbc.support.JdbcUtils.extractDatabaseMetaData(JdbcUtils.java:366)
	at org.springframework.jdbc.support.SQLErrorCodesFactory.getErrorCodes(SQLErrorCodesFactory.java:212)
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.setDataSource(SQLErrorCodeSQLExceptionTranslator.java:134)
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.<init>(SQLErrorCodeSQLExceptionTranslator.java:97)
	at org.springframework.jdbc.support.JdbcAccessor.getExceptionTranslator(JdbcAccessor.java:105)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:654)
...
Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:1171)
	at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:576)
	at com.mysql.cj.jdbc.ConnectionImpl.getMetaData(ConnectionImpl.java:1167)
	... 121 more
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet successfully received from the server was 482 milliseconds ago. The last packet sent successfully to the server was 482 milliseconds ago.
	at jdk.internal.reflect.GeneratedConstructorAccessor252.newInstance(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
	at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:538)
	at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:702)
	at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:641)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:940)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1075)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:1003)
	at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:696)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:638)
	... 105 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
	at com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:67)
	at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:63)
	at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:45)
	at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:52)
	at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:41)
	at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:54)
	at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:44)
	at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:532)
	... 113 more

@jhoeller
Copy link
Contributor

jhoeller commented Sep 3, 2020

This will be available in the upcoming 5.2.9.BUILD-SNAPSHOT from https://repo.spring.io/snapshot if you'd like to give it an early try. The 5.2.9 release is scheduled for Sep 15, so we still got a week to review and revise it if necessary.

@jhoeller jhoeller changed the title Problematic Fallback Behavior in SQLErrorCodesFactory getErrorCodes SQLErrorCodesFactory.getErrorCodes(DataSource) returns empty error codes if access to transactional connection fails Sep 3, 2020
@jhoeller jhoeller added type: bug A general bug and removed type: enhancement A general enhancement labels Sep 3, 2020
@jhoeller
Copy link
Contributor

jhoeller commented Sep 3, 2020

I'll treat the specific issue with the transactional connection access failing as a bug, backporting it to 5.1.x, 5.0.x and 4.3.x as well.

5.1.18 will get both mechanisms, just without any public surface area added (i.e. resolveErrorCodes remains non-public).
5.0.19 and 4.3.29 will only receive the fallback to a non-transactional connection in JdbcUtils.extractDatabaseMetaData.

@spring-projects-issues spring-projects-issues added status: backported An issue that has been backported to maintenance branches and removed for: backport-to-5.1.x labels Sep 3, 2020
@junminZ
Copy link

junminZ commented Jun 7, 2022

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) status: backported An issue that has been backported to maintenance branches type: bug A general bug
Projects
None yet
Development

No branches or pull requests

4 participants