Add a "lazy strategy" option that prevents refreshes from happening in the background #992
We have been in contact with Google support, but after 3 months the only sensible response we have received is that the problem might be related to Cloud Run's CPU throttling: Cloud Run does not support background tasks, and since the certificate refresh runs periodically as a background task, this might be the cause. This is the support ticket: https://console.cloud.google.com/support/cases/detail/v2/29992086?project=veritru-dev-332314
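To make the support team's hypothesis concrete, here is a minimal sketch of a refresh-ahead (background) schedule. The class name, the 4-minute buffer and the 1-hour certificate lifetime are assumptions for illustration, not values taken from the connector. The point it shows is that the next refresh is queued on a background thread to run at a future time, which is exactly the kind of work a CPU-throttled runtime may not execute on time.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of a refresh-ahead (background) strategy. Names and the buffer value
// are hypothetical; they are not taken from the connector's source.
class RefreshAheadScheduler {
    // Hypothetical: start refreshing this long before the certificate expires.
    static final Duration REFRESH_BUFFER = Duration.ofMinutes(4);

    /** Delay until the next background refresh should start; never negative. */
    static Duration refreshDelay(Instant now, Instant expiry, Duration buffer) {
        Duration delay = Duration.between(now, expiry.minus(buffer));
        return delay.isNegative() ? Duration.ZERO : delay;
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        Instant now = Instant.now();
        Instant expiry = now.plus(Duration.ofHours(1)); // assumed certificate lifetime
        Duration delay = refreshDelay(now, expiry, REFRESH_BUFFER);
        // The refresh is queued to run on a background thread at a future time.
        // If the runtime throttles CPU while no request is in flight, this task
        // can run late, possibly after the old certificate has already expired.
        executor.schedule(() -> System.out.println("refreshing certificate"),
                delay.toMillis(), TimeUnit.MILLISECONDS);
        System.out.println("next refresh scheduled in " + delay.toMinutes() + " minutes");
        executor.shutdownNow(); // demo only: cancel the pending task and exit
    }
}
```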
We have created a patch that removes the background process, and this seems to solve the problem. We have applied the same patch to both our Cloud Run services and our Cloud Functions, and so far it seems to work. We have been running with this patch for 8 days now with zero errors in our logs; before applying it we saw failures every other day or so. This is the patch: truid-app/google-cloud-patch@bf94b6b This solution is probably not ideal, because it disables the background process completely and instead refreshes the certificate on the thread that needs it, while that thread is processing a request. If you don't use CPU throttling, you would probably prefer the certificate to be updated in the background rather than adding a delay to request processing.
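A minimal sketch of the lazy approach described here, assuming a generic cache rather than the connector's actual classes (`LazyRefreshCache`, `Entry` and the refresh buffer are hypothetical names). Nothing runs in the background; the refresh happens synchronously on the caller's thread when the cached value is missing or close to expiry, which is where the extra request latency comes from.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Sketch of a lazy strategy: nothing runs in the background. A refresh happens
// only when a caller asks for the value and the cached one is expired or about
// to expire. All names here are hypothetical.
class LazyRefreshCache<T> {
    /** A cached value together with the instant at which it expires. */
    static final class Entry<T> {
        final T value;
        final Instant expiry;

        Entry(T value, Instant expiry) {
            this.value = value;
            this.expiry = expiry;
        }
    }

    private final Supplier<Entry<T>> refresher; // e.g. fetches a new ephemeral certificate
    private final Supplier<Instant> clock;      // injectable so tests can move time forward
    private final Duration buffer;              // refresh this long before expiry
    private Entry<T> current;                   // null until the first call to get()
    private int refreshCount;

    LazyRefreshCache(Supplier<Entry<T>> refresher, Supplier<Instant> clock, Duration buffer) {
        this.refresher = refresher;
        this.clock = clock;
        this.buffer = buffer;
    }

    /** Returns a usable value, refreshing synchronously on the calling thread if needed. */
    synchronized T get() {
        Instant now = clock.get();
        if (current == null || !now.isBefore(current.expiry.minus(buffer))) {
            // Adds latency to this call instead of relying on a background thread.
            current = refresher.get();
            refreshCount++;
        }
        return current.value;
    }

    synchronized int refreshCount() {
        return refreshCount;
    }
}
```

The trade-off is the one noted in the comment: no work is scheduled while the service is idle, at the cost of an occasional slower request.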
So to be clear, it doesn't look like you've removed the background process; you've just made it force a new refresh before an error occurs rather than after. It's probably still happening in the background, but now it will fail silently. Maybe an ideal solution would be to offer an option to specify a retry strategy, and add a "lazy" option that refreshes as needed rather than automatically.
As I wrote, this is just a proposal. There are probably better ways to solve the problem.
There is no background process. When there is no traffic to our services, no refresh is done; I would see it in the logs if a refresh happened, and there is none. The constructor still schedules a single initial refresh operation, but I removed the code that automatically schedules the next refresh. Instead, a refresh is forced when you access the SSL data. So the pending refresh job only exists briefly while it is being executed; it is never scheduled to run in the future.
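The behaviour described above can be sketched as follows. This is not the patch's code: `OnDemandRefresher`, the staleness predicate and the executor wiring are hypothetical. It shows one refresh submitted in the constructor to run immediately, no refresh ever scheduled for a later time, and a new refresh forced only when a caller finds the data stale.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: one refresh is started in the constructor, nothing is
// scheduled for later, and a new refresh is forced only on access when stale.
class OnDemandRefresher<T> {
    private final Supplier<T> fetch;       // e.g. fetch SSL data for the instance
    private final Predicate<T> isStale;    // e.g. "certificate expires soon"
    private final ExecutorService executor;
    private CompletableFuture<T> current;

    OnDemandRefresher(Supplier<T> fetch, Predicate<T> isStale, ExecutorService executor) {
        this.fetch = fetch;
        this.isStale = isStale;
        this.executor = executor;
        // Single initial refresh: submitted to run now, not scheduled for the future.
        this.current = CompletableFuture.supplyAsync(fetch, executor);
    }

    /** Returns fresh data, forcing a refresh on access if the cached data is stale. */
    synchronized T get() {
        T value = current.join(); // waits for an in-flight refresh, if any
        if (isStale.test(value)) {
            // The job exists only while it runs; it is never queued for a later time.
            current = CompletableFuture.supplyAsync(fetch, executor);
            value = current.join();
        }
        return value;
    }
}
```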
It's not failing silently. We would still see the failed refresh in the logs, even if it doesn't cause any incoming traffic to fail. We have not seen any refresh failures after this patch was applied.
Yes, I think that would be a good idea.
Posting some rationale here for why we consider this a P2 for now:
The lazy refresh strategy only refreshes credentials and certificate information when the application attempts to establish a new database connection. On Cloud Run and other serverless runtimes, this is more reliable than the default background refresh strategy. Fixes #992.
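As a hedged illustration of opting in to the lazy strategy from a JDBC application: the sketch assumes PostgreSQL, placeholder instance and database names, and the `cloudSqlRefreshStrategy` connection property as described in the connector's README. Verify the property name against the README for the release you use; the option presumably requires a newer connector than the 1.6.3 release listed under Environment. The example only builds the URL and properties, so it runs without a database.

```java
import java.util.Properties;

// Hypothetical configuration sketch for the PostgreSQL socket factory.
class LazyRefreshConfig {
    // Placeholders: replace with your own instance connection name and database.
    static final String INSTANCE = "my-project:my-region:my-instance";
    static final String DATABASE = "my-database";

    static String jdbcUrl() {
        return "jdbc:postgresql:///" + DATABASE;
    }

    static Properties connectionProperties() {
        Properties props = new Properties();
        props.setProperty("socketFactory", "com.google.cloud.sql.postgres.SocketFactory");
        props.setProperty("cloudSqlInstance", INSTANCE);
        // Assumed property: refresh only when a new connection is opened,
        // instead of refreshing in the background.
        props.setProperty("cloudSqlRefreshStrategy", "lazy");
        return props;
    }

    public static void main(String[] args) {
        // With the PostgreSQL driver and the socket factory on the classpath:
        // Connection conn = DriverManager.getConnection(jdbcUrl(), connectionProperties());
        System.out.println(jdbcUrl() + " " + connectionProperties());
    }
}
```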
This makes a number of refactoring changes to align the Java connector with other implementations:
- Introduce a ConnectionInfoCache interface
- Rename DefaultConnectionInfoCache to RefreshAheadConnectionInfoCache
- Update and simplify the instantiation logic of RefreshAheadConnectionInfoCache

Part of #992
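Why an interface helps here: once callers depend only on a cache interface, the refresh strategy becomes a construction-time choice. The sketch below reuses the names `ConnectionInfoCache` and `RefreshStrategy` for readability, but the method names, the `LazyCache` and `RefreshAheadCache` classes, and the refresh period are hypothetical and do not reflect the connector's real API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical interface shape; the connector's real methods may differ.
interface ConnectionInfoCache<T> {
    T getConnectionInfo();
    void close();
}

enum RefreshStrategy { BACKGROUND, LAZY }

// Refresh-ahead: a background (daemon) thread replaces the value on a fixed period.
class RefreshAheadCache<T> implements ConnectionInfoCache<T> {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "refresh-ahead");
                t.setDaemon(true); // do not keep the JVM alive just for refreshes
                return t;
            });
    private volatile T value;

    RefreshAheadCache(Supplier<T> fetch, long periodSeconds) {
        value = fetch.get(); // initial value so callers never see null
        executor.scheduleAtFixedRate(() -> value = fetch.get(),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    public T getConnectionInfo() { return value; }

    public void close() { executor.shutdownNow(); }
}

// Lazy: no background thread; fetches on the caller's thread when asked.
class LazyCache<T> implements ConnectionInfoCache<T> {
    private final Supplier<T> fetch;
    private T value;

    LazyCache(Supplier<T> fetch) { this.fetch = fetch; }

    public synchronized T getConnectionInfo() {
        if (value == null) { // a real cache would also check expiry here
            value = fetch.get();
        }
        return value;
    }

    public void close() {}
}

class ConnectionInfoCaches {
    /** Callers only see the interface, so the strategy is chosen once, here. */
    static <T> ConnectionInfoCache<T> create(RefreshStrategy strategy, Supplier<T> fetch) {
        if (strategy == RefreshStrategy.LAZY) {
            return new LazyCache<>(fetch);
        }
        return new RefreshAheadCache<>(fetch, 3000); // hypothetical period in seconds
    }
}
```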
Bug Description
Every now and then we see a failure to refresh the ephemeral certificate used to connect to Cloud SQL.
Most of the time this is just an annoying error in the log, because the service retries and usually succeeds. It is still annoying, since it adds a lot of noise to our monitoring.
But occasionally the retry fails as well, and we lose traffic because we have no certificate to authenticate to the database.
The error that we get is this:
Environment
Base Docker image: eclipse-temurin:11-jre-focal
openjdk 11.0.16.1 2022-08-12
OpenJDK Runtime Environment Temurin-11.0.16.1+1 (build 11.0.16.1+1)
OpenJDK 64-Bit Server VM Temurin-11.0.16.1+1 (build 11.0.16.1+1, mixed mode, sharing)
com.google.cloud.sql:jdbc-socket-factory-core:jar:1.6.3