-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete and re-create v2 function on Cloud Run API quota exhaustion #5719
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #5719 +/- ##
==========================================
+ Coverage 54.98% 55.05% +0.07%
==========================================
Files 333 333
Lines 22990 23006 +16
Branches 4710 4713 +3
==========================================
+ Hits 12640 12665 +25
+ Misses 9220 9216 -4
+ Partials 1130 1125 -5
☔ View full report in Codecov by Sentry. |
result?: any; | ||
error?: any; | ||
} | ||
|
||
const defaultRetryCodes = [429, 409, 503]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit* DEFAULT_RETRY_CODES
?
pollerName: `create-${endpoint.codebase}-${endpoint.region}-${endpoint.id}`, | ||
operationResourceName: op.name, | ||
}); | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit* shouldn't we be adding { retryCodes: [CLOUD_RUN_RESOURCE_EXHAUSTED_CODE] }
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deliberately don't want the default fabricator retry logic here and want to surface the error to the .catch(...)
in line 360 to add custom error handling logic for error code CLOUD_RUN_RESOURCE_EXHAUSTED_CODE. The reason is that on those errors, we need to delete the function before retrying.
const op: Operation = { func }; | ||
async run<T>(func: () => Promise<T>, opts?: RunOptions): Promise<T> { | ||
// merge and de-duplicate default and provided retry codes | ||
let retryCodes = [...defaultRetryCodes, ...(opts?.retryCodes || [])]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's better to make retryCodes explicit and allow people to extend DEFAULT_RETRY_CODES manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "manually"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOof this has been waiting around much longer than expected. Feel free to ping me if you're ever waiting on me for more than ~1d.
I'm suggesting that it's probably a more typical design that opts.retryCodes
is literally the only retry codes that will be retried. If I was a consumer of the API and said retryCodes: [420]
but was also getting retries on 429 I'd be pretty surprised and frustrated. Rather, make DEFAULT_RETRY_CODES
a public part of the API and make opts.retryCodes
completely override DEFAULT_RETRY_CODES
. Then, if I wanted to handle all defaults and 420, I'd use retryCodes: [...DEFAULT_RETRY_CODES, 420]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Today, the executor retries on 429, 409, and 503 error codes for every API call wrapped in run(...)
, and I would assume that we would like to avoid breaking existing behavior that may rely on retrying the default codes. So most executor runs would need to be edited to .run(async () => {...}, { retryCodes: DEFAULT_RETRY_CODES })
which may actually turn out to be a non-trivial re-write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to say that if {retryCodes} isn't specified, it should default to DEFAULT_RETRY_CODES. If {retryCodes} is specified, we should only retry on the specified codes. It follows the principle of least surprises to only retry on the codes which we were told to retry or have sane defaults when unspecified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean — definitely agree with this approach, next commit implements the behavior you're describing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending feedback
@@ -5,15 +5,32 @@ import { ThrottlerOptions } from "../../../throttler/throttler"; | |||
* An Executor runs lambdas (which may be async). | |||
*/ | |||
export interface Executor { | |||
run<T>(func: () => Promise<T>): Promise<T>; | |||
run<T>(func: () => Promise<T>, opts?: RunOptions): Promise<T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit (don't feel obligated to take it, and don't feel the need to go through re-review if you do). Typically callbacks are the last parameter. In langauges like Scala, Swift, and Ruby, this allows a special syntax where the callback is after the function call. We don't have that candy (yet) in JavaScript, but it's still pretty common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting, definitely something to keep in mind — unfortunately in this case, opts
is an optional parameter and func
is required so I can't switch the parameter order without making opts
required as well, but I'm guessing that the idiomatic approach would be to make opts
required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we did this in cf3v2 is using overloading:
run<T>(func: () => Promise<T>): Promise<T>;
run<T>(opts: RuntimeOptions, func: () => Promise<T>): Promise<T>;
// impl
run<T>(funcOrOpts: RuntimeOptions | () => Promise<T>, func?: () => Promise<T>) {
let opts: RuntimeOptions = {};
if (func) {
opts = funcOrOpts;
} else {
func = funcOrOpts;
}
// ...
}
If this were a public tool, we might push for this. But it's an internal tool that's used only a few times in our codebase, so please don't go through the trouble.
const op: Operation = { func }; | ||
async run<T>(func: () => Promise<T>, opts?: RunOptions): Promise<T> { | ||
// merge and de-duplicate default and provided retry codes | ||
let retryCodes = opts?.retryCodes || []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just make this let retryCodes = opts?.retryCodes || DEFAULT_RETRY_CODES
I think you're trying to make it so that someone who explicitly passes zero retry codes gets DEFAULT_RETRY_CODES, but I would presume such a user is using the queue for throttling, but passed retryCodes: []
because they explicitly don't want to retry.
On large deployments of v2 functions (~80+), our users consistently run into Google Cloud Function's WRITE API quota. To deal with this issue, our CLI has backoff and retry logic baked into the function deploy process. However, the deploy logic is failing to detect cases in which the GCF API is running into the Cloud Run API quota limits when making Cloud Run requests on the user's behalf. To successfully retry a failed createFunction request, we need to first delete the broken GCF function resource before sending the GCF API another createFunction request.
This PR also includes minor refactors to the QueueExecutor logic to retry dynamically based on error codes specified by the request (e.g. deleteV2Function needs to retry on error code 8, while createV2Function does not want to retry on error code 8 until the broken function resource has been deleted).