Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply cost constraints to ValidatingAdmissionPolicy #115747

Merged
merged 3 commits into from
Mar 7, 2023

Conversation

cici37
Copy link
Contributor

@cici37 cici37 commented Feb 14, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Apply cost constraints to ValidatingAdmissionPolicy

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

This change contains:

  1. Apply cel runtime cost to ValidatingAdmissionPolicy
  2. Add params for testing purpose with comments
  3. Move cel cost related config under API review(since the numbers should be API facing and changes to those could make existing persisted CRDs or ValidatingAdmissionPolicy fail validation).

Does this PR introduce a user-facing change?

Added CEL runtime cost calculation into ValidatingAdmissionPolicy, matching the evaluation cost
restrictions that already apply to CustomResourceDefinition.
If rule evaluation uses more compute than the limit, the API server aborts the evaluation and the
admission check that was being performed is aborted; the `failurePolicy` for the ValidatingAdmissionPolicy
determines the outcome.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3488

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 14, 2023
@k8s-triage-robot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@cici37
Copy link
Contributor Author

cici37 commented Feb 14, 2023

/cc @jpbetz

@cici37
Copy link
Contributor Author

cici37 commented Feb 14, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 14, 2023
@sftim
Copy link
Contributor

sftim commented Feb 15, 2023

Changelog suggestion:

Added CEL runtime cost calculation into ValidatingAdmissionPolicy. ValidatingAdmissionPolicy will fail if the
runtime cost exceeds the per-call limit for each validation expression or cost of all expressions exceed `runtimeCELBudget`.

However, I'd prefer to be more clear:

  • is this a new admission time check?
  • what does “fail” mean: fail-open or fail-closed.
  • where does somebody specify runtimeCELBudget, and is that the capitalization that they'd see?

Comment on lines 22 to 26
PerCallLimit = 1000000

// RuntimeCELCostBudget is the overall cost budget for runtime CEL validation cost per ValidatingAdmissionPolicy or CustomResource
// current RuntimeCELCostBudget gives roughly 1 seconds for the validation
RuntimeCELCostBudget = 10000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include a unit into these constant names? Even if it's PerCallLimitFooBars or PerCallLimitApproximateSeconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little tricky to define the unit.. The cost is the cost of operation defined by cel-go. Open to suggestions on this one :)

Copy link
Contributor

@jpbetz jpbetz Feb 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this helps, but I'll make a few statements about cost:

  • CEL cost units represent an evaluation of a basic CEL compute operation. CEL cost units do not have a 1:1 correspondence with machine instructions (or CPU cycles), but each can be though of roughly as the compute required for CEL to evaluate simple operations like a integer comparison.
  • For any CEL expression and input, the CEL cost will always be exactly the same regardless or platform or system load.
  • CEL cost units are tallied during CEL evaluation, but do not directly represent the metering of CPU utilization.

@cici37
Copy link
Contributor Author

cici37 commented Feb 15, 2023

Thanks for the suggestion!

  • is this a new admission time check?

It is the cel-go cost check which prevent us from long running on cel validation. CRD validation rules has this check for quite some time. This PR is to apply the same constraints on ValidatingAdmissionPolicy.

  • what does “fail” mean: fail-open or fail-closed.

Fail means the admission validation will fail and the FailurePolicy will apply.

  • where does somebody specify runtimeCELBudget, and is that the capitalization that they'd see?

This is the number we assigned based on evaluation hence not allow users to specify

@sftim
Copy link
Contributor

sftim commented Feb 15, 2023

OK, try this release note:

Added CEL runtime cost calculation into ValidatingAdmissionPolicy, matching the evaluation cost
restrictions that already apply to CustomResourceDefinition.
If rule evaluation uses more compute than the limit, the API server aborts the evaluation and the
admission check that was being performed is aborted; the `failurePolicy` for the ValidatingAdmissionPolicy
determines the outcome.

@cici37
Copy link
Contributor Author

cici37 commented Feb 15, 2023

/assign @liggitt @jpbetz
since you have reviewed the similar changes in CRD validation rules :)

@liggitt liggitt added the api-review Categorizes an issue or PR as actively needing an API review. label Feb 16, 2023
@cici37
Copy link
Contributor Author

cici37 commented Feb 22, 2023

/retest

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks great. The "per binding" approach to the budget was what I was hoping to see here! I added a few minor comments but nothing blocking. LGTM once those are addressed.

@cici37
Copy link
Contributor Author

cici37 commented Feb 27, 2023

/test pull-kubernetes-e2e-gce

@liggitt liggitt moved this from In progress to API review completed, 1.27 in API Reviews Mar 2, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 2, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 5, 2023
}
remainingBudget -= int64(*rtCost)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: here we make the behavior same as the CRD validation rule path, which is whenever issues in retrieving cost, per expression cost exceed or runtime budge exceed, we stop running following expressions and return error.
However, it will make the cost exceed error higher priority than other errors and return only per expression exceed error back even there are another validation failures existing.
Should we treat the per expression cost exceed error same as other errors and save per evaluationResult? And we could also consider keeping the following validations running. What do you think? @jpbetz

Copy link
Contributor

@jpbetz jpbetz Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to only return one per expression exceeded error back. Users shouldn't expect to get full information back when the cost system kicks in, since the point of the cost system is to prevent running CEL expressions once the limit is exceeded.

EDIT: What I mean is that I think the code in this PR is correct. Just error out when the limit is hit.

@jpbetz
Copy link
Contributor

jpbetz commented Mar 6, 2023

/approve
For CEL packages

/assign @liggitt for API review approval. The only change here is passing the runtime limit when compiling expressions during API validation (which doesn't actually change anything).

@liggitt
Copy link
Member

liggitt commented Mar 6, 2023

/approve

CI looks unhappy, but the API const relocation and package addition in vendor.txt lgtm
will defer review/lgtm of admission changes to @jpbetz

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cici37, jpbetz, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2023
@cici37
Copy link
Contributor Author

cici37 commented Mar 6, 2023

/approve

CI looks unhappy, but the API const relocation and package addition in vendor.txt lgtm will defer review/lgtm of admission changes to @jpbetz

An duplicated imports get in while rebasing. I removed it and CI should be happy now. Thank you!

@liggitt
Copy link
Member

liggitt commented Mar 6, 2023

still some relevant lint/vet/unit failures, looks like

@jpbetz
Copy link
Contributor

jpbetz commented Mar 6, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 6, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 15f783c147ded42118aa8d1fdeea53bde5d0f9c7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.27
Development

Successfully merging this pull request may close these issues.

None yet

6 participants