Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

馃摉 Update provider contract to account for the paused condition #10519

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

theobarberbany
Copy link

@theobarberbany theobarberbany commented Apr 25, 2024

What this PR does / why we need it:
This change updates the provider contract to account for a new paused
condition.

It is intended to start as an optional condition, but then become
required at a later date. See issue for more context.

Relates #10130

/area documentation

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 25, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oscr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 25, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @theobarberbany. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theobarberbany
Copy link
Author

/area documentation

@k8s-ci-robot k8s-ci-robot added area/documentation Issues or PRs related to documentation and removed do-not-merge/needs-area PR is missing an area label labels Apr 26, 2024
This change updates the provider contract to account for a new paused
condition.

It is intended to start as an optional condition, but then become
required at a later date.
@theobarberbany theobarberbany force-pushed the pause-condition-update-contract branch from 285b232 to 2ff8fbd Compare April 26, 2024 11:59
@theobarberbany theobarberbany changed the title WIP: Update provider contract to account for the paused condition Update provider contract to account for the paused condition Apr 26, 2024
@theobarberbany theobarberbany marked this pull request as ready for review April 26, 2024 12:18
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2024
// is paused.
if annotations.IsPaused(cluster, m) {
log.Info("Reconciliation is paused for this object, setting Paused condition")
conditions.MarkTrue(m, clusterv1.PausedCondition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have additional information as to why it's being paused? A reason, a message, a severity?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this occured to me when working on this. I mean to ask but lost track!

We don't currently have a helper to set a reason with a MarkTrue - I could make one, but was wondering if there was reasoning around not having done so previously?

e.g:

// TrueCondition returns a condition with Status=True and the given type.
func TrueCondition(t clusterv1.ConditionType) *clusterv1.Condition {
return &clusterv1.Condition{
Type: t,
Status: corev1.ConditionTrue,
}
}
// FalseCondition returns a condition with Status=False and the given type.
func FalseCondition(t clusterv1.ConditionType, reason string, severity clusterv1.ConditionSeverity, messageFormat string, messageArgs ...interface{}) *clusterv1.Condition {
return &clusterv1.Condition{
Type: t,
Status: corev1.ConditionFalse,
Reason: reason,
Severity: severity,
Message: fmt.Sprintf(messageFormat, messageArgs...),
}
}
// UnknownCondition returns a condition with Status=Unknown and the given type.
func UnknownCondition(t clusterv1.ConditionType, reason string, messageFormat string, messageArgs ...interface{}) *clusterv1.Condition {
return &clusterv1.Condition{
Type: t,
Status: corev1.ConditionUnknown,
Reason: reason,
Message: fmt.Sprintf(messageFormat, messageArgs...),
}
}

Copy link
Contributor

@JoelSpeed JoelSpeed Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good question, for this condition I definitely imagine it having a reason or message. "PausedAnnotationPresent - Resource is paused as it is annotated with the pause annotation" or "ClusterPaused - Resource is paused due to cluster being paused"

Perhaps we just haven't needed that kind of condition before?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good to me. I'll update the WIP PR and these docs :)

Perhaps we just haven't needed that kind of annotation before?

Yep, this makes sense!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting one.
According to https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md, condition types MUST have a consistent polarity (i.e. "True = good").

Also

Condition types in Cluster API core objects MUST represent the operational state of a component in the cluster, where the operational state is when the component is ready to serve application workloads...
Operations like upgrades, scaling-up, or scaling-down, even if orchestrated in a non-disruptive fashion, MUST be considered as a deviation from the normal operational state of the cluster...

So, if we stick to current conditions standard in Cluster, unfortunately paused isn't a compliant condition, because it doesn't have a positive polarity (and this is why it doesn't fit into existing helpers).

The idea of improving conditions have been around since some time now (see eg #7635 + linked docs); we should figure it out if this goes in the critical path for this change or not

Copy link
Contributor

@JoelSpeed JoelSpeed Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

condition types MUST have a consistent polarity (i.e. "True = good").

Well that is an interesting constraint, do you have any idea where that came from? I believe it's pretty common to have conditions that are not True = good, for example, Node's have DiskPressure and other pressures where True = bad

Skimming through the enhancement I have little issue otherwise with what's prescribed there, it seems reasonable apart from this constraint on True = good and the allowed suffixes for conditions 馃

I don't see any such constraints in the upstream conditions KEP. In fact, in the API conventions you have:

Condition type names should make sense for humans; neither positive nor negative polarity can be recommended as a general rule


So, if we stick to current conditions standard in Cluster, unfortunately paused isn't a compliant condition, because it doesn't have a positive polarity (and this is why it doesn't fit into existing helpers).

In theory we could rename the condition to make it compliant, but, to me at least, that feels like we are cutting our nose to spite our face. Paused is a natural term for what we are trying to represent here, ReconcileAllowed or alternatives that fit the convention feel very awkward to me. And as above, this is not recommended in Kube in general.

Also, reviewing this summary, it appears there are a large number of exceptions to the rules there already? Are those still accurate, I appreciate the doc is 18 months old at this point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption about polarity is backed into the condition package everyone is using to aggregate conditions, roll them up across the hierarchy of objects etc. so I don't think we have exceptions as of today, but we have a lot of room for improvements which I tried to capture in the doc.

I also fully agree we should do a new iteration on the package above + overall conditions, happy to pair if someone is interested in it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fully agree we should do a new iteration on the package above + overall conditions, happy to pair if someone is interested in it

I'd be happy to take a stab at this, and also to pair on it :)

Happy to have a quick catch up (meet, slack etc) to clarify expectations as well. Let me know when is good for you!

Copy link
Member

@sbueringer sbueringer May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, stupid question. Isn't Paused neither good nor bad? I also think it doesn't really make sense to use it in summaries. How would Paused true or false aggregate into e.g. a Ready condition? (in my opinion neither true nor false should influence something like Ready)

For me the concept of polarity (in the sense of good/bad) doesn't make sense for something like "Paused".

(apart from "Paused" being clearly better then something like "Unpaused" because "Paused" doesn't invert the meaning like "Unpaused" does, so it's easier to grasp)

If it's only baked into the condition package in aggregations we should be okay, as aggregations just don't make sense for this condition. (We also already have other conditions that we intentionally don't aggregate because it doesn't make sense)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Discussed this a bit with Fabrizio and Christian, I think the tl;dr for now is that we likely want to get rid of the polarity thing. I think @fabriziopandini will expand on it somewhere in a bit)

@sbueringer sbueringer changed the title Update provider contract to account for the paused condition 馃摉 Update provider contract to account for the paused condition Apr 29, 2024
@sbueringer
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 29, 2024
// Return early and set the paused condition to True if the object or Cluster
// is paused.
if annotations.IsPaused(cluster, m) {
log.Info("Reconciliation is paused for this object, setting Paused condition")
Copy link
Member

@sbueringer sbueringer May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need/want "setting Paused condition" in the log (no strong opinion)

(technically we are also setting the paused condition if we set it to false, and we only actually "set" it if it wasn't already set to the same value before)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree here - I think this sounds a little weird.

I'll leave it at "Reconciliation is paused for this object"

```go
// Return early and set the paused condition to True if the object or Cluster
// is paused.
if annotations.IsPaused(cluster, m) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snippet assumes that there is something writing the modified object in defer. While we usually do this with the patch helper:

  1. it's not a contract, not sure who is currently doing this (probably most though)
  2. As far as I can tell today most of our controllers are creating the patchHelper after they check for paused

Maybe the best way to address this is to just add a note here that we assume that the change to the object has to be written, e.g. via the patchHelper in defer

@@ -233,6 +233,18 @@ The `status` object **may** define several fields:
* `externalManagedControlPlane` - is a bool that should be set to true if the Node objects do not
exist in the cluster. For example, managed control plane providers for AKS, EKS, GKE, etc, should
set this to `true`. Leaving the field undefined is equivalent to setting the value to `false`.
* `paused` - is a condition that reports if the cluster or control plane resource is paused. It should check if 'spec.paused' is set on the cluster, and for the paused annotation on the resource. This is currently optional, but will become required in future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually these bullet points are 1:1 mapping to API fields with the same name.I think it would be good to write this somewhat informally like this. .conditions[Type="ClusterPaused"] (or whatever string we pick for this condition)

(Not sure if we have other places with prior art how to document conditions in these listings)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look for prior art, and if not update to your suggestion :)

// is paused.
if annotations.IsPaused(cluster, m) {
log.Info("Reconciliation is paused for this object, setting Paused condition")
conditions.MarkTrue(m, clusterv1.PausedCondition)
Copy link
Member

@sbueringer sbueringer May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, stupid question. Isn't Paused neither good nor bad? I also think it doesn't really make sense to use it in summaries. How would Paused true or false aggregate into e.g. a Ready condition? (in my opinion neither true nor false should influence something like Ready)

For me the concept of polarity (in the sense of good/bad) doesn't make sense for something like "Paused".

(apart from "Paused" being clearly better then something like "Unpaused" because "Paused" doesn't invert the meaning like "Unpaused" does, so it's easier to grasp)

If it's only baked into the condition package in aggregations we should be okay, as aggregations just don't make sense for this condition. (We also already have other conditions that we intentionally don't aggregate because it doesn't make sense)

return ctrl.Result{}, nil
}

conditions.MarkFalse(m, clusterv1.PausedCondition, clusterv1.ResourceNotPausedReason, clusterv1.ConditionSeverityInfo, "Resource is operating as expected")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResourceNotPausedReason seems very strange to me. Do we need a reason for something not being paused?

Copy link
Member

@sbueringer sbueringer May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be addressed as part of getting rid of the polarity thing. Basically currently we sort of assume that we only need a reason if a condition is "not good" (i.e. not true). Without polarity we can set reason/message independent of true/false. This allows us to:

  • not provide a reason when Paused is false
  • provide a reason when Paused is true

@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Issues or PRs related to documentation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants