Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider deprecation of revision management in MachineDeployments #10479

Open
sbueringer opened this issue Apr 22, 2024 · 5 comments
Open

Consider deprecation of revision management in MachineDeployments #10479

sbueringer opened this issue Apr 22, 2024 · 5 comments
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@sbueringer
Copy link
Member

sbueringer commented Apr 22, 2024

As of today MachineDeployments manage a history of MachineSet objects.

This feature roughly consists of:

  • Keeping a number of old MachineSets around (can be configured via MD.spec.revisionHistoryLimit)
  • Re-using old MachineSets if after a MachineDeployment update the Machine templates of MD and MS match again
  • Tracking revisions and revision history via annotations on MD and MS objects
  • corresponding clusterctl commands: clusterctl alpha rollout, see also: clusterctl rollout #3439
    • most notable the rollout undo command which allows to rollback to a previous revision of the MachineDeployment

This feature (afaik) has been mostly copied from k/k Deployments & ReplicaSets

Reasons why we should consider deprecation:

  • As far as we are aware it is barely used by CAPI users and it leads to a lot of complexity in the MD controller (i.e. it makes it hard to introduce new features, like when we introduced the label/annotation in-place propagation). Indicators that it's barely used:
    • the corresponding clusterctl alpha rollout command is still alpha
    • we barely get any questions / requests about the entire feature
  • It's not usable with GitOps
  • It's highly recommended to have a history of YAMLs applied to a management cluster. With that it's easy to rollback. For clean operations that's already required for all objects apart from MachineDeployments (e.g. Cluster, KCP, InfraCluster, ...)
  • It's almost not usable without clusterctl (at least it's relatively complicated to use)
  • It's not usable with ClusterClass
  • In a lot of cases it's not safe to rollback (e.g. after Kubernetes upgrades)

So overall I think considering the current state of the feature, it's interoperability and that it's as far as we can tell not really used we should really consider if it's worth the complexity and otherwise consider deprecation.

If you are using the feature, please let us know.

@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 22, 2024
@fabriziopandini
Copy link
Member

/priority backlog
/kind cleanup

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Apr 23, 2024
@enxebre
Copy link
Member

enxebre commented Apr 30, 2024

As consumer I think beyond the clusterctl rollout exposure we could say revisionHistoryLimit is actually more of an implementation detail and the same functionality can be achieved by rolling out forward towards any config.
As a provider/admin/SRE the only drawback I can think of is that we lose a mean to introspect history of a MachineDeployment for troubleshooting purposes. Overall I think deprecation probably adds more value (sustainability) than it takes.

@fabriziopandini
Copy link
Member

fabriziopandini commented May 22, 2024

+1 to deprecate an remove.
There so many open questions and the interest around this feature is very low (+ we have very few clusterctl maintainers left)

@chrischdi
Copy link
Member

chrischdi commented May 27, 2024

Also xref PR which would get obsolete:

@sbueringer sbueringer added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence.
Projects
None yet
Development

No branches or pull requests

5 participants