Scaling strategy to limit number of Machine pending or provisioning #8808

lentzi90 · 2023-06-07T06:56:37Z

What would you like to be added (User Story)?

As an operator, I would like to control how fast new Machines are created when I create large clusters, to avoid overwhelming controllers and infrastructure.

Detailed Description

I want a way to limit the number of Machines that are pending or provisioning. Currently, when creating large clusters, we start out small and scale gradually to avoid issues. However, this could easily be automated and solved for all providers if built into CAPI.

In the Bare Metal Operator we have a PROVISIONING_LIMIT for exactly this reason. It limits the number of BareMetalHosts that are provisioned simultaneously. Having something similar in CAPI would be very useful.
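For illustration only, here is a minimal sketch of how such a limit could gate Machine creation during a reconcile. It assumes that "in flight" means a Machine in the Pending or Provisioning phase. The phase names match the Machine phases Cluster API reports, but the types and the `allowedNewMachines` helper are hypothetical local stand-ins, not CAPI code.

```go
package main

import "fmt"

// MachinePhase is a local stand-in for a Machine's lifecycle phase.
type MachinePhase string

const (
	PhasePending      MachinePhase = "Pending"
	PhaseProvisioning MachinePhase = "Provisioning"
	PhaseProvisioned  MachinePhase = "Provisioned"
	PhaseRunning      MachinePhase = "Running"
)

// allowedNewMachines (hypothetical) returns how many Machines may be created
// in this reconcile so that the number of Pending or Provisioning Machines
// never exceeds limit. A limit <= 0 means "no limit".
func allowedNewMachines(desired int, phases []MachinePhase, limit int) int {
	missing := desired - len(phases)
	if missing <= 0 {
		return 0 // already at or above the desired replica count
	}
	if limit <= 0 {
		return missing // no limit configured: create everything at once
	}
	inFlight := 0
	for _, p := range phases {
		if p == PhasePending || p == PhaseProvisioning {
			inFlight++
		}
	}
	budget := limit - inFlight
	if budget <= 0 {
		return 0 // wait for in-flight Machines to finish provisioning
	}
	if budget < missing {
		return budget
	}
	return missing
}

func main() {
	phases := []MachinePhase{PhaseRunning, PhaseProvisioning, PhasePending}
	// 100 desired, 3 exist (2 in flight), limit 10: 8 may be created now.
	fmt.Println(allowedNewMachines(100, phases, 10))
}
```

Each reconcile would create at most the returned number of Machines and requeue, so a large cluster ramps up in batches instead of all at once.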

I'm not sure where it would make sense to add this option, though. It could be set on the Cluster, the KCP and/or the MachineDeployment, for example, to get granular control. Or it could be a flag for the controllers. What do you think would work best?
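One possible answer to the placement question, sketched under loose assumptions: the most specific setting wins (MachineDeployment, then Cluster, then a controller-level default such as a flag). The `ProvisioningLimit` fields and the strategy types below are hypothetical; no such API exists in Cluster API.

```go
package main

import "fmt"

// Hypothetical API shapes; these fields do not exist in Cluster API.
type ClusterScalingStrategy struct {
	// ProvisioningLimit caps in-flight Machines for the whole Cluster.
	ProvisioningLimit *int32
}

type MachineDeploymentScalingStrategy struct {
	// ProvisioningLimit caps in-flight Machines for one MachineDeployment.
	ProvisioningLimit *int32
}

// effectiveLimit prefers the MachineDeployment value, then the Cluster value,
// then the controller-level default. 0 means "unlimited".
func effectiveLimit(md MachineDeploymentScalingStrategy, c ClusterScalingStrategy, controllerDefault int32) int32 {
	if md.ProvisioningLimit != nil {
		return *md.ProvisioningLimit
	}
	if c.ProvisioningLimit != nil {
		return *c.ProvisioningLimit
	}
	return controllerDefault
}

func main() {
	five, twenty := int32(5), int32(20)
	fmt.Println(effectiveLimit(MachineDeploymentScalingStrategy{ProvisioningLimit: &five}, ClusterScalingStrategy{ProvisioningLimit: &twenty}, 0))
	fmt.Println(effectiveLimit(MachineDeploymentScalingStrategy{}, ClusterScalingStrategy{ProvisioningLimit: &twenty}, 0))
	fmt.Println(effectiveLimit(MachineDeploymentScalingStrategy{}, ClusterScalingStrategy{}, 10))
}
```

Precedence is only one design choice. A Cluster-wide limit could instead be enforced alongside per-MachineDeployment limits (taking the stricter of the two), which better protects shared infrastructure but requires coordination across controllers.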

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
/area machine


killianmuldoon · 2023-06-07T10:17:38Z

/triage accepted

This is an interesting idea - similar to what was implemented for upgrades in #8432. Maybe these could be generalized into an overall rollout strategy.

fabriziopandini · 2023-06-14T19:47:11Z

As per the June 14 office hours discussion, we should discuss the UX/API in a doc.

lentzi90 · 2023-06-16T05:27:17Z

Older related issue that was closed due to inactivity: #4022
Linking for reference

lentzi90 · 2023-06-28T06:45:51Z

I have started on a document here: https://docs.google.com/document/d/1FjX5rQGYHCyDqRdANWcAWP4AmoxudoPl8IARjXylGzY/edit?usp=sharing
Please check it and comment/edit 🙂

fabriziopandini · 2023-06-28T13:47:26Z

cc @vincepri @enxebre

lentzi90 · 2024-01-08T10:21:15Z

We ended up setting limits in the "cloud" provider instead. Granted, this does not solve the issue for the controllers, but they can be handled fairly well by existing config options (e.g. concurrency, resources and rate limits).
Closing (but feel free to reopen if there is interest to continue this in the community)

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/machine Issues or PRs related to machine lifecycle management needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 7, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 7, 2023

lentzi90 changed the title ~~Provisioning strategy to limit number of Machine pending or provisioning~~ Scaling strategy to limit number of Machine pending or provisioning Jun 28, 2023

lentzi90 closed this as completed Jan 8, 2024
