Distribute jobs more evenly across hosts #1929
Hi @nick-f, thanks for your interest in the buildkite-agent! Apologies for taking a long time to get back to you. The experience of running multiple agents on different-sized hosts is somewhat lacking, as you are finding. In particular, the backend scheduler is not fully aware of the assignment of agents to hosts. From its perspective, there are scheduled jobs and there are agents available to run those jobs, and it assigns jobs to agents without knowing how the agents are utilising their hosts. This decoupling keeps the scheduler simple, but an unfortunate side effect is that hosts are sometimes not fully utilised. It would take a significant redesign of the scheduler to make it aware of both the hosts and the worker agents, and while this is a pain point for a significant portion of our customers, it is not a problem at all for others. So at the moment we are concentrating our efforts on making the buildkite-agent runnable in Kubernetes clusters. There, agent workers are spun up on demand, and we can take advantage of primitives offered in that ecosystem to bin-pack jobs onto hosts. So hopefully we will soon have a better story to tell in this space.
If the priorities were flipped (i.e. agents with spawn priority 1 were used first, etc.) then that would at least give the ability to spread the load across all hosts. For my example situation, the extra agents on the more powerful hosts would be used as overflow, once all the other hosts' agents are in use. The priority as it is now doesn't allow for this.
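To illustrate the flipped ordering, here is a minimal Go sketch in which idle agents are picked lowest priority first, so every host's agent1 is used before any host's agent2. This is a hypothetical model, not Buildkite's scheduler: `Agent`, `flippedOrder` and `firstHosts` are illustrative names, priorities are assumed to equal the spawn index (as with spawn-with-priority), and ties are broken by host name only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical model of a spawned agent whose priority is its
// spawn index (1, 2, 3, ...).
type Agent struct {
	Host     string
	Priority int
}

// flippedOrder sorts idle agents so that priority 1 is used first. Ties are
// broken by host name, an assumption made only for deterministic output.
func flippedOrder(agents []Agent) []Agent {
	order := append([]Agent(nil), agents...)
	sort.SliceStable(order, func(i, j int) bool {
		if order[i].Priority != order[j].Priority {
			return order[i].Priority < order[j].Priority
		}
		return order[i].Host < order[j].Host
	})
	return order
}

// firstHosts returns the hosts that would receive the first n jobs.
func firstHosts(agents []Agent, n int) []string {
	order := flippedOrder(agents)
	if n > len(order) {
		n = len(order)
	}
	hosts := make([]string, 0, n)
	for _, a := range order[:n] {
		hosts = append(hosts, a.Host)
	}
	return hosts
}

func main() {
	var agents []Agent
	// Hypothetical hosts with 1, 3 and 6 spawned agents.
	for host, n := range map[string]int{"hostA": 1, "hostB": 3, "hostC": 6} {
		for i := 1; i <= n; i++ {
			agents = append(agents, Agent{Host: host, Priority: i})
		}
	}
	// The first three jobs land on three different hosts; the extra agents
	// on hostB and hostC only act as overflow.
	fmt.Println(firstHosts(agents, 3))
}
```

With this ordering, the larger hosts only pick up a second job once every host already has one running.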
Unfortunately, that won't help us with our use case (we're running iOS tests on physical Mac Minis), and it doesn't seem to be related to this issue or to be a solution to it. If there's somewhere else to submit this feedback as well, I'm happy to do it. Just let me know where it should go.
With the release of v3.45.0 and the experimental flag enabled, the load is now being spread out across hosts 🎉 I'll leave this open while #1967 is still open, but it's looking good so far. Thanks!
Is your feature request related to a problem? Please describe.
We have hosts that can have different numbers of spawned agents. The priority for these is set by the spawn ID with the spawn-with-priority option.
The way priorities work now is that higher-numbered priorities are used first.
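To make the current behaviour concrete, here is a minimal Go sketch of highest-priority-first selection. It is a hypothetical model rather than Buildkite's backend scheduler: the `Agent` type and the `assignOrder` and `spawn` helpers are illustrative names, priorities are assumed to equal the spawn index, and ties between equal priorities are broken by host name only so the output is deterministic (the real tie-breaking is not specified here).

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical model of one spawned agent. With
// spawn-with-priority, Priority equals the agent's spawn index (1, 2, 3, ...).
type Agent struct {
	Host     string
	Priority int
}

// assignOrder returns the order in which idle agents would receive jobs if
// higher priorities are always used first. Ties are broken by host name,
// an assumption made only to keep the output deterministic.
func assignOrder(agents []Agent) []Agent {
	order := append([]Agent(nil), agents...)
	sort.SliceStable(order, func(i, j int) bool {
		if order[i].Priority != order[j].Priority {
			return order[i].Priority > order[j].Priority
		}
		return order[i].Host < order[j].Host
	})
	return order
}

// spawn creates n agents on host with priorities 1..n.
func spawn(host string, n int) []Agent {
	agents := make([]Agent, 0, n)
	for i := 1; i <= n; i++ {
		agents = append(agents, Agent{Host: host, Priority: i})
	}
	return agents
}

func main() {
	agents := append(spawn("hostA", 1), spawn("hostB", 3)...)
	for _, a := range assignOrder(agents) {
		fmt.Printf("%s agent%d\n", a.Host, a.Priority)
	}
}
```

Because priority is global rather than per host, the host with more spawned agents always has the highest-numbered priorities and therefore soaks up the first jobs.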
If hostA has 1 spawned agent running and hostB has 3 spawned agents running, hostB is going to be running at least 2 or maybe 3 tests while hostA is sitting idle waiting for jobs to be assigned to it. Assuming all agents are idle, the order that jobs are assigned is:
1. hostB agent3
2. hostB agent2
3. hostA agent1 or hostB agent1
4. hostA agent1 or hostB agent1 (whichever was not given a job before)
Describe the solution you'd like
The next agent would be chosen based on the spawned agent utilisation of each host.
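A minimal Go sketch of this selection rule, assuming the scheduler knows each host's spawned-agent count and which of those agents are busy (the maintainer's reply notes it currently has no such view of hosts). `Host`, `nextHost` and `nextAgent` are hypothetical names; ties in utilisation are broken by host name, and the `lowestFirst` option models flipping the per-host agent priority so agent1 is used before higher-numbered agents.

```go
package main

import "fmt"

// Host is a hypothetical view of one machine: how many agents it spawned and
// which spawn indexes (priorities 1..Spawned) are currently running a job.
type Host struct {
	Name    string
	Spawned int
	Busy    map[int]bool // priority -> running a job
}

// busyCount counts the agents on h that are running a job.
func (h Host) busyCount() int {
	n := 0
	for _, b := range h.Busy {
		if b {
			n++
		}
	}
	return n
}

// utilisation is the fraction of a host's spawned agents that are busy.
func (h Host) utilisation() float64 {
	return float64(h.busyCount()) / float64(h.Spawned)
}

// nextHost picks the host with the lowest utilisation that still has an
// idle agent. Ties are broken by host name (an assumption for determinism).
func nextHost(hosts []Host) (Host, bool) {
	var best Host
	found := false
	for _, h := range hosts {
		if h.busyCount() >= h.Spawned {
			continue // fully utilised, nothing to assign
		}
		if !found || h.utilisation() < best.utilisation() ||
			(h.utilisation() == best.utilisation() && h.Name < best.Name) {
			best, found = h, true
		}
	}
	return best, found
}

// nextAgent picks an idle agent on h by priority: highest first (current
// behaviour) or, with lowestFirst, agent1 first. It returns 0 if none is idle.
func nextAgent(h Host, lowestFirst bool) int {
	for i := 1; i <= h.Spawned; i++ {
		p := h.Spawned + 1 - i // highest priority first
		if lowestFirst {
			p = i
		}
		if !h.Busy[p] {
			return p
		}
	}
	return 0
}

func main() {
	hosts := []Host{
		{Name: "hostA", Spawned: 1, Busy: map[int]bool{1: true}},
		{Name: "hostB", Spawned: 3, Busy: map[int]bool{3: true}},
		{Name: "hostC", Spawned: 5, Busy: map[int]bool{5: true}},
	}
	if h, ok := nextHost(hosts); ok {
		fmt.Printf("%s agent%d\n", h.Name, nextAgent(h, false))
	}
}
```

Here hostA is at 100%, hostB at 33% and hostC at 20%, so hostC is chosen; once hostC has a second busy agent (40%), hostB becomes the least-utilised host.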
hostA
has 1 spawned agent with 1 job running (100% utilisation)hostB
has 3 spawned agents with 1 job running (33% utilisation)hostC
has 5 spawned agents with 1 job running (20% utilisation)The next host to be assigned work would be
hostC
because the current utilisation is the lowest. The agent onhostC
that is given the work is determined based on the priority.(Ideally that spawned agent prioritisation could also be flipped so
hostC agent1
would be the first to be used instead ofhostC agent5
. Having that as a configuration option would be ace! I can split that out into a separate feature request if needed.)hostA
has 1 spawned agent with 1 job running (100% utilisation)hostB
has 3 spawned agents with 1 job running (33% utilisation)hostC
has 5 spawned agents with 2 jobs running (40% utilisation)Now, with
hostC
utilisation at 40%, the next host to be assigned a job would behostB
.Describe alternatives you've considered
I've spoken with Jarryd from Buildkite about this issue, but there don't appear to be any existing solutions for this use case.
Setting host priority doesn't work when there are, say, two agents on a host. If that host is meant to be used first because of its host priority, the original problem recurs: one host does all the work while the other sits idle.
Additional context
We set the number of spawned agents in each host's config.
There are a variety of hardware profiles for our hosts, so some can only run one agent at a time, some run 3, and we're about to start trialling hosts that should be able to run 6 or more agents 🤞