Fix issue with buildkite-agent Job API when forwarding the job to the VM #85
To avoid repeating it on each individual declaration, which were already all `public`
- Override the `BUILDKITE_SOCKETS_PATH`, which defaults to `$HOME/.buildkite-agent/sockets`. That default was resolved on the host, while `$HOME` is different on the VM, which caused a failure when `buildkite-agent bootstrap` tried to `mkdir` that directory.
- Disallow `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN`, which are exported by the host's `buildkite-agent` when it creates the Job API socket (see https://github.com/buildkite/agent/blob/45d491fdb44072fe4c2a7aac79490defc95b8dcf/internal/job/api.go#L37-L38), as we want the VM to create its own.
b190859 to 7a8324b
- `BUILDKITE_BUILD_CHECKOUT_PATH` is not used in VMs now that we use git-mirrors. Besides, this env var is listed as part of `disallowedKeys`, so that key is pruned from the dictionary before the script is generated via `scriptBuilder.build()` anyway.
- `BUILDKITE_HOOKS_PATH` and `BUILDKITE_PLUGINS_PATH` are part of `overriddenKeys`, so they are overridden in the next lines of the code, also before `scriptBuilder.build()` is called.

So in practice those `addEnvironmentVariable` calls didn't do anything and did not impact the generated script at all in the end.
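To illustrate why an explicitly added variable has no effect once its key is disallowed or overridden, here is a minimal shell sketch of the prune-then-override idea. It is not `hostmgr`'s actual implementation: the `generate_exports` function and the `DISALLOWED_KEYS` and `OVERRIDDEN_SOCKETS_PATH` variables are hypothetical stand-ins, and the real `disallowedKeys` and `overriddenKeys` lists likely contain more entries than shown.

```shell
#!/bin/sh
# Hypothetical stand-ins for hostmgr's disallowedKeys and overriddenKeys.
DISALLOWED_KEYS="BUILDKITE_BUILD_CHECKOUT_PATH BUILDKITE_AGENT_JOB_API_SOCKET BUILDKITE_AGENT_JOB_API_TOKEN"
OVERRIDDEN_SOCKETS_PATH="/opt/ci/var/tmp/sockets"

# Reads KEY=VALUE lines on stdin and prints the `export` lines that would
# end up in the generated script. Quoting is simplified: values containing
# single quotes are not handled.
generate_exports() {
  while IFS='=' read -r key value; do
    case "$key" in
      BUILDKITE_*) ;;          # only BUILDKITE_* variables are forwarded
      *) continue ;;
    esac
    case " $DISALLOWED_KEYS " in
      *" $key "*) continue ;;  # pruned, whatever value was set earlier
    esac
    if [ "$key" = "BUILDKITE_SOCKETS_PATH" ]; then
      value="$OVERRIDDEN_SOCKETS_PATH"  # the host-resolved value is discarded
    fi
    printf "export %s='%s'\n" "$key" "$value"
  done
}

# Example input resembling what the host's agent might export.
printf '%s\n' \
  'BUILDKITE_SOCKETS_PATH=/Users/administrator/.buildkite-agent/sockets' \
  'BUILDKITE_BUILD_CHECKOUT_PATH=/usr/local/var/example' \
  'BUILDKITE_AGENT_JOB_API_TOKEN=example-token' | generate_exports
```

With this input, only the `BUILDKITE_SOCKETS_PATH` line survives, and it carries the overridden value rather than the one set earlier.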
As per POSIX conventions (and to avoid odd outputs when debugging the scripts' content with `cat`, for example)
7a8324b to 484a6f5
That way when we temporarily switch that lane to use `readonly: false` when we need to regenerate the profile, this makes sure that the newly generated profile uses that specific cert, instead of _fastlane_ trying to guess and potentially picking your _personal_ Apple Development profile. This is what happened to me when I renewed the cert 2 days ago trying to fix code signing issues when developing `hostmgr` locally.
6bedd49 to 3c5c893
Note for bookkeeping: that PR initially had issues with code signing. Turns out the profiles expired, but
In the end, the error was due to a recent ASC API change between Wednesday and Friday 😞. I submitted a fix in fastlane core, after which I was finally (!) able to renew the profiles and make CI go green.
What / TL;DR
Fixes an issue with the latest version of `buildkite-agent` used in the `xcode-15.3` VM image (and later ones) that prevents jobs from being transferred from the host to the VM.

Why / Issue details
In the latest versions of `buildkite-agent`, the Job API experiment has been de-experimented and enabled by default.
As a result, `buildkite-agent bootstrap` now tries to create a Unix socket at the `BUILDKITE_SOCKETS_PATH` path, then exposes the created socket path and token as the `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` env vars.
The issue is that the default value for this path (aka the `--sockets-path` option of `buildkite-agent bootstrap`) is `$HOME/.buildkite-agent/sockets`, so when our `hostmgr generate buildkite-job` command generates the script to handle the job in the VM, it exports all `BUILDKITE_*` env vars in that script… including the `BUILDKITE_SOCKETS_PATH`, which was resolved to `/Users/administrator/.buildkite-agent/sockets` on the host. This resulted in `buildkite-agent bootstrap` failing with a `creating socket directory: mkdir /Users/administrator: permission denied` error.

How
- Override the `BUILDKITE_SOCKETS_PATH` env var in the generated script to `/opt/ci/var/tmp/sockets`
- Disallow `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` in the generated script

I also took the occasion of this PR to:
- Remove the `addEnvironmentVariable` calls for `BUILDKITE_BUILD_CHECKOUT_PATH`, `BUILDKITE_HOOKS_PATH` and `BUILDKITE_PLUGINS_PATH` pointing to `/usr/local/var/…`, as those are legacy paths; besides, those env keys are part of either `disallowedKeys` or `overriddenKeys`, so they were removed or overridden later in the code, before `scriptBuilder.build()` is called… so those particular `addEnvironmentVariable` calls were not impacting the generated script code after all.
- Remove the `Paths.tempFilePath` constant, which had the exact same value as `Paths.tempDirectory`, and replace its only call site
- Change `var` to `let` in `Paths` constants and rearrange their order and grouping a bit
- … (and to avoid odd outputs when debugging the scripts' content with `cat`, for example 😉 )

Testing
As it wasn't easy to test this without releasing and deploying a new `hostmgr` version to our Mac hosts, instead I:
- SSH'd into one of the hosts (I picked `MV-MKE-ARM64-014`)
- Manually modified the `/opt/ci/hooks/command` script to unset `BUILDKITE_AGENT_JOB_API_SOCKET` and `BUILDKITE_AGENT_JOB_API_TOKEN` and set `BUILDKITE_SOCKETS_PATH` to a hardcoded value, and thus simulate the same change made in this `hostmgr` code
- Modified the DayOne-Apple pipeline in the pending Xcode-15.3 update PR, to force the job to run on that specific `MV-MKE-ARM64-014` host
- Validated that the job successfully passed the step related to the Job API in the VM, and that transferring the job to the VM and running it there worked as expected, fixing the issue.
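For reference, a minimal sketch of what such a temporary hook patch could look like, assuming the hook is a shell script. The `patch_job_api_env` function name is hypothetical, and the hardcoded path is assumed to be the same `/opt/ci/var/tmp/sockets` value used in the `hostmgr` change; the exact lines used during testing may have differed.

```shell
#!/bin/sh
# Hypothetical helper mirroring the temporary patch applied to the hook.
patch_job_api_env() {
  # Drop the Job API socket and token exported by the host's agent, so that
  # `buildkite-agent bootstrap` in the VM creates its own.
  unset BUILDKITE_AGENT_JOB_API_SOCKET
  unset BUILDKITE_AGENT_JOB_API_TOKEN

  # Use a fixed sockets directory that does not depend on the host's $HOME
  # (assumed value, taken from the override in the hostmgr change).
  export BUILDKITE_SOCKETS_PATH="/opt/ci/var/tmp/sockets"
}

patch_job_api_env
```

Calling the helper before the job's environment is forwarded means the VM never sees the host's socket path or token.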
I then removed my patch of `/opt/ci/hooks/command` to restore `MV-MKE-ARM64-014` to its previous state.

What's Next
Once this lands, I'll generate a new release of `hostmgr` (probably a non-beta `0.50.0`) and work on deploying it (but probably not today, as it's a Friday and thus submission + code freeze day for many apps, so not the best day to interrupt CI, or risk breaking it during a failed deployment 😅 ).