
🤫 WIP: Secret experiment #1848

Draft · wants to merge 3 commits into main
Conversation

@moskyb (Contributor) commented Nov 24, 2022

This is something that's been on my mind for a little while: how do we make secrets awesome in the agent?

At the moment, we rely on build scripts to fetch secrets at run time, but what if we could make it part of the agent lifecycle and handle everything nicely?

This PR is an experiment to that end. It pretends that there are elements in the pipeline.yml that look like this:

```yaml
secret_providers:
  - id: ssm
    type: aws-ssm
    config:
      iam_role: arn:aws:iam::123456789012:role/buildkite-role
      assume_via_oidc: true

  - id: vault
    type: vault_kvv2
    config:
      address: https://vault.example.com
      mount_path: secret
      auth_type: kubernetes
      auth_config:
        role: buildkite
        service_account_token_path: /var/run/secrets/kubernetes.io/serviceaccount/token

secrets: # Global to the build; every step has these secrets injected
  - env_var: WIDGET_CORP_API_KEY
    key: path/to/widget-corp-api-key
    provider_id: vault

  - file: ~/.ssh/id_rsa
    key: path/to/ssh-key
    provider_id: ssm

# and/or

steps:
  # ...

  - name: deploy
    key: deploy
    command: ".buildkite/steps/deploy.sh"
    secrets: # Local to the step, merged with the globals server-side
      - env_var: GITHUB_DEPLOY_TOKEN
        key: /path/to/github-deploy-token
        provider_id: vault
```

and provides the agent a way to fetch those secrets such that the secret store itself is abstracted away from the Buildkite backend (that is, the backend never has access to customer secrets), and such that multiple stores can be used within a single build.
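For concreteness, the provider abstraction could look something like this in Go - a minimal sketch only, with the type and method shapes made up for illustration rather than taken from this diff:

```go
package secrets

// SecretConfig mirrors one entry under `secrets:` in the pipeline YAML:
// a destination (env var or file), a provider-specific lookup key, and
// the id of the provider that should resolve it.
type SecretConfig struct {
	EnvVar     string // env var to inject the value into, if set
	File       string // file path to write the value to, if set
	Key        string // lookup key within the provider's store
	ProviderID string // matches an `id` under `secret_providers:`
}

// Provider is implemented once per backend (aws-ssm, vault_kvv2, ...).
// Only the agent ever holds the resolved values; the Buildkite backend
// sees none of them.
type Provider interface {
	Fetch(config SecretConfig) (Secret, error)
}

// Secret is a resolved value that knows how to store itself into the
// job environment or onto disk.
type Secret interface {
	Store() error
}
```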

This PR is a super duper POC: only one secret backend (SSM Parameter Store) is implemented, there are no tests, and some things are completely broken - namely using OIDC to log into AWS, and writing secrets to files.

Hopefully this PR shows more or less what i'd like to get at, though not necessarily soon.

secrets/provider_registry.go
```go
}

type ProviderRegistryConfig struct {
	Shell *shell.Shell
```
@moskyb (Contributor, Author)

i don't love passing the bootstrap's shell into the provider registry, but it would be nice for the registry to be able to log things, and when it creates secrets.EnvSecrets, it needs to pass the shell's env into them so that they can store themselves. This seems a bit nasty, and i'd appreciate some C&C on how to make it a bit better
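One way to make it less nasty (a sketch of a possible refactor, not code from this diff; the interface names are made up) is to depend on only the capabilities the registry actually uses, rather than the whole *shell.Shell:

```go
// Logger covers what the registry wants the shell for when logging;
// Env covers what secrets.EnvSecrets need in order to store themselves.
// Both interfaces are hypothetical.
type Logger interface {
	Commentf(format string, v ...any)
}

type Env interface {
	Set(key, value string)
}

type ProviderRegistryConfig struct {
	Logger Logger
	Env    Env
}
```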

Comment on lines 57 to 82
```go
for _, c := range configs {
	wg.Add(1)
	go func(config SecretConfig) {
		defer wg.Done()

		secret, err := pr.Fetch(config)
		if err != nil {
			errorsCh <- err
			return
		}
		secretsCh <- secret
	}(c)
}

go func() {
	for err := range errorsCh {
		errors = append(errors, err)
	}
}()

go func() {
	for secret := range secretsCh {
		secrets = append(secrets, secret)
	}
}()
```
@moskyb (Contributor, Author) · Nov 24, 2022
maybe this is getting too goroutine-y (edit: in the sense that the concurrency stuff makes the code harder for humans to parse). do we really need to fetch things in parallel? "we can, so we might as well" was my justification, but i think the reality is that the number of secrets in each pipeline is gonna be pretty small, and they could probably be fetched serially.
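For comparison, the serialised version would be roughly this (a sketch reusing the names from the snippet above):

```go
// Fetch each secret in turn, collecting every failure rather than
// stopping at the first, so the user sees all broken secrets at once.
var secrets []Secret
var errors []error
for _, c := range configs {
	secret, err := pr.Fetch(c)
	if err != nil {
		errors = append(errors, err)
		continue
	}
	secrets = append(secrets, secret)
}
```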

Contributor

If you keep the parallelisation, might want to at least cap the number of worker goroutines at runtime.NumCPU or similar.

Also is it worth returning each error, or should it treat the whole operation as failed if any of them fail?

Also while you've been out, I've been busy sprinkling context.Context args through the codebase - can requests be cancelled early through use of a context somehow?

@moskyb (Contributor, Author)

> If you keep the parallelisation, might want to at least cap the number of worker goroutines at runtime.NumCPU or similar

what would the benefit of limiting concurrency be? given that calls to secret managers are going to be IO bound, i'm happy to let greenthreading/goroutine magic handle throttling our concurrency.

> Is it worth returning each error, or should it treat the whole operation as failed if any of them fail?

I think it's probably most useful to do both - ie, any len(errors) > 0 should be treated as a failure, but it's also nice to be able to tell the user exactly which secret failed to be fetched, and for what reason.

> can requests be cancelled early through use of a context somehow?

yep, i think it's totally reasonable to change the Provider.Fetch interface to accept a context. i'll make that change now :)
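That change could be as small as threading a context through the interface and the fetch goroutines (a sketch, reusing the names from the snippet under review):

```go
// Provider.Fetch gains a context so in-flight calls to the secret
// manager can be cancelled when the job is.
type Provider interface {
	Fetch(ctx context.Context, config SecretConfig) (Secret, error)
}

// ...and the parallel fetch passes the caller's context along:
for _, c := range configs {
	wg.Add(1)
	go func(config SecretConfig) {
		defer wg.Done()
		secret, err := pr.Fetch(ctx, config)
		if err != nil {
			errorsCh <- err
			return
		}
		secretsCh <- secret
	}(c)
}
```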

Contributor

> what would the benefit of limiting concurrency be? given that calls to secret managers are going to be IO bound, i'm happy to let greenthreading/goroutine magic handle throttling our concurrency.

That's the happy path - what if they each open a connection simultaneously, exhausting TCP source ports? Or there's a config error?
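If the parallel version survives, a buffered channel used as a counting semaphore is a cheap way to apply the cap suggested above (sketch only, again reusing the snippet's names):

```go
// At most cap(sem) fetches run at once, bounding the simultaneous
// connections (and source ports) used against the secret managers.
sem := make(chan struct{}, runtime.NumCPU())
for _, c := range configs {
	wg.Add(1)
	go func(config SecretConfig) {
		defer wg.Done()
		sem <- struct{}{}        // acquire a slot
		defer func() { <-sem }() // release it on return
		secret, err := pr.Fetch(ctx, config)
		if err != nil {
			errorsCh <- err
			return
		}
		secretsCh <- secret
	}(c)
}
```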
