
container job workflow pod fails to initialize - HttpError: HTTP request failed #3493

sofiegonzalez opened this issue May 1, 2024 · 2 comments
Labels: bug, gha-runner-scale-set, needs triage

sofiegonzalez commented May 1, 2024


Controller Version

latest

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Set the runner to `containerMode: kubernetes`.
2. Create a PersistentVolumeClaim named `work` for the pods to use:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: work
  namespace: '${namespace}'
spec:
  storageClassName: <storageclass_name>
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

3. Create a job that runs in a custom container. `/var/run/secrets/kubernetes.io/serviceaccount` is mounted from the runner pod so the job can access the `kube-api-access-` secrets:

```yaml
  job1:
    runs-on: gha-runner-scale-set
    container:
      image: <personal-container>
      volumes:
        - /var/run/secrets/kubernetes.io/serviceaccount:/var/run/secrets/kubernetes.io/serviceaccount
```

4. The container job tries to start; I can see a PVC created for the pod and bound, but the runner is unable to start the job and returns the error below (see the note after the logs for what the hook is doing). This only happens on container jobs, and only when a container step starts to run.

CI logs from the Initialize Container step:

```text
Run '/home/runner/k8s/index.js'
  shell: /home/runner/externals/node16/bin/node {0}
Error: HttpError: HTTP request failed
Error: Process completed with exit code 1.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
```
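For context (editor's note, not from the original report): `/home/runner/k8s/index.js` is the Kubernetes container hook from actions/runner-container-hooks. It creates the `<runner>-workflow` pod by calling the Kubernetes API from inside the runner pod, so `HttpError: HTTP request failed` generally means one of those API calls was rejected, e.g. for RBAC or service-account reasons. Below is a minimal sketch of the permissions the hook's documentation calls for; the Role name and namespace are hypothetical, and the gha-runner-scale-set chart normally creates an equivalent Role bound to the `<release>-gha-rs-kube-mode` service account:

```yaml
# Sketch only: approximate RBAC the k8s container hook needs, per the
# actions/runner-container-hooks docs. Names here are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: runner-kube-mode        # hypothetical name
  namespace: <runner_namespace>
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["get", "create"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "create", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "delete"]
```

If the runner pod's API calls run under a service account that is not bound to a Role like this, the hook fails with exactly this kind of error.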

Describe the bug

Hi! My main issue is that CI fails when I try to start a container job in containerMode: kubernetes, with the error `Error: HttpError: HTTP request failed`. This is blocking us from making progress.

I have followed the GitHub Actions scale sets video on YouTube and tried to recreate the same configuration. The main difference is that I am using a PVC I created through a manifest and am applying with Terraform. I am also using a Docker image we built from a public Docker repo; it is pullable without authentication.

Right as the container job starts, the pod dies and fails to initialize. I can see the PVC was bound correctly. I am not sure what the `Error: HttpError: HTTP request failed` error means or what it refers to.

Describe the expected behavior

The container job should start up and create a <pod_name>-workflow pod to run the container.

Additional Context

runner values.yaml:

```yaml
## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
# default to mono for now
githubConfigUrl: <repo>

## githubConfigSecret is the k8s secrets to use when auth with GitHub API.
## You can choose to use GitHub App or a PAT token
# githubConfigSecret: gha-runner-scale-set-secret
githubConfigSecret:
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  github_app_id: <gh_id>
  github_app_installation_id: <gh_install_id>
  github_app_private_key: <gh_pk>

  ### GitHub PAT Configuration
# github_token: ""
## If you have a pre-define Kubernetes secret in the same namespace the gha-runner-scale-set is going to deploy,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
##   For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
##   For a pre-defined secret using GitHub App, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
# githubConfigSecret: pre-defined-secret

## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
# proxy:
#   http:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   https:
#     url: http://proxy.com:1234
#     credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
#   noProxy:
#     - example.com
#     - example.org

# maxRunners is the max number of runners the autoscaling runner set will scale up to.
maxRunners: 5

# minRunners is the min number of idle runners. The target number of runners created will be
# calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 2

# runnerGroup: "default"

runnerScaleSetName: "gha-runner-scale-set"

## A self-signed CA certificate for communication with the GitHub server can be
## provided using a config map key selector. If `runnerMountPath` is set, for
## each runner pod ARC will:
## - create a `github-server-tls-cert` volume containing the certificate
##   specified in `certificateFrom`
## - mount that volume on path `runnerMountPath`/{certificate name}
## - set NODE_EXTRA_CA_CERTS environment variable to that same path
## - set RUNNER_UPDATE_CA_CERTS environment variable to "1" (as of version
##   2.303.0 this will instruct the runner to reload certificates on the host)
##
## If any of the above had already been set by the user in the runner pod
## template, ARC will observe those and not overwrite them.
## Example configuration:
#
# githubServerTLS:
#   certificateFrom:
#     configMapKeyRef:
#       name: config-map-name
#       key: ca.crt
#   runnerMountPath: /usr/local/share/ca-certificates/

## Container mode is an object that provides out-of-box configuration
## for dind and kubernetes mode. Template will be modified as documented under the
## template object.
##
## If any customization is required for dind or kubernetes mode, containerMode should remain
## empty, and configuration should be applied to the template.
containerMode:
  type: "kubernetes"  #type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: <storageclass_name>
    resources:
      requests:
        storage: 1Gi
  # kubernetesModeServiceAccount:
  #   annotations:

# listenerTemplate is the PodSpec for each listener Pod
# For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
# listenerTemplate:
#   spec:
#     containers:
#     # Use this section to append additional configuration to the listener container.
#     # If you change the name of the container, the configuration will not be applied to the listener,
#     # and it will be treated as a side-car container.
#     - name: listener
#       securityContext:
#         runAsUser: 1000
#     # Use this section to add the configuration of a side-car container.
#     # Comment it out or remove it if you don't need it.
#     # Spec for this container will be applied as is without any modifications.
#     - name: side-car
#       image: example-sidecar

## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
  ## template.spec will be modified if you change the container mode
  ## with containerMode.type=dind, we will populate the template.spec with following pod spec
  ## template:
  ##   spec:
  ##     initContainers:
  ##     - name: init-dind-externals
  ##       image: ghcr.io/actions/actions-runner:latest
  ##       command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
  ##       volumeMounts:
  ##         - name: dind-externals
  ##           mountPath: /home/runner/tmpDir
  ##     containers:
  ##     - name: runner
  ##       image: ghcr.io/actions/actions-runner:latest
  ##       command: ["/home/runner/run.sh"]
  ##       env:
  ##         - name: DOCKER_HOST
  ##           value: unix:///var/run/docker.sock
  ##       volumeMounts:
  ##         - name: work
  ##           mountPath: /home/runner/_work
  ##         - name: dind-sock
  ##           mountPath: /var/run
  ##     - name: dind
  ##       image: docker:dind
  ##       args:
  ##         - dockerd
  ##         - --host=unix:///var/run/docker.sock
  ##         - --group=$(DOCKER_GROUP_GID)
  ##       env:
  ##         - name: DOCKER_GROUP_GID
  ##           value: "123"
  ##       securityContext:
  ##         privileged: true
  ##       volumeMounts:
  ##         - name: work
  ##           mountPath: /home/runner/_work
  ##         - name: dind-sock
  ##           mountPath: /var/run
  ##         - name: dind-externals
  ##           mountPath: /home/runner/externals
  ##     volumes:
  ##     - name: work
  ##       emptyDir: {}
  ##     - name: dind-sock
  ##       emptyDir: {}
  ##     - name: dind-externals
  ##       emptyDir: {}
  ######################################################################################################
  ## with containerMode.type=kubernetes, we will populate the template.spec with following pod spec
  ## template:
  ##   spec:
  ##     containers:
  ##     - name: runner
  ##       image: ghcr.io/actions/actions-runner:latest
  ##       command: ["/home/runner/run.sh"]
  ##       env:
  ##         - name: ACTIONS_RUNNER_CONTAINER_HOOKS
  ##           value: /home/runner/k8s/index.js
  ##         - name: ACTIONS_RUNNER_POD_NAME
  ##           valueFrom:
  ##             fieldRef:
  ##               fieldPath: metadata.name
  ##         - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
  ##           value: "true"
  ##       volumeMounts:
  ##         - name: work
  ##           mountPath: /home/runner/_work
  ##     volumes:
  ##       - name: work
  ##         ephemeral:
  ##           volumeClaimTemplate:
  ##             spec:
  ##               accessModes: [ "ReadWriteOnce" ]
  ##               storageClassName: "local-path"
  ##               resources:
  ##                 requests:
  ##                   storage: 1Gi
  metadata:
    annotations:
      iam.amazonaws.com/role: <iam_role>
  spec:
    securityContext:
      runAsUser: 1001
      runAsGroup: 123
      fsGroup: 123
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:latest
      command: ["/home/runner/run.sh"]
      env:
        - name: ACTIONS_RUNNER_CONTAINER_HOOKS
          value: /home/runner/k8s/index.js
        - name: ACTIONS_RUNNER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
      volumeMounts:
        - name: work
          mountPath: /home/runner/_work
    # volumes:
    #   - name: work
    #     ephemeral:
    #       volumeClaimTemplate:
    #         spec:
    #           accessModes: [ "ReadWriteOnce" ]
    #           storageClassName: "local-path"
    #           resources:
    #             requests:
    #               storage: 1Gi
## Optional controller service account that needs to have required Role and RoleBinding
## to operate this gha-runner-scale-set installation.
## The helm chart will try to find the controller deployment and its service account at installation time.
## In case the helm chart can't find the right service account, you can explicitly pass in the following value
## to help it finish RoleBinding with the right service account.
## Note: if your controller is installed to only watch a single namespace, you have to pass these values explicitly.
# controllerServiceAccount:
#   namespace: arc-system
#   name: test-arc-gha-runner-scale-set-controller
```

runner pvc.tpl: identical to the PVC manifest shown in step 2 above.

Controller Logs

https://gist.github.com/sofiegonzalez/74277e957c6955cf88d94cc4516d2a1e

Runner Pod Logs

https://gist.github.com/sofiegonzalez/9e24f4e38db35dc967b255187826cf3b
github-actions bot commented May 1, 2024

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

sofiegonzalez (Author) commented
I was able to spin up a <runner_pod_name>-workflow pod by adding this to the values.yaml pod spec:

```yaml
spec:
  serviceAccount: gha-runner-scale-set-gha-rs-kube-mode
```

I got the solution from this comment. I don't understand why this fixed my issue, as the pod already had this service account set in its spec on the cluster:

```yaml
  ...
  securityContext: {}
  serviceAccount: gha-runner-scale-set-gha-rs-kube-mode
  serviceAccountName: gha-runner-scale-set-gha-rs-kube-mode
  ...
```
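For anyone hitting the same thing, here is roughly where that fix sits in values.yaml. This is a sketch based on the comment above, using `serviceAccountName` (the non-deprecated spelling of the `serviceAccount` field used in the snippet); the account name is the one from this thread and is normally derived from the Helm release name (`<release>-gha-rs-kube-mode`):

```yaml
# Sketch only: explicitly pin the kube-mode service account on the runner pod.
template:
  spec:
    serviceAccountName: gha-runner-scale-set-gha-rs-kube-mode  # adjust to your release
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        command: ["/home/runner/run.sh"]
```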
