
Google Container Registry / Google Artifact Registry ImagePullSecret connection problem #2073

Open
tidusete opened this issue May 15, 2024 · 8 comments · May be fixed by #2108
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tidusete
Contributor

What steps did you take and what happened:

  1. I had an old version of trivy-operator running with Helm chart_version: 0.18.0, which was working with GCR/GAR to pull the images that have to be scanned.

  2. To set up authentication, I followed your documentation and used the second option.

  3. I updated the Helm chart and bumped it to the latest stable version, 0.22.1 (I knew this bump was a little risky).

  4. As soon as I bumped the version, all the scan-vulnerabilityreport jobs started ending up in an error state. I checked the logs and saw the following:

Defaulted container "XXXXXXX" out of: XXXXXXX, YYYY-YYY-YY-YYYY-YYYYYYY (init)
2024-05-15T10:52:25.529Z	FATAL	image scan error: scan error: unable to initialize a scanner: unable to initialize an image scanner: 4 errors occurred:
	* docker error: unable to inspect the image (europe-west1-docker.pkg.dev/ZZZZZZZ/ZZZZZZ/ZZZZZZZZ:master): Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
	* containerd error: containerd socket not found: /run/containerd/containerd.sock
	* podman error: unable to initialize Podman client: no podman socket found: stat podman/podman.sock: no such file or directory
	* remote error: GET https://europe-west1-docker.pkg.dev/v2/token?scope=repository%ZZZZZZZ%ZZZZZZ%2FZZZZZZZZ%3Apull&service=: DENIED: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.downloadArtifacts" on resource "projects/ZZZZZZZ/locations/europe-west1/repositories/ZZZZZZZ" (or it may not exist)

What did you expect to happen:
I expected trivy-operator to keep working as it did before.

Anything else you would like to add:

I started to check all the Google Container Registry PRs possibly related to this problem:

After investigating for a while, I realized it might be related to this one:

Why? Because in this PR you check whether the key is _json_key, but according to the Google Artifact Registry configuration documentation you can have either _json_key or _json_key_base64.
With _json_key_base64, the whole content of the key file is base64-encoded, so my suspicion is that trivy mounts the credentials correctly from the secret stored at scan-vulnerabilityreport-xxxx, but they are base64-encoded and trivy is not able to use them.
I have been doing some tests, and this issue starts happening at tag v0.16.1.
I checked the diff between the two tags and realized that is when you introduced these changes:

Is it possible that the problem lies in the fact that in pkg/plugins/trivy/plugin.go we only check for _json_key, when we should also check for _json_key_base64 and implement a function that decodes it?
I'm quite confused because it works on v0.16.0.
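For illustration only, here is a minimal Go sketch of the kind of handling I mean; the function name is hypothetical and this is not the actual plugin.go code:

package main

import (
	"encoding/base64"
	"fmt"
)

// resolveGCPCredentials returns a plain-text service-account JSON key from a
// dockerconfigjson entry. Hypothetical sketch, not the operator's real logic.
func resolveGCPCredentials(username, password string) (string, error) {
	switch username {
	case "_json_key":
		// The password already holds the raw JSON key.
		return password, nil
	case "_json_key_base64":
		// The password holds the base64-encoded JSON key; decode it first.
		decoded, err := base64.StdEncoding.DecodeString(password)
		if err != nil {
			return "", fmt.Errorf("decoding _json_key_base64 credentials: %w", err)
		}
		return string(decoded), nil
	default:
		return "", fmt.Errorf("unsupported registry username %q", username)
	}
}

func main() {
	encoded := base64.StdEncoding.EncodeToString([]byte(`{"type":"service_account"}`))
	key, err := resolveGCPCredentials("_json_key_base64", encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // {"type":"service_account"}
}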

We cannot change our kubernetes.io/dockerconfigjson to use _json_key instead of _json_key_base64, because we want to avoid problems with braces and other special characters.

Could you please check? I'm stuck on tag v0.16.0 while using the Helm values from v0.20.1, which works, but I would like to be able to bump it for consistency.

@tidusete tidusete added the kind/bug Categorizes issue or PR as related to a bug. label May 15, 2024
@chen-keinan
Collaborator

@tidusete thanks for the feedback, I'll have a look and update you with my findings.

@chen-keinan
Collaborator

chen-keinan commented May 19, 2024

@tidusete can you please share your config maps?

Can you also add an example of how you create the secret used in imagePullSecret?

@tidusete
Contributor Author

Hello! Sorry for the delay. I've been a bit busy.

Regarding the use of the imagePullSecret, I decided to go with the option "Define Secrets through Trivy-Operator configuration". It works perfectly, and I don't have to create the secret in the same namespace where trivy-operator runs, although both run in the same cluster.

Please note that Trivy and the scan jobs run in a different namespace than the one where the container image credentials are defined. My application runs in the namespace where the credentials are defined.

Here are the steps I followed to create the credentials:

  1. Create a Service Account with the necessary permissions.
  2. Generate a private JSON key from the GCP UI. Let's call it privatekey_sa.json.
  3. Download this privatekey_sa.json to my laptop.

Now let's create the secret that holds the credentials:

kubectl -n namespace1 create secret docker-registry reg-secret \
  --dry-run=client \
  --output=yaml \
  --docker-server=europe-west1-docker.pkg.dev/{service_account}/registry01 \
  --docker-username=_json_key_base64 \
  --docker-password=$(base64 -w 0 privatekey_sa.json) \
  --docker-email=registry@europe-west1-docker.pkg.dev/{service_account}/registry01 > reg-secret.yaml

kubectl apply -f reg-secret.yaml -n namespace1

The secret works because all the services use it, and it is defined in every deployment as imagePullSecret, allowing the deployments to pull the images.
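For reference, a small Go sketch that parses the .dockerconfigjson this command produces; the struct mirrors the standard Docker config layout generated by kubectl create secret docker-registry, and the host and values below are placeholders, not my real data:

package main

import (
	"encoding/json"
	"fmt"
)

// dockerConfig mirrors the standard .dockerconfigjson layout produced by
// `kubectl create secret docker-registry`.
type dockerConfig struct {
	Auths map[string]struct {
		Username string `json:"username"`
		Password string `json:"password"`
		Email    string `json:"email"`
		Auth     string `json:"auth"` // base64("username:password")
	} `json:"auths"`
}

func main() {
	// Placeholder payload; the password would be the base64-encoded JSON key.
	raw := []byte(`{"auths":{"europe-west1-docker.pkg.dev":{"username":"_json_key_base64","password":"<base64 of privatekey_sa.json>","email":"registry@example.com","auth":"..."}}}`)

	var cfg dockerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	for server, entry := range cfg.Auths {
		fmt.Printf("server=%s username=%s\n", server, entry.Username)
	}
}

The point of the sketch is that the username the operator sees here is _json_key_base64, and the password is the base64-encoded key file.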

I used the default values from the Helm chart version v0.20.1 and overrode the following settings:

helm_override:
  automountServiceAccountToken: true
  scanJobAutomountServiceAccountToken: false
  targetNamespaces: "namespace1"
  operator:
    privateRegistryScanSecretsNames: {"namespace1":"reg-secret"}
  trivy:
    resources:
      limits:
        cpu: 500m
        memory: 1Gi

Additionally, I configured the Trivy Operator with the following settings. Here is the info from the ConfigMaps:

trivy-operator:
  node.collector.imageRef: ghcr.io/aquasecurity/node-collector:0.1.4
  nodeCollector.volumeMounts: '[{"mountPath":"/var/lib/etcd","name":"var-lib-etcd","readOnly":true},{"mountPath":"/var/lib/kubelet","name":"var-lib-kubelet","readOnly":true},{"mountPath":"/var/lib/kube-scheduler","name":"var-lib-kube-scheduler","readOnly":true},{"mountPath":"/var/lib/kube-controller-manager","name":"var-lib-kube-controller-manager","readOnly":true},{"mountPath":"/etc/systemd","name":"etc-systemd","readOnly":true},{"mountPath":"/lib/systemd/","name":"lib-systemd","readOnly":true},{"mountPath":"/etc/kubernetes","name":"etc-kubernetes","readOnly":true},{"mountPath":"/etc/cni/net.d/","name":"etc-cni-netd","readOnly":true}]'
  nodeCollector.volumes: '[{"hostPath":{"path":"/var/lib/etcd"},"name":"var-lib-etcd"},{"hostPath":{"path":"/var/lib/kubelet"},"name":"var-lib-kubelet"},{"hostPath":{"path":"/var/lib/kube-scheduler"},"name":"var-lib-kube-scheduler"},{"hostPath":{"path":"/var/lib/kube-controller-manager"},"name":"var-lib-kube-controller-manager"},{"hostPath":{"path":"/etc/systemd"},"name":"etc-systemd"},{"hostPath":{"path":"/lib/systemd"},"name":"lib-systemd"},{"hostPath":{"path":"/etc/kubernetes"},"name":"etc-kubernetes"},{"hostPath":{"path":"/etc/cni/net.d/"},"name":"etc-cni-netd"}]'
  report.recordFailedChecksOnly: "true"
  scanJob.compressLogs: "true"
  scanJob.podTemplateContainerSecurityContext: '{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"privileged":false,"readOnlyRootFilesystem":true}'
  vulnerabilityReports.scanner: Trivy

trivy-operator-trivy-config:
  trivy.additionalVulnerabilityReportFields: ""
  trivy.command: image
  trivy.dbRepository: ghcr.io/aquasecurity/trivy-db
  trivy.dbRepositoryInsecure: "false"
  trivy.ignoreUnfixed: "true"
  trivy.javaDbRepository: ghcr.io/aquasecurity/trivy-java-db
  trivy.mode: Standalone
  trivy.repository: ghcr.io/aquasecurity/trivy
  trivy.resources.limits.cpu: 500m
  trivy.resources.limits.memory: 1Gi
  trivy.resources.requests.cpu: 100m
  trivy.resources.requests.memory: 100M
  trivy.severity: UNKNOWN,LOW,MEDIUM,HIGH,CRITICAL
  trivy.skipJavaDBUpdate: "false"
  trivy.slow: "true"
  trivy.supportedConfigAuditKinds: Workload,Service,Role,ClusterRole,NetworkPolicy,Ingress,LimitRange,ResourceQuota
  trivy.tag: 0.50.2
  trivy.timeout: 5m0s
  trivy.useBuiltinRegoPolicies: "true"

trivy-operator-config:
  CONTROLLER_CACHE_SYNC_TIMEOUT: 5m
  OPERATOR_ACCESS_GLOBAL_SECRETS_SERVICE_ACCOUNTS: "true"
  OPERATOR_BATCH_DELETE_DELAY: 10s
  OPERATOR_BATCH_DELETE_LIMIT: "10"
  OPERATOR_BUILT_IN_TRIVY_SERVER: "false"
  OPERATOR_CACHE_REPORT_TTL: 120h
  OPERATOR_CLUSTER_COMPLIANCE_ENABLED: "false"
  OPERATOR_CLUSTER_SBOM_CACHE_ENABLED: "false"
  OPERATOR_CONCURRENT_NODE_COLLECTOR_LIMIT: "1"
  OPERATOR_CONCURRENT_SCAN_JOBS_LIMIT: "2"
  OPERATOR_CONFIG_AUDIT_SCANNER_ENABLED: "false"
  OPERATOR_CONFIG_AUDIT_SCANNER_SCAN_ONLY_CURRENT_REVISIONS: "true"
  OPERATOR_EXPOSED_SECRET_SCANNER_ENABLED: "true"
  OPERATOR_HEALTH_PROBE_BIND_ADDRESS: :9090
  OPERATOR_INFRA_ASSESSMENT_SCANNER_ENABLED: "false"
  OPERATOR_LOG_DEV_MODE: "false"
  OPERATOR_MERGE_RBAC_FINDING_WITH_CONFIG_AUDIT: "false"
  OPERATOR_METRICS_BIND_ADDRESS: :8080
  OPERATOR_METRICS_CLUSTER_COMPLIANCE_INFO_ENABLED: "false"
  OPERATOR_METRICS_CONFIG_AUDIT_INFO_ENABLED: "false"
  OPERATOR_METRICS_EXPOSED_SECRET_INFO_ENABLED: "false"
  OPERATOR_METRICS_FINDINGS_ENABLED: "true"
  OPERATOR_METRICS_IMAGE_INFO_ENABLED: "false"
  OPERATOR_METRICS_INFRA_ASSESSMENT_INFO_ENABLED: "false"
  OPERATOR_METRICS_RBAC_ASSESSMENT_INFO_ENABLED: "false"
  OPERATOR_METRICS_VULN_ID_ENABLED: "true"
  OPERATOR_PRIVATE_REGISTRY_SCAN_SECRETS_NAMES: '{}'
  OPERATOR_RBAC_ASSESSMENT_SCANNER_ENABLED: "false"
  OPERATOR_SBOM_GENERATION_ENABLED: "false"
  OPERATOR_SCAN_JOB_RETRY_AFTER: 30s
  OPERATOR_SCAN_JOB_TIMEOUT: 5m
  OPERATOR_SCAN_JOB_TTL: ""
  OPERATOR_SCANNER_REPORT_TTL: 24h
  OPERATOR_SEND_DELETED_REPORTS: "false"
  OPERATOR_VULNERABILITY_SCANNER_ENABLED: "true"
  OPERATOR_VULNERABILITY_SCANNER_SCAN_ONLY_CURRENT_REVISIONS: "true"
  OPERATOR_WEBHOOK_BROADCAST_TIMEOUT: 30s
  OPERATOR_WEBHOOK_BROADCAST_URL: ""
  TRIVY_SERVER_HEALTH_CHECK_CACHE_EXPIRATION: 10h

The secrets are completely empty:

trivy-operator                                      Opaque               0      12d
trivy-operator-trivy-config                         Opaque               0      12d

I hope this helps! If you have any further questions, please let me know.

@chen-keinan
Collaborator

@tidusete does image scanning now work with the new configuration applied as described above?

@tidusete
Contributor Author

Yes, I'm following this documentation (it looks like all the authentication methods are working).
I'm also using standalone mode.
But as soon as I bump trivy-operator above tag v0.16.0, it no longer works.

@chen-keinan
Collaborator

@tidusete comparing v0.16.0 to latest, I see that this is not supported by the logic in #1404. I can add a Helm flag to fall back to the old method used in v0.16.0 if that helps.

@tidusete
Contributor Author

Yes, I would really appreciate being able to bump the trivy-operator version while keeping the .dockerconfigjson I already have defined and working.

Can you provide more information about your discovery? I'm really curious.

@chen-keinan
Collaborator

chen-keinan commented May 27, 2024

Yes, I would really appreciate being able to bump the trivy-operator version while keeping the .dockerconfigjson I already have defined and working.

Can you provide more information about your discovery? I'm really curious.

It's simple: if the image is identified as GCR-related, it will use the change made in #1404; otherwise it will use the default behavior.
I'll add another Helm parameter to control that condition.
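For illustration, a rough Go sketch of that kind of branching; the flag and helper names are hypothetical and not the actual operator code:

package main

import (
	"fmt"
	"strings"
)

// isGCRRelated reports whether an image reference points at Google Container
// Registry or Artifact Registry. Hypothetical helper, not the operator's code.
func isGCRRelated(imageRef string) bool {
	host := strings.SplitN(imageRef, "/", 2)[0]
	return host == "gcr.io" ||
		strings.HasSuffix(host, ".gcr.io") ||
		strings.HasSuffix(host, "-docker.pkg.dev")
}

// chooseAuthPath picks between the GCR-specific path from #1404 and the default
// dockerconfigjson path, gated by a hypothetical fallback flag like the one proposed.
func chooseAuthPath(imageRef string, useLegacyAuth bool) string {
	if isGCRRelated(imageRef) && !useLegacyAuth {
		return "gcr-specific auth (#1404)"
	}
	return "default dockerconfigjson auth"
}

func main() {
	fmt.Println(chooseAuthPath("europe-west1-docker.pkg.dev/proj/repo/app:master", false))
	fmt.Println(chooseAuthPath("europe-west1-docker.pkg.dev/proj/repo/app:master", true))
}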

@chen-keinan chen-keinan linked a pull request May 28, 2024 that will close this issue