secured cluster services not able to authenticate with central service after restarting central db #9726
Comments
More details for your analysis. StackRox version: 4.1.0. Sensor logs below for reference:
pkg/grpc: 2024/02/02 18:58:07.338266 server.go:216: Info: Launching backend gRPC listener
Dear StackRox team, any update on the above issue?
Hey @naveen2131-hue @vinspub, I'm not sure I understand what operation was performed on the PVC. Perhaps it removed the existing init bundle from the database. From the logs, it looks like a new init bundle needs to be applied to the Secured Cluster so that it gets new certificates. You can check how to do that here: https://docs.openshift.com/acs/3.66/installing/install-ocp-operator.html#generate-init-bundle-operator. After applying the init bundle, you will probably need to restart all the Secured Cluster workloads (sensor, scanner, etc.). Let me know if that resolves the issue!
@ludydoo Thanks for the reply. It looks like a certificate error in Central and Scanner. We configured the PV properly and rolled back successfully, and we can view all the existing policies we created in the console. However, we are getting "bad certificate" and "tls: failed to verify certificate: x509: certificate signed by unknown authority" errors in the scanner and central pods. Because of this, the vulnerability definitions are not updated and the scanner certificate expiry status cannot be fetched. Warning alert: failed to determine scanner cert expiry error: failed to contact scanner at https://scanner.stackrox.svc:8080: dial tcp 10.100.231.16:8080: connect: connection refused
@vinspub did you try the steps I mentioned earlier (re-applying an init bundle)?
@ludydoo We are fine with re-applying an init bundle. Please correct me if my understanding is wrong.
We ask because we have been getting this bad certificate error and the vulnerability definitions update error for the past two days, since before the central db pod restarted.
@vinspub that's mostly accurate.
So re-applying the init bundle to the SecuredCluster would hopefully fix the TLS connection issues between Central and the Secured Cluster. If there are also TLS connection issues between the Central components themselves, I would click on the "Reissuing internal certificates" link (shown in the screenshot you attached) and follow those instructions.
This should get you up and running: https://docs.openshift.com/acs/4.3/configuration/reissue-internal-certificates.html
@ludydoo Even though we recreated the init bundle, we are still getting the error below in the central pod:
root logger: 2024/03/19 14:58:31.672653 logger.go:77: Warn: pkg/grpc/authn/interceptor.go:30 - Cannot extract identity: could not verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "StackRox Certificate Authority")
I have attached the Sensor pod log for reference. Kindly help us resolve the issue.
@ludydoo, any update on the error below? We are eagerly waiting for your reply.
root logger: 2024/03/21 05:34:34.836598 logger.go:77: Warn: pkg/grpc/authn/interceptor.go:30 - Cannot extract identity: could not verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "StackRox Certificate Authority")
@vinspub what installation method are you using?
@ludydoo We are deploying using Helm. More analysis for your reference:
@ludydoo Any update on the above queries? May I know why we are getting this error with the auto-generated internal certificates? The issue appears after the central services have been running successfully for a few days. Installation details:
@SimonBaeumer I think we need your touch
@ludydoo @SimonBaeumer Any update on the above issue?
Hi @vinspub, do you run a LoadBalancer in front of your Central with another TLS certificate? If yes, the CA of that certificate must be added to Central's additional-ca configuration. You can configure it here in your values. Could you update your environment to the latest ACS version? 4.1 is quite old; ACS is at version 4.4. If the issue persists after upgrading, we will need to take a deeper look into the certificates and rotations.
Kindly help us resolve the issue below: why are we getting the bad certificate error?
Note: The Scanner and Central certificates are not expired.
In the central pod, we get the errors below in the log:
tlsconfig: 2024/02/02 18:27:11.454870 tlsconfig.go:155: Info: Default TLS certificate file "/run/secrets/stackrox.io/default-tls-cert/tls.crt" does not exist. Skipping
pkg/grpc/authn: 2024/02/02 18:27:11.614534 rate_limited_logger.go:69: Warn: Cannot extract identity: could not verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "StackRox Certificate Authority")
tlsconfig: 2024/02/02 18:29:02.557744 tlsconfig.go:50: Info: Skipping additional CA directory entry "..2024_02_02_13_15_39.1377428811" because it is a directory
In the Scanner pod, we get the errors below in the log:
2024/02/02 17:33:25 http: TLS handshake error from 10.0.x.x:59392: remote error: tls: bad certificate
error":"fetching update from URL: executing request: Get "https://central.stackrox.svc/api/extensions/scannerdefinitions?uuid=e5e73d51-8941-4831-96fe-4822153c2c70\": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "StackRox Certificate Authority")"}
{"Event":"Starting an update cycle","Level":"info","Location":"updater.go:56","Time":"2024-02-02 17:39:20.675431"}
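For readers trying to interpret the "certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" ...)" message: Go's `crypto/x509` adds that hint when it finds a candidate CA whose subject matches the certificate's issuer but whose key did not produce the signature. One possible reading, not confirmed in this thread, is that the CA currently trusted by one component has the same name as the CA that signed the peer's certificate but a different key pair. The following is a minimal, self-contained Go sketch of that situation. It is not StackRox code; `newCert` and `verifyAgainst` are hypothetical helpers, and the certificates are generated in memory purely for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert builds a certificate for the given common name. When parent is nil
// the result is a self-signed CA; otherwise it is a leaf signed by parent.
// (Hypothetical helper for this illustration only.)
func newCert(cn string, parent *x509.Certificate, parentKey *rsa.PrivateKey) (*x509.Certificate, *rsa.PrivateKey, error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	if parent == nil {
		// Self-signed CA: the template acts as its own parent.
		tmpl.IsCA = true
		tmpl.BasicConstraintsValid = true
		tmpl.KeyUsage = x509.KeyUsageCertSign
		parent, parentKey = tmpl, key
	} else {
		tmpl.KeyUsage = x509.KeyUsageDigitalSignature
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyAgainst checks whether leaf chains to ca as its only trusted root.
func verifyAgainst(ca, leaf *x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

func main() {
	const caName = "StackRox Certificate Authority"

	// Two CAs with the same subject name but independent key pairs.
	oldCA, oldKey, err := newCert(caName, nil, nil)
	if err != nil {
		panic(err)
	}
	otherCA, _, err := newCert(caName, nil, nil)
	if err != nil {
		panic(err)
	}
	// A leaf certificate issued by the first CA.
	leaf, _, err := newCert("example-service", oldCA, oldKey)
	if err != nil {
		panic(err)
	}

	fmt.Println("against issuing CA:  ", verifyAgainst(oldCA, leaf))
	fmt.Println("against same-name CA:", verifyAgainst(otherCA, leaf))
}
```

Verification succeeds against the issuing CA and fails against the same-name CA. If a similar mismatch exists in a cluster, comparing the CA certificate mounted in each component with the issuer of the certificate its peer presents would be a reasonable first check before reissuing anything.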