Docu Improvement: Please don't hesitate to specify this hint into the clusterissuer/cert-manager Setup Guides #22095

Closed
2 tasks done
theyo-tester opened this issue May 15, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

theyo-tester commented May 15, 2024

Is your feature request related to a problem?

I was unable to issue and sign a certificate, even though everything seemed to work as expected.
I spent more than a day looking for the cause and almost gave up.
I was already filling in a new bug report because I could not find a relevant hint in the logs. While writing down everything I knew about the problem, I found a log entry that, after a search, led me to the cause and the solution.

At some point in the past I specified an additional domain in 'Network->Edit Global Configuration->Additional Domains' without knowing what the consequences would be.

At first glance this setting looks harmless, but it badly breaks certificate issuance by cert-manager!

The ?-mark tooltip hints at this, but here is the clear explanation: Additional Domains end up in /etc/resolv.conf as search domains.
As a result, a hostname such as the one in "https://acme-v02.api.letsencrypt.org/directory" resolves to localhost (which is the websecure entrypoint of Traefik) IF you have a wildcard DNS entry for that search domain, so TLS verification of the Let's Encrypt endpoint fails.
I found the relevant error in the cert-manager-controller logs; it looks like this:

...setup.go:265] "failed to register an ACME account" err="Get \"https://acme-v02.api.letsencrypt.org/directory\": 
tls: failed to verify certificate: x509:  certificate is valid for <someRandomNumber>.traefik.default, 
not acme-v02.api.letsencrypt.org" logger="cert-manager.clusterissuers" 
resource_name="cert" resource_namespace="" resource_kind="ClusterIssuer" resource_version="v1" related_resource_name="cert-acme-clusterissuer-account-key" related_resource_namespace="ix-cert-manager" related_resource_kind="Secret"
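
The lookup order behind this error can be sketched in a few lines of Python. This is only an illustration of glibc-style search-list handling, not cert-manager or TrueNAS code. The function name `candidate_names`, the search domain `example.com`, and `ndots=5` (the value Kubernetes normally writes into a pod's resolv.conf) are assumptions for the example and were not reported in this issue.

```python
def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the absolute names a glibc-style stub resolver tries, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: the search list is tried before the name itself.
    return expanded + [absolute]


# Hypothetical values: "example.com" stands in for the Additional Domains entry.
for fqdn in candidate_names("acme-v02.api.letsencrypt.org", ["example.com"], ndots=5):
    print(fqdn)
# acme-v02.api.letsencrypt.org.example.com.
# acme-v02.api.letsencrypt.org.
```

With a `*.example.com` wildcard record in place, the first candidate already gets an answer, so the real Let's Encrypt name is never queried and the client ends up talking to whatever the wildcard points at.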

The simple fix was:

  • remove the Additional Domains entry, which removes the "search" line from /etc/resolv.conf
  • remove and reinstall cert-manager. Removing it may not be strictly necessary, but the host's resolv.conf has to be propagated to the pod somehow
  • re-apply/update the configs in clusterissuer and/or the apps to trigger a new certificate issuance.
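
To confirm that the first step took effect, on the host or inside the cert-manager pod, a small check like the following may help. It is a minimal sketch: `search_domains` is a hypothetical helper name, and it only looks at `search` lines (not the older `domain` directive), taking the last one if several are present.

```python
from pathlib import Path


def search_domains(resolv_conf_text: str) -> list[str]:
    """Return the search list from resolv.conf content (last `search` line wins)."""
    domains: list[str] = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comment lines
        fields = line.split()
        if fields[0] == "search":
            domains = fields[1:]
    return domains


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    if path.exists():
        # An empty list means no search domains are configured.
        print(search_domains(path.read_text()))
```

Inside a Kubernetes pod the list normally still contains the cluster-internal search domains; the entry to look for is the public domain that was set under Additional Domains.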

Describe the solution you'd like

Please specify in bold letters in the relevant guides
https://truecharts.org/charts/premium/clusterissuer/how-to/
https://truecharts.org/charts/system/cert-manager/

That 'Network->Edit Global Configuration->Additional Domains' should remain empty! Or at least it should not point to the external public FQDN or to a domain name that has a wildcard record.
In other words, /etc/resolv.conf should contain no search entry for a public domain.tld that also has a *.domain.tld record defined, for instance in Cloudflare.

Otherwise SSL certificate issuance will not work and you will not know why until you dig deep into the relevant config.
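
Whether a given search domain is affected can be probed by resolving a random label under it: if a name that nobody would have created still resolves, a wildcard (or some other catch-all) is answering. The sketch below assumes nothing beyond the Python standard library; `has_wildcard` and `default_resolves` are hypothetical helper names, and the resolver is injectable so the check can be tried without touching the network.

```python
import socket
import uuid
from typing import Callable


def default_resolves(hostname: str) -> bool:
    """Return True if the system resolver finds any address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def has_wildcard(domain: str, resolves: Callable[[str], bool] = default_resolves) -> bool:
    """Probe a random label under domain; a hit suggests a wildcard record."""
    # The trailing dot keeps the probe absolute, so search domains are not appended.
    probe = f"{uuid.uuid4().hex}.{domain.rstrip('.')}."
    return resolves(probe)
```

For example, `has_wildcard("domain.tld")` returning True for a domain that is also listed as a search domain is the combination described above.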

Describe alternatives you've considered

The alternative would be frustrated users 😅 if they happen to have specified Additional Domains (search domains) incorrectly.

Additional context

No response

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
theyo-tester added the enhancement label May 15, 2024
Ornias1993 (Member) commented May 16, 2024

Thanks for the report. Next time please keep things a little shorter, at least the title.

That being said:
I don't think we want to go as far as documenting every single mistake one can make when configuring the OS itself, as we're not TrueNAS Support and want to move away from giving platform specific guidance where possible.


The better solution would be for iX-Systems, who make the OS, to clarify (better) what these functions do. That's not really our job.

theyo-tester (Author) commented

OK, thank you for the response.
Sorry for the long report; I just wanted to make the context clear.
I agree with your point regarding the responsibility for clarification.
However, this issue did not appear until I used these charts... :) And no, I am not saying it is the fault of the charts, all good 👍

Still, I think this is not a very obvious issue and it could be very frustrating.
In the end, it is just a matter of one additional sentence 😉

On the other hand, this is not specific to TrueNAS; it can happen without it too, if there is a search entry in resolv.conf.

BR

Ornias1993 (Member) commented May 16, 2024

It's not limited to the charts; it affects all software and all charts trying to reach out...

In fact it applies to every system; this is basically Linux 101.
That goes far beyond a project "just" building Helm charts.

Ornias1993 closed this as not planned May 17, 2024