Why is 0.0.0.0 the default host? #8510

Open
yurishkuro opened this issue Sep 23, 2023 · 21 comments

Comments

@yurishkuro
Member

I am running OTLP receivers with default settings:

receivers:
  otlp:
    protocols:
      grpc:
      http:

The startup logs are littered with these warnings:

internal@v0.85.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}

Why doesn't the default config for those receivers follow the mentioned best practices?

@mx-psi
Member

mx-psi commented Sep 25, 2023

Previous discussion can be found on #6151. If I recall correctly, we ultimately decided at a SIG meeting around that time not to change the default, for backwards compatibility reasons.

@mx-psi
Member

mx-psi commented Sep 25, 2023

If we were to change it, what default value would you suggest? localhost? Another problem here is that there is no good default value that works in all cases. Specifically, the containerized cases usually depend on the network configuration: we are able to provide a good default in the Helm chart but not in other cases.

@yurishkuro
Member Author

If we believe the default value is unsafe, it should be changed (despite backwards compatibility concerns). If it is safe (and appropriate in some scenarios) it should not log the warning.

The current situation is a bad user experience.

@atoulme
Contributor

atoulme commented Sep 25, 2023

I believe 127.0.0.1 should be default. There is no good reason to have warnings show up on the default setting.
We should have an info message indicating we're using 127.0.0.1.
For our distros and Docker image, the default config should have a way to override the default by using an environment variable. That should be documented. The env var should be a well known name that we standardize on.

@mx-psi
Member

mx-psi commented Sep 26, 2023

> the default config should have a way to override the default by using an environment variable

We don't support this (maybe we should), see #4384 for a related feature request.

@mx-psi
Member

mx-psi commented Sep 26, 2023

Filed open-telemetry/opentelemetry-collector-releases#408 so we can address the Docker case independently from the general default.

A quick search on contrib reveals at least the following components in that repository are affected:

But there may be others, e.g. the sapm receiver which sets the endpoint to :7276, also a valid syntax for an unspecified address.
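
For anyone auditing other components, a small Go sketch of a check that treats both spellings as an unspecified address. The helper name bindsAllInterfaces is made up for this example and is not part of the collector.

```go
package main

import (
	"fmt"
	"net"
)

// bindsAllInterfaces reports whether a host:port endpoint names an
// unspecified address. Illustrative helper only, not collector code.
func bindsAllInterfaces(endpoint string) bool {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false // not a host:port pair, so nothing to conclude
	}
	if host == "" {
		return true // ":7276" leaves the host empty
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsUnspecified()
}

func main() {
	for _, ep := range []string{"0.0.0.0:4317", ":7276", "[::]:4318", "localhost:4317"} {
		fmt.Printf("%-16s all interfaces: %v\n", ep, bindsAllInterfaces(ep))
	}
}
```

A plain string search for 0.0.0.0 would miss the empty-host and IPv6 forms, which is why the sketch parses the endpoint instead.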

@Aneurysm9
Member

> > the default config should have a way to override the default by using an environment variable
>
> We don't support this (maybe we should), see #4384 for a related feature request.

While I think that would be a nice addition to the env confmap provider, I don't think it's necessary (or sufficient) here. All default configuration can be overridden by explicit configuration and all explicit configuration can be provided through the environment. Allowing fallbacks to be specified for the env provider would still require changes to the input configuration to specify that those fields should come from that provider and would need to specify the fallback at point of use.

Making some configuration fields take input from the environment in the absence of explicit configuration would be a much more significant change and one that I worry would make it even more difficult to reason about the effective configuration that would result from any given collector invocation.

@yurishkuro
Member Author

I agree, a fallback env var seems like overkill and a slippery slope. Hopefully most people manage configurations as code / programmatically, so they can always do this themselves by reusing some base var={desired-ip} in their configs.

This issue is about the warning: we can either change the default or remove the warning. I personally would be fine with just removing the warning, because 0.0.0.0 is a reasonable default in many cases (e.g. the :8080 string in Go means 0.0.0.0:8080 and nobody bats an eye at that).
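
A quick way to see the Go behavior mentioned above, as a sketch using only the standard library. The helper boundAddr is a name chosen for this example. Port 0 is used instead of 8080 so the OS picks a free port.

```go
package main

import (
	"fmt"
	"net"
)

// boundAddr listens on endpoint, reports the address the OS actually bound,
// and closes the listener again.
func boundAddr(endpoint string) (*net.TCPAddr, error) {
	ln, err := net.Listen("tcp", endpoint)
	if err != nil {
		return nil, err
	}
	defer ln.Close()
	addr, ok := ln.Addr().(*net.TCPAddr)
	if !ok {
		return nil, fmt.Errorf("unexpected address type %T", ln.Addr())
	}
	return addr, nil
}

func main() {
	// An empty host binds the unspecified address; 127.0.0.1 binds loopback only.
	for _, ep := range []string{":0", "127.0.0.1:0"} {
		addr, err := boundAddr(ep)
		if err != nil {
			fmt.Println("listen failed:", err)
			continue
		}
		fmt.Printf("%-12s unspecified=%v loopback=%v\n",
			ep, addr.IP.IsUnspecified(), addr.IP.IsLoopback())
	}
}
```

On a dual-stack host the empty-host listener usually reports the IPv6 unspecified address rather than 0.0.0.0, but in both cases it accepts connections on every interface.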

@mx-psi
Member

mx-psi commented Sep 26, 2023

For context (see the issue linked in my first message), the main rationale for the warning was that this is a known security weakness (CWE-1327) and that we had a specific vulnerability (GHSA-69cg-p879-7622) where the current default made the security implications worse than they could have been.

Since this is ultimately a security-related decision I would be interested in what the @open-telemetry/sig-security-maintainers think.

@yurishkuro
Member Author

GHSA-69cg-p879-7622 doesn't look like it's related at all: "HTTP/2 connection can hang during closing if shutdown were preempted by a fatal error"

CWE-1327 - just because it exists doesn't make it a valid one. Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

@atoulme
Contributor

atoulme commented Sep 26, 2023

> CWE-1327 - just because it exists doesn't make it a valid one. Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

We could and should :)

@mx-psi
Member

mx-psi commented Sep 27, 2023

> GHSA-69cg-p879-7622 doesn't look like it's related at all: "HTTP/2 connection can hang during closing if shutdown were preempted by a fatal error"

This is described in more detail in #6151 and was discussed at the SIG meeting at the time. Copying from the issue:

> While some level of impact was unavoidable on this receiver since exposing an HTTP/2 server is part of its core functionality, the OTLP receiver default endpoints (0.0.0.0:4317 and 0.0.0.0:4318) made it so that this vulnerability could be leveraged more easily than it could have been.

So the relationship is that because of binding to all network interfaces the vulnerability was easier to exploit.

> CWE-1327 - just because it exists doesn't make it a valid one.

The fact that it is listed as a CWE should count as some evidence that this is important; it's fair to argue against it, but you should provide equivalent evidence against it.

> Starting a server that does not require TLS is also not the best practice, but it's important for dev usability. 0.0.0.0 doesn't seem any different. Do we log a warning when starting servers with Insecure settings?

Actually this is a good case for doing something for 0.0.0.0. We enable TLS by default (see here), so I don't think that's a good argument against nudging users to be secure by default when it comes to choosing what network interfaces to bind to.

The equivalent to the approach we took for TLS would be to change the default to localhost. Here we instead chose to warn people about it and to change the defaults in downstream tools (e.g. the Helm chart) because of backwards compatibility concerns. The only difference is that we got to the TLS case sooner, when we had fewer users and were making breaking changes more frequently.

@jpkrohling
Member

I think we have warned users long enough already; we could indeed consider using localhost as the default host to bind the port to.

@mx-psi
Member

mx-psi commented Sep 27, 2023

Inspired by #7769 (comment) I want to vote on what we should do here.

What should we do with this issue?

🚀 Do nothing, leave the warning as is, don't change the default
❤️ Change default to localhost on all receivers and remove the warning
🎉 Change default to localhost on all receivers but don't remove the warning
👀 Remove the warning, make no changes to the defaults

Vote for all options that are acceptable to you

@codeboten
Contributor

@mx-psi another option would be to make this a feature-gate to give people a chance to change over more smoothly

@mx-psi
Member

mx-psi commented Sep 27, 2023

> @mx-psi another option would be to make this a feature-gate to give people a chance to change over more smoothly

I'd say we can use the poll to decide what the end result should be and we can discuss the process to get there once we know that. Does that sound okay?

@hughesjj

hughesjj commented Oct 2, 2023

Well, if we're voting, so far "change default to localhost" is winning, leaning towards additionally removing the warning.

@mx-psi
Member

mx-psi commented Oct 3, 2023

> Well, if we're voting, so far "change default to localhost" is winning, leaning towards additionally removing the warning.

Indeed. I think doing this with a feature gate as @codeboten suggested makes sense. There is one slight inconvenience: this feature gate must be shared across core and contrib modules. One way to work around this is for contrib to VisitAll gates in the global registry until it finds the one with the matching ID(). Another way is to have the gate as part of the public API of core. I will open a couple of PRs with the first approach by the end of the week.

@mx-psi
Member

mx-psi commented Oct 5, 2023

PTAL at #8622 as a first step. I am working on the contrib PR but would like to validate the design here first. I ended up exposing the gate as part of the public API, since otherwise it is hard to reason about initialization and registration order.

@bboreham
Contributor

Hi, just wanted to suggest that you have a warning somewhere in the docs about the case where localhost gets resolved via DNS to some real address and things go very weird. This has happened to me enough times that I reflexively type 127.0.0.1 instead, but I recognize that IPv6 is a thing.
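
A small Go sketch of the distinction, in case it helps the docs. The helper name isLoopbackLiteral is made up for this example. It accepts only literal loopback IPs, which need no name resolution, and rejects hostnames such as localhost whose meaning depends on the resolver.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackLiteral reports whether the host part of a host:port endpoint is
// a literal loopback IP address. Hostnames return false because they must be
// resolved before anyone knows where they point.
func isLoopbackLiteral(endpoint string) bool {
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, ep := range []string{"127.0.0.1:4317", "[::1]:4317", "localhost:4317", "0.0.0.0:4317"} {
		fmt.Printf("%-16s loopback literal: %v\n", ep, isLoopbackLiteral(ep))
	}
}
```

Both 127.0.0.1 and ::1 pass, so the check does not force a choice between IPv4 and IPv6.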

@mx-psi
Member

mx-psi commented Jan 22, 2024

@bboreham moved this to #9338, thanks for the suggestion :)

mx-psi added a commit that referenced this issue Jan 24, 2024
…ocalhost defaults for server-like components (#8622)

**Description:** 

- Define `component.UseLocalHostAsDefaultHost` in the
`internal/localhostgate` package.
- Define `featuregate.ErrIsAlreadyRegistered` error, returned by
`Register` when a gate is already registered.
- Adds support for the localhost gate on the OTLP receiver.

This PR does not remove the current warning in any way; we can remove
it separately.

**Link to tracking Issue:** Updates #8510

**Testing:** Adds unit tests

**Documentation:** Document on OTLP receiver template and add related
logging.