Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log a warning when an environment variable without a value is expanded #5615

Closed
dyladan opened this issue Jul 1, 2022 · 12 comments · Fixed by #10135
Closed

Log a warning when an environment variable without a value is expanded #5615

dyladan opened this issue Jul 1, 2022 · 12 comments · Fixed by #10135

Comments

@dyladan
Copy link
Member

dyladan commented Jul 1, 2022

Related to #5614

Is your feature request related to a problem? Please describe.

In open-telemetry/opentelemetry-collector-contrib#11846 the user was trying to use "$ConnectionString" as a Username. The collector automatically interprets $ as an environment variable expansion. The environment variable was empty, so the Username field was interpreted as empty and failed validation. There was nothing logged in the collector to alert this user that their configuration was being mangled by the failed environment expansion.

Describe the solution you'd like

When an environment variable expansion has no value, log a warning.

@dyladan
Copy link
Member Author

dyladan commented Jul 1, 2022

Another possible "fix" for this is to log the full collector config at startup. IIRC this is what telegraf does. This would allow the user to see that their config is messed up.

One major issue with this solution is that you would need to reliably hide secrets while also making it possible to see if an expansion failed. We currently don't have any way to distinguish between expansions that are secret and should be obfuscated in logs and expansions that are not.

@mx-psi
Copy link
Member

mx-psi commented Jul 2, 2022

I think we should fail fast instead of logging a warning. This is a breaking change, but I think it's worth it in terms of improved troubleshooting and we can have a feature gate to make the transition smoother. We probably also want to prioritize implementing #5228, which will be useful in this case.

For printing the configuration, we have #5223. If you have ideas on how to reliably hide secrets please participate in the discussion there!

@dyladan
Copy link
Member Author

dyladan commented Jul 5, 2022

I also think a fail-fast strategy is a good one. I didn't suggest it initially because I didn't know how conservative the collector SIG tends to be about those types of breaks. One possible midway solution is to have a strict mode which would toggle behavior between warn and failure

edit: a strict mode may also allow other similar enhancements such as failing to start up when an unknown config key is present which may catch typos in some cases

@mx-psi
Copy link
Member

mx-psi commented Jul 6, 2022

failing to start up when an unknown config key is present which may catch typos in some cases

We already do this by default, so it makes all the more sense to be consistent with environment variables :) If I find some time this week I will try to address this

@mx-psi
Copy link
Member

mx-psi commented Aug 5, 2022

I started making a PoC for raising a warning (and failing if enabled through a feature gate) on #5734. Since the logger is not available when loading the configuration, we have to signal the need to log a warning in some way. On the PR, I did this with a special error type ('NonFatalError') that needs to be handled by all configuration-related functions.

There are other alternatives, like changing providers/converters signature to return some sort of 'diagnostics' struct that includes any warnings, or giving up on logging a warning altogether.

@open-telemetry/collector-approvers What do you think is the best approach?

@mx-psi
Copy link
Member

mx-psi commented Feb 3, 2023

I am adding this to the confmap milestone, we need to discuss if we want to change the behavior or if we want to make any changes on the Go API to allow for non fatal errors before reaching 1.0.

@Aneurysm9
Copy link
Member

Not a fan if the idea of failing if environment expansion encounters an unset value. That may or may not be an error as a user may wish to use that to add context to a value rather than be the only content of a value. Logging that a replacement failed makes sense and would be a good addition.

@mx-psi
Copy link
Member

mx-psi commented Feb 10, 2023

[failing if environment expansion encounters an unset value] may or may not be an error as a user may wish to use that to add context to a value rather than be the only content of a value.

Would it be possible to support that through support for setting a default value if unset? (Not sure if we should do that, I am just trying to understand your point here)

@Aneurysm9
Copy link
Member

[failing if environment expansion encounters an unset value] may or may not be an error as a user may wish to use that to add context to a value rather than be the only content of a value.

Would it be possible to support that through support for setting a default value if unset? (Not sure if we should do that, I am just trying to understand your point here)

Maybe, but in that case I'd argue that the default default value should be an empty string which takes us right back to where we are today.

bogdandrutu pushed a commit that referenced this issue Feb 1, 2024
…nd providers (#9443)

**Description:** 

For both #5615 and #9162 we need to be able to log during the confmap
resolution.

My proposed solution is to pass a `*zap.Logger` to converters and
providers during initialization. These components can then use this to
log any warnings they need. This PR does the first step: being able to
pass anything to converters and providers during initialization.

The obvious alternative to this is to change the interface of
`confmap.Provider` and `confmap.Converter` to pass any warnings in an
explicit struct. I think the `*zap.Logger` alternative is more natural
for developers of providers and converters: you just use a logger like
everywhere else in the Collector.

One problem for the Collector usage of `confmap` is: How does one pass a
`*zap.Logger` before knowing how a `*zap.Logger` should be configured? I
think we can work around this by:
1. Passing a special 'deferred' Logger that just stores the warnings
without actually logging them (we can use something like
`zaptest/observer` for this)
2. Resolving configuration
3. Building a `*zap.Logger` with said configuration
4. Logging the entries stored in (1) with the logger from (3) (using
`zaptest/observer` we can do that by taking the `zapcore.Core` out of
the logger and manually writing)

**We don't actually need ProviderSettings today, just ConverterSettings,
but I think it can still be useful.**

**Link to tracking Issue:** Relates to #5615 and #9162

---------

Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
@atoulme
Copy link
Contributor

atoulme commented Mar 13, 2024

This was discussed in SIG meeting 3/13 as part of the open-telemetry/opentelemetry-collector-contrib#9984 issue review.

mx-psi pushed a commit that referenced this issue Apr 4, 2024
…9837)

**Description:** 
Creates a logger in the confmap.ProviderSettings and uses it to log when
there is a missing or blank environment variable referenced in config.
For now the noop logger is used everywhere except tests.

**Link to tracking Issue:**
[5615](#5615)

**Testing:** 
I wrote unit tests that ensured 

1. logging occurred when an environment variable was unset 
2. logging occcured when the env var was empty.  
3. there was no log when an env var was used correctly

I also started the otel collector with the sample config - and added an
env var reference in the sample config. I then inserted a print
statement next to each log call to see whether my code paths were hit in
the live application. I then went through the 3 cases mentioned above
and ensured that logging behavior was accurate.
@TylerHelmuth
Copy link
Member

@mx-psi is this closed by #9837

@mx-psi
Copy link
Member

mx-psi commented Apr 19, 2024

@TylerHelmuth No, because the logger is noop. We need to pass a functioning logger, which is still a WIP

codeboten pushed a commit that referenced this issue May 6, 2024
#### Description

This is an RFC to help us decide how we want `otelcol` to provide a
logger before the primary logger is created. As we discuss I will update
the doc. Before this is merged we should have decided on a solution and
the Accepted Solution section must be updated.

Related to
#10056

#### Link to tracking issue
This unblocks:
- #9162
- #5615

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
mx-psi added a commit that referenced this issue May 10, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Provides a logger to confmap that buffers logs in memory until the
primary logger can be used. Once the primary logger exists, places that
used the original logger are given the updated Core.

If an error occurs that would shut down the collector before the primary
logger could be created, the logs are written to stdout/err using a
fallback logger.

Alternative to
#10008

I've pushed the testing I did to show how the logger successfully
updates. Before config resolution the debug log in confmap is not
printed, but afterwards it is.

test config:
```yaml
receivers:
  nop:

exporters:
  otlphttp:
    endpoint: http://0.0.0.0:4317
    headers:
      # Not set
      x-test: ${env:TEMP3}
  debug:
    # set to "detailed"
    verbosity: $TEMP

service:
  telemetry:
    logs:
      level: debug
  pipelines:
    traces:
      receivers:
        - nop
      exporters:
        - debug
```


![image](https://github.com/open-telemetry/opentelemetry-collector/assets/12352919/6a17993f-1f97-4c54-9165-5c34dd58d108)

<!-- Issue number if applicable -->
#### Link to tracking issue
Related to
#9162
Related to
#5615

<!--Describe what testing was performed and which tests were added.-->
#### Testing
If we like this approach I'll add tests

<!--Describe the documentation added.-->
#### Documentation

---------

Co-authored-by: Dan Jaglowski <jaglows3@gmail.com>
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
@mx-psi mx-psi closed this as completed in 1d52fb9 May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment