chore(dist/features): ship tracing and friends by default #3803

Draft
wants to merge 7 commits into
base: master

rami3l
Member

@rami3l rami3l commented May 2, 2024

Part of #3790.

Rationale

Currently, helping out the Rustup team by enabling local tracing is quite a tedious process (esp. for community contributors), requiring rebuilding Rustup from the exact commit with an extra feature, otel:

## Usage
The normal [OTLP environment
variables](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md)
can be used to customise its behaviour, but often the simplest thing is to just
run a Jaeger docker container on the same host:
```sh
docker run -d --name jaeger -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 -e COLLECTOR_OTLP_ENABLED=true -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778 -p 16686:16686 -p 4317:4317 -p 4318:4318 -p 14250:14250 -p 14268:14268 -p 14269:14269 -p 9411:9411 jaegertracing/all-in-one:latest
```
Then build rustup-init with tracing:
```sh
cargo build --features=otel
```
Run the operation you want to analyze:
```sh
RUSTUP_FORCE_ARG0="rustup" ./target/debug/rustup-init show
```
And [look in Jaeger for a trace](http://localhost:16686/search?service=rustup).

After some experimentation, it turned out that we can actually ship the tracing features by default without forcing users to face OTEL connection errors on a daily basis.

To clarify, this does not mean Rustup is setting up a central (a.k.a. phone-home-style) telemetry mechanism. Tracing will stay disabled by default unless RUST_LOG has been explicitly set.

Concerns

@rami3l rami3l added this to the 1.28.0 milestone May 2, 2024

@djc
Contributor

djc commented May 2, 2024

I think we should use a basic console-based tracing-subscriber setup:

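use std::io::{self, Write};
use std::str;

/// Installs a TRACE-level console subscriber for the current thread.
/// It stays active until the returned guard is dropped.
pub(super) fn subscribe() -> tracing::subscriber::DefaultGuard {
    let sub = tracing_subscriber::FmtSubscriber::builder()
        .with_max_level(tracing::Level::TRACE)
        .with_writer(|| TestWriter)
        .finish();
    tracing::subscriber::set_default(sub)
}

/// Writer that forwards log output through `print!`, which the test
/// harness can capture per test.
struct TestWriter;

impl Write for TestWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        print!(
            "{}",
            str::from_utf8(buf).expect("tried to log invalid UTF-8")
        );
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        io::stdout().flush()
    }
}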

This is what I've been using in Quinn for many years. Every test just starts by calling let _guard = subscribe();, which has been a highly effective tool for test observability.
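
To illustrate why the guard has to be bound to a name, here is a simplified, std-only model of a thread-local default with a restore-on-drop guard. It is a sketch of the pattern, not the actual `tracing` implementation: `CURRENT`, `Guard`, `set_default` and `current` are hypothetical stand-ins, and the "subscriber" is just a string label.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the thread-local "current default subscriber".
    static CURRENT: Cell<Option<&'static str>> = Cell::new(None);
}

/// Restores the previously installed default when dropped.
struct Guard {
    previous: Option<&'static str>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        CURRENT.with(|c| c.set(self.previous));
    }
}

/// Installs `name` as this thread's default and returns a guard
/// that undoes the change once it goes out of scope.
fn set_default(name: &'static str) -> Guard {
    let previous = CURRENT.with(|c| c.replace(Some(name)));
    Guard { previous }
}

/// Returns the label of the default currently installed on this thread.
fn current() -> Option<&'static str> {
    CURRENT.with(|c| c.get())
}
```

With this model, `let _guard = set_default("test");` keeps the default installed until the end of the enclosing scope, whereas `let _ = set_default("test");` drops the guard at the end of the statement and the default is gone immediately. That is why the tests bind the result to `_guard` rather than `_`.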

If we do this we can massively simplify the scaffolding for traces/logging:

  • Get rid of all the opentelemetry dependencies, which I don't think we need
  • Remove the use of the otel feature
  • Remove the custom test macro

@rbtcollins
Contributor

> Currently, helping out the Rustup team by enabling local tracing is quite a tedious process (esp. for community contributors), requiring rebuilding Rustup from the exact commit with an extra feature, otel:

I don't think this is true. We've not asked people to build with otel enabled that I'm remembering. The OS level traces we use to debug fundamental problems are from strace / truss and other similar tools. Otel / tracing! is not deployed widely enough within rustup to be a replacement for such things.

@rbtcollins
Contributor

>   • Get rid of all the opentelemetry dependencies, which I don't think we need
>   • Remove the use of the otel feature
>   • Remove the custom test macro

Please don't - while the OS level debugging is vital, for doing investigations on performance, having a nice report with spans and the detailed call tree is very useful, and since we configure it off by default it has very little overhead to maintain or build with. Really only the all-features-build test matters.

@djc
Contributor

djc commented May 2, 2024

> >   • Get rid of all the opentelemetry dependencies, which I don't think we need
> >   • Remove the use of the otel feature
> >   • Remove the custom test macro
>
> Please don't - while the OS level debugging is vital, for doing investigations on performance, having a nice report with spans and the detailed call tree is very useful, and since we configure it off by default it has very little overhead to maintain or build with. Really only the all-features-build test matters.

How many times have you used it in the past year? IMO while custom test macro + opentelemetry dependencies may not impose run-time overhead for downstream users, it does impose significant maintenance overhead that may not be warranted for the additional insight compared to just tracing-subscriber built-in (and maybe tokio-console level output).

@djc
Contributor

djc commented May 2, 2024

> > Currently, helping out the Rustup team by enabling local tracing is quite a tedious process (esp. for community contributors), requiring rebuilding Rustup from the exact commit with an extra feature, otel:
>
> I don't think this is true. We've not asked people to build with otel enabled that I'm remembering. The OS level traces we use to debug fundamental problems are from strace / truss and other similar tools. Otel / tracing! is not deployed widely enough within rustup to be a replacement for such things.

IMO the important point is that we should have a user-facing solution like "enable RUST_LOG=trace and give us the output of that". That seems like a decent way of getting better insight into problems that happen only in specific environments, which appear to be an important source of issues for rustup.

@rami3l
Member Author

rami3l commented May 3, 2024

I just checked and it looks like tokio-console doesn’t currently have a timeline view (tokio-rs/console#129), so I imagine opentelemetry and jaeger are here to stay for longer...

For now I plan to:

  • Ship tracing by default with a console-based subscriber.
  • If possible, reimplement our current logging system using that subscriber (with a fmt::layer() that mimics the original output style).
  • Keep the opentelemetry related stuff behind the otel feature.

More specifically, I imagine having multiple subscribers (tokio-rs/tracing#971) based on env vars and features:

  • A "classic" subscriber that targets process().stderr(), has a classic output format, will only print rustup log lines up to a certain level (info or verbose, depending on the input flags), and will be disabled if RUST_LOG is set.
  • A "tracing" subscriber that also targets process().stderr() and is not limited to rustup (so we could have tonic log lines as well, for example). Its activation is mutually exclusive with the "classic" subscriber, and the precise logging level will be controlled by RUST_LOG.
  • An OpenTelemetry subscriber (could be replaced by tokio-console in the future, but not just yet) available behind the otel feature, enabled simultaneously with the "tracing" subscriber.
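
The selection rules in the list above could be sketched roughly as follows. This is only a std-only model of the decision logic, not the planned implementation: the `Subscriber` enum and `select_subscribers` are hypothetical names, `otel_feature` stands in for a `cfg!(feature = "otel")` check, and treating any set value of RUST_LOG (even an empty one) as "set" is an assumption.

```rust
/// The three subscribers described above (names are illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Subscriber {
    /// Classic rustup-style output on stderr, capped at info or verbose.
    Classic { verbose: bool },
    /// `tracing`-style output on stderr, with the level taken from RUST_LOG.
    Tracing,
    /// OpenTelemetry export, only available behind the `otel` feature.
    Otel,
}

/// Decides which subscribers to install.
///
/// `rust_log` is the value of RUST_LOG if the variable is set,
/// `verbose` reflects the CLI verbosity flag, and `otel_feature`
/// says whether the binary was built with the `otel` feature.
fn select_subscribers(
    rust_log: Option<&str>,
    verbose: bool,
    otel_feature: bool,
) -> Vec<Subscriber> {
    match rust_log {
        // RUST_LOG unset: only the classic subscriber is active.
        None => vec![Subscriber::Classic { verbose }],
        // RUST_LOG set: classic is disabled, tracing takes over,
        // and OpenTelemetry rides along when it was compiled in.
        Some(_) => {
            let mut subscribers = vec![Subscriber::Tracing];
            if otel_feature {
                subscribers.push(Subscriber::Otel);
            }
            subscribers
        }
    }
}
```

Under these assumptions, a default build run without RUST_LOG only ever gets the classic subscriber, so users who never opt in see the same output as today and never trigger an OTEL connection attempt.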

Finally:

  • With the consistent use of RUST_LOG, RUSTUP_DEBUG should be retired accordingly.
  • We need to make sure this subscriber's use of CLI colors is coherent with the current system (incl. env variable controls via RUSTUP_TERM_COLOR, etc).

Waiting for #3367 might be worthwhile, since this will change the startup process.

@djc
Contributor

djc commented May 30, 2024

#3367 has been merged, so it would be good to rebase this!
