Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change to owned LoggerProvider #1455

Merged

Conversation

hdost
Copy link
Contributor

@hdost hdost commented Dec 26, 2023

The interface to an item shouldn't take an inner value since it's considered inner, this also allows for further optimizations in the future as it hides the complexity from the user.

Relates #1209

Changes

Please provide a brief description of the changes here.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@hdost hdost requested a review from a team as a code owner December 26, 2023 22:12
@hdost hdost force-pushed the bug/1209-fix-upgrade-performance branch from f741d94 to 86767ad Compare December 27, 2023 09:20
Copy link

codecov bot commented Dec 27, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (8688e7e) 61.8% compared to head (18ee088) 61.8%.

Files Patch % Lines
opentelemetry-sdk/src/logs/log_emitter.rs 90.9% 1 Missing ⚠️
opentelemetry/src/logs/noop.rs 0.0% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1455   +/-   ##
=====================================
  Coverage   61.8%   61.8%           
=====================================
  Files        144     144           
  Lines      19810   19805    -5     
=====================================
- Hits       12256   12254    -2     
+ Misses      7554    7551    -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lalitb
Copy link
Member

lalitb commented Dec 27, 2023

Trying to understand more here - so with this PR, each logger holds its own complete instance of LoggerProvider, which would be different from the other loggers, and also different from the global LoggerProvider. And will calling shutdown on the global loggerProvider ensure that the logs through these loggers are not exported ?

Also, would there be increased memory usage if many Logger instances are created, as each carries a full copy of the LoggerProvider and its internal state.

Also, please change the PR title to reflect LoggerProvider instead of TracerProvider :)

@hdost hdost changed the title Change to owned TracerProvider Change to owned LoggerProvider Dec 27, 2023
@hdost
Copy link
Contributor Author

hdost commented Dec 27, 2023

Trying to understand more here - so with this PR, each logger holds its own complete instance of LoggerProvider, which would be different from the other loggers, and also different from the global LoggerProvider. And will calling shutdown on the global loggerProvider ensure that the logs through these loggers are not exported ?

The inner is still the same, so it should, I'll double check that there's a test.

Also, would there be increased memory usage if many Logger instances are created, as each carries a full copy of the LoggerProvider and its internal state.

It won't be a full copy, the inner is an Arc, so the outer would be an additional pointer memory wise.

Also, please change the PR title to reflect LoggerProvider instead of TracerProvider :)

Done 😅

@lalitb
Copy link
Member

lalitb commented Dec 28, 2023

It won't be a full copy, the inner is an Arc, so the outer would be an additional pointer memory wise.

Ah good point. I didn't realize that inner is Arc type, and since there is no owned trait implementation, the to_owned() will fallback to default clone, and so all outer loggerprovider instances would share the single inner instance. As @TommyCpp suggested, it might be more transparent and clearer to use self.clone() explicitly in this context.

@lalitb
Copy link
Member

lalitb commented Dec 28, 2023

And will calling shutdown on the global loggerProvider ensure that the logs through these loggers are not exported ?

It would be good to validate this use-case. It seems shutdown_logger_provider will invoke shutdown on the LogProcessors when the inner instance is dropped. And now if any of the loggers have it's reference in separate thread, the shutdown will not be invoked. One option can be to add a new LoggerProvider::Shutdown() method, and explicitly invoke it from within shutdown_logger_provider . This new method will invoke shutdown on all processors, and so ensure that all the existing logs are flushed. And no new logs can be exported with processors in shutdown state.

@TommyCpp
Copy link
Contributor

Hmm I think this PR is the same as #1446 ?

@hdost
Copy link
Contributor Author

hdost commented Jan 2, 2024

Hmm I think this PR is the same as #1446 ?

Yes it's meant to be an alternative because right now the initial interface requires passing the under which is inherently leaky as an interface. This aims to remove the necessity.

@lalitb
Copy link
Member

lalitb commented Jan 3, 2024

Surprisingly, this PR has better perf numbers over #1446, as evident from the stress results. I initially thought both should have similar stats.

/home/labhas/opentelemetry-rust/stress$ cargo run --bin logs --release

#1455 (current PR):
Number of threads: 8
Throughput: 20,641,400 iterations/sec
Throughput: 20,336,600 iterations/sec

#1446:
Number of threads: 8
Throughput: 14,935,400 iterations/sec
Throughput: 15,633,600 iterations/sec

#main branch:
Number of threads: 8
Throughput: 11,473,200 iterations/sec
Throughput: 11,909,200 iterations/sec

@hdost
Copy link
Contributor Author

hdost commented Jan 4, 2024

That's a nice benefit as well. I believe it's because we're really doing less things in this one.

Side note: I really would love to run whatever machines y'all are running because my same stress benchmarks never get as good of results. Maybe after the next refresh for my laptop 😅

@lalitb
Copy link
Member

lalitb commented Jan 4, 2024

And will calling shutdown on the global loggerProvider ensure that the logs through these loggers are not exported ?

It would be good to validate this use-case. It seems shutdown_logger_provider will invoke shutdown on the LogProcessors when the inner instance is dropped. And now if any of the loggers have it's reference in separate thread, the shutdown will not be invoked. One option can be to add a new LoggerProvider::Shutdown() method, and explicitly invoke it from within shutdown_logger_provider . This new method will invoke shutdown on all processors, and so ensure that all the existing logs are flushed. And no new logs can be exported with processors in shutdown state.

Thinking about this scenario again, it seems that the potential issue of leaking providers might not be as significant if we can document it clearly to explain the risk. Specifically, providers could be leaked if any of the tasks indefinitely retain their reference, which can result in a shutdown not triggering.

@cijothomas
Copy link
Member

Hmm I think this PR is the same as #1446 ?

Yes it's meant to be an alternative because right now the initial interface requires passing the under which is inherently leaky as an interface. This aims to remove the necessity.

Just catching up on PRs! Could you describe any pros/cons of this vs the alternate in the PR desc? Also, if feasible, can you add the stress test numbers you are observing. I am not seeing the same gains as Lalit has posted above, so wondering what is the actual gain! (i'll try to run more tests and post back as well soon)

@lalitb
Copy link
Member

lalitb commented Jan 19, 2024

My earlier results were on the Azure dev box, but I can also see better perf on my Windows desktop with WSL2:

#main branch:
Running /home/labhas/obs/ot/opentelemetry-rust/target/release/logs
Number of threads: 10
Throughput: 9,659,000 iterations/sec
Throughput: 9,166,800 iterations/sec
Throughput: 9,068,400 iterations/sec
Throughput: 9,013,600 iterations/sec

#this PR:
Running /home/labhas/obs/ot/opentelemetry-rust/target/release/logs
Number of threads: 10
Throughput: 28,168,200 iterations/sec
Throughput: 26,013,800 iterations/sec
Throughput: 23,196,200 iterations/sec
Throughput: 24,413,400 iterations/sec

@hdost hdost force-pushed the bug/1209-fix-upgrade-performance branch from 85ef890 to 9de34eb Compare February 6, 2024 13:55
@hdost
Copy link
Contributor Author

hdost commented Feb 6, 2024

Are we still thinking that we need to have the testing before this gets merged?

I'm trying to thing what would be a valid testing scenario that we'd want to run.

@cijothomas
Copy link
Member

Are we still thinking that we need to have the testing before this gets merged?

I'm trying to thing what would be a valid testing scenario that we'd want to run.

From SIG Call:
We don't need to block the PR for tests - a targetted test can be added as a followup.
Scenario: #1455 (comment)
There are other issues reporting about shutdown scenarios, maybe we need a comprehensive testing targetting shutdowns, separate from this PR.

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Maybe good to add changelog entry to the sdk.
We can definitely follow up to do targeted tests for shutdown part.

@hdost hdost force-pushed the bug/1209-fix-upgrade-performance branch from 67fb1cc to e8f069c Compare February 7, 2024 02:00
The interface to an item shouldn't take an inner value since it's
considered inner, this also allows for further optimizations in the
future as it hides the complexity from the user.

Rational:

This removes exposing the inner which doesn't need to be provided
outside of the class. The advantage of this approach is that it's a
cleaner implementation. This also removes a weak reference upgrade from
the hotpath since we need to have a strong reference in order to access
the information.

Relates open-telemetry#1209
NoopLogger should be using a little resources as possible ideally none.
This should help accomplish that.
@hdost hdost force-pushed the bug/1209-fix-upgrade-performance branch from e8f069c to 18ee088 Compare February 7, 2024 02:04
@cijothomas cijothomas merged commit 80da656 into open-telemetry:main Feb 7, 2024
15 checks passed
@cijothomas
Copy link
Member

I ran stress test for logs after modifying the no-op processor to return false for event_enabled.
main branch:

Number of threads: 8
Throughput: 75,537,000 iterations/sec
Throughput: 77,545,600 iterations/sec

This pr:

Number of threads: 8
Throughput: 579,008,400 iterations/sec
Throughput: 586,900,800 iterations/sec

On the other hand, if using a no-op loglayer with event_enabled returning false, the same test gives below:

Number of threads: 8
Throughput: 1,106,156,200 iterations/sec
Throughput: 1,069,867,600 iterations/sec

This test shows that OTel still has a lot more room for further optimizations. Those can be explored in the future. This PR already boosts the perf significantly!

github-merge-queue bot pushed a commit to TheHackerApp/logging that referenced this pull request May 7, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
|
[opentelemetry](https://togithub.com/open-telemetry/opentelemetry-rust)
| dependencies | minor | `0.21` -> `0.22` |
|
[opentelemetry-http](https://togithub.com/open-telemetry/opentelemetry-rust)
| dependencies | minor | `0.10` -> `0.11` |
|
[opentelemetry-otlp](https://togithub.com/open-telemetry/opentelemetry-rust/tree/main/opentelemetry-otlp)
([source](https://togithub.com/open-telemetry/opentelemetry-rust/tree/HEAD/opentelemetry-otlp))
| dependencies | minor | `0.14` -> `0.15` |
|
[opentelemetry_sdk](https://togithub.com/open-telemetry/opentelemetry-rust)
| dependencies | minor | `0.21` -> `0.22` |
|
[tracing-opentelemetry](https://togithub.com/tokio-rs/tracing-opentelemetry)
| dependencies | minor | `0.22` -> `0.23` |

---

### Release Notes

<details>
<summary>open-telemetry/opentelemetry-rust (opentelemetry)</summary>

###
[`v0.22.0`](https://togithub.com/open-telemetry/opentelemetry-rust/releases/tag/v0.22.0)

[Compare
Source](https://togithub.com/open-telemetry/opentelemetry-rust/compare/v0.21.0...opentelemetry-0.22.0)

### API

#### Added

-
[open-telemetry/opentelemetry-rust#1410
Add experimental synchronous gauge. This is behind the feature flag, and
can be enabled by enabling the feature otel_unstable for opentelemetry
crate.

-
[open-telemetry/opentelemetry-rust#1410
Guidelines to add new unstable/experimental features.

#### Changed

- Modified AnyValue.Map to be backed by HashMap instead of custom
OrderMap, which internally used IndexMap. There was no requirement to
maintain the order of entries, so moving from IndexMap to HashMap offers
slight performance gains, and avoids IndexMap dependency. This affects
body and attributes of LogRecord.
[open-telemetry/opentelemetry-rust#1353
- Add TextMapCompositePropagator
[open-telemetry/opentelemetry-rust#1373
- Turned off events for NoopLogger to save on operations
[open-telemetry/opentelemetry-rust#1455

#### Removed

- Removed OrderMap type as there was no requirement to use this over
regular HashMap.
[open-telemetry/opentelemetry-rust#1353
- Remove API for Creating Histograms with signed integers.
[open-telemetry/opentelemetry-rust#1371
- Remove global::shutdown_meter_provider, use SdkMeterProvider::shutdown
directly instead
([#&#8203;1412](https://togithub.com/open-telemetry/opentelemetry-rust/issues/1412)).

### SDK

##### Deprecated

- XrayIdGenerator in the opentelemetry-sdk has been deprecated and moved
to version 0.10.0 of the opentelemetry-aws crate.

##### Added

-
[#&#8203;1410](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1410)
Add experimental synchronous gauge

-
[#&#8203;1471](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1471)
Configure batch log record processor via
[`OTEL_BLRP_*`](https://togithub.com/open-telemetry/opentelemetry-specification/blob/main/specification/configuration/sdk-environment-variables.md#batch-logrecord-processor)
environment variables and via `OtlpLogPipeline::with_batch_config`

-
[#&#8203;1503](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1503)
Make the documentation for In-Memory exporters visible.

-
[#&#8203;1526](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1526)
Performance Improvement : Creating Spans and LogRecords are now faster,
by avoiding expensive cloning of `Resource` for every Span/LogRecord.

##### Changed

-   **Breaking**

[#&#8203;1313](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1313)

[#&#8203;1350](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1350)
Changes how Span links/events are stored to achieve performance gains.
See
    below for details:

*Behavior Change*: When enforcing `max_links_per_span`,
`max_events_per_span`
from `SpanLimits`, links/events are kept in the first-come order. The
previous
    "eviction" based approach is no longer performed.

    *Breaking Change Affecting Exporter authors*:

`SpanData` now stores `links` as `SpanLinks` instead of `EvictedQueue`
where
    `SpanLinks` is a struct with a `Vec` of links and `dropped_count`.

`SpanData` now stores `events` as `SpanEvents` instead of `EvictedQueue`
where
    `SpanEvents` is a struct with a `Vec` of events and `dropped_count`.

- **Breaking** Remove `TextMapCompositePropagator`
[#&#8203;1373](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1373).
Use `TextMapCompositePropagator` in opentelemetry API.

-
[#&#8203;1375](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1375/)
Fix metric collections during PeriodicReader shutdown

- **Breaking**
[#&#8203;1480](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1480)
Remove fine grained `BatchConfig` configurations from
`BatchLogProcessorBuilder` and `BatchSpanProcessorBuilder`. Use
`BatchConfigBuilder` to construct a `BatchConfig` instance and pass it
using `BatchLogProcessorBuilder::with_batch_config` or
`BatchSpanProcessorBuilder::with_batch_config`.

- **Breaking**
[#&#8203;1480](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1480)
Remove mutating functions from `BatchConfig`, use `BatchConfigBuilder`
to construct a `BatchConfig` instance.

- **Breaking**
[#&#8203;1495](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1495)
Remove Batch LogRecord\&Span Processor configuration via non-standard
environment variables. Use the following table to migrate from the no
longer supported non-standard environment variables to the standard
ones.

| No longer supported             | Standard equivalent       |
|---------------------------------|---------------------------|
| OTEL_BLRP_SCHEDULE_DELAY_MILLIS | OTEL_BLRP_SCHEDULE_DELAY  |
| OTEL_BLRP_EXPORT_TIMEOUT_MILLIS | OTEL_BLRP_EXPORT_TIMEOUT  |
| OTEL_BSP_SCHEDULE_DELAY_MILLIS  | OTEL_BSP_SCHEDULE_DELAY   |
| OTEL_BSP_EXPORT_TIMEOUT_MILLIS  | OTEL_BSP_EXPORT_TIMEOUT   |

- **Breaking**
[1455](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1455)
Make the LoggerProvider Owned
- `Logger` now takes an Owned Logger instead of a
`Weak<LoggerProviderInner>`
    -   `LoggerProviderInner` is no longer `pub (crate)`
- `Logger.provider()` now returns `&LoggerProvider` instead of an
`Option<LoggerProvider>`

-
[1519](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1519)
Performance improvements
when calling `Counter::add()` and `UpDownCounter::add()` with an empty
set of attributes
    (e.g. `counter.Add(5, &[])`)

##### Fixed

-
[#&#8203;1481](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1481)
Fix error message caused by race condition when using PeriodicReader

</details>

<details>
<summary>open-telemetry/opentelemetry-rust
(opentelemetry-otlp)</summary>

###
[`v0.15.0`](https://togithub.com/open-telemetry/opentelemetry-rust/blob/HEAD/opentelemetry-otlp/CHANGELOG.md#v0150)

[Compare
Source](https://togithub.com/open-telemetry/opentelemetry-rust/compare/opentelemetry-otlp-0.14.0...opentelemetry-otlp-0.15.0)

##### Added

- Support custom channels in topic exporters
[#&#8203;1335](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1335)
- Allow specifying OTLP Tonic metadata from env variable
[#&#8203;1377](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1377)

##### Changed

- Update to tonic 0.11 and prost 0.12
[#&#8203;1536](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1536)

##### Fixed

- Fix `tonic()` to the use correct port.
[#&#8203;1556](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1556)

##### Removed

- **Breaking** Remove support for surf HTTP client
[#&#8203;1537](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1537)
- **Breaking** Remove support for grpcio transport
[#&#8203;1534](https://togithub.com/open-telemetry/opentelemetry-rust/pull/1534)

</details>

<details>
<summary>tokio-rs/tracing-opentelemetry
(tracing-opentelemetry)</summary>

###
[`v0.23.0`](https://togithub.com/tokio-rs/tracing-opentelemetry/blob/HEAD/CHANGELOG.md#0230-February-26-2024)

[Compare
Source](https://togithub.com/tokio-rs/tracing-opentelemetry/compare/v0.22.0...v0.23.0)

##### Breaking Changes

-   Upgrade to opentelemetry 0.22. Refer to the upstream

[changelog](https://togithub.com/open-telemetry/opentelemetry-rust/releases/tag/v0.22.0)
for more information. In particular, i64 histograms will silently
downgrade to
    key/value exports.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://togithub.com/renovatebot/renovate/discussions) if
that's undesired.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/TheHackerApp/logging).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4yMTIuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM0MC4xMCIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants