Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Resource Sharing Across Exporters with Arc Implementation #1526

Merged
merged 18 commits into from Feb 15, 2024

Conversation

lalitb
Copy link
Member

@lalitb lalitb commented Feb 12, 2024

Changes

This PR introduces a refactor of the Resource structure to improve the efficiency of resource sharing across exporters for owned resource attributes. By encapsulating the Resource data within a ResourceInner struct and wrapping it with an Arc (Atomic Reference Counting), we eliminate the need for deep copies of resource data, thereby reducing memory overhead and improving performance.

Existing

#[derive(Clone, Debug, PartialEq)]
pub struct Resource {
    attrs: HashMap<Key, Value>,
    schema_url: Option<Cow<'static, str>>,
}

PR

/// Inner structure of `Resource` holding the actual data.
/// This structure is designed to be shared among `Resource` instances via `Arc`.
#[derive(Debug, Clone, PartialEq)]
struct ResourceInner {
    attrs: HashMap<Key, Value>,
    schema_url: Option<Cow<'static, str>>,
}

/// An immutable representation of the entity producing telemetry as attributes.
/// Utilizes `Arc` for efficient sharing and cloning.
#[derive(Clone, Debug, PartialEq)]
pub struct Resource {
    inner: Arc<ResourceInner>,
}

The stress test results (default has 4 resource attributes) for the main and PR branch. The results are more pronounced if the resource is added with LoggerProviderBuilder::with_resource and TracerProviderBuilder::with_resource methods as indicated here, as the resource created this way are owned, and can't be optimized by the compiler.

Spans:

main

Number of threads: 10
Throughput: 8,584,000 iterations/sec
Throughput: 8,560,000 iterations/sec
Throughput: 8,793,800 iterations/sec
Throughput: 8,465,200 iterations/sec

PR

Number of threads: 10
Throughput: 9,059,800 iterations/sec
Throughput: 8,719,200 iterations/sec
Throughput: 8,831,400 iterations/sec
Throughput: 8,767,800 iterations/sec

logs

main

Number of threads: 10
Throughput: 23,298,600 iterations/sec
Throughput: 21,358,600 iterations/sec
Throughput: 21,797,000 iterations/sec

PR

Number of threads: 10
Throughput: 25,054,600 iterations/sec
Throughput: 24,265,000 iterations/sec
Throughput: 24,576,600 iterations/sec
Throughput: 23,008,000 iterations/sec


## Merge requirement checklist

* [x] [CONTRIBUTING](https://github.com/open-telemetry/opentelemetry-rust/blob/main/CONTRIBUTING.md) guidelines followed
* [x] Unit tests added/updated (if applicable)
* [ ] Appropriate `CHANGELOG.md` files updated for non-trivial, user-facing changes
* [x] Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner February 12, 2024 17:17
Copy link

codecov bot commented Feb 12, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (650904f) 63.2% compared to head (79d1896) 63.3%.

Files Patch % Lines
opentelemetry-sdk/src/resource/mod.rs 98.7% 1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #1526   +/-   ##
=====================================
  Coverage   63.2%   63.3%           
=====================================
  Files        144     144           
  Lines      20299   20315   +16     
=====================================
+ Hits       12847   12863   +16     
  Misses      7452    7452           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cijothomas
Copy link
Member

Can you also add benchmarks to logs, so we can how much this would improve the common use case (like 5-10 Resource attributes)?
(Would be good to share the stress test for the 5-10 attributes, to see how much this impacts normal usages.)

@lalitb
Copy link
Member Author

lalitb commented Feb 12, 2024

Just to add, the stress test uses NoOpLogProcessor, so the performance is measured for creating the LogData structure for each event (which involves copying of resources in existing design), and not processing by the exporter.

(Would be good to share the stress test for the 5-10 attributes, to see how much this impacts normal usages.)

With PR, the throughput doesn't change as we increase the number of attributes, as we are just doing shared resource increment and no real data copy. However, with existing code, the throughput reduces drastically as we increase the resource attributes.

5 resource attributes:

main:
Number of threads: 8
Throughput: 12,364,800 iterations/sec
Throughput: 12,177,400 iterations/sec
Throughput: 12,345,800 iterations/sec

PR:
Number of threads: 8
Throughput: 22,066,200 iterations/sec
Throughput: 22,142,600 iterations/sec
Throughput: 21,524,400 iterations/sec

10 resource attributes:
main:
Number of threads: 8
Throughput: 8,696,200 iterations/sec
Throughput: 8,750,200 iterations/sec
Throughput: 8,767,400 iterations/sec
Throughput: 8,888,200 iterations/sec

PR:
Number of threads: 8
Throughput: 22,376,000 iterations/sec
Throughput: 21,866,200 iterations/sec
Throughput: 22,334,200 iterations/sec

100 resource attributes:
main:
Number of threads: 8
Throughput: 1,357,600 iterations/sec
Throughput: 1,324,000 iterations/sec
Throughput: 1,353,000 iterations/sec

PR:
Number of threads: 8
Throughput: 22,715,400 iterations/sec
Throughput: 21,450,800 iterations/sec
Throughput: 22,922,000 iterations/sec

Can you also add benchmarks to logs,

Modified the benchmark to include attributes in common scenario:

                KeyValue::new("service.name", "my-service"),
                KeyValue::new("service.version", "1.0.0"),
                KeyValue::new("service.environment", "production"),
                KeyValue::new("service.instance.id", "1234"),

The results:
main:

simple-log/no-context time: [254.13 ns 255.45 ns 257.16 ns]
simple-log/with-context time: [251.62 ns 252.16 ns 252.76 ns]
simple-log-with-int/no-context time: [299.27 ns 300.59 ns 302.25 ns]
simple-log-with-int/with-context time: [302.25 ns 302.83 ns 303.50 ns]

PR:

simple-log/no-context time: [167.73 ns 168.10 ns 168.54 ns]
simple-log/with-context time: [162.33 ns 162.68 ns 163.13 ns]
simple-log-with-int/no-context time: time: [224.12 ns 225.28 ns 226.71 ns]
simple-log-with-int/with-context: time: [218.04 ns 218.28 ns 218.55 ns]

Copy link
Contributor

@hdost hdost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent gains ☺️ thanks again.

resource.attrs.insert(key, value);
// This call ensures that if the Arc is not uniquely owned,
// the data is cloned before modification, preserving safety.
// If the Arc is uniquely owned, it simply returns a mutable reference to the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a constructor, at what point would the inner not be uniquely owned?
That being said is there a reason we shouldn't put the make_mut before the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be safe to move make_mut before loop at this point. The inner is ref-incremented only during LogData/SpanData creation.

Copy link
Contributor

@hdost hdost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work again!

Copy link
Contributor

@TommyCpp TommyCpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great performance improvements!

Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Before merge, I'd suggest to add changelog entry to let users know of the perf gains.
Also, please update PR description with before/after numbers.

@cijothomas cijothomas merged commit 476f2c1 into open-telemetry:main Feb 15, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants