Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate if attribute filtering should be in the instrument or aggregator #3011

Open
MrAlias opened this issue Jul 12, 2022 · 6 comments
Open
Labels
area:metrics Part of OpenTelemetry Metrics blocked:design Waiting on design work to be completed before implementation can start. blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made pkg:SDK Related to an SDK package

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Jul 12, 2022

Investigate which is more optimal

  1. Filter attributes at the instrument level when a "record" operation is called.
  2. Filter attributes at the aggregator level when an Aggregation collection is called.

Ensure we implement the optimal one.

@MrAlias MrAlias added pkg:SDK Related to an SDK package area:metrics Part of OpenTelemetry Metrics labels Jul 12, 2022
@MrAlias MrAlias added this to the Metric SDK: Beta milestone Jul 12, 2022
@jmacd
Copy link
Contributor

jmacd commented Jul 18, 2022

IMO filtering is better done in the aggregator, since the cost will be once per export interval instead of once per observation.

@MrAlias MrAlias removed this from the Metric SDK: Beta milestone Oct 20, 2022
@MrAlias MrAlias self-assigned this Dec 15, 2023
@MrAlias
Copy link
Contributor Author

MrAlias commented Dec 18, 2023

Initial benchmarking for the last-value aggregation (the simplistic one) shows the expected performance differences.

Benchmark:

func benchmarkFiltered[N int64 | float64](factory func(attribute.Filter) (Measure[N], ComputeAggregation)) func(*testing.B) {
	nAttr := []int{1, 10, 100}       // Number of distinct attribute sets.
	nMeas := []int{1, 10, 100, 1000} // Number of measurements made per attribute set.
	return func(b *testing.B) {
		for _, attributeCap := range nAttr {
			for _, measurements := range nMeas {
				name := fmt.Sprintf("Attributes/%d/Measurements/%d", attributeCap, measurements)
				b.Run(name, func(b *testing.B) {
					attrs := make([]attribute.Set, attributeCap)
					for i := range attrs {
						attrs[i] = attribute.NewSet(
							userAlice,
							attribute.Int("value", i),
						)
					}

					got := &bmarkRes
					ctx := context.Background()
					meas, comp := factory(attrFltr)

					b.ReportAllocs()
					b.ResetTimer()
					for n := 0; n < b.N; n++ {
						for m := 0; m < measurements; m++ {
							for _, attr := range attrs {
								meas(ctx, 1, attr)
							}
						}

						assert.Equal(b, 1, comp(got), "attributes not filtered")
					}
				})
			}
		}
	}
}
$ go test -run='^$' -bench=LastValue/Filtered/Int64 -count=10 > old.txt  # run on main
...
$ go test -run='^$' -bench=LastValue/Filtered/Int64 -count=10 > new.txt  # run on test branch
...
$ benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/otel/sdk/metric/internal/aggregate
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
                                                            │   old.txt   │                new.txt                │
                                                            │   sec/op    │    sec/op     vs base                 │
LastValue/Filtered/Int64/Attributes/1/Measurements/1-8        1.154µ ± 3%    1.779µ ± 2%   +54.23% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/10-8       4.776µ ± 3%    3.772µ ± 1%   -21.03% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/100-8      40.46µ ± 4%    23.15µ ± 2%   -42.79% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/1000-8     398.4µ ± 3%    218.9µ ± 6%   -45.06% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1-8       4.836µ ± 2%   10.019µ ± 2%  +107.18% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/10-8      40.57µ ± 3%    30.68µ ± 1%   -24.39% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/100-8     406.0µ ± 3%    240.3µ ± 1%   -40.82% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1000-8    4.088m ± 3%    2.349m ± 1%   -42.55% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1-8      41.33µ ± 2%    87.16µ ± 1%  +110.90% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/10-8     411.0µ ± 3%    291.8µ ± 1%   -29.00% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/100-8    4.058m ± 3%    2.417m ± 1%   -40.42% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1000-8   40.43m ± 2%    22.82m ± 6%   -43.56% (p=0.000 n=10)
geomean                                                       144.2µ         119.3µ        -17.28%

                                                            │     old.txt     │               new.txt                │
                                                            │      B/op       │     B/op      vs base                │
LastValue/Filtered/Int64/Attributes/1/Measurements/1-8             216.0 ± 0%     328.0 ± 0%  +51.85% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/10-8           1944.0 ± 0%     327.0 ± 0%  -83.18% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/100-8         19224.0 ± 0%     328.0 ± 0%  -98.29% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/1000-8       192024.0 ± 0%     327.5 ± 0%  -99.83% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1-8          1.898Ki ± 0%   2.057Ki ± 0%   +8.33% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/10-8        18.773Ki ± 0%   2.056Ki ± 0%  -89.05% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/100-8      187.523Ki ± 0%   2.055Ki ± 0%  -98.90% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1000-8    1875.026Ki ± 0%   2.058Ki ± 1%  -99.89% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1-8         18.77Ki ± 0%   18.93Ki ± 0%   +0.83% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/10-8       187.52Ki ± 0%   18.93Ki ± 0%  -89.90% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/100-8     1875.03Ki ± 0%   18.97Ki ± 0%  -98.99% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1000-8   18750.06Ki ± 0%   19.43Ki ± 0%  -99.90% (p=0.000 n=10)
geomean                                                          60.02Ki        2.323Ki       -96.13%

                                                            │    old.txt    │              new.txt               │
                                                            │   allocs/op   │ allocs/op   vs base                │
LastValue/Filtered/Int64/Attributes/1/Measurements/1-8           3.000 ± 0%   4.000 ± 0%  +33.33% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/10-8         21.000 ± 0%   4.000 ± 0%  -80.95% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/100-8       201.000 ± 0%   4.000 ± 0%  -98.01% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/1/Measurements/1000-8     2001.000 ± 0%   4.000 ± 0%  -99.80% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1-8          21.00 ± 0%   22.00 ± 0%   +4.76% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/10-8        201.00 ± 0%   22.00 ± 0%  -89.05% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/100-8      2001.00 ± 0%   22.00 ± 0%  -98.90% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/10/Measurements/1000-8    20001.00 ± 0%   22.00 ± 0%  -99.89% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1-8         201.0 ± 0%   203.0 ± 0%   +1.00% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/10-8       2001.0 ± 0%   203.0 ± 0%  -89.86% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/100-8     20001.0 ± 0%   203.0 ± 0%  -98.99% (p=0.000 n=10)
LastValue/Filtered/Int64/Attributes/100/Measurements/1000-8   200001.0 ± 0%   203.0 ± 0%  -99.90% (p=0.000 n=10)
geomean                                                          660.4        26.14       -96.04%

In the trivial case, where there is one measurement per distinct attribute set, there is an decrease in CPU performance and an increase in memory use. This makes sense as the backing array for the aggregate will be the performance bottle-neck, not the filtering computation.

However, for more realistic workloads, where there are many measurements for distinct attribute sets, the CPU performance increased and memory use decreased. Importantly though, the allocation scaled by O(N) for N being the number of distinct attributes instead of O(N+M) for M being the number of measurements made in a collection. This allocation detail is not reflected in the CPU performance, but will be a major factor in real world scenarios based on the GC pressure that will be removed (in all but the trivial case).

This initial testing shows this change should continue to be pursued.

@MrAlias
Copy link
Contributor Author

MrAlias commented Dec 19, 2023

Cardinality limiting is going to complicate this. Limiting is currently done on the measurement of values. However, with this filtering being done on collection the current limiting will "over limit".

@MrAlias
Copy link
Contributor Author

MrAlias commented Dec 19, 2023

Cardinality limiting is going to complicate this. Limiting is currently done on the measurement of values. However, with this filtering being done on collection the current limiting will "over limit".

This interaction needs to be brought to the specification prior to the cardinality limit being stabilized.

@MrAlias
Copy link
Contributor Author

MrAlias commented Dec 19, 2023

Cardinality limiting is going to complicate this. Limiting is currently done on the measurement of values. However, with this filtering being done on collection the current limiting will "over limit".

This interaction needs to be brought to the specification prior to the cardinality limit being stabilized.

open-telemetry/opentelemetry-specification#3798

@MrAlias
Copy link
Contributor Author

MrAlias commented Jan 29, 2024

Moving out of the post-GA project. There is not clear consensus on how to resolve open-telemetry/opentelemetry-specification#3803.

Give the solution to this will require inconsistent attribute filter values in favor of performance there needs to be a strong user desire to see this before it warrants the developer commitment.

@MrAlias MrAlias removed their assignment Jan 29, 2024
@MrAlias MrAlias added the blocked:design Waiting on design work to be completed before implementation can start. label Jan 29, 2024
@pellared pellared added the blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made label Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:metrics Part of OpenTelemetry Metrics blocked:design Waiting on design work to be completed before implementation can start. blocked:specification Waiting on clarification of the OpenTelemetry specification before progress can be made pkg:SDK Related to an SDK package
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants