
Automated Dynamic Transaction Sampling Rate #11975

Open
jeengbe opened this issue May 10, 2024 · 8 comments

Comments

@jeengbe
Contributor

jeengbe commented May 10, 2024

Problem Statement

Setting good sampling rates is difficult, with many factors to consider. You have to balance ever-changing RPS, request distribution amongst different endpoints, and daily/weekly trends. Working with a limited budget and doing the maths to get the numbers right makes it even harder.

Nobody does that. And even if you do, those numbers become outdated by the time the change is deployed.

In practice, the only feasible solution is a fixed sampling rate, which is much higher than necessary for certain endpoints. Say you have a 100:1 request distribution between two endpoints: either you send way too many transactions for the first endpoint, or the second is not instrumented enough.

Solution Brainstorm

The SDK should sample transactions intelligently. Instead of a blunt, fixed probability, you allocate a total transaction volume that is distributed across endpoints (transactions with the same name) such that all endpoints are represented equally in what is finally ingested.

This solves all the outlined pain points with sampling rates at once.

You could also still partially respect real request ratios by dampening them instead of flattening them completely: a request ratio of 100:1 would then result in an ingested ratio of e.g. 10:1, rather than the 100:1 you would get if the same sampling rate were used for both endpoints.
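To illustrate the dampening idea (a purely hypothetical sketch; the helper name and numbers below are not an existing SDK API): squashing observed request rates with a square root yields exactly the 100:1 → 10:1 behaviour mentioned above.

```ts
// Hypothetical sketch of square-root dampening: hot endpoints still get more
// of the budget than cold ones, but the spread shrinks from 100:1 to 10:1.
// `observedRps` would come from request counts the SDK tracks itself.
function dampenedShares(observedRps: Record<string, number>): Record<string, number> {
  const dampened = Object.entries(observedRps).map(
    ([name, rps]) => [name, Math.sqrt(rps)] as const,
  );
  const total = dampened.reduce((sum, [, v]) => sum + v, 0);
  // Normalise so the shares sum to 1; multiplying by a fixed total budget
  // then yields per-endpoint transaction quotas.
  return Object.fromEntries(dampened.map(([name, v]) => [name, v / total]));
}

// A 100:1 request ratio becomes a 10:1 share of the ingested volume:
console.log(dampenedShares({ '/blog/:id': 100, '/about-us': 1 }));
// → { '/blog/:id': 0.909..., '/about-us': 0.0909... }
```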

Issues With the Proposed Solution

  • Real request ratios are no longer reflected in Sentry, since the goal of this suggestion is to equalise ingested volumes across endpoints. That's fine; we have metrics for such analytics.
  • Changes to request distribution amongst endpoints could lead to false-positive transaction number alerts if an increase in one endpoint results in a "balancing" decrease of ingested transactions for another endpoint.
@lforst
Member

lforst commented May 10, 2024

Hey man, it's a good idea. We've already built a feature around this called "Dynamic Sampling" which lives in the Sentry backend. Doing such things in the SDK is especially tricky and likely infeasible to implement in a sensible way due to the nature of distributed systems. In a distributed system you would have to synchronize state between all service instances just to keep track of sample rates. I wanted to link to docs just now but apparently this is completely abstracted away from users. Usually dynamic sampling kicks in after a certain transaction-volume threshold.

I will refer you to @ale-cota who is leading the charge on the dynamic sampling feature. Maybe she can share some more details.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

which lives in the Sentry backend

That's the point of moving this to the SDK 🙂 Server-side dynamic sampling results in extra costs for you and extra costs for us, for what are essentially voided transactions. I forgot to clarify that, but practically speaking, this smart sampler would only have an effect in entry-point applications (or whatever you call projects that start a distributed trace).

If you take it to the extreme, you could have a /blog/:bloggg endpoint with 100 req/s and an /about-us page that's visited twice a day. At, say, a flat 1% sample rate, that's roughly 86,400 /blog transactions per day and, on average, one /about-us transaction every 50 days. Unless you ingest an astronomical number of transactions, you'll never see a single /about-us trace, like ever.

This would complement (not replace) backend dynamic sampling.

@lforst
Member

lforst commented May 10, 2024

Are you by chance asking for something like the tracesSampler option? There you can add custom logic to decide how each transaction is sampled. The sampling decision is also propagated to the downstream trace as usual.
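For reference, a minimal example (the routes and rates here are made up, and the exact fields on the sampling context vary between SDK versions):

```ts
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: '__YOUR_DSN__',
  // tracesSampler is called for every root span / transaction and returns
  // a sample rate between 0 and 1 (or a boolean).
  tracesSampler: ({ name, parentSampled }) => {
    // Respect the upstream service's decision so traces stay complete.
    if (typeof parentSampled === 'boolean') return parentSampled;
    if (name.includes('/health')) return 0; // drop pure noise
    if (name.includes('/checkout')) return 1; // always keep critical flows
    return 0.1; // default rate for everything else
  },
});
```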

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

You could implement this with a custom tracesSampler, but I'm likely not the only one who would benefit from having this built in.
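A rough sketch of what that custom tracesSampler could look like (the constants and windowing approach are made up, and the state is per-process, so every instance of a service would apply its own budget):

```ts
import * as Sentry from '@sentry/node';

// Aim for at most this many sampled transactions per endpoint per window,
// no matter how hot the endpoint is. Hypothetical numbers.
const TARGET_PER_ENDPOINT = 10;
const WINDOW_MS = 60_000;

const windows = new Map<string, { start: number; count: number; prevCount: number }>();

function equalizingSampleRate(name: string): number {
  const now = Date.now();
  let w = windows.get(name);
  if (!w) {
    w = { start: now, count: 0, prevCount: 0 };
    windows.set(name, w);
  }
  if (now - w.start >= WINDOW_MS) {
    // Roll the window over; last window's traffic becomes the rate estimate.
    w.prevCount = w.count;
    w.count = 0;
    w.start = now;
  }
  w.count++;
  // Rare or newly seen endpoints are kept in full; hot endpoints are scaled
  // down so the expected sampled volume stays near the target.
  if (w.prevCount <= TARGET_PER_ENDPOINT) return 1;
  return TARGET_PER_ENDPOINT / w.prevCount;
}

Sentry.init({
  dsn: '__YOUR_DSN__',
  tracesSampler: ({ name, parentSampled }) => {
    if (typeof parentSampled === 'boolean') return parentSampled;
    return equalizingSampleRate(name);
  },
});
```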

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

I realise I was not entirely correct about the "request volume": you could somewhat control ingestion volume with this, but the number would only ever match the transactions of the project at the root of the trace.

For smaller setups, however, with only a low number of services (which I'm guessing covers the majority of Sentry customers?), you could control the transaction volume much more precisely.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

To add to that: even for distributed systems, it can make a lot of sense to normalise/flatten request trends.

For example:

https://semrush.com/website/prisjakt.nu/overview/

There is little point in ingesting twice as many transactions in January just because you get more traffic that month. If you instead provide a fixed "roof", your Sentry numbers would be unaffected by seasonality.
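A minimal sketch of such a roof as a token bucket in front of the sampler (the ceiling is a hypothetical number, and the state is per-process, so the effective limit scales with the number of instances):

```ts
// Never sample more than MAX_TX_PER_SECOND transactions, regardless of
// whether it's January or July. Hypothetical ceiling.
const MAX_TX_PER_SECOND = 5;

let tokens = MAX_TX_PER_SECOND;
let lastRefill = Date.now();

function underRoof(): boolean {
  const now = Date.now();
  // Refill proportionally to elapsed time, capped at one second's budget.
  tokens = Math.min(
    MAX_TX_PER_SECOND,
    tokens + ((now - lastRefill) / 1000) * MAX_TX_PER_SECOND,
  );
  lastRefill = now;
  if (tokens < 1) return false;
  tokens -= 1;
  return true;
}

// Plugged into tracesSampler, e.g.:
//   tracesSampler: ({ parentSampled }) =>
//     typeof parentSampled === 'boolean' ? parentSampled : underRoof() ? 1 : 0,
```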

@lforst
Member

lforst commented May 10, 2024

We will backlog this, but it is likely going to end up very low on our priority pile. Not many people are asking for this, and high-volume users are already benefitting from server-side dynamic sampling.

There is indeed a point in ingesting all the data: otherwise you would be losing the metrics data that comes with transactions and spans.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

I understand. I have little insight into the needs and priorities of others, and can only guess based on my experience and how I use Sentry 🙂
