
Automated Dynamic Transaction Sampling Rate #11975

Open
jeengbe opened this issue May 10, 2024 · 8 comments

Comments

@jeengbe
Contributor

jeengbe commented May 10, 2024

Problem Statement

Setting good sampling rates is difficult, with many factors to consider. You have to balance ever-changing RPS, request distribution amongst different endpoints, and daily/weekly trends. Working with a limited budget and doing the maths to get the numbers right makes it even harder.

Nobody does that. And even if you do, those numbers become outdated by the time the change is deployed.

In practice, the only feasible solution is a fixed sampling rate, which is much higher than necessary for certain endpoints. Say you have a 100:1 request distribution between two endpoints: either you send way too many transactions for the first endpoint, or the second is not instrumented enough.

Solution Brainstorm

The SDK should sample transactions intelligently. Instead of a blunt, fixed probability, you allocate a total transaction volume that is distributed across endpoints (transactions with the same name) such that all endpoints are represented equally in what is finally ingested.

This solves all the outlined pain points with sampling rates at once.

You could also still partially respect real request ratios by dampening them instead of flattening them completely: a request ratio of 100:1 would then result in an ingested ratio of e.g. 10:1, rather than the 100:1 you would get if the same sampling rate were used for both endpoints.
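To illustrate the dampening idea (a purely hypothetical sketch; the helper name and numbers below are not an existing SDK API): squashing observed request rates with a square root yields exactly the 100:1 → 10:1 behaviour mentioned above.

```ts
// Hypothetical sketch of square-root dampening: hot endpoints still get more
// of the budget than cold ones, but the spread shrinks from 100:1 to 10:1.
// `observedRps` would come from request counts the SDK tracks itself.
function dampenedShares(observedRps: Record<string, number>): Record<string, number> {
  const dampened = Object.entries(observedRps).map(
    ([name, rps]) => [name, Math.sqrt(rps)] as const,
  );
  const total = dampened.reduce((sum, [, v]) => sum + v, 0);
  // Normalise so the shares sum to 1; multiplying by a fixed total budget
  // then yields per-endpoint transaction quotas.
  return Object.fromEntries(dampened.map(([name, v]) => [name, v / total]));
}

// A 100:1 request ratio becomes a 10:1 share of the ingested volume:
console.log(dampenedShares({ '/blog/:id': 100, '/about-us': 1 }));
// → { '/blog/:id': 0.909..., '/about-us': 0.0909... }
```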

Issues With the Proposed Solution

  • Real request ratios are no longer reflected in Sentry, since the goal of this suggestion is to equalise ingested volumes across endpoints. That's fine; we have metrics for such analytics.
  • Changes to request distribution amongst endpoints could lead to false-positive transaction number alerts if an increase in one endpoint results in a "balancing" decrease of ingested transactions for another endpoint.
@lforst
Member

lforst commented May 10, 2024

Hey man, it's a good idea. We've already built a feature around this called "Dynamic Sampling" which lives in the Sentry backend. Doing such things in the SDK is especially tricky and likely infeasible to implement in a sensible way due to the nature of distributed systems. In a distributed system you would have to synchronize state between all service instances just to keep track of sample rates. I wanted to link to docs just now but apparently this is completely abstracted away from users. Usually dynamic sampling kicks in after a certain transaction-volume threshold.

I will refer you to @ale-cota who is leading the charge on the dynamic sampling feature. Maybe she can share some more details.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

which lives in the Sentry backend

That's the point of moving this to the SDK 🙂 Server-side dynamic sampling results in extra costs for you and extra costs for us, for what are essentially voided transactions. I forgot to clarify that, but practically speaking, this smart sampler would only have an effect in entry-point applications (or whatever you call projects that start a distributed trace).

If you take it to the extreme, you could have a /blog/:bloggg endpoint with 100 req/s and an /about-us page that's visited twice a day. At, say, a flat 1% sample rate, that's roughly 86,400 /blog transactions per day and, on average, one /about-us transaction every 50 days. Unless you ingest an astronomical number of transactions, you'll never see a single /about-us trace, like ever.

This would complement (not replace) backend dynamic sampling.

@lforst
Member

lforst commented May 10, 2024

Are you by chance asking for something like the tracesSampler option? There you can add custom logic to decide how each transaction is sampled. The sampling decision is also propagated to the downstream trace as usual.
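For reference, a minimal example (the routes and rates here are made up, and the exact fields on the sampling context vary between SDK versions):

```ts
import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: '__YOUR_DSN__',
  // tracesSampler is called for every root span / transaction and returns
  // a sample rate between 0 and 1 (or a boolean).
  tracesSampler: ({ name, parentSampled }) => {
    // Respect the upstream service's decision so traces stay complete.
    if (typeof parentSampled === 'boolean') return parentSampled;
    if (name.includes('/health')) return 0; // drop pure noise
    if (name.includes('/checkout')) return 1; // always keep critical flows
    return 0.1; // default rate for everything else
  },
});
```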

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

You could implement this with a custom tracesSampler, but I'm likely not the only one who would benefit from having this built in.
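A rough sketch of what that custom tracesSampler could look like (the constants and windowing approach are made up, and the state is per-process, so every instance of a service would apply its own budget):

```ts
import * as Sentry from '@sentry/node';

// Aim for at most this many sampled transactions per endpoint per window,
// no matter how hot the endpoint is. Hypothetical numbers.
const TARGET_PER_ENDPOINT = 10;
const WINDOW_MS = 60_000;

const windows = new Map<string, { start: number; count: number; prevCount: number }>();

function equalizingSampleRate(name: string): number {
  const now = Date.now();
  let w = windows.get(name);
  if (!w) {
    w = { start: now, count: 0, prevCount: 0 };
    windows.set(name, w);
  }
  if (now - w.start >= WINDOW_MS) {
    // Roll the window over; last window's traffic becomes the rate estimate.
    w.prevCount = w.count;
    w.count = 0;
    w.start = now;
  }
  w.count++;
  // Rare or newly seen endpoints are kept in full; hot endpoints are scaled
  // down so the expected sampled volume stays near the target.
  if (w.prevCount <= TARGET_PER_ENDPOINT) return 1;
  return TARGET_PER_ENDPOINT / w.prevCount;
}

Sentry.init({
  dsn: '__YOUR_DSN__',
  tracesSampler: ({ name, parentSampled }) => {
    if (typeof parentSampled === 'boolean') return parentSampled;
    return equalizingSampleRate(name);
  },
});
```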

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

I realise I was not entirely correct about the "request volume": you could somewhat control ingestion volume with this, but the number would only ever match the transactions of the project at the root of the trace.

For smaller setups, however, with only a low number of services (which I'm guessing covers the majority of Sentry customers?), you could control the transaction volume much more precisely.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

To add to that: even for distributed systems, it can make a lot of sense to normalise/flatten request trends.

For example:

https://semrush.com/website/prisjakt.nu/overview/

There is little point in ingesting twice as many transactions in January just because you get more traffic that month. If you instead provide a fixed "roof", your Sentry numbers would be unaffected by seasonality.
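A minimal sketch of such a roof as a token bucket in front of the sampler (the ceiling is a hypothetical number, and the state is per-process, so the effective limit scales with the number of instances):

```ts
// Never sample more than MAX_TX_PER_SECOND transactions, regardless of
// whether it's January or July. Hypothetical ceiling.
const MAX_TX_PER_SECOND = 5;

let tokens = MAX_TX_PER_SECOND;
let lastRefill = Date.now();

function underRoof(): boolean {
  const now = Date.now();
  // Refill proportionally to elapsed time, capped at one second's budget.
  tokens = Math.min(
    MAX_TX_PER_SECOND,
    tokens + ((now - lastRefill) / 1000) * MAX_TX_PER_SECOND,
  );
  lastRefill = now;
  if (tokens < 1) return false;
  tokens -= 1;
  return true;
}

// Plugged into tracesSampler, e.g.:
//   tracesSampler: ({ parentSampled }) =>
//     typeof parentSampled === 'boolean' ? parentSampled : underRoof() ? 1 : 0,
```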

@lforst
Member

lforst commented May 10, 2024

We will backlog this, but it is likely going to end up very low on our priority pile. Not many people are asking for this, and high-volume users are already benefitting from server-side dynamic sampling.

There is indeed a point in ingesting all the data: otherwise you would be losing the metrics data that comes with transactions and spans.

@jeengbe
Contributor Author

jeengbe commented May 10, 2024

I understand. I have little insight into the needs and priorities of others, and can only guess based on my experience and how I use Sentry 🙂
