Automated Dynamic Transaction Sampling Rate #11975
Comments
Hey man, it's a good idea. We've already built a feature around this called "Dynamic Sampling", which lives in the Sentry backend. Doing such things in the SDK is especially tricky, and likely infeasible to implement in a sensible way, due to the nature of distributed systems: you would have to synchronize state between all service instances just to keep track of sample rates. I wanted to link to docs just now, but apparently this is completely abstracted away from users. Usually dynamic sampling kicks in after a certain transaction-volume threshold. I will refer you to @ale-cota, who is leading the charge on the dynamic sampling feature. Maybe she can share some more details.
That's the point of moving this to the SDK 🙂 Server-side dynamic sampling results in extra costs for you and extra costs for us, for what are essentially discarded transactions. I forgot to clarify this earlier: practically speaking, this smart sampler would only have an effect in entry-point applications (or whatever you call the projects that start a distributed trace). Taken to the extreme: you could have a /blog/:bloggg endpoint with 100 req/s and an /about-us endpoint that's visited twice a day. Unless you ingest an astronomical number of transactions, you'll never see a single /about-us trace, like ever. This would complement backend dynamic sampling.
Are you by chance asking for something like the …
You could implement this with a custom …
I realise that I was not entirely correct about the "request volume". You could somewhat control ingestion volume with this, but the number would only ever match the transactions for the project at the root of the trace. For smaller setups, however, where you only have a low number of services (which, I'm guessing, is the majority of Sentry customers), you could control the transaction volume much more precisely.
To add to that: even for distributed systems, it can make a lot of sense to normalise/flatten request trends. For example: https://semrush.com/website/prisjakt.nu/overview/ There is little point in ingesting twice as many transactions in January just because you get more traffic that month. If you instead set a fixed "roof", your Sentry numbers would be unaffected by seasonality.
We will backlog this, but it is likely going to end up very low on our priority pile. Not many people are asking for this, and high-volume users already benefit from server-side dynamic sampling. There is, by the way, a point to ingesting all the data: if you drop transactions in the SDK, you lose the metrics data that comes with transactions and spans.
I understand. I have little insight into the needs/priorities of others and can only guess based on my own experience and how I use Sentry 🙂
Problem Statement
Setting good sampling rates is difficult, with many factors to consider: ever-changing RPS, the request distribution across different endpoints, daily/weekly trends. Working with a limited budget, and doing the maths to get the numbers right, is harder still.
Nobody does that. And even if you do, the numbers are outdated by the time the change is deployed.
The only feasible approach today is a fixed sampling rate, set much higher than necessary for certain endpoints. Say you have a 100:1 traffic distribution between two endpoints: either you send waaay too many transactions for the first endpoint, or the second is under-instrumented.
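To make the trade-off concrete, here is a quick back-of-the-envelope calculation (the rate and traffic numbers are hypothetical):

```typescript
// Back-of-the-envelope: one flat sample rate shared by a busy endpoint
// (100 req/s) and a quiet one (visited twice a day).
const flatRate = 0.01; // 1% for every endpoint

const busyPerDay = 100 * 86_400; // e.g. /blog/:bloggg at 100 req/s
const quietPerDay = 2;           // e.g. /about-us, visited twice a day

const busySampled = busyPerDay * flatRate;   // 86,400 transactions/day
const quietSampled = quietPerDay * flatRate; // 0.02/day: one trace every ~50 days
```

So with a single flat rate, the busy endpoint burns the budget while the quiet one effectively never shows up.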
Solution Brainstorm
The SDK should sample transactions intelligently. Instead of a flat probability, you allocate a total transaction volume that is distributed across the different endpoints (transactions with the same name) such that all endpoints are represented equally in what is finally ingested.
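A minimal sketch of what such an allocator could look like, independent of any actual Sentry API (the class name, the window logic, and the equal-split policy are all my own assumptions, not an existing SDK feature):

```typescript
// Hypothetical sketch (not a Sentry API): keep a fixed transaction budget
// per time window and split it equally across endpoint names.
class BudgetedSampler {
  private counts = new Map<string, number>(); // sampled count per endpoint
  private windowStart = 0;

  constructor(
    private budgetPerWindow: number, // total transactions to keep per window
    private windowMs: number,        // window length in milliseconds
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  shouldSample(endpoint: string): boolean {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      // new window: reset all per-endpoint counters
      this.windowStart = t;
      this.counts.clear();
    }
    // naive equal split over the endpoints seen so far in this window
    const seen = Math.max(1, this.counts.size);
    const perEndpoint = Math.max(1, Math.floor(this.budgetPerWindow / seen));
    const used = this.counts.get(endpoint) ?? 0;
    if (used >= perEndpoint) return false; // endpoint has used its share
    this.counts.set(endpoint, used + 1);
    return true;
  }
}
```

A real implementation would need a smarter policy for endpoints that first appear mid-window (here the split only adapts as new names are seen), but the overall shape is the same: budget in, equal representation out.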
This solves all the outlined pain points with sampling rates at once.
You could also still respect real request ratios, but compress them: a request ratio of 100:1 would result in an ingest ratio of e.g. 10:1, instead of the 100:1 you'd get if the same sampling rate were used for both endpoints.
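One way to get that kind of compression is a power-law exponent on the observed traffic ratio (again a sketch; the function name and the choice of exponent are mine): with alpha = 0.5, a 100:1 request ratio lands at roughly 10:1 ingested.

```typescript
// Sketch: compress traffic ratios with an exponent alpha in [0, 1].
// alpha = 1 preserves raw ratios, alpha = 0 equalises all endpoints.
function compressedSampleRate(
  endpointRps: number, // observed requests/sec for this endpoint
  minRps: number,      // requests/sec of the quietest endpoint
  alpha = 0.5,
): number {
  // the quietest endpoint is always fully sampled (rate 1)
  return Math.min(1, Math.pow(minRps / endpointRps, alpha));
}

// 100 req/s endpoint: rate 0.1 → ingests ~10/s; 1 req/s endpoint: rate 1
// → ingests 1/s, i.e. the 100:1 traffic ratio becomes ~10:1 ingested.
```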
Issues With the Proposed Solution