Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): Added error_sampler option #2456

Merged
merged 10 commits into from Oct 20, 2023

Conversation

szokeasaurusrex
Copy link
Member

@szokeasaurusrex szokeasaurusrex commented Oct 18, 2023

Now, it is possible for Python SDK users to dynamically sample errors using the error_sampler option, similar how they are able to sample transactions dynamically with the traces_sampler option.

In other words, the new error_sampler is to the sample_rate as the existing traces_sampler is to the traces_sample_rate.

Fixes #2440

Comment on lines 1204 to 1216
# Baseline test with issues_sampler only, both floats and bools
(mock.MagicMock(return_value=1.0), None, 1),
(mock.MagicMock(return_value=0.0), None, 0),
(mock.MagicMock(return_value=True), None, 1),
(mock.MagicMock(return_value=False), None, 0),
# Baseline test with sample_rate only
(None, 0.0, 0),
(None, 1.0, 1),
# issues_sampler takes precedence over sample_rate
(mock.MagicMock(return_value=1.0), 0.0, 1),
(mock.MagicMock(return_value=0.0), 1.0, 0),
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test different rates for different errors. Like ZeroDivisionError: 1.0, RuntimeError: 0.5 and such.

@@ -456,10 +456,11 @@ def _should_sample_error(
event, # type: Event

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event contains information about the logged message via logging?

There is a hint parameter in the before_send that contains this information in log_record

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saippuakauppias Could you please clarify your question? How exactly are you logging the message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are using the LoggingIntegration, your log message should appear in the event passed to this function.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am using LoggingIntegration and logging all warning messages to sentry. I tried to discard some messages that are not needed and in debug mode found that the path to the file that triggers the message is in hint['log_record'].pathname.

Actually, I need this as a minimum. I would like this feature to have a similar option to filter by module name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saippuakauppias we decided to also pass the hint into the events_sampler, so you will be able to make your sampling decision based on information contained in the hint.

sentry_sdk/client.py Outdated Show resolved Hide resolved
tests/test_client.py Outdated Show resolved Hide resolved
@sentrivana
Copy link
Contributor

@szokeasaurusrex Is it only errors passing through issues_sampler or also other event types?

If it's only errors, calling the option error_sampler might be better for clarity.

More importantly, are the events from the original issue errors (as opposed to e.g. messages) and would they pass through the issues_sampler?

@cleptric
Copy link
Member

An issue in Sentry represents N events, so calling it issue_sampler isn't the best way forward. IMO, this option should sample everything that ends up in capture_event.

Co-authored-by: Ivana Kellyerova <ivana.kellyerova@sentry.io>
@szokeasaurusrex
Copy link
Member Author

szokeasaurusrex commented Oct 19, 2023

@szokeasaurusrex Is it only errors passing through issues_sampler or also other event types?

If it's only errors, calling the option error_sampler might be better for clarity.

The _should_sample_error function that I have modified here is called in the capture_event functions, so I believe all events will be sampled. In light of @cleptric's comment, though, we might consider renaming issues_sampler to something like sampler. What do you think?

More importantly, are the events from the original issue errors (as opposed to e.g. messages) and would they pass through the issues_sampler?

I am unsure given the information I see in the original issue. @saippuakauppias, could you please provide the code where these errors/messages originate from, or provide us more detail with how these are being generated, so we can answer this question?

@sentrivana
Copy link
Contributor

sentrivana commented Oct 19, 2023

The _should_sample_error function that I have modified here is called in the capture_event functions, so I believe all events will be sampled. In light of @cleptric's comment, though, we might consider renaming issues_sampler to something like events_sampler. What do you think?

Yes, looks like it's everything but transactions and checkins:

if (
not is_transaction
and not is_checkin
and not self._should_sample_error(event)
):

In that case events_sampler/event_sampler sounds good! And it also means that it can be used to set the sample rate for any event, so that answers my second question.

IMO, this option should sample everything that ends up in capture_event.

@cleptric Checkins and transactions also end up in capture_event but wouldn't go through the sampler, which is ok I think. For transactions there is traces_sampler/traces_sample_rate and I assume there's currently no demand for being able to set checkin sample rates. And if there is, before_send is still a thing.

@sentrivana
Copy link
Contributor

Once we've settled on the API please don't forget to update the docs @szokeasaurusrex 📖

@szokeasaurusrex
Copy link
Member Author

szokeasaurusrex commented Oct 19, 2023

I discussed with @cleptric, and we now think we should pick a different name than event_sampler, since transactions are also technically events.

The best idea we came up with was to just call the new sampler sampler. This name would also fit the pattern where our sampler corresponds to the sample_rate the same way the traces_sampler corresponds to the traces_sample_rate. Do you have any other ideas @sentrivana, or are you okay with us proceeding naming the sampler sampler?

@saippuakauppias
Copy link

I am unsure given the information I see in the original issue. @saippuakauppias, could you please provide the code where these errors/messages originate from, or provide us more detail with how these are being generated, so we can answer this question?

I'm using django, so it's going to be a bit of a specific example here, but I think it's reproducible.

# settings/production.py

LOGGING = {
  'version': 1,
  'formatters': {
    'simple': {
        'format': '[%(asctime)s] %(levelname)s %(module)s %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S'
    },
  },

  'handlers': {
    'sentry': {
      'level': 'WARNING',
      'class': 'base.sentry.SentryHandler',
    },
    'streamlogger': {
      'level': 'DEBUG',
      'class': 'logging.StreamHandler',
      'formatter': 'simple'
    },
  },

  'loggers': {
    'warn.logger': {
      'handlers': ['sentry', 'streamlogger'],
      'level': 'WARNING',
      'propagate': False
    },
  },
  
  'root': {
    'level': 'INFO',
    'handlers': ['streamlogger'],
  },
}
# base/sentry.py
from sentry_sdk.integrations.atexit import AtexitIntegration
from sentry_sdk.integrations.dedupe import DedupeIntegration
from sentry_sdk.integrations.django import DjangoIntegration
from sentry_sdk.integrations.excepthook import ExcepthookIntegration
from sentry_sdk.integrations.logging import EventHandler, LoggingIntegration
from sentry_sdk.integrations.modules import ModulesIntegration
from sentry_sdk.integrations.stdlib import StdlibIntegration


class SentryHandler(EventHandler):
    ...


def initialize(dsn, service, release, branch, debug: bool, breadcrumb_filters: list[dict] = None):
    from django.conf import settings

    sample_events_by_path = {
        f'{settings.PROJECT_ROOT}/some/module.py': 0.0001,
    }

    def prepare_event(event, hint):
        request_data = get_request_data()
        tags = event.setdefault('tags', {})

        # my hack for filtration here
        is_sampled = False
        if isinstance(hint, dict) and 'log_record' in hint:
            pathname = getattr(hint['log_record'], 'pathname', '')
            is_sampled = pathname in sample_events_by_path
            sample_rate = sample_events_by_path.get(pathname, 1)

            if sample_rate < 1.0 and random.random() < sample_rate:
                return None

        tags['is_sampled'] = is_sampled
        
        return event

    sentry_sdk.init(
        dsn=dsn,
        release=release or None,
        before_send=prepare_event,
        default_integrations=False,
        integrations=[
            StdlibIntegration(),
            ExcepthookIntegration(),
            DedupeIntegration(),
            AtexitIntegration(),
            ModulesIntegration(),
            DjangoIntegration(),
            LoggingIntegration(level=None, event_level=None),
        ],
    )
# some/module.py
import logging
import requests


logger = logging.getLogger('warn.logger')


def some_fun():
    try:
        resp = requsts.get('http://some.service.name')
        resp.raise_for_status()
    except Exception as e:
        logger.warning('Error get value from service', exc_info=True)

@sentrivana
Copy link
Contributor

I discussed with @cleptric, and we now think we should pick a different name than event_sampler, since transactions are also technically events.

The best idea we came up with was to just call the new sampler sampler. This name would also fit the pattern where our sampler corresponds to the sample_rate the same way the traces_sampler corresponds to the traces_sample_rate. Do you have any other ideas @sentrivana, or are you okay with us proceeding naming the sampler sampler?

I'm not opposed to calling it sampler per se, but I don't follow the reasoning. If the issue with event_sampler is that it's too broad, then sampler sounds even broader? 😄

FWIW I don't think there's an ideal solution here. But if I saw sampler as a user without reading the docs, I would assume it samples everything and is a generalization of/supersedes traces_sampler. On the other hand, event_sampler suggests some sort of dichotomy to traces_sampler -- i.e., those two probably sample different things.

@szokeasaurusrex
Copy link
Member Author

I discussed with @antonpirker this morning about naming the sampler as sampler versus events_sampler, he also was leaning towards events_sampler. Since the majority consensus here seems to be to go with events_sampler, let's go with events_sampler for now.

@szokeasaurusrex szokeasaurusrex changed the title feat(api): Added issues_sampler option feat(api): Added events_sampler option Oct 20, 2023
sentry_sdk/client.py Show resolved Hide resolved
Copy link
Contributor

@sentrivana sentrivana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left one final suggestion.

else ("sample_rate", "contains")
)
logger.warning(
"The provided %s %s an invalid value. The value should be a float or a bool. Defaulting to sampling the event."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the sample_rate in the log message might make debugging easier.

@szokeasaurusrex szokeasaurusrex changed the title feat(api): Added events_sampler option feat(api): Added error_sampler option Oct 20, 2023
@szokeasaurusrex szokeasaurusrex enabled auto-merge (squash) October 20, 2023 13:26
@szokeasaurusrex szokeasaurusrex merged commit 085595b into master Oct 20, 2023
283 of 284 checks passed
@szokeasaurusrex szokeasaurusrex deleted the szokeasaurusrex/issues_sampler branch October 20, 2023 13:36
@saippuakauppias
Copy link

Thank you so much for implementing this idea!

Can you please tell me if I can hope that soon you will release a new version of the package so that I can install it from PyPI?

@szokeasaurusrex
Copy link
Member Author

@saippuakauppias We typically release new versions of the Python SDK every few weeks, so hopefully you will be able to install it from PyPI soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues Sampling (like traces_sampler)
4 participants