Profiling not working #3026

ShuaiShao93 · 2024-04-26T18:54:22Z

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.45.0

Steps to Reproduce

initialize sentry globally, enable profiling
Add this block to a function. Note that self.execute_model is implemented in a subclass

4. wait for profiling to show up on sentry.io

Expected Result

Full profile for self.execute_model

Actual Result

The text was updated successfully, but these errors were encountered:

ShuaiShao93 · 2024-04-26T21:59:31Z

Adding a child span in self.execute_model seems to work, but I'm not sure this is intended

szokeasaurusrex · 2024-04-29T10:40:25Z

Hi @ShuaiShao93, I need some more information to understand your issue.

From the screenshots you provided, it looks like profiling is working as it is supposed to. Can you please explain exactly what is wrong?

If possible, assuming you are using SaaS (sentry.io), could you please also share a link to the profile in Sentry?

ShuaiShao93 · 2024-04-29T15:49:03Z

@szokeasaurusrex I can see more details after adding a nested span, like

But from the documentation of profiling, I didn't see anything about span.

szokeasaurusrex · 2024-04-29T16:08:01Z

@szokeasaurusrex I can see more details after adding a nested span, like

But from the documentation of profiling, I didn't see anything about span.

@ShuaiShao93 If I understand correctly, your problem is that the profile is missing some stack frames when you don't start a nested span. Is that right?

If that is correct, I also have a few more questions:

Can you please provide the code you ran to make this profile? I would like to see how you added the nested span
Can you please provide a link to this profile, and also a link to the profile from your initial comment?

ShuaiShao93 · 2024-04-29T17:43:56Z

Sure, thanks!

Can you please provide the code you ran to make this profile? I would like to see how you added the nested span

Original code

class AsyncBase:

    async def execute(
        self, requests: Sequence[c_pb_utils.InferenceRequest]
    ) -> Sequence[c_pb_utils.InferenceResponse]:
        try:
            with sentry_sdk.start_transaction(op="execute_model", name=self.model_name):
                return await self.execute_model(requests)
        except Exception as e:
            sentry_sdk.capture_exception(e)
            raise e

    async def execute_model(
        self, requests: Sequence[c_pb_utils.InferenceRequest]
    ) -> Sequence[c_pb_utils.InferenceResponse]:
        raise NotImplementedError("Subclasses must implement this method.")


class TritonPythonModel(AsyncBase):

  async def execute_model(
          self, requests: Sequence[c_pb_utils.InferenceRequest]
      ) -> Sequence[c_pb_utils.InferenceResponse]:
    ... # Business logic

After adding span

class AsyncBase:

    async def execute(
        self, requests: Sequence[c_pb_utils.InferenceRequest]
    ) -> Sequence[c_pb_utils.InferenceResponse]:
        try:
            with sentry_sdk.start_transaction(op="execute_model", name=self.model_name):
                return await self.execute_model(requests)
        except Exception as e:
            sentry_sdk.capture_exception(e)
            raise e

    async def execute_model(
        self, requests: Sequence[c_pb_utils.InferenceRequest]
    ) -> Sequence[c_pb_utils.InferenceResponse]:
        raise NotImplementedError("Subclasses must implement this method.")


def trace_sentry(func: Callable[P, T]) -> Callable[P, T]:
    """We write our own sentry tracer because the official one is untyped

    See sentry_sdk/tracing.py for details
    """

  @wraps(func)
  def func_with_tracing(*args: P.args, **kwargs: P.kwargs) -> T:
      if getenv("SENTRY_DSN"):
          span = get_current_span(Hub.current)
          if span is None:
              with start_span(
                  op=OP.FUNCTION, description=qualname_from_function(func)
              ):
                  return func(*args, **kwargs)
          else:
              with span.start_child(
                  op=OP.FUNCTION,
                  description=qualname_from_function(func),
              ):
                  return func(*args, **kwargs)
      else:
          return func(*args, **kwargs)

  return func_with_tracing


class TritonPythonModel(AsyncBase):

  @trace_sentry
  async def execute_model(
          self, requests: Sequence[c_pb_utils.InferenceRequest]
      ) -> Sequence[c_pb_utils.InferenceResponse]:
    ... # Business logic

Can you please provide a link to this profile, and also a link to the profile from your initial comment?

Before
After

The two links are from two different classes, but they have the same structure, just business code is different. I have a guess to: The class in the first link mostly calls into C++ code wrapped by Python, so it doesn't generate meaningful traces. If this is the reason, it's still better to show the Python wrapper function name in the profile if possible.

szokeasaurusrex · 2024-05-02T09:23:57Z

@ShuaiShao93 I think the reason you see a difference between the two profiles is because the more detailed one has a duration of just over 1 second, while the less detailed profile has a duration of only approximately 14 milliseconds. The difference in detail between the two profiles can most likely be explained by the Python profiler's design.

Specifically, the Python profiler is a sampling profiler, which means that instead of capturing the start and end of every function call, we simply sample the call stack at regular intervals. Our sampling rate is approximately 100 samples/second, so the Python profiler is only accurate to approximately 10 milliseconds. Any function calls that are shorter than 10 milliseconds, and which occur between taking two samples, would not show up in the profile. Since the less detailed profile lasts only 14 milliseconds, we only would expect to have at most two samples, so the level of detail we see in the profile is expected.

The one thing that confuses me, however, is that the profile from the original screenshot seems to have taken longer than 1 second. The 10 millisecond profiler accuracy does not explain this result, so we likely still have a problem here. Could you please share a link to a longer profile where you are missing detail?

szokeasaurusrex · 2024-05-02T09:26:54Z

The two links are from two different classes, but they have the same structure, just business code is different. I have a guess to: The class in the first link mostly calls into C++ code wrapped by Python, so it doesn't generate meaningful traces. If this is the reason, it's still better to show the Python wrapper function name in the profile if possible.

This sounds like a possible explanation for the difference in behavior. Would you be able to test whether adding the nested span changes the profile when running against the same code? Since it sounds like there are two different variables changing at the same time (the extra span and also the different business code), it is impossible to determine which of these is causing the change in behavior.

We need to try changing each of these variables individually in order to determine the exact cause.

ShuaiShao93 · 2024-05-02T19:58:35Z

Thanks for the explanation! @szokeasaurusrex

Originally I thought it was because of the span, now after checking more profiles, I believe it was because the function call was too short, so whether it could catch the trace is actually random. Next time we will only profile longer functions.

getsantry bot added the Waiting for: Product Owner label Apr 26, 2024

getsantry bot removed the Waiting for: Product Owner label Apr 29, 2024

szokeasaurusrex added the Waiting for: Community label Apr 29, 2024

getsantry bot added Waiting for: Product Owner and removed Waiting for: Community labels Apr 29, 2024

getsantry bot removed the Waiting for: Product Owner label Apr 29, 2024

getsantry bot added the Waiting for: Product Owner label Apr 29, 2024

getsantry bot removed the Waiting for: Product Owner label May 2, 2024

szokeasaurusrex added the Waiting for: Community label May 2, 2024

ShuaiShao93 closed this as completed May 2, 2024

getsantry bot removed the Waiting for: Community label May 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Profiling not working #3026

Profiling not working #3026

ShuaiShao93 commented Apr 26, 2024

ShuaiShao93 commented Apr 26, 2024

szokeasaurusrex commented Apr 29, 2024

ShuaiShao93 commented Apr 29, 2024 •

edited

szokeasaurusrex commented Apr 29, 2024

ShuaiShao93 commented Apr 29, 2024

szokeasaurusrex commented May 2, 2024

szokeasaurusrex commented May 2, 2024 •

edited

ShuaiShao93 commented May 2, 2024

Profiling not working #3026

Profiling not working #3026

Comments

ShuaiShao93 commented Apr 26, 2024

How do you use Sentry?

Version

Steps to Reproduce

Expected Result

Actual Result

ShuaiShao93 commented Apr 26, 2024

szokeasaurusrex commented Apr 29, 2024

ShuaiShao93 commented Apr 29, 2024 • edited

szokeasaurusrex commented Apr 29, 2024

ShuaiShao93 commented Apr 29, 2024

szokeasaurusrex commented May 2, 2024

szokeasaurusrex commented May 2, 2024 • edited

ShuaiShao93 commented May 2, 2024

ShuaiShao93 commented Apr 29, 2024 •

edited

szokeasaurusrex commented May 2, 2024 •

edited