Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Django caching instrumentation update #3009

Open
wants to merge 61 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
d848184
some thoughts before starting
antonpirker Apr 11, 2024
8926c93
some work
antonpirker Apr 16, 2024
618157b
Added address and port to cache span.
antonpirker Apr 24, 2024
5ab4ce0
Formatting
antonpirker Apr 24, 2024
ae7a995
Save correct itemsize
antonpirker Apr 24, 2024
11f7e15
Made cache item_size optional
antonpirker Apr 24, 2024
9ba4ccf
Added some documentation
antonpirker Apr 24, 2024
7a4e0cd
explaining docs
antonpirker Apr 24, 2024
8768373
Merge branch 'master' into antonpirker/django-caching-module
antonpirker Apr 29, 2024
8d51038
Testing cache module
antonpirker Apr 29, 2024
9791ec8
forking tests
antonpirker Apr 29, 2024
3d5d08a
changed op name
antonpirker May 3, 2024
fa47da2
Updated tests
antonpirker May 3, 2024
94421d2
Merge branch 'master' into antonpirker/django-caching-module
antonpirker May 3, 2024
be1d4fd
Test for host/port/cluster
antonpirker May 7, 2024
c8d86d4
More tests
antonpirker May 7, 2024
8ec780e
Merge branch 'master' into antonpirker/django-caching-module
antonpirker May 7, 2024
c9bff4b
Removed debug statement
antonpirker May 7, 2024
34743e0
Even more tests
antonpirker May 7, 2024
93b539c
fix
antonpirker May 7, 2024
3a50d66
Get cache settings also from old Django versions
antonpirker May 7, 2024
5a03f12
make mypy happy
antonpirker May 7, 2024
2e41ae9
Fixed accessor to settings
antonpirker May 7, 2024
11a662e
Cleanup
antonpirker May 7, 2024
0fad52b
Apply suggestions from code review
antonpirker May 8, 2024
2dcdeb9
Use tuple
antonpirker May 8, 2024
b398172
Updated comment
antonpirker May 8, 2024
bcf702c
Instrument get_many and set_many
antonpirker May 8, 2024
cefe47f
Moved parsing url outside to not call it that often
antonpirker May 8, 2024
c051197
Comment
antonpirker May 8, 2024
11be8fa
Make port an int early on
antonpirker May 8, 2024
3718f41
Linting
antonpirker May 8, 2024
66e1c6a
Merge branch 'master' into antonpirker/django-caching-module
antonpirker May 8, 2024
16a9edf
Added test
antonpirker May 8, 2024
1ad1cb8
Another test
antonpirker May 8, 2024
3b89394
linting
antonpirker May 8, 2024
0115843
oh, man...
antonpirker May 8, 2024
827858e
tests in CI actually do what is expected (compared to locally)
antonpirker May 8, 2024
cc0f1af
Removed cache.hit assert because it is unrealiable because of LocMemC…
antonpirker May 8, 2024
2de1d03
Always add item_size because load testing revealed no performance impact
antonpirker May 10, 2024
00e1aa5
Merge branch 'master' into antonpirker/django-caching-module
antonpirker May 10, 2024
f2d8c77
Updated tests
antonpirker May 10, 2024
ca65d17
Removed unused var
antonpirker May 10, 2024
7adb5a1
Make mypy happy
antonpirker May 10, 2024
7232c44
Linked GH issue in comment
antonpirker May 10, 2024
a319a51
Linked GH issue in comment
antonpirker May 10, 2024
a691861
Apply suggestions from code review
antonpirker May 10, 2024
5dbdf35
Merge branch 'antonpirker/django-caching-module' of github.com:getsen…
antonpirker May 10, 2024
8b710ab
Linked GH issue in comment
antonpirker May 10, 2024
fe52b14
Improved cache location parsing
antonpirker May 10, 2024
747e0c8
added test and fixed typo
antonpirker May 10, 2024
4fa92d0
Improved stripping sensitive data from keys
antonpirker May 10, 2024
e7986b8
whitespace
antonpirker May 10, 2024
66eff13
Improved url parsing
antonpirker May 10, 2024
2648759
return early
antonpirker May 10, 2024
7d3a2a6
Merge branch 'master' into antonpirker/django-caching-module
antonpirker May 21, 2024
2b84023
use new names and only key in cache span description
antonpirker May 21, 2024
a001e15
cleanup
antonpirker May 21, 2024
7ac3c62
cleanup
antonpirker May 21, 2024
dbdf1e9
Fixed tests
antonpirker May 21, 2024
891b453
Fixed tests
antonpirker May 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
19 changes: 19 additions & 0 deletions sentry_sdk/consts.py
Expand Up @@ -209,6 +209,24 @@ class SPANDATA:
Example: 58
"""

CACHE_KEY = "cache.key"
"""
The key of the requested data.
Example: template.cache.some_item.867da7e2af8e6b2f3aa7213a4080edb3
"""

NETWORK_PEER_ADDRESS = "network.peer.address"
"""
Peer address of the network connection - IP address or Unix domain socket name.
Example: 10.1.2.80, /tmp/my.sock, localhost
"""

NETWORK_PEER_PORT = "network.peer.port"
"""
Peer port number of the network connection.
Example: 6379
"""

HTTP_QUERY = "http.query"
"""
The Query string present in the URL.
Expand Down Expand Up @@ -298,6 +316,7 @@ class SPANDATA:
class OP:
ANTHROPIC_MESSAGES_CREATE = "ai.messages.create.anthropic"
CACHE_GET_ITEM = "cache.get_item"
CACHE_SET_ITEM = "cache.set_item"
DB = "db"
DB_REDIS = "db.redis"
EVENT_DJANGO = "event.django"
Expand Down
19 changes: 17 additions & 2 deletions sentry_sdk/integrations/django/__init__.py
Expand Up @@ -104,6 +104,17 @@ def is_authenticated(request_user):


class DjangoIntegration(Integration):
"""
Auto instrument a Django applications.
antonpirker marked this conversation as resolved.
Show resolved Hide resolved

:param transaction_style: How to derive transaction names. Either `"function_name"` or `"url"`. Defaults to `"url"`.
:param middleware_spans: Whether to create spans for middleware. Defaults to `True`.
:param signals_spans: Whether to create spans for signals. Defaults to `True`.
:param signals_denylist: A list of signals to ignore when creating spans.
:param cache_spans: Whether to create spans for cache operations. Defaults to `False`.
:param cache_spans_add_item_size: Whether to add the size of the cache item to the span data. Only applicable when `cache_spans` is set to `True`. Defaults to `False`.
"""

identifier = "django"

transaction_style = ""
Expand All @@ -119,19 +130,23 @@ def __init__(
signals_spans=True,
cache_spans=False,
signals_denylist=None,
cache_spans_add_item_size=False,
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
):
# type: (str, bool, bool, bool, Optional[list[signals.Signal]]) -> None
# type: (str, bool, bool, bool, Optional[list[signals.Signal]], bool) -> None
if transaction_style not in TRANSACTION_STYLE_VALUES:
raise ValueError(
"Invalid value for transaction_style: %s (must be in %s)"
% (transaction_style, TRANSACTION_STYLE_VALUES)
)
self.transaction_style = transaction_style
self.middleware_spans = middleware_spans

self.signals_spans = signals_spans
self.cache_spans = cache_spans
self.signals_denylist = signals_denylist or []

self.cache_spans = cache_spans
self.cache_spans_add_item_size = cache_spans_add_item_size

@staticmethod
def setup_once():
# type: () -> None
Expand Down
124 changes: 95 additions & 29 deletions sentry_sdk/integrations/django/caching.py
@@ -1,80 +1,138 @@
import functools
from typing import TYPE_CHECKING
from urllib3.util import parse_url as urlparse

from django import VERSION as DJANGO_VERSION
from django.core.cache import CacheHandler

import sentry_sdk
from sentry_sdk.consts import OP, SPANDATA
from sentry_sdk.utils import ensure_integration_enabled
from sentry_sdk.utils import capture_internal_exceptions, ensure_integration_enabled


if TYPE_CHECKING:
from typing import Any
from typing import Callable
from typing import Tuple
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
from typing import Optional


METHODS_TO_INSTRUMENT = [
"set",
"get",
"get_many",
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
]


def _get_span_description(method_name, args, kwargs):
# type: (str, Any, Any) -> str
description = "{} ".format(method_name)

def _get_key(args, kwargs):
# type: (Any, Any) -> str
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
if args is not None and len(args) >= 1:
description += str(args[0])
return str(args[0])
elif kwargs is not None and "key" in kwargs:
description += str(kwargs["key"])
return str(kwargs["key"])
antonpirker marked this conversation as resolved.
Show resolved Hide resolved

return ""


def _get_span_description(method_name, args, kwargs):
# type: (str, Any, Any) -> str
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
description = "{} {}".format(
method_name,
_get_key(args, kwargs),
)
return description


def _patch_cache_method(cache, method_name):
# type: (CacheHandler, str) -> None
def _patch_cache_method(cache, method_name, address, port):
# type: (CacheHandler, str, Optional[str], Optional[str]) -> None
from sentry_sdk.integrations.django import DjangoIntegration

original_method = getattr(cache, method_name)

@ensure_integration_enabled(DjangoIntegration, original_method)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@ensure_integration_enabled(DjangoIntegration, original_method)

I think it makes more sense to place this on the sentry_method function. It is inaccurate to say that this method maps original_method because it has a different signature. Although, perhaps it would be more appropriate to make this change in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably better a new PR. Can you please create a ticket @szokeasaurusrex ?

def _instrument_call(cache, method_name, original_method, args, kwargs):
# type: (CacheHandler, str, Callable[..., Any], Any, Any) -> Any
def _instrument_call(
cache, method_name, original_method, args, kwargs, address, port
):
# type: (CacheHandler, str, Callable[..., Any], Any, Any, Optional[str], Optional[str]) -> Any
integration = sentry_sdk.get_client().get_integration(DjangoIntegration)
is_set_operation = method_name.startswith("set")
is_get_operation = not is_set_operation

op = OP.CACHE_SET_ITEM if is_set_operation else OP.CACHE_GET_ITEM
description = _get_span_description(method_name, args, kwargs)

with sentry_sdk.start_span(
op=OP.CACHE_GET_ITEM, description=description
) as span:
with sentry_sdk.start_span(op=op, description=description) as span:
value = original_method(*args, **kwargs)

if value:
span.set_data(SPANDATA.CACHE_HIT, True)

size = len(str(value))
span.set_data(SPANDATA.CACHE_ITEM_SIZE, size)

else:
span.set_data(SPANDATA.CACHE_HIT, False)
with capture_internal_exceptions():
if address is not None:
# Strip eventually contained username and password in url style location
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
if "://" in address:
parsed_url = urlparse(address)
address = "{}://{}{}".format(
parsed_url.scheme or "",
parsed_url.netloc or "",
parsed_url.path or "",
)

span.set_data(SPANDATA.NETWORK_PEER_ADDRESS, address)

if port is not None:
span.set_data(SPANDATA.NETWORK_PEER_PORT, int(port))

key = _get_key(args, kwargs)
szokeasaurusrex marked this conversation as resolved.
Show resolved Hide resolved
if key != "":
span.set_data(SPANDATA.CACHE_KEY, key)

item_size = None
if is_get_operation:
if value:
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
if integration.cache_spans_add_item_size:
item_size = len(str(value))
span.set_data(SPANDATA.CACHE_HIT, True)
else:
span.set_data(SPANDATA.CACHE_HIT, False)
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
else:
if integration.cache_spans_add_item_size:
item_size = len(str(args[1]))
antonpirker marked this conversation as resolved.
Show resolved Hide resolved

if item_size is not None:
span.set_data(SPANDATA.CACHE_ITEM_SIZE, item_size)

return value

@functools.wraps(original_method)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@functools.wraps(original_method)
@ensure_integration_enabled(DjangoIntegration, original_method)

To be applied alongside my comment on (original version) line 41; also, likely makes sense to do in separate PR.

def sentry_method(*args, **kwargs):
# type: (*Any, **Any) -> Any
return _instrument_call(cache, method_name, original_method, args, kwargs)
return _instrument_call(
cache, method_name, original_method, args, kwargs, address, port
)

setattr(cache, method_name, sentry_method)


def _patch_cache(cache):
# type: (CacheHandler) -> None
def _patch_cache(cache, address=None, port=None):
# type: (CacheHandler, Optional[str], Optional[str]) -> None
if not hasattr(cache, "_sentry_patched"):
for method_name in METHODS_TO_INSTRUMENT:
_patch_cache_method(cache, method_name)
_patch_cache_method(cache, method_name, address, port)
cache._sentry_patched = True


def _get_address_port(settings):
# type: (dict[str, Any]) -> Tuple[Optional[str], Optional[str]]
address, port = None, None
location = settings.get("LOCATION")
# TODO: location can also be an array of locations
# see: https://docs.djangoproject.com/en/5.0/topics/cache/#redis
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(location, str):
if ":" in location:
address, port = location.rsplit(":", 1)
else:
address, port = location, None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is broken. For example, if location == "https://example.com", then ":" in location is True, so we will end up with address == "https" and port == "//example.com".

Perhaps, there is a library that can handle the URL parsing for us? Otherwise, we should use a regular expression here, or at least ensure this case is handled.

Please also add a unit test that ensures that this function handles URLs like https://example.com correctly 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! ⭐


return address, port


def patch_caching():
# type: () -> None
from sentry_sdk.integrations.django import DjangoIntegration
Expand All @@ -90,7 +148,13 @@ def sentry_get_item(self, alias):

integration = sentry_sdk.get_client().get_integration(DjangoIntegration)
if integration is not None and integration.cache_spans:
_patch_cache(cache)
from django.conf import settings

address, port = _get_address_port(
settings.CACHES[alias or "default"]
)

_patch_cache(cache, address, port)

return cache

Expand All @@ -107,7 +171,9 @@ def sentry_create_connection(self, alias):

integration = sentry_sdk.get_client().get_integration(DjangoIntegration)
if integration is not None and integration.cache_spans:
_patch_cache(cache)
address, port = _get_address_port(self.settings[alias or "default"])

_patch_cache(cache, address, port)

return cache

Expand Down
6 changes: 6 additions & 0 deletions sentry_sdk/integrations/redis/__init__.py
Expand Up @@ -218,6 +218,11 @@ def sentry_patched_execute_command(self, name, *args, **kwargs):
if data_should_be_truncated:
description = description[: integration.max_data_size - len("...")] + "..."

# TODO: Here we could also create the caching spans.
# TODO: also need to check if the `name` (if this is the cache key value) matches the prefix we want to configure in __init__ of the integration
# Questions:
# -) We should probablby have the OP.DB_REDIS span and a separate OP.CACHE_GET_ITEM (or set_item) span, right?
# -) We probably need to research what redis commands are used by caching libs.
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
with sentry_sdk.start_span(op=OP.DB_REDIS, description=description) as span:
set_db_data_fn(span, self)
_set_client_data(span, is_cluster, name, *args)
Expand Down Expand Up @@ -358,6 +363,7 @@ class RedisIntegration(Integration):
def __init__(self, max_data_size=_DEFAULT_MAX_DATA_SIZE):
# type: (int) -> None
self.max_data_size = max_data_size
# TODO: add some prefix that users can set to specify a cache key
antonpirker marked this conversation as resolved.
Show resolved Hide resolved

@staticmethod
def setup_once():
Expand Down
4 changes: 4 additions & 0 deletions sentry_sdk/integrations/redis/asyncio.py
Expand Up @@ -59,6 +59,10 @@ async def _sentry_execute_command(self, name, *args, **kwargs):

description = _get_span_description(name, *args)

# TODO: Here we could also create the caching spans.
# Questions:
# -) We should probablby have the OP.DB_REDIS span and a separate OP.CACHE_GET_ITEM (or set_item) span, right?
# -) We probably need to research what redis commands are used by caching libs.
antonpirker marked this conversation as resolved.
Show resolved Hide resolved
with sentry_sdk.start_span(op=OP.DB_REDIS, description=description) as span:
set_db_data_fn(span, self)
_set_client_data(span, is_cluster, name, *args)
Expand Down