-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Fix UTF-8 failures in Watch #2100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Welcome @davidopic! |
/assign |
Hi! Any updates on this? |
Also interested in this PR to be merged |
@roycaihw any updates regarding this? |
Sorry for the late response. LGTM! Thanks for the contribution! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidopic, roycaihw The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
With the old implementation of
kubernetes.watch.Watch.stream()
, users may experience intermittent crashes when subscribed to resources that include multi-byte UTF-8 characters (for example,ConfigMap
s with non-ASCII characters in the.data.*
fields). The bug lies inkubernetes.watch.iter_resp_lines
, which decodes eachbytes
segment to a UTF-8 string without waiting for the rest of the event to be received. When thatbytes
segment ends with only part of a multi-byte sequence, the.decode
step will fail. Here's how this symptom presents:The new implementation of
iter_resp_lines
buffers partial events in abytearray
rather than astr
, decoding only after a complete event has been received. Furthermore, the call to.decode
now includes the argument,errors="replace"
, which uses the canonical UTF-8 fallback character, �, when encountering invalid UTF-8.To confirm the fix, I added three new tests, two of which failed using the old implementation.
Which issue(s) this PR fixes:
Fixes #2087
Special notes for your reviewer:
iter_resp_lines
is only called byWatch.stream
, and the interface did not change.The old code implicitly assumed that only
bytes
andstr
were valid segment types, and that assumption has now been made explicit with aTypeError
.I only tested with python 3.11. I assume that CI will test the code against each of the officially supported python versions. If that's not the case, my code should be reviewed with an eye for any new features I unknowingly relied upon.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: