
Fix UTF-8 failures in Watch #2100

Merged
merged 1 commit into from Dec 20, 2023

davidopic
Contributor

@davidopic davidopic commented Aug 7, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

With the old implementation of kubernetes.watch.Watch.stream(), users can hit intermittent crashes when watching resources that contain multi-byte UTF-8 characters (for example, ConfigMaps with non-ASCII characters in the .data.* fields). The bug lies in kubernetes.watch.iter_resp_lines, which decodes each bytes segment as UTF-8 as soon as it arrives, without waiting for the rest of the event. When a segment ends partway through a multi-byte sequence, the .decode call fails. The symptom looks like this:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 2046-2047: unexpected end of data
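A minimal sketch of the failure mode, using a made-up payload rather than the actual test fixtures from this PR: splitting the UTF-8 encoding of "é" across two chunks makes chunk-by-chunk decoding raise, while decoding the joined bytes succeeds. The helper names below are hypothetical and only illustrate the pattern.

```python
# Illustrative payload only; "\u00e9" (é) encodes to the two bytes b"\xc3\xa9".
payload = '{"data": {"greeting": "h\u00e9llo"}}\n'.encode("utf-8")

# Split the stream right after the first byte of "é", as a network read might.
split_at = payload.index(b"\xc3") + 1
chunks = [payload[:split_at], payload[split_at:]]


def decode_each_chunk(segments):
    """Hypothetical helper: decode every segment on its own (the buggy pattern)."""
    return "".join(segment.decode("utf-8") for segment in segments)


def decode_after_joining(segments):
    """Hypothetical helper: decode only once all bytes of the event are available."""
    return b"".join(segments).decode("utf-8")


try:
    decode_each_chunk(chunks)
except UnicodeDecodeError as exc:
    # The first chunk ends with a lone lead byte, so the reason is
    # "unexpected end of data", matching the traceback above.
    print("per-chunk decode failed:", exc.reason)

print("joined decode:", decode_after_joining(chunks), end="")
```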

The new implementation of iter_resp_lines buffers partial events in a bytearray rather than a str and decodes only after a complete event has been received. In addition, the call to .decode now passes errors="replace", so invalid UTF-8 is replaced with the Unicode replacement character, � (U+FFFD), instead of raising an exception.
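A minimal sketch of the buffering approach described above, assuming the caller supplies an iterable of raw segments. The name iter_lines_from_segments is hypothetical and this is not the code merged in this PR: bytes accumulate in a bytearray, only complete newline-terminated lines are decoded, decoding uses errors="replace", and segment types other than bytes and str raise a TypeError.

```python
from typing import Iterable, Iterator, Union


def iter_lines_from_segments(segments: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Hypothetical helper: yield complete, decoded lines from raw segments."""
    buffer = bytearray()
    for segment in segments:
        if isinstance(segment, str):
            # Normalize str segments to bytes so all buffering happens on bytes.
            segment = segment.encode("utf-8")
        elif not isinstance(segment, bytes):
            raise TypeError(
                f"segment must be bytes or str, got {type(segment).__name__}"
            )
        buffer.extend(segment)

        # Emit every complete line; any trailing partial line stays buffered.
        while True:
            newline = buffer.find(b"\n")
            if newline == -1:
                break
            line = bytes(buffer[:newline])
            del buffer[: newline + 1]
            if line:  # this sketch skips blank lines
                yield line.decode("utf-8", errors="replace")

    # Flush a final line that was not newline-terminated.
    if buffer:
        yield bytes(buffer).decode("utf-8", errors="replace")


# Usage: "é" is split across the first two segments, and b"\xff" is invalid UTF-8.
for decoded in iter_lines_from_segments(
    [b'{"a": "h\xc3', b'\xa9llo"}\n{"b": ', b'"\xff"}\n']
):
    print(decoded)
```

Splitting on b"\n" before decoding is safe because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, so every complete line contains only whole characters.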

To confirm the fix, I added three new tests, two of which fail against the old implementation.

Which issue(s) this PR fixes:

Fixes #2087

Special notes for your reviewer:

iter_resp_lines is only called by Watch.stream, and the interface did not change.

The old code implicitly assumed that only bytes and str were valid segment types; that assumption is now explicit, and any other segment type raises a TypeError.

I only tested with Python 3.11. I assume CI runs the tests against each officially supported Python version; if it does not, please review the code for any newer language features I may have relied on unknowingly.

Does this PR introduce a user-facing change?

Fix UTF-8 failures in Watch

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

N/A

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/bug Categorizes issue or PR as related to a bug. labels Aug 7, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 7, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: davidopic / name: David E (934d026)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Aug 7, 2023
@k8s-ci-robot
Contributor

Welcome @davidopic!

It looks like this is your first PR to kubernetes-client/python 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-client/python has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 7, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 7, 2023
@roycaihw
Member

/assign

@superosku

Hi! Any updates on this?

@dolshevsk

Also interested in having this PR merged.

@superosku

@roycaihw any updates regarding this?

@roycaihw
Member

Sorry for the late response. LGTM! Thanks for the contribution!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 20, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidopic, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 20, 2023
@k8s-ci-robot k8s-ci-robot merged commit 1193741 into kubernetes-client:master Dec 20, 2023
2 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Crash when streaming event with non-UTF-8 data
5 participants