Deserialize invalid UTF-8 into byte bufs as WTF-8 #877

Open: wants to merge 1 commit into base: master

Commits on May 18, 2022

  1. Deserialize invalid UTF-8 into byte bufs as WTF-8

    Previously serde-rs#828 added support for deserializing lone leading and
    trailing surrogates into WTF-8 encoded bytes when deserializing a string
    as bytes. This commit extends this to cover the case of a leading
    surrogate followed by code units that are not trailing surrogates. This
    allows for deserialization of "\ud83c\ud83c" (two leading surrogates),
    or "\ud83c\u0061" (a leading surrogate followed by "a").
    
    The docs also now make it clear that the invalid code points are
    serialized as WTF-8. This reference to WTF-8 signals to users that they
    can run a WTF-8 parser over the bytes to construct a valid UTF-8 string.
    lucacasonato committed May 18, 2022
    f50e296
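
To make the cases in the commit message concrete, here is a minimal, standard-library-only sketch of WTF-8 encoding for a sequence of UTF-16 code units. It is not the code from this PR: the names `encode_wtf8` and `push_code_point` are hypothetical, and the sketch assumes the `\uXXXX` escapes have already been parsed into `u16` values. A leading surrogate followed by a trailing surrogate is combined into one supplementary code point; any other surrogate is encoded on its own as a three-byte sequence.

```rust
/// Append one code point as (generalized) UTF-8.
/// Unlike `char`, this accepts surrogates (U+D800..=U+DFFF), which is what
/// distinguishes WTF-8 from strict UTF-8. Hypothetical helper for illustration.
fn push_code_point(out: &mut Vec<u8>, cp: u32) {
    match cp {
        0..=0x7F => out.push(cp as u8),
        0x80..=0x7FF => {
            out.push(0xC0 | (cp >> 6) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
        0x800..=0xFFFF => {
            // Lone surrogates land here and become three bytes (ED A0..BF xx).
            out.push(0xE0 | (cp >> 12) as u8);
            out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
        _ => {
            out.push(0xF0 | (cp >> 18) as u8);
            out.push(0x80 | ((cp >> 12) & 0x3F) as u8);
            out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
    }
}

/// Encode UTF-16 code units as WTF-8 bytes (hypothetical name).
fn encode_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < units.len() {
        let unit = units[i] as u32;
        let is_leading = (0xD800..=0xDBFF).contains(&unit);
        let next = units.get(i + 1).map(|&u| u as u32);
        match next {
            // Well-formed pair: combine into one supplementary code point.
            Some(low) if is_leading && (0xDC00..=0xDFFF).contains(&low) => {
                let cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                push_code_point(&mut out, cp);
                i += 2;
            }
            // Anything else, including a leading surrogate followed by a
            // non-trailing unit, is encoded by itself.
            _ => {
                push_code_point(&mut out, unit);
                i += 1;
            }
        }
    }
    out
}
```

Under these assumptions, `encode_wtf8(&[0xD83C, 0xD83C])` encodes each leading surrogate separately, `encode_wtf8(&[0xD83C, 0x0061])` yields the surrogate's three bytes followed by the ASCII byte for "a", and a well-formed pair such as `[0xD83C, 0xDF89]` produces the same bytes as ordinary UTF-8.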