Deserialize invalid UTF-8 into byte bufs as WTF-8 #877

Previously, serde-rs#828 added support for deserializing lone leading and trailing surrogates into WTF-8 encoded bytes when deserializing a string as bytes. This commit extends that support to a leading surrogate followed by a code unit that is not a trailing surrogate. This allows deserialization of "\ud83c\ud83c" (two leading surrogates) or "\ud83c\u0061" (a leading surrogate followed by "a"). The docs now also make it clear that the invalid code points are encoded as WTF-8. The reference to WTF-8 signals to users that they can run a WTF-8 parser over the bytes to construct a valid UTF-8 string.
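To illustrate the byte layout described above, here is a minimal standalone sketch of WTF-8 encoding for a sequence of UTF-16 code units such as those produced by JSON `\uXXXX` escapes. It is not the code from this PR: the names `encode_wtf8` and `push_generalized_utf8` are hypothetical, and the sketch uses only the standard library. Valid surrogate pairs are combined into one supplementary code point, and any unpaired surrogate is written as its own three-byte sequence.

```rust
/// Hypothetical helper (not part of serde_json): encode UTF-16 code units as WTF-8.
fn encode_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < units.len() {
        let unit = units[i];
        let is_lead = (0xD800..=0xDBFF).contains(&unit);
        let next_is_trail =
            i + 1 < units.len() && (0xDC00..=0xDFFF).contains(&units[i + 1]);
        let code_point: u32 = if is_lead && next_is_trail {
            // A well-formed surrogate pair becomes one supplementary code point.
            let hi = unit as u32 - 0xD800;
            let lo = units[i + 1] as u32 - 0xDC00;
            i += 2;
            0x10000 + ((hi << 10) | lo)
        } else {
            // Anything else, including an unpaired surrogate, is encoded as-is.
            i += 1;
            unit as u32
        };
        push_generalized_utf8(code_point, &mut out);
    }
    out
}

/// Hypothetical helper: UTF-8 style encoding that does not reject surrogates.
fn push_generalized_utf8(cp: u32, out: &mut Vec<u8>) {
    match cp {
        0..=0x7F => out.push(cp as u8),
        0x80..=0x7FF => {
            out.push(0xC0 | (cp >> 6) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
        0x800..=0xFFFF => {
            out.push(0xE0 | (cp >> 12) as u8);
            out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
        _ => {
            out.push(0xF0 | (cp >> 18) as u8);
            out.push(0x80 | ((cp >> 12) & 0x3F) as u8);
            out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
            out.push(0x80 | (cp & 0x3F) as u8);
        }
    }
}
```

Under this sketch, the code units for "\ud83c\ud83c" encode to the bytes `ED A0 BC ED A0 BC`, and those for "\ud83c\u0061" encode to `ED A0 BC 61`. A properly paired input yields ordinary UTF-8.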

Commits on May 18, 2022
