Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Deserializer::deserialize_identifier support for adjacently tagged enums #2475

Merged
merged 3 commits into from Jul 9, 2023

Conversation

Baptistemontan
Copy link
Contributor

@Baptistemontan Baptistemontan commented Jun 12, 2023

Adjancently tagged enum should behave like a struct with 2 elements, the tag and the content such as

struct Foo<'a, T> {
  tag: &'a str,
  content: T
}

This is the case for most format until you come across a format that serialize structs with fields identifier other than &str. for exemple auto derived struct handle the case where the field identifier is an unsigned number, or a &[u8], not just &str.

This is due to the fact that TagOrContentFieldVisitor and TagContentOtherFieldVisitor directly calls Deserializer::deserialize_str and only handle cases for &str, any other format are rejected.

This PR enable TagOrContentFieldVisitor and TagContentOtherFieldVisitor to call Deserializer::visit_identifier and handle the case for &str, &[u8] and unsigned numbers.

This modifications should only allow more flexibility and should'nt conflict with flattened adjacently tagged enums, as those serialize as a map and not a struct.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

May I ask whether there is a specific publicly-available data format library you are working with where this makes a difference? Could you share a short program I could run to reproduce the error that happens under the old behavior?

@Baptistemontan
Copy link
Contributor Author

Baptistemontan commented Jul 7, 2023

I'm working on a compact binary format that is'nt meant to be publicly used (at least for now?), and it serialize struct as sequences.

My tests looks like this, some simple round trip checks:

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(tag = "t", content = "c")]
enum AdjTaggedEnum {
    NewType(String),
    Struct { num: usize },
}

#[test]
fn test_serialize_deserialize_adj_tagged_enum_variant1() {
    let value = AdjTaggedEnum::NewType("john".into());

    let mut v: Vec<u8> = Vec::new();
    ser::to_writer(&value, &mut v).unwrap();

    let res: AdjTaggedEnum = de::from_bytes(&v).unwrap();

    assert_eq!(value, res);
}

#[test]
fn test_serialize_deserialize_adj_tagged_enum_variant2() {
    let value = AdjTaggedEnum::Struct { num: 12 };

    let mut v: Vec<u8> = Vec::new();
    ser::to_writer(&value, &mut v).unwrap();

    let res: AdjTaggedEnum = de::from_bytes(&v).unwrap();

    assert_eq!(value, res);
}

But I get this error:

tests::test_serialize_deserialize_adj_tagged_enum_variant1' panicked at 'called Result::unwrap() on an Err value: Message("invalid type: integer 0, expected "t", "c", or other ignored fields")

I use a work around for struct-like types that don't implement visit_seq where I give the field identfier as a u64 index, as auto derived field visitor implement visit_u64. This works great until adjacently tagged enums.

This PR is'nt to fix my own library, I probably can find works around or can settle to serialize the field name, this is just to have consistency, it's a niche case that makes you rework a big portion of the format for no apparent reason.

here is the repo, it's highly WIP and I put a very generic name that will be changed, if I publish it. You can find the tests in src/any/mod.rs

Edit: This PR also allows the use of visit_bytes, which can remove the necessity of core::str::from_utf8 for identifier when deserilizing from a byte slice, not a big perf gain but remove a useless linear search that would happen afterward anyway.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dtolnay dtolnay merged commit 7aa0453 into serde-rs:master Jul 9, 2023
19 checks passed
crapStone added a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 10, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [serde](https://serde.rs) ([source](https://github.com/serde-rs/serde)) | dependencies | patch | `1.0.167` -> `1.0.171` |

---

### Release Notes

<details>
<summary>serde-rs/serde (serde)</summary>

### [`v1.0.171`](https://github.com/serde-rs/serde/releases/tag/v1.0.171)

[Compare Source](serde-rs/serde@v1.0.170...v1.0.171)

-   Support `derive(Deserialize)` on unit structs that have const generics ([#&#8203;2500](serde-rs/serde#2500), thanks [@&#8203;Baptistemontan](https://github.com/Baptistemontan))

### [`v1.0.170`](https://github.com/serde-rs/serde/releases/tag/v1.0.170)

[Compare Source](serde-rs/serde@v1.0.169...v1.0.170)

-   Produce error message on suffixed string literals inside serde attributes ([#&#8203;2242](serde-rs/serde#2242))
-   Support single identifier as unbraced default value for const generic parameter ([#&#8203;2449](serde-rs/serde#2449))

### [`v1.0.169`](https://github.com/serde-rs/serde/releases/tag/v1.0.169)

[Compare Source](serde-rs/serde@v1.0.168...v1.0.169)

-   Add Deserializer::deserialize_identifier support for adjacently tagged enums ([#&#8203;2475](serde-rs/serde#2475), thanks [@&#8203;Baptistemontan](https://github.com/Baptistemontan))
-   Fix unused_braces lint in generated Deserialize impl that uses braced const generic expressions ([#&#8203;2414](serde-rs/serde#2414))

### [`v1.0.168`](https://github.com/serde-rs/serde/releases/tag/v1.0.168)

[Compare Source](serde-rs/serde@v1.0.167...v1.0.168)

-   Allow `serde::de::IgnoredAny` to be the type for a `serde(flatten)` field ([#&#8203;2436](serde-rs/serde#2436), thanks [@&#8203;Mingun](https://github.com/Mingun))
-   Allow larger preallocated capacity for smaller elements ([#&#8203;2494](serde-rs/serde#2494))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi42LjAiLCJ1cGRhdGVkSW5WZXIiOiIzNi42LjAiLCJ0YXJnZXRCcmFuY2giOiJkZXZlbG9wIn0=-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Co-authored-by: crapStone <crapstone01@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1959
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants