Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit the representation of variant id for adjacently tagged enums #2496

Closed
dtolnay opened this issue Jul 9, 2023 · 4 comments · Fixed by #2505
Closed

Revisit the representation of variant id for adjacently tagged enums #2496

dtolnay opened this issue Jul 9, 2023 · 4 comments · Fixed by #2505
Labels

Comments

@dtolnay
Copy link
Member

dtolnay commented Jul 9, 2023

Currently the Serialize impl for an untagged enum such as:

#[derive(Serialize, Deserialize)]
#[serde(tag = "t", content = "c")]
enum Enum {
    Variant(T),
    /* ... */
}

looks like this:

impl Serialize for Enum {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        match self {
            Enum::Variant(__field0) => {
                let mut __struct = serializer.serialize_struct("Enum", 2)?;
                __struct.serialize_field("t", "Variant")?;
                __struct.serialize_field("c", __field0)?;
                __struct.end()
            }
            /* ... */
        }
    }
}

i.e. the variant id is simply serialized using impl Serialize for str.

During deserialization, that "Variant" is deserialized by this type:

enum __VariantIdentifier {
    Variant,
    /* ... */
}

struct __VariantIdentifierVisitor;

const __VARIANTS: &'static [&'static str] = &["Variant", /* ... */];

impl<'de> Visitor<'de> for __VariantIdentifierVisitor {
    type Value = __VariantIdentifier;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("variant identifier")
    }

    fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
    where
        E: serde::de::Error,
    {
        match value {
            0u64 => Ok(__VariantIdentifier::Variant),
            /* ... */
            _ => Err(E::invalid_value(
                serde::de::Unexpected::Unsigned(value),
                &"variant index 0 <= i < 1",
            )),
        }
    }

    fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
    where
        E: serde::de::Error,
    {
        match value {
            "Variant" => Ok(__VariantIdentifier::Variant),
            /* ... */
            _ => Err(E::unknown_variant(value, __VARIANTS)),
        }
    }

    fn visit_bytes<E>(self, value: &[u8]) -> Result<Self::Value, E>
    where
        E: serde::de::Error,
    {
        match value {
            b"Variant" => Ok(__VariantIdentifier::Variant),
            /* ... */
            _ => Err(E::unknown_variant(value, __VARIANTS)),
        }
    }
}

impl<'de> Deserialize<'de> for __VariantIdentifier {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_identifier(__VariantIdentifierVisitor)
    }
}

Notice at least 2 things about this don't make sense:

  1. The presence of visit_u64. There is no possible way that a data format could have turned "Variant" into 0. This is neither a struct field (which are sometimes numbered in order by data formats) nor an externally tagged serde enum (for which serde passes a numeric id into serialize_*_variant, in addition to the string variant name).

  2. The use of deserialize_identifier. In formats that serialize externally tagged enums using the variant number instead of name, deserialize_identifier involves reading a number. It doesn't make sense to serialize something as a str and deserialize as number. Possibly the best thing that could happen, depending on the specific binary format, is that it takes the length of the variant name as if that were the variant index.

We should make one of the following changes:

Deserialize using deserialize_str, delete visit_u64

Keep serializing using serialize_str, and deserialize using deserialize_str instead of deserialize_identifier.

This would fix the broken behavior in formats where deserialize_identifier is not the same as deserialize_str.

Serialize using an externally tagged unit variant

In this approach, the following adjacently tagged enum:

#[derive(Serialize)]
#[serde(tag = "t", content = "c")]
enum Enum {
    Variant(T),
    /* ... */
}

would be serialized as if it were effectively represented by:

#[derive(Serialize)]
struct Enum {
    t: EnumTag,
    c: EnumContent,
}

#[derive(Serialize)]
enum EnumTag {
    Variant,
    /* ... */
}

#[derive(Serialize)]
#[serde(untagged)]
enum EnumContent {
    Variant(T),
    /* ... */
}

This way non-human-readable formats automatically serialize just an integer id for the variant identifier for the adjacently tagged enum, same as they do for externally tagged enums. This should be faster and more compact than serializing the whole variant names as strings.

@dtolnay dtolnay added the derive label Jul 9, 2023
@Mingun
Copy link
Contributor

Mingun commented Jul 9, 2023

Your point 1 is what exactly was required in #2479 so you should think twice about removing this ability

@dtolnay
Copy link
Member Author

dtolnay commented Jul 9, 2023

I don't immediately see the connection. That issue appears to be about deserializing the "t" vs "c" from a number. This issue is about deserializing the "Variant" from a number.

@Mingun
Copy link
Contributor

Mingun commented Jul 10, 2023

Yes, you're right, but you does not right that this is always impossible. Yes, it is impossible right now, but if ability to use arbitrary compile-time expressions in tag and content fields will be permitted (which seems reasonable if we want to permit them in rename attributes), then the one could write:

#[derive(Serialize, Deserialize)]
#[serde(tag = 1, content = 2)]
enum AdjacentlyTagged {
  // ...
}

The related issues for rename are #745, #1773, #1964, #2358, #2485

@Baptistemontan
Copy link
Contributor

#2056 seems also related

@dtolnay dtolnay linked a pull request Aug 2, 2023 that will close this issue
martin-g added a commit to apache/avro that referenced this issue Aug 15, 2023
Also see serde-rs/serde#2496

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g added a commit to apache/avro that referenced this issue Aug 15, 2023
* Bump serde from 1.0.180 to 1.0.183 in /lang/rust

Bumps [serde](https://github.com/serde-rs/serde) from 1.0.180 to 1.0.183.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](serde-rs/serde@v1.0.180...v1.0.183)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update the Serde impl after serde-rs/serde#2505

Also see serde-rs/serde#2496

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g pushed a commit to apache/avro that referenced this issue Aug 15, 2023
* Bump serde from 1.0.180 to 1.0.183 in /lang/rust

Bumps [serde](https://github.com/serde-rs/serde) from 1.0.180 to 1.0.183.
- [Release notes](https://github.com/serde-rs/serde/releases)
- [Commits](serde-rs/serde@v1.0.180...v1.0.183)

---
updated-dependencies:
- dependency-name: serde
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update the Serde impl after serde-rs/serde#2505

Also see serde-rs/serde#2496

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
(cherry picked from commit e26943f)
tgeoghegan added a commit to divviup/janus that referenced this issue Aug 15, 2023
Recent Serde changes ([1], [2]) changed the internal representation of
enums, breaking our test.

[1]: serde-rs/serde#2496
[2]: serde-rs/serde#2505
tgeoghegan added a commit to divviup/janus that referenced this issue Aug 15, 2023
Recent Serde changes ([1], [2]) changed the internal representation of
enums, breaking our test.

[1]: serde-rs/serde#2496
[2]: serde-rs/serde#2505
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

3 participants