Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error Deserializing Struct With flatten-ed Tagged Enum #278

Open
nlordell opened this issue May 3, 2022 · 3 comments
Open

Error Deserializing Struct With flatten-ed Tagged Enum #278

nlordell opened this issue May 3, 2022 · 3 comments

Comments

@nlordell
Copy link

nlordell commented May 3, 2022

Version

Tested on both v1.1.6 and master branch.

Briefly describe the bug.

Deserializing a struct with a flattened tagged enum field does not seem to work.

Include a complete program demonstrating a problem.

Code:
#![allow(dead_code)]

use serde::de::DeserializeOwned;
use std::fmt::Debug;

mod tagged_enum {
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    pub struct Record {
        common: u64,
        #[serde(flatten)]
        kind: Kind,
    }

    #[derive(Debug, Deserialize)]
    #[serde(tag = "kind", content = "parameter", rename_all = "lowercase")]
    enum Kind {
        Foo(u64),
        Bar(bool),
    }
}

mod flattened_struct {
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    pub struct Record {
        common: u64,
        #[serde(flatten)]
        inner: Inner,
    }

    #[derive(Debug, Deserialize)]
    struct Inner {
        kind: Kind,
        parameter: Value,
    }

    #[derive(Debug, Deserialize)]
    #[serde(rename_all = "lowercase")]
    enum Kind {
        Foo,
        Bar,
    }

    #[derive(Debug, Deserialize)]
    #[serde(untagged)]
    enum Value {
        Num(u64),
        Bool(bool),
    }
}

const CSV: &[u8] = b"\
common,kind,parameter
1,foo,42
2,bar,true";

fn dbg_records<T>()
where
    T: Debug + DeserializeOwned,
{
    let mut reader = csv::Reader::from_reader(CSV);
    for record in reader.deserialize::<T>() {
        println!("{:#?}", record);
    }
}

fn main() {
    dbg_records::<tagged_enum::Record>(); // Error!
    dbg_records::<flattened_struct::Record>(); // OK.
}

(Playground link).

What is the observed behavior of the code above?

Fails to deserialize with invalid type: byte array, expected "kind", "parameter", or other ignored fields error:
Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 22,
                    line: 2,
                    record: 1,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: byte array, expected \"kind\", \"parameter\", or other ignored fields",
                ),
            },
        },
    ),
)
Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 31,
                    line: 3,
                    record: 2,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: byte array, expected \"kind\", \"parameter\", or other ignored fields",
                ),
            },
        },
    ),
)

What is the expected or desired behavior of the code above?

To correctly deserialize records as they are done in the case of flattened structs (also included in code example).

@nlordell
Copy link
Author

nlordell commented May 3, 2022

I did a little more digging, using cargo +nightly rustc --profile=check -- -Zunpretty=expanded to check out the generated Deserialize implementation.

It looks like flattened structs generate a "field visitor" type that looks like this:

                    impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
                        type Value = __Field;
                        fn expecting(&self,
                            __formatter: &mut _serde::__private::Formatter)
                            -> _serde::__private::fmt::Result {
                            _serde::__private::Formatter::write_str(__formatter,
                                "variant identifier")
                        }
                        fn visit_u64<__E>(self, __value: u64)
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                0u64 => _serde::__private::Ok(__Field::__field0),
                                1u64 => _serde::__private::Ok(__Field::__field1),
                                _ =>
                                    _serde::__private::Err(_serde::de::Error::invalid_value(_serde::de::Unexpected::Unsigned(__value),
                                            &"variant index 0 <= i < 2")),
                            }
                        }
                        fn visit_str<__E>(self, __value: &str)
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                "foo" => _serde::__private::Ok(__Field::__field0),
                                "bar" => _serde::__private::Ok(__Field::__field1),
                                _ => {
                                    _serde::__private::Err(_serde::de::Error::unknown_variant(__value,
                                            VARIANTS))
                                }
                            }
                        }
                        fn visit_bytes<__E>(self, __value: &[u8])
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                b"foo" => _serde::__private::Ok(__Field::__field0),
                                b"bar" => _serde::__private::Ok(__Field::__field1),
                                _ => {
                                    let __value = &_serde::__private::from_utf8_lossy(__value);
                                    _serde::__private::Err(_serde::de::Error::unknown_variant(__value,
                                            VARIANTS))
                                }
                            }
                        }
                    }

Meaning it accepts map keys as:

  • indices starting from 0
  • &str matching the names of the fields
  • &[u8] matching the UTF-8 bytes of the names of the fields

In contrast, tagged enums use a "tag-content-other field visitor":

    impl<'de> Visitor<'de> for TagContentOtherFieldVisitor {
        type Value = TagContentOtherField;

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            write!(
                formatter,
                "{:?}, {:?}, or other ignored fields",
                self.tag, self.content
            )
        }

        fn visit_str<E>(self, field: &str) -> Result<Self::Value, E>
        where
            E: de::Error,
        {
            if field == self.tag {
                Ok(TagContentOtherField::Tag)
            } else if field == self.content {
                Ok(TagContentOtherField::Content)
            } else {
                Ok(TagContentOtherField::Other)
            }
        }
    }

Note that the tagged enum field visitor only accepts &str field names. This is a problem since the deserializer uses "borrowed bytes deserializer" which forwards all deserialization calls to visit_bytes which TagContentOtherFieldVisitor does not implement (or, to be more specific, implements the method by returning an error unconditionally):

seed.deserialize(BorrowedBytesDeserializer::new(field)).map(Some)

So, I'm not sure how this should be fixed. Should

  • csv do a little UTF-8 decoding dance?
    if let Ok(field) = str::from_utf8(field) {
        seed.deserialize(BorrowedStrDeserializer::new(field)).map(Some)
    } else {
        seed.deserialize(BorrowedBytesDeserializer::new(field)).map(Some)
    }
  • serde::TagContentOtherFieldVisitor should also add implementations for visit_bytes?

Personally, the second option looks "cleaner" to me. Not sure what others think.

@Nonius
Copy link

Nonius commented May 24, 2022

I think I have much smaller example that might be related to this issue.

@dtolnay
Copy link

dtolnay commented May 5, 2023

This is fixed in serde 1.0.162 by serde-rs/serde#2377.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants