Ignore undeclared element when a $value field is present #596

Open
emarsden opened this issue May 1, 2023 · 3 comments
Labels
enhancement help wanted serde Issues related to mapping from Rust types to XML

Comments

@emarsden

emarsden commented May 1, 2023

The default behaviour of ignoring XML elements that are not declared is very useful when dealing with XML that follows an extensible schema. However, including a rename="$value" field changes this behaviour, and undeclared elements generate an UnexpectedStart parse error.

use serde::{Deserialize, Serialize};

#[derive(Debug, Default, Serialize, Deserialize)]
struct Root {
    #[serde(rename = "$value")]
    content: Option<String>,
}

fn main() {
    let xml = r#"<Root><Foo/></Root>"#;
    let r1: Result<Root, quick_xml::DeError> = quick_xml::de::from_str(xml);
    // Parse error: UnexpectedStart([70, 111, 111]), i.e. the bytes of "Foo"
    println!("{:?}", r1);
}

If the rename = "$value" is replaced by rename = "$text" then the problem does not arise (parsing is successful and the Foo element is ignored). Is this the intended behaviour?
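To make the reported difference concrete, here is a toy, std-only model of the two behaviours. It is not quick-xml's implementation: `Event`, `CatchAll`, `Outcome` and `deserialize_root` are hypothetical names, and the rules only restate what this issue reports (a `$text` field collects text and skips undeclared elements, while a `String` `$value` field receives the undeclared element and fails).

```rust
/// Simplified XML events for the children of `<Root>` (hypothetical, not quick-xml's types).
#[derive(Debug, Clone, Copy)]
pub enum Event<'a> {
    Start(&'a str),
    End(&'a str),
    Empty(&'a str),
    Text(&'a str),
}

/// Which special field the `Root` struct declares.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CatchAll {
    /// `#[serde(rename = "$text")] content: Option<String>`
    Text,
    /// `#[serde(rename = "$value")] content: Option<String>`
    Value,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Parsed(Option<String>),
    /// Mirrors the reported error: the payload is the raw bytes of the tag name.
    UnexpectedStart(Vec<u8>),
}

/// Toy model of the behaviour reported in the issue (NOT quick-xml's actual code).
pub fn deserialize_root(events: &[Event], catch_all: CatchAll) -> Outcome {
    let mut text = String::new();
    let mut depth = 0usize; // nesting depth inside skipped (undeclared) elements
    for ev in events {
        match *ev {
            Event::Start(name) | Event::Empty(name) if catch_all == CatchAll::Value => {
                // `$value` receives every child element, but a `String` cannot hold one.
                return Outcome::UnexpectedStart(name.as_bytes().to_vec());
            }
            Event::Start(_) => depth += 1,
            Event::End(_) => depth = depth.saturating_sub(1),
            Event::Empty(_) => {} // self-closing undeclared element: skipped
            Event::Text(t) if depth == 0 => text.push_str(t),
            Event::Text(_) => {} // text inside a skipped element
        }
    }
    Outcome::Parsed(if text.is_empty() { None } else { Some(text) })
}
```

With the single event `Event::Empty("Foo")`, the `Text` mode yields `Outcome::Parsed(None)` and the `Value` mode yields `Outcome::UnexpectedStart(vec![70, 111, 111])`; those three numbers are simply `b"Foo"`, which is why the original error prints them.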

@Mingun Mingun added the serde Issues related to mapping from Rust types to XML label May 1, 2023
@Mingun
Collaborator

Mingun commented May 2, 2023

Yes, for the sake of consistency it would be good to accept such XML. The only problem is constructing consistent rules that do not break down when combined. It is convenient to think of the mapping of Rust types as a definition of XSD types, which already have consistent rules. The presented type can be expressed in XSD in at least two ways:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!--
    Straightforward translation of `struct Root { ... }`.
    Does not allow nested elements (because of
    xs:simpleContent, required for extension of xs:string)
  -->
  <xs:complexType name="Root1">
    <xs:simpleContent>
      <xs:extension base="xs:string"/>
    </xs:simpleContent>
  </xs:complexType>

  <!-- More permissive translation of `struct Root { ... }` -->
  <xs:complexType name="Root2" mixed="true">
    <xs:sequence>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

These definitions are incomplete: without #[serde(deny_unknown_fields)] we assume that any attributes and any elements can appear inside the type. The XSD above does not reflect that, but it should be relatively simple to add.
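The difference between the two XSD types, and the effect of the missing "any element" rule, can be modelled in a few lines. This is a deliberately simplified sketch, assuming that rule would be expressed as an element wildcard on `Root2` that is dropped when `deny_unknown_fields` is set; `Child`, `ContentModel`, `accepts` and `root2_model` are hypothetical names, and attributes are ignored.

```rust
/// A child node of the element being validated (simplified).
#[derive(Debug)]
pub enum Child<'a> {
    Text(&'a str),
    Element(&'a str),
}

/// Simplified content models corresponding to `Root1` and `Root2`.
#[derive(Debug, Clone, Copy)]
pub enum ContentModel {
    /// `xs:simpleContent` extending `xs:string`: text only, never child elements.
    Simple,
    /// `mixed="true"`: text is allowed; child elements only if a wildcard permits them.
    Mixed { any_element: bool },
}

/// Returns `true` if every child is allowed by `model`.
pub fn accepts(model: ContentModel, children: &[Child]) -> bool {
    children.iter().all(|c| match (model, c) {
        (_, Child::Text(_)) => true,
        (ContentModel::Simple, Child::Element(_)) => false,
        (ContentModel::Mixed { any_element }, Child::Element(_)) => any_element,
    })
}

/// Hypothetical mapping: without `deny_unknown_fields`, unknown elements are allowed.
pub fn root2_model(deny_unknown_fields: bool) -> ContentModel {
    ContentModel::Mixed { any_element: !deny_unknown_fields }
}
```

`Mixed { any_element: false }` corresponds to `Root2` exactly as written above (its `xs:sequence` is empty); `root2_model(false)` is the permissive variant that would accept `<Root><Foo/></Root>`, while `Simple` can never be extended that way.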

I'm not sure, however, that this would be easy to add to quick-xml, because on the deserializer side we know nothing about the type of the content field. As the Root2 type shows, the mixed attribute is defined on the type itself, so it is the type that knows how to deal with the strings inside it. This translation may therefore be impossible to implement with serde.

So I will leave this open in case anyone wishes to investigate this rabbit hole.

@anton-dutov

anton-dutov commented Sep 29, 2023

Sometimes we need to collect the body into a string, to deserialize it later with a different schema whose version is defined by an attribute:

<root version="01.04"
      messageID="1"
      messageName="SomeTag">
    <SomeTag>
        <OtherTag/>
    </SomeTag>
</root>

Here the SomeTag element should be read as a string with its whole content, <SomeTag><OtherTag/></SomeTag>, so that it can be deserialized later according to the version.
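If a dependency-free stopgap is acceptable, the raw inner markup can be sliced out of the document text before any serde step. The `inner_xml` helper below is hypothetical and deliberately naive: it assumes the root start tag has no `>` inside attribute values, that no other tag name merely starts with the root name, and that no comments, CDATA sections or processing instructions mention the root tag. It does not validate the XML.

```rust
/// Returns the raw markup between `<root ...>` and the final `</root>`, trimmed.
/// Naive sketch: see the assumptions in the text; no XML validation is done.
pub fn inner_xml<'a>(xml: &'a str, root: &str) -> Option<&'a str> {
    let open = format!("<{}", root);
    let close = format!("</{}>", root);
    let start_tag = xml.find(&open)?;
    // Content starts right after the first '>' that ends the root start tag.
    let content_start = start_tag + xml[start_tag..].find('>')? + 1;
    // Content ends at the last closing tag of the root element.
    let content_end = xml.rfind(&close)?;
    if content_end < content_start {
        return None;
    }
    Some(xml[content_start..content_end].trim())
}
```

Applied to the document above with `root = "root"`, this returns the `<SomeTag>` element with its original inner indentation preserved, so the result is not byte-for-byte `<SomeTag><OtherTag/></SomeTag>` unless the input has no whitespace between tags. The attributes (`version`, `messageID`, `messageName`) would still be read by a separate pass.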

#[derive(Debug, Serialize, Deserialize, PartialEq)]
#[serde(rename = "root")]
pub struct Message {
    #[serde(rename = "@version")]
    pub version: String,

    #[serde(rename = "@messageID")]
    pub id: u32,

    #[serde(rename = "@messageName")]
    pub method: String,

    #[serde(rename = "$value")]
    pub body: T?, // placeholder: which type would hold the raw inner XML?
}
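One way to read `T?` is "some type that keeps the raw markup until the version is known". A sketch under that assumption: the body is stored as an owned string and a second step picks the schema from `version`. `RawBody`, `Body`, `V0104` and `parse_body` are hypothetical names; only the version string "01.04" comes from the example, and a real implementation would deserialize into typed structs instead of copying the string.

```rust
/// Raw, not-yet-interpreted body markup (hypothetical stand-in for `T?`).
#[derive(Debug, Clone, PartialEq)]
pub struct RawBody(pub String);

/// Version-specific bodies (hypothetical; real variants would hold typed structs).
#[derive(Debug, PartialEq)]
pub enum Body {
    V0104 { raw: String },
}

/// Chooses how to interpret the raw body based on the `version` attribute.
pub fn parse_body(version: &str, raw: &RawBody) -> Result<Body, String> {
    match version {
        // A real implementation would call a deserializer for the 01.04 schema here.
        "01.04" => Ok(Body::V0104 { raw: raw.0.clone() }),
        other => Err(format!("unsupported message version: {other}")),
    }
}
```

Keeping the dispatch in an ordinary `match` means that adding a new schema version only adds one arm, and unknown versions surface as an explicit error rather than a parse failure deep inside serde.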

For example, xmlserde has a special Unparsed struct, but it has other limitations:

  • We have to set the exact tag name
  • There is no out-of-the-box method to get a string from Unparsed

Is there any way to do that with quick-xml?

@Mingun
Collaborator

Mingun commented Sep 30, 2023

I think what you need here is a DOM node that can be used as the type of the body field. There is the minidom crate, built on top of quick-xml, which will probably cover your needs. At the time of writing it uses quick-xml 0.28, but I think it would not be hard to update.

I also plan to add basic DOM support to quick-xml itself and have a branch for that, but the work is at a very early stage. My current goals are:
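To illustrate why a DOM node fits here: a tree type can hold arbitrary child elements and text, and can be turned back into a string for the second, version-specific pass. The `Node` and `Element` types below are a hypothetical minimal sketch, not minidom's API; namespaces and escaping of special characters are omitted.

```rust
/// Minimal DOM node (hypothetical; no namespaces, no escaping).
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Element(Element),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub name: String,
    pub attrs: Vec<(String, String)>,
    pub children: Vec<Node>,
}

impl Element {
    pub fn new(name: &str) -> Self {
        Element { name: name.to_string(), attrs: Vec::new(), children: Vec::new() }
    }

    /// Builder-style helper that appends a child node.
    pub fn child(mut self, node: Node) -> Self {
        self.children.push(node);
        self
    }

    /// Serializes the subtree back to markup, using `<x/>` for empty elements.
    pub fn to_xml(&self) -> String {
        let mut out = String::new();
        self.write_to(&mut out);
        out
    }

    fn write_to(&self, out: &mut String) {
        out.push('<');
        out.push_str(&self.name);
        for (k, v) in &self.attrs {
            out.push_str(&format!(" {k}=\"{v}\""));
        }
        if self.children.is_empty() {
            out.push_str("/>");
            return;
        }
        out.push('>');
        for child in &self.children {
            match child {
                Node::Element(e) => e.write_to(out),
                Node::Text(t) => out.push_str(t),
            }
        }
        out.push_str("</");
        out.push_str(&self.name);
        out.push('>');
    }
}
```

For the example from the question, `Element::new("SomeTag").child(Node::Element(Element::new("OtherTag"))).to_xml()` produces `<SomeTag><OtherTag/></SomeTag>`.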

  • fix #630 ("serde >=1.0.181 has breaking changes that fails enum_::adjacently_tagged tests") and release 0.31.0
  • rewrite the parser to fix various bugs and improve performance (according to our benchmarks we are second; the leader is maybe_xml). I have been working on this for almost a month and plan to include it in 0.32.0
  • rework our errors and include position information in them -- 0.32.0
  • rework the way the parser is configured (a Config struct with fields instead of method calls) -- 0.32.0
  • release 0.32.0 at the end of the year
  • finish DOM implementation -- 0.33.0
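The configuration change in the list follows a common Rust pattern: a plain struct with public fields and a `Default` impl, so callers set only what they need with struct update syntax instead of chaining setter methods. The sketch below only illustrates that pattern; `ParserConfig`, its field names and `Parser` are hypothetical and are not quick-xml's actual configuration API.

```rust
/// Hypothetical parser configuration (illustrates the pattern only).
#[derive(Debug, Clone, PartialEq)]
pub struct ParserConfig {
    /// Trim whitespace around text events.
    pub trim_text: bool,
    /// Report `<tag/>` as a start event followed by an end event.
    pub expand_empty_elements: bool,
    /// Check that end tags match their start tags.
    pub check_end_names: bool,
}

impl Default for ParserConfig {
    fn default() -> Self {
        ParserConfig {
            trim_text: false,
            expand_empty_elements: false,
            check_end_names: true,
        }
    }
}

/// A parser that simply stores its configuration (stand-in for a real reader).
pub struct Parser {
    pub config: ParserConfig,
}

impl Parser {
    pub fn with_config(config: ParserConfig) -> Self {
        Parser { config }
    }
}
```

A caller would write `ParserConfig { trim_text: true, ..Default::default() }`; new options can then be added as fields with defaults without introducing another setter method for each one.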

Projects
None yet
Development

No branches or pull requests

3 participants