Skip to content

Commit

Permalink
Add minimal support for internally tagged and untagged enums (ron-rs#451
Browse files Browse the repository at this point in the history
)

* Add minimal support for internally tagged and untagged enums

* Edge cases over edge cases

* Add CHANGELOG entry and update README

* Fix the PR references in the README
  • Loading branch information
juntyr committed Aug 16, 2023
1 parent 271c1cd commit 5f92fa7
Show file tree
Hide file tree
Showing 8 changed files with 441 additions and 40 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [Non-API] Breaking: Bump `bitflags` dependency to 2.0, changes `serde` impls of `Extensions` ([#443](https://github.com/ron-rs/ron/pull/443))
- Add `Map::retain` method ([#460](https://github.com/ron-rs/ron/pull/460))
- Bump MSRV to 1.64.0 and bump dependency: `indexmap` to 2.0 ([#459](https://github.com/ron-rs/ron/pull/459))
- Add minimal roundtripping support for `#[serde(tag = "tag")]`, `#[serde(tag = "tag", content = "content")]`, and `#[serde(untagged)]` enums ([#451](https://github.com/ron-rs/ron/pull/451))

## [0.8.0] - 2022-08-17

Expand Down
14 changes: 9 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,13 +104,17 @@ Note the following advantages of RON over JSON:

## Limitations

RON is not designed to be a fully self-describing format (unlike JSON) and is thus not guaranteed to work when [`deserialize_any`](https://docs.rs/serde/latest/serde/trait.Deserializer.html#tymethod.deserialize_any) is used instead of its typed alternatives. In particular, the following Serde attributes are not yet supported:
- `#[serde(tag = "type")]`, i.e. internally tagged enums
- `#[serde(untagged)]`, i.e. untagged enums
RON is not designed to be a fully self-describing format (unlike JSON) and is thus not guaranteed to work when [`deserialize_any`](https://docs.rs/serde/latest/serde/trait.Deserializer.html#tymethod.deserialize_any) is used instead of its typed alternatives. In particular, the following Serde attributes only have limited support:

Furthermore, `#[serde(flatten)]` only has limited support and relies on a small hack [^serde-flatten-hack]. Specifically, flattened structs are only serialised as maps and deserialised from maps. However, this limited implementation supports full roundtripping.
- `#[serde(tag = "tag")]`, i.e. internally tagged enums [^serde-enum-hack]
- `#[serde(tag = "tag", content = "content")]`, i.e. adjacently tagged enums [^serde-enum-hack]
- `#[serde(untagged)]`, i.e. untagged enums [^serde-enum-hack]
- `#[serde(flatten)]`, i.e. flattening of structs into maps [^serde-flatten-hack]

[^serde-flatten-hack]: Deserialising a flattened struct from a map requires that the struct's [`Visitor::expecting`](https://docs.rs/serde/latest/serde/de/trait.Visitor.html#tymethod.expecting) implementation formats a string starting with `"struct "`. This is the case for automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on structs.
While data structures with any of these attributes should roundtrip through RON, their textual representation may not always match your expectation. For instance, flattened structs are only serialised as maps and deserialised from maps.

[^serde-enum-hack]: Deserialising an internally, adjacently, or un-tagged enum requires detecting `serde`'s internal `serde::__private::de::content::Content` content type so that RON can describe the deserialised data structure in serde's internal JSON-like format. This detection only works for the automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on enums. See [#451](https://github.com/ron-rs/ron/pull/451) for more details.
[^serde-flatten-hack]: Deserialising a flattened struct from a map requires that the struct's [`Visitor::expecting`](https://docs.rs/serde/latest/serde/de/trait.Visitor.html#tymethod.expecting) implementation formats a string starting with `"struct "`. This is the case for automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on structs. See [#455](https://github.com/ron-rs/ron/pull/455) for more details.

## RON syntax overview

Expand Down
89 changes: 74 additions & 15 deletions src/de/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ use crate::{
error::{Result, SpannedResult},
extensions::Extensions,
options::Options,
parse::{AnyNum, Bytes, ParsedStr, BASE64_ENGINE},
parse::{AnyNum, Bytes, ParsedStr, StructType, BASE64_ENGINE},
};

mod id;
Expand All @@ -33,6 +33,7 @@ mod value;
pub struct Deserializer<'de> {
bytes: Bytes<'de>,
newtype_variant: bool,
serde_content_newtype: bool,
last_identifier: Option<&'de str>,
recursion_limit: Option<usize>,
}
Expand All @@ -56,6 +57,7 @@ impl<'de> Deserializer<'de> {
let mut deserializer = Deserializer {
bytes: Bytes::new(input)?,
newtype_variant: false,
serde_content_newtype: false,
last_identifier: None,
recursion_limit: options.recursion_limit,
};
Expand Down Expand Up @@ -140,25 +142,49 @@ impl<'de> Deserializer<'de> {
/// struct and deserializes it accordingly.
///
/// This method assumes there is no identifier left.
fn handle_any_struct<V>(&mut self, visitor: V) -> Result<V::Value>
fn handle_any_struct<V>(&mut self, visitor: V, ident: Option<&str>) -> Result<V::Value>
where
V: Visitor<'de>,
{
// Create a working copy
let mut bytes = self.bytes;
// HACK: switch to JSON enum semantics for JSON content
// Robust impl blocked on https://github.com/serde-rs/serde/pull/2420
let is_serde_content =
std::any::type_name::<V::Value>() == "serde::__private::de::content::Content";

if bytes.consume("(") {
bytes.skip_ws()?;
let old_serde_content_newtype = self.serde_content_newtype;
self.serde_content_newtype = false;

if bytes.check_tuple_struct()? {
// first argument is technically incorrect, but ignored anyway
self.deserialize_tuple(0, visitor)
} else {
match (self.bytes.check_struct_type()?, ident) {
(StructType::Unit, Some(ident)) if is_serde_content => {
// serde's Content type needs the ident for unit variants
visitor.visit_str(ident)
}
(StructType::Unit, _) => visitor.visit_unit(),
(_, Some(ident)) if is_serde_content => {
// serde's Content type uses a singleton map encoding for enums
visitor.visit_map(SerdeEnumContent {
de: self,
ident: Some(ident),
})
}
(StructType::Named, _) => {
// giving no name results in worse errors but is necessary here
self.handle_struct_after_name("", visitor)
}
} else {
visitor.visit_unit()
(StructType::NewtypeOrTuple, _) if old_serde_content_newtype => {
// deserialize a newtype struct or variant
self.bytes.consume("(");
self.bytes.skip_ws()?;
let result = self.deserialize_any(visitor);
self.bytes.skip_ws()?;
self.bytes.consume(")");

result
}
(StructType::Tuple | StructType::NewtypeOrTuple, _) => {
// first argument is technically incorrect, but ignored anyway
self.deserialize_tuple(0, visitor)
}
}
}

Expand Down Expand Up @@ -243,14 +269,14 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
// `identifier` does not change state if it fails
let ident = self.bytes.identifier().ok();

if ident.is_some() {
if let Some(ident) = ident {
self.bytes.skip_ws()?;

return self.handle_any_struct(visitor);
return self.handle_any_struct(visitor, Some(std::str::from_utf8(ident)?));
}

match self.bytes.peek_or_eof()? {
b'(' => self.handle_any_struct(visitor),
b'(' => self.handle_any_struct(visitor, None),
b'[' => self.deserialize_seq(visitor),
b'{' => self.deserialize_map(visitor),
b'0'..=b'9' | b'+' | b'-' => {
Expand Down Expand Up @@ -925,3 +951,36 @@ fn struct_error_name(error: Error, name: Option<&str>) -> Error {
e => e,
}
}

struct SerdeEnumContent<'a, 'de: 'a> {
de: &'a mut Deserializer<'de>,
ident: Option<&'a str>,
}

impl<'de, 'a> de::MapAccess<'de> for SerdeEnumContent<'a, 'de> {
type Error = Error;

fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>>
where
K: DeserializeSeed<'de>,
{
self.ident
.take()
.map(|ident| seed.deserialize(serde::de::value::StrDeserializer::new(ident)))
.transpose()
}

fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value>
where
V: DeserializeSeed<'de>,
{
self.de.bytes.skip_ws()?;

let old_serde_content_newtype = self.de.serde_content_newtype;
self.de.serde_content_newtype = true;
let result = seed.deserialize(&mut *self.de);
self.de.serde_content_newtype = old_serde_content_newtype;

result
}
}
73 changes: 65 additions & 8 deletions src/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -427,17 +427,67 @@ impl<'a> Bytes<'a> {
.map_or(false, |&b| is_ident_other_char(b))
}

/// Should only be used on a working copy
pub fn check_tuple_struct(mut self) -> Result<bool> {
if self.identifier().is_err() {
// if there's no field ident, this is a tuple struct
return Ok(true);
pub fn check_struct_type(&mut self) -> Result<StructType> {
fn check_struct_type_inner(bytes: &mut Bytes) -> Result<StructType> {
if !bytes.consume("(") {
return Ok(StructType::Unit);
}

bytes.skip_ws()?;

if bytes.identifier().is_ok() {
bytes.skip_ws()?;

match bytes.peek() {
// Definitely a struct with named fields
Some(b':') => return Ok(StructType::Named),
// Definitely a tuple struct with fields
Some(b',') => return Ok(StructType::Tuple),
// Either a newtype or a tuple struct
Some(b')') => return Ok(StructType::NewtypeOrTuple),
// Something else, let's investigate further
_ => (),
};
}

let mut braces = 1;
let mut comma = false;

// Skip ahead to see if the value is followed by a comma
while braces > 0 {
// Skip spurious braces in comments and strings
bytes.skip_ws()?;
let _ = bytes.string();

let c = bytes.eat_byte()?;
if c == b'(' || c == b'[' || c == b'{' {
braces += 1;
} else if c == b')' || c == b']' || c == b'}' {
braces -= 1;
} else if c == b',' && braces == 1 {
comma = true;
break;
}
}

if comma {
Ok(StructType::Tuple)
} else {
Ok(StructType::NewtypeOrTuple)
}
}

self.skip_ws()?;
// Create a temporary working copy
let mut bytes = *self;

// if there is no colon after the ident, this can only be a unit struct
self.eat_byte().map(|c| c != b':')
let result = check_struct_type_inner(&mut bytes);

if result.is_err() {
// Adjust the error span to fit the working copy
*self = bytes;
}

result
}

/// Only returns true if the char after `ident` cannot belong
Expand Down Expand Up @@ -996,6 +1046,13 @@ pub enum ParsedStr<'a> {
Slice(&'a str),
}

pub enum StructType {
NewtypeOrTuple,
Tuple,
Named,
Unit,
}

#[cfg(test)]
mod tests {
use super::*;
Expand Down
2 changes: 1 addition & 1 deletion tests/117_untagged_tuple_variant.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ fn test_ebkalderon_case() {
flags: [
"--enable-thing",
"--enable-other-thing",
If("some-conditional", ["--enable-third-thing"]),
("some-conditional", ["--enable-third-thing"]),
]
)
"#;
Expand Down
Binary file modified tests/307_stack_overflow.rs
Binary file not shown.

0 comments on commit 5f92fa7

Please sign in to comment.