Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect errors when deserializing untagged enums #1544

Closed
wants to merge 1 commit into from

Conversation

killercup
Copy link
Contributor

This also adds a verbose-debug feature that enables the new behavior.

Previously:

data did not match any variant of untagged enum ReferenceOr at line 6 column 4

Now, with verbose-debug feature enabled:

data did not match any variant of untagged enum ReferenceOr

  • attempted to deserialize Reference but failed with: missing field $ref
  • attempted to deserialize Item but failed with: invalid value: string "lol", expected expected format \dXX at line 6 column 4

cf. #773

@killercup
Copy link
Contributor Author

See this more as a draft that just happens to help me debug stuff locally :)

Bikeshedding of feature name and error format very welcome.

@BlinkyStitt
Copy link

Just used this to debug some nested enums that had a typo. Very helpful! Thanks!

@killercup
Copy link
Contributor Author

@dtolnay are you interested in merging this? I'm not sure what Serde's approach to verbose debugging is/whether there is any precedence for this.

@DerrikMilligan
Copy link

What would be holding us back from merging this in? Seems fantastic.

@piegamesde
Copy link

I cannot get this branch to work. Compiling throws a lot of errors like:

error[E0277]: the trait bound `serde_yaml::Mapping: serde::Deserialize<'_>` is not satisfied
  --> src/config.rs:48:2
   |
48 |     #[serde(default)]
   |     ^ the trait `serde::Deserialize<'_>` is not implemented for `serde_yaml::Mapping`
   |
   = note: required by `serde::de::SeqAccess::next_element`

@Stargateur
Copy link

Stargateur commented Aug 9, 2020

how to use it along with serde_json cause I can override serde version but not serde_json version of serde

Well after try everything, I clone serde_bytes, serde_json, serde, did some git rebase and it's worked.
BTW, that help me a lot, need this feature in serde but maybe named verbose_debug

@BlinkyStitt
Copy link

BlinkyStitt commented Aug 9, 2020 via email

@Stargateur
Copy link

Stargateur commented Aug 9, 2020

You can use [patch] in your Cargo.toml. Check out this section of the rust book: https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html

On Aug 9, 2020, at 4:09 AM, Antoine PLASKOWSKI @.***> wrote:  how to use it along with serde_json cause I can override serde version but not serde_json version of serde — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

This wouldn't work cause there would be two version of serde, and so as @piegamesde reported this cause Deserialize to not be implement for serde_json version of serde.

@yaymukund
Copy link

yaymukund commented Sep 17, 2020

Thanks for doing this work. Is this behind a flag because the message is too long or because collecting the debugging information comes at a performance cost (or something else)?

This also adds a verbose-debug feature that enables the new behavior.

Previously:

> data did not match any variant of untagged enum ReferenceOr at line 6 column 4

Now, with `verbose-debug` feature enabled:

> data did not match any variant of untagged enum `ReferenceOr`
> 	- attempted to deserialize `Reference` but failed with: missing field `$ref`
> 	- attempted to deserialize `Item` but failed with: invalid value: string "lol", expected expected format `\dXX` at line 6 column 4

cf. serde-rs#773
@killercup
Copy link
Contributor Author

Just had a use for this again because the error messages serde gives me when reading in a config file are not helpful to users.

@dtolnay I'd really appreciate if you could have a look at this :)

@andrewdavidmackenzie
Copy link

+1 for adding this or more extensive error messages - some failures are very hard to debug.

@calavera
Copy link

calavera commented Dec 31, 2020

Can any of the maintainers take a look at this change? Debugging deserialization problems into enums is basically impossible without it.

@panther99
Copy link

I know it has been asked already but I think maintainers should seriously consider checking this. It looks like a small change and it will be of huge help to all of us using untagged enums in our projects.

@torkleyy
Copy link

Is there any reason why this is not being considered? This would be so helpful for actually finding errors in YAML files.

@sztomi
Copy link

sztomi commented Aug 29, 2021

I really need something like this. Unfortunately I was not able to use the git version in the PR repo because it's a bit outdated. My use-case: I'm querying an API that either returns an error JSON or the value requested. The API is apparently unstable because after some time, my code no longer works. I'm trying to work out what's not matching in the JSON, but it's really difficult because it's a huge chunk and it almost fits. Is it a type not matching? Is it a field that became optional? I have no way of knowing without looking at each field in my struct and trying to manually match that against the JSON I got.

My workaround is likely going to be: identify an error response without deserializing (i.e. dumb string matching) and then deserializing only the concrete types and use serde_path_to_error to gain some insight.

@Stargateur
Copy link

@dtolnay ?

@nirvana-msu
Copy link

@dtolnay there seems to be a fair amount of interest for this issue. We'd all appreciate if you let us know what your concerns are, and if you see a path for it to get merged.

@kurtbuilds
Copy link

kurtbuilds commented Sep 13, 2022

For anyone who encounters this issue today, you need a branch where the serde version matches your other dependencies.

I took a similar commit (that uses no compiler feature flag) from this fork: https://github.com/jonasbb/serde/tree/dev and cherry picked it on top of mainline's master.

Result:
https://github.com/kurtbuilds/serde is version 1.0.144 of serde, and it will give you better error messages. Use it with these lines in your Cargo.toml

[patch.crates-io]
serde_derive = { git = "https://github.com/kurtbuilds/serde" }

Hopefully this is mainlined soon.

@AlexTMjugador
Copy link

AlexTMjugador commented Oct 1, 2022

Serde has such a fast release cadence that the repository that @kurtbuilds kindly provided is already out of date. If anyone wants this patch, they practically need to keep their own automatically-updated fork with the patch (using tools such as the wei/pull GitHub Action), or else bend over backwards to pin to a Serde version and make sure that nothing ends up pulling a slightly different one, including dependencies of dependencies. Both approaches are inconvenient and contribute to technical debt in downstream applications.

What has been blocking this improvement for more than three years now? It's a really, really welcome change for those that use Serde to parse user-provided configuration.

Edit: I've set up a fork that is automatically updated with wei/pull here. It should be easy to keep it up to date at least.

Edit 2: wei/pull was not working as expected and I could not troubleshoot the cause. I've switched to a periodic GitHub Actions job that automatically opens and merges PRs instead. The fork is updated to 1.0.147 at the time of writing. For reference, the only relevant patch commit applied to my fork is this one: ComunidadAylas@553519b

Edit 3 (November 28th, 2022): wei/pull suddenly started "working" and rewriting the tree so that the patches were discarded. I've disabled it for good and reviewed the GitHub Actions sync job to merge upstream commits by squashing them, so hopefully commit hashes stay constant without the noise of merge commits. I've also updated the fork to 1.0.148. The aforementioned patch commit is now this one.

@coltfred
Copy link

@dtolnay Did you have any thoughts on this approach? When trying to build user facing tools/services this would be a big help.

If you have thoughts on a different approach that's cool too. It'd be great to get some collaboration going on it.

jonasbb added a commit to jonasbb/serde_with that referenced this pull request Apr 2, 2023
Untagged enums do not provide good error messages and likely never will,
given that there are multiple PRs which are just completely ignored
([serde#2376](serde-rs/serde#2376) and
[serde#1544](serde-rs/serde#1544)).

Instead using `content::de` the untagged enums can be replaced by custom
buffering. The error messages for `OneOrMany` and `PickFirst` now look
like this, including the original failure for each variant.

```text
OneOrMany could not deserialize any variant:
  One: invalid type: map, expected u32
  Many: invalid type: map, expected a sequence
```

```text
PickFirst could not deserialize any variant:
  First: invalid type: string "Abc", expected u32
  Second: invalid digit found in string
```

The implementations of `VecSkipError` and `DefaultOnError` are updated
too, but should not result in any visible changes.
jonasbb added a commit to jonasbb/serde_with that referenced this pull request Apr 2, 2023
Untagged enums do not provide good error messages and likely never will,
given that there are multiple PRs which are just completely ignored
([serde#2376](serde-rs/serde#2376) and
[serde#1544](serde-rs/serde#1544)).

Instead using `content::de` the untagged enums can be replaced by custom
buffering. The error messages for `OneOrMany` and `PickFirst` now look
like this, including the original failure for each variant.

```text
OneOrMany could not deserialize any variant:
  One: invalid type: map, expected u32
  Many: invalid type: map, expected a sequence
```

```text
PickFirst could not deserialize any variant:
  First: invalid type: string "Abc", expected u32
  Second: invalid digit found in string
```

The implementations of `VecSkipError` and `DefaultOnError` are updated
too, but should not result in any visible changes.
bors bot added a commit to jonasbb/serde_with that referenced this pull request Apr 2, 2023
586: Improve error messaged by dropping untagged enums r=jonasbb a=jonasbb

Untagged enums do not provide good error messages and likely never will,
given that there are multiple PRs which are just completely ignored
([serde#2376](serde-rs/serde#2376) and
[serde#1544](serde-rs/serde#1544)).

Instead using `content::de` the untagged enums can be replaced by custom
buffering. The error messages for `OneOrMany` and `PickFirst` now look
like this, including the original failure for each variant.

```text
OneOrMany could not deserialize any variant:
  One: invalid type: map, expected u32
  Many: invalid type: map, expected a sequence
```

```text
PickFirst could not deserialize any variant:
  First: invalid type: string "Abc", expected u32
  Second: invalid digit found in string
```

The implementations of `VecSkipError` and `DefaultOnError` are updated
too, but should not result in any visible changes.

Co-authored-by: Jonas Bushart <jonas@bushart.org>
@AlexTMjugador
Copy link

AlexTMjugador commented Jun 11, 2023

#2403 introduced new merge conflicts for this PR, which I've just fixed in my fork (if anyone's using my fork and notices any problem due to that, please file an issue on the fork). If conflicts are keeping this PR from being merged, I'd be happy to contribute and submit my updated patches upstream.

Edit (2023-08-04): 6f1f38d introduced yet another merge conflict, which I've fixed in my fork. The full list of changes from upstream that will be updated automatically as I resolve conflicts can be found here.

Edit 2 (2023-08-04): the above conflict resolution was not enough to get serde to build on at least some dependent projects, as the Deserialize and Serialize procedural macro re-exports at serde fail to locate such procedural macro definitions on serde_derive. I suspect this is caused due to recent changes to precompile serde_derive, but I'm at a loss as why this happens. I'm exploring more maintainable solutions based on custom visitors instead, as recommended in the PR closing comment below.

@0xffffharry
Copy link

When will this PR be merged? This is really useful! There is so little misinformation now!

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach was rejected in a different PR at #2376.

@dtolnay dtolnay closed this Aug 4, 2023
@Emilgardis
Copy link

One way to substitute this is with serde_path_to_error, for me, that has completely satisfied what this pr did.

@AlexTMjugador
Copy link

AlexTMjugador commented Aug 4, 2023

Thank you @dtolnay for your clarification on the status of this PR - I appreciate being decisive and explicit with PRs as important to certain users as this one has been for years.

@33KK
Copy link

33KK commented Sep 7, 2023

This approach was rejected in a different PR at #2376.

@dtolnay I'm sorry but this just doesn't make sense to me. The reasoning in rejecting #2376 suggests that this is a hack and there's a better way to do this, even though that's not true at all.

EDIT: serde-with has also been suggested as a place to implement this, and they also said it's not doable without forking serde_derive.

Implementing a visitor only works when the enum doesn't require rolling the parser back which is not possible in serde or buffering the input, which is as far as I know only possible with an internal API __private::de::Content. The options are to reimplement or use directly. Using it directly is not ideal since it's a private unstable API. Reimplementation seems to be at least 1000 LOC of duplicated code and effort.

There's also serde-untagged which basically only supports matching on the type, so it's not really a solution either. And even then it's also quite complex at more than >1000 LOC.

Forking serde and serde_derive is obviously complicated since that would also require patching the entire dependency chain to use that fixed version. That is on top of maintaining the fork, keeping it up to date and resolving any conflicts.

Implementing a derive macro is not a good option either since it has the same issues as implementing a custom visitor, on top of the even higher complexity of implementing the derive macro.

There is quite a decent amount of interest in this since this PR is 2nd most upvoted PR and would be the 5th most upvoted issue. And there is no good solution still. This PR basically provides the only solution that is not overly complex and works in every case. I think this feature is pretty much basic and a must-have, since currently it's basically impossible to tell what went wrong in an untagged enum, especially when it's the top level type or has a lot of children, then the entire error is completely worthless.

@GeeWee
Copy link

GeeWee commented Sep 7, 2023

When trying to use serde-untagged I also ran into issues with not being able to roll the parser back thus not being able to easily deserialize an enum containing multiple structs. However I guess an alternate implementation could potentially either fork/copy over the Content API or buffer to some other intermediate struct such as serde_json::Value - it does however require re-implementing a fair bit of code for each struct just to get error messages that are decent.

@AlexTMjugador
Copy link

Adding to some of the points exposed by @33KK:

Forking serde and serde_derive is obviously complicated since that would also require patching the entire dependency chain to use that fixed version. That is on top of maintaining the fork, keeping it up to date and resolving any conflicts.

I've been doing that maintenance work for a while now, so other people don't have to. Maintaining that fork has been straightforward so far, barring the time when precompiled binaries where introduced, but they ended up being reverted for a plethora of reasons. That does not solve the problem that patching serde and serde_derive is only viable for end applications, but fortunately those are the most affected by these cryptic error messages.

[...] I think this feature is pretty much basic and a must-have, since currently it's basically impossible to tell what went wrong in an untagged enum, especially when it's the top level type or has a lot of children, then the entire error is completely worthless.

While I agree with this, I recently stumbled upon and documented the little-known expecting container attribute, which allows setting a custom fixed error message when deserializing untagged enums. By design, such a message lacks any situational information, unlike the changes proposed by this PR, but applications can write it so to be user-friendly enough for most circumstances, and supersede this PR for the applicable use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet