Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecursionError when building model and parsing data in a loop #5108

Closed
6 of 15 tasks
radiand opened this issue Feb 24, 2023 · 8 comments
Closed
6 of 15 tasks

RecursionError when building model and parsing data in a loop #5108

radiand opened this issue Feb 24, 2023 · 8 comments
Assignees
Labels
bug V1 Bug related to Pydantic V1.X

Comments

@radiand
Copy link

radiand commented Feb 24, 2023

Initial Checks

  • I have searched GitHub for a duplicate issue and I'm sure this is something new
  • I have searched Google & StackOverflow for a solution and couldn't find anything
  • I have read and followed the docs and still think this is a bug
  • I am confident that the issue is with pydantic (not my code, or another library in the ecosystem like FastAPI or mypy)

Description

I'm getting a RecursionError when I'm doing both dataclass model creation and parsing data in each iteration of a for loop. Solution for this is to move creating the model before the loop, and in a loop leave only parsing, but it is surprising to see RecursionError with simple model without recursion at all.

Obviously, recreating the model over and over does not seem to be efficient, however when I started working on my code it was (and still is) good enough.

This code is working on pydantic==1.9.2 and stopped with >=1.10. The 1.9.x version also required using frozen argument to pydantic.dataclasses.dataclass to make it work.

I am aware of: #4949 and related RecursionError issues, but seems to be a bit different.


Context in which this was found: my application uses pydantic to (de-)serialize and validate messages coming from remote places of my application. Messages are defined as standard python @dataclasses and I have access to their definitions, so I can use pydantic_encoder to convert them to JSONs in one place and somewhere else create pydantic dataclasses ad hoc to recreate the original dataclass.

Commentary to the code example: below example code is very simple and one may ask:

Why so complicated if you can instantiate dataclass as: MessageTypeA(**data)

The answer is: pydantic helps me with more complex cases, e.g. when timestamp comes as a string and it has to be converted back to datetime.

Example Code

from dataclasses import dataclass

import pydantic


@dataclass(frozen=True)
class Message:
    content: str


# This works:
model = pydantic.dataclasses.dataclass(Message)
for x in range(10000):
    content = "message_{0}".format(x)
    data = {"content": content}
    obj = pydantic.parse_obj_as(model, data)
    assert obj == Message(content=content)


# This fails:
for x in range(10000):
    content = "message_{0}".format(x)
    data = {"content": content}
    model = pydantic.dataclasses.dataclass(Message)
    obj = pydantic.parse_obj_as(model, data)
    assert obj == Message(content=content)


# This fails too (of course, comment the previous loop too see):
for x in range(10000):
    content = "message_{0}".format(x)
    data = {"content": content}
    model = pydantic.dataclasses.dataclass(Message)
    parsed_dict = model.__pydantic_model__.parse_obj(data).dict()
    obj = Message(**parsed_dict)
    assert obj == Message(content=content)

Python, Pydantic & OS Version

pydantic version: 1.10.5
            pydantic compiled: True
                 install path: /home/radiand/code/github/pydantic-recursion-error/venv/lib/python3.11/site-packages/pydantic
               python version: 3.11.2 (main, Feb 12 2023, 00:48:52) [GCC 12.2.0]
                     platform: Linux-6.1.0-3-amd64-x86_64-with-glibc2.36
     optional deps. installed: ['typing-extensions']

Same problem is visible in python 3.9.13 and 3.10.4

Affected Components

@radiand radiand added bug V1 Bug related to Pydantic V1.X unconfirmed Bug not yet confirmed as valid/applicable labels Feb 24, 2023
@dmontagu
Copy link
Contributor

@radiand I see what is going on here: every time you call pydantic.dataclasses.dataclass it is re-wrapping the __init__-like methods of the dataclass, and eventually it ends up being so repeatedly wrapped that we get a recursion error due to the stack being too deep, as each wrapping adds a couple layers to the init process.

From the perspective of getting a quick solution in place in v1 and moving on (to focus on v2), I'm inclined to either do nothing (but maybe update the docs), or do the following:

  • Add a "fix" to v1 where, if you apply the pydantic dataclass decorator to the same model a second time, the decorator doesn't do anything the second time, but a UserWarning is emitted indicating that nothing is being done. This way you'd at least get the warning that you shouldn't be re-applying the decorator (in a loop or otherwise).
    • (In principle, this could break someone's code, but unless someone can share of a case where this is done on purpose for a good reason, I would consider this a misuse of the pydantic dataclass decorator.)
    • I think it's important to warn and not just silently do nothing because I don't want to try to validate that the decorator was re-applied with the same settings, and if the settings changed, then it might not behave as expected and be confusing to debug.
      • That said, I could be convinced to just do nothing in this case, not even a warning.
    • I also don't think it's worth the effort to figure out how to rework the dataclass decorator in v1 to be idempotent. I tried a few "obvious" things and it seems to me that the provided class is itself directly modified, and because of that I think it would be hard to "undo" and "redo" the dataclass wrapping.
      • (I might be open to a PR if someone wants to fix this properly, I'm just not sure it's worth it for v1 at this stage.)
  • When we implement dataclasses in v2, I think it's worth some effort trying to make it work so that if you apply the pydantic dataclass decorator twice, it has the same effect as it would if you had just applied the "last-applied" version of the decorator in the first place. (So, like being idempotent, except that you can also change the config.) If it's hard to implement I don't think it is necessary, but I think it's a reasonable behavior to strive for and if it's easy to implement we should.
    • I would also be open to just erroring/warning if you apply the decorator multiple times, but, from what I can tell, the standard library dataclass decorator can be applied many times without it seeming to have any impact (I didn't investigate closely, but as far as I could tell there were no issues applying the decorator 10,000 times and then using the class). So that feels like a good reason to try to make re-application of the pydantic decorator also work.

@dmontagu
Copy link
Contributor

One possible case I wasn't considering above -- re-applying the pydantic decorator to subclasses of a base dataclass. This feels much more likely to both be on purpose, and not be warning-worthy. Given that, I'm a bit more nervous to make the changes suggested above, though I think I may have a way to check if the decorator was called on the specific class provided and not a subclass.. I'll look into it.

@dmontagu
Copy link
Contributor

dmontagu commented Feb 24, 2023

Here is an approach that I think we could take to handling the error looking for re-application of the decorator even with subclassing.

Right after the first line in this snippet:

def wrap(cls: Type[Any]) -> 'DataclassClassOrWrapper':
import dataclasses
should_use_proxy = (
use_proxy
if use_proxy is not None
else (
is_builtin_dataclass(cls)
and (cls.__bases__[0] is object or set(dir(cls)) == set(dir(cls.__bases__[0])))
)

We could insert something like:

        existing_model = getattr(cls, '__pydantic_model__', None)
        if existing_model:
            for base in cls.__bases__:
                if getattr(base, '__pydantic_model__', None) is existing_model:
                    break  # the value of the `__pydantic_model__` attribute is coming from a baseclass
            else:
                # Decorator has been re-applied specifically to this class.
                warnings.warn(
                    f'The pydantic.dataclasses.dataclass decorator has been reapplied to {cls.__name__}; '
                    'this will not have any effect.',
                    UserWarning
                )
                return cls

With that change, in the code below, a warning is only emitted on the final line:

import pydantic

@pydantic.dataclasses.dataclass
class Message:
    content: str

@pydantic.dataclasses.dataclass
class MessageWithMore(Message):
    more: str

MessageWithMore = pydantic.dataclasses.dataclass(MessageWithMore)

(And no recursion error is generated in @radiand's example code.)

I'm not sure if there might be other implications here, but I wanted to at least comment with it for future reference.

@samuelcolvin
Copy link
Member

Thanks so much for the explanation @dmontagu, makes sense.

I'm very very unwilling to mess with dataclasses again on v1 unless absolutely necessary. Every change we've made has broken something else.

@radiand
Copy link
Author

radiand commented Feb 25, 2023

First of all - thanks for the analysis and the response.

I tried your solution with refusing to re-apply the wrapper when it has been applied before - it works, also in the actual application where I'm building pydantic models during both serialization and deserialization for each incoming/outcoming message in the system. But well, my dataclasses are purposely simple, they mostly consist of primitive types, they do not subclass each other nor they nest other dataclasses. Therefore, probably my example won't help in spotting what else could be wrong. I didn't have enough time to dive deeply into pydantic code to help you more, at least for now, I'm sorry.

When I was posting this issue I was hesitant to do it, because I had the "solution" anyway, and I also had a feeling that I'm exploiting pydantic in some way. But since this code was working before, I decided to at least let you know. Since V2 is the priority now, I'm totally fine with staying at pydantic==1.9.2 and eventually rework my application to stop doing silly things and switch to 1.10.x release.

Thank you for your excellent work.

@samuelcolvin
Copy link
Member

Great response, thanks so much for understanding.

@samuelcolvin samuelcolvin removed the unconfirmed Bug not yet confirmed as valid/applicable label Feb 25, 2023
@samuelcolvin samuelcolvin added this to the Version 2 Issues milestone Feb 25, 2023
@samuelcolvin
Copy link
Member

Keeping this open to check on v2.

@dmontagu
Copy link
Contributor

In v2, much of the code in the original snippet needs changing. However, after making the appropriate changes, I'm able to run it without error.

One caveat, I wasn't able to create a frozen pydantic dataclass by wrapping an unfrozen one. Also, I think the pydantic dataclass ends up being a different type, so the equality check fails, but that too can be worked around.

The other thing to keep in mind is that we have added TypeAdapter, which is the new replacement for parse_obj_as, and using that you don't even need to create the pydantic dataclass, you can just call adapter = TypeAdapter(dataclass); adapter.validate_python(...) or whatever.

If you notice new issues in v2 we very much appreciate reports, but I'm going to close this for now since it seems to be mostly linked to v1 misbehavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V1 Bug related to Pydantic V1.X
Projects
None yet
Development

No branches or pull requests

4 participants