Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collapse_exception_group breaks with exceptions that are also frozen dataclasses #2607

Open
Badg opened this issue Mar 13, 2023 · 7 comments

Comments

@Badg
Copy link

Badg commented Mar 13, 2023

Quick preface:

  • this only happens with strict_exception_groups=False, and enabling strict exception groups fixes the "problem"
  • I'm not even sure I would necessarily consider this a "problem" with the code per se, but it should probably be at least documented somewhere
  • I realize that strict_exception_groups=False is on the deprecation march, so even if this is considered a problem, it might not be worthwhile to fix it
  • feel free to immediately close as wontfix!

That being said, in the interests of "more information is always better than less information" and "something something posterity", I thought I'd throw up a quick flag: collapse_exception_group breaks if the exception it's trying to collapse also happens to be a frozen dataclass. When trio tries to set the __traceback__ attribute, dataclass' __setattr__ magic kicks in, and you get a FrozenInstanceError. Also, since this happens in the middle of nursery exception management, it kinda... blows everything up.

Again, for me, the solution was suuuuper simple -- just enable strict exception groups and update the rest of my code to match. Voila, and I get futureproofing free of charge! But it wasn't immediately clear to me that that would be a viable solution, until I poked around some inside the trio codebase. So it might be helpful to document that somewhere (I have no idea where, or I'd just have directly opened up a PR).

@oremanj
Copy link
Member

oremanj commented Mar 13, 2023

I had never considered the possibility of a frozen exception before. A bunch of the built-in attributes are sort of inherently mutable (including __context__, __cause__, add_note, etc). You can I-guess get away with it if all of their manipulations happen in C, because that will bypass the custom __setattr__ that enforces the frozenness. But some Python code needs to set these and a frozen exception is making its life really difficult. One non-Trio example: context.ExitStack.__exit__ will try to assign to __context__ in some cases.

Trio also assigns to __context__, in a pretty performance-sensitive location (cancel scope and nursery exit) that is probably not amenable to a test for __context__ assignability. We can't remove this assignment as it's necessary to avoid having a nursery's nested-child exception become the __context__ for the nursery's MultiError. And this will not be going away eventually, unlike the __traceback__ assignment you ran into.

I think the only surefire fix here is to either stop using frozen exception types or to freeze them in a way that allows the BaseException magic attributes to still be accessed. (Suggestion: if the attribute being set exists on BaseException, then allow the set to proceed.)

Trio might be able to work around this by doing the equivalent of BaseException.__context__.__set__(exc, context) rather than exc.__context__ = context. That's pretty unpleasant though, and you'll still have the ExitStack hazard.

@A5rocks
Copy link
Contributor

A5rocks commented Mar 14, 2023

I believe attrs itself sets the attributes on init (it has to somehow) via object.__setattr__(self, 'attribute name', value). We could use that?

>>> import attrs
>>> @attrs.frozen
... class A:
...   x: int
...
>>> a = A(42)
>>> a
A(x=42)
>>> object.__setattr__(a, 'x', 43)
>>> a
A(x=43)
>>> a.x = 43
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\A5rocks\Documents\Temp\venv\lib\site-packages\attr\_make.py", line 617, in _frozen_setattrs
    raise FrozenInstanceError()
attr.exceptions.FrozenInstanceError

@A5rocks
Copy link
Contributor

A5rocks commented Mar 14, 2023

Alternatively, we can fix this upstream?

IE attrs (if it detects that you're making an exception in auto_exc=True mode, which is the default, and if you're frozen) could allow certain (which?) attributes to be overwritten in __setattr__?

I'm surprised it doesn't already have this functionality! I may have missed something in my tests:

>>> @attrs.frozen
... class B(Exception):
...   pass
...
>>> b = B()
>>> b.__context__ = None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\A5rocks\Documents\Temp\venv\lib\site-packages\attr\_make.py", line 617, in _frozen_setattrs
    raise FrozenInstanceError()
attr.exceptions.FrozenInstanceError
>>> b.args  # testing auto_exc
()

EDIT: turns out attrs special cases PyPy to allow this! python-attrs/attrs#712

@oremanj
Copy link
Member

oremanj commented Mar 14, 2023

It looks like attrs already has the full fix! python-attrs/attrs#1081 -- Jan of this year, so not in a release yet.

The original request was for dataclasses, which are harder to fix upstream. I'd be happy to entertain a Trio PR that grabs the descriptors for BaseException.__traceback__ and BaseException.__context__, and uses them to set these attributes when necessary.

@A5rocks
Copy link
Contributor

A5rocks commented Mar 14, 2023

Huh I don't know why my brain went dataclasses -> attrs. Oops!

I'd be happy to entertain a Trio PR that grabs the descriptors for BaseException.__traceback__ and BaseException.__context__, and uses them to set these attributes when necessary.

I assume just calling object.__setattr__(exception, '__traceback__', ...) wouldn't work then? (That's what I suggested in #2607 (comment))

@Badg
Copy link
Author

Badg commented Mar 14, 2023

I assume just calling object.setattr(exception, 'traceback', ...) wouldn't work then?

That would work, and it's actually a very common pattern (at least in my experience) when working with frozen dataclasses. In theory, the downside is that it potentially breaks subclassing for anyone that wants to implement their own __setattr__ for the dataclass. However, as per stdlib docs:

frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised. See the discussion below.

So actually, I think this "potential downside" is really a non-issue, since dataclasses explicitly forbids it. So calling object.__setattr__ directly would certainly be a valid fix for trio!

Edit: I missed the nuance that we're talking about descriptors here. I wasn't 100% sure if that would still play nicely together, so I tried it out. This worked fine:

>>> import dataclasses
>>> @dataclasses.dataclass(frozen=True)
... class TestFrozenException(Exception):
...   foo: str
...
>>> instance = TestFrozenException(foo='bar')
>>> instance.__traceback__ = None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field '__traceback__'
>>> object.__setattr__(instance, '__traceback__', None)
>>> instance.__traceback__

That being said, I would be a huge supporter of "fixing it upstream", it's just -- as @oremanj pointed out -- then we're talking about the stdlib. I personally have never contributed to the stdlib, so I have absolutely no idea what the procedure would be there, though -- would I need a PEP? Is this small enough to just write a PR? etc.

A quick search through the cpython issues shows that other people have encountered this same problem in other contexts. And I would say that, at least on my end, my most common use case for frozen dataclasses by far is for hashability. However, I very frequently have need for other, non-dataclass-field attributes (that aren't included in the hash) that can be mutated after creation. Caching lookups on things is one example; another is lazy-loading IDs from a database when manipulating API requests (the API request comes off-the-line without the database ID, you do some set/dict manipulations locally, and then you need to upsert the record in the database, returning its ID, which you then store on the instance to avoid needing another database request). And various other bookkeeping objects are also frequent uses, where you need to store some ephemeral in-memory application state (ideally as plain instance attributes) along with some static distributed state (as frozen fields). I use dataclasses extensively both in my dayjob and in personal projects, and probably about 50% of the frozen dataclass classes I create need a workaround -- either through the object.__setattr__ trick above, or through encapsulating a non-frozen dataclass within a frozen one (so, having the ephemeral state container as a field on the frozen state container). Neither solution is ideal, and if you're working around an existing API (like with exceptions), neither of them is actually reliable.

What I wish there were within dataclasses, were a frozen_fields parameter to the dataclass constructor, that I could use instead of frozen=True. Here, the behavior would be exactly the same as today, except that it would only raise FrozenInstanceError if you were attempting to set the value on something that was defined as a dataclass field. That would be the best of both worlds, and it would mean I wouldn't need to keep writing these awkward workarounds anymore. It also sidesteps the backwards compatibility issue, because it's a completely new parameter, and doesn't change existing behavior.

@Badg
Copy link
Author

Badg commented Mar 14, 2023

Also, an example use case for why you might want to have frozen dataclass instances for exceptions: at my dayjob we're running... what could be concisely but not 100% accurately described as an ETL job... over customer data. Long story short, we get CSV files with 100s to 1000s (or more, but it's still relatively small-scale) of customer data rows in them. These data rows might have errors. We have two base error classes we use for collecting the file errors -- CustomerError and ServerError, both of which inherit from ReportableErrorBase. We send back a collected error report to the customer, so they can fix all of the problems they have with the file in a single shot. We use the dataclass hashability to determine the "sameness" of the errors, and above a certain heuristic threshold, we group the errors together and send back a BulkErrorReport instead of a list of individual errors. (the threshold is how we decide if we think "this is probably an error with the entire column", like having a wrong number format specification -- vs "this is an error within this specific row", like "NaN is not allowed here".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants