Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix backward compatibility with pickles before v22.2.0 #1085

Merged
merged 3 commits into from Jan 25, 2023

Conversation

ngnpope
Copy link
Contributor

@ngnpope ngnpope commented Jan 13, 2023

Summary

Unfortunately we're using django-redis to cache some attrs instances and the change to make __getstate__ and __setstate__ use dict values instead of tuple values prevents reading any values from the cache when updating to attrs>=22.2.0. We get an AttributeError raised for all attributes.

This happens because the tuple of values stored in the original pickle is checked for the name of the attribute and so none of the attributes are set on unpickle.

To address this I've made a change so that __setstate__ will still be able to handle values pickled as tuples allowing existing pickles to be unpickled.

The test just contains a pre-pickled value due to the complexities around tying to mock methods on a slotted as documented1.

I realise that documentation also mentions that we should "think twice" before using pickle, but sometimes those decisions predate our involvement in a project! In addition, pretty much all of Django's cache implementations are built around using pickle by default. This change will allow an easier path to upgrade by unpickling and repickling all values in the cache.

Regression in 89a890a.

Pull Request Check List

  • Added tests for changed code.
    Our CI fails if coverage is not 100%.
  • New features have been added to our Hypothesis testing strategy.
  • Changes or additions to public APIs are reflected in our type stubs (files ending in .pyi).
    • ...and used in the stub test file tests/typing_example.py.
    • If they've been added to attr/__init__.pyi, they've also been re-imported in attrs/__init__.pyi.
  • Updated documentation for changed code.
    • New functions/classes have to be added to docs/api.rst by hand.
    • Changes to the signature of @attr.s() have to be added by hand too.
    • Changed/added classes/methods/functions have appropriate versionadded, versionchanged, or deprecated directives.
      Find the appropriate next version in our __init__.py file.
  • Documentation in .rst files is written using semantic newlines.
  • Changes (and possible deprecations) have news fragments in changelog.d.
  • Consider granting push permissions to the PR branch, so maintainers can fix minor issues themselves without pestering you.

Footnotes

  1. https://www.attrs.org/en/latest/glossary.html#term-slotted-classes

Copy link
Member

@hynek hynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry and thanks!

Two little quibbles, should be quick to merge.

tests/test_slots.py Outdated Show resolved Hide resolved
src/attr/_make.py Show resolved Hide resolved
@pganssle
Copy link
Member

Is there any way to add a warning when someone uses the antipattern of persisting pickles across versions? I think a lot of people don't realize that this is a profoundly bad idea. If attrs is going to enable this behavior (even on a best-effort basis), it might be a good idea to at least tell people they are doing something wrong here.

@hynek
Copy link
Member

hynek commented Jan 14, 2023

Cc @djipko for overall assessment

@ngnpope
Copy link
Contributor Author

ngnpope commented Jan 16, 2023

Is there any way to add a warning when someone uses the antipattern of persisting pickles across versions?

Across versions - in a general way - sounds difficult as you'd need to store the version in the pickle itself and then you may still suffer from not being able to unpickle in older versions anyway - it'd be backward incompatible. The only viable way would be to store (<version>, <instance>) so you could then compare the <version> in __setstate__() before handling the <instance>...

...but then we'd be changing the state again and we'd need to know that the tuple contained a version in the first item rather than being the tuple state from a pre-v22.2.0 version of attrs.

I think a lot of people don't realize that this is a profoundly bad idea.

This is true. Others of us are saddled with these issues until we have resources to be able to undo decisions made by others in the past.

If attrs is going to enable this behavior (even on a best-effort basis), it might be a good idea to at least tell people they are doing something wrong here.

This. I think a nice big admonition in the documentation would be a good idea. A documentation search yields very little - one small warning in the slotted classes glossary entry. It may also be worth stating that some cache implementations, e.g. Django's caching framework, make use of pickle serialization by default, which could lead to compatibility issues across versions.

Further, I know that whether this is considered public API is a possible grey area, but given the fragility it's probably worth mentioning backward incompatible changes to pickle serialization in the release notes in future. That linked comment did also say "So if you can make it compatible" which I would argue didn't happen, hence this PR.

@ngnpope
Copy link
Contributor Author

ngnpope commented Jan 16, 2023

Another thought: Do we want to add a DeprecationWarning to the tuple branch?

@hynek
Copy link
Member

hynek commented Jan 16, 2023

Another thought: Do we want to add a DeprecationWarning to the tuple branch?

I think that's not great because it would warn about exactly one version change, as you wrote. Kinda a mix of enough to be annoying, but not enough to solve the problem.

While I agree with Paul that it's not great to pickle across versions, with attrs it shouldn't be a problem and I bet people do it and love it and are gonna yell at us if we annoy them (just as they yelled at us about the 3.6 warning).

IOW I agree, but I'm afraid it's too disruptive.

I think I'm fine with the code as it is. I'd just like @djipko to give it a look to ensure we're not breaking something only he knows and then it's ready to merge.

@pganssle
Copy link
Member

While I agree with Paul that it's not great to pickle across versions, with attrs it shouldn't be a problem and I bet people do it and love it and are gonna yell at us if we annoy them (just as they yelled at us about the 3.6 warning).

IOW I agree, but I'm afraid it's too disruptive.

I mean, I think people shouldn't save pickles to disk at all, but 🤷

FWIW, this issue also has blocked me from upgrading attrs internally for now, but I don't mind when this happens so much because it's a decent way to ferret out programs that are inappropriately using pickle. There are a bunch of other ways that this sort of thing can go wrong in much more subtle ways, so I appreciate when things break a bit more loudly.

Also, I understand the perspective of "people do it and love it", but I suspect that in reality people are neutral about it. They think pickle is the way you serialize stuff and don't think about it until it breaks, at which point their reactive "You broke me" emotions kick in. It's kind of a thankless job to be like, "Oops you were doing it wrong sorry pickle exists" but it's almost certainly educational.

That said, in some ways I think you are a kinder person than me and not as inclined to say, "This is for your own good." when breaking private APIs, and I'm not going to lobby too hard for just letting this stuff break. I figured this point where we know that someone is relying on cross-version unpickling might be a good place to put a warning (annoying enough to show people they are doing the wrong thing but not difficult to silence or fix, and with no blast radius outside of this particular subset), but it's not a hill I'm interested in dying on (or even just getting grass stains on my knees on).

@djipko
Copy link
Contributor

djipko commented Jan 19, 2023

I think the change itself is fine and something I considered doing in the first place when I proposed to move to dict. If it's breaking enough people and stopping them from upgrading, even though they are doing something that's known to be "a bad idea" (tm), on the whole, that's a bad thing.

But as demonstrated in the issue I raised, using tuple is inherently unsafe - now everyone's just broken which is, in some ways better than their code doing the wrong thing (even if it's an edge case that doesn't come up as often).

With this change, everyone is still vulnerable to the original issue though which is not great either. Here's what I propose if @hynek agrees - let's have an issue open for say next version (v23.0.0 or whatever) to really remove tuple then? While I don't doubt this will still break some people even then - most folks would have had enough time to re-process data if there are any stragglers.

I'd be happy to keep an eye on such issue and put up a fix and land it when safe!

Copy link
Contributor

@djipko djipko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny code nit

tests/test_slots.py Show resolved Hide resolved
@ngnpope
Copy link
Contributor Author

ngnpope commented Jan 19, 2023

...even though they are doing something that's known to be "a bad idea" (tm)...

I know, I know! 🙈 I'm the victim, I promise! 😛

I'm happy if you want to remove it again in a major version bump - I'd expect nothing less 😄

What this provides is a (few) version(s) where we can read the tuple and dict versions and write the dict version only. That allows for a seamless upgrade of cache values to the new pickle format - via a load-and-dump cycle of the cache - which is not currently possible.

Yes, the original issue is still a problem in a sense, but only as far as reading existing pickles goes. With a load-and-dump upgrade this potential issue will go away for us (although our structure is not currently at risk of changing).

ngnpope and others added 2 commits January 23, 2023 12:12
Unfortunately we're using `django-redis` to cache some `attrs` instances
and the change to make `__getstate__` and `__setstate__` use `dict`
values instead of `tuple` values prevents reading any values from the
cache when updating to `attrs>=22.2.0`. We get an `AttributeError`
raised for all attributes.

This happens because the tuple of values stored in the original pickle
is checked for the name of the attribute and so none of the attributes
are set on unpickle.

To address this I've made a change so that `__setstate__` will still be
able to handle values pickled as `tuple`s allowing existing pickles to
be unpickled.

The test just contains a pre-pickled value due to the complexities
around tying to mock methods on a slotted as documented[^1].

I realise that documentation also mentions that we should "think twice"
before using pickle, but sometimes those decisions predate our
involvement in a project! In addition, pretty much all of Django's cache
implementations are built around using pickle by default. This change
will allow an easier path to upgrade by unpickling and repickling all
values in the cache.

Regression in 89a890a.

[^1]: https://www.attrs.org/en/latest/glossary.html#term-slotted-classes
Copy link
Member

@hynek hynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@hynek hynek enabled auto-merge (squash) January 25, 2023 06:55
@hynek hynek merged commit c9150d2 into python-attrs:main Jan 25, 2023
@ngnpope ngnpope deleted the fix-pickle-backward-compat branch January 25, 2023 09:23
@Redoubts
Copy link

Any chance of cutting a dot release with this?

@djipko
Copy link
Contributor

djipko commented Apr 7, 2023

One thing to note before cutting an official release with this patch - if code was susceptible to the original issue #1004 i.e. unpickling will assign wrong data to an attribute, and someone runs the newly cut release, that reads and then saves data again, the saved pickled data will now be permanently corrupted (read as tuple which does the wrong thing, but then saved as dict).

Releasing it officially will make it more likely people hit this.

Now I am not saying not to release it - ultimately saving pickled data is very error prone anyway (hence this patch) - but I feel silently corrupting data is a slightly worse kind of edge case then just loudly breaking, even if it is less likely to happen.

@hynek
Copy link
Member

hynek commented Apr 8, 2023

Why isn't that a great lose-lose scenario. ;)

My impression of the situation is that people who will take advantage of this bw-fix are probably holding back their upgrades and had all the time in the world to fix it. There's only so much we can do...

@ngnpope
Copy link
Contributor Author

ngnpope commented Apr 8, 2023

My impression of the situation is that people who will take advantage of this bw-fix are probably holding back their upgrades…

Yes, I am.

…and had all the time in the world to fix it.

No, I don't. Storing pickles was a choice made by somebody else long before I came along. Although wrong, it worked "fine", then it got broken by a backward incompatible change. attrs makes big claims of backward compatibility and no (or little) mention of any exception with regard to pickling. Fixing this issue by manually rewriting all stored data to be stored serialised in a different format is way down the list of priorities. (It'll happen eventually…)

Releasing it officially will make it more likely people hit this.

I'm sorry, but I think this is misleading. It'd only be the case if you upgraded and reordered attributes. Having this fix released would allow for stored pickles to be replaced automatically and gradually with a "safer" version. (Yes, we'd still need to not reorder attributes until we were sure that all stored data had been upgraded.)

I just get the impression that the argument being made is that we should just break so hard that we intentionally make everyone's lives difficult rather than stick to the promises of backward compatibility and issue a big fat warning.

@hynek
Copy link
Member

hynek commented Apr 8, 2023

That's what I meant. You either held back and have the same pickles as before, or upgraded and fixed your pickles and therefore aren't affected anymore.

@ngnpope
Copy link
Contributor Author

ngnpope commented Apr 8, 2023

Sorry, I didn't mean to come across so grumpy!

Yeah, it's definitely lose-lose 🤷🏻‍♂️

@djipko
Copy link
Contributor

djipko commented Apr 11, 2023

I also think we should release this personally but wanted to make sure we are considering all the angles. It is indeed a lose lose.

@hynek
Copy link
Member

hynek commented Apr 16, 2023

23.1.0 is on PyPI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants