Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlib upgrade broke known vectors #50138

Closed
panva opened this issue Oct 11, 2023 · 13 comments
Closed

zlib upgrade broke known vectors #50138

panva opened this issue Oct 11, 2023 · 13 comments
Labels
dependencies Pull requests that update a dependency file. invalid Issues and PRs that are invalid. zlib Issues and PRs related to the zlib subsystem.

Comments

@panva
Copy link
Member

panva commented Oct 11, 2023

See nodejs/citgm#1011

node -p 'process.versions.zlib'
1.2.13.1-motley
node -v
v18.18.1
node -e "result = zlib.deflateRawSync('You can trust us to stick with you through thick and thin–to the bitter end. And you can trust us to keep any secret of yours–closer than you keep it yourself. But you cannot trust us to let you face trouble alone, and go off without a word. We are your friends, Frodo.'); console.log(result); console.log(result.byteLength)"
<Buffer 6d 8f c1 0d c2 30 0c 45 57 f9 03 54 dd 01 0e ac 80 38 a6 a9 43 a2 46 31 72 1c 55 bd b1 03 1b 32 09 6e 2a 0e 48 9c 1c f9 ff ff fc 73 e3 06 ef 0a 54 5a ... 120 more bytes>
170
./node -p 'process.versions.zlib'
1.2.13.1-motley
➜  node git:(0522ac086c) ./node -e "result = zlib.deflateRawSync('You can trust us to stick with you through thick and thin–to the bitter end. And you can trust us to keep any secret of yours–closer than you keep it yourself. But you cannot trust us to let you face trouble alone, and go off without a word. We are your friends, Frodo.'); console.log(result); console.log(result.byteLength)"
<Buffer 5d 8f c1 0d 42 31 0c 43 57 f1 00 e8 ef 00 07 56 40 1c 4b 9b d2 8a 2a 41 a9 2b c4 8d 1d d8 90 49 d0 2f e2 00 b7 28 b6 5f 9c a3 0d c4 a0 a0 8f 4e 8c 0e ... 118 more bytes>
168

I suggest to revert the following commits and add unit tests for more known vectors to prevent breaking updates in the future.

cc @alfonsograziano @lpinca @aduh95

@panva panva added the zlib Issues and PRs related to the zlib subsystem. label Oct 11, 2023
@panva panva added the dependencies Pull requests that update a dependency file. label Oct 11, 2023
@lpinca
Copy link
Member

lpinca commented Oct 11, 2023

cc: @Adenilson

@targos
Copy link
Member

targos commented Oct 11, 2023

Why do you think 660f902 is involved in the issue?

@panva
Copy link
Member Author

panva commented Oct 11, 2023

I thought maybe because it's a followup to 0522ac0 that resolves some compile warnings and wouldn't be needed if 0522ac0 was reverted?

@panva
Copy link
Member Author

panva commented Oct 11, 2023

It was/is my limited understanding that the zlib output is deterministic.

Whilst I can confirm the different versions can inflate/deflate each other this change in output is unexpected, to me at least.

I can mark the jose test as not reproducible to have it not check equal outputs but the cookbook of these vectors does not mention the zlib deflate output to be variable.

@targos
Copy link
Member

targos commented Oct 11, 2023

I thought maybe because it's a followup to 0522ac0 that resolves some compile warnings and wouldn't be needed if 0522ac0 was reverted?

No, it's a followup to another, much less recent, zlib upgrade: 5fae8bc

@lpinca
Copy link
Member

lpinca commented Oct 11, 2023

Whilst I can confirm the different versions can inflate/deflate each other this change in output is unexpected, to me at least.

I also tested this. Let's wait a little to see if we can get some feedback from @Adenilson.

@panva
Copy link
Member Author

panva commented Oct 11, 2023

Sounds good, i'll also wait on marking the test as non-deterministic in the jose test suite then.

@bnoordhuis
Copy link
Member

It was/is my limited understanding that the zlib output is deterministic.

It's not. The only requirement is that it properly round-trips. I'll go ahead and close this.

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Oct 11, 2023
@bnoordhuis bnoordhuis added the invalid Issues and PRs that are invalid. label Oct 11, 2023
@panva
Copy link
Member Author

panva commented Oct 11, 2023

Fair enough. Thank you @bnoordhuis

I'm left yearning more details though if you would find the time.

@Adenilson
Copy link

It was/is my limited understanding that the zlib output is deterministic.

It's not. The only requirement is that it properly round-trips. I'll go ahead and close this.

That is correct (i.e. the requirement is that it round-trips and its compatible with other implementations).

I will repost here a comment related to the subject that I made during the review of the first landed patch:
"
Changes in the algorithmic implementation in zlib may change the compressed bitstream, but the decompressed content is still the same (otherwise the data would be corrupted).

Unfortunately, there are tests that compare compressed content in different ways.

It seems to be a common behavior by client code to assume some sort of 'binary stability' when it comes to gzip compressed streams.

There are no such guarantees even between releases of zlib and I've spent quite some time suggesting that people should compare decompressed content while writing tests.
"

Basically if there are changes in the way that we perform matches (i.e. longest_match) or in the way we keep track of those in a hash table (i.e. by changing the hashing function), the compressed output will be different but still a valid GZIP stream that must be compatible with other implementations.

As an experiment, try to compress a given file using canonical zlib (it uses a Rabin-Karp hash) and compare what other forks (e.g. Chromium zlib now uses ANZAC++, before we were using CRC-32) will produce and it should be different.

But they must be compatible with each other (e.g. compress with one library, decompress with another and vice-versa) and the decompressed result must match with the original input.

I hope that helps to clarify the subject and feel free to contact me if you run in any issues.
:-)

@Adenilson
Copy link

For context, the test that triggered the revert of the original patch turned out to be flakey:
https://bugs.chromium.org/p/chromium/issues/detail?id=1487363#c20

@Adenilson
Copy link

Adenilson commented Oct 11, 2023

And this were the compression performance gains thanks to the new hash on Xeon 4th gen (average +42% faster compression):
SPR_ Multiplicative hash

novemberborn added a commit to avajs/ava that referenced this issue Oct 22, 2023
The snapshots are compressed differently in Node.js 21. The compression has been stable for many versions but that is not a guarantee, see nodejs/node#50138 for background.

Change tests to compare the decompressed data instead.
novemberborn added a commit to avajs/ava that referenced this issue Oct 23, 2023
The snapshots are compressed differently in Node.js 21. The compression has been stable for many versions but that is not a guarantee, see nodejs/node#50138 for background.

Change tests to compare the decompressed data instead.
novemberborn added a commit to avajs/ava that referenced this issue Oct 23, 2023
The snapshots are compressed differently in Node.js 21. The compression has been stable for many versions but that is not a guarantee, see nodejs/node#50138 for background.

Change tests to compare the decompressed data instead.
novemberborn added a commit to avajs/ava that referenced this issue Oct 23, 2023
* Drop support for Node.js 16
* Change expected Node.js 18 to 18.18
* Change expected Node.js 20 to 20.8
* Add Node.js 21 to the test matrix

* Change snapshot tests to compare decompressed bitstreams

The snapshots are compressed differently in Node.js 21. The compression has been stable for many versions but that is not a guarantee, see nodejs/node#50138 for background.

Change tests to compare the decompressed data instead.
wraithgar added a commit to npm/pacote that referenced this issue Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file. invalid Issues and PRs that are invalid. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

No branches or pull requests

5 participants