
Reduce allocations in flate decompressor and minor code improvements #869

Merged
merged 11 commits on Oct 10, 2023

Conversation

@flrdv (Contributor) commented Oct 9, 2023

  • decompressor.step is now an enum instead of a function pointer. This reduces allocations in the decompressor benchmarks to a constant 1 per op (previously the number differed per case). However, it doesn't bring much gain in terms of throughput.
  • huffmanBlockDecoder(), generated by _gen/gen_inflate.go, also used to return a function pointer. The method is now called directly instead. This likewise doesn't bring much gain, but may be visible in microbenchmarks.
  • Improved the decompressor benchmarks by replacing io.Copy() with io.CopyBuffer(), because io.Copy() internally allocates a read buffer.
  • Removed some useless expressions.

according to my local benchmarks, it doesn't affect throughput, but leaves a constant 1 allocation/op in the decompressor benchmarks
this conversion doesn't do anything, and removing it doesn't fail any of the tests
as io.Copy is just an alias for io.CopyBuffer with a nil buffer, the buffer gets allocated internally. So it's better to pass one manually
this operation does nothing, so removed it
the previous variant wasn't introducing much inaccuracy, but it's better to place it just before the benchmarking code anyway
instead of returning a func ptr, the method is called directly. This also doesn't bring much performance gain, but may be visible in microbenchmarks
@klauspost (Owner) left a comment

Thanks!

Could you post times in a (comparable) benchmark? Just want to ensure there isn't a speed regression.

this improves readability of the code. Also added a debug print for the case that the next-step enum is unrecognized. However, everything will still silently fail even without this print
@flrdv (Contributor, Author) commented Oct 9, 2023

@klauspost in the doStep() method, if the current enum value is unknown, nothing is done. This may result in undefined behaviour, up to an infinite loop. Currently there's just a debug print in the default case, but that doesn't help much either. Shall we raise a panic there?

@klauspost (Owner)

> if the current enum value is unknown, nothing will be done. This may result in undefined behaviour, up to an infinite loop. Currently there's just a debug print in the default case, but that doesn't help much either. Shall we raise a panic there?

Sure. Should be easier for a fuzzer to pick up then.

@flrdv (Contributor, Author) commented Oct 9, 2023

Full benchmarks output can be found here: https://gist.github.com/fakefloordiv/395a15a982cda3e43dc4f4833d3b2aac

Here's a short summary:

Time per operation (in ns/op):

DecodeDigitsSpeed1e4:   36650   -> 33963 (1.079x)
DecodeDigitsSpeed1e5:   346724  -> 340885 (1.017x)
DecodeDigitsSpeed1e6:   3576404 -> 3515543 (1.017x)
DecodeDigitsDefault1e4:   32550   -> 31307 (1.04x)
DecodeDigitsDefault1e5:   388516  -> 383745 (1.012x)
DecodeDigitsDefault1e6:   3847923 -> 3825121 (1.006x)
DecodeDigitsCompress1e4:   33756   -> 31259 (1.08x)
DecodeDigitsCompress1e5:   367638  -> 359207 (1.023x)
DecodeDigitsCompress1e6:   3618779 -> 3546951 (1.02x)
DecodeTwainSpeed1e4:   28407   -> 27533 (1.032x)
DecodeTwainSpeed1e5:   377619  -> 371688 (1.016x)
DecodeTwainSpeed1e6:   3810692 -> 3726031 (1.023x)
DecodeTwainDefault1e4:   29461   -> 27603 (1.067x)
DecodeTwainDefault1e5:   393549  -> 385159 (1.022x)
DecodeTwainDefault1e6:   3926477 -> 3854784 (1.019x)
DecodeTwainCompress1e4:   26872   -> 25846 (1.04x)
DecodeTwainCompress1e5:   365831  -> 358215 (1.021x)
DecodeTwainCompress1e6:   3651813 -> 3577634 (1.021x)
DecodeRandomSpeed1e4:   214     -> 202.8 (1.055x)
DecodeRandomSpeed1e5:   1659    -> 1626 (1.02x)
DecodeRandomSpeed1e6:   17198   -> 17274 (0.996x)

Throughput (in MB/s):

DecodeDigitsSpeed1e4:   272.85 -> 294.43 (1.079x)
DecodeDigitsSpeed1e5:   288.41 -> 293.35 (1.017x)
DecodeDigitsSpeed1e6:   279.61 -> 284.45 (1.017x)
DecodeDigitsDefault1e4:   307.22 -> 319.42 (1.04x)
DecodeDigitsDefault1e5:   257.39 -> 260.59 (1.012x)
DecodeDigitsDefault1e6:   259.88 -> 261.43 (1.006x)
DecodeDigitsCompress1e4:   296.24 -> 319.91 (1.08x)
DecodeDigitsCompress1e5:   272.01 -> 278.39 (1.023x)
DecodeDigitsCompress1e6:   276.34 -> 281.93 (1.02x)
DecodeTwainSpeed1e4:   352.03 -> 363.2 (1.032x)
DecodeTwainSpeed1e5:   264.82 -> 269.04 (1.016x)
DecodeTwainSpeed1e6:   262.42 -> 268.38 (1.023x)
DecodeTwainDefault1e4:   339.44 -> 362.28 (1.067x)
DecodeTwainDefault1e5:   254.1 -> 259.63 (1.022x)
DecodeTwainDefault1e6:   254.68 -> 259.42 (1.019x)
DecodeTwainCompress1e4:   372.13 -> 386.91 (1.04x)
DecodeTwainCompress1e5:   273.35 -> 279.16 (1.021x)
DecodeTwainCompress1e6:   273.84 -> 279.51 (1.021x)
DecodeRandomSpeed1e4:   46695.1 -> 49300.24 (1.056x)
DecodeRandomSpeed1e5:   60272.24 -> 61516.82 (1.021x)
DecodeRandomSpeed1e6:   58144.68 -> 57889.09 (0.996x)

Please note: the benchmarks were also updated (io.Copy replaced with io.CopyBuffer), which may affect the results too. However, even in that case they simply became a bit more accurate

@klauspost (Owner)

> benchmarks were also updated

Yes. That is why I asked for "comparable benchmarks", meaning the same code. :)

I don't think the "CopyBuffer" will make any difference, since both will use the io.WriterTo interface, so you should be able to take that out.

@flrdv (Contributor, Author) commented Oct 9, 2023

> benchmarks were also updated
>
> Yes. That is why I asked for "comparable benchmarks", meaning the same code. :)
>
> I don't think the "CopyBuffer" will make any difference, since both will use the io.WriterTo interface, so you should be able to take that out.

You are right, there's no difference at all. Here are the results with io.Copy: https://gist.github.com/fakefloordiv/d32971935516f566006f28c6c000a330

as the flate reader implements the io.WriterTo interface, it makes no difference which one is used
@klauspost (Owner) left a comment

LGTM

@klauspost klauspost merged commit c051ef1 into klauspost:master Oct 10, 2023
18 checks passed
kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Nov 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | patch | `v1.17.0` -> `v1.17.2` |

---

### Release Notes

<details>
<summary>klauspost/compress (github.com/klauspost/compress)</summary>

### [`v1.17.2`](https://togithub.com/klauspost/compress/releases/tag/v1.17.2)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.1...v1.17.2)

#### What's Changed

-   zstd: Fix corrupted output in "best" by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#876

**Full Changelog**: klauspost/compress@v1.17.1...v1.17.2

### [`v1.17.1`](https://togithub.com/klauspost/compress/releases/tag/v1.17.1)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.0...v1.17.1)

#### What's Changed

-   s2: Fix S2 "best" dictionary wrong encoding by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#871
-   flate: Reduce allocations in decompressor and minor code improvements by [@fakefloordiv](https://togithub.com/fakefloordiv) in klauspost/compress#869
-   s2: Fix EstimateBlockSize on 6&7 length input by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#867
-   tests: Fuzzing Coverage Expansion by [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) in klauspost/compress#866
-   tests: Set FSE decompress fuzzer max limit by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#868
-   tests: Fuzzing Coverage Expansion ([#2](https://togithub.com/klauspost/compress/issues/2)) by [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) in klauspost/compress#870

#### New Contributors

-   [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) made their first contribution in klauspost/compress#866
-   [@fakefloordiv](https://togithub.com/fakefloordiv) made their first contribution in klauspost/compress#869

**Full Changelog**: klauspost/compress@v1.17.0...v1.17.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).