
Reduce allocations in flate decompressor and minor code improvements #869

Merged
merged 11 commits on Oct 10, 2023

Conversation

@flrdv (Contributor) commented Oct 9, 2023

  • decompressor.step is now an enum instead of a function pointer. This reduces allocations in the decompressor benchmarks to a constant 1 per op (previously the number differed per case). However, it doesn't bring much gain in terms of throughput.
  • huffmanBlockDecoder(), generated by _gen/gen_inflate.go, also used to return a function pointer. The method is now called directly instead. This likewise doesn't bring much gain, but may be visible in microbenchmarks.
  • Improved the decompressor benchmarks by replacing io.Copy() with io.CopyBuffer(), because io.Copy() internally allocates a read buffer.
  • Removed some useless expressions.

according to my local benchmarks, it doesn't affect throughput, but leaves a constant 1 allocation/op in the decompressor benchmarks
this conversion doesn't do anything, and removing it doesn't fail any of the tests
as io.Copy is just an alias for io.CopyBuffer with a nil buffer, the buffer gets allocated internally. So it's better to pass one manually
this operation does nothing, so removed it
the previous variant wasn't introducing much inaccuracy, but it's better to place it just before the benchmarking code anyway
instead of returning a func ptr, the method is called directly. This also doesn't bring much performance gain, but may be visible in microbenchmarks
@klauspost (Owner) left a comment

Thanks!

Could you post times in a (comparable) benchmark? Just want to ensure there isn't a speed regression.

this improves readability of the code. Also added a debug print for the case that the next-step enum is unrecognized. However, everything will still silently fail even without this print
@flrdv (Contributor, Author) commented Oct 9, 2023

@klauspost in the doStep() method, if the current enum value is unknown, nothing is done. This may result in undefined behaviour, up to an infinite loop. Currently there's just a debug print in the default case, but that doesn't help much either. Shall we raise a panic there?

@klauspost (Owner)

> if the current enum value is unknown, nothing will be done. This may result in undefined behaviour, up to an infinite loop. Currently there's just a debug print in the default case, but that doesn't help much either. Shall we raise a panic there?

Sure. Should be easier for a fuzzer to pick up then.

@flrdv (Contributor, Author) commented Oct 9, 2023

Full benchmarks output can be found here: https://gist.github.com/fakefloordiv/395a15a982cda3e43dc4f4833d3b2aac

Here's a short summary:

Time per operation (in ns/op):

DecodeDigitsSpeed1e4:   36650   -> 33963 (1.079x)
DecodeDigitsSpeed1e5:   346724  -> 340885 (1.017x)
DecodeDigitsSpeed1e6:   3576404 -> 3515543 (1.017x)
DecodeDigitsDefault1e4:   32550   -> 31307 (1.04x)
DecodeDigitsDefault1e5:   388516  -> 383745 (1.012x)
DecodeDigitsDefault1e6:   3847923 -> 3825121 (1.006x)
DecodeDigitsCompress1e4:   33756   -> 31259 (1.08x)
DecodeDigitsCompress1e5:   367638  -> 359207 (1.023x)
DecodeDigitsCompress1e6:   3618779 -> 3546951 (1.02x)
DecodeTwainSpeed1e4:   28407   -> 27533 (1.032x)
DecodeTwainSpeed1e5:   377619  -> 371688 (1.016x)
DecodeTwainSpeed1e6:   3810692 -> 3726031 (1.023x)
DecodeTwainDefault1e4:   29461   -> 27603 (1.067x)
DecodeTwainDefault1e5:   393549  -> 385159 (1.022x)
DecodeTwainDefault1e6:   3926477 -> 3854784 (1.019x)
DecodeTwainCompress1e4:   26872   -> 25846 (1.04x)
DecodeTwainCompress1e5:   365831  -> 358215 (1.021x)
DecodeTwainCompress1e6:   3651813 -> 3577634 (1.021x)
DecodeRandomSpeed1e4:   214     -> 202.8 (1.055x)
DecodeRandomSpeed1e5:   1659    -> 1626 (1.02x)
DecodeRandomSpeed1e6:   17198   -> 17274 (0.996x)

Throughput (in MB/s):

DecodeDigitsSpeed1e4:   272.85 -> 294.43 (1.079x)
DecodeDigitsSpeed1e5:   288.41 -> 293.35 (1.017x)
DecodeDigitsSpeed1e6:   279.61 -> 284.45 (1.017x)
DecodeDigitsDefault1e4:   307.22 -> 319.42 (1.04x)
DecodeDigitsDefault1e5:   257.39 -> 260.59 (1.012x)
DecodeDigitsDefault1e6:   259.88 -> 261.43 (1.006x)
DecodeDigitsCompress1e4:   296.24 -> 319.91 (1.08x)
DecodeDigitsCompress1e5:   272.01 -> 278.39 (1.023x)
DecodeDigitsCompress1e6:   276.34 -> 281.93 (1.02x)
DecodeTwainSpeed1e4:   352.03 -> 363.2 (1.032x)
DecodeTwainSpeed1e5:   264.82 -> 269.04 (1.016x)
DecodeTwainSpeed1e6:   262.42 -> 268.38 (1.023x)
DecodeTwainDefault1e4:   339.44 -> 362.28 (1.067x)
DecodeTwainDefault1e5:   254.1 -> 259.63 (1.022x)
DecodeTwainDefault1e6:   254.68 -> 259.42 (1.019x)
DecodeTwainCompress1e4:   372.13 -> 386.91 (1.04x)
DecodeTwainCompress1e5:   273.35 -> 279.16 (1.021x)
DecodeTwainCompress1e6:   273.84 -> 279.51 (1.021x)
DecodeRandomSpeed1e4:   46695.1 -> 49300.24 (1.056x)
DecodeRandomSpeed1e5:   60272.24 -> 61516.82 (1.021x)
DecodeRandomSpeed1e6:   58144.68 -> 57889.09 (0.996x)

Please note: the benchmarks were also updated (io.Copy replaced with io.CopyBuffer), which may affect the results too. However, even in that case they simply became a bit more accurate

@klauspost (Owner)

> benchmarks were also updated

Yes. That is why I asked for "comparable benchmarks", meaning the same code. :)

I don't think the "CopyBuffer" will make any difference, since both will use the io.WriterTo interface, so you should be able to take that out.

@flrdv (Contributor, Author) commented Oct 9, 2023

> benchmarks were also updated
>
> Yes. That is why I asked for "comparable benchmarks", meaning the same code. :)
>
> I don't think the "CopyBuffer" will make any difference, since both will use the io.WriterTo interface, so you should be able to take that out.

You are right, there's no difference at all. Here are the results with io.Copy: https://gist.github.com/fakefloordiv/d32971935516f566006f28c6c000a330

as the flate reader implements the io.WriterTo interface, it makes no difference which one is used
@klauspost (Owner) left a comment

LGTM

@klauspost klauspost merged commit c051ef1 into klauspost:master Oct 10, 2023
18 checks passed
kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Nov 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | patch | `v1.17.0` -> `v1.17.2` |

---

### Release Notes

<details>
<summary>klauspost/compress (github.com/klauspost/compress)</summary>

### [`v1.17.2`](https://togithub.com/klauspost/compress/releases/tag/v1.17.2)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.1...v1.17.2)

#### What's Changed

-   zstd: Fix corrupted output in "best" by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#876

**Full Changelog**: klauspost/compress@v1.17.1...v1.17.2

### [`v1.17.1`](https://togithub.com/klauspost/compress/releases/tag/v1.17.1)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.17.0...v1.17.1)

#### What's Changed

-   s2: Fix S2 "best" dictionary wrong encoding by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#871
-   flate: Reduce allocations in decompressor and minor code improvements by [@fakefloordiv](https://togithub.com/fakefloordiv) in klauspost/compress#869
-   s2: Fix EstimateBlockSize on 6&7 length input by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#867
-   tests: Fuzzing Coverage Expansion by [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) in klauspost/compress#866
-   tests: Set FSE decompress fuzzer max limit by [@klauspost](https://togithub.com/klauspost) in klauspost/compress#868
-   tests: Fuzzing Coverage Expansion ([#2](https://togithub.com/klauspost/compress/issues/2)) by [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) in klauspost/compress#870

#### New Contributors

-   [@viktoriia-lsg](https://togithub.com/viktoriia-lsg) made their first contribution in klauspost/compress#866
-   [@fakefloordiv](https://togithub.com/fakefloordiv) made their first contribution in klauspost/compress#869

**Full Changelog**: klauspost/compress@v1.17.0...v1.17.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).