
Update upperBound ratio when guessing the required decompression buffer size #141

Merged: 1 commit into 1.x on Jun 28, 2024

Conversation

@sfluor (Contributor) commented on Jun 26, 2024

We noticed that for one of our services we sometimes have a lot of allocations caused by the `ioutil.ReadAll` call in the `Decompress` method.

After investigation, those allocations come from buffers whose decompressed size is greater than 10x the input size. This is quite wasteful because it means we allocate twice for those buffers.

Once in this branch (https://github.com/DataDog/zstd/blob/869dae002e5efb372a0b09cd7d99390ca2089cc1/zstd.go#L143):

`dst = make([]byte, bound)`

And another time here, once we realise that the buffer we previously allocated wasn't big enough (https://github.com/DataDog/zstd/blob/869dae002e5efb372a0b09cd7d99390ca2089cc1/zstd.go#L154):

`// We failed getting a dst buffer of correct size, use stream API`

We have this limit to avoid malicious payloads, but we noticed that it actually triggers quite often for one of our services whose compression ratios are above 10x.

This technically doesn't fully solve the problem, but it should now happen less often. I can also make this upper bound configurable if this change sounds too scary.
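For context, here is a minimal sketch of the sizing pattern described above. It is illustrative only, not the library's exact code: `frameContentSize` is a hypothetical stand-in for reading the decompressed size from the frame header (what the C API exposes as `ZSTD_getFrameContentSize`).

```go
package zstdsketch

// frameContentSize is a hypothetical stand-in for reading the
// decompressed size out of the zstd frame header; ok is false when
// the header does not carry a size.
func frameContentSize(src []byte) (size int, ok bool) {
	// Placeholder: the real binding asks the C library for this.
	return 0, false
}

// decompressSizeHint sketches the guess this PR tunes: trust the
// header size when it is available and under the cap, otherwise use
// the cap itself. When the guess is still too small, Decompress
// falls back to the streaming API and allocates a second time,
// which is the double allocation described above.
func decompressSizeHint(src []byte) int {
	// 1 MB or 50x the input size (the ratio this PR raises from 10x).
	upperBound := 50 * len(src)
	if upperBound < 1<<20 {
		upperBound = 1 << 20
	}
	if hint, ok := frameContentSize(src); ok && hint <= upperBound {
		return hint
	}
	return upperBound
}
```

Under the old 10x ratio, any payload compressed better than 10x either had its header hint clamped to the cap or, with no header size, got a too-small guess; in both cases it paid for the stream-API reallocation.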

@sfluor marked this pull request as ready for review on June 26, 2024, 13:12
Before:
`// 1 MB or 10x input size`
`upperBound := 10 * len(src)`

After:
`// 1 MB or 50x input size`
`upperBound := 50 * len(src)`


That's a huge bump, no? It means we'll allocate 5x more memory than before in all these cases.

@sfluor (Contributor, Author) replied on Jun 27, 2024:

We will in the case where we cannot read the size from the zstd headers (or where the size we read is bigger than 10x the compressed size).

However, IMO it's better to do that and allocate once rather than first allocating 10x the size, realising it's not enough, and reallocating `decompressedSize`.
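To make that trade-off concrete with hypothetical numbers: for a 1 MB payload that really decompresses to 30 MB, the old 10x cap clamps the guess to 10 MB, the one-shot decompression fails, and the stream fallback allocates again, roughly 10 MB + 30 MB across two attempts. With the 50x cap, the 30 MB header hint fits under the bound and a single 30 MB buffer is allocated up front; only payloads with no header size pay the full 50x guess.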

@sfluor merged commit beb4dfd into 1.x on Jun 28, 2024
8 checks passed