Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xerial snappy read/writer #838

Merged
merged 1 commit into from
Jul 27, 2023
Merged

Add xerial snappy read/writer #838

merged 1 commit into from
Jul 27, 2023

Conversation

klauspost
Copy link
Owner

@klauspost klauspost commented Jul 26, 2023

Forked from github.com/eapache/go-xerial-snappy.

Changes:

  • Uses S2 for better/faster compression and decompression.
  • Fixes 0-length roundtrips.
  • Adds DecodeCapped, which allows decompression with capped output size.
  • DecodeInto will decode directly into destination if there is space enough.
  • Encode will now encode directly into 'dst' if it has space enough.
  • Fixes short snappy buffers returning ErrMalformed.
  • Renames EncodeStream to Encode.
  • Adds EncodeBetter for better than default compression at ~half the speed.

Comparison (before/after):

BenchmarkSnappyStreamEncode-32    	  959010	      1170 ns/op	 875.15 MB/s	    1280 B/op	       1 allocs/op
BenchmarkSnappyStreamEncode-32    	 1000000	      1107 ns/op	 925.04 MB/s	       0 B/op	       0 allocs/op
--> Output size: 913 -> 856 bytes

BenchmarkSnappyStreamEncodeBetter-32      477739	      2506 ns/op	 408.62 MB/s	       0 B/op	       0 allocs/op
--> Output size: 835 bytes

BenchmarkSnappyStreamEncodeMassive-32  	     100    	  10596963 ns/op	 966.31 MB/s	   40977 B/op	       1 allocs/op
BenchmarkSnappyStreamEncodeMassive-32  	     100       	  10220236 ns/op	1001.93 MB/s	       0 B/op	       0 allocs/op
--> Output size: 2365547 -> 2256991 bytes

BenchmarkSnappyStreamEncodeBetterMassive-32    	      69	  16983314 ns/op	 602.94 MB/s	       0 B/op	       0 allocs/op
--> Output size: 2011997 bytes

BenchmarkSnappyStreamDecodeInto-32    	 1887378	       639.5 ns/op	1673.19 MB/s	    1088 B/op	       3 allocs/op
BenchmarkSnappyStreamDecodeInto-32    	 2707915	       436.2 ns/op	2452.99 MB/s	       0 B/op	       0 allocs/op

BenchmarkSnappyStreamDecodeIntoMassive-32    	     267	   4559594 ns/op	2245.81 MB/s	   71120 B/op	       1 allocs/op
BenchmarkSnappyStreamDecodeIntoMassive-32    	     282	   4285844 ns/op	2389.26 MB/s	       0 B/op	       0 allocs/op

@eapache - I don't know if you are interested in having these changes upstream?

Forked from [github.com/eapache/go-xerial-snappy](https://github.com/eapache/go-xerial-snappy).

Changes:

* Uses [S2](https://github.com/klauspost/compress/tree/master/s2#snappy-compatibility) for better/faster compression and decompression.
* Fixes 0-length roundtrips.
* Adds `DecodeCapped`, which allows decompression with capped output size.
* `DecodeInto` will decode directly into destination if there is space enough.
* `Encode` will now encode directly into 'dst' if it has space enough.
* Fixes short snappy buffers returning `ErrMalformed`.
* Renames `EncodeStream` to `Encode`.
* Adds `EncodeBetter` for better than default compression at ~half the speed.

Comparison (before/after):

```
BenchmarkSnappyStreamEncode-32    	  959010	      1170 ns/op	 875.15 MB/s	    1280 B/op	       1 allocs/op
BenchmarkSnappyStreamEncode-32    	 1000000	      1107 ns/op	 925.04 MB/s	       0 B/op	       0 allocs/op
--> Output size: 913 -> 856 bytes

BenchmarkSnappyStreamEncodeBetter-32    	  477739	      2506 ns/op	 408.62 MB/s	       0 B/op	       0 allocs/op
--> Output size: 835 bytes

BenchmarkSnappyStreamEncodeMassive-32  	     100    	  10596963 ns/op	 966.31 MB/s	   40977 B/op	       1 allocs/op
BenchmarkSnappyStreamEncodeMassive-32  	     100       	  10220236 ns/op	1001.93 MB/s	       0 B/op	       0 allocs/op
--> Output size: 2365547 -> 2256991 bytes

BenchmarkSnappyStreamEncodeBetterMassive-32    	      69	  16983314 ns/op	 602.94 MB/s	       0 B/op	       0 allocs/op
--> Output size: 2011997 bytes

BenchmarkSnappyStreamDecodeInto-32    	 1887378	       639.5 ns/op	1673.19 MB/s	    1088 B/op	       3 allocs/op
BenchmarkSnappyStreamDecodeInto-32    	 2707915	       436.2 ns/op	2452.99 MB/s	       0 B/op	       0 allocs/op

BenchmarkSnappyStreamDecodeIntoMassive-32    	     267	   4559594 ns/op	2245.81 MB/s	   71120 B/op	       1 allocs/op
BenchmarkSnappyStreamDecodeIntoMassive-32    	     282	   4285844 ns/op	2389.26 MB/s	       0 B/op	       0 allocs/op
```
@klauspost klauspost merged commit 895291c into master Jul 27, 2023
18 checks passed
@klauspost klauspost deleted the add-xerial-fork branch July 27, 2023 07:57
@eapache
Copy link

eapache commented Jul 29, 2023

@eapache - I don't know if you are interested in having these changes upstream?

Hmm, that's a surprisingly good question :)

It sounds like there are some bug fixes and some feature improvements mixed together here? Though it's not clear to me at a glance whether the bug fixes are just side-effects of the switch to S2, or whether the framing code in xerial.go actually changed meaningfully as well?

The last time I touched the xerial-snappy package was January, in eapache/go-xerial-snappy#9 - @dnwe and I had a chat in that PR (which you are welcome to read for context) about the future of the package and its primary user (https://github.com/IBM/sarama/), given I'm not really involved in any of the related projects anymore.

klauspost/compress certainly seems like a much more active and maintained project, so if you want to subsume the xerial decoder then I would be quite happy to EOL/archive eapache/go-xerial-snappy and redirect folks to this package instead. But that puts some onus on @dnwe to migrate Sarama over to this package - I think that seems reasonable but really that's up to him. (I'm assuming based on the docs/readme here that it would be mostly a drop-in replacement, but I am coming at this with very little context).

To answer the original question though - I'm certainly happy to accept small bug fixes and improvements upstream if you'd like to contribute them there instead / as well / depending on the discussion with Dom. Swapping the underlying library for S2 is not something that I would feel comfortable accepting just because it's a major change and I don't have the setup or time to verify for myself that it actually maintains compatibility, wouldn't break Sarama, etc.

@klauspost
Copy link
Owner Author

@eapache I can send the fixes for the decoding edge cases. That'll give me an excuse to add the test cases outside of the fuzz tests.

I understand that you aren't looking to test replacing the encoder - I had a hunch about that, which is why I forked it. It is of course safe, but I wouldn't take such a change without spending time investigating myself.

Yes, it would be drop-in, except EncodeStream is now just Encode. Nothing in the API functionality changed.

Sidenote: It took me a while to understand how the "append" mechanic of EncodeStream works. I think I will add some more documentation on that.

I will send a PR with the fixes ASAP. You can then decide on what the fate of your package will be :)

kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Oct 1, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/klauspost/compress](https://togithub.com/klauspost/compress) | indirect | minor | `v1.16.7` -> `v1.17.0` |

---

### Release Notes

<details>
<summary>klauspost/compress (github.com/klauspost/compress)</summary>

### [`v1.17.0`](https://togithub.com/klauspost/compress/releases/tag/v1.17.0)

[Compare Source](https://togithub.com/klauspost/compress/compare/v1.16.7...v1.17.0)

#### What's Changed

-   Add dictionary builder by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#853
-   Add xerial snappy read/writer by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#838
-   flate: Add limited window compression by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#843
-   s2: Do 2 overlapping match checks by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#839
-   flate: Add amd64 assembly matchlen by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#837
-   gzip: Copy bufio.Reader on Reset by [@&#8203;thatguystone](https://togithub.com/thatguystone) in [klauspost/compress#860
-   zstd: Remove offset from bitReader by [@&#8203;greatroar](https://togithub.com/greatroar) in [klauspost/compress#854
-   fse, huff0, zstd: Remove always-nil error returns by [@&#8203;greatroar](https://togithub.com/greatroar) in [klauspost/compress#857
-   tests: unnecessary use of fmt.Sprintf by [@&#8203;testwill](https://togithub.com/testwill) in [klauspost/compress#836
-   tests: Fix OSS fuzzer t.Run by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#852
-   tests: Use Go 1.21.x by [@&#8203;klauspost](https://togithub.com/klauspost) in [klauspost/compress#851

#### New Contributors

-   [@&#8203;testwill](https://togithub.com/testwill) made their first contribution in [klauspost/compress#836
-   [@&#8203;thatguystone](https://togithub.com/thatguystone) made their first contribution in [klauspost/compress#860

**Full Changelog**: klauspost/compress@v1.16.7...v1.17.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi4xMDkuNCIsInVwZGF0ZWRJblZlciI6IjM2LjEwOS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiJ9-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants