use simd masking for amd64&arm64 #326
Merged
Commits on Oct 26, 2023
mask.go: Use SIMD masking for amd64 and arm64
goos: windows
goarch: amd64
pkg: nhooyr.io/websocket
cpu: Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
Benchmark_mask/2/basic-8          425339004    2.795 ns/op     715.66 MB/s
Benchmark_mask/2/nhooyr-8         379937766    3.186 ns/op     627.78 MB/s
Benchmark_mask/2/gorilla-8        392164167    3.071 ns/op     651.24 MB/s
Benchmark_mask/2/gobwas-8         310037222    3.880 ns/op     515.46 MB/s
Benchmark_mask/3/basic-8          321408024    3.806 ns/op     788.32 MB/s
Benchmark_mask/3/nhooyr-8         350726338    3.478 ns/op     862.58 MB/s
Benchmark_mask/3/gorilla-8        332217727    3.634 ns/op     825.43 MB/s
Benchmark_mask/3/gobwas-8         247376214    4.886 ns/op     614.01 MB/s
Benchmark_mask/4/basic-8          261182472    4.582 ns/op     872.91 MB/s
Benchmark_mask/4/nhooyr-8         381830712    3.262 ns/op    1226.05 MB/s
Benchmark_mask/4/gorilla-8        272616304    4.395 ns/op     910.04 MB/s
Benchmark_mask/4/gobwas-8         204574558    5.855 ns/op     683.19 MB/s
Benchmark_mask/8/basic-8          191330037    6.162 ns/op    1298.24 MB/s
Benchmark_mask/8/nhooyr-8         369694992    3.285 ns/op    2435.65 MB/s
Benchmark_mask/8/gorilla-8        175388466    6.743 ns/op    1186.48 MB/s
Benchmark_mask/8/gobwas-8         241719933    4.886 ns/op    1637.45 MB/s
Benchmark_mask/16/basic-8         100000000    10.92 ns/op    1464.83 MB/s
Benchmark_mask/16/nhooyr-8        272565096    4.436 ns/op    3606.98 MB/s
Benchmark_mask/16/gorilla-8       100000000    11.20 ns/op    1428.53 MB/s
Benchmark_mask/16/gobwas-8        221356798    5.405 ns/op    2960.45 MB/s
Benchmark_mask/32/basic-8          61476984    20.40 ns/op    1568.80 MB/s
Benchmark_mask/32/nhooyr-8        238665572    5.050 ns/op    6337.22 MB/s
Benchmark_mask/32/gorilla-8       100000000    12.09 ns/op    2647.28 MB/s
Benchmark_mask/32/gobwas-8        186077235    6.477 ns/op    4940.36 MB/s
Benchmark_mask/128/basic-8         14629720    80.90 ns/op    1582.19 MB/s
Benchmark_mask/128/nhooyr-8       181241968    6.565 ns/op   19497.98 MB/s
Benchmark_mask/128/gorilla-8       68308342    16.76 ns/op    7639.37 MB/s
Benchmark_mask/128/gobwas-8        94582026    12.97 ns/op    9872.11 MB/s
Benchmark_mask/512/basic-8          3921001    305.6 ns/op    1675.55 MB/s
Benchmark_mask/512/nhooyr-8       123102199    9.721 ns/op   52669.11 MB/s
Benchmark_mask/512/gorilla-8       32355914    38.18 ns/op   13411.43 MB/s
Benchmark_mask/512/gobwas-8        31528501    37.80 ns/op   13544.37 MB/s
Benchmark_mask/4096/basic-8          491804     2381 ns/op    1720.39 MB/s
Benchmark_mask/4096/nhooyr-8       26159691    46.98 ns/op   87187.73 MB/s
Benchmark_mask/4096/gorilla-8       4898440    243.6 ns/op   16817.89 MB/s
Benchmark_mask/4096/gobwas-8        4336398    277.2 ns/op   14776.40 MB/s
Benchmark_mask/16384/basic-8         113842     9623 ns/op    1702.66 MB/s
Benchmark_mask/16384/nhooyr-8       8088847    154.5 ns/op  106058.18 MB/s
Benchmark_mask/16384/gorilla-8      1282993    933.6 ns/op   17549.90 MB/s
Benchmark_mask/16384/gobwas-8        997347     1086 ns/op   15093.49 MB/s

We're about 4-5x faster than gorilla now.
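For readers unfamiliar with what is being benchmarked: WebSocket masking XORs each payload byte with one of four key bytes, cycling through the key. The sketch below is not the assembly from this PR. It is a pure-Go illustration of the difference between a byte-at-a-time loop and a wider one. The names `maskBasic` and `maskWords` are hypothetical, and holding the key as a little-endian `uint32` is an assumption made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// maskBasic XORs b in place with the 4-byte key, one byte at a time.
// The key is held as a little-endian uint32, so the first key byte is
// the low byte. It returns the key rotated by len(b) bytes so that a
// later call can continue where this one stopped.
func maskBasic(b []byte, key uint32) uint32 {
	for i := range b {
		b[i] ^= byte(key)
		key = bits.RotateLeft32(key, -8) // next key byte moves into the low byte
	}
	return key
}

// maskWords produces the same result but XORs 8 bytes per iteration.
// This is only a stand-in for a SIMD routine, which would use much
// wider registers.
func maskWords(b []byte, key uint32) uint32 {
	key64 := uint64(key)<<32 | uint64(key) // the key repeated twice
	for len(b) >= 8 {
		v := binary.LittleEndian.Uint64(b)
		binary.LittleEndian.PutUint64(b, v^key64)
		b = b[8:]
	}
	// A multiple of 8 bytes leaves the key phase unchanged, so the
	// remaining tail can be finished with the byte loop.
	return maskBasic(b, key)
}

func main() {
	key := binary.LittleEndian.Uint32([]byte{0x37, 0xfa, 0x21, 0x3d})
	a := []byte("Hello, WebSocket masking!")
	b := append([]byte(nil), a...)

	maskBasic(a, key)
	maskWords(b, key)
	fmt.Println("implementations agree:", bytes.Equal(a, b))
}
```

Because XOR is its own inverse, applying either function twice with the same key restores the original payload.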
Commit 5df0303
Commit cda2170
Slower for some reason than just SIMD. Not depending on the cpu package is also nice.
Commit f5397ae
Commit 14172e5
Commit 685a56e
Commit cb7509a
Commit 3f8c9e0
Commit 367743d
Commit 27f80cb
Commit 369d641
Commit fb13df2
Commit ecf7dec
wsjson: Add json.Encoder vs json.Marshal benchmark
json.Encoder is 42% faster than json.Marshal thanks to the memory reuse.

goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/wsjson
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkJSON/json.Encoder-12    3517579    340.2 ns/op     24 B/op    1 allocs/op
BenchmarkJSON/json.Marshal-12    2374086    484.3 ns/op    728 B/op    2 allocs/op

Closes nhooyr#409
Commit d34e5d4
Commit e25d968
Commit 640e3c2
wsjson: Extend benchmark with multiple sizes
[qrvnl@dios ~/src/websocket] 130$ go test -bench=. ./wsjson/
goos: linux
goarch: amd64
pkg: nhooyr.io/websocket/wsjson
cpu: 12th Gen Intel(R) Core(TM) i5-1235U
BenchmarkJSON/json.Encoder/8-12        14041426    72.59 ns/op    110.21 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/16-12       13936426    86.99 ns/op    183.92 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/32-12       11416401    115.3 ns/op    277.59 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/128-12       4600574    264.7 ns/op    483.55 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/256-12       2710398    433.9 ns/op    590.06 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/512-12       1588930    717.3 ns/op    713.82 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/1024-12       823138     1484 ns/op    689.80 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/2048-12       402823     2875 ns/op    712.32 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/4096-12       213926     5602 ns/op    731.14 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/8192-12        92864    11281 ns/op    726.19 MB/s       16 B/op    1 allocs/op
BenchmarkJSON/json.Encoder/16384-12       39318    29203 ns/op    561.04 MB/s       19 B/op    1 allocs/op
BenchmarkJSON/json.Marshal/8-12        10768671    114.5 ns/op     69.89 MB/s       48 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/16-12       10140996    113.9 ns/op    140.51 MB/s       64 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/32-12        9211780    121.6 ns/op    263.06 MB/s       64 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/128-12       4632796    264.2 ns/op    484.53 MB/s      224 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/256-12       2441511    473.5 ns/op    540.65 MB/s      432 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/512-12       1298788    896.2 ns/op    571.27 MB/s      912 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/1024-12       602084     1866 ns/op    548.83 MB/s     1808 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/2048-12       341151     3817 ns/op    536.61 MB/s     3474 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/4096-12       175594     7034 ns/op    582.32 MB/s     6548 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/8192-12        83222    15023 ns/op    545.30 MB/s    13591 B/op    2 allocs/op
BenchmarkJSON/json.Marshal/16384-12       33087    39348 ns/op    416.39 MB/s    27304 B/op    2 allocs/op
PASS
ok  	nhooyr.io/websocket/wsjson	32.934s
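The MB/s column above appears when a benchmark reports how many bytes each iteration processes. A rough sketch of a size-parameterized benchmark follows. It is not the benchmark added in this commit. `payload` and `benchEncoder` are hypothetical names, the message is assumed to be a plain string of the given size, and `testing.Benchmark` is used so the sketch runs as an ordinary program rather than under `go test`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
	"testing"
)

// payload returns an n-byte string to use as the message (assumed test data).
func payload(n int) string {
	return strings.Repeat("a", n)
}

// benchEncoder measures json.Encoder on an n-byte string. b.SetBytes is
// what makes the result include a throughput figure.
func benchEncoder(n int) testing.BenchmarkResult {
	msg := payload(n)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(n))
		b.ReportAllocs()
		enc := json.NewEncoder(io.Discard)
		for i := 0; i < b.N; i++ {
			if err := enc.Encode(msg); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	// Each size runs for roughly the default benchmark time.
	for _, n := range []int{8, 128, 4096} {
		r := benchEncoder(n)
		fmt.Printf("json.Encoder/%d\t%s\t%s\n", n, r.String(), r.MemString())
	}
}
```

In a `_test.go` file the same shape is usually written with `b.Run` and one sub-benchmark per size, which produces names like `BenchmarkJSON/json.Encoder/8-12`.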
Commit 0596e7a
Commit 30447a3
Commit f4e61e5
Commits on Feb 22, 2024
Commit f533f43
Commit a1bb441
Commit fee3739
I'm just not good enough at assembly. I added tests to confirm that @wdvxdr's implementation works correctly and matches the output of the basic masking loop.
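A differential test of the kind described above can be sketched as follows. This is not the test added in the PR. `maskRef`, `maskUnrolled`, and `checkAgainstRef` are hypothetical names, and `maskUnrolled` is only a pure-Go stand-in for an optimized routine. The idea is to run both implementations over many lengths and starting offsets, since wide implementations tend to fail on short tails and unaligned inputs.

```go
package main

import (
	"bytes"
	"fmt"
	"math/bits"
	"math/rand"
)

// maskFunc masks b in place and returns the rotated key.
type maskFunc func(b []byte, key uint32) uint32

// maskRef is the reference: one byte at a time, key held little-endian.
func maskRef(b []byte, key uint32) uint32 {
	for i := range b {
		b[i] ^= byte(key)
		key = bits.RotateLeft32(key, -8)
	}
	return key
}

// maskUnrolled handles 4 bytes per iteration, then defers the tail to maskRef.
func maskUnrolled(b []byte, key uint32) uint32 {
	for len(b) >= 4 {
		b[0] ^= byte(key)
		b[1] ^= byte(key >> 8)
		b[2] ^= byte(key >> 16)
		b[3] ^= byte(key >> 24)
		b = b[4:]
	}
	return maskRef(b, key)
}

// checkAgainstRef compares impl with maskRef for every length up to
// maxLen at 8 different starting offsets, using random data and keys.
func checkAgainstRef(impl maskFunc, maxLen int, seed int64) error {
	rng := rand.New(rand.NewSource(seed))
	backing := make([]byte, maxLen+8)
	for n := 0; n <= maxLen; n++ {
		for off := 0; off < 8; off++ {
			for i := range backing {
				backing[i] = byte(rng.Intn(256))
			}
			key := rng.Uint32()
			want := append([]byte(nil), backing[off:off+n]...)
			got := backing[off : off+n]

			wantKey := maskRef(want, key)
			gotKey := impl(got, key)
			if !bytes.Equal(got, want) || gotKey != wantKey {
				return fmt.Errorf("mismatch at len=%d offset=%d key=%#08x", n, off, key)
			}
		}
	}
	return nil
}

func main() {
	if err := checkAgainstRef(maskUnrolled, 512, 1); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Checking the returned key as well as the bytes matters because a frame may be masked across several calls, and a wrong key phase would only corrupt the next chunk.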
Commit 68fc887
Commit f62cef3
internal/xcpu: Vendor golang.org/x/sys/cpu
The standard library does this too. It's unfortunate; I wish they just exposed it in the standard library. Perhaps we can isolate the specific code we need later.
Commit 92acb74
Commit 17e1b86
Commit 2cd18b3