use simd masking for amd64&arm64 #326

Merged
merged 26 commits into from Feb 22, 2024

Commits on Oct 26, 2023

  1. mask.go: Use SIMD masking for amd64 and arm64

    goos: windows
    goarch: amd64
    pkg: nhooyr.io/websocket
    cpu: Intel(R) Core(TM) i5-9300H CPU @ 2.40GHz
    Benchmark_mask/2/basic-8         	425339004	         2.795 ns/op	 715.66 MB/s
    Benchmark_mask/2/nhooyr-8        	379937766	         3.186 ns/op	 627.78 MB/s
    Benchmark_mask/2/gorilla-8       	392164167	         3.071 ns/op	 651.24 MB/s
    Benchmark_mask/2/gobwas-8        	310037222	         3.880 ns/op	 515.46 MB/s
    Benchmark_mask/3/basic-8         	321408024	         3.806 ns/op	 788.32 MB/s
    Benchmark_mask/3/nhooyr-8        	350726338	         3.478 ns/op	 862.58 MB/s
    Benchmark_mask/3/gorilla-8       	332217727	         3.634 ns/op	 825.43 MB/s
    Benchmark_mask/3/gobwas-8        	247376214	         4.886 ns/op	 614.01 MB/s
    Benchmark_mask/4/basic-8         	261182472	         4.582 ns/op	 872.91 MB/s
    Benchmark_mask/4/nhooyr-8        	381830712	         3.262 ns/op	1226.05 MB/s
    Benchmark_mask/4/gorilla-8       	272616304	         4.395 ns/op	 910.04 MB/s
    Benchmark_mask/4/gobwas-8        	204574558	         5.855 ns/op	 683.19 MB/s
    Benchmark_mask/8/basic-8         	191330037	         6.162 ns/op	1298.24 MB/s
    Benchmark_mask/8/nhooyr-8        	369694992	         3.285 ns/op	2435.65 MB/s
    Benchmark_mask/8/gorilla-8       	175388466	         6.743 ns/op	1186.48 MB/s
    Benchmark_mask/8/gobwas-8        	241719933	         4.886 ns/op	1637.45 MB/s
    Benchmark_mask/16/basic-8        	100000000	        10.92 ns/op	1464.83 MB/s
    Benchmark_mask/16/nhooyr-8       	272565096	         4.436 ns/op	3606.98 MB/s
    Benchmark_mask/16/gorilla-8      	100000000	        11.20 ns/op	1428.53 MB/s
    Benchmark_mask/16/gobwas-8       	221356798	         5.405 ns/op	2960.45 MB/s
    Benchmark_mask/32/basic-8        	61476984	        20.40 ns/op	1568.80 MB/s
    Benchmark_mask/32/nhooyr-8       	238665572	         5.050 ns/op	6337.22 MB/s
    Benchmark_mask/32/gorilla-8      	100000000	        12.09 ns/op	2647.28 MB/s
    Benchmark_mask/32/gobwas-8       	186077235	         6.477 ns/op	4940.36 MB/s
    Benchmark_mask/128/basic-8       	14629720	        80.90 ns/op	1582.19 MB/s
    Benchmark_mask/128/nhooyr-8      	181241968	         6.565 ns/op	19497.98 MB/s
    Benchmark_mask/128/gorilla-8     	68308342	        16.76 ns/op	7639.37 MB/s
    Benchmark_mask/128/gobwas-8      	94582026	        12.97 ns/op	9872.11 MB/s
    Benchmark_mask/512/basic-8       	 3921001	       305.6 ns/op	1675.55 MB/s
    Benchmark_mask/512/nhooyr-8      	123102199	         9.721 ns/op	52669.11 MB/s
    Benchmark_mask/512/gorilla-8     	32355914	        38.18 ns/op	13411.43 MB/s
    Benchmark_mask/512/gobwas-8      	31528501	        37.80 ns/op	13544.37 MB/s
    Benchmark_mask/4096/basic-8      	  491804	      2381 ns/op	1720.39 MB/s
    Benchmark_mask/4096/nhooyr-8     	26159691	        46.98 ns/op	87187.73 MB/s
    Benchmark_mask/4096/gorilla-8    	 4898440	       243.6 ns/op	16817.89 MB/s
    Benchmark_mask/4096/gobwas-8     	 4336398	       277.2 ns/op	14776.40 MB/s
    Benchmark_mask/16384/basic-8     	  113842	      9623 ns/op	1702.66 MB/s
    Benchmark_mask/16384/nhooyr-8    	 8088847	       154.5 ns/op	106058.18 MB/s
    Benchmark_mask/16384/gorilla-8   	 1282993	       933.6 ns/op	17549.90 MB/s
    Benchmark_mask/16384/gobwas-8    	  997347	      1086 ns/op	15093.49 MB/s
    
    We're about 4-5x faster than gorilla now.
    wdvxdr1123 authored and nhooyr committed Oct 26, 2023
    5df0303
  2. cda2170
  3. mask_asm.go: Disable AVX2

    Slower for some reason than just SIMD.
    
    It's also nice to have no dependency on the cpu package.
    nhooyr committed Oct 26, 2023
    f5397ae
  4. 14172e5
  5. 685a56e
  6. cb7509a
  7. 3f8c9e0
  8. mask_amd64.s: Cleanup

    nhooyr committed Oct 26, 2023
    367743d
  9. 27f80cb
  10. mask_arm64.s: Cleanup

    nhooyr committed Oct 26, 2023
    369d641
  11. fb13df2
  12. ecf7dec
  13. wsjson: Add json.Encoder vs json.Marshal benchmark

    json.Encoder is 42% faster than json.Marshal thanks to the memory reuse.
    
    goos: linux
    goarch: amd64
    pkg: nhooyr.io/websocket/wsjson
    cpu: 12th Gen Intel(R) Core(TM) i5-1235U
    BenchmarkJSON/json.Encoder-12            3517579           340.2 ns/op        24 B/op          1 allocs/op
    BenchmarkJSON/json.Marshal-12            2374086           484.3 ns/op       728 B/op          2 allocs/op
    
    Closes nhooyr#409
    nhooyr committed Oct 26, 2023
    d34e5d4
  14. e25d968
  15. 640e3c2
  16. wsjson: Extend benchmark with multiple sizes

    [qrvnl@dios ~/src/websocket] 130$ go test -bench=. ./wsjson/
    goos: linux
    goarch: amd64
    pkg: nhooyr.io/websocket/wsjson
    cpu: 12th Gen Intel(R) Core(TM) i5-1235U
    BenchmarkJSON/json.Encoder/8-12         14041426            72.59 ns/op  110.21 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/16-12        13936426            86.99 ns/op  183.92 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/32-12        11416401           115.3 ns/op   277.59 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/128-12        4600574           264.7 ns/op   483.55 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/256-12        2710398           433.9 ns/op   590.06 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/512-12        1588930           717.3 ns/op   713.82 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/1024-12        823138          1484 ns/op     689.80 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/2048-12        402823          2875 ns/op     712.32 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/4096-12        213926          5602 ns/op     731.14 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/8192-12         92864         11281 ns/op     726.19 MB/s          16 B/op          1 allocs/op
    BenchmarkJSON/json.Encoder/16384-12        39318         29203 ns/op     561.04 MB/s          19 B/op          1 allocs/op
    BenchmarkJSON/json.Marshal/8-12         10768671           114.5 ns/op    69.89 MB/s          48 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/16-12        10140996           113.9 ns/op   140.51 MB/s          64 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/32-12         9211780           121.6 ns/op   263.06 MB/s          64 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/128-12        4632796           264.2 ns/op   484.53 MB/s         224 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/256-12        2441511           473.5 ns/op   540.65 MB/s         432 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/512-12        1298788           896.2 ns/op   571.27 MB/s         912 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/1024-12        602084          1866 ns/op     548.83 MB/s        1808 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/2048-12        341151          3817 ns/op     536.61 MB/s        3474 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/4096-12        175594          7034 ns/op     582.32 MB/s        6548 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/8192-12         83222         15023 ns/op     545.30 MB/s       13591 B/op          2 allocs/op
    BenchmarkJSON/json.Marshal/16384-12        33087         39348 ns/op     416.39 MB/s       27304 B/op          2 allocs/op
    PASS
    ok      nhooyr.io/websocket/wsjson  32.934s
    nhooyr committed Oct 26, 2023
    0596e7a
  17. 30447a3
  18. f4e61e5
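
The wsjson benchmark commits above attribute json.Encoder's advantage to memory reuse. The two write paths being compared can be sketched roughly as follows. This is a minimal illustration, not the wsjson code: the helper names writeWithEncoder, writeWithMarshal and encodeBoth are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// writeWithEncoder streams the JSON encoding of v into w.
// The encoder writes from its internal buffer, so the caller
// never receives a separately allocated result slice.
func writeWithEncoder(w io.Writer, v any) error {
	return json.NewEncoder(w).Encode(v)
}

// writeWithMarshal first materializes the whole encoding as a
// freshly allocated []byte and only then writes it to w.
func writeWithMarshal(w io.Writer, v any) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	_, err = w.Write(b)
	return err
}

// encodeBoth (hypothetical helper) runs both paths on v and
// returns what each one wrote, for comparison.
func encodeBoth(v any) (enc, mar string, err error) {
	var eb, mb bytes.Buffer
	if err = writeWithEncoder(&eb, v); err != nil {
		return "", "", err
	}
	if err = writeWithMarshal(&mb, v); err != nil {
		return "", "", err
	}
	return eb.String(), mb.String(), nil
}

func main() {
	enc, mar, err := encodeBoth(map[string]int{"n": 1})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Encoder wrote %q\nMarshal wrote %q\n", enc, mar)
}
```

Note that Encode appends a trailing newline that Marshal does not. The extra allocation on the Marshal path grows with the payload, which is consistent with the B/op column in the benchmark output above.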

Commits on Feb 22, 2024

  1. mask.go: Reorganize

    nhooyr committed Feb 22, 2024
    f533f43
  2. ci: Fix dev coverage output

    nhooyr committed Feb 22, 2024
    a1bb441
  3. fee3739
  4. mask.go: Revert my changes

    I'm just not good enough at assembly. I added tests to confirm that @wdvxdr's
    implementation works correctly and matches the output of the basic masking loop.
    nhooyr committed Feb 22, 2024
    68fc887
  5. f62cef3
  6. internal/xcpu: Vendor golang.org/x/sys/cpu

    The standard library does this too. It's unfortunate; I wish they just exposed it in the
    standard library. Perhaps we can isolate the specific code we need later.
    nhooyr committed Feb 22, 2024
    92acb74
  7. mask_asm: Disable AVX2

    nhooyr committed Feb 22, 2024
    17e1b86
  8. 2cd18b3
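
The "Revert my changes" commit above mentions tests confirming that the optimized implementation matches the output of the basic masking loop. A minimal sketch of that kind of check follows. It rests on one assumption: a pure-Go routine that XORs 8 bytes at a time (maskWords, a hypothetical name) stands in for the assembly, which is not reproduced here. maskBasic and agree are likewise illustrative names, not the PR's identifiers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// maskBasic XORs b in place with the 4-byte key, one byte at a time,
// starting at key offset pos. It returns the key offset for the next call.
func maskBasic(key [4]byte, pos int, b []byte) int {
	for i := range b {
		b[i] ^= key[pos&3]
		pos++
	}
	return pos & 3
}

// maskWords produces the same result but XORs 8 bytes per iteration.
// Hypothetical stand-in for the SIMD routines, not the PR's code.
func maskWords(key [4]byte, pos int, b []byte) int {
	next := (pos + len(b)) & 3

	// Rotate the key so its lowest byte is key[pos&3], then widen to 64 bits.
	k32 := bits.RotateLeft32(binary.LittleEndian.Uint32(key[:]), -8*(pos&3))
	k64 := uint64(k32)<<32 | uint64(k32)

	for len(b) >= 8 {
		binary.LittleEndian.PutUint64(b, binary.LittleEndian.Uint64(b)^k64)
		b = b[8:]
	}
	// Fewer than 8 bytes remain; finish them bytewise with the rotated key.
	for i := range b {
		b[i] ^= byte(k32 >> (8 * (i & 3)))
	}
	return next
}

// agree reports whether both routines give identical bytes and offsets
// for every payload length below maxLen and every starting key offset.
func agree(key [4]byte, maxLen int) bool {
	for n := 0; n < maxLen; n++ {
		for pos := 0; pos < 4; pos++ {
			a := make([]byte, n)
			for i := range a {
				a[i] = byte(i*7 + 3)
			}
			c := append([]byte(nil), a...)
			if maskBasic(key, pos, a) != maskWords(key, pos, c) || string(a) != string(c) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("implementations agree:", agree([4]byte{0x12, 0x34, 0x56, 0x78}, 100))
}
```

Sweeping every short length and all four key offsets exercises the tail handling and the key rotation, which is where a wide-word or SIMD routine is most likely to diverge from the byte loop.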