k12: improve API and support multithreaded computation #443

bwesterb · 2023-06-04T16:28:26Z

Use options style API, so that the common case is very simple:

h := k12.NewDraft10()

but we can provide options elegantly:

h := k12.NewDraft10(
    WithContext([]byte("some context")),
    WithWorkers(runtime.NumCPU()),
)

Allows multithreaded computation with the WithWorkers() option. On M2 Pro scales well with a few workers, but isn't able to utilize all cores effectively:

BenchmarkK12_100B-12          	 5223334	       228.8 ns/op	 437.07 MB/s
BenchmarkK12_10K-12           	  105744	     11183 ns/op	 894.21 MB/s
BenchmarkK12_100K-12          	   27141	     44364 ns/op	2254.06 MB/s
BenchmarkK12_3M-12            	    1172	   1010401 ns/op	3243.07 MB/s
BenchmarkK12_32M-12           	     100	  10033730 ns/op	3265.78 MB/s
BenchmarkK12_327M-12          	      12	  98888208 ns/op	3313.64 MB/s
BenchmarkK12_3276M-12         	       2	 991111938 ns/op	3306.19 MB/s
BenchmarkK12x2_32M-12         	     183	   6510423 ns/op	5033.16 MB/s
BenchmarkK12x2_327M-12        	      18	  63622058 ns/op	5150.41 MB/s
BenchmarkK12x2_3276M-12       	       2	 632823584 ns/op	5178.06 MB/s
BenchmarkK12x4_32M-12         	     364	   3300120 ns/op	9929.34 MB/s
BenchmarkK12x4_327M-12        	      39	  29477854 ns/op	11116.14 MB/s
BenchmarkK12x4_3276M-12       	       2	 581819167 ns/op	11263.98 MB/s
BenchmarkK12x8_32M-12         	     520	   2301923 ns/op	14235.05 MB/s
BenchmarkK12x8_327M-12        	      76	  15590312 ns/op	21018.18 MB/s
BenchmarkK12x8_3276M-12       	       4	 296590469 ns/op	22096.46 MB/s
BenchmarkK12xCPUs_32M-12      	     472	   2526827 ns/op	12968.04 MB/s
BenchmarkK12xCPUs_327M-12     	      78	  15139957 ns/op	21643.39 MB/s
BenchmarkK12xCPUs_3276M-12    	       4	 280958114 ns/op	23325.90 MB/s

We only reach 23GB/s (at 12x) instead of the lower bound of 33GB/s expected with 10 performance cores.

Adds {Max,Next}WriteSize to suggest the caller how big to choose their Write() calls.

armfazh

For 32 GB,

Cores	Speedup
1	1.00x
2	1.49x
4	2.48x
8	4.27x

For 32MB

Cores	Speedup
1	1.00x
2	1.28x
4	1.81x
8	1.92x

the speedup increases with the length of the message (as expected) but the largest test case (where numbers to look ok) requires message of many GB in memory. Not sure if this is gonna be realistic since usually large reads are done in batches of smaller sizes.

For medium-sized messages (say 32MB), the max speedup is below 2x even with 8 threads.

*not in opposition to this effort, but timings make me think if something must be done differently (or I missed something)

xof/k12/k12.go

xof/k12/k12_test.go

xof/k12/k12.go

bwesterb · 2023-06-17T15:49:11Z

requires message of many GB in memory. Not sure if this is gonna be realistic since usually large reads are done in batches of smaller sizes.

Those tests don't use a buffer of GBs. Instead they use the size suggested by MaxWriteSize(), which atm is lanes * CPUs * 8kB, ie. 192kB on M2 Pro.

but timings make me think if something must be done differently (or I missed something)

There clearly is some concurrency bottleneck here which I haven't figured out yet.

Use options style API, so that the common case is very simple: h := k12.NewDraft10() but we can provide options elegantly: h := k12.NewDraft10( WithContext([]byte("some context")), WithWorkers(runtime.NumCPU()), ) Allows multithreaded computation with the WithWorkers() option. On M2 Pro scales well with a few workers, but isn't able to utilize all cores effectively. BenchmarkK12_100B-12 5223334 228.8 ns/op 437.07 MB/s BenchmarkK12_10K-12 105744 11183 ns/op 894.21 MB/s BenchmarkK12_100K-12 27141 44364 ns/op 2254.06 MB/s BenchmarkK12_3M-12 1172 1010401 ns/op 3243.07 MB/s BenchmarkK12_32M-12 100 10033730 ns/op 3265.78 MB/s BenchmarkK12_327M-12 12 98888208 ns/op 3313.64 MB/s BenchmarkK12_3276M-12 2 991111938 ns/op 3306.19 MB/s BenchmarkK12x2_32M-12 183 6510423 ns/op 5033.16 MB/s BenchmarkK12x2_327M-12 18 63622058 ns/op 5150.41 MB/s BenchmarkK12x2_3276M-12 2 632823584 ns/op 5178.06 MB/s BenchmarkK12x4_32M-12 364 3300120 ns/op 9929.34 MB/s BenchmarkK12x4_327M-12 39 29477854 ns/op 11116.14 MB/s BenchmarkK12x4_3276M-12 2 581819167 ns/op 11263.98 MB/s BenchmarkK12x8_32M-12 520 2301923 ns/op 14235.05 MB/s BenchmarkK12x8_327M-12 76 15590312 ns/op 21018.18 MB/s BenchmarkK12x8_3276M-12 4 296590469 ns/op 22096.46 MB/s BenchmarkK12xCPUs_32M-12 472 2526827 ns/op 12968.04 MB/s BenchmarkK12xCPUs_327M-12 78 15139957 ns/op 21643.39 MB/s BenchmarkK12xCPUs_3276M-12 4 280958114 ns/op 23325.90 MB/s We only reach 23GB/s (at 12x) instead of the lower bound of 33GB/s expected with 10 performance cores. Adds {Max,Next}WriteSize to suggest the caller how big to choose their Write() calls. Adds a test to the CI, to check the k12 package with the data race detector on.

bwesterb force-pushed the bas/pk12 branch 3 times, most recently from 1cec358 to 02c9cde Compare June 4, 2023 16:38

armfazh requested changes Jun 5, 2023

View reviewed changes

xof/k12/k12.go Show resolved Hide resolved

xof/k12/k12.go Outdated Show resolved Hide resolved

xof/k12/k12.go Outdated Show resolved Hide resolved

xof/k12/k12_test.go Show resolved Hide resolved

xof/k12/k12.go Show resolved Hide resolved

xof/k12/k12.go Show resolved Hide resolved

bwesterb force-pushed the bas/pk12 branch from 02c9cde to 0c36dde Compare June 17, 2023 15:44

bwesterb force-pushed the bas/pk12 branch 2 times, most recently from 80cdfd3 to 7df5996 Compare June 17, 2023 20:36

bwesterb force-pushed the bas/pk12 branch from 7df5996 to 78965b7 Compare June 17, 2023 20:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

k12: improve API and support multithreaded computation #443

k12: improve API and support multithreaded computation #443

bwesterb commented Jun 4, 2023 •

edited

armfazh left a comment

bwesterb commented Jun 17, 2023

k12: improve API and support multithreaded computation #443

Are you sure you want to change the base?

k12: improve API and support multithreaded computation #443

Conversation

bwesterb commented Jun 4, 2023 • edited

armfazh left a comment

Choose a reason for hiding this comment

bwesterb commented Jun 17, 2023

bwesterb commented Jun 4, 2023 •

edited