We can reslice instead of maintaining a separate offset, which gets rid of some bounds checks.
This change also includes some other micro-optimizations to the bit-reading code. Combined results:
│ zstd/old │ zstd/new │
│ B/s │ B/s vs base │
Decoder_DecoderSmall/kppkn.gtb.zst/buffered-8 427.6Mi ± 0% 428.2Mi ± 0% +0.13% (p=0.019 n=10)
Decoder_DecoderSmall/kppkn.gtb.zst/unbuffered-8 511.6Mi ± 3% 516.9Mi ± 3% ~ (p=0.280 n=10)
Decoder_DecoderSmall/geo.protodata.zst/buffered-8 1.110Gi ± 0% 1.110Gi ± 0% ~ (p=0.165 n=10)
Decoder_DecoderSmall/geo.protodata.zst/unbuffered-8 824.7Mi ± 2% 827.3Mi ± 2% ~ (p=0.481 n=10)
Decoder_DecoderSmall/plrabn12.txt.zst/buffered-8 330.4Mi ± 0% 330.3Mi ± 1% ~ (p=0.645 n=10)
Decoder_DecoderSmall/plrabn12.txt.zst/unbuffered-8 533.3Mi ± 4% 538.8Mi ± 5% ~ (p=0.393 n=10)
Decoder_DecoderSmall/lcet10.txt.zst/buffered-8 395.0Mi ± 0% 394.6Mi ± 0% -0.10% (p=0.034 n=10)
Decoder_DecoderSmall/lcet10.txt.zst/unbuffered-8 556.5Mi ± 6% 546.2Mi ± 8% ~ (p=0.436 n=10)
Decoder_DecoderSmall/asyoulik.txt.zst/buffered-8 342.2Mi ± 0% 342.2Mi ± 0% ~ (p=0.956 n=10)
Decoder_DecoderSmall/asyoulik.txt.zst/unbuffered-8 436.7Mi ± 2% 435.4Mi ± 3% ~ (p=0.739 n=10)
Decoder_DecoderSmall/alice29.txt.zst/buffered-8 335.6Mi ± 2% 337.0Mi ± 0% +0.43% (p=0.000 n=10)
Decoder_DecoderSmall/alice29.txt.zst/unbuffered-8 552.6Mi ± 3% 550.7Mi ± 4% ~ (p=1.000 n=10)
Decoder_DecoderSmall/html_x_4.zst/buffered-8 2.264Gi ± 0% 2.271Gi ± 0% +0.29% (p=0.035 n=10)
Decoder_DecoderSmall/html_x_4.zst/unbuffered-8 1.558Gi ± 4% 1.554Gi ± 3% ~ (p=0.579 n=10)
Decoder_DecoderSmall/paper-100k.pdf.zst/buffered-8 3.554Gi ± 5% 3.610Gi ± 0% +1.59% (p=0.000 n=10)
Decoder_DecoderSmall/paper-100k.pdf.zst/unbuffered-8 1.701Gi ± 8% 1.709Gi ± 5% ~ (p=0.631 n=10)
Decoder_DecoderSmall/fireworks.jpeg.zst/buffered-8 7.891Gi ± 4% 8.070Gi ± 0% +2.26% (p=0.000 n=10)
Decoder_DecoderSmall/fireworks.jpeg.zst/unbuffered-8 3.062Gi ± 4% 3.129Gi ± 2% +2.16% (p=0.002 n=10)
Decoder_DecoderSmall/urls.10K.zst/buffered-8 525.4Mi ± 6% 553.8Mi ± 0% +5.39% (p=0.000 n=10)
Decoder_DecoderSmall/urls.10K.zst/unbuffered-8 763.7Mi ± 6% 819.7Mi ± 2% +7.34% (p=0.000 n=10)
Decoder_DecoderSmall/html.zst/buffered-8 894.8Mi ± 0% 898.8Mi ± 2% +0.45% (p=0.043 n=10)
Decoder_DecoderSmall/html.zst/unbuffered-8 722.3Mi ± 2% 717.7Mi ± 2% ~ (p=0.912 n=10)
Decoder_DecoderSmall/comp-data.bin.zst/buffered-8 386.6Mi ± 2% 390.4Mi ± 0% +1.00% (p=0.000 n=10)
Decoder_DecoderSmall/comp-data.bin.zst/unbuffered-8 145.2Mi ± 2% 148.7Mi ± 1% +2.42% (p=0.003 n=10)
geomean 770.3Mi 777.5Mi +0.93%
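
For illustration, the reslicing idea can be sketched as below. This is a minimal sketch and not the actual zstd bit reader: the types offsetReader and resliceReader and the method next4 are hypothetical names, and the reader is assumed to consume 4 bytes at a time from the end of the input. Whether the compiler actually drops a bounds check depends on the Go version; one way to check is to build with -gcflags="-d=ssa/check_bce/debug=1".

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// offsetReader is a hypothetical reader that keeps the whole input
// and tracks how many bytes are still unread in a separate field.
type offsetReader struct {
	in  []byte
	off int // number of unread bytes; the next read ends at in[off]
}

// next4 returns the last 4 unread bytes as a little-endian uint32.
// The slice expression depends on both off and len(in), so the
// compiler has two values to reason about.
func (r *offsetReader) next4() uint32 {
	v := binary.LittleEndian.Uint32(r.in[r.off-4 : r.off])
	r.off -= 4
	return v
}

// resliceReader is the same hypothetical reader without the offset:
// the slice itself shrinks, so len(in) is the only state.
type resliceReader struct {
	in []byte
}

// next4 returns the last 4 bytes of in and reslices in to drop them.
// Callers are assumed to have checked len(in) >= 4 beforehand.
func (r *resliceReader) next4() uint32 {
	n := len(r.in) - 4
	v := binary.LittleEndian.Uint32(r.in[n:])
	r.in = r.in[:n]
	return v
}

func main() {
	data := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	a := offsetReader{in: data, off: len(data)}
	b := resliceReader{in: data}
	// Both readers should return the same words in the same order.
	for i := 0; i < 2; i++ {
		fmt.Printf("%#x %#x\n", a.next4(), b.next4())
	}
}
```

Both variants return the same values; the resliced version carries one less field and ties every index to the slice length, which is the property that may help bounds-check elimination.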