zstd: Shorter and faster asm for decSymbol.newState #896

Merged Dec 9, 2023 (2 commits)

Commits on Dec 9, 2023

  1. zstd: Shorter asm for decSymbol.newState

    The asm needs to compute decSymbol.newState, which is
    
    	uint16(state >> 16),
    
    or, equivalently (except for types),
    
    	uint32(state) >> 16.
    
    This can be accomplished by a MOVL+SHRL, the former of which is elided
    by avo, so we get a single instruction for both the BMI2 and non-BMI2
    cases.
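
The equivalence claimed above can be checked with a minimal Go sketch. It assumes that `decSymbol` is a packed 64-bit value with `newState` in bits 16-31; the helper names `newStateTruncate` and `newStateShift32` are hypothetical and exist only for this illustration, not in the package.

```go
package main

import "fmt"

// decSymbol is assumed here to be a packed 64-bit table entry
// whose newState field occupies bits 16-31.
type decSymbol uint64

// newStateTruncate (hypothetical name) shifts the full 64-bit value
// and then truncates to 16 bits: uint16(state >> 16).
func newStateTruncate(d decSymbol) uint16 { return uint16(d >> 16) }

// newStateShift32 (hypothetical name) truncates to 32 bits first and
// then shifts: uint32(state) >> 16. On amd64 this corresponds to a
// 32-bit move followed by a 32-bit logical shift right.
func newStateShift32(d decSymbol) uint32 { return uint32(d) >> 16 }

func main() {
	samples := []decSymbol{0, 0xFFFF0000, 0x123456789ABCDEF0, ^decSymbol(0)}
	for _, d := range samples {
		a, b := newStateTruncate(d), newStateShift32(d)
		// Both forms select bits 16-31, so they agree up to type.
		fmt.Printf("%#016x: %#04x %#04x equal=%v\n", uint64(d), a, b, uint32(a) == b)
	}
}
```

Both forms read only bits 16-31 of the input: the 32-bit truncation discards the upper half before the shift, while the 16-bit truncation discards it afterwards.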
    
    Benchmarks show no difference on a newer machine with BMI2 support, but
    on an older i7-3770K, decompression throughput is about 1.3% higher by
    geomean:
    
    	goos: linux
    	goarch: amd64
    	pkg: github.com/klauspost/compress/zstd
    	cpu: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
    	                                       │     old      │                shift                │
    	                                       │     B/s      │     B/s       vs base               │
    	Decoder_DecodeAll/kppkn.gtb.zst-8        441.4Mi ± 2%   450.4Mi ± 0%  +2.03% (p=0.000 n=10)
    	Decoder_DecodeAll/geo.protodata.zst-8    1.148Gi ± 1%   1.152Gi ± 0%  +0.34% (p=0.009 n=10)
    	Decoder_DecodeAll/plrabn12.txt.zst-8     347.9Mi ± 0%   356.6Mi ± 1%  +2.48% (p=0.000 n=10)
    	Decoder_DecodeAll/lcet10.txt.zst-8       417.4Mi ± 0%   427.3Mi ± 0%  +2.37% (p=0.000 n=10)
    	Decoder_DecodeAll/asyoulik.txt.zst-8     347.1Mi ± 0%   352.7Mi ± 1%  +1.62% (p=0.003 n=10)
    	Decoder_DecodeAll/alice29.txt.zst-8      346.3Mi ± 1%   352.6Mi ± 0%  +1.83% (p=0.000 n=10)
    	Decoder_DecodeAll/html_x_4.zst-8         1.440Gi ± 0%   1.445Gi ± 0%  +0.29% (p=0.019 n=10)
    	Decoder_DecodeAll/paper-100k.pdf.zst-8   4.191Gi ± 0%   4.210Gi ± 0%  +0.45% (p=0.007 n=10)
    	Decoder_DecodeAll/fireworks.jpeg.zst-8   8.891Gi ± 0%   8.849Gi ± 0%  -0.47% (p=0.000 n=10)
    	Decoder_DecodeAll/urls.10K.zst-8         589.6Mi ± 0%   600.2Mi ± 0%  +1.80% (p=0.001 n=10)
    	Decoder_DecodeAll/html.zst-8             926.1Mi ± 1%   937.9Mi ± 0%  +1.27% (p=0.000 n=10)
    	Decoder_DecodeAll/comp-data.bin.zst-8    389.6Mi ± 0%   395.1Mi ± 0%  +1.40% (p=0.000 n=10)
    	geomean                                  832.6Mi        843.3Mi       +1.28%
    greatroar committed Dec 9, 2023
    Commit 3818b77
  2. Commit 0eb409c