Improve string encoding by following json approach #1350

cdvr1993 · 2023-09-08T23:47:49Z

Recently we found an application where using zap.Reflect was faster than using zapcore.ObjectMarshaler. Profiling showed that string encoding is very expensive. I replicated what encoding/json does and got better performance than Reflect.

I had to modify the benchmark to have a greater number of strings.

Benchmark results

goos: linux
goarch: amd64
pkg: go.uber.org/zap/zapcore
cpu: AMD EPYC 7B13
               │ /tmp/old.txt │            /tmp/new.txt             │
               │    sec/op    │   sec/op     vs base                │
ZapJSON-8         89.10µ ± 1%   33.38µ ± 3%  -62.54% (p=0.000 n=10)
StandardJSON-8    40.74µ ± 1%   42.46µ ± 1%   +4.22% (p=0.000 n=10)
geomean           60.25µ        37.65µ       -37.52%


codecov · 2023-09-09T00:34:16Z

Codecov Report

Merging #1350 (ced79e2) into master (82c728b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1350   +/-   ##
=======================================
  Coverage   98.40%   98.41%           
=======================================
  Files          52       52           
  Lines        3457     3471   +14     
=======================================
+ Hits         3402     3416   +14     
  Misses         46       46           
  Partials        9        9

Files Changed	Coverage Δ
buffer/buffer.go	`100.00% <100.00%> (ø)`
zapcore/json_encoder.go	`100.00% <100.00%> (ø)`


abhinav

Very nice!
I would prefer to do this without unsafe, and I have some ideas on how we could do that. Let me try something out locally.

abhinav · 2023-09-09T00:18:09Z

buffer/buffer.go

+func (b *Buffer) AppendByteV(v ...byte) {
+	b.bs = append(b.bs, v...)
+}


This should probably be named AppendBytes and take a []byte, not a vararg. (Also, the docstring is inaccurate.)

(FYI, there's also Buffer.Write, which does the same thing while satisfying io.Writer. There's no problem with also having this method, but if we have both, Write should probably call AppendBytes.)
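A minimal sketch of the suggested shape, assuming a hypothetical stand-alone Buffer type that only mirrors the `bs` field from the diff (it is not zap's actual buffer.Buffer): AppendBytes takes a plain []byte, and Write satisfies io.Writer by delegating to it.

```go
package main

import (
	"fmt"
	"io"
)

// Buffer is a hypothetical, minimal stand-in for a byte buffer such as
// zap's buffer.Buffer; it is a sketch, not zap's implementation.
type Buffer struct {
	bs []byte
}

// AppendBytes writes all of the given bytes to the Buffer.
func (b *Buffer) AppendBytes(v []byte) {
	b.bs = append(b.bs, v...)
}

// Write implements io.Writer by delegating to AppendBytes.
// Appending cannot fail, so it always reports len(bs) bytes written.
func (b *Buffer) Write(bs []byte) (int, error) {
	b.AppendBytes(bs)
	return len(bs), nil
}

// String returns the buffer contents as a string.
func (b *Buffer) String() string { return string(b.bs) }

// Compile-time check that *Buffer satisfies io.Writer.
var _ io.Writer = (*Buffer)(nil)

func main() {
	var b Buffer
	b.AppendBytes([]byte("hello, "))
	fmt.Fprint(&b, "world") // goes through Write, then AppendBytes
	fmt.Println(b.String()) // hello, world
}
```

Keeping Write as a thin wrapper leaves a single code path that appends bytes, while io.Writer consumers such as fmt.Fprint can still write into the buffer.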

abhinav · 2023-09-09T00:26:15Z

zapcore/json_encoder.go

+	enc.safeAddByteString(*(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
+		Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data,
+		Len:  len(s),
+		Cap:  len(s),
+	})))


Putting aside that this is decidedly Not Safe (in a function called safeAddString), starting with Go 1.20 the above isn't the best way to do an unsafe string-to-byte-slice conversion.

It's now better to do: unsafe.Slice(unsafe.StringData(s), len(s)).

https://go.dev/play/p/mGnV97K5tu_w
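A minimal sketch of that Go 1.20+ conversion; the helper name unsafeBytes is hypothetical. unsafe.StringData (Go 1.20) returns a pointer to the string's bytes and unsafe.Slice (Go 1.17) builds a slice header over them, so no copy is made and no reflect headers are involved.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytes returns a []byte that aliases the memory of s without copying.
// Strings are immutable, so callers must never write to the returned slice.
func unsafeBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := unsafeBytes("hello")
	fmt.Println(len(b), cap(b), string(b)) // 5 5 hello
}
```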

abhinav · 2023-09-09T00:29:30Z

zapcore/json_encoder.go

+			if s[i] >= 0x20 && s[i] != '\\' && s[i] != '"' {
+				i++
+				continue


Very nice! This right here is the performance win:
scanning ahead to the next position that needs escaping, instead of appending one byte at a time.

I'm in favor of such a change, but I would prefer if we could do this without the unsafe.
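To illustrate the run-scanning idea, here is a sketch in plain Go. The name appendEscapedASCII is hypothetical, it writes to a []byte rather than a zap buffer, and it only treats ASCII specially: bytes ≥ 0x80 are copied through unchanged, with no UTF-8 validation.

```go
package main

import "fmt"

const hexDigits = "0123456789abcdef"

// appendEscapedASCII appends s to dst as the body of a JSON string
// (without surrounding quotes). Rather than appending byte by byte, it
// extends a run of bytes that need no escaping and appends each run with
// a single append call. It does not validate UTF-8.
func appendEscapedASCII(dst []byte, s string) []byte {
	start := 0 // beginning of the current run of unescaped bytes
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 0x20 && c != '\\' && c != '"' {
			continue // safe byte: just extend the run
		}
		// Flush the pending run, then write the escape for c.
		dst = append(dst, s[start:i]...)
		switch c {
		case '\\', '"':
			dst = append(dst, '\\', c)
		case '\n':
			dst = append(dst, '\\', 'n')
		case '\r':
			dst = append(dst, '\\', 'r')
		case '\t':
			dst = append(dst, '\\', 't')
		default:
			// Remaining control characters use the \u00XX form.
			dst = append(dst, '\\', 'u', '0', '0', hexDigits[c>>4], hexDigits[c&0xF])
		}
		start = i + 1
	}
	return append(dst, s[start:]...) // flush the final run
}

func main() {
	fmt.Println(string(appendEscapedASCII(nil, "a\"b\\c\nd\x01"))) // a\"b\\c\nd\u0001
}
```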

abhinav · 2023-09-09T00:47:26Z

@cdvr1993 I just pushed a change to your branch that drops the unsafe/reflect in favor of generics.
It needs function references for DecodeUTF8 and AppendString vs AppendBytes,
but otherwise it's almost as good as the unsafe version -- only 4% slower than your version.

name     old time/op  new time/op  delta
ZapJSON   779µs ± 0%   813µs ± 1%  +4.34%  (p=0.000 n=9+10)

I think that may be acceptable since this is still a pretty massive net improvement. WDYT?

cdvr1993 · 2023-09-09T03:18:58Z

LGTM. On my computer the delta is about 1-2%, so there's probably some noise in your measurement.


Flips the logic a little to be easier to follow. The shape is basically:

    if multi-byte rune {
        if no special handling { skip; continue }
        special handling
    } else {
        if no special handling { skip; continue }
        special handling
    }

This makes the logic much more obvious while retaining performance.

abhinav · 2023-09-09T04:09:48Z

@cdvr1993 I realized that the same idea (skip over characters that don't need special handling) could be used for the multi-byte runes as well. That yields a small improvement too.

I've pushed that and a small readability fix on top.

jquirke

tangential: why did we never escape \f as well?

abhinav · 2023-09-09T16:01:00Z

> tangential: why did we never escape \f as well?

I don't know what the original reasoning for that choice is, but it's definitely handled:

zap/zapcore/json_encoder_impl_test.go

Lines 73 to 76 in 82c728b

	// \b and \f are sometimes backslash-escaped, but this representation is also
	// conformant.
	"\b": `\u0008`,
	"\f": `\u000c`,

We could change that and still be okay.

abhinav

CC @prashantv @rabbbit @sywhang

abhinav · 2023-09-09T20:04:33Z

Thanks, @cdvr1993!

sywhang

Retrospective LGTM!

I measured this on a pretty beefy machine, but here are the results:

name        old time/op  new time/op  delta
ZapJSON-96  57.1µs ±25%  27.8µs ± 2%  -51.34%  (p=0.000 n=10+8)

makkes · 2023-09-18T07:16:36Z

buffer/buffer.go

@@ -42,6 +42,11 @@ func (b *Buffer) AppendByte(v byte) {
 	b.bs = append(b.bs, v)
 }

+// AppendBytes writes a single byte to the Buffer.


Just came here from the 1.26.0 release notes. Great job! This comment, though, doesn't seem right: the method writes all the given bytes to the buffer, not a single byte. /cc @abhinav

abhinav

Oops, yes, you're right. I missed this. Thanks!

abhinav reviewed Sep 9, 2023


PoC: Generics instead of unsafe of reflection

044300b

Buffer.AppendBytes: Fix doc

e12743b

abhinav added 5 commits September 8, 2023 20:26

chore: Inline tryAddRuneError

85815f6

This no longer needs to be a separate function.

doc: explain how safeAppendStringLike works

9165693

test: Add fuzz test for safeAppendStringLike

f974d60

Adds a fuzz test for the string and []byte versions of safeAppendStringLike that verifies that both variants are able to decode the original string back.

perf: Apply the same optimization to multi-byte runes

c00d33c

The optimization is basically "instead of appending byte at a time, skip over non-special bytes and append them all together." The original optimization applies only to single-byte runes. This applies the same to multi-byte runes.

jquirke reviewed Sep 9, 2023


Merge branch 'master' into improve-string-encoding

ced79e2

abhinav approved these changes Sep 9, 2023


abhinav merged commit 5a27bab into uber-go:master Sep 9, 2023
6 checks passed

sywhang approved these changes Sep 9, 2023


makkes reviewed Sep 18, 2023
