Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve string encoding by following json approach #1350

Merged
merged 9 commits into from Sep 9, 2023

Conversation

cdvr1993
Copy link
Contributor

@cdvr1993 cdvr1993 commented Sep 8, 2023

Recently we found an application were using zap.Reflect was faster than using zapcore.ObjectMarshaler. After profiling we found that string encoding is really expensive. I replicated what encoding/json does and was able to get greater performance than Reflect.

I had to modify the benchmark to have a greater number of strings.

Benchmark results

goos: linux
goarch: amd64
pkg: go.uber.org/zap/zapcore
cpu: AMD EPYC 7B13
               │ /tmp/old.txt │            /tmp/new.txt             │
               │    sec/op    │   sec/op     vs base                │
ZapJSON-8         89.10µ ± 1%   33.38µ ± 3%  -62.54% (p=0.000 n=10)
StandardJSON-8    40.74µ ± 1%   42.46µ ± 1%   +4.22% (p=0.000 n=10)
geomean           60.25µ        37.65µ       -37.52%

Benchmark results

goos: linux
goarch: amd64
pkg: go.uber.org/zap/zapcore
cpu: AMD EPYC 7B13
               │ /tmp/old.txt │            /tmp/new.txt             │
               │    sec/op    │   sec/op     vs base                │
ZapJSON-8         89.10µ ± 1%   33.38µ ± 3%  -62.54% (p=0.000 n=10)
StandardJSON-8    40.74µ ± 1%   42.46µ ± 1%   +4.22% (p=0.000 n=10)
geomean           60.25µ        37.65µ       -37.52%
@codecov
Copy link

codecov bot commented Sep 9, 2023

Codecov Report

Merging #1350 (ced79e2) into master (82c728b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1350   +/-   ##
=======================================
  Coverage   98.40%   98.41%           
=======================================
  Files          52       52           
  Lines        3457     3471   +14     
=======================================
+ Hits         3402     3416   +14     
  Misses         46       46           
  Partials        9        9           
Files Changed Coverage Δ
buffer/buffer.go 100.00% <100.00%> (ø)
zapcore/json_encoder.go 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!
I would prefer to do this without unsafe, and I have some ideas on how we could do that. Let me try something out locally.

buffer/buffer.go Outdated
Comment on lines 46 to 48
func (b *Buffer) AppendByteV(v ...byte) {
b.bs = append(b.bs, v...)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be named AppendBytes and take a []byte, not a vararg. (Also, the docstring is inaccurate.)

(FYI, there's also Buffer.Write which does the same, while satisfying io.Writer, but there's no problem with also having this method. However, if we have both, maybe Write should call AppendBytes.)

Comment on lines 491 to 495
enc.safeAddByteString(*(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data,
Len: len(s),
Cap: len(s),
})))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting aside that this is decidedly Not Safe (in a function called safeAddString), starting Go 1.20, the above isn't the best way for an unsafe string to byte slice conversion.

It's better to now do: unsafe.Slice(unsafe.StringData(s), len(s)).

https://go.dev/play/p/mGnV97K5tu_w

Comment on lines 503 to 505
if s[i] >= 0x20 && s[i] != '\\' && s[i] != '"' {
i++
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! This right here is the performance win.
Scanning until the next position escaping is needed instead of appending one byte at a time.

I'm in favor of such a change, but I would prefer if we could do this without the unsafe.

@abhinav
Copy link
Collaborator

abhinav commented Sep 9, 2023

@cdvr1993 I just pushed a change to your branch that drops the unsafe/reflect in favor of generics.
It needs function references for DecodeUTF8 and AppendString vs AppendBytes,
but otherwise it's almost as good as the unsafe version -- only 4% slower than your version.

name     old time/op  new time/op  delta
ZapJSON   779µs ± 0%   813µs ± 1%  +4.34%  (p=0.000 n=9+10)

I think that may be acceptable since this is still a pretty massive net improvement. WDYT?

@cdvr1993
Copy link
Contributor Author

cdvr1993 commented Sep 9, 2023

LGTM, on my computer the delta is like 1-2%, so probably you have some noise there.

This no longer needs to be a separate function.
Adds a fuzz test for the string and []byte versions of safeAppendStringLike
that verifies that both variants are able to decode the original string
back.
The optimization is basically "instead of appending byte at a time,
skip over non-special bytes and append them all together."
The original optimization applies only to single-byte runes.
This applies the same to multi-byte runes.
Flips the logic a little to be easier to follow.
The shape is basically:

    if mutli byte rune {
        if no special handling {
            skip
            continue
        }
        special handling
    } else {
        if no special handling {
            skip
            continue
        }
        special handling
    }

This makes the logic much more obvious while retaining performance.
@abhinav
Copy link
Collaborator

abhinav commented Sep 9, 2023

@cdvr1993 I realized that the same idea (skip over characters that don't need special handling) could be used for the multi-byte runes as well. That yields a small improvement too.

I've pushed that and a small readability fix on top.

Copy link
Contributor

@jquirke jquirke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tangential: why did we never escape \f as well?

@abhinav
Copy link
Collaborator

abhinav commented Sep 9, 2023

tangential: why did we never escape \f as well?

I don't know what the original reasoning for that choice is, but it's definitely handled:

// \b and \f are sometimes backslash-escaped, but this representation is also
// conformant.
"\b": `\u0008`,
"\f": `\u000c`,

We could change that and still be okay.

Copy link
Collaborator

@abhinav abhinav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abhinav abhinav merged commit 5a27bab into uber-go:master Sep 9, 2023
6 checks passed
@abhinav
Copy link
Collaborator

abhinav commented Sep 9, 2023

Thanks, @cdvr1993!

Copy link
Contributor

@sywhang sywhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrospective LGTM!

I measured this on a pretty beefy machine but here's the results:

name        old time/op  new time/op  delta
ZapJSON-96  57.1µs ±25%  27.8µs ± 2%  -51.34%  (p=0.000 n=10+8)

@@ -42,6 +42,11 @@ func (b *Buffer) AppendByte(v byte) {
b.bs = append(b.bs, v)
}

// AppendBytes writes a single byte to the Buffer.
Copy link

@makkes makkes Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just came here from the 1.26.0 release notes. Great job! This comment, though, doesn't seem to be right in that the method writes all bytes to the buffer. /cc @abhinav

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, you're right. I missed this. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants