logstats: do not allocate memory while logging #15539

vmg · 2024-03-21T10:45:51Z

Description

This is the first PR in a series dedicated to taming the CPU and memory usage of vttablet instances. The effort here is based on profiling and behavior data from PlanetScale customers that run particularly large and busy Vitess clusters. Many of these results do not reproduce in synthetic benchmarks.

For this first piece of low-hanging fruit, I've noticed that a disproportionate amount of CPU time and allocations are spent logging queries as part of the logstats system. Let's start from the finish line and look at the before/after graphs and metrics.

Flame graph for CPU usage (before vs after)

These are CPU flame graphs for the vttablet in a synthetic stress benchmark. This result reproduces very well regardless of the QPS/throughput of the Vitess cluster: the amount of time spent logging is always disproportionate.

The time spent in logstats is marked with a red square in both graphs. An important detail: although the CPU usage for the subsystem was disproportionate, this flame graph only highlights the direct CPU impact of logging. The indirect impact caused by the extreme number of small allocations cannot be clearly highlighted here, but it will become apparent in the metrics.

Flame graph for object allocation count (before vs after)

Here we display the object allocation count for the vttablet (before and after). I've skipped the total-memory-allocated flame graph because it looks very similar. As you can see, this one is particularly disproportionate: up to one third of all object allocations during normal execution of a vttablet are temporary objects used to compose these log messages.

The reduction in the "after" flame graph is extremely significant.

Benchmark results

              │ benchold.txt │             benchnew.txt             │
              │  UserCPU/s   │ UserCPU/s   vs base                  │
OLTP/vtgate-4     4.230 ± 1%   4.160 ± 0%  -1.66% (p=0.000 n=13+14)
OLTP/tablet-4     1.427 ± 0%   1.373 ± 1%  -3.75% (n=52+56)
geomean           2.457        2.390       -2.71%

              │ benchold.txt │             benchnew.txt             │
              │  IdleCPU/s   │ IdleCPU/s   vs base                  │
OLTP/vtgate-4     27.20 ± 0%   27.27 ± 0%  +0.23% (p=0.002 n=13+14)
OLTP/tablet-4     30.45 ± 0%   30.51 ± 0%       ~ (p=0.057 n=52+56)
geomean           28.78        28.84       +0.23%

              │ benchold.txt │             benchnew.txt              │
              │   GCCPU/s    │   GCCPU/s    vs base                  │
OLTP/vtgate-4    565.8m ± 1%   562.5m ± 1%       ~ (p=0.488 n=13+14)
OLTP/tablet-4    134.0m ± 1%   120.6m ± 1%  -9.97% (n=52+56)
geomean          275.3m        260.5m       -5.39%

              │ benchold.txt │              benchnew.txt               │
              │     KB/s     │     KB/s      vs base                   │
OLTP/vtgate-4    315.9k ± 9%   257.4k ± 14%  -18.50% (p=0.011 n=13+14)
OLTP/tablet-4    82.75k ± 4%   60.09k ± 10%  -27.39% (p=0.000 n=52+56)
geomean          161.7k        124.4k        -23.07%

              │ benchold.txt │              benchnew.txt               │
              │   allocs/s   │   allocs/s    vs base                   │
OLTP/vtgate-4    5.629M ± 9%   4.442M ± 14%  -21.09% (p=0.003 n=13+14)
OLTP/tablet-4    1.569M ± 4%   1.027M ± 10%  -34.55% (n=52+56)
geomean          2.972M        2.136M        -28.14%

For this set of PRs, I've chosen to skip arewefastyet altogether and use a custom harness instead. There are several reasons for this. First and foremost, we're having significant replication issues with the platform, which @frouioui is actively working on fixing. Also, the OLTP and TPCC benchmarks that currently run on the platform do not model the traffic of the busiest Vitess clusters we've seen in production particularly well. Lastly, the platform does not currently measure some of the metrics we're interested in, such as total CPU time spent in GC.

Anyway, onto the results analysis: it looks very good. The key metrics here are the reductions in bytes allocated per second (KB/s) and objects allocated per second in both the tablets and the gates (up to 35% fewer objects allocated per second after this change). The other key result is the reduction in CPU seconds spent in GC for the tablets, which is down 10%. There's also a corresponding and very nice reduction in CPU time spent in user code, because the new logstats logger is simply faster, but I think the most significant impact here is on GC time.

The new implementation

To accomplish these results, I've implemented a new zero-allocation logger for logstats. The logstats functionality is currently used in both the gates and the tablets (each tracking slightly different metrics), but the performance impact appears to be much more significant in the tablets. Regardless, I've unified the two implementations, which is why the benchmark metrics also show improvements on the gates.

For the new logger interface, I've tried to balance ergonomics and performance, even though this is code that won't change often. I think the new implementation is frankly more readable than the old one! I've also paid special attention to backwards compatibility, to ensure that the logging output is equivalent.

There's only one exception to this: the printing of bind variables in the logstats output. For the textual logstats output, the bind variables were printed in a very peculiar way: fmt.Sprintf("%v", bvars). The output of this expression is neither stable nor parseable, so it seemed futile to try to replicate it. Instead, I've chosen to print the bind variables as JSON in both the textual and the JSON logstats output, because this is a stable format that can always be parsed. I've also fixed a bug in the previous version of the JSON bind-variable printer, where the variables were not consistently sorted when printing.
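Go randomizes map iteration order, so any printer that walks a bind-variable map directly can emit the same variables in a different order on every call. A deterministic printer must sort the keys first. The sketch below shows that idea under a simplifying assumption: the values are plain strings, whereas the real bind variables are `*querypb.BindVariable`. The helpers `appendJSONString` and `appendBindVars` are hypothetical names, not the PR's code.

```go
package main

import (
	"fmt"
	"sort"
)

// appendJSONString appends s to b as a quoted JSON string. It escapes
// quotes, backslashes and ASCII control characters (RFC 8259); all other
// bytes, including multi-byte UTF-8, are copied through unchanged.
// Invalid UTF-8 is not sanitized in this sketch.
func appendJSONString(b []byte, s string) []byte {
	const hex = "0123456789abcdef"
	b = append(b, '"')
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '"' || c == '\\':
			b = append(b, '\\', c)
		case c < 0x20:
			b = append(b, '\\', 'u', '0', '0', hex[c>>4], hex[c&0xf])
		default:
			b = append(b, c)
		}
	}
	return append(b, '"')
}

// appendBindVars appends vars to b as a JSON object whose keys are always
// emitted in sorted order, so identical bind variables yield identical
// output. Hypothetical simplification: values are plain strings.
func appendBindVars(b []byte, vars map[string]string) []byte {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; fix it here
	b = append(b, '{')
	for i, k := range keys {
		if i > 0 {
			b = append(b, ',')
		}
		b = appendJSONString(b, k)
		b = append(b, ':')
		b = appendJSONString(b, vars[k])
	}
	return append(b, '}')
}

func main() {
	vars := map[string]string{"vtg2": "b", "vtg1": `say "hi"`, "id": "7"}
	fmt.Println(string(appendBindVars(nil, vars)))
}
```

The `keys` slice still allocates on each call. A fully allocation-free variant would keep a reusable scratch slice next to the output buffer.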

Lastly, as part of the unified logger path, I'm storing the loggers in a sync.Pool, which is not exposed directly via the public interface but allows the logging buffer to be transparently reused between logging calls.
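Pooling loggers so that callers never see the pool can be sketched with the standard library's `sync.Pool`. This is an illustrative assumption about the shape of such code, not the PR's implementation: `lineLogger`, `newLogger`, `release` and `logQuery` are hypothetical names. The pool stores pointers, so `Put` does not allocate, and each `Get` hands back a buffer whose capacity survives from earlier log lines.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// lineLogger is a hypothetical stand-in for a pooled logger: a reusable buffer.
type lineLogger struct {
	buf []byte
}

// loggerPool hands out *lineLogger values; pointers avoid an allocation on Put.
var loggerPool = sync.Pool{
	New: func() any { return &lineLogger{buf: make([]byte, 0, 1024)} },
}

// newLogger fetches a logger from the pool and empties its buffer.
func newLogger() *lineLogger {
	l := loggerPool.Get().(*lineLogger)
	l.buf = l.buf[:0]
	return l
}

// release returns l to the pool; l must not be used afterwards.
func (l *lineLogger) release() { loggerPool.Put(l) }

// logQuery builds one line and passes it to write. The pool stays an
// internal detail: callers only see this function. write must not retain
// the slice after it returns, because the buffer will be reused.
func logQuery(write func([]byte), sql string, rows int) {
	l := newLogger()
	defer l.release()
	l.buf = append(l.buf, sql...)
	l.buf = append(l.buf, '\t')
	l.buf = strconv.AppendInt(l.buf, int64(rows), 10)
	l.buf = append(l.buf, '\n')
	write(l.buf)
}

func main() {
	for i := 0; i < 3; i++ {
		logQuery(func(b []byte) { fmt.Print(string(b)) }, "select 1", i)
	}
}
```

One design choice to consider with this pattern is whether to drop unusually large buffers instead of returning them to the pool, so that one huge query does not keep a large buffer alive indefinitely.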

Related Issue(s)

Checklist

"Backport to:" labels have been added if this change should be back-ported to release branches
If this change is to be back-ported to previous releases, a justification is included in the PR description
Tests were added or are not required
Did the new or modified tests pass consistently locally and on CI?
Documentation was added or is not required

Deployment Notes

vitess-bot · 2024-03-21T10:45:54Z

vitess-bot · 2024-03-21T10:46:17Z

Hello! 👋

This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked.

You can find the performance comparison on the arewefastyet website.

codecov · 2024-03-21T11:03:00Z

Codecov Report

Attention: Patch coverage is 98.57820% with 3 lines in your changes missing coverage. Please review.

Project coverage is 65.76%. Comparing base (696fe0e) to head (925744b).
Report is 147 commits behind head on main.

❗ Current head 925744b differs from pull request most recent head c52fee6. Consider uploading reports for the commit c52fee6 to get more accurate results

Files	Patch %	Lines
go/logstats/logger.go	97.19%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15539      +/-   ##
==========================================
- Coverage   67.41%   65.76%   -1.66%     
==========================================
  Files        1560     1561       +1     
  Lines      192752   194710    +1958     
==========================================
- Hits       129952   128057    -1895     
- Misses      62800    66653    +3853


mdlayher

LGTM, well done.

For JSON there is the experimental json/v2 under development at https://github.com/go-json-experiment/json which may make some of the JSON token construction easier.

But since the logic supports both JSON and some kind of plaintext format, I have no objections to keeping it as is.

systay · 2024-03-22T16:21:54Z

go/logstats/logger.go

+	BVar *querypb.BindVariable
+}
+
+type Logger struct {


maybe a line or two with the raison d'etre for this logger?

systay · 2024-03-22T16:22:08Z

go/test/endtoend/vtgate/queries/benchmark/oltp_test.go

@@ -0,0 +1,130 @@
+package dml



deepthi · 2024-03-29T01:13:35Z

@mdlayher there's an open PR for json-v2. We should review that.


github-actions bot added this to the v20.0.0 milestone Mar 21, 2024

vmg removed the Benchmark me label Mar 22, 2024

vmg marked this pull request as ready for review March 22, 2024 15:16

vmg requested review from harshit-gangal, systay, shlomi-noach, rohit-nayak-ps, frouioui, GuptaManan100 and deepthi as code owners March 22, 2024 15:16

mdlayher approved these changes Mar 22, 2024


systay reviewed Mar 22, 2024


go/test/endtoend/vtgate/queries/benchmark/oltp_test.go Outdated

@@ -0,0 +1,130 @@

package dml


systay Mar 22, 2024


license

systay approved these changes Mar 22, 2024


logstats: do not allocate memory while logging

c52fee6

Signed-off-by: Vicent Marti <vmg@strn.cat>

vmg force-pushed the vmg/logstats branch from 925744b to c52fee6 March 25, 2024 08:30

vmg merged commit 54ef7b2 into vitessio:main Mar 25, 2024
100 checks passed

vmg deleted the vmg/logstats branch March 25, 2024 09:12

vmg mentioned this pull request Apr 1, 2024

querylog: json format version 2 #15271

Open

5 tasks

DeathBorn pushed a commit to vinted/vitess that referenced this pull request Apr 15, 2024

logstats: do not allocate memory while logging (vitessio#15539)

8b9115e

Signed-off-by: Vicent Marti <vmg@strn.cat> Signed-off-by: Vilius Okockis <vilius.okockis@vinted.com>

