
Performance degradation seen with secure eventfd #1858

Open
jinengandhi-intel opened this issue Apr 24, 2024 · 13 comments

@jinengandhi-intel
Contributor

jinengandhi-intel commented Apr 24, 2024

Description of the problem

Performance benchmarking for workloads like MongoDB and Memcached is showing degradation with the secure-eventfd commit when compared with the numbers for insecure eventfd.

MongoDB:

15% degradation is seen for the MongoDB benchmark with 64 threads, but almost no degradation is seen with 32 threads, when comparing runs with and without secure eventfd.

Please find the attached sheet with runs on Gramine master (which includes the secure-eventfd commit) and on the commit right before secure eventfd; you will notice the degradation.
mongodb_perf_degradation_secure_eventfd.xlsx

Memcached

With the higher batch size and a 1:9 read-write ratio, we see 20% degradation in latency and 8% in throughput when comparing the secure-eventfd commit against the commit right before it.

For smaller data sizes such as 1024, 8192 and 16384 bytes, no degradation was seen; we see degradation only for data sizes starting from 32768 bytes. Please find the attached sheet for more details.

Memcached_secure_eventfd_analysis.xlsx

OpenVINO

This workload was not known to use eventfd and didn't even have the insecure-eventfd flag configured, but after the secure-eventfd commit we are seeing 6-8% degradation in the OpenVINO BERT model tests.

openvino_secure_eventfd_analysis.xlsx

Steps to reproduce

git clone https://github.com/mongodb/mongo-perf.git
cd mongo-perf

wget https://repo.mongodb.org/apt/ubuntu/dists/focal/mongodb-org/5.0/multiverse/binary-amd64/mongodb-org-shell_5.0.21_amd64.deb
sudo dpkg -i mongodb-org-shell_5.0.21_amd64.deb
echo "deb http://security.ubuntu.com/ubuntu focal-security main" | sudo tee /etc/apt/sources.list.d/focal-security.list
sudo apt-get update

sudo apt-get install libssl1.1

Native run: mongod --nounixsocket --dbpath /var/run/db
gramine-direct: gramine-direct mongod --nounixsocket --dbpath /var/run/db
gramine-sgx: gramine-sgx mongod --nounixsocket --dbpath /var/run/db

~/jk/examples/mongodb/mongo-perf$ python3 benchrun.py -f testcases/complex_update.js -t 64

Expected results

No response

Actual results

No response

Gramine commit hash

51e99f9

@dimakuv
Contributor

dimakuv commented Apr 24, 2024

@jkr0103 Is this something you could look into? I mean, run the perf analysis.

@dimakuv
Contributor

dimakuv commented Apr 24, 2024

OpenVINO

@jinengandhi-intel A quick check:

  1. Rerun with loader.log_level = "all" and check the Gramine logs. If you see eventfd2 syscall, then we know that OpenVINO always tries to use eventfd, and my theory about a fallback mechanism is correct (previous Gramine reported "unsupported", current Gramine reports "supported").
  2. Try my new PR [LibOS] Add sys.mock_syscalls = [ ... ] manifest option #1859
  • Add this to the OpenVINO manifest: sys.disallowed_syscalls = [ "eventfd", "eventfd2" ]
    • Rerun -- now Gramine reports "unsupported" on eventfd, exactly as it did before.
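As a sketch, the manifest addition suggested in step 2 would look like this (assuming the option name quoted above from PR #1859; the exact key may differ in the merged version):

```toml
# Make Gramine report eventfd/eventfd2 as unsupported again,
# restoring the pre-secure-eventfd behavior for apps that have a fallback path.
sys.disallowed_syscalls = [ "eventfd", "eventfd2" ]
```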

@vasanth-intel

@dimakuv
Attaching the debug logs for both cases 1 and 2. The attached zip file Openvino_Debug_Logs.zip contains the debug logs with loader.log_level = "all". Openvino_secure_eventfd_trace_output.txt is the debug log for 1 and Openvino_PR_1859.txt is the debug log for 2. Couldn't see either eventfd or eventfd2 syscalls in either log. But there were many futex, sched_yield and clock_gettime syscalls.

Openvino_Debug_Logs.zip

@dimakuv
Contributor

dimakuv commented Apr 25, 2024

@jinengandhi-intel @vasanth-intel Indeed, there are no eventfd syscalls in the OpenVINO workload.

This doesn't make sense -- commit 51e99f9 didn't change anything other than eventfd.

Are you absolutely sure that:

  1. You bisected the commit at which the degradation starts to this particular commit?
  2. You didn't modify anything else in your testbed between the "good" run and the "bad" run?

@dimakuv
Contributor

dimakuv commented Apr 26, 2024

I cannot reproduce the Memcached results on my Icelake server. What was your exact test?

Below is what I did.

Memcached with eventfd degradation

Eventfd commit

  • gramine-direct:
$ gramine-direct memcached

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        83975.74          ---          ---         2.31475         2.19100         4.60700         5.72700   5378292.46
Gets         9330.64         9.33      9321.31         0.81662         0.72700         1.75900         6.23900       354.34
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals      93306.38         9.33      9321.31         2.16493         2.19100         4.57500         5.72700   5378646.79
  • gramine-sgx:
$ gramine-sgx memcached

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        82262.40          ---          ---         2.28824         2.03900         4.89500         6.97500   5268560.17
Gets         9140.27         3.75      9136.52         2.00322         1.88700         4.06300         4.99100       347.11
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals      91402.67         3.75      9136.52         2.25974         2.02300         4.83100         6.87900   5268907.28

Commit right before eventfd commit

  • gramine-direct:
$ numactl --cpunodebind=0 --membind=0 gramine-direct memcached

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        83086.37          ---          ---         2.33857         2.22300         4.70300         6.07900   5321332.34
Gets         9231.82         9.23      9222.59         0.81624         0.73500         1.67100         5.56700       350.58
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals      92318.19         9.23      9222.59         2.18634         2.20700         4.67100         6.07900   5321682.93
  • gramine-sgx:
$ numactl --cpunodebind=0 --membind=0 gramine-sgx memcached

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        82597.81          ---          ---         2.33955         2.07900         5.72700         7.99900   5290041.93
Gets         9177.53         2.85      9174.69         2.00897         1.89500         4.12700         5.08700       348.52
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals      91775.35         2.85      9174.69         2.30649         2.06300         5.63100         7.90300   5290390.45

As can be seen, there is basically no overhead.

@jinengandhi-intel
Contributor Author

@vasanth-intel can you share the commands/our setup with Dmitrii so that he can repro the numbers?

@vasanth-intel

@dimakuv

We run the memcached workload on a client-server setup across two different machines, where the server and client are connected with a 100 Gb LAN cable.

We have shared the steps/commands and workspace to repro the numbers separately within Microsoft Teams.

Following are the numbers we got from these workspaces.

1 commit before Secure Eventfd commit:

Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets        41632.57          ---          ---         0.35365         0.33500         0.78300         1.26300   2666309.26 
Gets       374692.45      3575.71    371116.75         0.34458         0.31100         0.80700         1.20700     13498.04 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals     416325.02      3575.71    371116.75         **0.34549**         0.31900         0.80700         1.21500   2679807.30

Secure Eventfd commit:

Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
----------------------------------------------------------------------------------------------------------------------------
Sets        35970.56          ---          ---         0.42464         0.37500         1.07900         1.57500   2303692.74 
Gets       323734.48      3094.70    320639.78         0.39710         0.35100         1.01500         1.58300     11662.31 
Waits           0.00          ---          ---             ---             ---             ---             ---          --- 
Totals     359705.04      3094.70    320639.78         **0.39986**         0.35900         1.02300         1.58300   2315355.05

@dimakuv
Contributor

dimakuv commented Apr 29, 2024

For completeness: a significant difference from the default Memcached example is that the Memcached-under-test here is run with -t 16. In other words, the Memcached server runs with 16 threads, in contrast to the default config of 4 threads.

@dimakuv
Contributor

dimakuv commented Apr 29, 2024

Update: I am able to reproduce the ~20% slowdown. This happens on -t 16 (Memcached server with 16 threads).

Quick results below.

  • Gramine master (with eventfd)
$ numactl --cpunodebind=0 --membind=0 gramine-sgx memcached -t 16

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Totals     103253.62         0.00     10325.36         2.27464         2.15900         5.75900        26.11100   5952055.93
  • Gramine c1eac09 (without eventfd)
$ numactl --cpunodebind=0 --membind=0 gramine-sgx memcached -t 16

$ memtier_benchmark --port=11211 --protocol=memcache_binary --hide-histogram --ratio=9:1 --data-size=65536
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
Totals      85342.60         0.00      8534.26         2.03052         1.99100         4.67100         6.04700   4919575.00

@dimakuv
Contributor

dimakuv commented Apr 29, 2024

Oops, in my previous comment the Gramine master (with secure eventfd) actually shows higher throughput than Gramine without secure eventfd. So please ignore my "I am able to reproduce". I still can't reproduce the bug.

@dimakuv
Contributor

dimakuv commented Apr 29, 2024

@jinengandhi-intel @vasanth-intel How to reproduce the OpenVINO perf degradation?

@jinengandhi-intel
Contributor Author

jinengandhi-intel commented Apr 29, 2024

@dimakuv For latency, lower is better. So looking at your numbers: average latency with Gramine master is 2.27464 and without eventfd it is 2.03052, so you are also seeing degradation with secure eventfd. Even for p99 latency, before eventfd it is 4.67100 and with Gramine master it is 5.75900, so there is close to 20% degradation. For throughput higher is better, but for latency lower is better.
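A quick sketch of the arithmetic, using the latency numbers quoted above from the earlier comment:

```python
def degradation_pct(before: float, after: float) -> float:
    """Percent increase of `after` over `before` (for latency, higher = worse)."""
    return (after - before) / before * 100

# Average latency: 2.03052 ms without eventfd, 2.27464 ms on Gramine master.
avg = degradation_pct(2.03052, 2.27464)
# p99 latency: 4.67100 ms without eventfd, 5.75900 ms on Gramine master.
p99 = degradation_pct(4.67100, 5.75900)

print(f"avg latency degradation: {avg:.1f}%")   # ~12.0%
print(f"p99 latency degradation: {p99:.1f}%")   # ~23.3%
```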

@jinengandhi-intel
Contributor Author

> @jinengandhi-intel @vasanth-intel How to reproduce the OpenVINO perf degradation?

OpenVINO runs on the same machine as memcached, so you can look at memcached first and then we can set up the workspace for OpenVINO.
