
Nginx performance degradation in Gramine-SGX attributed to SHA256 hashing #1712

Open
sahason opened this issue Jan 10, 2024 · 10 comments · May be fixed by #1776
@sahason
Contributor

sahason commented Jan 10, 2024

Description of the problem

Nginx performs poorly in Gramine-SGX: ~60% of the execution time is spent in `mbedtls_sha256_update` (see the attached perf report).
We used the `wrk` benchmarking tool to compare the performance of Nginx on native Linux and in Gramine-SGX with 1 and 64 threads. The wrk command used: `wrk -t64 -c300 -d30s http://127.0.0.1:8002/random/10K.1.html`. The throughput statistics (requests/sec) show poor performance of Nginx in Gramine-SGX.

| Threads | Gramine-SGX (requests/sec) | Native (requests/sec) | Native vs Gramine-SGX (%) |
|--------:|---------------------------:|----------------------:|--------------------------:|
| 1       | 44145.58                   | 86392.24              | -48.90                    |
| 64      | 427658.63                  | 947360.14             | -54.86                    |

A few observations:

  1. The performance degradation grows with the file size.
  2. Keeping the folder containing 10K.1.html under allowed files in the manifest improves performance.
  3. Commenting out the SHA256 calls in copy_and_verify_trusted_files improves performance a lot; see the table and the sketch below.
| Threads | Gramine-SGX (requests/sec) | Native (requests/sec) | Native vs Gramine-SGX (%) |
|--------:|---------------------------:|----------------------:|--------------------------:|
| 64      | 876094.83                  | 943265.29             | -7.12                     |
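For context, a minimal sketch of the kind of per-chunk SHA256 check that observation 3 bypasses. This is illustrative, not Gramine's actual copy_and_verify_trusted_files code: only the one-shot `mbedtls_sha256()` call is a real API (mbedtls 3.x signature); the chunk size and truncated-hash length are taken from the discussion further down this thread, and the helper names are assumptions.

```c
/* Hedged sketch of per-chunk verification on the trusted-file read path;
 * not Gramine's actual code. The BYPASS_HASH_VERIFICATION branch models
 * what "commenting out the SHA256 calls" amounts to. */
#include <stdint.h>
#include <string.h>
#include <mbedtls/sha256.h>

#define CHUNK_SIZE (16 * 1024) /* 16KB chunks, per the discussion below */
#define HASH_SIZE  16          /* SHA256 truncated to 128 bits */

/* Returns 0 if `chunk` matches its expected truncated hash, -1 otherwise. */
static int verify_chunk(const uint8_t* chunk, size_t size,
                        const uint8_t expected[HASH_SIZE]) {
#ifdef BYPASS_HASH_VERIFICATION
    /* Observation 3: skip hashing entirely -- insecure, measurement only. */
    (void)chunk; (void)size; (void)expected;
    return 0;
#else
    uint8_t hash[32];
    if (mbedtls_sha256(chunk, size, hash, /*is224=*/0) != 0)
        return -1;
    return memcmp(hash, expected, HASH_SIZE) == 0 ? 0 : -1;
#endif
}
```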

Please suggest how we can improve the throughput.

With 64 threads: [perf-report attachment]

Steps to reproduce

Build and run Nginx server:

  1. cd CI-Examples/nginx
  2. Modify nginx-gramine.conf.template with the config below:
worker_processes auto; 

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;

events {
    worker_connections  768;
}

http {
    include            mime.types;
    default_type       application/octet-stream;
    keepalive_timeout  13;
    access_log  off;
    client_body_buffer_size 80k;
    client_max_body_size 9m;
    client_header_buffer_size 1k;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 10;
    open_file_cache max=1024 inactive=10s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    sendfile           on;
    #keepalive_timeout  65;

    # a single HTTP/HTTPS server
    server {
        listen 8002;
        listen  8444 ssl;
        server_name 127.0.0.1;

        ssl_certificate            server.crt;
        ssl_certificate_key        server.key;
        ssl_session_cache          shared:SSL:10m;
        ssl_session_timeout        10m;
        ssl_protocols              TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers                HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;

        location / {
            root   html;
            index  index.html index.htm;
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        access_log off;
    }
}

daemon off;
  3. make SGX=1
  4. gramine-sgx nginx

Run benchmark:
wrk -t64 -c300 -d30s http://127.0.0.1:8002/random/10K.1.html


Gramine commit hash

1f72aaf

@aneessahib
Contributor

@sahason Please update the perf table when you bypass hash verification.

@sahason
Contributor Author

sahason commented Jan 10, 2024

Updated the issue with a perf table for the case where hash verification is bypassed (the SHA256 calls are skipped and the hash data is not compared).

@dimakuv
Contributor

dimakuv commented Jan 10, 2024

Thanks for reporting the perf numbers, I didn't expect such a huge overhead from the trusted-files hash comparison.

This is because Nginx relies on the Linux kernel's Page Cache (see https://en.wikipedia.org/wiki/Page_cache) as an optimization. Once Linux reads the file contents, they stay in main memory (until kicked out by some cache-eviction policy). This means that on bare Linux, opening the file and reading from it is a very fast operation.

Gramine doesn't implement a Page Cache, so it lacks the optimization of keeping trusted files in enclave memory. Note that this is irrelevant for allowed files, as they are not hashed, and also irrelevant for protected files, which already have a caching optimization.

So the solution would be to implement a Page Cache for trusted files. This shouldn't be hard: we have the LRU-cache building blocks (implemented for Protected Files); we just need to agree on the policy, on how we expose it to users, and on how we allow fine-tuning it. Memory is a precious resource in SGX enclaves, so we should not hard-code a limit that sets aside too little or too much enclave memory for this Page Cache.
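To make this concrete, here is a minimal sketch of such an LRU page cache with a byte budget. Everything in it is invented for illustration (struct layout, helper names, eviction policy); Gramine's actual Protected Files building blocks may look quite different.

```c
/* Hedged sketch of an LRU page cache for trusted-file chunks, with a
 * configurable byte budget. All names and the data structure are
 * illustrative, not Gramine's actual code. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (16 * 1024)

struct cached_chunk {
    uint64_t file_id;           /* identifies the trusted file */
    uint64_t chunk_idx;         /* chunk offset / CHUNK_SIZE   */
    uint8_t  data[CHUNK_SIZE];  /* verified plaintext, kept in enclave memory */
    struct cached_chunk* prev;  /* LRU list links */
    struct cached_chunk* next;
};

struct page_cache {
    struct cached_chunk* head;  /* most recently used */
    struct cached_chunk* tail;  /* least recently used: evicted first */
    size_t used_bytes;
    size_t limit_bytes;         /* the user-tunable budget discussed below */
};

static void unlink_chunk(struct page_cache* c, struct cached_chunk* ch) {
    if (ch->prev) ch->prev->next = ch->next; else c->head = ch->next;
    if (ch->next) ch->next->prev = ch->prev; else c->tail = ch->prev;
}

static void push_front(struct page_cache* c, struct cached_chunk* ch) {
    ch->prev = NULL;
    ch->next = c->head;
    if (c->head) c->head->prev = ch; else c->tail = ch;
    c->head = ch;
}

/* Insert an already-verified chunk, evicting from the LRU tail while the
 * budget would be exceeded. */
static void cache_insert(struct page_cache* c, uint64_t file_id,
                         uint64_t chunk_idx, const uint8_t* data) {
    while (c->used_bytes + sizeof(struct cached_chunk) > c->limit_bytes && c->tail) {
        struct cached_chunk* victim = c->tail;
        unlink_chunk(c, victim);
        c->used_bytes -= sizeof(struct cached_chunk);
        free(victim);
    }
    struct cached_chunk* ch = malloc(sizeof(*ch));
    if (!ch)
        return; /* cache is best-effort: caller falls back to re-verifying */
    ch->file_id = file_id;
    ch->chunk_idx = chunk_idx;
    memcpy(ch->data, data, CHUNK_SIZE);
    push_front(c, ch);
    c->used_bytes += sizeof(struct cached_chunk);
}
```

A real implementation would also need a lookup structure (e.g., a hash table keyed by (file_id, chunk_idx)) that moves hits to the list head; a hit then serves the chunk without recomputing SHA256, which is exactly the cost dominating the perf report above.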

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

The Page Cache I proposed above should have a size limit. I propose to use rlimits for this: #1714 (comment)

For this particular issue, we could have a new non-standard rlimit: loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE or something like this.
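For illustration, runtime adjustment from inside an application could look like the sketch below. The RLIMIT_TRUSTED_FILES_CACHE constant is hypothetical (no such rlimit exists in Linux or Gramine today); setrlimit() itself is the standard interface.

```c
/* Hedged sketch: an app tuning the hypothetical trusted-files cache limit
 * at run time via the standard setrlimit() interface. The constant
 * RLIMIT_TRUSTED_FILES_CACHE is invented; it stands in for whatever
 * Gramine-specific value such a design would define. */
#include <stdio.h>
#include <sys/resource.h>

#define RLIMIT_TRUSTED_FILES_CACHE 100 /* hypothetical, Gramine-specific */

int main(void) {
    struct rlimit rlim = {
        .rlim_cur = 64 * 1024 * 1024, /* 64MB soft limit for the cache */
        .rlim_max = 64 * 1024 * 1024,
    };
    if (setrlimit(RLIMIT_TRUSTED_FILES_CACHE, &rlim) != 0)
        perror("setrlimit");
    return 0;
}
```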

@mkow
Member

mkow commented Jan 11, 2024

Why rlimit? What will loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE = "passthrough" do then? Why not a normal manifest option?

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

Why rlimit?

  1. Because this would also allow applications (ones that can be modified to use Gramine-specific features) to adjust this limit at run time, instead of having to calculate it in advance.
  2. I was also hoping that a similar rlimit already exists in Linux, but nothing like this was found.

What will loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE = "passthrough" do then?

Good question, I don't know :) I guess this will be disallowed, or it will mean "default" (which is a terminology overload, which is bad).

@mkow
Member

mkow commented Jan 11, 2024

Because this would also allow applications (ones that can be modified to use Gramine-specific features) to adjust this limit at run time, instead of having to calculate it in advance.

I'm not convinced about taking Linux APIs and adding special cases in them with different meanings inside Gramine than in Linux... And I'm not sure anyone will actually modify the app to dynamically adjust this; people usually just use some existing apps, like the nginx already mentioned here.

So, I'd rather do a separate manifest option for this or, if you want dynamic control, have a special file in /dev or something like that, the same as we do with other Gramine-specific APIs.
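As a sketch of the /dev-style alternative, assuming a hypothetical pseudo-file path (Gramine already exposes Gramine-specific APIs as pseudo-files, e.g. under /dev/attestation/, but no such cache-control file exists today):

```c
/* Hedged sketch: tuning the cache budget through a Gramine-specific
 * pseudo-file, mirroring how /dev/attestation/* files work. The path
 * /dev/gramine/trusted_files_cache_size is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/gramine/trusted_files_cache_size", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    const char* limit = "67108864\n"; /* 64MB, written as ASCII decimal */
    if (write(fd, limit, strlen(limit)) < 0)
        perror("write");
    close(fd);
    return 0;
}
```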

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

And I'm not sure anyone will actually modify the app to dynamically adjust this; people usually just use some existing apps, like the nginx already mentioned here.

We see more and more people using Gramine as an "SDK on steroids", not just throwing an existing unmodified app into the enclave.

So, I'd rather do a separate manifest option for this or, if you want dynamic control, have a special file in /dev or something like that, the same as we do with other Gramine-specific APIs.

Ok, yes, I don't mind having another user-facing API. /dev/ is totally fine with me.

@sahason
Contributor Author

sahason commented Jan 23, 2024

@dimakuv Thanks for the analysis. I have one query: with a page cache for trusted files, how do we handle the scenario where the cached part of a trusted file (stored inside the enclave) becomes stale because the file on disk was modified by a malicious host? Currently we read the data right before computing the hash, so we always have the latest data and can catch any hash-verification failure.

@dimakuv
Contributor

dimakuv commented Jan 23, 2024

@sahason There are actually two algorithms at play:

  1. On open of the file, the whole file is read and its hash is compared against the one listed in sgx.trusted_files.
  2. As the file is read, it is split into chunks (I think of size 16KB), and we compute a SHA256-truncated-to-128-bits hash for each chunk. This list of hashes-of-chunks is always stored inside the SGX enclave. Afterwards, each file chunk newly copied into the enclave is always compared against the corresponding entry in this list of hashes-of-chunks.

So in your scenario, if partial data is read from the malicious host (i.e., the data was maliciously modified), then algorithm 2 kicks in. This is already implemented and is always done; a rough sketch follows below.
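A rough sketch of how the two algorithms fit together. The 16KB chunk size and SHA256-truncated-to-128-bits hashes come from this comment, and the mbedtls streaming calls are real (mbedtls 3.x signatures), while the function and parameter names are invented for illustration.

```c
/* Hedged sketch of the two-level scheme described above; not Gramine's
 * actual code. Algorithm 1 plus setup for algorithm 2: on open, hash the
 * whole file and compare against the sgx.trusted_files measurement, while
 * also recording a truncated per-chunk hash so that later reads can be
 * verified chunk by chunk. */
#include <stdint.h>
#include <string.h>
#include <mbedtls/sha256.h>

#define CHUNK_SIZE      (16 * 1024) /* 16KB chunks */
#define TRUNC_HASH_SIZE 16          /* SHA256 truncated to 128 bits */

/* Returns 0 on success, -1 on whole-file hash mismatch. */
static int build_chunk_hashes(const uint8_t* file, size_t size,
                              const uint8_t expected_file_hash[32],
                              uint8_t (*chunk_hashes)[TRUNC_HASH_SIZE]) {
    mbedtls_sha256_context whole;
    mbedtls_sha256_init(&whole);
    mbedtls_sha256_starts(&whole, /*is224=*/0);

    for (size_t off = 0, i = 0; off < size; off += CHUNK_SIZE, i++) {
        size_t len = size - off < CHUNK_SIZE ? size - off : CHUNK_SIZE;
        mbedtls_sha256_update(&whole, file + off, len);

        uint8_t h[32];
        mbedtls_sha256(file + off, len, h, /*is224=*/0);
        memcpy(chunk_hashes[i], h, TRUNC_HASH_SIZE); /* stays inside the enclave */
    }

    uint8_t file_hash[32];
    mbedtls_sha256_finish(&whole, file_hash);
    mbedtls_sha256_free(&whole);
    return memcmp(file_hash, expected_file_hash, 32) == 0 ? 0 : -1;
}
```

On every later read, the freshly copied-in chunk is re-hashed and compared against chunk_hashes[i], so a host that swaps file contents after open is caught at that point, which addresses the staleness concern above.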

@jkr0103 linked a pull request on Feb 22, 2024 that will close this issue (#1776).