
Nginx performance degradation in Gramine-SGX attributed to SHA256 hashing #1712

Open
sahason opened this issue Jan 10, 2024 · 10 comments · May be fixed by #1776
@sahason
Contributor

sahason commented Jan 10, 2024

Description of the problem

Nginx performs poorly in Gramine-SGX: ~60% of the execution time is spent in `mbedtls_sha256_update` (see the attached perf report).
We used the `wrk` benchmarking tool to compare the performance of Nginx on native Linux and in Gramine-SGX with 1 and 64 threads. The wrk command used: `wrk -t64 -c300 -d30s http://127.0.0.1:8002/random/10K.1.html`. The throughput statistics (requests/sec) show poor performance of Nginx in Gramine-SGX.

| Threads | Gramine-SGX (requests/sec) | Native (requests/sec) | Native vs Gramine-SGX (%) |
|--------:|---------------------------:|----------------------:|--------------------------:|
| 1       | 44145.58                   | 86392.24              | -48.90                    |
| 64      | 427658.63                  | 947360.14             | -54.86                    |

A few observations:

  1. The performance degradation grows with the file size.
  2. Keeping the folder containing 10K.1.html under allowed files in the manifest improves performance.
  3. Commenting out the SHA256 calls in copy_and_verify_trusted_files improves performance a lot; see the table and the sketch below.
| Threads | Gramine-SGX (requests/sec) | Native (requests/sec) | Native vs Gramine-SGX (%) |
|--------:|---------------------------:|----------------------:|--------------------------:|
| 64      | 876094.83                  | 943265.29             | -7.12                     |
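For context, a minimal sketch of the kind of per-chunk SHA256 check that observation 3 bypasses. This is illustrative, not Gramine's actual copy_and_verify_trusted_files code: only the one-shot `mbedtls_sha256()` call is a real API (mbedtls 3.x signature); the chunk size and truncated-hash length are taken from the discussion further down this thread, and the helper names are assumptions.

```c
/* Hedged sketch of per-chunk verification on the trusted-file read path;
 * not Gramine's actual code. The BYPASS_HASH_VERIFICATION branch models
 * what "commenting out the SHA256 calls" amounts to. */
#include <stdint.h>
#include <string.h>
#include <mbedtls/sha256.h>

#define CHUNK_SIZE (16 * 1024) /* 16KB chunks, per the discussion below */
#define HASH_SIZE  16          /* SHA256 truncated to 128 bits */

/* Returns 0 if `chunk` matches its expected truncated hash, -1 otherwise. */
static int verify_chunk(const uint8_t* chunk, size_t size,
                        const uint8_t expected[HASH_SIZE]) {
#ifdef BYPASS_HASH_VERIFICATION
    /* Observation 3: skip hashing entirely -- insecure, measurement only. */
    (void)chunk; (void)size; (void)expected;
    return 0;
#else
    uint8_t hash[32];
    if (mbedtls_sha256(chunk, size, hash, /*is224=*/0) != 0)
        return -1;
    return memcmp(hash, expected, HASH_SIZE) == 0 ? 0 : -1;
#endif
}
```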

Please suggest how we can improve the throughput.

With 64 threads: [perf-report attachment]

Steps to reproduce

Build and run Nginx server:

  1. cd CI-Examples/nginx
  2. Modify nginx-gramine.conf.template with the config below:
worker_processes auto; 

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;

events {
    worker_connections  768;
}

http {
    include            mime.types;
    default_type       application/octet-stream;
    keepalive_timeout  13;
    access_log  off;
    client_body_buffer_size 80k;
    client_max_body_size 9m;
    client_header_buffer_size 1k;
    client_body_timeout 10;
    client_header_timeout 10;
    send_timeout 10;
    open_file_cache max=1024 inactive=10s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    sendfile           on;
    #keepalive_timeout  65;

    # a single HTTP/HTTPS server
    server {
        listen 8002;
        listen  8444 ssl;
        server_name 127.0.0.1;

        ssl_certificate            server.crt;
        ssl_certificate_key        server.key;
        ssl_session_cache          shared:SSL:10m;
        ssl_session_timeout        10m;
        ssl_protocols              TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers                HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers  on;

        location / {
            root   html;
            index  index.html index.htm;
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        access_log off;
    }
}

daemon off;
  3. make SGX=1
  4. gramine-sgx nginx

Run benchmark:
wrk -t64 -c300 -d30s http://127.0.0.1:8002/random/10K.1.html


Gramine commit hash

1f72aaf

@aneessahib
Contributor

@sahason Please update the perf table when you bypass hash verification.

@sahason
Contributor Author

sahason commented Jan 10, 2024

Updated the issue with a perf table for the case where hash verification is bypassed (the SHA256 calls are skipped and the hash data is not compared).

@dimakuv
Contributor

dimakuv commented Jan 10, 2024

Thanks for reporting the perf numbers, I didn't expect such a huge overhead from the trusted-files hash comparison.

This is because Nginx relies on the Linux kernel's Page Cache (see https://en.wikipedia.org/wiki/Page_cache) as an optimization. Once Linux reads the file contents, they stay in main memory (until kicked out by some cache-eviction policy). This means that on bare Linux, opening the file and reading from it is a very fast operation.

Gramine doesn't implement a Page Cache, so it lacks the optimization of keeping trusted files in enclave memory. Note that this is irrelevant for allowed files, as they are not hashed, and also irrelevant for protected files, which already have a caching optimization.

So the solution would be to implement a Page Cache for trusted files. This shouldn't be hard: we have the LRU-cache building blocks (implemented for Protected Files); we just need to agree on the policy, on how we expose it to users, and on how we allow fine-tuning it. Memory is a precious resource in SGX enclaves, so we should not hard-code a limit that sets aside too little or too much enclave memory for this Page Cache.
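To make this concrete, here is a minimal sketch of such an LRU page cache with a byte budget. Everything in it is invented for illustration (struct layout, helper names, eviction policy); Gramine's actual Protected Files building blocks may look quite different.

```c
/* Hedged sketch of an LRU page cache for trusted-file chunks, with a
 * configurable byte budget. All names and the data structure are
 * illustrative, not Gramine's actual code. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (16 * 1024)

struct cached_chunk {
    uint64_t file_id;           /* identifies the trusted file */
    uint64_t chunk_idx;         /* chunk offset / CHUNK_SIZE   */
    uint8_t  data[CHUNK_SIZE];  /* verified plaintext, kept in enclave memory */
    struct cached_chunk* prev;  /* LRU list links */
    struct cached_chunk* next;
};

struct page_cache {
    struct cached_chunk* head;  /* most recently used */
    struct cached_chunk* tail;  /* least recently used: evicted first */
    size_t used_bytes;
    size_t limit_bytes;         /* the user-tunable budget discussed below */
};

static void unlink_chunk(struct page_cache* c, struct cached_chunk* ch) {
    if (ch->prev) ch->prev->next = ch->next; else c->head = ch->next;
    if (ch->next) ch->next->prev = ch->prev; else c->tail = ch->prev;
}

static void push_front(struct page_cache* c, struct cached_chunk* ch) {
    ch->prev = NULL;
    ch->next = c->head;
    if (c->head) c->head->prev = ch; else c->tail = ch;
    c->head = ch;
}

/* Insert an already-verified chunk, evicting from the LRU tail while the
 * budget would be exceeded. */
static void cache_insert(struct page_cache* c, uint64_t file_id,
                         uint64_t chunk_idx, const uint8_t* data) {
    while (c->used_bytes + sizeof(struct cached_chunk) > c->limit_bytes && c->tail) {
        struct cached_chunk* victim = c->tail;
        unlink_chunk(c, victim);
        c->used_bytes -= sizeof(struct cached_chunk);
        free(victim);
    }
    struct cached_chunk* ch = malloc(sizeof(*ch));
    if (!ch)
        return; /* cache is best-effort: caller falls back to re-verifying */
    ch->file_id = file_id;
    ch->chunk_idx = chunk_idx;
    memcpy(ch->data, data, CHUNK_SIZE);
    push_front(c, ch);
    c->used_bytes += sizeof(struct cached_chunk);
}
```

A real implementation would also need a lookup structure (e.g., a hash table keyed by (file_id, chunk_idx)) that moves hits to the list head; a hit then serves the chunk without recomputing SHA256, which is exactly the cost dominating the perf report above.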

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

The Page Cache I proposed above should have a size limit. I propose to use rlimits for this: #1714 (comment)

For this particular issue, we could have a new non-standard rlimit: loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE or something like this.
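For illustration, runtime adjustment from inside an application could look like the sketch below. The RLIMIT_TRUSTED_FILES_CACHE constant is hypothetical (no such rlimit exists in Linux or Gramine today); setrlimit() itself is the standard interface.

```c
/* Hedged sketch: an app tuning the hypothetical trusted-files cache limit
 * at run time via the standard setrlimit() interface. The constant
 * RLIMIT_TRUSTED_FILES_CACHE is invented; it stands in for whatever
 * Gramine-specific value such a design would define. */
#include <stdio.h>
#include <sys/resource.h>

#define RLIMIT_TRUSTED_FILES_CACHE 100 /* hypothetical, Gramine-specific */

int main(void) {
    struct rlimit rlim = {
        .rlim_cur = 64 * 1024 * 1024, /* 64MB soft limit for the cache */
        .rlim_max = 64 * 1024 * 1024,
    };
    if (setrlimit(RLIMIT_TRUSTED_FILES_CACHE, &rlim) != 0)
        perror("setrlimit");
    return 0;
}
```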

@mkow
Member

mkow commented Jan 11, 2024

Why rlimit? What will loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE = "passthrough" do then? Why not a normal manifest option?

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

Why rlimit?

  1. Because this would also allow applications (ones that can be modified to use Gramine-specific features) to adjust this limit at run time, instead of having to calculate it in advance.
  2. I was also hoping that a similar rlimit already exists in Linux, but nothing like this was found.

What will loader.rlimit.RLIMIT_TRUSTED_FILES_CACHE = "passthrough" do then?

Good question, I don't know :) I guess this will be disallowed, or it will mean "default" (which is a terminology overload, which is bad).

@mkow
Member

mkow commented Jan 11, 2024

Because this would also allow applications (ones that can be modified to use Gramine-specific features) to adjust this limit at run time, instead of having to calculate it in advance.

I'm not convinced about taking Linux APIs and adding special cases in them with different meanings inside Gramine than in Linux... And I'm not sure anyone will actually modify the app to dynamically adjust this; people usually just use some existing apps, like the nginx already mentioned here.

So, I'd rather do a separate manifest option for this or, if you want dynamic control, have a special file in /dev or something like that, the same as we do with other Gramine-specific APIs.
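As a sketch of the /dev-style alternative, assuming a hypothetical pseudo-file path (Gramine already exposes Gramine-specific APIs as pseudo-files, e.g. under /dev/attestation/, but no such cache-control file exists today):

```c
/* Hedged sketch: tuning the cache budget through a Gramine-specific
 * pseudo-file, mirroring how /dev/attestation/* files work. The path
 * /dev/gramine/trusted_files_cache_size is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/gramine/trusted_files_cache_size", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    const char* limit = "67108864\n"; /* 64MB, written as ASCII decimal */
    if (write(fd, limit, strlen(limit)) < 0)
        perror("write");
    close(fd);
    return 0;
}
```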

@dimakuv
Contributor

dimakuv commented Jan 11, 2024

And I'm not sure anyone will actually modify the app to dynamically adjust this; people usually just use some existing apps, like the nginx already mentioned here.

We see more and more people using Gramine as an "SDK on steroids", not just throwing an existing unmodified app into the enclave.

So, I'd rather do a separate manifest option for this or, if you want dynamic control, have a special file in /dev or something like that, the same as we do with other Gramine-specific APIs.

Ok, yes, I don't mind having another user-facing API. /dev/ is totally fine with me.

@sahason
Contributor Author

sahason commented Jan 23, 2024

@dimakuv Thanks for the analysis. I have one query: with a page cache for trusted files, how do we handle the scenario where the cached part of a trusted file (stored inside the enclave) becomes stale because the file on disk was modified by a malicious host? Currently we read the data right before computing the hash, so we always have the latest data and can catch any hash-verification failure.

@dimakuv
Contributor

dimakuv commented Jan 23, 2024

@sahason There are actually two algorithms at play:

  1. On open of the file, the whole file is read and its hash is compared against the one listed in sgx.trusted_files.
  2. As the file is read, it is split into chunks (I think of size 16KB), and we compute a SHA256-truncated-to-128-bits hash for each chunk. This list of hashes-of-chunks is always stored inside the SGX enclave. Afterwards, each file chunk newly copied into the enclave is always compared against the corresponding entry in this list of hashes-of-chunks.

So in your scenario, if partial data is read from the malicious host (i.e., the data was maliciously modified), then algorithm 2 kicks in. This is already implemented and is always done; a rough sketch follows below.
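A rough sketch of how the two algorithms fit together. The 16KB chunk size and SHA256-truncated-to-128-bits hashes come from this comment, and the mbedtls streaming calls are real (mbedtls 3.x signatures), while the function and parameter names are invented for illustration.

```c
/* Hedged sketch of the two-level scheme described above; not Gramine's
 * actual code. Algorithm 1 plus setup for algorithm 2: on open, hash the
 * whole file and compare against the sgx.trusted_files measurement, while
 * also recording a truncated per-chunk hash so that later reads can be
 * verified chunk by chunk. */
#include <stdint.h>
#include <string.h>
#include <mbedtls/sha256.h>

#define CHUNK_SIZE      (16 * 1024) /* 16KB chunks */
#define TRUNC_HASH_SIZE 16          /* SHA256 truncated to 128 bits */

/* Returns 0 on success, -1 on whole-file hash mismatch. */
static int build_chunk_hashes(const uint8_t* file, size_t size,
                              const uint8_t expected_file_hash[32],
                              uint8_t (*chunk_hashes)[TRUNC_HASH_SIZE]) {
    mbedtls_sha256_context whole;
    mbedtls_sha256_init(&whole);
    mbedtls_sha256_starts(&whole, /*is224=*/0);

    for (size_t off = 0, i = 0; off < size; off += CHUNK_SIZE, i++) {
        size_t len = size - off < CHUNK_SIZE ? size - off : CHUNK_SIZE;
        mbedtls_sha256_update(&whole, file + off, len);

        uint8_t h[32];
        mbedtls_sha256(file + off, len, h, /*is224=*/0);
        memcpy(chunk_hashes[i], h, TRUNC_HASH_SIZE); /* stays inside the enclave */
    }

    uint8_t file_hash[32];
    mbedtls_sha256_finish(&whole, file_hash);
    mbedtls_sha256_free(&whole);
    return memcmp(file_hash, expected_file_hash, 32) == 0 ? 0 : -1;
}
```

On every later read, the freshly copied-in chunk is re-hashed and compared against chunk_hashes[i], so a host that swaps file contents after open is caught at that point, which addresses the staleness concern above.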

@jkr0103 linked a pull request on Feb 22, 2024 that will close this issue (#1776).