[PAL/Linux-SGX] Allow to dump current SGX/perf stats on a signal #1711

Open
dimakuv opened this issue Jan 10, 2024 · 5 comments

@dimakuv
Contributor

dimakuv commented Jan 10, 2024

Description of the feature

Currently, Gramine-SGX has two perf analysis tools:

  1. Trivial stats on SGX events (EENTER, EEXIT, etc.)
  2. Advanced stats similar to perf record

Both of these tools have a limitation: they start collecting stats when Gramine-SGX starts and stop collecting them when Gramine-SGX terminates.

This limits the ability to analyze performance of long-living applications. For example, if MySQL runs under Gramine-SGX, then we may want to analyze only the stats during "hot runs", when a particular client with a particular workload connects to the MySQL server. But because of the current limitation, we will have a lot of noise because stats also contain the startup events, the termination events, and other non-relevant events (like clients that pre-populate the database).

Proposal 1: dump stats on a signal

We choose a signal that serves as a hint to Gramine to dump the currently collected statistics, e.g. SIGUSR1. For simplicity, we block this signal on all threads of the process except the main thread (so SIGUSR1 is guaranteed to always land in Thread 1).
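A minimal sketch of that signal routing, assuming plain POSIX signals and pthreads on the untrusted host side. The names g_dump_requested, handle_dump_signal, install_dump_signal_handler, set_dump_signal_blocked and consume_dump_request are hypothetical and not part of Gramine; the real PAL creates its threads differently, so this only illustrates the masking idea.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag: set by the handler, consumed by the main thread. */
static volatile sig_atomic_t g_dump_requested = 0;

/* Async-signal-safe handler: it only records that a dump was requested. */
static void handle_dump_signal(int signum) {
    (void)signum;
    g_dump_requested = 1;
}

/* Installs the process-wide SIGUSR1 handler; returns 0 on success. */
int install_dump_signal_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_dump_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Blocks (blocked != 0) or unblocks SIGUSR1 for the calling thread only.
 * Returns 0 on success or an error number on failure. */
int set_dump_signal_blocked(int blocked) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return pthread_sigmask(blocked ? SIG_BLOCK : SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 and clears the flag if a dump was requested, else returns 0. */
int consume_dump_request(void) {
    if (!g_dump_requested)
        return 0;
    g_dump_requested = 0;
    return 1;
}
```

One way to use it: the main thread calls set_dump_signal_blocked(1) before creating any other thread (new threads inherit the signal mask), then calls set_dump_signal_blocked(0) for itself, so only the main thread can receive SIGUSR1. The handler stays trivial on purpose; the actual dump would happen outside signal context once consume_dump_request() returns 1.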

When the signal arrives, we dump SGX stats similar to this:

    static long sgx_ocall_exit(void* args) {
        struct ocall_exit* ocall_exit_args = args;
        if (ocall_exit_args->exitcode != (int)((uint8_t)ocall_exit_args->exitcode)) {
            log_debug("Saturation error in exit code %d getting rounded down to %u",
                      ocall_exit_args->exitcode, (uint8_t)ocall_exit_args->exitcode);
            ocall_exit_args->exitcode = 255;
        }
        /* exit the whole process if exit_group() */
        if (ocall_exit_args->is_exitgroup) {
            update_and_print_stats(/*process_wide=*/true);

NOTE: We will need to dump stats on all currently executing threads, and this will require finding a way to iterate through all threads and summing up their stats. Should be doable.
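The summation mentioned in the note could look roughly like the sketch below. It assumes each thread keeps its own atomic counters and that some registry of live threads exists; struct thread_stats, struct stats_total, the field names and sum_thread_stats are all hypothetical, and locking of the thread registry is left out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-thread counters, updated only by the owning thread. */
struct thread_stats {
    atomic_ulong eenter_cnt;
    atomic_ulong eexit_cnt;
    atomic_ulong aex_cnt;
};

/* Plain (non-atomic) process-wide totals produced by one pass. */
struct stats_total {
    unsigned long eenter_cnt;
    unsigned long eexit_cnt;
    unsigned long aex_cnt;
};

/* Sums the counters of all registered threads. NULL slots (e.g. threads
 * that already exited) are skipped. Relaxed loads suffice because each
 * counter is read independently; the result is a best-effort snapshot. */
struct stats_total sum_thread_stats(struct thread_stats* const* threads, size_t count) {
    struct stats_total total = {0, 0, 0};
    for (size_t i = 0; i < count; i++) {
        if (!threads[i])
            continue;
        total.eenter_cnt += atomic_load_explicit(&threads[i]->eenter_cnt, memory_order_relaxed);
        total.eexit_cnt  += atomic_load_explicit(&threads[i]->eexit_cnt, memory_order_relaxed);
        total.aex_cnt    += atomic_load_explicit(&threads[i]->aex_cnt, memory_order_relaxed);
    }
    return total;
}
```

Because the other threads keep running while the sum is taken, the totals are not a perfectly consistent cut across threads; for differential analysis over a long workload that imprecision is likely acceptable.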

The output can be like this:

   static uint32_t g_user_signal_number = 0;
   log_always("----- SGX stats for process %d (on user signal %u) -----\n"
                   "  # of EENTERs:        %lu\n", ... g_user_signal_number++);
   ...

Now the MySQL example can be done like this:

  1. Start MySQL in Gramine
  2. Do/wait for the initialization to finish
  3. Right before starting the workload client, send SIGUSR1 to Gramine
  4. Gramine dumps the current stats
  5. Right after the workload client finishes, send SIGUSR1 to Gramine
  6. Gramine dumps the current stats
  7. Finish the run

Now we have two sets of stats, collected at steps 4 and 6. We subtract the step-4 stats from the step-6 stats (only process-wide stats can be subtracted), and we get the statistics on SGX events during the client workload -- exactly what we wanted.
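The subtraction itself is simple; a sketch of it, assuming the two dumps have already been parsed into a struct (struct sgx_stats_snapshot, its fields and sgx_stats_diff are hypothetical names used only for illustration):

```c
/* Hypothetical process-wide snapshot parsed from one stats dump. */
struct sgx_stats_snapshot {
    unsigned long eenters;
    unsigned long eexits;
    unsigned long aexs;
};

/* Returns the events that happened between two dumps. Counters only grow,
 * so `after` is expected to be greater than or equal to `before`. */
struct sgx_stats_snapshot sgx_stats_diff(const struct sgx_stats_snapshot* before,
                                         const struct sgx_stats_snapshot* after) {
    struct sgx_stats_snapshot diff = {
        .eenters = after->eenters - before->eenters,
        .eexits  = after->eexits - before->eexits,
        .aexs    = after->aexs - before->aexs,
    };
    return diff;
}
```

For example, if the step-4 dump shows 100 EENTERs and the step-6 dump shows 350, the workload itself caused 250 EENTERs.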

Proposal 2: reset stats on a signal

Same as Proposal 1, but set all stats to zero on the signal. This will be easier for the end user to read, but Proposal 1 (which requires "differential" analysis of the stats) seems more flexible and easier to implement.
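For comparison, a reset could be sketched as below, assuming the counters are C11 atomics and that the old values are returned so they can still be printed. struct live_counters, struct counter_values and read_and_reset are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical live counters that other threads keep incrementing. */
struct live_counters {
    atomic_ulong eenters;
    atomic_ulong eexits;
};

/* Plain copy of the values the counters held just before the reset. */
struct counter_values {
    unsigned long eenters;
    unsigned long eexits;
};

/* Swaps each counter with zero and returns the previous values. Each
 * exchange is atomic on its own, so no increment is lost, but the two
 * counters are not reset together as one unit. */
struct counter_values read_and_reset(struct live_counters* counters) {
    struct counter_values old = {
        .eenters = atomic_exchange(&counters->eenters, 0UL),
        .eexits  = atomic_exchange(&counters->eexits, 0UL),
    };
    return old;
}
```

Note the design cost compared to Proposal 1: every dump destroys the running totals, so a second observer (or a final dump at exit) no longer sees process-lifetime numbers.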

I am in favor of Proposal 1.

What about perf record stats?

Perf record style (advanced) stats are much more complicated, see:

Here the problem is that we create a perf.data file, initialize it with some header, and add events one by one into it. So it's unclear what we can do when a SIGUSR1 signal arrives -- can we seal the current file and start a new file? This adheres to Proposal 2. I don't know how to make it work with Proposal 1...

Someone needs to learn how this can be achieved -- perf record surely allows such things, so it must be accounted for in perf internal formats.

One can start with the simpler SGX stats though, and leave the perf record stats for later implementation.

Why Gramine should implement it?

Useful for perf analysis.

@jkr0103
Contributor

jkr0103 commented Jan 10, 2024

But because of the current limitation, we will have a lot of noise because stats also contain the startup events, the termination events, and other non-relevant events

Can we eliminate the startup and end event noise in all cases? This would help in perf analysis of PyTorch-like applications, which don't run forever, with either reset or dump.

@dimakuv
Contributor Author

dimakuv commented Jan 10, 2024

Can we eliminate the startup and end event noise in all cases? This would help in perf analysis of PyTorch-like applications, which don't run forever, with either reset or dump.

But how can you do it? You need to know the "start point without the noise" -- how do you automatically determine this start point? I don't think it's possible without hints from the application.

@jkr0103
Contributor

jkr0103 commented Jan 17, 2024

Yes, the application needs to inform Gramine when it wants stats/perf records to be collected. Is there a way an application running inside Gramine can send some signal to Gramine?

@jkr0103
Contributor

jkr0103 commented Jan 17, 2024

One suggestion: we print enclave enter/exit data with the Gramine stats, but not the count of syscalls that caused the enclave enters/exits. We could collect the count of each syscall that caused an enclave enter/exit and print it with the stats.

@dimakuv
Contributor Author

dimakuv commented Jan 17, 2024

Yes, the application needs to inform Gramine when it wants stats/perf records to be collected. Is there a way an application running inside Gramine can send some signal to Gramine?

The app can write to a new pseudo-file under /dev/. However, I'm a bit wary of adding more Gramine-specific APIs without (1) a good reason, and (2) a good design/naming proposal.

If you mean UNIX signals (like SIGINT), then no, the app cannot send such signals to Gramine.

One suggestion: we print enclave enter/exit data with the Gramine stats, but not the count of syscalls that caused the enclave enters/exits. We could collect the count of each syscall that caused an enclave enter/exit and print it with the stats.

I don't think it's possible. The EENTER/EEXIT statistics are collected at the level of the Linux-SGX PAL, but the syscall statistics are collected at the level of the LibOS. These are just different layers, and I don't see a simple way to show them together.
