[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support #1513

kailun-qin · 2023-08-28T01:50:18Z

Description of the changes

The initial implementation of EDMM support pre-accepted every mmapped region of enclave pages. However in some cases, applications (e.g., Java runtime) may rely on lazy allocation of pages, where the VMAs are reserved but not actually committed to physical memory. In particular, mmap(..., MAP_NORESERVE) requests are used in such cases -- to mmap a huge chunk of memory (possibly never used in the future) at once and then commit pages on demand on page fault events. Due to the lack of lazy allocation concept in Gramine, this incurred performance and physical memory overhead on allocating and accepting the whole range of enclave pages.

The commit adds the lazy allocation support by deferring page accepts to page-fault events when Gramine notices MAP_NORESERVE flag in the mmap request. It also introduces a bit vector indicating the commit status of each enclave page and an upcall from the SGX EDMM backend into the LibOS to ask for additional info on a particular page.

Fixes #1099.
Fixes #1754.

How to test this PR?

Updated regression tests + CI.

This change is

dimakuv

Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 18 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)

a discussion (no related file):
Haven't done a deep review of this PR yet, but several high-level notes:

Please add a paragraph about this optimization here: https://gramine.readthedocs.io/en/stable/devel/features.html#memory-management
Can you run some benchmarks and report results? E.g., some Java app, and/or a microbenchmark that uses MAP_NORESERVE.
Please rename PAL_PROT_NORESERVE to something more easily recognizable. I suggest PAL_PROT_LAZYALLOC.

libos/test/regression/munmap.c line 45 at r1 (raw file):

    test_mmap_munmap(PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE);

    test_mmap_munmap(PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE);

This test looks quite useless... What is the side effect of MAP_NORESERVE? I mean, functionality-wise we won't notice the effect of your PR in this test. Similarly, because it's just 3 pages, we won't notice the effect of your PR performance-wise -- it will be too fast to recognize the difference.

I understand that since this is a perf optimization, it can't be easily tested. But what about modifying large_mmap.c -- just add MAP_NORESERVE to that huge mmap syscall, and at least in manual testing, Gramine developers will notice a significant difference.

You also won't need to modify the generic manifest.template, instead you can modify only large_mmap.manifest.template.

pal/src/host/linux-sgx/pal_memory.c line 23 at r1 (raw file):

#include "spinlock.h"

enclave_page_tracker_t* g_enclave_page_tracker = NULL;

I would prefer to have a top-level comment about this tracker, and why it is needed. I think you can steal bits of info from the commit message and also from the original GitHub issue.

In particular, it is important to explain that this tracker is used purely for EDMM lazy allocation, and is required in the memory-free flows (otherwise we don't know whether the enclave page was already EACCEPTed). It is also important to explain that other info, like the enclave-page permissions (required in the #PF flow) are taken from the LibOS VMA subsystem, for which we use a special upcall.

pal/src/host/linux-sgx/pal_memory.c line 29 at r1 (raw file):

#define ENCLAVE_PAGE_LOCK()   spinlock_lock(&g_enclave_page_lock)
#define ENCLAVE_PAGE_UNLOCK() spinlock_unlock(&g_enclave_page_lock)
#define ENCLAVE_PAGE_LOCKED() spinlock_is_locked(&g_enclave_page_lock)

What's the point in these three macros? Why not just use the spinlock_xxx() functions?

kailun-qin

Reviewable status: 0 of 23 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):
Thanks for the feedbacks!

Please add a paragraph about this optimization here: https://gramine.readthedocs.io/en/stable/devel/features.html#memory-management

I suppose this should be done (in batch for all new features) before each release? But added anyway.

Can you run some benchmarks and report results? E.g., some Java app, and/or a microbenchmark that uses MAP_NORESERVE.

Sure, will do.

Please rename PAL_PROT_NORESERVE to something more easily recognizable. I suggest PAL_PROT_LAZYALLOC.

Done.

pal/src/host/linux-sgx/pal_memory.c line 23 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would prefer to have a top-level comment about this tracker, and why it is needed. I think you can steal bits of info from the commit message and also from the original GitHub issue.

In particular, it is important to explain that this tracker is used purely for EDMM lazy allocation, and is required in the memory-free flows (otherwise we don't know whether the enclave page was already EACCEPTed). It is also important to explain that other info, like the enclave-page permissions (required in the #PF flow) are taken from the LibOS VMA subsystem, for which we use a special upcall.

Done.

pal/src/host/linux-sgx/pal_memory.c line 29 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What's the point in these three macros? Why not just use the spinlock_xxx() functions?

Done.

libos/test/regression/munmap.c line 45 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This test looks quite useless... What is the side effect of MAP_NORESERVE? I mean, functionality-wise we won't notice the effect of your PR in this test. Similarly, because it's just 3 pages, we won't notice the effect of your PR performance-wise -- it will be too fast to recognize the difference.

I understand that since this is a perf optimization, it can't be easily tested. But what about modifying large_mmap.c -- just add MAP_NORESERVE to that huge mmap syscall, and at least in manual testing, Gramine developers will notice a significant difference.

You also won't need to modify the generic manifest.template, instead you can modify only large_mmap.manifest.template.

Done.

kailun-qin

Reviewable status: 0 of 23 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

Documentation/devel/features.md line 1717 at r2 (raw file):

are lazily accepted on page-fault events. Note that the current implementation has one caveat that
when a process forks, the child process inherits the `MAP_NORESERVE` memory mappings of the parent
process, but the content isn't copied (via checkpoint-and-restore in Gramine).

This was not covered by the initial proposal. Let's dissuss it here. Thanks!

Code quote:

when a process forks, the child process inherits the `MAP_NORESERVE` memory mappings of the parent
process, but the content isn't copied (via checkpoint-and-restore in Gramine).

llly

Reviewed 6 of 18 files at r1, 8 of 12 files at r2, all commit messages.
Reviewable status: 14 of 23 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)

libos/src/bookkeep/libos_vma.c line 1151 at r2 (raw file):

    int ret = lookup_vma((void*)addr, &vma_info);
    if (ret < 0) {
        *out_prot_flags = 0;

Don't need to set out_prot_flags when vma not found.

pal/src/host/linux-sgx/pal_memory.c line 316 at r2 (raw file):

        if (consecutive_count > 0) {
            /* invoke the `callback` on the consecutive pages */
            ret = callback(consecutive_start_addr, consecutive_count, arg);

Is callback really needed?
I can see direct functions can be called according to walk_set. When false, sgx_edmm_remove_pages() unset_enclave_addr_range(); when true, sgx_edmm_add_pages() set_enclave_addr_range().

dimakuv

Reviewable status: 14 of 23 files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

libos/src/libos_checkpoint.c line 221 at r2 (raw file):

        pal_prot_flags_t mem_prot = entry->prot;

        if (entry->dummy || mem_prot & PAL_PROT_LAZYALLOC) {

Actually, why do you skip MAP_NORESERVE regions on fork?

If you would remove these tricks, you would end up with much worse performance (because to copy such region on EDMM, there will be a ton of #PFs). But at least the correctness will be preserved, right? Also, what if we are on gramine-direct or SGX without EDMM? Looks like you're breaking other backends, just to improve performance for SGX EDMM?

libos/src/libos_checkpoint.c line 330 at r2 (raw file):

            }

            if (prot & PAL_PROT_LAZYALLOC) {

Why would you even get into this IF condition? Looks like this is deadcode, because send_memory_on_stream() skips such memory regions.

If it's indeed deadcode, I suggest to replace this thing with simply assert(!(prot & PAL_PROT_LAZYALLOC)).

libos/src/bookkeep/libos_vma.c line 1458 at r2 (raw file):

                DO_CP_SIZE(memory, vma->addr, vma->length, &mem);
                /* Propogate `MAP_NORESERVE` flag if it's set. */
                mem->prot = LINUX_PROT_TO_PAL(vma->prot, vma->flags & MAP_NORESERVE);

I don't understand all these tricks with checkpoint/restore of MAP_NORESERVE. Can you explain all these flows?

dimakuv

Reviewed 4 of 18 files at r1, 3 of 12 files at r2.
Reviewable status: 20 of 23 files reviewed, 17 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

-- commits line 19 at r2:
We will need to add a note on the current handling of MAP_NORESERVE VMAs during fork.

Documentation/devel/features.md line 1717 at r2 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

This was not covered by the initial proposal. Let's dissuss it here. Thanks!

Left my comments in different places. General feedback: I think we need to fix it.

libos/src/bookkeep/libos_vma.c line 1458 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't understand all these tricks with checkpoint/restore of MAP_NORESERVE. Can you explain all these flows?

Ok, I thought about this problem of having only a subset of pages committed for MAP_NORESERVE VMAs.

I think we should not cut corners (or unnecessarily worsen performance) and we should send only committed memory regions. For this, we'll need to query the PAL to learn which pages from the VMA were committed, and create as many struct libos_mem_entry objects as there are committed pages. My proposed approach would look smth like this:

if (vma->flags & MAP_NORESERVE) {
    /* lazy allocation of pages, send only committed pages */
    char* bitvector = malloc(vma->length / 4096 / 8);
    PalGetCommittedPages(vma->addr, vma->length, bitvector);
    for each bit in bitvector {
        if (bit is set) {
            struct libos_mem_entry* mem;
            DO_CP_SIZE(memory, vma->addr + page_addr(bit), 4096, &mem);
            mem->prot = LINUX_PROT_TO_PAL(vma->prot, /*map_flags=*/0);
        }
    }
} else {
    /* no lazy allocation of pages, send the whole memory region of VMA */
    struct libos_mem_entry* mem;
    DO_CP_SIZE(memory, vma->addr, vma->length, &mem);
    mem->prot = LINUX_PROT_TO_PAL(vma->prot, /*map_flags=*/0);
}

Several notes here:

We will need to introduce a new PalGetCommittedPages() API
1. Returns all-ones for non-SGX non-EDMM PALs
2. Returns a populated bitvector slice for SGX EDMM PAL
Each committed page will be sent separately -- this may be sub-par in terms of performance, but good enough for the initial version. If we ever find it to be a bottleneck, we'll optimize to send memory regions, not single pages.

pal/src/host/linux-sgx/pal_main.c line 631 at r2 (raw file):

    init_slab_mgr();

    /* initialize the enclave page tracker as soon as we Initialized the slab memory allocator */

Initialized -> use small letter, not capital letter

pal/src/host/linux-sgx/pal_memory.c line 24 at r2 (raw file):

/*
 * Global enclave page tracker purely used for EDMM lazy allocation (based on a bitmap vector).

purely used -> used

pal/src/host/linux-sgx/pal_memory.c line 27 at r2 (raw file):

 *
 * This is required for the memory-free flows, where we have to know whether the to-be-freed pages
 * were already EACCEPTed (so that we need to remove them) or not not (so that we can simply skip

not not -> not

pal/src/host/linux-sgx/pal_memory.c line 33 at r2 (raw file):

 *
 * Besides, for the additional info required in the #PF flow (e.g., enclave-page permissions), it's
 * taken from the LibOS VMA subsystem, for which we use a special upcall.

I would reformulate this last paragraph to:

We don't use this page tracker to store info required in the #PF flows (e.g. enclave-page permissions).
Instead, we get this info from the LibOS VMA subsystem, for which we use a special upcall, see g_mem_bkeep_get_vma_info_upcall.

pal/src/host/linux-sgx/pal_memory.c line 329 at r2 (raw file):

                   int (*callback)(uintptr_t, size_t, void*), void* arg) {
    return walk_pages(start_addr, count, true, callback, arg);
}

This function is overcomplicated. (A similar concern that @llly raised.)

Why not replace it with:

int remove_commited_pages(uintptr_t start_addr, size_t count) {
    walk_pages(start_addr, count, /*walk_set=*/true, sgx_edmm_remove_pages_callback, NULL);
}

int add_uncommitted_pages(uintptr_t start_addr, size_t count,  uint64_t prot_flags) {
    walk_pages(start_addr, count, /*walk_set=*/false, sgx_edmm_add_pages_callback, &prot_flags);
}

Otherwise you have too many levels of indirection, and it takes a while to untangle this sequence of funcs...

pal/src/host/linux-sgx/pal_memory.c line 335 at r2 (raw file):

                     int (*callback)(uintptr_t, size_t, void*), void* arg) {
    return walk_pages(start_addr, count, false, callback, arg);
}

Please move all these funcs to sgx_edmm.c file, as these funcs are only used for EDMM flows, and are not part of the generic PAL functions.

pal/src/host/linux-sgx/pal_sgx.h line 3 at r2 (raw file):

/* SPDX-License-Identifier: LGPL-3.0-or-later */
/* Copyright (C) 2022 Intel Corporation
 *                    Borys Popławski <borysp@invisiblethingslab.com>

You can add your copyright, btw, since the changes you do are big.

pal/src/host/linux-sgx/pal_sgx.h line 8 at r2 (raw file):

#include "pal.h"
#include "sgx_arch.h"
#include "spinlock.h"

This include won't be needed if you remove the lock field

pal/src/host/linux-sgx/pal_sgx.h line 16 at r2 (raw file):

}

typedef struct {

Could you add a top-level comment with a bit of arithmetic -- how much memory will this data bitvector occupy if the enclave size is 1GB, 128GB, 1TB?

If my calculations are correct, then for a 1TB enclave, this bitvector will occupy 32MB, or 0.003%. Which is reasonable.

pal/src/host/linux-sgx/pal_sgx.h line 19 at r2 (raw file):

    unsigned char* data;    /* bit array to store page allocation status */
    uintptr_t base_address; /* base address of the enclave memory region */
    size_t size;            /* number of pages in the memory region */

Comments for base_address and size don't seem to be true? It's really about the whole enclave memory address space and the whole enclave size, not about some "memory region".

pal/src/host/linux-sgx/pal_sgx.h line 21 at r2 (raw file):

    size_t size;            /* number of pages in the memory region */
    size_t page_size;       /* size of each page in bytes */
    spinlock_t lock;        /* lock for the page tracker */

Unused field?

kailun-qin

Reviewable status: 10 of 29 files reviewed, 17 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "squash! " and "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

-- commits line 19 at r2:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We will need to add a note on the current handling of MAP_NORESERVE VMAs during fork.

Done.

libos/src/bookkeep/libos_vma.c line 1151 at r2 (raw file):

Previously, llly (Li Xun) wrote…

Don't need to set out_prot_flags when vma not found.

Done.

libos/src/bookkeep/libos_vma.c line 1458 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ok, I thought about this problem of having only a subset of pages committed for MAP_NORESERVE VMAs.

I think we should not cut corners (or unnecessarily worsen performance) and we should send only committed memory regions. For this, we'll need to query the PAL to learn which pages from the VMA were committed, and create as many struct libos_mem_entry objects as there are committed pages. My proposed approach would look smth like this:
if (vma->flags & MAP_NORESERVE) {
    /* lazy allocation of pages, send only committed pages */
    char* bitvector = malloc(vma->length / 4096 / 8);
    PalGetCommittedPages(vma->addr, vma->length, bitvector);
    for each bit in bitvector {
        if (bit is set) {
            struct libos_mem_entry* mem;
            DO_CP_SIZE(memory, vma->addr + page_addr(bit), 4096, &mem);
            mem->prot = LINUX_PROT_TO_PAL(vma->prot, /*map_flags=*/0);
        }
    }
} else {
    /* no lazy allocation of pages, send the whole memory region of VMA */
    struct libos_mem_entry* mem;
    DO_CP_SIZE(memory, vma->addr, vma->length, &mem);
    mem->prot = LINUX_PROT_TO_PAL(vma->prot, /*map_flags=*/0);
}
Several notes here:

We will need to introduce a new PalGetCommittedPages() API

Returns all-ones for non-SGX non-EDMM PALs

Returns a populated bitvector slice for SGX EDMM PAL

Each committed page will be sent separately -- this may be sub-par in terms of performance, but good enough for the initial version. If we ever find it to be a bottleneck, we'll optimize to send memory regions, not single pages.

Done.

pal/src/host/linux-sgx/pal_main.c line 631 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Initialized -> use small letter, not capital letter

Done.

pal/src/host/linux-sgx/pal_memory.c line 24 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

purely used -> used

Done.

pal/src/host/linux-sgx/pal_memory.c line 27 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

not not -> not

Done.

pal/src/host/linux-sgx/pal_memory.c line 33 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would reformulate this last paragraph to:

We don't use this page tracker to store info required in the #PF flows (e.g. enclave-page permissions).
Instead, we get this info from the LibOS VMA subsystem, for which we use a special upcall, see g_mem_bkeep_get_vma_info_upcall.

Done.

pal/src/host/linux-sgx/pal_memory.c line 316 at r2 (raw file):

Previously, llly (Li Xun) wrote…

Is callback really needed?
I can see direct functions can be called according to walk_set. When false, sgx_edmm_remove_pages() unset_enclave_addr_range(); when true, sgx_edmm_add_pages() set_enclave_addr_range().

Done.

pal/src/host/linux-sgx/pal_memory.c line 329 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This function is overcomplicated. (A similar concern that @llly raised.)

Why not replace it with:
int remove_commited_pages(uintptr_t start_addr, size_t count) {
    walk_pages(start_addr, count, /*walk_set=*/true, sgx_edmm_remove_pages_callback, NULL);
}

int add_uncommitted_pages(uintptr_t start_addr, size_t count,  uint64_t prot_flags) {
    walk_pages(start_addr, count, /*walk_set=*/false, sgx_edmm_add_pages_callback, &prot_flags);
}
Otherwise you have too many levels of indirection, and it takes a while to untangle this sequence of funcs...

Done.

pal/src/host/linux-sgx/pal_memory.c line 335 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please move all these funcs to sgx_edmm.c file, as these funcs are only used for EDMM flows, and are not part of the generic PAL functions.

Done.

pal/src/host/linux-sgx/pal_sgx.h line 3 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You can add your copyright, btw, since the changes you do are big.

Done.

pal/src/host/linux-sgx/pal_sgx.h line 8 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This include won't be needed if you remove the lock field

Done.

pal/src/host/linux-sgx/pal_sgx.h line 16 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you add a top-level comment with a bit of arithmetic -- how much memory will this data bitvector occupy if the enclave size is 1GB, 128GB, 1TB?

If my calculations are correct, then for a 1TB enclave, this bitvector will occupy 32MB, or 0.003%. Which is reasonable.

Done.

pal/src/host/linux-sgx/pal_sgx.h line 19 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Comments for base_address and size don't seem to be true? It's really about the whole enclave memory address space and the whole enclave size, not about some "memory region".

Done.

pal/src/host/linux-sgx/pal_sgx.h line 21 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Unused field?

Yeah, left over. Removed.

libos/src/libos_checkpoint.c line 221 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Actually, why do you skip MAP_NORESERVE regions on fork?

If you would remove these tricks, you would end up with much worse performance (because to copy such region on EDMM, there will be a ton of #PFs). But at least the correctness will be preserved, right? Also, what if we are on gramine-direct or SGX without EDMM? Looks like you're breaking other backends, just to improve performance for SGX EDMM?

Not relevant any more.

libos/src/libos_checkpoint.c line 330 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why would you even get into this IF condition? Looks like this is deadcode, because send_memory_on_stream() skips such memory regions.

If it's indeed deadcode, I suggest to replace this thing with simply assert(!(prot & PAL_PROT_LAZYALLOC)).

Not relevant any more.

dimakuv

Reviewed 2 of 18 files at r1, 2 of 12 files at r2, 11 of 18 files at r3, 1 of 5 files at r5, all commit messages.
Reviewable status: 21 of 29 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "squash! " found in commit messages' one-liners (waiting on @kailun-qin and @llly)

libos/src/bookkeep/libos_vma.c line 1463 at r5 (raw file):

                    int ret = PalGetCommittedPages((uintptr_t)vma->addr, vma->length, bitvector,
                                                   &bv_size, &start_idx);

Meh, this function signature is complicated. I suggest to return a bitvector that reflects pages starting from vma->addr, instead of the current bitvector that contains pages with an offset.

Let's start with something simple and easy to understand (even if not super-performant). If really needed, we will modify this in follow up PRs.

libos/src/bookkeep/libos_vma.c line 1470 at r5 (raw file):

                    size_t byte_idx = 0;
                    while (byte_idx < bv_size) {

Why not a for loop?

libos/src/bookkeep/libos_vma.c line 1473 at r5 (raw file):

                        unsigned char byte = bitvector[byte_idx];
                        size_t bit_idx = 0;
                        while (bit_idx < 8) {

Why not a for loop?

pal/include/pal/pal.h line 1014 at r5 (raw file):

 * \param[in,out] bitvector    On success, will contain the commit status of the pages in the memory
 *                             area.
 * \param[in,out] bv_size      On success, will contain the actual size of the bitvector.

Please rename to bitvector_size, for readability.

pal/include/pal/pal.h line 1015 at r5 (raw file):

 *                             area.
 * \param[in,out] bv_size      On success, will contain the actual size of the bitvector.
 * \param[out]    out_bv_index On success, will contain the starting index of the corresponding

What's the point in this starting index? This looks like an optimization, but why overcomplicate this PAL function? You could calculate the same in the LibOS.

I feel like this is a premature optimization, and I would remove this arg.

I think the out logic of bv_size already covers a common case of MAP_NORESRERVE area having zero committed pages.

pal/include/pal/pal.h line 1018 at r5 (raw file):

 *                             address in the bitvector.
 *
 * Returns all-ones for non-SGX non-EDMM PALs; returns a populated bitvector slice for SGX EDMM PAL.

Please also add that this func is currently used for checkpoint-and-restore logic (to learn which memory areas need to be sent to the child process, as a perf optimization specific for SGX EDMM).

kailun-qin

Reviewable status: 15 of 29 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

libos/src/bookkeep/libos_vma.c line 1463 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Meh, this function signature is complicated. I suggest to return a bitvector that reflects pages starting from vma->addr, instead of the current bitvector that contains pages with an offset.

Let's start with something simple and easy to understand (even if not super-performant). If really needed, we will modify this in follow up PRs.

Done.

libos/src/bookkeep/libos_vma.c line 1470 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not a for loop?

Done.

libos/src/bookkeep/libos_vma.c line 1473 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not a for loop?

Done.

pal/include/pal/pal.h line 1014 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please rename to bitvector_size, for readability.

Done.

pal/include/pal/pal.h line 1015 at r5 (raw file):

What's the point in this starting index? This looks like an optimization

Yes, it's for perf considerations. Updated.

pal/include/pal/pal.h line 1018 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please also add that this func is currently used for checkpoint-and-restore logic (to learn which memory areas need to be sent to the child process, as a perf optimization specific for SGX EDMM).

Done.

dimakuv

Reviewed 1 of 12 files at r2, 2 of 18 files at r3, 9 of 9 files at r6, all commit messages.
Reviewable status: 27 of 29 files reviewed, 21 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @llly)

libos/src/bookkeep/libos_vma.c line 1456 at r6 (raw file):

                if (vma->flags & MAP_NORESERVE) {
                    /* lazy allocation of pages, send only committed pages */
                    size_t bitvector_size = ((vma->length + PAGE_SIZE - 1) / PAGE_SIZE + 7) / 8;

What's this calculation?

I think this is more readable:

size_t bitvector_size = ALIGN_UP(ALIGN_UP(vma->length, PAGE_SIZE) / PAGE_SIZE, 8) / 8;

pal/include/pal/pal.h line 1012 at r6 (raw file):

 * \param         addr           Starting address of the memory area.
 * \param         size           Size of the memory area.
 * \param[in,out] bitvector      On success, will contain the commit status of the pages in the memory

Please rewrap to 100 chars per line

pal/src/host/linux/pal_misc.c line 89 at r6 (raw file):

    assert(bitvector_size);

    size_t num_pages = (size + g_page_size - 1) / g_page_size;

Please use ALIGN_UP()

pal/src/host/linux/pal_misc.c line 91 at r6 (raw file):

    size_t num_pages = (size + g_page_size - 1) / g_page_size;
    size_t end_page = num_pages - 1;
    size_t num_bytes = (num_pages + 7) / 8;

Please use ALIGN_UP()

pal/src/host/linux/pal_misc.c line 102 at r6 (raw file):

        /* clear the leading bits in the last byte of the slice */
        bitvector[num_bytes - 1] &= (0xFF >> (7 - (end_page % 8)));
    }

I can't parse this logic. Aren't you supposed to just set the last N pages in the last byte?

Isn't something like this not enough:

leftover_pages = num_pages % 8;
if (leftover_pages)
    bitvector[num_bytes - 1] = (1 << leftover_pages) - 1;

pal/src/host/linux-sgx/enclave_edmm.c line 24 at r6 (raw file):

 * Global enclave page tracker used for EDMM lazy allocation (based on a bitmap vector).
 *
 * This is required for the memory-free flows, where we have to know whether the to-be-freed pages

memory-free flows -> "memory free()" flows

(memory-free sounds weird, like "flows that do not use memory")

pal/src/host/linux-sgx/enclave_edmm.c line 36 at r6 (raw file):

enclave_page_tracker_t* g_enclave_page_tracker = NULL;

static spinlock_t g_enclave_page_lock = INIT_SPINLOCK_UNLOCKED;

Can we rename to g_enclave_page_tracker_lock

pal/src/host/linux-sgx/enclave_edmm.c line 197 at r6 (raw file):

    enclave_page_tracker_t* tracker = malloc(sizeof(enclave_page_tracker_t));
    if (!tracker)
        INIT_FAIL("cannot initialize enclave page tracker");

initialize -> allocate

pal/src/host/linux-sgx/enclave_edmm.c line 198 at r6 (raw file):

    if (!tracker)
        INIT_FAIL("cannot initialize enclave page tracker");
    tracker->data = (unsigned char*)calloc((num_pages + 7) / 8, sizeof(unsigned char));

Please use ALIGN_UP

pal/src/host/linux-sgx/enclave_edmm.c line 200 at r6 (raw file):

    tracker->data = (unsigned char*)calloc((num_pages + 7) / 8, sizeof(unsigned char));
    if (!tracker->data)
        INIT_FAIL("cannot initialize enclave page tracker data");

initialize -> allocate

pal/src/host/linux-sgx/enclave_edmm.c line 214 at r6 (raw file):

    assert(!g_enclave_page_tracker);

    size_t num_pages = (memory_size + page_size - 1) / page_size;

Isn't memory_size guaranteed to be page-aligned? Please add an assert on this and remove this complicated arithmetic.

pal/src/host/linux-sgx/enclave_edmm.c line 273 at r6 (raw file):

/* get a copy of bitvector slice according to the memory address and the size in bytes; also give
 * the starting index of the corresponding address in the slice */

also give the starting index ... -- stale comment?

pal/src/host/linux-sgx/enclave_edmm.c line 277 at r6 (raw file):

                        size_t* bitvector_size) {
    size_t start_page = address_to_index(addr);
    size_t num_pages = (size + g_enclave_page_tracker->page_size - 1) /

Please use ALIGN_UP() and other macros (if needed) in this code

pal/src/host/linux-sgx/enclave_edmm.c line 296 at r6 (raw file):

            unsigned char val_next = g_enclave_page_tracker->data[start_byte + i + 1];
            bitvector[i] = ((val_next & (0xFF >> (8 - start_offset))) << (8 - start_offset)) |
                           ((val_cur & (0xFF << start_offset)) >> start_offset);

What is this madness. Please simplify!

Why not have three distinct parts in your code:

if (start_offset != 0)
    ... copy the highest bits in the starting byte ...

for (loop)
    ... copy each next byte entirely ...

if (end_offset != 0)
    ... copy the lowest bits in the ending byte ...

pal/src/host/linux-sgx/enclave_edmm.c line 306 at r6 (raw file):

        /* clear the leading bits in the last byte of the slice */
        bitvector[num_bytes - 1] &= (0xFF >> (7 - end_offset));
    }

Please see my comment for Linux PAL

pal/src/host/linux-sgx/enclave_edmm.c line 313 at r6 (raw file):

/* iterate over the given range of enclave pages in the tracker and perform the specified `callback`
 * on the consecutive set/unset pages; return error when `callback` failed */
static int walk_pages(uintptr_t start_addr, size_t count, bool walk_set,

walk_set -> walk_set_pages

pal/src/host/linux-sgx/pal_exception.c line 289 at r6 (raw file):

    if (ADDR_IN_PAL(uc->rip) && event_num != PAL_EVENT_QUIT &&
                                event_num != PAL_EVENT_INTERRUPTED &&
                                event_num != PAL_EVENT_MEMFAULT) {

In which case this can happen? What's the scenario when we are doing something in PAL (so that ADDR_IN_PAL(uc->rip) is true) and we hit this to-be-lazily-allocated page?

pal/src/host/linux-sgx/pal_exception.c line 346 at r6 (raw file):

            ret = _PalVirtualMemoryAlloc((void*)addr, g_page_size, prot_flags);
            if (ret < 0) {
                log_error("failed to allocate page at 0x%lx: %s", addr, pal_strerror(ret));

failed to allocate page -> failed to lazily allocate page

pal/src/host/linux-sgx/pal_exception.c line 350 at r6 (raw file):

            }
            goto out;
        }

Now if we are inside PAL and we hit a memfault on a NOT lazily-allocated page, we'll pass through this new logic and will end up calling the LibOS upcall -- which is wrong because LibOS shouldn't try to deal with inside-PAL faults.

See my other comment -- when is it really possible that while we're in PAL, a #PF happens?

If it's possible, then we'll need to add here a case of checking if (ADDR_IN_PAL(uc->rip)) and terminating on this case.

pal/src/host/linux-sgx/pal_memory.c line 29 at r6 (raw file):

    if (g_pal_linuxsgx_state.edmm_enabled) {
        /* defer page accepts to page-fault events when `MAP_NORESERVE` is set */

MAP_NORESERVE -> PAL_PROT_LAZYALLOC

pal/src/host/linux-sgx/pal_misc.c line 824 at r6 (raw file):

        return get_bitvector_slice(addr, size, bitvector, bitvector_size);
    } else {
        size_t num_pages = (size + g_page_size - 1) / g_page_size;

ditto (all comments from Linux PAL)

dimakuv

Reviewable status: 27 of 29 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @llly)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

Thanks for the feedbacks!

Please add a paragraph about this optimization here: https://gramine.readthedocs.io/en/stable/devel/features.html#memory-management

I suppose this should be done (in batch for all new features) before each release? But added anyway.

Can you run some benchmarks and report results? E.g., some Java app, and/or a microbenchmark that uses MAP_NORESERVE.

Sure, will do.

Please rename PAL_PROT_NORESERVE to something more easily recognizable. I suggest PAL_PROT_LAZYALLOC.

Done.

Please run some benchmarks and report results.

I'm afraid the complexity of this change sky-rocketed, and I will be unhappy if this all is for 1% performance improvement of 1% of use cases...

kailun-qin

Reviewable status: 27 of 29 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

pal/src/host/linux-sgx/enclave_edmm.c line 296 at r6 (raw file):

What is this madness. Please simplify!

This was a bit like implementing a variant of memcpy w/ offsets by bits support.

Why not have three distinct parts in your code:
...

This was my original impl. But we wanted to get rid of the start_offset (-- "I suggest to return a bitvector that reflects pages starting from vma->addr, instead of the current bitvector that contains pages with an offset"). How can your proposal achieve that?

dimakuv

Reviewable status: 27 of 29 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @llly)

pal/src/host/linux-sgx/enclave_edmm.c line 296 at r6 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

What is this madness. Please simplify!

This was a bit like implementing a variant of memcpy w/ offsets by bits support.

Why not have three distinct parts in your code:
...

This was my original impl. But we wanted to get rid of the start_offset (-- "I suggest to return a bitvector that reflects pages starting from vma->addr, instead of the current bitvector that contains pages with an offset"). How can your proposal achieve that?

Oh, I think I know what you mean. What you're saying is that LibOS (the caller) can issue PalGetCommittedPages(/*addr=*/4096, /*size=*/1TB, &bitvector).

In this case, bitvector[0][bit 0] must contain g_enclave_page_tracker->data[0][bit 1], and so on -- all the bits will be with the offset, and we can't just copy-paste byte-by-byte.

@kailun-qin Given this complexity, could you split this code into helper functions, so that it's clear how things work? The current code has too much bit-twiddling magic...

kailun-qin

Reviewable status: 20 of 29 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

a discussion (no related file):

Please run some benchmarks and report results.

I tested our openjdk example.

Without this PR (release build, commit b984089, EDMM enabled):

$ time gramine-sgx java -Xmx8G MultiThreadMain
...
real    33m27.868s
user    9m30.793s
sys     25m20.908s

With this PR:

$ time gramine-sgx java -Xmx8G MultiThreadMain
...
real    0m34.768s
user    0m18.902s
sys     0m21.610s

libos/src/bookkeep/libos_vma.c line 1456 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What's this calculation?

I think this is more readable:
size_t bitvector_size = ALIGN_UP(ALIGN_UP(vma->length, PAGE_SIZE) / PAGE_SIZE, 8) / 8;

Done.

pal/include/pal/pal.h line 1012 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please rewrap to 100 chars per line

Done.

pal/src/host/linux/pal_misc.c line 89 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please use ALIGN_UP()

Done.

pal/src/host/linux/pal_misc.c line 91 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please use ALIGN_UP()

Done.

pal/src/host/linux/pal_misc.c line 102 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I can't parse this logic. Aren't you supposed to just set the last N pages in the last byte?

Isn't something like this not enough:
leftover_pages = num_pages % 8;
if (leftover_pages)
    bitvector[num_bytes - 1] = (1 << leftover_pages) - 1;

Done

pal/src/host/linux-sgx/enclave_edmm.c line 24 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

memory-free flows -> "memory free()" flows

(memory-free sounds weird, like "flows that do not use memory")

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 36 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we rename to g_enclave_page_tracker_lock

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 197 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

initialize -> allocate

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 198 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please use ALIGN_UP

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 200 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

initialize -> allocate

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 214 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Isn't memory_size guaranteed to be page-aligned? Please add an assert on this and remove this complicated arithmetic.

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 273 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

also give the starting index ... -- stale comment?

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 277 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please use ALIGN_UP() and other macros (if needed) in this code

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 296 at r6 (raw file):

In this case, bitvector[0][bit 0] must contain g_enclave_page_tracker->data[0][bit 1], and so on -- all the bits will be with the offset, and we can't just copy-paste byte-by-byte.

Yes, exactly. Splitted it to a helper function.

pal/src/host/linux-sgx/enclave_edmm.c line 306 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please see my comment for Linux PAL

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 313 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

walk_set -> walk_set_pages

Done.

pal/src/host/linux-sgx/pal_exception.c line 289 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

In which case this can happen? What's the scenario when we are doing something in PAL (so that ADDR_IN_PAL(uc->rip) is true) and we hit this to-be-lazily-allocated page?

I think this was a leftover, as we switched to the new approach for MAP_NORESERVE checkpointing, it should not be needed anymore. Reverted.

pal/src/host/linux-sgx/pal_exception.c line 346 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

failed to allocate page -> failed to lazily allocate page

Done.

pal/src/host/linux-sgx/pal_exception.c line 350 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Now if we are inside PAL and we hit a memfault on a NOT lazily-allocated page, we'll pass through this new logic and will end up calling the LibOS upcall -- which is wrong because LibOS shouldn't try to deal with inside-PAL faults.

See my other comment -- when is it really possible that while we're in PAL, a #PF happens?

If it's possible, then we'll need to add here a case of checking if (ADDR_IN_PAL(uc->rip)) and terminating on this case.

Not relevant anymore.

pal/src/host/linux-sgx/pal_memory.c line 29 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

MAP_NORESERVE -> PAL_PROT_LAZYALLOC

Done.

pal/src/host/linux-sgx/pal_misc.c line 824 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (all comments from Linux PAL)

Done.

llly

Reviewable status: 20 of 29 files reviewed, 25 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)

pal/src/host/linux-sgx/enclave_edmm.c line 144 at r7 (raw file):

        return ret;

    set_enclave_addr_range((uintptr_t)addr, count);

casting (uintptr_t) not needed.

pal/src/host/linux-sgx/enclave_edmm.c line 154 at r7 (raw file):

        return ret;

    unset_enclave_addr_range((uintptr_t)addr, count);

ditto.

pal/src/host/linux-sgx/pal_files.c line 279 at r7 (raw file):

        /* Enclave pages will be written to below, so we must add W permission. */
        uint64_t prot_flags = PAL_TO_SGX_PROT(prot | PAL_PROT_WRITE);
        int ret = add_uncommitted_pages((uintptr_t)addr, size / PAGE_SIZE, prot_flags);

int is not needed.

dimakuv

Reviewed 3 of 18 files at r1, 1 of 4 files at r4, 1 of 5 files at r5, 7 of 7 files at r7, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

Please run some benchmarks and report results.

I tested our openjdk example.

Without this PR (release build, commit b984089, EDMM enabled):
$ time gramine-sgx java -Xmx8G MultiThreadMain
...
real    33m27.868s
user    9m30.793s
sys     25m20.908s
With this PR:
$ time gramine-sgx java -Xmx8G MultiThreadMain
...
real    0m34.768s
user    0m18.902s
sys     0m21.610s

Wow, this is a great boost. The PR clearly has value. Thanks for these numbers.

libos/test/regression/large_mmap.c line 2 at r7 (raw file):

#define _GNU_SOURCE
#include <err.h>

Sorry, could you instead introduce a new test file?

I initially thought it would be a good idea to put MAP_NORESERVE in the already-existing file, but now I see that you modified this test significantly, and there is no benefit in cramming two independent tests in a single file...

Let's just revert the changes here, and instead will create a separate test. Otherwise it's very hard to review.

pal/src/pal_memory.c line 63 at r7 (raw file):

static int (*g_mem_bkeep_alloc_upcall)(size_t size, uintptr_t* out_addr) = NULL;
static int (*g_mem_bkeep_free_upcall)(uintptr_t addr, size_t size) = NULL;
int (*g_mem_bkeep_get_vma_info_upcall)(uintptr_t addr, pal_prot_flags_t* out_prot_flags) = NULL;

Why is it not static like the others?

pal/src/host/linux-sgx/enclave_edmm.c line 283 at r7 (raw file):

        unsigned char val_cur = src_bitvector[0];
        for (size_t i = 0; i < num_bytes; i++) {
            unsigned char val_next = src_bitvector[i + 1];

You have a buffer overread here.

You need to have i < num_bytes - 1 in the for loop condition then, and take special treatment of the very last byte in the source bitvector.

pal/src/host/linux-sgx/enclave_edmm.c line 285 at r7 (raw file):

            unsigned char val_next = src_bitvector[i + 1];
            dest_bitvector[i] = ((val_next & (0xFF >> (8 - offset))) << (8 - offset)) |
                                ((val_cur & (0xFF << offset)) >> offset);

Ok, I finally understand what happens here :)

pal/src/host/linux-sgx/enclave_edmm.c line 313 at r7 (raw file):

    size_t leftover_pages = num_pages % 8;
    if (leftover_pages)
        bitvector[num_bytes - 1] &= (1 << leftover_pages) - 1;

This logic is wrong? Here you unconditionally set bits to 1 for all leftover pages. But you actually should get the information (whether to set or unset each bit) from the g_enclave_page_tracker->data[] array.

Or am I misunderstanding something?

kailun-qin

Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

pal/src/pal_memory.c line 63 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why is it not static like the others?

Because we need to invoke this call in the exception handling code:

gramine/pal/src/host/linux-sgx/pal_exception.c

Line 338 in 2c13710

ret = g_mem_bkeep_get_vma_info_upcall(addr, &prot_flags);

?

pal/src/host/linux-sgx/enclave_edmm.c line 283 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You have a buffer overread here.

You need to have i < num_bytes - 1 in the for loop condition then, and take special treatment of the very last byte in the source bitvector.

hmm... But there is no lengh of buffer specified (so no sense of overreading). Actually, when offset != 0, the length of src_bitvector would actually be larger thandest_bitvector. As num_bytes is the number of bytes to copy from src_bitvector so we will need to copy (partial) bits in src_bitvector[num_bytes] into dest[num_bytes - 1] (which is exactly the "special treatment of the very last byte in the source bitvector").

pal/src/host/linux-sgx/enclave_edmm.c line 313 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This logic is wrong? Here you unconditionally set bits to 1 for all leftover pages. But you actually should get the information (whether to set or unset each bit) from the g_enclave_page_tracker->data[] array.

Or am I misunderstanding something?

But this is &= not =, i.e., in the last byte of the dest/copied bitvector, we only keep the bits for the leftover pages?

kailun-qin

Reviewable status: 23 of 34 files reviewed, 8 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

pal/src/host/linux-sgx/enclave_edmm.c line 144 at r7 (raw file):

Previously, llly (Li Xun) wrote…

casting (uintptr_t) not needed.

Done.

pal/src/host/linux-sgx/enclave_edmm.c line 154 at r7 (raw file):

Previously, llly (Li Xun) wrote…

ditto.

Done.

pal/src/host/linux-sgx/pal_files.c line 279 at r7 (raw file):

Previously, llly (Li Xun) wrote…

int is not needed.

Done.

libos/test/regression/large_mmap.c line 2 at r7 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Sorry, could you instead introduce a new test file?

I initially thought it would be a good idea to put MAP_NORESERVE in the already-existing file, but now I see that you modified this test significantly, and there is no benefit in cramming two independent tests in a single file...

Let's just revert the changes here, and instead will create a separate test. Otherwise it's very hard to review.

Done.

llly

Reviewed 2 of 12 files at r2, 6 of 18 files at r3, 4 of 9 files at r6, 5 of 7 files at r7, 11 of 11 files at r8, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)

dimakuv

Reviewed 11 of 11 files at r8, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

libos/test/regression/mmap_map_noreserve.c line 6 at r8 (raw file):

 */

/* Test for creating and accessing anonymous mappings with `MAP_NORESERVE`. */

Can you add an explanation why this test is required? We should say that this test works on both EDMM and non-EDMM platforms, but if EDMM (and EXINFO) is enabled, then the mmap doesn't really commit the enclave pages, but instead they are lazily committed on the first access. This test also stresses the lazy-allocation logic on fork (again, only the actually-accessed enclave pages will be copied to the child enclave).

So on EDMM-enabled platforms, this test is supposed to be significantly faster than on non-EDMM-enabled platforms. But functionality-wise it will be the same.

Also will be ideal if you'll specify approximate timings on e.g. an ICX machine.

libos/test/regression/mmap_map_noreserve.c line 21 at r8 (raw file):

#define TEST_LENGTH  0x10000f000
#define TEST_LENGTH2 0x800f000

Why such weird lengths?

pal/src/pal_memory.c line 63 at r7 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Because we need to invoke this call in the exception handling code:

gramine/pal/src/host/linux-sgx/pal_exception.c

Line 338 in 2c13710

ret = g_mem_bkeep_get_vma_info_upcall(addr, &prot_flags);

?

Ok

pal/src/host/linux-sgx/enclave_edmm.c line 283 at r7 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

hmm... But there is no lengh of buffer specified (so no sense of overreading). Actually, when offset != 0, the length of src_bitvector would actually be larger thandest_bitvector. As num_bytes is the number of bytes to copy from src_bitvector so we will need to copy (partial) bits in src_bitvector[num_bytes] into dest[num_bytes - 1] (which is exactly the "special treatment of the very last byte in the source bitvector").

Ok, got it. Quite convoluted, but I have no ideas how to make it more visible.

pal/src/host/linux-sgx/enclave_edmm.c line 313 at r7 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

But this is &= not =, i.e., in the last byte of the dest/copied bitvector, we only keep the bits for the leftover pages?

Yes, I missed the &= assignment. Understood now.

dimakuv

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):
Jenkins-SGX-Sanitizers pipeline failed, please check what's wrong and fix.

kailun-qin

Reviewable status: 32 of 34 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @llly)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Jenkins-SGX-Sanitizers pipeline failed, please check what's wrong and fix.

I reduced the enclave size (as we're testing forking large enclaves now, which we commented out in the large_mmap test). Let's give it another try.

libos/test/regression/mmap_map_noreserve.c line 6 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you add an explanation why this test is required? We should say that this test works on both EDMM and non-EDMM platforms, but if EDMM (and EXINFO) is enabled, then the mmap doesn't really commit the enclave pages, but instead they are lazily committed on the first access. This test also stresses the lazy-allocation logic on fork (again, only the actually-accessed enclave pages will be copied to the child enclave).

So on EDMM-enabled platforms, this test is supposed to be significantly faster than on non-EDMM-enabled platforms. But functionality-wise it will be the same.

Also will be ideal if you'll specify approximate timings on e.g. an ICX machine.

Done, also added some approximate timings on an ICX machine.

libos/test/regression/mmap_map_noreserve.c line 21 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why such weird lengths?

It was a legacy from the large_mmap test. Updated now.

pal/src/host/linux-sgx/enclave_edmm.c line 283 at r7 (raw file):

Quite convoluted

Yeah, I also dislike it.

dimakuv

Reviewed 2 of 2 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):

Previously, kailun-qin (Kailun Qin) wrote…

I reduced the enclave size (as we're testing forking large enclaves now, which we commented out in the large_mmap test). Let's give it another try.

Why do you think this may be the problem?

kailun-qin

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Why do you think this may be the problem?

Well, from the error log of the Jenkins-SGX-Sanitizers pipeline:

...
[2023-09-12T15:20:34.288Z] =================================== FAILURES ===================================
[2023-09-12T15:20:34.288Z] __________________ TC_30_Syscall.test_05B_mmap_map_noreserve ___________________
[2023-09-12T15:20:34.288Z] test_libos.py:856: in test_05B_mmap_map_noreserve
...
[2023-09-12T15:20:34.288Z] E   subprocess.CalledProcessError: Command '['/home/jenkins/workspace/graphene-sgx-sanitizers/usr/lib/x86_64-linux-gnu/gramine/sgx/loader', '/home/jenkins/workspace/graphene-sgx-sanitizers/usr/lib/x86_64-linux-gnu/gramine/sgx/libpal.so', 'init', 'mmap_map_noreserve']' died with <Signals.SIGKILL: 9>.
[2023-09-12T15:20:34.288Z] ----------------------------- Captured stderr call -----------------------------
[2023-09-12T15:20:34.288Z] [0.006] Gramine is starting. Parsing TOML manifest file, this may take some time...
...
[2023-09-12T15:20:34.288Z] [57.402] error: Failed to initialize child process: Broken pipe (PAL_ERROR_CONNFAILED_PIPE)

it seemed to me that the failure was caused by the OOM killer. (But w/o access to the machine, I can not confirm it.)

Now that the Jenkins-SGX-Sanitizers pipeline passes, and I also could not reproduce it w/ the same config (using SGX sanitizer + 8GB enclave config) on my local setup, let me know if you still want a double check on that CI machine.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin

Reviewable status: 40 of 41 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)

pal/src/host/linux-sgx/pal_exception.c line 465 at r37 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Yes I agree. I was thinking that the behavior can be arch-specific -- since we're in SGX EDMM PAL which should allow things like XOM (execute-only-memory), where users only set PROT_EXEC but read on it would get segfaults. But I'm also fine w/ following Linux. Updated as suggested.

Ok, after quick talk w/ @dimakuv, I keep it as is and add a comment. See if it makes sense. Thanks!

dimakuv

Reviewed 1 of 1 files at r41, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)

pal/src/host/linux-sgx/pal_exception.c line 465 at r37 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Ok, after quick talk w/ @dimakuv, I keep it as is and add a comment. See if it makes sense. Thanks!

Yep, I agree that in the SGX world, we better keep the SGX semantics.

mwkmwkmwk

Reviewed 1 of 1 files at r39, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)

mkow

Reviewed 1 of 3 files at r38, 1 of 1 files at r39, 1 of 1 files at r41.
Reviewable status: 40 of 41 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @kailun-qin)

pal/src/host/linux-sgx/pal_exception.c line 465 at r37 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yep, I agree that in the SGX world, we better keep the SGX semantics.

Actually, what is inside ctx.err in such a case? Now you only check for the lack of flags, not for presence of some. And the code assumes it's not 0, because there's such an assert right before the if.

Btw. could you add a XOM case to the tests? (whether reading triggers an exception and the signal handler is called). Would need to be gated behind some check for PKEYs or something like that, I'm not sure how Linux communicates whether this feature is present or not.

pal/src/host/linux-sgx/pal_exception.c line 470 at r41 (raw file):

                 * eXecute-Only-Memory (XOM) (specified with `PROT_EXEC` alone). Note that on Linux,
                 * `PROT_READ` is not required to be set when `PROT_WRITE` or `PROT_EXEC` are set.
                 * But since we're in SGX EDMM PAL, XOM should be allowed -- reading it would cause

But? This sentence is in line with the previous one? Also, "allowed" is a wrong word here IMO, here we aren't allowing or disallowing it, only propagating the exception or not.

Code quote:

But

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

dimakuv

Reviewed 1 of 1 files at r44, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

pal/src/host/linux-sgx/pal_exception.c line 465 at r37 (raw file):

Actually, what is inside ctx.err in such a case?

I don't understand the confusion. ctx.err has the same info as classic #PF error code, see https://wiki.osdev.org/Paging#Handling.

And the code assumes it's not 0, because there's such an assert right before the if.

SGX is a ring-3 technology, so hardware exceptions reported inside an SGX enclave always have at least bit 2 (Userspace/Kernelspace) set to 1. Thus ctx.err can never be 0 if the HW exception was raised by the SGX enclave.

Btw. could you add a XOM case to the tests?

We could try, but if it's too cumbersome to implement, I would just ignore this.

pal/src/host/linux-sgx/pal_exception.c line 470 at r44 (raw file):

             * perspective), the host kernel not only resolves the enclave page (brings it back into
             * EPC) but also delivers it to Gramine, ending up in this code path. */
            goto out;

Intel validation team noticed this intermittent bug on client machines (only). @kailun-qin tried to fix it yesterday, and I debugged it today.

I hope the message explains the gist, but I'll leave a couple comments.

The bug happened in a nested-exception way: while the first-#PF-exception code in here called commit_lazy_alloc_pages(), it went to the EACCEPT instruction, and this instruction is special in that it raises a #PF on a non-present page (even though the host kernel will resolve it by swapping the enclave page in and updating the PTEs and the EPCM entries).

See https://www.felixcloutier.com/x86/eaccept, the first THEN #PF(DS:RCX); FI; case.

To recall the PF handling, consult this: https://wiki.osdev.org/Paging#Handling

This was a data race between several threads (they put pressure on the EPC on a client machine, forcing the ksgxd daemon to start swapping some enclave pages out, leading to the bug). So it wasn't possible to debug it via GDB, as everything got slower and didn't overlap in time. Thus I had to reconstruct the context + stack trace manually, here's a dump of the code I used:

        pal_prot_flags_t prot_flags;

        if ((addr & 0xFFF) == 0xFFF) {
                log_always("--- addr = %p", (void*)addr);
                log_always("--- trusted_exit_info = %u", trusted_exit_info_);
                log_always("--- rip  = %p", (void*)uc->rip);
                log_always("--- rax  = %p", (void*)uc->rax);
                log_always("--- rbx  = %p", (void*)uc->rbx);
                log_always("--- rcx  = %p", (void*)uc->rcx);
                log_always("--- rsp  = %p", (void*)uc->rsp);
                log_always("--- rbp  = %p", (void*)uc->rbp);
                log_always("--- ret1 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret2 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret3 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret4 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret5 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret6 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret7 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret8 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret9 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret10 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                log_always("--- err  = %lu", (uint64_t)ctx.err);
                log_always("--- cr2  = %lu", (uint64_t)ctx.cr2);
                log_always("--- event_num  = %lu", (uint64_t)event_num);

                BUG();
        }

        if (g_mem_bkeep_get_vma_info_upcall(addr, &prot_flags) == 0) {
            prot_flags &= ~PAL_PROT_LAZYALLOC;

Note that the bug always reports 0xFFF-ending enclave page address (this seems to be some weirdness of EACCEPT behavior), that's why we have such a condition to catch it.

When printing it, it corresponded to the following stack trace:

enclu
sgx_eaccept
sgx_edmm_add_pages_callback
walk_pages
commit_lazy_alloc_pages
_PalExceptionHandler

Reverts the previous fixup commit (incorrect attempt at fixing the bug by Kailun). And adds the correct fix to the intermittent bug observed on client platforms. Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>

dimakuv

Reviewed 1 of 1 files at r45, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

pal/src/host/linux-sgx/pal_exception.c line 470 at r44 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Intel validation team noticed this intermittent bug on client machines (only). @kailun-qin tried to fix it yesterday, and I debugged it today.

I hope the message explains the gist, but I'll leave a couple comments.

The bug happened in a nested-exception way: while the first-#PF-exception code in here called commit_lazy_alloc_pages(), it went to the EACCEPT instruction, and this instruction is special in that it raises a #PF on a non-present page (even though the host kernel will resolve it by swapping the enclave page in and updating the PTEs and the EPCM entries).

See https://www.felixcloutier.com/x86/eaccept, the first THEN #PF(DS:RCX); FI; case.

To recall the PF handling, consult this: https://wiki.osdev.org/Paging#Handling

This was a data race between several threads (they put pressure on the EPC on a client machine, forcing the ksgxd daemon to start swapping some enclave pages out, leading to the bug). So it wasn't possible to debug it via GDB, as everything got slower and didn't overlap in time. Thus I had to reconstruct the context + stack trace manually, here's a dump of the code I used:
        pal_prot_flags_t prot_flags;

        if ((addr & 0xFFF) == 0xFFF) {
                log_always("--- addr = %p", (void*)addr);
                log_always("--- trusted_exit_info = %u", trusted_exit_info_);
                log_always("--- rip  = %p", (void*)uc->rip);
                log_always("--- rax  = %p", (void*)uc->rax);
                log_always("--- rbx  = %p", (void*)uc->rbx);
                log_always("--- rcx  = %p", (void*)uc->rcx);
                log_always("--- rsp  = %p", (void*)uc->rsp);
                log_always("--- rbp  = %p", (void*)uc->rbp);
                log_always("--- ret1 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret2 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret3 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret4 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret5 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret6 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret7 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret8 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret9 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                uc->rbp = *((uint64_t*)uc->rbp);
                log_always("--- ret10 = 0x%lx", *((uint64_t*)uc->rbp + 1));
                log_always("--- err  = %lu", (uint64_t)ctx.err);
                log_always("--- cr2  = %lu", (uint64_t)ctx.cr2);
                log_always("--- event_num  = %lu", (uint64_t)event_num);

                BUG();
        }

        if (g_mem_bkeep_get_vma_info_upcall(addr, &prot_flags) == 0) {
            prot_flags &= ~PAL_PROT_LAZYALLOC;
Note that the bug always reports 0xFFF-ending enclave page address (this seems to be some weirdness of EACCEPT behavior), that's why we have such a condition to catch it.

When printing it, it corresponded to the following stack trace:
enclu
sgx_eaccept
sgx_edmm_add_pages_callback
walk_pages
commit_lazy_alloc_pages
_PalExceptionHandler

Btw, another function that is special in this way (raises #PF that is resolved by host kernel but is still delivered to the Gramine SGX enclave) is EACCEPTCOPY. But we don't use that one (yet), so I didn't add it. I can though, if reviewers think we should safe us future debugs when we start using EACCEPTCOPY.

pal/src/host/linux-sgx/pal_exception.c line 241 at r45 (raw file):

    /* instruction must be ENCLU and leaf must be EACCEPT */
    uint8_t* instr = (uint8_t*)uc->rip;
    if (instr[0] == 0x0f && instr[1] == 0x01 && instr[2] == 0xd7 && uc->rax == EACCEPT)

See https://www.felixcloutier.com/x86/enclu for the opcode.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

dimakuv

Reviewed 3 of 3 files at r46, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

dimakuv

Reviewed 3 of 3 files at r47, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

llly

Reviewed 5 of 14 files at r24, 3 of 4 files at r30, 3 of 3 files at r31, 1 of 1 files at r32.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

libos/src/bookkeep/libos_vma.c line 1165 at r47 (raw file):

static int pal_mem_bkeep_get_vma_info(uintptr_t addr, pal_prot_flags_t* out_prot_flags) {
    struct libos_vma_info vma_info;
    int ret = lookup_vma((void*)addr, &vma_info);

           if (vma_info.file) {
                put_handle(vma_info.file);
            }

is needed because get_handle is called in dump_vma.

dimakuv

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin, @llly, and @mkow)

libos/src/bookkeep/libos_vma.c line 1165 at r47 (raw file):

Previously, llly (Li Xun) wrote…

           if (vma_info.file) {
                put_handle(vma_info.file);
            }

is needed because get_handle is called in dump_vma.

+1. See:

gramine/libos/src/bookkeep/libos_vma.c

Lines 1166 to 1167 in a9f7d1c

    
           if (vma_info->file) { 
        
               get_handle(vma_info->file);

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin

Reviewable status: 43 of 44 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @llly, @mkow, and @mwkmwkmwk)

libos/src/bookkeep/libos_vma.c line 1165 at r47 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1. See:

gramine/libos/src/bookkeep/libos_vma.c

Lines 1166 to 1167 in a9f7d1c

if (vma_info->file) {

get_handle(vma_info->file);

Good catch! Done.

dimakuv

Reviewed 1 of 1 files at r48, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin, @llly, and @mkow)

llly

Reviewed 1 of 6 files at r25, 2 of 18 files at r33, 1 of 12 files at r34, 4 of 8 files at r36, 19 of 24 files at r37, 1 of 3 files at r38, 1 of 1 files at r45, 3 of 3 files at r46, 3 of 3 files at r47, 1 of 1 files at r48, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

dimakuv

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

libos/src/bookkeep/libos_vma.c line 1353 at r48 (raw file):

    uintptr_t end = MIN(ctx->end, vma->end);
    if (PalFreeThenLazyReallocCommittedPages((void*)start, end - start) < 0)
        return false;

Why don't we set ctx->error here?

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin

Reviewable status: 43 of 44 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @llly, and @mkow)

libos/src/bookkeep/libos_vma.c line 1353 at r48 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why don't we set ctx->error here?

Done.

dimakuv

Reviewed 1 of 1 files at r49, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin and @mkow)

dimakuv requested changes Aug 28, 2023

View reviewed changes

kailun-qin commented Aug 28, 2023

View reviewed changes

llly reviewed Aug 29, 2023

View reviewed changes

dimakuv requested changes Aug 30, 2023

View reviewed changes

dimakuv requested changes Aug 31, 2023

View reviewed changes

dimakuv mentioned this pull request Aug 31, 2023

[Pal/Linux-SGX] EDMM hybrid allocation #1262

Open

kailun-qin commented Sep 1, 2023

View reviewed changes

kailun-qin force-pushed the kailun-qin/edmm-lazy-alloc branch 5 times, most recently from a2f3548 to b0fc282 Compare September 2, 2023 14:07

dimakuv requested changes Sep 3, 2023

View reviewed changes

kailun-qin commented Sep 4, 2023

View reviewed changes

dimakuv requested changes Sep 11, 2023

View reviewed changes

kailun-qin commented Sep 11, 2023

View reviewed changes

dimakuv requested changes Sep 11, 2023

View reviewed changes

kailun-qin commented Sep 12, 2023

View reviewed changes

llly reviewed Sep 12, 2023

View reviewed changes

dimakuv requested changes Sep 12, 2023

View reviewed changes

kailun-qin commented Sep 12, 2023

View reviewed changes

llly reviewed Sep 13, 2023

View reviewed changes

dimakuv requested changes Sep 13, 2023

View reviewed changes

kailun-qin commented Sep 13, 2023

View reviewed changes

dimakuv requested changes Sep 13, 2023

View reviewed changes

kailun-qin commented Sep 13, 2023

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

f46e8c6

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin commented Apr 4, 2024

View reviewed changes

dimakuv previously approved these changes Apr 4, 2024

View reviewed changes

mwkmwkmwk reviewed Apr 4, 2024

View reviewed changes

kailun-qin dismissed dimakuv’s stale review via 2c51bf4 April 4, 2024 16:21

mkow requested changes Apr 4, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

3412b41

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin force-pushed the kailun-qin/edmm-lazy-alloc branch from 2c51bf4 to 3412b41 Compare April 4, 2024 17:22

dimakuv previously approved these changes Apr 5, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

838854d

Reverts the previous fixup commit (incorrect attempt at fixing the bug by Kailun). And adds the correct fix to the intermittent bug observed on client platforms. Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>

dimakuv dismissed their stale review via 838854d April 5, 2024 11:37

dimakuv force-pushed the kailun-qin/edmm-lazy-alloc branch from d2afc4f to 838854d Compare April 5, 2024 11:37

dimakuv previously approved these changes Apr 5, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

f889e7b

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin dismissed dimakuv’s stale review via f889e7b April 8, 2024 10:23

dimakuv previously approved these changes Apr 8, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

62bfa35

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin dismissed dimakuv’s stale review via 62bfa35 April 9, 2024 01:01

dimakuv approved these changes Apr 9, 2024

View reviewed changes

llly reviewed Apr 11, 2024

View reviewed changes

dimakuv requested changes Apr 11, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

32f5f71

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin commented Apr 12, 2024

View reviewed changes

dimakuv approved these changes Apr 12, 2024

View reviewed changes

llly reviewed Apr 16, 2024

View reviewed changes

dimakuv requested changes Apr 18, 2024

View reviewed changes

fixup! [LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support

4b69d5c

Signed-off-by: Kailun Qin <kailun.qin@intel.com>

kailun-qin commented Apr 18, 2024

View reviewed changes

dimakuv approved these changes Apr 19, 2024

View reviewed changes

mkow mentioned this pull request May 4, 2024

Error only for the openjdk demo: "Enclave creation IOCTL failed with Input/output error (EIO)." gramineproject/examples#100

Closed

[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support #1513

Are you sure you want to change the base?

[LibOS,PAL/Linux-SGX] Add EDMM lazy allocation support #1513

Conversation

kailun-qin commented Aug 28, 2023 • edited

Description of the changes

How to test this PR?

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

llly left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

llly left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

llly left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

mwkmwkmwk left a comment

Choose a reason for hiding this comment

mkow left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

llly left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

kailun-qin left a comment

Choose a reason for hiding this comment

dimakuv left a comment

Choose a reason for hiding this comment

llly left a comment

kailun-qin commented Aug 28, 2023 •

edited