[WIP] [LibOS] Add support for timerfd system calls #1734

Draft: wants to merge 3 commits into base `master`


@kailun-qin (Contributor) commented Jan 26, 2024

Description of the changes

This commit adds support for system calls that create and operate on a timer that delivers timer expiration notifications via a file descriptor, specifically: `timerfd_create()`, `timerfd_settime()` and `timerfd_gettime()`. The timerfd object is associated with a dummy eventfd created on the host to trigger notifications (e.g., in epoll). The object is created inside Gramine, with all its operations resolved entirely inside Gramine.

The emulation is currently implemented at the level of a single process. However, it may sometimes work for multi-process applications, e.g., if the child process inherits the timerfd object but doesn't use it; to support these cases, we introduce the `sys.experimental__allow_timerfd_fork` manifest option.

LibOS regression tests are also added.

How to test this PR?

CI + newly added regression tests. TODO: test it on some real workloads.



This commit adds support for system calls that create and operate on a
timer that delivers timer expiration notifications via a file
descriptor, specifically: `timerfd_create()`, `timerfd_settime()` and
`timerfd_gettime()`. The timerfd object is associated with a dummy
eventfd created on the host to trigger notifications (e.g., in epoll).
The object is created inside Gramine, with all its operations resolved
entirely inside Gramine.

The emulation is currently implemented at the level of a single process.
However, it may sometimes work for multi-process applications, e.g.,
if the child process inherits the timerfd object but doesn't use it; to
support these cases, we introduce the
`sys.experimental__allow_timerfd_fork` manifest option.

LibOS regression tests are also added.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
This commit is extracted from
gramineproject#1736.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Update and fixup based on
gramineproject#1728.

Signed-off-by: Kailun Qin <kailun.qin@intel.com>
@dimakuv (Contributor) left a comment


Reviewed 27 of 33 files at r1, 9 of 11 files at r2, all commit messages.
Reviewable status: 36 of 39 files reviewed, 24 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " and "WIP" found in commit messages' one-liners (waiting on @kailun-qin)

a discussion (no related file):
I did a partial review. Looks fine generally. Closely related to #1728.



libos/include/libos_fs.h line 187 at r2 (raw file):

    /* Verify a single handle after poll. Must update `pal_ret_events` in-place with only allowed
     * ones. Used in e.g. secure timerfd FS. */
    void (*post_poll)(struct libos_handle* hdl, pal_wait_flags_t* pal_ret_events);

This change is shared with #1728.


libos/include/libos_handle.h line 139 at r2 (raw file):

struct libos_timerfd_handle {
    spinlock_t expiration_lock; /* protecting below fields */

TODO(myself): understand why we need two locks.


libos/include/libos_handle.h line 234 at r2 (raw file):

     * `libos_inode.lock`. Must be used *only* via lock_pos_handle() and unlock_pos_handle(); these
     * functions make sure that the lock is acquired only on those handle types that can change the
     * position (e.g. not on eventfds or pipes). */

This change is taken from #1736. Blocking: #1736 is a prerequisite PR.


libos/include/linux_abi/timerfd.h line 8 at r2 (raw file):

#pragma once

/* Types and structures used by various Linux ABIs (e.g. syscalls). */

Please modify or remove this comment.


libos/include/linux_abi/timerfd.h line 11 at r2 (raw file):

/* These need to be binary-identical with the ones used by Linux. */

#include <linux/timerfd.h>

Wait, but the point of this header is to avoid depending on the Linux host headers. Please copy only the relevant bits from `<linux/timerfd.h>` into this file.
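A minimal sketch of what the self-contained header could hold, assuming Gramine only targets x86-64: the userspace-visible constants from the kernel's UAPI `<linux/timerfd.h>`, hardcoded instead of included. `TFD_CLOEXEC`/`TFD_NONBLOCK` alias `O_CLOEXEC`/`O_NONBLOCK` in Linux, so their x86-64 values are spelled out. The two masks at the bottom mirror kernel-internal helpers rather than the UAPI header, and `TFD_IOC_SET_TICKS` is left out on the assumption that the ioctl is not emulated. The existing `#pragma once` would stay at the top of the real file.

```c
/* Sketch of a self-contained libos/include/linux_abi/timerfd.h (x86-64 values).
 * These need to be binary-identical with the ones used by Linux. */

/* Flags for timerfd_settime() */
#define TFD_TIMER_ABSTIME       (1 << 0)
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)

/* Flags for timerfd_create(); same values as O_CLOEXEC and O_NONBLOCK on x86-64 */
#define TFD_CLOEXEC  02000000
#define TFD_NONBLOCK 00004000

/* Convenience masks mirroring the kernel-internal ones (not part of the UAPI) */
#define TFD_CREATE_FLAGS  (TFD_CLOEXEC | TFD_NONBLOCK)
#define TFD_SETTIME_FLAGS (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET)
```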


libos/src/libos_async.c line 27 at r2 (raw file):

    void* arg;
    PAL_HANDLE object;       /* handle (async IO) to wait on */
    PAL_HANDLE timer_object; /* handle to identify timer object; currently used for timerfd */

Why not re-use `object`?

Alternatively, please rename `object` to something more specific like `async_io_object`.


libos/src/meson.build line 47 at r2 (raw file):

    'fs/sys/node_info.c',
    'fs/tmpfs/fs.c',
    'fs/timerfd/fs.c',

Doesn't look sorted: `fs/timerfd/fs.c` should come before `fs/tmpfs/fs.c`.


libos/src/bookkeep/libos_handle.c line 34 at r2 (raw file):

#define INIT_HANDLE_MAP_SIZE 32

static void lock_unlock_pos_handle(struct libos_handle* hdl, bool is_lock) {

This whole file is taken from #1736.

Blocking as a prerequisite PR.


libos/src/fs/timerfd/fs.c line 1 at r2 (raw file):

/* SPDX-License-Identifier: LGPL-3.0-or-later */

TODO(myself): carefully review this file.


libos/src/fs/timerfd/fs.c line 83 at r2 (raw file):

    if (*pal_ret_events & (PAL_WAIT_ERROR | PAL_WAIT_HANG_UP | PAL_WAIT_WRITE)) {
        /* impossible: we control eventfd inside the LibOS, and we never raise such conditions */

eventfd -> timerfd


libos/src/sys/libos_timerfd.c line 1 at r2 (raw file):

/* SPDX-License-Identifier: LGPL-3.0-or-later */

TODO(myself): carefully review this file.


libos/src/sys/libos_timerfd.c line 25 at r2 (raw file):

 * polling mechanisms (select/poll/epoll):
 *
 * a. Malicious host may inject the notification too early: POLLIN when nothing was written

"nothing was written" sounds wrong; you should use something like "no timer expired yet".


libos/src/sys/libos_timerfd.c line 26 at r2 (raw file):

 *
 * a. Malicious host may inject the notification too early: POLLIN when nothing was written
 *    yet. This may lead to a synchronization failure of the app. To prevent this, eventfd

eventfd -> timerfd


libos/src/sys/libos_timerfd.c line 27 at r2 (raw file):

 * a. Malicious host may inject the notification too early: POLLIN when nothing was written
 *    yet. This may lead to a synchronization failure of the app. To prevent this, eventfd
 *    implements a callback `post_poll()` where it verifies that some data was indeed written (i.e.,

Ditto (not "written" but "timer expired").


libos/src/sys/libos_timerfd.c line 32 at r2 (raw file):

 *    This is a Denial of Service (DoS), which we don't care about.
 * c. Malicious host may inject POLLERR, POLLHUP, POLLRDHUP, POLLNVAL, POLLOUT. This is impossible
 *    as we control eventfd objects inside the LibOS, and we never raise such conditions. So the

eventfd objects -> timerfd objects
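To make points (a) and (c) concrete, here is a self-contained sketch of a `post_poll()`-style sanitizer. The flag values, the `pal_wait_flags_t` typedef and `struct timerfd_state` are hypothetical stand-ins for Gramine's types, and masking out impossible events is only one possible policy; the PR's code may treat that case differently (e.g., as an internal error). Point (b), the host dropping notifications, is a DoS and cannot be fixed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PAL wait flags (values are illustrative only) */
typedef uint32_t pal_wait_flags_t;
#define PAL_WAIT_READ    (1u << 0)
#define PAL_WAIT_WRITE   (1u << 1)
#define PAL_WAIT_ERROR   (1u << 2)
#define PAL_WAIT_HANG_UP (1u << 3)

/* Hypothetical minimal timerfd state; real code would read it under the lock */
struct timerfd_state {
    uint64_t num_expirations;
};

static void timerfd_post_poll(const struct timerfd_state* t, pal_wait_flags_t* pal_ret_events) {
    assert(t != NULL && pal_ret_events != NULL);

    /* (c) the LibOS never raises these conditions on a timerfd, so anything the
     * host reports here is bogus and is simply dropped in this sketch */
    *pal_ret_events &= ~(PAL_WAIT_ERROR | PAL_WAIT_HANG_UP | PAL_WAIT_WRITE);

    /* (a) trust "readable" only if a timer expiration was actually recorded
     * inside the LibOS; otherwise the host injected the notification too early */
    if ((*pal_ret_events & PAL_WAIT_READ) && t->num_expirations == 0)
        *pal_ret_events &= ~PAL_WAIT_READ;
}
```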


libos/src/sys/libos_timerfd.c line 112 at r2 (raw file):

    if (clockid != CLOCK_REALTIME) {
        if (FIRST_TIME()) {
            log_warning("Unsupported clockid; replaced by the system-wide real-time clock.");

Please append "in timerfd_create()" to this message.


libos/src/sys/libos_timerfd.c line 138 at r2 (raw file):

    spinlock_lock(&hdl->info.timerfd.expiration_lock);

    if (hdl->info.timerfd.num_expirations < UINT64_MAX) {

What happens in the Linux kernel if the number of expirations overflows?
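Whichever behavior turns out to match Linux, here is a sketch of how the existing `< UINT64_MAX` guard generalizes if a single callback ever has to account for several missed periods at once. It illustrates the saturating variant only; `add_expirations_saturating()` is a hypothetical helper built on the GCC/Clang `__builtin_add_overflow()` builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds `n` newly observed expirations to `*counter`, saturating at UINT64_MAX
 * instead of wrapping around to a small value. */
static void add_expirations_saturating(uint64_t* counter, uint64_t n) {
    assert(counter != NULL);
    uint64_t sum;
    if (__builtin_add_overflow(*counter, n, &sum))
        sum = UINT64_MAX;
    *counter = sum;
}
```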


libos/src/sys/libos_timerfd.c line 152 at r2 (raw file):

static void callback_itimer(IDTYPE caller, void* arg) {
    // XXX: Can we simplify this code or streamline with the other callback?

What's this comment?


libos/src/sys/libos_timerfd.c line 198 at r2 (raw file):

    /* NOTE: cancelable timer (for the case where reads on timerfd would return `ECANCELED` when the
     * real-time clock undergoes a discontinuous change) is currently unsupported; needs to be
     * specified along with `TFD_TIMER_ABSTIME`. */

But why not? Gramine doesn't implement `clock_settime()`, so this flag is always a no-op. It therefore looks benign to allow it.
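A sketch of what allowing the flag could look like in the `timerfd_settime()` flag check: unknown bits are still rejected with `-EINVAL`, while `TFD_TIMER_CANCEL_ON_SET` is accepted and never acted upon. `check_settime_flags()` is a hypothetical helper, and the flag values are hardcoded from the Linux ABI to keep the sketch self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Linux ABI values, hardcoded to keep the sketch self-contained */
#define TFD_TIMER_ABSTIME       (1 << 0)
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)
#define TFD_SETTIME_FLAGS       (TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET)

/* Hypothetical validation step for timerfd_settime() flags. Returns 0 or -EINVAL
 * and reports whether the new value is an absolute time. */
static int check_settime_flags(int flags, bool* out_absolute) {
    assert(out_absolute != NULL);
    if (flags & ~TFD_SETTIME_FLAGS)
        return -EINVAL;

    /* TFD_TIMER_CANCEL_ON_SET (meaningful together with TFD_TIMER_ABSTIME) only
     * fires on a discontinuous change of the real-time clock, e.g. via
     * clock_settime(). Gramine does not implement clock_settime(), so the flag
     * can be accepted and simply never triggers. */
    *out_absolute = (flags & TFD_TIMER_ABSTIME) != 0;
    return 0;
}
```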


libos/test/ltp/ltp.cfg line 2446 at r2 (raw file):

skip = yes

# cancelable timer (set with `TFD_TIMER_CANCEL_ON_SET` flag) is unsupported

Why is it not supported? I think we can have dummy support for this flag: Gramine doesn't allow `clock_settime()` anyway, so this flag will simply never be "triggered".

Or maybe you mean that Gramine doesn't support `clock_settime()`; in that case, please modify the comment.


libos/test/regression/tests_musl.toml line 126 at r2 (raw file):

  "tcp_msg_peek",
  "timerfd",
  "timerfd_fork",

Looks like this line must be removed.


libos/test/regression/timerfd.c line 1 at r2 (raw file):

/* SPDX-License-Identifier: LGPL-3.0-or-later */

TODO(myself): carefully review this file.


libos/test/regression/timerfd_fork.c line 28 at r2 (raw file):

static void set_timerfd(int fd) {
    struct itimerspec new_value;

You can use C's designated initializer syntax:

struct itimerspec new_value = { .it_value.tv_sec = TIMEOUT_VALUE };

All other fields are zero-initialized by default.


libos/test/regression/timerfd_fork.c line 42 at r2 (raw file):

    if (pid == 0) {
        uint64_t expirations;
        /* child: wait for the timer to expire and then read the timerfd */

A more accurate comment would be:

child: wait on a blocking read for the timer to expire

@dimakuv (Contributor) left a comment


Reviewable status: 36 of 39 files reviewed, 25 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " and "WIP" found in commit messages' one-liners (waiting on @kailun-qin)


libos/src/sys/libos_timerfd.c line 233 at r2 (raw file):

    if (next_value) {
        int64_t install_ret = install_async_event(hdl->pal_handle, next_value, absolute_time,
                                                  &callback_itimer, (void*)hdl);

We share ownership of the timerfd LibOS handle object with the Async Helper thread here.

What happens on `close(timerfd)`? It looks like the Async Helper thread will be left with a dangling pointer to the object.

This seems to be a similar problem as #1721. If it is, please help review #1721 or suggest an alternative solution.
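One common way to make shared ownership explicit (not necessarily what #1721 proposes) is to take an extra reference before handing the pointer to the Async Helper thread and to drop it in the callback. The sketch below uses a hypothetical refcounted `struct handle` and a one-shot event; Gramine already has `get_handle()`/`put_handle()` for handle refcounting. For periodic timers, the reference would instead be dropped when the event is finally removed, not on every firing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted LibOS handle */
struct handle {
    atomic_int refcount;
};

static int g_handles_freed; /* demo-only counter so the lifetime can be observed */

static struct handle* handle_new(void) {
    struct handle* h = malloc(sizeof(*h));
    if (h)
        atomic_init(&h->refcount, 1); /* the fd table's reference */
    return h;
}

static void handle_get(struct handle* h) {
    atomic_fetch_add_explicit(&h->refcount, 1, memory_order_relaxed);
}

static void handle_put(struct handle* h) {
    if (atomic_fetch_sub_explicit(&h->refcount, 1, memory_order_acq_rel) == 1) {
        free(h);
        g_handles_freed++;
    }
}

/* Hypothetical one-shot async event: the helper thread calls callback(arg) once */
struct async_event {
    void (*callback)(void* arg);
    void* arg;
};

static void timer_expired(void* arg) {
    struct handle* h = arg;
    /* ... record the expiration on `h` ... */
    handle_put(h); /* drop the reference owned by the async event */
}

static void install_timer_event(struct async_event* ev, struct handle* h) {
    handle_get(h); /* the async helper now co-owns `h` */
    ev->callback = timer_expired;
    ev->arg = h;
}

static void async_helper_fire(struct async_event* ev) {
    assert(ev->callback);
    ev->callback(ev->arg);
    ev->callback = NULL;
    ev->arg = NULL;
}
```

With this scheme, `close(timerfd)` only drops the fd table's reference, so the handle stays valid until the pending callback has run.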

Labels: none yet
Projects: Status: Coming in next release (v1.8)
Linked issues: none yet
2 participants