[WIP] [LibOS] Add support for timerfd system calls #1734
base: master
0ba3491 to 617d0aa
This commit adds support for system calls that create and operate on a timer that delivers timer expiration notifications via a file descriptor, specifically: `timerfd_create()`, `timerfd_settime()` and `timerfd_gettime()`. The timerfd object is associated with a dummy eventfd created on the host to trigger notifications (e.g., in epoll). The object is created inside Gramine, with all its operations resolved entirely inside Gramine.
The emulation is currently implemented at the level of a single process. However, it may sometimes work for multi-process applications, e.g., if the child process inherits the timerfd object but doesn't use it; to support these cases, we introduce the `sys.experimental__allow_timerfd_fork` manifest option.
LibOS regression tests are also added.
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
617d0aa to 56310a1
This commit is extracted from gramineproject#1736. Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Update and fixup based on gramineproject#1728. Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Reviewed 27 of 33 files at r1, 9 of 11 files at r2, all commit messages.
Reviewable status: 36 of 39 files reviewed, 24 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " and "WIP" found in commit messages' one-liners (waiting on @kailun-qin)
a discussion (no related file):
I did a partial review. Looks fine generally. Closely related to #1728
libos/include/libos_fs.h
line 187 at r2 (raw file):
/* Verify a single handle after poll. Must update `pal_ret_events` in-place with only allowed * ones. Used in e.g. secure timerfd FS. */ void (*post_poll)(struct libos_handle* hdl, pal_wait_flags_t* pal_ret_events);
This is a shared change as in #1728.
libos/include/libos_handle.h
line 139 at r2 (raw file):
struct libos_timerfd_handle { spinlock_t expiration_lock; /* protecting below fields */
TODO(myself): understand why we need two locks.
libos/include/libos_handle.h
line 234 at r2 (raw file):
* `libos_inode.lock`. Must be used *only* via lock_pos_handle() and unlock_pos_handle(); these * functions make sure that the lock is acquired only on those handle types that can change the * position (e.g. not on eventfds or pipes). */
This change is taken from here: #1736. Blocking as a prerequisite PR.
libos/include/linux_abi/timerfd.h
line 8 at r2 (raw file):
#pragma once /* Types and structures used by various Linux ABIs (e.g. syscalls). */
Please modify or remove this comment.
libos/include/linux_abi/timerfd.h
line 11 at r2 (raw file):
/* These need to be binary-identical with the ones used by Linux. */ #include <linux/timerfd.h>
Wait, but the point of this header is to remove Linux-host ones. Please copy only the relevant bits from that header into this file.
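For illustration, a minimal sketch of what "only the relevant bits" could look like. Which definitions are actually needed is an assumption here; the values are the x86-64 Linux ones, where `TFD_CLOEXEC` and `TFD_NONBLOCK` alias `O_CLOEXEC` and `O_NONBLOCK`.

```c
#include <assert.h>

/* Flags for timerfd_create(); on x86-64 these alias O_CLOEXEC and O_NONBLOCK. */
#define TFD_CLOEXEC  02000000
#define TFD_NONBLOCK 04000

/* Flags for timerfd_settime(). */
#define TFD_TIMER_ABSTIME       (1 << 0)
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)

/* Cheap compile-time sanity check: the two settime flags must be distinct bits. */
static_assert((TFD_TIMER_ABSTIME & TFD_TIMER_CANCEL_ON_SET) == 0,
              "timerfd_settime() flags must not overlap");
```

With the definitions copied like this, the header no longer needs `#include <linux/timerfd.h>` from the host.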
libos/src/libos_async.c
line 27 at r2 (raw file):
void* arg; PAL_HANDLE object; /* handle (async IO) to wait on */ PAL_HANDLE timer_object; /* handle to identify timer object; currently used for timerfd */
Why not re-use `object`?
Alternatively, please rename `object` to something more specific like `async_io_object`.
libos/src/meson.build
line 47 at r2 (raw file):
'fs/sys/node_info.c', 'fs/tmpfs/fs.c', 'fs/timerfd/fs.c',
Doesn't look sorted
libos/src/bookkeep/libos_handle.c
line 34 at r2 (raw file):
#define INIT_HANDLE_MAP_SIZE 32 static void lock_unlock_pos_handle(struct libos_handle* hdl, bool is_lock) {
This whole file is taken from #1736.
Blocking as a prerequisite PR.
libos/src/fs/timerfd/fs.c
line 1 at r2 (raw file):
/* SPDX-License-Identifier: LGPL-3.0-or-later */
TODO(myself): carefully review this file.
libos/src/fs/timerfd/fs.c
line 83 at r2 (raw file):
if (*pal_ret_events & (PAL_WAIT_ERROR | PAL_WAIT_HANG_UP | PAL_WAIT_WRITE)) { /* impossible: we control eventfd inside the LibOS, and we never raise such conditions */
`eventfd` -> `timerfd`
libos/src/sys/libos_timerfd.c
line 1 at r2 (raw file):
/* SPDX-License-Identifier: LGPL-3.0-or-later */
TODO(myself): carefully review this file.
libos/src/sys/libos_timerfd.c
line 25 at r2 (raw file):
* polling mechanisms (select/poll/epoll): * * a. Malicious host may inject the notification too early: POLLIN when nothing was written
`nothing was written` sounds wrong; you should use something like `no timer expired yet`.
libos/src/sys/libos_timerfd.c
line 26 at r2 (raw file):
* * a. Malicious host may inject the notification too early: POLLIN when nothing was written * yet. This may lead to a synchronization failure of the app. To prevent this, eventfd
`eventfd` -> `timerfd`
libos/src/sys/libos_timerfd.c
line 27 at r2 (raw file):
* a. Malicious host may inject the notification too early: POLLIN when nothing was written * yet. This may lead to a synchronization failure of the app. To prevent this, eventfd * implements a callback `post_poll()` where it verifies that some data was indeed written (i.e.,
ditto (not written but timer expired)
libos/src/sys/libos_timerfd.c
line 32 at r2 (raw file):
* This is a Denial of Service (DoS), which we don't care about. * c. Malicious host may inject POLLERR, POLLHUP, POLLRDHUP, POLLNVAL, POLLOUT. This is impossible * as we control eventfd objects inside the LibOS, and we never raise such conditions. So the
timerfd objects
libos/src/sys/libos_timerfd.c
line 112 at r2 (raw file):
if (clockid != CLOCK_REALTIME) { if (FIRST_TIME()) { log_warning("Unsupported clockid; replaced by the system-wide real-time clock.");
Please add `...in timerfd_create()`.
libos/src/sys/libos_timerfd.c
line 138 at r2 (raw file):
spinlock_lock(&hdl->info.timerfd.expiration_lock); if (hdl->info.timerfd.num_expirations < UINT64_MAX) {
What happens in the Linux kernel if the number of expirations overflows?
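For reference, the quoted check reads like a saturating counter. Below is a minimal sketch of that pattern, assuming the guarded statement is a plain increment (the body is not shown in the quote). `struct expiration_counter`, `record_expiration()` and `consume_expirations()` are hypothetical names, and locking is omitted. This only illustrates saturation; it says nothing about what the Linux kernel does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the expiration counter kept in the timerfd handle. */
struct expiration_counter {
    uint64_t num_expirations;
};

/* Record one expiration; the counter sticks at UINT64_MAX instead of wrapping to 0. */
static void record_expiration(struct expiration_counter* c) {
    assert(c);
    if (c->num_expirations < UINT64_MAX)
        c->num_expirations++;
}

/* What a read() on the timerfd reports: the count since the last read, which is then reset. */
static uint64_t consume_expirations(struct expiration_counter* c) {
    assert(c);
    uint64_t n = c->num_expirations;
    c->num_expirations = 0;
    return n;
}
```

With saturation, a reader that falls far behind sees `UINT64_MAX` rather than a small wrapped-around value.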
libos/src/sys/libos_timerfd.c
line 152 at r2 (raw file):
static void callback_itimer(IDTYPE caller, void* arg) { // XXX: Can we simplify this code or streamline with the other callback?
What's this comment?
libos/src/sys/libos_timerfd.c
line 198 at r2 (raw file):
/* NOTE: cancelable timer (for the case where reads on timerfd would return `ECANCELED` when the * real-time clock undergoes a discontinuous change) is currently unsupported; needs to be * specified along with `TFD_TIMER_ABSTIME`. */
But why not? Gramine doesn't implement `clock_settime()`, so this flag always becomes a no-op. So it looks like it's benign to allow this flag.
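A rough sketch of what accepting the flag as a no-op could look like. `check_settime_flags()` is a hypothetical helper, not a function from this PR; the only behavior assumed is that unknown flag bits are rejected with `EINVAL`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/timerfd.h>

/* Older libc headers may lack this flag; the value matches the Linux ABI. */
#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)
#endif

/* Hypothetical helper: validate the `flags` argument of timerfd_settime(). */
static int check_settime_flags(int flags) {
    const int known = TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET;
    if (flags & ~known)
        return -EINVAL; /* unknown bits are rejected */

    /* TFD_TIMER_CANCEL_ON_SET is accepted but otherwise ignored: without clock_settime(),
     * the real-time clock never changes discontinuously, so nothing can cancel the timer. */
    return 0;
}
```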
libos/test/ltp/ltp.cfg
line 2446 at r2 (raw file):
skip = yes # cancelable timer (set with `TFD_TIMER_CANCEL_ON_SET` flag) is unsupported
Why not supported? I think we can have a dummy support for this flag: Gramine doesn't allow `clock_settime()` anyway, so this flag will just never be "triggered".
Or maybe you mean that Gramine doesn't support `clock_settime()`? In this case, modify the comment.
libos/test/regression/tests_musl.toml
line 126 at r2 (raw file):
"tcp_msg_peek", "timerfd", "timerfd_fork",
Looks like this line must be removed
libos/test/regression/timerfd.c
line 1 at r2 (raw file):
/* SPDX-License-Identifier: LGPL-3.0-or-later */
TODO(myself): carefully review this file.
libos/test/regression/timerfd_fork.c
line 28 at r2 (raw file):
static void set_timerfd(int fd) { struct itimerspec new_value;
You can use the struct init syntax of C:
struct itimerspec new_value = { .it_value.tv_sec = TIMEOUT_VALUE };
The other fields will be by default set to zeros.
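A self-contained illustration of the suggested initializer. `TIMEOUT_VALUE` is given a made-up value here, and `make_one_shot_timeout()` is a hypothetical wrapper used only to make the snippet checkable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define TIMEOUT_VALUE 2 /* hypothetical value; the test's actual constant is not shown here */

static struct itimerspec make_one_shot_timeout(void) {
    /* Members not named in the initializer (it_value.tv_nsec and both it_interval fields)
     * are zero-initialized, so this describes a one-shot timer. */
    struct itimerspec new_value = { .it_value.tv_sec = TIMEOUT_VALUE };
    return new_value;
}
```

This removes the need for a separate `memset()` or field-by-field assignment.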
libos/test/regression/timerfd_fork.c
line 42 at r2 (raw file):
if (pid == 0) { uint64_t expirations; /* child: wait for the timer to expire and then read the timerfd */
A more correct comment would be `child: wait on a blocking read for the timer to expire`.
Reviewable status: 36 of 39 files reviewed, 25 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " and "WIP" found in commit messages' one-liners (waiting on @kailun-qin)
libos/src/sys/libos_timerfd.c
line 233 at r2 (raw file):
if (next_value) { int64_t install_ret = install_async_event(hdl->pal_handle, next_value, absolute_time, &callback_itimer, (void*)hdl);
We share ownership of the `timerfd` LibOS handle object with the Async Helper thread here.
What happens on `close(timerfd)`? Looks like the Async Helper thread will be left with a dangling pointer to the object.
This seems to be a similar problem as #1721. If it is, please help review #1721 or suggest an alternative solution.
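To make the concern concrete, here is one common way to avoid such a dangling pointer: the helper thread takes its own reference when the timer is armed and drops it in the callback. This is a generic single-threaded sketch with hypothetical names (`struct handle`, `handle_get()`, `handle_put()`, `arm_timer()`), not Gramine's handle API and not necessarily the approach taken in #1721; a real implementation would need an atomic reference count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified handle; not Gramine's `struct libos_handle`. */
struct handle {
    int refcount;    /* a real implementation would use an atomic counter */
    int expirations;
};

static int g_destroyed; /* counts destroyed handles, only to make the sketch observable */

static struct handle* handle_new(void) {
    struct handle* h = calloc(1, sizeof(*h));
    if (h)
        h->refcount = 1; /* reference held by the file descriptor table */
    return h;
}

static void handle_get(struct handle* h) {
    h->refcount++;
}

static void handle_put(struct handle* h) {
    assert(h->refcount > 0);
    if (--h->refcount == 0) {
        g_destroyed++;
        free(h);
    }
}

/* Arming the timer hands a pointer to the helper thread, so take a reference for it. */
static void arm_timer(struct handle* h) {
    handle_get(h);
}

/* Timer callback run by the helper thread: use the handle, then drop the helper's reference. */
static void timer_callback(void* arg) {
    struct handle* h = arg;
    h->expirations++;
    handle_put(h);
}
```

In this scheme `close()` only drops the descriptor table's reference, so the object stays alive until the pending callback has run.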
Description of the changes
This commit adds support for system calls that create and operate on a timer that delivers timer expiration notifications via a file descriptor, specifically: `timerfd_create()`, `timerfd_settime()` and `timerfd_gettime()`. The timerfd object is associated with a dummy eventfd created on the host to trigger notifications (e.g., in epoll). The object is created inside Gramine, with all its operations resolved entirely inside Gramine.
The emulation is currently implemented at the level of a single process. However, it may sometimes work for multi-process applications, e.g., if the child process inherits the timerfd object but doesn't use it; to support these cases, we introduce the `sys.experimental__allow_timerfd_fork` manifest option.
LibOS regression tests are also added.
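For readers less familiar with the API, a minimal sketch of how an application typically uses these three calls on Linux. `wait_for_one_shot()` is a hypothetical helper written for this illustration and is not taken from the added regression tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arm a one-shot timer of `timeout_ms` milliseconds, block until it fires, and return the
 * number of expirations reported by read(), or -1 on error. */
static int64_t wait_for_one_shot(long timeout_ms) {
    if (timeout_ms <= 0)
        return -1; /* an all-zero it_value would disarm the timer and read() would block */

    int fd = timerfd_create(CLOCK_REALTIME, 0);
    if (fd < 0)
        return -1;

    struct itimerspec spec = {
        .it_value = { .tv_sec = timeout_ms / 1000, .tv_nsec = (timeout_ms % 1000) * 1000000L },
    };
    struct itimerspec remaining;
    uint64_t expirations = 0;
    int64_t ret = -1;

    if (timerfd_settime(fd, 0, &spec, NULL) == 0          /* relative timeout, no interval */
            && timerfd_gettime(fd, &remaining) == 0       /* time left until expiration */
            && read(fd, &expirations, sizeof(expirations)) == (ssize_t)sizeof(expirations)) {
        ret = (int64_t)expirations;
    }

    close(fd);
    return ret;
}
```

The same file descriptor can also be registered with `select()`, `poll()` or `epoll` instead of being read in a blocking fashion.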
How to test this PR?
CI + newly added regression tests. TODO: test it on some real workloads.