iv_fd_epoll: Don't call epoll_pwait2() again if it fails with -EPERM.
Commit 491daf4 ("iv_fd_epoll: Add support for epoll_pwait2().")
added support for epoll_pwait2(), with a fallback to epoll_wait() in
case epoll_pwait2() is not supported by the kernel we are running on,
which would be indicated by epoll_pwait2() returning -ENOSYS.

Some reports (e.g. axoflow/axosyslog#85,
#33 (comment))
suggest that some container technologies can cause -EPERM to be
returned for epoll_pwait2(), independently of whether or not
epoll_pwait2() is actually supported by the kernel we are running on,
and this trips us up because we don't currently handle -EPERM
gracefully, as we did not expect that we would have to do so.

Making system calls return -EPERM to indicate that they were filtered
out by a security policy framework seems somewhat dubious, especially
when considering the amount of application and user confusion generated
by system calls that are not documented as being able to fail with
-EPERM now suddenly being able to fail with -EPERM, but there is not
much we can do about this.

I would be against adding EPERM-as-ENOSYS fallbacks for every current
or future case where we handle ENOSYS, but:

1. it seems that this is the only case where this triggers;

2. upstream seems to agree that this EPERM behavior is a bug (see
   e.g. these links dug up by László Várady:
   containers/common#573,
   containers/podman#10337,
   opencontainers/runtime-spec#1087), so
   there will hopefully be no new cases of this in the future;

3. there's at least one container technology release (podman on
   CentOS 7) where this bug triggers and where the platform is
   sufficiently old to no longer be receiving updates, as pointed
   out by Balazs Scheidler, so this issue can't be fixed by users
   updating their container software.

Under these circumstances, adding a workaround on our end seems
reasonable, and this commit does so.
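
For readers who want to see the fallback in isolation, here is a minimal
standalone sketch of the idea. It is not the ivykis code: the names
pwait2_usable, pwait2_unavailable(), timespec_to_ms() and
wait_with_fallback() are hypothetical, it takes a relative timeout
directly, it assumes a Linux target where struct timespec matches the
kernel's 64-bit layout, and it treats any non-negative return as success.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag: nonzero while epoll_pwait2() is still worth trying. */
static int pwait2_usable = 1;

/*
 * Hypothetical helper: does this errno mean "stop trying epoll_pwait2()"?
 * ENOSYS: the kernel lacks the call.
 * EPERM:  reportedly returned when a container policy filters the call.
 */
static int pwait2_unavailable(int err)
{
	return err == ENOSYS || err == EPERM;
}

/* Convert a relative timeout to milliseconds for epoll_wait(), rounding up. */
static int timespec_to_ms(const struct timespec *ts)
{
	if (ts == NULL)
		return -1;		/* -1 means block indefinitely */
	if (ts->tv_sec >= INT_MAX / 1000)
		return INT_MAX;		/* clamp instead of overflowing */
	return (int)(ts->tv_sec * 1000 + (ts->tv_nsec + 999999) / 1000000);
}

static int wait_with_fallback(int epfd, struct epoll_event *events,
			      int maxevents, const struct timespec *rel)
{
#ifdef __NR_epoll_pwait2
	if (pwait2_usable) {
		/* NULL sigmask, so the signal set size argument is unused. */
		int ret = syscall(__NR_epoll_pwait2, epfd, events, maxevents,
				  rel, NULL, 0);

		if (ret >= 0 || !pwait2_unavailable(errno))
			return ret;

		/* Remember the failure so we do not retry on every call. */
		pwait2_usable = 0;
	}
#endif
	return epoll_wait(epfd, events, maxevents, timespec_to_ms(rel));
}
```

Once pwait2_usable has been cleared, later calls go straight to
epoll_wait(), which is the "don't call it again" behavior this commit
is after. The cost is millisecond rather than nanosecond timeout
resolution for the rest of the process lifetime.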

This issue was originally reported by @mstopa-splunk on GitHub.
Workaround originally by Balazs Scheidler.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
buytenh committed May 16, 2024
1 parent 6162a2a commit a98ca24
Showing 1 changed file with 10 additions and 1 deletion.
src/iv_fd_epoll.c
@@ -159,7 +159,16 @@ static int iv_fd_epoll_wait(struct iv_state *st, struct epoll_event *events,

 		ret = syscall(__NR_epoll_pwait2, epfd, events, maxevents,
 			      to_relative(st, &rel, abs), NULL);
-		if (ret == 0 || errno != ENOSYS)
+
+		/*
+		 * Some container technologies (at least podman on CentOS
+		 * 7 and docker on Debian Buster, according to reports)
+		 * cause epoll_pwait2() to return -EPERM.  It is unclear
+		 * what security benefits this provides, but we'll have to
+		 * handle this by falling back to epoll_wait() just as if
+		 * -ENOSYS had been returned.
+		 */
+		if (ret == 0 || (errno != EPERM && errno != ENOSYS))
 			return ret;
 
 		epoll_pwait2_support = 0;