Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LibOS] Add sys.mock_syscalls = [ ... ] manifest option #1859

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Apr 24, 2024

Description of the changes

This PR adds the manifest syntax sys.disallowed_syscalls = [ ... ] sys.mock_syscalls = [ ... ] to specify system calls that will return a specified value when executed in Gramine.

How to test this PR?

Added a LibOS regression test.


This change is Reviewable

@dimakuv dimakuv changed the title [LibOS] Add sys.disallowed_syscalls = [ ... ] manifest option RFC: [LibOS] Add sys.disallowed_syscalls = [ ... ] manifest option Apr 24, 2024
@dimakuv dimakuv force-pushed the dimakuv/manifest-disable-syscalls branch from 0504ab3 to cf686d2 Compare April 24, 2024 13:14
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):
This is a patch that I was experimenting with (for reasons not relevant to this PR). I decided to create the PR out of what I have. No idea if this is useful... Any feedback welcome.



Documentation/manifest-syntax.rst line 397 at r1 (raw file):

::

    sgx.disallowed_syscalls = [

Alternatives:

  1. Provide an allowlist sys.allowed_syscalls = [ ... ]. This will be more security-expected and more in line with e.g. seccomp profiles. Downside: much harder for end users.
  2. Provide both this denylist and the allowlist manifest options. At init time, Gramine fails if detects them both.

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


Documentation/manifest-syntax.rst line 399 at r2 (raw file):

    sys.disallowed_syscalls = [
      "syscall_name",
      "syscall_name",

One additional option would be to not simply return -ENOSYS, but allow to specify what int to return:

    sys.fixed_syscalls = [
      { name = "syscall_name1", return = -38},  # returns -ENOSYS
      { name = "syscall_name2", return = 0},    # returns 0, i.e. dummy impl

Copy link
Contributor

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


Documentation/manifest-syntax.rst line 397 at r2 (raw file):

::

    sys.disallowed_syscalls = [

Should this also be read from a file similarly?


libos/src/libos_parser.c line 1653 at r2 (raw file):

unsigned long get_syscall_number(const char* name) {
    assert(LIBOS_SYSCALL_BOUND == ARRAY_SIZE(syscall_parser_table));

Shouldn't this be a static_assert beyond this function?


libos/src/libos_parser.c line 1658 at r2 (raw file):

        if (!syscall_parser_table[i].name)
            continue;
        if (strcmp(name, syscall_parser_table[i].name) == 0) {

Do we need sysno?

 if (strcmp(name, syscall_parser_table[i].name) == 0) {
  return i;
}
...

return LIBOS_SYSCALL_BOUND;
}

Also shouldn't this indicate the error more clearly?


libos/src/libos_syscalls.c line 114 at r2 (raw file):

        if (!toml_disallowed_syscall_raw) {
            log_error("Invalid disallowed syscall in manifest at index %ld", i);
            return -EINVAL;

goto after return?


libos/src/libos_syscalls.c line 121 at r2 (raw file):

        if (ret < 0) {
            log_error("Invalid disallowed syscall in manifest at index %ld (not a string)", i);
            return -EINVAL;

ditto


libos/src/libos_syscalls.c line 129 at r2 (raw file):

            log_error("Unrecognized disallowed syscall `%s` in manifest at index %ld",
                      toml_disallowed_syscall_str, i);
            return -EINVAL;

ditto

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @oshogbo)


Documentation/manifest-syntax.rst line 397 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Should this also be read from a file similarly?

What would be the benefit of having these syscalls in a separate file, rather than directly in the manifest?


libos/src/libos_parser.c line 1653 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Shouldn't this be a static_assert beyond this function?

Done partially. It indeed needs to be a static assert, but I don't know what you mean by "beyond this function". I add this assert here to make it clear to readers that these two are the same limits.


libos/src/libos_parser.c line 1658 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Do we need sysno?

 if (strcmp(name, syscall_parser_table[i].name) == 0) {
  return i;
}
...

return LIBOS_SYSCALL_BOUND;
}

Also shouldn't this indicate the error more clearly?

Done partially.

What do you mean by "indicate the error more clearly"? Replace with int get_syscall_number(const char* name, unsigned long* out_sysno)? I can do it, though the current version looked sufficiently clear to me.


libos/src/libos_syscalls.c line 114 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

goto after return?

Done.


libos/src/libos_syscalls.c line 121 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

ditto

Done.


libos/src/libos_syscalls.c line 129 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

ditto

Done.

Copy link
Contributor

@oshogbo oshogbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


Documentation/manifest-syntax.rst line 397 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What would be the benefit of having these syscalls in a separate file, rather than directly in the manifest?

I was thinking about docker seccomp policies, and they are mainly defined as an outside file. So my idea was that somebody might want to define a more generic policy. But this was just a random idea.


libos/src/libos_parser.c line 1653 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done partially. It indeed needs to be a static assert, but I don't know what you mean by "beyond this function". I add this assert here to make it clear to readers that these two are the same limits.

"Beyond this function" I thought about doing it above or where the syscall_parse_table/IBOS_SYSCALL_BOUND is defined. But here makes sense as well.


libos/src/libos_parser.c line 1658 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done partially.

What do you mean by "indicate the error more clearly"? Replace with int get_syscall_number(const char* name, unsigned long* out_sysno)? I can do it, though the current version looked sufficiently clear to me.

Yes, my idea was to not use a LIBOS_SYSCALL_BOUND as a error msg, as the name does not suggest that something went wrong. And doing it as you proposed. Your decision.

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 12 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @oshogbo)


Documentation/manifest-syntax.rst line 397 at r2 (raw file):

I was thinking about docker seccomp policies, and they are mainly defined as an outside file.

That's simply because there is no "main configuration file" in Docker. Gramine on the other side has such a file, so seems reasonable to put all the configuration in one file. (The only potential exceptions are envvars and cmdline args, but that's because they may be sent in encrypted form.)


libos/src/libos_parser.c line 1653 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

"Beyond this function" I thought about doing it above or where the syscall_parse_table/IBOS_SYSCALL_BOUND is defined. But here makes sense as well.

I looked at where else I can easily add this static_assert, but I don't see another good place:

  • in libos_parser.c: it's already obvious that the size of the table is equal to the bound (from array definition)
  • in libos_defs.h: there we don't know what syscall_parser_table is, so not possible.

So keeping as is.d


libos/src/libos_parser.c line 1658 at r2 (raw file):

Previously, oshogbo (Mariusz Zaborski) wrote…

Yes, my idea was to not use a LIBOS_SYSCALL_BOUND as a error msg, as the name does not suggest that something went wrong. And doing it as you proposed. Your decision.

Done

@dimakuv dimakuv force-pushed the dimakuv/manifest-disable-syscalls branch from f469e2e to 073c483 Compare May 16, 2024 08:08
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


Documentation/manifest-syntax.rst line 397 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Alternatives:

  1. Provide an allowlist sys.allowed_syscalls = [ ... ]. This will be more security-expected and more in line with e.g. seccomp profiles. Downside: much harder for end users.
  2. Provide both this denylist and the allowlist manifest options. At init time, Gramine fails if detects them both.

Done, see discussion in #1876


Documentation/manifest-syntax.rst line 399 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

One additional option would be to not simply return -ENOSYS, but allow to specify what int to return:

    sys.fixed_syscalls = [
      { name = "syscall_name1", return = -38},  # returns -ENOSYS
      { name = "syscall_name2", return = 0},    # returns 0, i.e. dummy impl

Done, replaced with sys.mocked_syscalls as proposed during the Gramine meeting: #1876

@dimakuv dimakuv changed the title RFC: [LibOS] Add sys.disallowed_syscalls = [ ... ] manifest option [LibOS] Add sys.mock_syscalls = [ ... ] manifest option May 16, 2024
This commit adds the manifest syntax `sys.mock_syscalls = [ ... ]`
to specify system calls that will be mocked when executed in Gramine
(i.e. return a specified value without any other side effects).

This may be particularly important for cases where the overhead of
invoking a system call on the host (e.g. exiting the SGX enclave)
becomes a performance bottleneck, and it is more beneficial to disable
or no-op the syscall in the first place; `sched_yield()` is an example.

Another example may be disabling certain functionalities for security
reasons. For example, one may want to disable `eventfd()` and
`eventfd2()` to forbid creation of eventfd objects.

Yet another example may be mocking syscalls currently not implemented in
Gramine. E.g. it may be enough to mock `vhangup()` to always return 0,
so that the workload proceeds further.

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
@dimakuv dimakuv force-pushed the dimakuv/manifest-disable-syscalls branch from 073c483 to 4a5ce3e Compare May 16, 2024 09:04
Copy link
Contributor

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 14 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


Documentation/manifest-syntax.rst line 418 at r4 (raw file):

    sys.mock_syscalls = [
      { name = "eventfd", return = -38 },
      { name = "eventfd", return = -38 },

eventfd2

Code quote:

eventfd

Documentation/manifest-syntax.rst line 428 at r4 (raw file):

   (2) it is also used to create threads in the same process. The
   ``sys.disallow_subprocesses`` manifest option disables only the first usage,
   whereas ``sys.mock_syscalls = [ name = "clone", ...]`` disables both usages.

{name = "clone"}

Code quote:

name = "clone"

Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 14 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @llly)


Documentation/manifest-syntax.rst line 418 at r4 (raw file):

Previously, llly (Li Xun) wrote…

eventfd2

Done, oops, thanks for catching this!


Documentation/manifest-syntax.rst line 428 at r4 (raw file):

Previously, llly (Li Xun) wrote…

{name = "clone"}

Done.

Copy link
Contributor

@llly llly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 14 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 14 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


-- commits line 19 at r5:
I think we shouldn't encourage such things, it's not semantically correct (vhangup() is not a semantical no-op).


Documentation/manifest-syntax.rst line 406 at r5 (raw file):

This syntax specifies the system calls that are mocked when executed in
Gramine (i.e. they return a specified value without any other side effects).
If ``return`` field is skipped, then the default value is ``0`` (no-op).

I'd also warn the users that returning a success but skipping side-effects may introduce bugs to the app, if it expects them (e.g. mocking futex may lead to silent introduction of race conditions).


libos/src/libos_parser.c line 1659 at r5 (raw file):

int get_syscall_number(const char* name, unsigned long* out_sysno) {
    static_assert(LIBOS_SYSCALL_BOUND == ARRAY_SIZE(syscall_parser_table), "oops");
    assert(out_sysno);

Why asserting this, but not asserting name? I'd just remove it.


libos/src/libos_syscalls.c line 153 at r5 (raw file):

        /* add syscall to the table of mocked syscalls */
        libos_mock_syscall_table[sysno].is_mocked = true;

Please check if sysno fits in the table.

Copy link
Contributor

@mwkmwkmwk mwkmwkmwk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):
I don't think doing it like this is a good idea for UX reasons: it exposes what is essentially a debug-level option for a user-level need.

The "user-level" need here is executing sched_yield within the enclave, which is a fundamentally sound thing to do. It can be a supported option, in the "we provide user support when you use it and something breaks" sense.

However, the proposed generic interface is debug-level: it's a sharp tool that can easily be used to completely break syscall semantics, and make the entire thing unsound. We cannot provide support for using this option.

While I'm not completely against having such debug-level options, I believe they should be clearly marked as such (sys.debug__mock_syscalls?), and not recommended for production usage. Separately, I believe we should provide a simple user-facing option like sys.sched_yield_noop = true.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants