Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect IPv6 support in containers, generate '/etc/hosts' accordingly. #47062

Merged
merged 2 commits into from Jan 29, 2024

Conversation

robmry
Copy link
Contributor

@robmry robmry commented Jan 11, 2024

- What I did

Some configuration in a container depends on whether it has support for IPv6 (including default entries for ::1 etc in /etc/hosts).

Before this change, the container's support for IPv6 was determined by whether it was connected to any IPv6-enabled networks. But, that can change over time, it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for ::1 on the container's loopback interface. That address will not be present if the kernel does not have IPv6 support, or the user has disabled it in new namespaces by other means. To disable IPv6 when creating a container, use --sysctl net.ipv6.conf.all.disable_ipv6=1.

This change uses the new approach to determining whether a container supports IPv6 to ensure that IPv6 entries are only included in '/etc/hosts' if IPv6 is enabled - which fixes #35954

This change is a start, further work is needed:

  • Like 'hosts', 'resolv.conf' needs to be built according to this detected IPv6-ness.
  • An endpoint for an IPv6 network, in a container with no IPv6 support, is allocated an IPv6 address.
    • That address won't show up on the container's interface to a network added when the container is created, but it will on interfaces to networks that are created later.
    • The address will be reserved for the lifetime of the container.
    • The internal resolver is populated with those IPv6 addreses, and will return them.
    • Similarly, a container with IPv6 and a legacy-network --link to a non-IPv6 container will get that non-functional IPv6 address in its '/etc/hosts'.
  • Check for more cases where the plumbing needs to change for this new way of detecting the IPv6 capability of a container.
  • Check that docs describe the --sysctl as the way to disable IPv6 in a container.

- How I did it

The first commit here, written by @corhere, results in Sandbox.SetKey() being called after the daemon has created the container task. At that point, sysctl options have been applied, making it possible to use the existence of ::1 on the loopback interface to detect IPv6 support when the user provides sysctl net.ipv6.conf.all.disable_ipv6 to explicitly enable or disable IPv6 when creating a container. (SetKey() will still be called from the prestart hook for buildkit, but it doesn't allow sysctl overrides.)

The second commit probes for ::1, and uses the result to re-generate /etc/hosts accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation. It now disables IPv6 only for interfaces that have not been assigned an IPv6 address:

- How to verify it

Updated unit tests, new integration test.

- Description for the changelog

By default, IPv6 will remain enabled on a container's loopback interface when the container is not connected to an IPv6-enabled network. So, for example, containers that are only connected to an IPv4-only network will now have the '::1' address on their loopback interface.

To disable IPv6 in a container, use option '--sysctl net.ipv6.conf.all.disable_ipv6=1' in the 'create' or 'run' command, or the equivalent 'sysctls' option in the service configuration section of a Compose file.

If IPv6 is not available in a container because it has been explicitly disabled for the container, the host's networking stack does not have IPv6 enabled, or for any other reason - the container's '/etc/hosts' file will not include IPv6 entries.

@robmry robmry force-pushed the 35954-default_ipv6_enabled branch 3 times, most recently from 6181f9b to 842d603 Compare January 12, 2024 09:51
@robmry robmry force-pushed the 35954-default_ipv6_enabled branch 3 times, most recently from b0942ea to fec7c9d Compare January 12, 2024 18:08
@robmry robmry marked this pull request as ready for review January 12, 2024 18:52
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some quick comments (didn't finish reading all changes)

daemon/start.go Outdated Show resolved Hide resolved
daemon/start.go Outdated Show resolved Hide resolved
integration/internal/container/ops.go Show resolved Hide resolved
libnetwork/endpoint.go Outdated Show resolved Hide resolved
daemon/start.go Outdated Show resolved Hide resolved
libnetwork/osl/namespace_linux.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
libnetwork/sandbox_dns_unix.go Outdated Show resolved Hide resolved
@robmry robmry force-pushed the 35954-default_ipv6_enabled branch 2 times, most recently from c28e644 to ddb0c5e Compare January 17, 2024 16:10
@robmry robmry changed the title Enable IPv6 on the loopback interface if possible, generate '/etc/hosts' accordingly. Detect IPv6 support in containers, generate '/etc/hosts' accordingly. Jan 17, 2024
libnetwork/etchosts/etchosts.go Outdated Show resolved Hide resolved
libnetwork/sandbox_linux.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
libnetwork/osl/namespace_linux.go Outdated Show resolved Hide resolved
libnetwork/osl/namespace_linux.go Outdated Show resolved Hide resolved
libnetwork/osl/interface_linux.go Outdated Show resolved Hide resolved
// to libnetwork's Sandbox.CtrTaskCreated after container task creation, before
// it's started. Setting this option when the container has already been started
// (for an option-refresh) is harmless.
func OptionCtrTaskCreated() SandboxOption {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry the past-tense naming of the option is going to confuse the next person who has to touch this code. The option is a promise that the code setting the option will do something in the future, so the option name should reflect that. And I'm not thrilled with libnetwork code making references to the container runtime lifecycle.

Suggested change
func OptionCtrTaskCreated() SandboxOption {
func OptionTwoPhaseInit() SandboxOption {

Single-phase sandbox init is really the special case as it's only needed for buildkit compatibility. What if we made two-phase sandbox init the default, with an opt-out for buildkit? The buildkit executor open-codes the libnetwork-setkey prestart hook so it would be trivial to add an argument or set an environment variable to opt into single-phase sandbox init.

// attach netns to bridge within the container namespace, using reexec in a prestart hook
s.Hooks = &specs.Hooks{
Prestart: []specs.Hook{{
Path: filepath.Join("/proc", strconv.Itoa(os.Getpid()), "exe"),
Args: []string{"libnetwork-setkey", "-exec-root=" + iface.provider.Config().ExecRoot, iface.sbx.ContainerID(), shortNetCtlrID},
}},
}

Alternatively, we could stop depending on the libnetwork-setkey reexec except for buildkit. That would make two-phase sandbox init unnecessary! Either the sandbox is initialized in one shot after container create is done, or the sandbox is initialized in one shot during the prestart OCI hook. I would gladly cherry-pick the aforementioned commit for you to rebase this PR on top of if you agree that this is the way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to change the name.

I guess I'm still trying to understand the rules for changing interfaces like this. I thought I'd understood that we treat packages as public interfaces that anybody could be using - so we can't be sure nothing else is "doing a buildkit", using libnetwork the way it does? That's why I figured calling the new phase-2 method would have to be opt-in. (Having free-reign over this sort of change would make life a lot easier though!)

Your third option sounds best by-far ... but still, if anything else is using WithLibnetwork() it'd be a breaking change. If that's not an issue, let's do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third option would not be a breaking change (even if we ignore the fact that neither func withLibnetwork nor its sole caller func (*Daemon) createSpec are exported) because the libnetwork-setkey reexec would continue to behave exactly the same as it currently does! The only functional difference between the existing behaviour and the "new" behaviour is at which point in the task's lifecycle func (*Sandbox) SetKey is called. Anything "doing a buildkit" would continue to invoke it via libnetwork-setkey from an OCI hook. We opt into the new behaviour by instead calling SetKey programmatically after any ipv6-related sysctls have been applied to the network namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(withLibnetwork used to be an exported function, but it was un-exported as part of #43980. Regardless, we consider the daemon tree to be de-facto internal.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes - sure ... I was looking at the diff you linked to, it deletes WithLibnetwork() but it's been un-exported since. Sounds good then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rebased my PR so you can cherry-pick corhere@bd99fb0 cleanly into this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

libnetwork/sandbox_linux.go Outdated Show resolved Hide resolved
libnetwork/etchosts/etchosts_test.go Outdated Show resolved Hide resolved
libnetwork/controller.go Outdated Show resolved Hide resolved
@robmry robmry force-pushed the 35954-default_ipv6_enabled branch 2 times, most recently from 5f510b2 to 4d33e14 Compare January 19, 2024 19:32
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

daemon/start.go Outdated Show resolved Hide resolved
corhere and others added 2 commits January 19, 2024 20:23
Signed-off-by: Cory Snider <csnider@mirantis.com>
Some configuration in a container depends on whether it has support for
IPv6 (including default entries for '::1' etc in '/etc/hosts').

Before this change, the container's support for IPv6 was determined by
whether it was connected to any IPv6-enabled networks. But, that can
change over time, it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for '::1' on the container's
loopback interface. It will not be present if the kernel does not have
IPv6 support, or the user has disabled it in new namespaces by other
means.

Once IPv6 support has been determined for the container, its '/etc/hosts'
is re-generated accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation.
It now disables IPv6 only for interfaces that have not been assigned an
IPv6 address. (But, even if IPv6 is disabled for the container using the
sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6
networks still get IPv6 addresses that appear in the internal DNS. There's
more to-do!)

Signed-off-by: Rob Murray <rob.murray@docker.com>
Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great job @robmry!

I just left a comment about a few dead params. Feel free to address it in a follow-up if you prefer.

libnetwork/sandbox_dns_unix.go Show resolved Hide resolved
@akerouanton akerouanton merged commit 794f712 into moby:master Jan 29, 2024
126 checks passed
@thaJeztah thaJeztah added this to the 26.0.0 milestone Feb 3, 2024
@robmry robmry deleted the 35954-default_ipv6_enabled branch March 27, 2024 16:40
zalimeni added a commit to hashicorp/consul that referenced this pull request Apr 17, 2024
As of Docker Engine 26.0.0 (moby/moby#47062),
IPv6 is enabled by default where supported. This causes issues for our
tests attempting to resolve requests to other containers over
localhost, since on Linux IPv6 will be preferred over IPv4 when
available when applying the default behavior defined in RFC3484.

As a workaround, force IPv4 with a flag passed to `docker run`.
zalimeni added a commit to hashicorp/consul that referenced this pull request Apr 17, 2024
As of Docker Engine 26.0.0 (moby/moby#47062),
IPv6 is enabled by default where supported. This causes issues for our
tests attempting to resolve requests to other containers over
localhost, since on Linux IPv6 will be preferred over IPv4 when
available when applying the default behavior defined in RFC3484.

As a workaround, force IPv4 with a flag passed to `docker run`.
zalimeni added a commit to hashicorp/consul that referenced this pull request Apr 17, 2024
As of Docker Engine 26.0.0 (moby/moby#47062),
IPv6 is enabled by default where supported. This causes issues for our
tests attempting to resolve requests to other containers over
localhost, since on Linux IPv6 will be preferred over IPv4 when
available when applying the default behavior defined in RFC3484.

As a workaround, force IPv4 with a flag passed to `docker run`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment