Detect IPv6 support in containers, generate '/etc/hosts' accordingly. #47062
Conversation
just some quick comments (didn't finish reading all changes)
libnetwork/sandbox_options.go (Outdated)

```go
// to libnetwork's Sandbox.CtrTaskCreated after container task creation, before
// it's started. Setting this option when the container has already been started
// (for an option-refresh) is harmless.
func OptionCtrTaskCreated() SandboxOption {
```
I worry the past-tense naming of the option is going to confuse the next person who has to touch this code. The option is a promise that the code setting the option will do something in the future, so the option name should reflect that. And I'm not thrilled with libnetwork code making references to the container runtime lifecycle.
Suggested change:

```diff
-func OptionCtrTaskCreated() SandboxOption {
+func OptionTwoPhaseInit() SandboxOption {
```
Single-phase sandbox init is really the special case as it's only needed for buildkit compatibility. What if we made two-phase sandbox init the default, with an opt-out for buildkit? The buildkit executor open-codes the libnetwork-setkey prestart hook so it would be trivial to add an argument or set an environment variable to opt into single-phase sandbox init.
moby/builder/builder-next/executor_linux.go, lines 142 to 148 in 84036d3:

```go
// attach netns to bridge within the container namespace, using reexec in a prestart hook
s.Hooks = &specs.Hooks{
	Prestart: []specs.Hook{{
		Path: filepath.Join("/proc", strconv.Itoa(os.Getpid()), "exe"),
		Args: []string{"libnetwork-setkey", "-exec-root=" + iface.provider.Config().ExecRoot, iface.sbx.ContainerID(), shortNetCtlrID},
	}},
}
```
Alternatively, we could stop depending on the libnetwork-setkey reexec except for buildkit. That would make two-phase sandbox init unnecessary! Either the sandbox is initialized in one shot after container create is done, or the sandbox is initialized in one shot during the prestart OCI hook. I would gladly cherry-pick the aforementioned commit for you to rebase this PR on top of if you agree that this is the way to go.
I'd be happy to change the name.
I guess I'm still trying to understand the rules for changing interfaces like this. I thought I'd understood that we treat packages as public interfaces that anybody could be using - so we can't be sure nothing else is "doing a buildkit", using libnetwork the way it does? That's why I figured calling the new phase-2 method would have to be opt-in. (Having free rein over this sort of change would make life a lot easier though!)

Your third option sounds best by far ... but still, if anything else is using `WithLibnetwork()` it'd be a breaking change. If that's not an issue, let's do it.
The third option would not be a breaking change (even if we ignore the fact that neither `func withLibnetwork` nor its sole caller `func (*Daemon) createSpec` are exported) because the `libnetwork-setkey` reexec would continue to behave exactly the same as it currently does! The only functional difference between the existing behaviour and the "new" behaviour is at which point in the task's lifecycle `func (*Sandbox) SetKey` is called. Anything "doing a buildkit" would continue to invoke it via `libnetwork-setkey` from an OCI hook. We opt into the new behaviour by instead calling `SetKey` programmatically after any ipv6-related sysctls have been applied to the network namespace.
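To make the lifecycle difference concrete, here is a small, purely illustrative Go sketch (the function and event names are invented for this comment, not taken from the daemon's code) of the two orderings described above:

```go
package main

import "fmt"

type event string

// startContainer models the two orderings discussed above. With the
// programmatic path, sysctls (e.g. net.ipv6.conf.all.disable_ipv6) are
// applied before SetKey runs, so SetKey can observe their effect. On the
// prestart-hook path (kept for buildkit), SetKey runs via the
// libnetwork-setkey reexec before sysctls are taken into account.
func startContainer(programmaticSetKey bool) []event {
	var log []event
	log = append(log, "create task")
	if programmaticSetKey {
		log = append(log, "apply sysctls")
		log = append(log, "SetKey (programmatic)")
	} else {
		log = append(log, "SetKey (prestart hook)")
	}
	log = append(log, "start task")
	return log
}

func main() {
	fmt.Println(startContainer(true))
	fmt.Println(startContainer(false))
}
```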
(`withLibnetwork` used to be an exported function, but it was un-exported as part of #43980. Regardless, we consider the daemon tree to be de-facto internal.)
Oh, yes - sure ... I was looking at the diff you linked to; it deletes `WithLibnetwork()`, but it's been un-exported since. Sounds good then.
I've rebased my PR so you can cherry-pick corhere@bd99fb0 cleanly into this one.
Thank you!
🎉
Signed-off-by: Cory Snider <csnider@mirantis.com>
Some configuration in a container depends on whether it has support for IPv6 (including default entries for '::1' etc. in '/etc/hosts'). Before this change, the container's support for IPv6 was determined by whether it was connected to any IPv6-enabled networks. But that can change over time; it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for '::1' on the container's loopback interface. It will not be present if the kernel does not have IPv6 support, or the user has disabled it in new namespaces by other means. Once IPv6 support has been determined for the container, its '/etc/hosts' is re-generated accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation. It now disables IPv6 only for interfaces that have not been assigned an IPv6 address. (But, even if IPv6 is disabled for the container using the sysctl 'net.ipv6.conf.all.disable_ipv6=1', interfaces connected to IPv6 networks still get IPv6 addresses that appear in the internal DNS. There's more to do!)

Signed-off-by: Rob Murray <rob.murray@docker.com>
LGTM. Great job @robmry!
I just left a comment about a few dead params. Feel free to address it in a follow-up if you prefer.
As of Docker Engine 26.0.0 (moby/moby#47062), IPv6 is enabled by default where supported. This causes issues for our tests attempting to resolve requests to other containers over localhost, since on Linux IPv6 will be preferred over IPv4 when available when applying the default behavior defined in RFC3484. As a workaround, force IPv4 with a flag passed to `docker run`.
- What I did

Some configuration in a container depends on whether it has support for IPv6 (including default entries for `::1` etc. in `/etc/hosts`).

Before this change, the container's support for IPv6 was determined by whether it was connected to any IPv6-enabled networks. But that can change over time; it isn't a property of the container itself.

So, instead, detect IPv6 support by looking for `::1` on the container's loopback interface. That address will not be present if the kernel does not have IPv6 support, or the user has disabled it in new namespaces by other means. To disable IPv6 when creating a container, use `--sysctl net.ipv6.conf.all.disable_ipv6=1`.

This change uses the new approach to determining whether a container supports IPv6 to ensure that IPv6 entries are only included in `/etc/hosts` if IPv6 is enabled - which fixes #35954.
This change is a start; further work is needed:

- `--link` to a non-IPv6 container will get that non-functional IPv6 address in its `/etc/hosts`.
- `--sysctl` as the way to disable IPv6 in a container.

- How I did it
The first commit here, written by @corhere, results in `Sandbox.SetKey()` being called after the daemon has created the container task. At that point, `sysctl` options have been applied, making it possible to use the existence of `::1` on the loopback interface to detect IPv6 support when the user provides sysctl `net.ipv6.conf.all.disable_ipv6` to explicitly enable or disable IPv6 when creating a container. (`SetKey()` will still be called from the prestart hook for buildkit, but it doesn't allow sysctl overrides.)

The second commit probes for `::1`, and uses the result to re-generate `/etc/hosts` accordingly.

The daemon no longer disables IPv6 on all interfaces during initialisation. It now disables IPv6 only for interfaces that have not been assigned an IPv6 address:
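(The snippet that followed is not reproduced here.) As an illustrative sketch only, with a hypothetical helper name rather than the PR's real implementation, per-interface disabling works through the standard procfs sysctl path:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// disableIPv6SysctlPath returns the procfs path used to disable IPv6 on a
// single interface; writing "1" to it disables IPv6 for that interface.
// The helper name is invented for this sketch.
func disableIPv6SysctlPath(iface string) string {
	return filepath.Join("/proc/sys/net/ipv6/conf", iface, "disable_ipv6")
}

func main() {
	// For each interface without an assigned IPv6 address, the daemon
	// would write "1" to this path (the write is omitted in this sketch).
	fmt.Println(disableIPv6SysctlPath("eth0"))
}
```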
- How to verify it
Updated unit tests, new integration test.
- Description for the changelog
By default, IPv6 will remain enabled on a container's loopback interface when the container is not connected to an IPv6-enabled network. So, for example, containers that are only connected to an IPv4-only network will now have the '::1' address on their loopback interface.
To disable IPv6 in a container, use option '--sysctl net.ipv6.conf.all.disable_ipv6=1' in the 'create' or 'run' command, or the equivalent 'sysctls' option in the service configuration section of a Compose file.
If IPv6 is not available in a container - because it has been explicitly disabled for the container, the host's networking stack does not have IPv6 enabled, or for any other reason - the container's '/etc/hosts' file will not include IPv6 entries.