macvlan, ipvlan: generate unsolicited ARP/ND advertisements on link up? #47176

Open
corhere opened this issue Jan 23, 2024 · 2 comments
Labels
area/networking/d/ipvlan area/networking/d/macvlan kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@corhere
Contributor

corhere commented Jan 23, 2024

Description

We should get the kernel to send unsolicited IPv4 ARP and IPv6 Neighbor Discovery advertisements when bringing up macvlan and ipvlan container links. That would allow (at least) L2 switches to eagerly update their forwarding tables when a container link's MAC address changes, or a container with a static MAC address is rescheduled onto a node connected to a different switch port. Otherwise the containers will be unreachable from the network until the stale forwarding entries expire.

(The following is all speculation, as the user has not yet responded.) I suspect that the author of the linked PR is using either the macvlan or ipvlan driver, tried setting the net.ipv6.conf.all.ndisc_notify sysctl in the container config, and found it to be effective only when the interface's IPv6 address is added with the optimistic DAD option. The kernel sends an unsolicited ND advertisement when ndisc_notify is enabled and an address stops being tentative, which happens once duplicate address detection completes. User-specified container sysctls are currently applied after libnetwork-setkey, so ndisc_notify is normally set too late to have any effect. Optimistic DAD would have worked around that by delaying the unsolicited ND advertisement: it likely delays things long enough that, most of the time, the container runtime wins the race to set ndisc_notify before the kernel marks the address as no longer tentative.

With #47062 changing the order of operations such that user sysctls are applied before any network links are brought up, users would be able to opt in using docker run --sysctl net.ipv6.conf.default.ndisc_notify=1 without any workarounds or hacks. However, I posit that users should not have to opt in; net.ipv4.conf.<iface>.arp_notify=1 and net.ipv6.conf.<iface>.ndisc_notify=1 should be unconditionally set by libnetwork. I cannot think of any downside. Either the unsolicited advertisements are accepted by peers and a restarted/moved/rescheduled container's networking Just Works, or they are ignored and nothing happens differently from today.
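
To make the proposal concrete, here is a rough sketch (in Python for brevity, not actual libnetwork code) of setting both per-interface knobs by writing to /proc/sys. The helper name and the `proc_sys` parameter are made up for illustration, and it assumes the writes happen inside the container's network namespace before the link is brought up.

```python
from pathlib import Path

# Per-interface knobs named in this issue, as paths relative to /proc/sys.
# Using the path form avoids ambiguity for interface names containing dots.
NOTIFY_SYSCTLS = (
    "net/ipv4/conf/{iface}/arp_notify",
    "net/ipv6/conf/{iface}/ndisc_notify",
)


def enable_link_notifications(iface: str, proc_sys: Path = Path("/proc/sys")) -> list[Path]:
    """Write "1" to each notify sysctl that exists for iface; return the paths written.

    Hypothetical helper: must run in the network namespace that owns iface.
    """
    written = []
    for template in NOTIFY_SYSCTLS:
        path = proc_sys / template.format(iface=iface)
        if not path.exists():  # e.g. IPv6 is disabled in this namespace
            continue
        path.write_text("1\n")
        written.append(path)
    return written
```

Skipping missing entries rather than failing keeps the behaviour "unconditional but harmless" when one address family is unavailable.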

(cc @psaab)

@corhere corhere added status/0-triage kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/networking/d/ipvlan area/networking/d/macvlan labels Jan 23, 2024
@corhere
Contributor Author

corhere commented Apr 12, 2024

Annoyingly, net.ipv4.conf.<iface>.arp_notify=1 is buggy in at least the Ubuntu 20.04 kernel and Docker Desktop's 6.6.16-linuxkit kernel. The gratuitous ARP is allegedly generated in the kernel on link up, but never reaches the network. Changing the link's hardware address does trigger the kernel to send the ARP packet, but in my experiments with a Docker Desktop kernel it only reaches the network if enough time has elapsed since the link came up. That suggests a race condition inside the kernel. We would have to either work around this kernel bug by finding some way to wait long enough that setting the hardware address reliably gets the ARP packet onto the network, or make the daemon send the ARP announcement from userspace over a packet socket. Sending from userspace has the advantage of giving us full control over the procedure, allowing us to do things like broadcasting repeat announcements so that delivery is more reliable when receive queues are full.
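
A minimal sketch of the userspace option, in Python rather than Go to keep it short. It builds an ARP Announcement as described in RFC 5227 (an ARP request whose sender and target protocol addresses are both the announced address) and broadcasts it over a packet(7) socket. The function names are hypothetical, and the defaults are RFC 5227's ANNOUNCE_NUM and ANNOUNCE_INTERVAL; whether those are the right values for containers is an open question.

```python
import socket
import struct
import time

ETH_P_ARP = 0x0806
BROADCAST = b"\xff" * 6
ANNOUNCE_NUM = 2         # RFC 5227 default
ANNOUNCE_INTERVAL = 2.0  # seconds, RFC 5227 default


def build_arp_announcement(mac: bytes, ip: str) -> bytes:
    """Ethernet frame carrying an ARP Announcement for ip."""
    addr = socket.inet_aton(ip)
    eth = struct.pack("!6s6sH", BROADCAST, mac, ETH_P_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware / protocol address lengths
        1,            # opcode: request
        mac, addr,    # sender hardware / protocol address
        b"\x00" * 6,  # target hardware address: zero
        addr,         # target protocol address == sender address
    )
    return eth + arp


def announce(ifname, mac, ip, count=ANNOUNCE_NUM, interval=ANNOUNCE_INTERVAL):
    """Broadcast the announcement count times. Linux only; needs CAP_NET_RAW."""
    frame = build_arp_announcement(mac, ip)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ARP)) as s:
        s.bind((ifname, 0))
        for i in range(count):
            s.send(frame)
            if i + 1 < count:
                time.sleep(interval)
```

Something like `announce("eth0", mac, "10.0.0.2")` would have to run in the container's network namespace after the link is up; the interface name and address here are placeholders.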

@corhere
Contributor Author

corhere commented Apr 16, 2024

I suspect that the kernel's support for generating unsolicited IPv6 neighbor advertisements is slightly less broken than the IPv4 equivalent. Linux implements Duplicate Address Detection in kernelspace, and when net.ipv6.conf.<iface>.ndisc_notify=1 is set, a neighbor advertisement is transmitted for each assigned address once DAD has completed for that address. It still sends only a single advertisement, however. Neighbor advertisements on link up for NODAD addresses are implemented in the same way as the buggy IPv4 ARP advertisements, so unless there is special buffering behaviour in the IPv6 stack (which there might be; needs testing) they may be afflicted by the same bug as IPv4.
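
For reference, a userspace equivalent of the unsolicited neighbor advertisement could look roughly like the sketch below (Python for brevity; function names are hypothetical). It follows the message layout in RFC 4861: ICMPv6 type 136, the Override flag set, Solicited clear, and a Target Link-Layer Address option, sent to the all-nodes multicast address with hop limit 255. It assumes the Linux behaviour of filling in the ICMPv6 checksum for raw ICMPv6 sockets and leaves source address selection to the kernel.

```python
import ipaddress
import socket
import struct

ALL_NODES = "ff02::1"
ND_NEIGHBOR_ADVERT = 136
NA_FLAG_OVERRIDE = 0x20000000  # R=0, S=0, O=1 in the 32-bit flags word
OPT_TARGET_LLADDR = 2


def build_unsolicited_na(target: str, mac: bytes) -> bytes:
    """ICMPv6 message body of an unsolicited Neighbor Advertisement for target."""
    # Checksum is left as zero; the kernel is assumed to compute it on send.
    header = struct.pack("!BBHI", ND_NEIGHBOR_ADVERT, 0, 0, NA_FLAG_OVERRIDE)
    # Option length is counted in units of 8 bytes: 2 header bytes + 6 MAC bytes.
    option = struct.pack("!BB6s", OPT_TARGET_LLADDR, 1, mac)
    return header + ipaddress.IPv6Address(target).packed + option


def send_unsolicited_na(ifname: str, target: str, mac: bytes) -> None:
    """Send one advertisement to all-nodes. Needs CAP_NET_RAW in the link's netns."""
    ifindex = socket.if_nametoindex(ifname)
    with socket.socket(socket.AF_INET6, socket.SOCK_RAW, socket.IPPROTO_ICMPV6) as s:
        # Receivers discard Neighbor Discovery packets whose hop limit is not 255.
        s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_HOPS, 255)
        s.sendto(build_unsolicited_na(target, mac), (ALL_NODES, 0, 0, ifindex))
```

Calling `send_unsolicited_na` more than once, spaced out in time, would address the single-advertisement limitation noted above.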

IPv4 Address Conflict Detection (RFC 5227, published in 2008), an extension to ARP, is a relatively recent development in the IPv4 world. Nobody has bothered to implement it in kernelspace as it can be implemented by a CAP_NET_RAW userspace daemon with a packet(7) socket. Multiple userspace implementations exist, often integrated with a DHCP client. While we may be able to defer to the kernel's IPv6 DAD implementation, we would have to implement IPv4 Address Conflict Detection ourselves. In addition to the ipvlan and macvlan use case, where non-container peers might be assigned a duplicate address, the bridge driver could also benefit: it could detect when an IPAM-allocated address is already in use because a container's userspace, using its CAP_NET_ADMIN powers, has assigned that address to a container link.
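
To give a feel for how small the ACD core is, here is a sketch (Python, hypothetical function names) of the two pieces RFC 5227 §2.1.1 describes: the ARP Probe, which carries an all-zero sender IP address, and the rule for deciding that a received ARP packet indicates a conflict. Socket handling, probe timing and the Ethernet header are left out.

```python
import socket
import struct

ARP_FMT = "!HHBBH6s4s6s4s"
ZERO_IP = b"\x00" * 4
ZERO_MAC = b"\x00" * 6


def build_arp_probe(mac: bytes, candidate: str) -> bytes:
    """ARP payload (no Ethernet header) of an ARP Probe for candidate."""
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, 1,
                       mac, ZERO_IP, ZERO_MAC, socket.inet_aton(candidate))


def is_conflict(arp_payload: bytes, our_mac: bytes, candidate: str) -> bool:
    """True if a received ARP packet shows candidate is in use or contested."""
    if len(arp_payload) < struct.calcsize(ARP_FMT):
        return False
    _, _, _, _, _, sha, spa, _, tpa = struct.unpack_from(ARP_FMT, arp_payload)
    addr = socket.inet_aton(candidate)
    if spa == addr:
        return True  # some host is already using the address
    # Another host is probing for the same address at the same time.
    return spa == ZERO_IP and tpa == addr and sha != our_mac
```

The second rule matters for the rescheduling case: two nodes starting containers with the same static address at once would each see the other's probe.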

For macvlan, ipvlan and bridge endpoints, I think we should:

  • Get libnetwork to broadcast unsolicited advertisements for all IPv4 and IPv6 addresses assigned to an endpoint from userspace, respecting the NODAD and NOARP flags
    • Broadcast multiple announcements at intervals (per RFC 5227 §2.3 and RFC 4861 §7.2.6, which obsoletes RFC 2461) for robustness to packet loss
    • Subscribe to netlink notifications so that addresses configured by the running container are also advertised on its behalf? Maybe overkill or an anti-feature; what if systemd-networkd were running in the container?
  • Perform IPv4 ACD and enable IPv6 (Optimistic) DAD on bridge, macvlan and ipvlan links
    • Let the kernel handle IPv6 DAD; we only make the announcements after the fact
    • If a statically-assigned address fails ACD/DAD we could fail starting the container with an informative error
    • If a dynamic (IPAM) assignment fails ACD/DAD we could try again after allocating a new address — with a time limit, of course, and maybe also a maximum number of attempts
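
The retry idea in the last bullet could be sketched as follows (Python; every name here is hypothetical, and `allocate`, `is_free` and `release` stand in for IPAM and ACD/DAD hooks that do not exist in this form). Rejected addresses are held until the loop finishes so that the allocator cannot hand the same one back, which is an assumption about how the allocator behaves.

```python
import time
from typing import Callable


class AddressConflict(Exception):
    """No conflict-free address was found within the limits."""


def allocate_checked(
    allocate: Callable[[], str],
    is_free: Callable[[str], bool],
    release: Callable[[str], None],
    max_attempts: int = 3,
    time_limit: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Allocate addresses until one passes the conflict check or a limit is hit."""
    deadline = clock() + time_limit
    rejected = []
    try:
        for _ in range(max_attempts):
            addr = allocate()
            if is_free(addr):  # ACD for IPv4, DAD result for IPv6
                return addr
            rejected.append(addr)
            if clock() >= deadline:
                break
        raise AddressConflict(f"addresses already in use: {', '.join(rejected)}")
    finally:
        # Give conflicting addresses back only once we have stopped retrying.
        for addr in rejected:
            release(addr)
```

A statically-assigned address is the degenerate case: `max_attempts=1` with `allocate` returning the requested address, so a conflict surfaces immediately as an error naming that address.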

Overlay network endpoints would also benefit from unsolicited advertisements. However, due to the architecture of the driver, it is not possible to successfully perform ACD using ARP or DAD using IPv6 ND: ARP traffic transmitted to the overlay network will reach other containers running on the same node, but will not reach containers on any other node. If we want to implement address conflict detection, it will have to be done over the overlay network's control plane instead of the data plane. Unsolicited advertisements are just as useful in overlay networks for eagerly updating neighbor table caches, but will have to be "proxied" through the overlay network's control plane to the other nodes.
