Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libnetwork/overlay: remove dead code #44965

Merged
merged 12 commits into from
Apr 11, 2023

Conversation

akerouanton
Copy link
Member

@akerouanton akerouanton commented Feb 9, 2023

- What I did

All the code related to Classic Swarm is removed from the overlay driver, as well as all the code paths becoming dead because of that.

- How to verify it

CI should be green. It can also be tested by forming a swarm cluster with two dev containers:

dev1$ docker network create --driver overlay --scope global --attachable testnet
dev1$ docker run -d --rm --name whoami --network testnet traefik/whoami
dev2$ docker run --rm -t --name netshoot --network testnet nicolaka/netshoot curl http://whoami

- A picture of a cute animal (not mandatory but encouraged)

@akerouanton akerouanton marked this pull request as draft February 10, 2023 16:10
@akerouanton akerouanton force-pushed the libnetwork-dead-code branch 2 times, most recently from f182240 to 112b050 Compare February 16, 2023 10:03
@akerouanton akerouanton force-pushed the libnetwork-dead-code branch 2 times, most recently from 76657dc to 79a486e Compare March 2, 2023 16:13
@akerouanton akerouanton marked this pull request as ready for review March 2, 2023 16:39
@akerouanton akerouanton changed the title Remove dead code from libnetwork libnetwork/overlay: remove dead code Mar 2, 2023
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost everything LGTM. Has overlay host mode been documented as deprecated? We might not be able to rip it out unless it's gone through a deprecation cycle.

libnetwork/drivers/overlay/overlay.go Outdated Show resolved Hide resolved
@@ -205,7 +188,7 @@ func (d *driver) initializeVxlanIdm() error {
return nil
}

d.vxlanIdm, err = idm.New(d.store, "vxlan-id", vxlanIDStart, vxlanIDEnd)
d.vxlanIdm, err = idm.New(nil, "vxlan-id", vxlanIDStart, vxlanIDEnd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo, datastore support can be ripped out of package idm, which is the only user of bitseq.Handle... WE CAN DELETE PACKAGE bitseq SOON! 🎉 🎉

libnetwork/drivers/overlay/ov_network.go Outdated Show resolved Hide resolved
libnetwork/drivers/windows/overlay/overlay_windows.go Outdated Show resolved Hide resolved
@corhere

This comment was marked as outdated.

@corhere
Copy link
Contributor

corhere commented Mar 27, 2023

Since host mode is an implementation detail rather than a user-facing feature, and is only needed to support overlay networks on long-deprecated kernels, that code can be dropped without first going through a deprecation cycle. It'll need a changelog entry, though.

@corhere corhere added impact/changelog kind/refactor PR's that refactor, or clean-up code labels Mar 27, 2023
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please save the new code and behaviour changes for a follow-up PR? Let's keep this PR on topic: removing dead code. Lazily cleaning up stale sandboxes may appear dumb, and it might be, but it also might have been done that way for a good reason. (Containers can survive a daemon restart, e.g. over an upgrade. Lazy clean-up might be a way to not disrupt the containers before Swarm has a chance to replace them with new ones.)

Comment on lines -236 to -251
if n.secure {
for _, vni := range vnis {
programMangle(vni, false)
programInput(vni, false)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! So failing to clean up those rules is a long-standing bug. We should fix that. Follow-up PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll submit a follow-up PR to properly clean-up those rules, once this one got merged.

libnetwork/drivers/overlay/ov_network.go Show resolved Hide resolved
libnetwork/drivers/overlay/ov_network.go Show resolved Hide resolved
libnetwork/drivers/overlay/overlay.go Outdated Show resolved Hide resolved
@akerouanton
Copy link
Member Author

Could we please save the new code and behaviour changes for a follow-up PR? Let's keep this PR on topic: removing dead code.

Right, I'll move the incriminated commits to another branch for now.

Lazily cleaning up stale sandboxes may appear dumb, and it might be, but it also might have been done that way for a good reason. (Containers can survive a daemon restart, e.g. over an upgrade. Lazy clean-up might be a way to not disrupt the containers before Swarm has a chance to replace them with new ones.)

I think it was meaningful when Classic Swarm was supported, but I don't see any reason to do that anymore with only Swarm Mode supported since it doesn't support live-restore:

  1. dockerd won't start if it's part of a swarm cluster when live-restore is enabled (dockerd errors out with: Error starting cluster component: --live-restore daemon configuration is incompatible with swarm mode) ;
  2. It returns an error if you try to join/init a cluster whereas live-restore is enabled (it fails with: Error response from daemon: --live-restore daemon configuration is incompatible with swarm mode) ;

Now that being said, I think that's not enough to prove my point. I don't want to spend time on this right now but I'll resubmit those changes if I come up with stronger arguments.

This command was useful when overlay networks based on external KV store
was developed but is unused nowadays.

As the last reference to OverlayBindInterface and OverlayNeighborIP
netlabels are in the ovrouter cmd, they're removed too.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label is not used anymore since 142b522 (moby#44875).

golangci-lint now detects dead code. This will be fixed in subsequent
commits.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The overlay driver was creating a global store whenever
netlabel.GlobalKVClient was specified in its config argument. This
specific label is unused anymore since 142b522 (moby#44875).

It was also creating a local store whenever netlabel.LocalKVClient was
specificed in its config argument. This store is unused since
moby/libnetwork@9e72136 (moby/libnetwork#1636).

Finally, the sync.Once properties are never used and thus can be
deleted.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- GetIfaceAddr is unused since moby/libnetwork@e51ead59
  (moby/libnetwork#670).
- ValidateAlias and ParseAlias are unused since moby/moby@0645eb84
  (moby#42539).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- LocalKVProvider, LocalKVProviderURL, LocalKVProviderConfig,
  GlobalKVProvider, GlobalKVProviderURL and GlobalKVProviderConfig
  are all unused since moby/libnetwork@be2b6962 (moby/libnetwork#908).
- GlobalKVClient is unused since 0fa873c and c8d2c6e.
- MakeKVProvider, MakeKVProviderURL and MakeKVProviderConfig are unused
  since 96cfb07 (moby#44683).
- MakeKVClient is unused since 142b522 (moby#44875).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Prior to 0fa873c, the serf-based event loop was started when a global
store was available. Since there's no more global store, this event loop
and all its associated code is dead.

Most dead code detected by golangci-lint in prior commits is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
VNI allocations made by the overlay driver were only used by Classic
Swarm. With Swarm v2 mode, the driver ovmanager is responsible of
allocating & releasing them.

Previously, vxlanIdm was initialized when a global store was available
but since 142b522, no global store can be instantiated. As such,
releaseVxlanID actually does actually nothing and iptables rules are
never removed.

The last line of dead code detected by golangci-lint is now gone.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The overlay driver in Swarm v2 mode doesn't support live-restore, ie.
the daemon won't even start if the node is part of a Swarm cluster and
live-restore is enabled. This feature was only used by Swarm Classic.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since the previous commit, data from the local store are never read,
thus proving it was only used for Classic Swarm.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since a few commits, subnet's vni don't change during the lifetime of
the subnet struct, so there's no need to lock the network before
accessing it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since 0fa873c, there's no function writing overlay networks to some
datastore. As such, overlay network struct doesn't need to implement
KVObject interface.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Linux kernel prior to v3.16 was not supporting netns for vxlan
interfaces. As such, moby/libnetwork#821 introduced a "host mode" to the
overlay driver. The related kernel fix is available for rhel7 users
since v7.2.

This mode could be forced through the use of the env var
_OVERLAY_HOST_MODE. However this env var has never been documented and
is not referenced in any blog post, so there's little chance many people
rely on it. Moreover, this host mode is deemed as an implementation
details by maintainers. As such, we can consider it dead and we can
remove it without a prior deprecation warning.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines -65 to -71
// GetIfaceAddr returns the first IPv4 address and slice of IPv6 addresses for the specified network interface
func GetIfaceAddr(name string) (net.Addr, []net.Addr, error) {
iface, err := net.InterfaceByName(name)
if err != nil {
return nil, nil, err
}
addrs, err := iface.Addrs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a random comment 🤔

Haven't been through all changes yet, but I saw this being removed when I looked from my phone, and I was curious why it wasn't used (as some options accept either an IP-address or an interface name). But it looks like we have various "variations" on a a theme (some are test-utilities, so perhaps irrelevant). I haven't compared them fully (to see if there's specific differences between them), but was wondering if there would be use for a common implementation (or if it would be worth looking into);

  • getBindAddr

    moby/libnetwork/agent.go

    Lines 58 to 62 in dd3b71d

    func getBindAddr(ifaceName string) (string, error) {
    iface, err := net.InterfaceByName(ifaceName)
    if err != nil {
    return "", fmt.Errorf("failed to find interface %s: %v", ifaceName, err)
    }
  • resolveInterfaceAddr
    func resolveInterfaceAddr(specifiedInterface string) (net.IP, error) {
    // Use a specific interface's IP address.
    intf, err := net.InterfaceByName(specifiedInterface)
    if err != nil {
    return nil, errNoSuchInterface
    }
    addrs, err := intf.Addrs()
  • getExternalAddress
    func getExternalAddress(t *testing.T) net.IP {
    t.Helper()
    iface, err := net.InterfaceByName("eth0")
    skip.If(t, err != nil, "Test not running with `make test-integration`. Interface eth0 not found: %s", err)
    ifaceAddrs, err := iface.Addrs()
  • setupDriver
    iface, err := net.InterfaceByName("eth0")
    if err != nil {
    t.Fatal(err)
    }
    addrs, err := iface.Addrs()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libnetwork.getBindAddr returns an arbitrary interface IP address which is not an IPv6 link-local address, with no preference of IP address family. On the other hand, cluster.resolveInterfaceAddr prefers IPv4 over IPv6, neglects to skip over IPv6 link-local addresses, and returns an error if there is more than one address of either family associated with the interface. Since IPv6 requires a link-local address be assigned to each interface in addition to any routable addresses, cluster.resolveInterfaceAddr is broken for any interface which has a routable IPv6 address associated.

I question the utility of both functions. While it is kinda convenient to let the administrator specify an interface name instead of an address, specifying an interface implies a promise which the daemon can't currently keep: that the daemon will dynamically handle changes to the IP addresses associated with the interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question the utility of both functions. While it is kinda convenient to let the administrator specify an interface name instead of an address, specifying an interface implies a promise which the daemon can't currently keep: that the daemon will dynamically handle changes to the IP addresses associated with the interface.

Yeah, swarm doesn't handle IP changes at all. The "allow interface" was mostly out of convenience (ISTR part of that was created at the time for automated setups of clusters)

It still feels like a good thing to verify the various code paths to check if they're consistent (and do the right thing).

Definitely not for this PR though!

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit (and can't comment on commit messages);

This
specific label is not used anymore since https://github.com/moby/moby/commit/142b522946f67a1884228119ebce1e7e5a740364 (https://github.com/moby/moby/pull/44875).

I tend to avoid referencing PR numbers in commit messages, as they can sometimes get noisy (forks backporting fixes etc). What I usually do;

  • refer to the commit in the commit message (useful, even without GitHub around)
  • in the PR description on GitHub I add the PR number (if useful)

Either way; not a blocker 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants