-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.16 Backports 2024-11-05 #35781
v1.16 Backports 2024-11-05 #35781
Conversation
[ upstream commit 8190512 ] This commit changes the builder image for cilium-operator to be the cilium-builder image, rather than the golang image. This change will allow for arm64 builds of the cilium-operator to be built with CGO enabled due to the cilium-builder image containing multi-platform C compilers. This build configuration is a stepping stone for other build-time features, such as CGO binaries with race detection enabled. Fixes: 35324 Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9210dff ] Add a new endpoint flag ENDPOINT_F_ATHOSTNS that informs the datapath to deliver endpoint's traffic to the host stack, but for which ARP and IPv6 ND are still served by the bpf datapath due to the endpoint's IP being from the node's IP allocation CIDR range. This can be used for enabling IPv6 ND for Ingress IPs in future. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit fd50af6 ] Even endpoints without bpf programs or policy need ARP and IPv6 ND to work so that traffic can reach them. Place such endpoints in the bpf endpoints map (aka lxcmap). This fixes missing IPv6 ND responses for the Ingress IPs. Fixes: #32980 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 985d3d9 ] Add hubble release Make target in preparation to the phase 2 of CFP-31893 [^1] to push hubble CLI release artifacts directly from cilium/cilium repository. [^1]: https://github.com/cilium/design-cfps/blob/main/cilium/CFP-31893-move-hubble-into-cilium.md#phase-2-update-the-release-process Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 1f5be4a ] Add the kernel config option. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 392821c ] If a shared client exchange fell into the fail-safe timeout of one minute, but the handler loop (due to either an error, closing or a _very_ delayed response) would write to the now reader-less channel, it would block all future progress of this shared client. Prevent that from happening by buffering the channel for the one message it will receive. The corresponding, backportable change cilium/dns is cilium/dns#15. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 911ed9f ] Set local address for BGP peer based on user provided configuration. Signed-off-by: harsimran pabla <hpabla@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit d8f1fae ] Be consistent in how we treat the xt_socket module - if we attempt to load it, then we should also warn about any error. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 40f2d3f ] Currently, the flag `loadbalancer-l7` isn't properly registered. Even though it's working fine when reading the config property from the Cilium Configmap, it's not working when trying to use the flag as Go flag. ``` 2024-10-29T16:24:26Z debug layer=debugger Adding target 23 "/usr/bin/cilium-operator-generic-bin --config-dir=/tmp/cilium/config-map --debug=true --loadbalancer-l7=envoy" Error: unknown flag: --loadbalancer-l7 ``` Therefore, this commit is moving the config property into the respective Hive Cell where it gets properly registered. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit e52a80a ] Preparation work to extract the endpoint cluster name from the returned labels. From the caller's perspective, it is easier to find a well-known label after this patch as we can index the returned labels.Labels map instead of iterating over a string slice. Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit f22aa4f ] Before this patch, Hubble debug events would not have their endpoint cluster name set in Hubble flows. Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
…Error for peer manager client [ upstream commit 0b8beea ] When connecting to the hubble peer manager ensure we return the underlying connection error so it's easier to diagnose connection related problems. Currently the only error returned is the context timeout: ``` time="2024-09-18T21:13:13Z" level=warning msg="Failed to create peer client for peers synchronization; will try again after the timeout has expired" error="context deadline exceeded" subsys=hubble-relay target="hubble-peer.kube-system.svc.cluster.local.:443" ``` This will ensure the underlying connection level error is returned when the context is cancelled. In the future we should switch to not using grpc.WithBlock at all which avoids many of these problems, but that requires more testing. In the short-term, lets set these dial options to improve things now, in-case the switch to non-blocking dials is delayed. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit f4988cb ] This commit introduced IPv6 flow label control for SRv6 encapsulated packets. Currently IPv6 flow label is set to 0x0 value. IPv6 flow label is used to perform ECMP load-balancing in the core of the network. Setting flow label provides entropy and allows packet flows to be correctly load balanced. The commit uses bfp_get_hash_recalc helper function to populate ctx->hash field and uses the result of helper to set flow label based on 16 lsb of the hash. There is no configurable flag to control this behavior. Additional change in bpf/lib/common.h to use get_hash(ctx) helper function instead of get_hash_recalc. Working assumption is that hash calculation is performed once. Signed-off-by: Arkadiusz Kaliwoda (akaliwod) <akaliwod@cisco.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 08c311f ] There was a bug where the operator was reporting policies containing ICMP fields as invalid, even though they were accepted by the daemon. It turns out that because option.Config.EnableICMPRules is populated by `vp.GetBool()`, it's not enough just to initialize it to the default value. We must bind that command-line argument as well. Fixes: #35214 Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 123f374 ] Jordan reported that CONFIG_NETFILTER_XT_TARGET_MARK is missing in CI kernels [0]. We should also it to the documentation to make it clear that it is needed. Reported-by: Jordan Rife <jrife@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: #35542 (comment) [0] Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 13d3c46 ] Hubble CLI will fail to validate certificates with an expiration that's too large on recent MacOS versions with the following error: ``` rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: “*.hubble-relay.cilium.io” certificate is not standards compliant" ``` This is due to https://support.apple.com/en-us/103769 which specifies: > All TLS server certificates must comply with these new security requirements in iOS 13 and macOS 10.15: > ... > Additionally, all TLS server certificates issued after July 1, 2019 (as indicated in the NotBefore field of the certificate) must follow these guidelines: > ... > TLS server certificates must have a validity period of 825 days or fewer (as expressed in the NotBefore and NotAfter fields of the certificate). Since we have many users using MacOS and are likely using the default values with Hubble when starting with TLS, we should adjust the default value to work with recent MacOS versions. Users who wish to preserve the existing expiration validity can set `hubble.tls.auto.certValidityDuration` to the previous default value. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 843a6c4 ] Allow to interrupt waiting for ipcache synchronization from the kvstore if the given context expires, to prevent dangling go routines. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit e956866 ] Currently, endpoint regeneration already explicitly waits for synchronization of ipcache information from the kvstore when enabled. However, nodes contribute to the ipcache entries as well (most notably about the remote node IPs). Hence, let's additionally wait for node synchronization from kvstore before triggering regeneration, to prevent potentially breaking pod-to-node connectivity due to incomplete ipcache state. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 94acf90 ] Signed-off-by: Andrii Iuspin <andrii.iuspin@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
…ections` [ upstream commit 3ce7b9f ] Previously, when both `socketLB.enabled` and `socketLB.terminatePodConnections`were set in the Helm chart, the cilium-configmap would include the `bpf-lb-sock-terminate-pod-connections` twice. This fixes the issue by only emitting the key once. Signed-off-by: Fred Heinecke <fred.heinecke@yahoo.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 3ecac71 ] Signed-off-by: Nico Vibert <nvibert@cisco.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9ee0ee4 ] This commit removes a warning that was intended to warn the user that running in native routing mode without a correct native routing CIDR will lead to problems. However, the conditions for the warning are not fully correct. There are native routing setups where the user does not need to specify a native routing CIDR, namely: 1. The user is running in ENI or AlibabaCloud IPAM, where the native routing CIDR is auto-detected if not explicitly configured. 2. The user is running with BPF ip-masq-agent where the native routing CIDR is optional (as there are other means of specifying exclusion CIDRs). 3. The user is not running with Cilium-based masquerading. The existing warning did not trigger for case 1, as the if condition checked for `auto-direct-node-routes` rather than the `routing-mode` flag, and `auto-direct-node-routes` is not used on ENI/AlibabaCloud IPAM modes. However, the warning would trigger for cases 2 and 3, which is incorrect. Cases 2 and 3 should not emit a warning either. And indeed already have a check on the native routing CIDR that handles all three cases correctly, namely `checkIPv{4,6}NativeRoutingCIDR` here: https://github.com/cilium/cilium/blob/bc26c6d3a74e3bfbeb1149e5116de51c496dd254/pkg/option/config.go#L3451-L3476 Therefore, we can safely remove the warning in this commit, as the conditions are already covered by `checkIPv{4,6}NativeRoutingCIDR`. I have also manually tested that the check does take effect if one sets `routingMode=native` and `autoDirectNodeRoutes=true` without specifying a native routing CIDR. Fixes: 6743d28 ("daemon: warn if auto direct node routes is set without a native routing CIDR") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 07fd8d5 ] That previous statement about matching states or sockets was incorrect. The new statement was verified with several tests in our CI. Co-authored-by: Louis DeLosSantos <louis.delos@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit d5457ff ] Log "Endpoints restored, starting serving" only for the typeURL the stream is for, rather than all typeURLs the stream is capable of serving. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 3972783 ] This helm flag maps to the --trace-sock agent flag. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My commits look good, thanks!
/test-backport-1.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥰
The K8SUpdates failure in ginkgo tests seems likely to be related to #35594. |
bpf-lb-sock-terminate-pod-connections
#35703 (@solidDoWant)Once this PR is merged, a GitHub action will update the labels of these PRs: