Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datapath/iptable: fuzzy match for devices than exact match #37450

Conversation

liyihuang
Copy link
Contributor

@liyihuang liyihuang commented Feb 5, 2025

Use device filter to fuzzy match the devices than the exact match to match the users input for devices and MasqueradeInterfaces. We described this in the doc but not really doing it in the code.

we need to backport it all the way to 1.15 since that's when we introduce the MasqueradeRouteSource
Fixes: #37448

Fix traffic not getting masqueraded with wildcard devices or egress-masquerade-interfaces when enable-masquerade-to-route-source flag is set.

Use device filter to fuzzy match the devices than the exact match
to match the users input for `devices` and `MasqueradeInterfaces`.
We described this in the doc but not really doing it in the code.

Fixes: cilium#37448

Signed-off-by: Liyi Huang <liyi.huang@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 5, 2025
@liyihuang
Copy link
Contributor Author

/test

@liyihuang
Copy link
Contributor Author

/ci-integration

@liyihuang
Copy link
Contributor Author

/ci-ginkgo

@liyihuang
Copy link
Contributor Author

/ci-ipsec-upgrade

1 similar comment
@liyihuang
Copy link
Contributor Author

/ci-ipsec-upgrade

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

5 similar comments
@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang
Copy link
Contributor Author

liyihuang commented Feb 6, 2025

I really think the CI failure for e2e upgrade is not related to this PR, but also don't know what caused it

@liyihuang
Copy link
Contributor Author

/ci-e2e-upgrade

@liyihuang liyihuang marked this pull request as ready for review February 6, 2025 01:13
@liyihuang liyihuang requested a review from a team as a code owner February 6, 2025 01:13
@liyihuang liyihuang requested a review from aditighag February 6, 2025 01:13
@liyihuang
Copy link
Contributor Author

liyihuang commented Feb 6, 2025

it looks like CI passed when only retriggering the failed ones.

Confirmed in the slack channel. its the CI issue

@joestringer
Copy link
Member

joestringer commented Feb 10, 2025

Even if this is the correct behavior per the documented steps, I worry a little that this may change the way Cilium behaves in production environments and whether that in itself could cause more problems than the bug being fixed. Do you have a sense for the risk of making this change? Could we start dropping traffic that was previously passed for instance?

(To be clear, this question relates to the backports; for future versions and even v1.17.x it sounds like the fix makes sense 👍 )

@joestringer joestringer added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Feb 10, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 10, 2025
@joestringer
Copy link
Member

Ah, I see the reproducer would require the user to specify the masquerade route source (unusual config) with eth+ or similar. Maybe the risk is not so high.

@joestringer joestringer added affects/v1.15 This issue affects v1.15 branch affects/v1.16 This issue affects v1.16 branch needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch affects/v1.17 This issue affects v1.17 branch labels Feb 10, 2025
@liyihuang
Copy link
Contributor Author

Even if this is the correct behavior per the documented steps, I worry a little that this may change the way Cilium behaves in production environments and whether that in itself could cause more problems than the bug being fixed. Do you have a sense for the risk of making this change? Could we start dropping traffic that was previously passed for instance?

(To be clear, this question relates to the backports; for future versions and even v1.17.x it sounds like the fix makes sense 👍 )

I personally dont think this will cause any issue unless the users don't use this feature correctly (for example the packet following the default/other route ip table rules via eth0, and this bug fix will make the packet going to eth1 or eth1.1 based on the route table.) that's the only changes that I can think of.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

Based on the documentation, I wonder if the correct way to get the expected masquerading behavior is by setting the flag egress-masquerade-interfaces to something like eth+.

With the enable-masquerade-to-route-source: "true" option, Cilium will, by default, use interfaces listed in the devices field as the egress masquerade interfaces when egress-masquerade-interfaces is empty. When egress-masquerade-interfaces is set, it takes precedence over devices to choose which network interface should perform masquerading. You can set egress-masquerade-interfaces to match multiple interfaces like this: eth+ ens+.

@liyihuang
Copy link
Contributor Author

liyihuang commented Feb 12, 2025

yes. and the more common config is people using the eth+ in devices field and egress-masquerade-interfaces is empty, and we need to handle that. People could also configure the egress-masquerade-interfaces with eth+. Before this PR, we say it could take eth+ in doc but it can't.

Btw the doc that you copied is also written by me where I have incorrect understanding about how it works

Hope I made it clear.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a release note before the PR is merged.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 12, 2025
@aditighag aditighag added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 12, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 12, 2025
@joestringer joestringer added this pull request to the merge queue Feb 12, 2025
Merged via the queue into cilium:main with commit a650509 Feb 12, 2025
72 checks passed
@aditighag
Copy link
Member

@liyihuang FYI - I rephrased the release note as its purpose is to convey "what user-facing bug was fixed" rather than "how the bug was fixed".

@julianwiedmann julianwiedmann added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/snat Relates to SNAT or Masquerading of traffic and removed affects/v1.17 This issue affects v1.17 branch labels Feb 13, 2025
@liyihuang liyihuang deleted the pr/liyihuang/fuzzy_match_devices_in_enable_masquerade_route_source branch February 14, 2025 16:07
@nbusseneau nbusseneau mentioned this pull request Feb 14, 2025
22 tasks
@nbusseneau nbusseneau added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Feb 14, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.15 This issue affects v1.15 branch affects/v1.16 This issue affects v1.16 branch backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. feature/snat Relates to SNAT or Masquerading of traffic ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MasqueradeRouteSource only looking for exact match
5 participants