Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable GSO and ECN on kernels older than version 5 #4456

Merged
merged 1 commit into from
Apr 24, 2024

Conversation

marten-seemann
Copy link
Member

Fixes #4396. Fixes #4178. Fixes #4446. Closes #4447.

I'm tired of the constant stream of issues with ECN and GSO. These things should just work. These are always very hard to reproduce, and mostly due to some old buggy kernel implementation that has been fixed for years, but is still out there in the wild, with no hope of ever getting updated.

New approach: disable ECN and GSO on all old (before version 5) kernels. This has some false positives, but if somebody is running such an ancient kernel, they probably don't care that much about performance anyway.

cc @heixiaoma @tobyxdd @requilence @arashpayan @ainar-g

@marten-seemann marten-seemann force-pushed the screw-old-kernels branch 3 times, most recently from 689e9d3 to 30d7981 Compare April 23, 2024 15:38
Copy link

@ainar-g ainar-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like us to try and release an edge build of AdGuard Home with quic-go from this branch? As far as I can see, there are minor API changes as well.

sys_conn_helper_linux.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 82.75862% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 84.67%. Comparing base (6a4512a) to head (7135257).
Report is 2 commits behind head on master.

Files Patch % Lines
sys_conn_helper_linux.go 84.00% 2 Missing and 2 partials ⚠️
sys_conn_oob.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4456      +/-   ##
==========================================
+ Coverage   84.65%   84.67%   +0.02%     
==========================================
  Files         152      152              
  Lines       14419    14443      +24     
==========================================
+ Hits        12206    12229      +23     
+ Misses       1707     1706       -1     
- Partials      506      508       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marten-seemann marten-seemann force-pushed the screw-old-kernels branch 2 times, most recently from 4dbc8ad to 6257f2b Compare April 23, 2024 15:46
@marten-seemann
Copy link
Member Author

Would you like us to try and release an edge build of AdGuard Home with quic-go from this branch? As far as I can see, there are minor API changes as well.

@ainar-g Yes, would be great to see if this actually resolves the problem.

@ainar-g
Copy link

ainar-g commented Apr 23, 2024

@marten-seemann, I've discovered #4457 while testing, and it seems to be a blocker for us, unfortunately. Feel free to wait for feedback from others or merge as-is.

@marten-seemann
Copy link
Member Author

@marten-seemann, I've discovered #4457 while testing, and it seems to be a blocker for us, unfortunately. Feel free to wait for feedback from others or merge as-is.

Fixed in #4458, and rebased this PR. Can you give it another try please?

@arashpayan
Copy link
Contributor

I think this is a great idea. I'd personally take it a step further and disable it on anything older than 5.4, 5.10 or even 5.15 which are all LTS releases. 5.0 was EOL in 2019.

@tobyxdd
Copy link
Contributor

tobyxdd commented Apr 24, 2024

I have no problem with this, but don't we know the exact versions that added the support for these features? For example, for #4396 it's 4.9.

It's too shotgun to disable it on anything older than 5.0 imo

@marten-seemann
Copy link
Member Author

I have no problem with this, but don't we know the exact versions that added the support for these features? For example, for #4396 it's 4.9.

@tobyxdd We could, but it seems like the initial implementations of both GSO and ECN contained bugs that were fixed in later (patch?) releases. Can't really fault the Linux maintainers for that, but the problem is that the update cycles are extremely long in the wild, and we still end up running on those systems years later from time to time.

I think this is a great idea. I'd personally take it a step further and disable it on anything older than 5.4, 5.10 or even 5.15 which are all LTS releases. 5.0 was EOL in 2019.

@arashpayan Are you aware of any problems in older 5.x releases? The intent here is not to punish people for using outdated software. I just don't want to cause people problems when they run quic-go there.

sys_conn_oob.go Outdated Show resolved Hide resolved
func isGSOSupported(syscall.RawConn) bool { return false }
func isGSOEnabled(syscall.RawConn) bool { return false }

func isECNEnabled() bool { return !isECNDisabledUsingEnv() }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to disable it for older macos versions too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven’t heard any complaints about macOS. Let’s wait and see?

@requilence
Copy link

requilence commented Apr 24, 2024

The GSO control message was introduced in kernel version 4.18. According to my observations, many Android devices still use the 4.19 kernel. Additionally, Google still supports upgrading devices with the 4.19 kernel to the latest Android 14. However, I haven't found any public data on the distribution of Android kernels in the wild.

At the same time, I observed cases when using GSO on android kernels which actually doesn't support it caused total stream stuck. It's much more critical than enhancing a performance with this optimisation. So, at the end, I think disabling GSO on kernels older than 5.0 in the quic-go library totally makes sense, as the Linux kernel repo contains numerous fixes for GSO between 4.18 and 5.0.

@marten-seemann
Copy link
Member Author

Thanks for all the input here everyone!

@marten-seemann marten-seemann merged commit 12aa638 into master Apr 24, 2024
34 checks passed
@bt90
Copy link
Contributor

bt90 commented Apr 24, 2024

According to my observations, many Android devices still use the 4.19 kernel.

Depending on the manufacturer, the switch to the 5.x kernel only happened on devices that launched with Android 11 or 12. So if the device is older than ~2 years, chances are it's still running the 4.x kernel.

image

Source: https://source.android.com/docs/core/architecture/kernel/android-common#feature-and-launch-kernels

That said, blocking GSO/ECN on anything pre 5.x is still the right call. quic-go can't and shouldn't try to unfuck the Android ecosystem.

@marten-seemann marten-seemann deleted the screw-old-kernels branch April 25, 2024 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants