Flaky unit test: test_auth_header_uses_first_match #11348
(I'm opening this because I have a theory about why it's happening, and a potential fix that I'd like to track against this bug separately from other discussions.)
For a sample build failure, see the logs at https://github.com/jayaddison/sphinx/actions/runs/4765570057/jobs/8471467146?pr=2#step:10:1918 from commit jayaddison@88fb703. The test in question spins up an HTTP server that captures the client's request headers during linkchecking. The server runs in a separate thread, and should be closed and shut down before the unit test continues and extracts results from a shared list. I think that the server may not always have finished recording the headers by the time the test inspects that list.
The linked failure is a timeout, not the result of an unexpected list.
Thanks; my mistake. See https://github.com/jayaddison/sphinx/actions/runs/4765570057/jobs/8471467146?pr=2#step:10:1918 for an example of the failure. I agree that it is unusual that the header value is not found in the list; the thread/timing theory is the best explanation that I have so far.
Your first link was actually correct, I misread the stdout capture as the test output 🤦. Sorry! Here’s what I believe happens:
The timeout is the root cause of the failure. If the request wasn’t interrupted, the server would have received and processed it, and the test would have succeeded.
I'm thinking a bit more about this as a detail, and what it might imply. We would expect to see that behaviour if the client was blocked waiting to obtain a connection from the connection pool (maybe it's not the only way that a timeout could occur without the server receiving a request, but it is one possibility?).
@francoisfreitag I appreciate your help on this - I have to admit to being fairly confused about the latest results from #11340, where multiple unit test failures on a30596f show this behaviour. It seems like a good puzzle, and I think that session-based requests should be unit-testable, but I don't seem to be making much progress on finding the remaining problems.
It could be yes. As you say, it’s not the only way.
That’s surprising indeed. I can reproduce when running just that test in a tight loop. 🤔
With a bisect, the culprit seems to be the commit that enabled the threaded test server. I believe the reason is that ThreadingHTTPServer uses daemon threads (its daemon_threads attribute is True).
Since we have a daemon thread, we may kill the server but have the daemon thread (which handles the request) linger, and have the test check the list content before the daemon thread gets to send the response. |
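That race can be modelled with plain threading (a toy sketch, not the actual Sphinx test; the delay and header string are invented):

```python
import threading
import time

records = []  # shared list that the "test" inspects

def handle_request():
    # Stand-in for a handler thread that is still busy when the server
    # is torn down: it only records the headers after a delay.
    time.sleep(0.2)
    records.append("Authorization: Basic ...")

# daemon=True mirrors ThreadingHTTPServer.daemon_threads == True:
# nothing waits for this thread unless we join it explicitly.
handler = threading.Thread(target=handle_request, daemon=True)
handler.start()

snapshot = list(records)  # the "test" checks the list immediately
print(snapshot)           # [] -- the daemon thread has not appended yet

handler.join()            # joining the thread removes the race
print(records)            # ['Authorization: Basic ...']
```

The same window exists in the real test whenever the server is shut down without waiting for its handler threads.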
That's a good line of thinking - I'm tempted to try setting daemon_threads = False on the test server. At https://github.com/sphinx-doc/sphinx/actions/runs/4811667688/jobs/8565941211?pr=11340#step:10:1928 there is some output from the client that suggests to me that it does receive a response:
Why was the response received by the client, then?
(sphinx/tests/test_build_linkcheck.py, lines 188-190 at 4eb706e)
It’s not incompatible; I believe we can have the following situation:
Ok, that makes some sense. During a recent rebase of #11340, I dropped a previous commit 4d485ae (now off-branch) that was experimenting along those lines:

```diff
--- a/tests/test_build_linkcheck.py
+++ b/tests/test_build_linkcheck.py
@@ -252,9 +252,9 @@ def capture_headers_handler(records):
             self.do_GET()
 
         def do_GET(self):
+            records.append(self.headers.as_string())
             self.send_response(200, "OK")
             self.end_headers()
-            records.append(self.headers.as_string())
 
     return HeadersDumperHandler
```
... so perhaps that could be reintroduced. Completion of the response would then only happen after the headers have been recorded, so a client that has received a response can rely on the shared list being populated.
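As a self-contained sketch of why that ordering matters (not the actual test suite code; the handler mirrors the reordered diff, and the port and header values are invented), once the append precedes `send_response`, any client that has received a complete response can rely on the record existing:

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread

def make_handler(records):
    class HeadersDumperHandler(BaseHTTPRequestHandler):
        def do_HEAD(self):
            self.do_GET()

        def do_GET(self):
            # Record the headers *before* starting the response, so the
            # client cannot observe a completed response while the
            # shared list is still empty.
            records.append(self.headers.as_string())
            self.send_response(200, "OK")
            self.end_headers()

        def log_message(self, *args):  # keep test output quiet
            pass

    return HeadersDumperHandler

records = []
server = ThreadingHTTPServer(("localhost", 0), make_handler(records))
port = server.server_address[1]
thread = Thread(target=server.serve_forever)
thread.start()
try:
    conn = http.client.HTTPConnection("localhost", port)
    conn.request("GET", "/", headers={"Authorization": "Basic dummy"})
    conn.getresponse().read()
    conn.close()
finally:
    server.shutdown()
    server.server_close()
    thread.join()

# A response was received, so the record must already be present.
print("Authorization: Basic dummy" in records[0])  # True
```

Even with daemon handler threads, the record cannot lag behind the response in this ordering.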
Before introducing that change, I had seen cases where client-server interaction seemed to pause (perhaps deadlocked, I thought), resulting in timeouts on the client in some of the tests. At 7162671 (off-branch) I experimented by removing it again. By enabling the threaded test server (originally commit cbedb4b, again now off-branch), I felt that multitasking had been unblocked, since the workloads were no longer all waiting for time (and blockable resources) within a single thread.
Also, an additional benefit (one that I did hope for when switching to the threaded server) is added realism. There is some risk in the switch to multi-threading since it could expose test or implementation fragility -- but the reward is that it may expose genuine implementation problems. Ideally we'd want tests that are both reliable (to confidently catch regressions and maintain functionality while allowing development to proceed quickly) and also realistic (providing coverage for situations like communication-timing issues).
Before re-introducing that commit to #11340, eleven test failures had occurred for the latest commit (a30596f). After cherry-picking 4d485ae, the failure count dropped to zero. (Note: that's based on a single run of the unit tests for each of those commits; running the tests repeatedly would help to gain more confidence.)
Edit: I had been reading a somewhat-but-not-exactly-related documentation section (about ThreadingMixIn, I believe); checking the attributes directly shows the difference:

```
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from socketserver import ThreadingMixIn
>>> ThreadingMixIn.daemon_threads
False
>>> from http.server import ThreadingHTTPServer
>>> ThreadingHTTPServer.daemon_threads
True
```
...and I have a goldfish-like memory. I experimented with the daemon-thread setting at one point. I later removed that, generally on the principle that the smallest set of changes to achieve the desired result is best, and because I did not have a reason to believe that enabling daemon threads was helping.
I think that this bug was only relevant to my development branch of #11340. I'll re-open it if this test failure begins occurring again in the mainline branch. |
...
Not quite zero, after all. Although applying the change does significantly reduce the number of occurrences, the empty-headers result does still appear: https://github.com/sphinx-doc/sphinx/actions/runs/4820527285/jobs/8585137268#step:10:1889
Wrong link (copy-paste error); that should be: https://github.com/sphinx-doc/sphinx/actions/runs/4820527285/jobs/8585137268#step:10:1889
That’s not what I see in CPython source:
I’m unsure how to replicate. On 4eb706e, with Python 3.10.10 and docutils 0.19, this test takes less than 0.2s on my machine.
I do see value here, but that also opens up a can of worms for the implementation, where the server needs to track its threads and join them. It’s certainly possible, but given that requests are few and mostly plain HEAD/GET, I’m not sure the tradeoff is worth it. 🤷
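For reference, `ThreadingMixIn` (Python 3.7+) already does that bookkeeping when `daemon_threads` is False: with `block_on_close` left at its default of True, it tracks non-daemon handler threads and `server_close()` joins them. A hedged sketch of that approach (names invented, not the actual test code):

```python
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread

class JoiningHTTPServer(ThreadingHTTPServer):
    # ThreadingHTTPServer sets daemon_threads = True; reverting it to
    # False means ThreadingMixIn tracks each handler thread and
    # server_close() joins them before returning.
    daemon_threads = False

records = []

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Deliberately the "risky" ordering: record *after* responding.
        self.send_response(200, "OK")
        self.end_headers()
        records.append(self.headers.as_string())

    def log_message(self, *args):  # keep test output quiet
        pass

server = JoiningHTTPServer(("localhost", 0), Handler)
port = server.server_address[1]
thread = Thread(target=server.serve_forever)
thread.start()

conn = http.client.HTTPConnection("localhost", port)
conn.request("GET", "/", headers={"X-Probe": "1"})
conn.getresponse().read()
conn.close()

server.shutdown()
server.server_close()  # joins the handler thread: the append has happened
thread.join()

print("X-Probe: 1" in records[0])  # True
```

The trade-off mentioned above still stands: this serializes teardown behind every in-flight handler.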
Ok, thank you. I had been reading the documentation rather than the CPython source, I think.
I'm having difficulty replicating that slow performance result currently too, when testing against both 7162671 and 4eb706e.
That's reasonable. I'd prefer to provide some level of parallelism testing alongside the session-based linkcheck client upgrade, though: if the change isn't robust, then someone else is going to have to spend time investigating later, and I'd prefer not to cause (or at least to minimize) that time cost.

That said: I'm not particularly proud of how verbose I was during all this. I was determined to figure out various problems, and would like to co-operate effectively, but I think I verged on spamming.

I'll continue to try to replicate the slow-test-duration problem, but may pause for a while. If I'm not contributing effectively then it's better to take some time out.
Ok: ed45e3d provides a relevant testing commit. From that commit, I repeatedly find that the test runs slowly (the 20s+ duration I quoted earlier may have been my memory of how long the entire run took). The relevant changes from that commit are:
Could it be that the connection to the server is not immediately closed with HTTP/1.1 (which uses persistent connections by default), and the server-side handler thread lingers waiting on the open connection? I believe you’re familiar with https://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow:
So, if I’m following correctly:
What do you think of the following plan:
I believe it would solve the actual issue with linkcheck, keep the test suite reasonably simple, and hopefully prevent regressions. P.S. even though verbose, I do appreciate your work on identifying and addressing the issue, and detailing your thoughts does help sometimes :)
Yep, that's a great and concise description of the problem, thank you :) The same conditions also affect the HTTP HEAD requests made by the linkcheck builder.
👍
👍 that sounds good.
Perfect, yep - good engineering practice to demonstrate the problem being solved before adding/modifying code to address it. In this case, I lacked a clear understanding of what I was solving (even though I had a vague idea something was wrong).
Thanks again!
I’ve been running the test in a tight loop on bd2ed0b for a good hour (so more than 20k attempts), without reproducing. That sounds like a blip, possibly from the machine running the test being momentarily slow. When the server begins its answer (before actually sending bytes over the network), it logs a line in the standard request-log format. All I can say is that it took more than 0.075s for the request to be sent, and for the server to start processing it. It seems rare enough that we probably don’t need to spend too much time on it. But up to you.
Ok, thanks for testing that. That's certainly convincing enough for me not to worry about it, and to redevelop a fresh pull request (with the extra test, this time). Perhaps that timed-out test was affected by exhaustion of the connection pool client-side? (Perhaps explaining the fact that the server never produced any log output; see also yet another side quest that seemed to hint at inter-test resource interference related to garbage collection.) Anyway: I'll begin developing the planned changes.
@francoisfreitag Sanity-checking a detail here: does this imply that during the test, the server would need to be able to handle more requests in parallel than there are connections in the client's connection pool? (in other words: that it must be clear that the test is exercising the client resource consumption behaviour, and not the limitations of the server)
For what I had in mind, the server capacity did not matter. I was expecting the server to keep the connection alive until the keep-alive timeout, since the first request connection was not closed properly. With the first request holding the connection from the pool (of one connection), the second request would have to wait until it was released. I haven’t tested this at all, so don’t put too much weight behind my words.
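That suspicion can be mimicked with a toy pool of one connection (a pure-stdlib simulation; requests/urllib3's real blocking pool behaves along these lines but is not used here):

```python
import queue
import threading
import time

# A pool holding a single connection: checkout blocks until the
# connection is returned, mimicking a blocking pool of maxsize=1.
pool = queue.Queue(maxsize=1)
pool.put("conn-1")

timeline = []

def first_request():
    conn = pool.get()               # takes the only connection
    timeline.append("req1 acquired")
    time.sleep(0.2)                 # body never read/closed promptly,
                                    # so the connection stays checked out
    timeline.append("req1 releasing")
    pool.put(conn)                  # only now is the pool usable again

def second_request():
    conn = pool.get(timeout=5)      # blocks until req1 releases
    timeline.append("req2 acquired")
    pool.put(conn)

t1 = threading.Thread(target=first_request)
t2 = threading.Thread(target=second_request)
t1.start()
time.sleep(0.05)                    # make sure req1 wins the race
t2.start()
t1.join()
t2.join()

print(timeline)  # ['req1 acquired', 'req1 releasing', 'req2 acquired']
```

If the first connection is never released (e.g. held until a keep-alive timeout), the second request's wait turns into the kind of client-side timeout discussed above.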
No problem, that approach sounds sensible - just wanted to check, so thanks for confirming. I hope to take a look at that tomorrow.
Describe the bug

The test_auth_header_uses_first_match test seems to fail intermittently, particularly when threaded test HTTP servers are in use (as in #11340).

How to Reproduce

Visible in continuous build output in jayaddison#2.
Environment Information
Sphinx extensions
Additional context
No response