Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linkcheck builder: begin using session-based HTTP requests #11340

Conversation

jayaddison
Copy link
Contributor

@jayaddison jayaddison commented Apr 18, 2023

Feature or Bugfix

  • Bugfix
  • Refactoring

Purpose

  • Reduce the number of HTTP connections used during linkchecking, reducing network resource usage

Detail

Relates

…nkchecking

This ensures that the close method of the response is called when the response variable goes out-of-scope
…nagers during linkchecking""

This reverts commit 692fdef.
@jayaddison
Copy link
Contributor Author

jayaddison commented Apr 18, 2023

Although it might seem strange, I do expect (and intend) the unit tests to fail with 4305d11 to begin with. Despite the pull request title, the session support (ad12d25) is reverted as of that commit, and so the assertion that only a single HTTP connection is used should fail.

Reverting 3bf835c should re-introduce session support, and allow those tests to pass.

@jayaddison

This comment was marked as outdated.

@jayaddison

This comment was marked as outdated.

@jayaddison jayaddison changed the title linkcheck builder: begin using session-based HTTP requests web content retrieval: begin using session-based HTTP requests Apr 18, 2023
@jayaddison jayaddison changed the title web content retrieval: begin using session-based HTTP requests linkcheck builder: begin using session-based HTTP requests Apr 18, 2023
@jayaddison

This comment was marked as outdated.

@AA-Turner
Copy link
Member

Intermittant failures on Python 3.10 with DU 0.18 or 0.19--other than that looks good.

A

@@ -424,6 +425,7 @@ def check(docname: str) -> tuple[str, str, int]:
check_request = self.wqueue.get()
next_check, hyperlink = check_request
if hyperlink is None:
self._session.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we close here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to ensure that the requests session's resources are cleaned up when the linkchecker completes -- some warnings appeared during testing to indicate that they weren't -- and this seemed to be a good place for it (the None hyperlink is used as a marker (sentinel?) to indicate that the worker should shutdown).

@jayaddison
Copy link
Contributor Author

Intermittant failures on Python 3.10 with DU 0.18 or 0.19--other than that looks good.

Thanks for the review; I'm taking a look into those test failures now.

FWIW: I saw similar behaviour using Python 3.11 before adding acec710 - and I think that it's possible that calling flush_headers during the response may help (but I'll try to confirm that).

@jayaddison
Copy link
Contributor Author

FWIW: I saw similar behaviour using Python 3.11 before adding acec710 - and I think that it's possible that calling flush_headers during the response may help (but I'll try to confirm that).

On second thoughts: the asserted header is an auth header from the (linkcheck) client - so that makes the server headers seem less-relevant. Either way: I'll investigate.

@jayaddison
Copy link
Contributor Author

jayaddison commented Apr 21, 2023

Intermittant failures on Python 3.10 with DU 0.18 or 0.19--other than that looks good.

There seem to be two problems that contribute to the failures:

I've prepared fixups for each of these -- they're testing well so far -- and may also backport them to #11318 for pedantic consistency.

@jayaddison
Copy link
Contributor Author

I've prepared fixups for each of these -- they're testing well so far -- and may also backport them to #11318 for pedantic consistency.

Maybe it's a better idea not to backport them. That would help to confirm whether these issues affect both individual (non-session) HTTP requests, or only session-based HTTP requests.

@jayaddison
Copy link
Contributor Author

With stream=False (the default) for body content reads, the server may block during this HEAD request. That might indicate a requests bug, since for an HTTP HEAD request I don't think that body content should be expected. An alternative would be to send a zero Content-Length header, as we do in another server handler fixture -- personally I'd prefer to preserve the two different server behaviours because it provides some test coverage diversity.

Even after adjusting the code to use stream=True, there still seems to be the possibility of a stalled connection (resulting in intermittent timeouts).

I'm still relatively confident that sending the Content-Length header from all of the server test handler fixtures would resolve this.. but I'm still trying to understand the problem in more detail to figure out whether there's a deeper bug that can be resolved, or whether it's unsolvable.

@jayaddison
Copy link
Contributor Author

I seem to be running in circles on this one without really figuring out why the remaining unit test failures (all timeouts, I believe) are occuring.

Could it be that multiple unit tests share the same urllib3 connection pool, and that the workers that each linkcheck test spins up aren't being released? (resource contention across tests)

I'll try to test that theory in my personal fork of the repository.

@jayaddison
Copy link
Contributor Author

I seem to be running in circles on this one without really figuring out why the remaining unit test failures (all timeouts, I believe) are occuring.

🤦 I should read before commenting!

The failures don't exhibit timeouts here, but they do exhibit the test_auth_header_uses_first_match header-recording problem.

@jayaddison
Copy link
Contributor Author

There was a failure case that I hadn't seen before in https://github.com/sphinx-doc/sphinx/actions/runs/4820054596/jobs/8584030994 from commit 0ed465c here.

From the test failure output of test_too_many_requests_retry_after_HTTP_date:

E         + Traceback (most recent call last):
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\socketserver.py", line 691, in process_request_thread
E         +     self.finish_request(request, client_address)
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\socketserver.py", line 361, in finish_request
E         +     self.RequestHandlerClass(request, client_address, self)
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\socketserver.py", line 755, in __init__
E         +     self.handle()
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\http\server.py", line 434, in handle
E         +     self.handle_one_request()
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\http\server.py", line 400, in handle_one_request
E         +     self.raw_requestline = self.rfile.readline(65537)
E         +                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
E         +   File "C:\hostedtoolcache\windows\Python\3.11.3\x64\Lib\socket.py", line 706, in readinto
E         +     return self._sock.recv_into(b)
E         +            ^^^^^^^^^^^^^^^^^^^^^^^
E         + ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

I might rebase in a moment, because I think we can drop #11349 from the changeset here. But I'd like to revisit this failure even if it doesn't reappear immediately.

@jayaddison jayaddison force-pushed the issue-11324/linkcheck-sessioned-requests branch from 0ed465c to bd2ed0b Compare April 27, 2023 13:44
@jayaddison

This comment was marked as outdated.

@jayaddison
Copy link
Contributor Author

Perhaps a better approach than attempting to merge a set of inter-related changes in a single branch here would be to gradually introduce each one as a separate, simpler pull request.

That would probably help to narrow down the cause of any problems.

Apologies for the noise @AA-Turner @francoisfreitag, and thanks for the review and support. I'm going to close this and will re-open a few smaller, isolated changes (like adding connection-measurement support in the tests) soon.

The goal continues to be to enable session-based linkchecking.

@jayaddison jayaddison closed this Apr 27, 2023
@jayaddison jayaddison deleted the issue-11324/linkcheck-sessioned-requests branch April 27, 2023 14:35
@jayaddison

This comment was marked as outdated.

@jayaddison
Copy link
Contributor Author

There was a failure case that I hadn't seen before in https://github.com/sphinx-doc/sphinx/actions/runs/4820054596/jobs/8584030994 from commit 0ed465c here.

I now have local repro of this (ConnectionResetError) error, so hopefully that will help to narrow-in on a fix.

This is an intermittent test failure in test_too_many_requests_retry_after_int_delay and seems to stop occurring when I remove the preceding test, test_TooManyRedirects_on_HEAD, suggesting that there may be an unintended dependency between those two unit tests.

@jayaddison
Copy link
Contributor Author

This is an intermittent test failure in test_too_many_requests_retry_after_int_delay and seems to stop occurring when I remove the preceding test, test_TooManyRedirects_on_HEAD, suggesting that there may be an unintended dependency between those two unit tests.

I think that this indicates that the failure occurs within test_TooManyRedirects_on_HEAD; the subsequent test test_too_many_requests_retry_after_int_delay just happens to be the runtime context where the (previous test) server thread's exception is raised and printed to standard-error, resulting in that second test failing.

This is confirmed by removing test_too_many_requests_retry_after_int_delay; the failure then appears in test_too_many_requests_retry_after_HTTP_date (the next test down) instead.

So what's the problem in test_TooManyRedirects_on_HEAD?

Well, the test involves a server that will continually redirect the client.

Meanwhile, our test client will close the connection when it reaches some threshold number of redirects.

@jayaddison
Copy link
Contributor Author

Well, the test involves a server that will continually redirect the client.

...only as long as the client performs HTTP HEAD requests, though. The HTTP GET handler should not perform a redirect...

@jayaddison

This comment was marked as outdated.

@jayaddison
Copy link
Contributor Author

This is an intermittent test failure in test_too_many_requests_retry_after_int_delay and seems to stop occurring when I remove the preceding test, test_TooManyRedirects_on_HEAD, suggesting that there may be an unintended dependency between those two unit tests.

I think that this indicates that the failure occurs within test_TooManyRedirects_on_HEAD; the subsequent test test_too_many_requests_retry_after_int_delay just happens to be the runtime context where the (previous test) server thread's exception is raised and printed to standard-error, resulting in that second test failing.

This is confirmed by removing test_too_many_requests_retry_after_int_delay; the failure then appears in test_too_many_requests_retry_after_HTTP_date (the next test down) instead.

So what's the problem in test_TooManyRedirects_on_HEAD?

Well, the test involves a server that will continually redirect the client.

Meanwhile, our test client will close the connection when it reaches some threshold number of redirects.

Ok, I think I've found the problem. It is straightforward to replicate a ConnectionResetError from test_TooManyRedirects_on_HEAD when:

  • Threaded HTTP servers are in use within the tests
  • The test HTTP servers speak the HTTP/1.1 protocol
  • Session-based requests are not in use
  • Context-managed requests are not in use

The problem occurs after:

  • The test client follows multiple HTTP HEAD redirects (that's fine)
  • The test client exceeds the max-redirect limit of 5 (also fine)
  • The test client issues an HTTP GET fallback in streaming mode (also OK)
  • The test server responds HTTP 200 OK (still fine up until now)
  • The test client disconnects, because it was in streaming mode and linkchecking is complete
  • The test server writes a response body that will never be read

The last two items may occur in the inverse order and the result will be the same.

The Content-Length header in this case does not help; the client, by design, does not intend to read the remainder of the server response.

I think it'd be possible to update the InfiniteRedirectOnHeadHandler test server handler to close the server-side connection after the response is written to fix this, without changing the test's semantics.

jayaddison added a commit to jayaddison/sphinx that referenced this pull request Apr 27, 2023
…nnection after response body is written, since we do not anticipate the client will read it

Ref: sphinx-doc#11340 (comment)
@jayaddison
Copy link
Contributor Author

Ok, I think I've found the problem. It is straightforward to replicate a ConnectionResetError from test_TooManyRedirects_on_HEAD when:
...

Here's an attempt to create a minimal repro case for this problem:

from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import time
from threading import Thread

import pytest


class TestHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        content = b"content"
        self.send_response(200, "OK")
        self.send_header("Content-Length", len(content))
        self.end_headers()
        self.wfile.write(content)
        self.wfile.flush()


class TestServer(Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.server = ThreadingHTTPServer(("localhost", 7777), TestHandler)

    def run(self):
        self.server.serve_forever()

    def join(self):
        self.server.shutdown()
        self.server.server_close()
        super().join()


@contextmanager
def server_context():
    server_thread = TestServer()
    server_thread.start()
    yield server_thread
    server_thread.join()


@pytest.mark.sphinx('linkcheck', testroot='linkcheck-localserver', freshenv=True)
def test_a(app, capsys):
    with server_context():
        app.build()


@pytest.mark.sphinx('linkcheck', testroot='linkcheck-localserver', freshenv=True)
def test_b(app, capsys):
    time.sleep(0.5)

    _stdout, stderr = capsys.readouterr()
    assert stderr == ""

A few notes:

  • The problem only seems to occur when the same app configured is re-used between the two unit tests, as specified using pytest.mark.sphinx decoration
  • The second unit test (where the error appears in the stderr output, but not where the stacktrace originates) does not have to run the linkchecker - it is enough for it to time.sleep for a long-enough duration

I'm going to try to simplify the conditions further, likely by temporarily removing code from the pytest marker and linkchecker code.

@jayaddison
Copy link
Contributor Author

Ok, after further reduction of the repro case, I don't think that anything about the ConnectionResetError / inter-unit-test-log-capture issue is specific to Sphinx. Here is a repro case where I can reliably reproduce the problem using cPython 3.11.2:

from contextlib import contextmanager
import gc
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from threading import Thread
import time

import pytest
import requests


class TestHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        content = b"content"
        self.send_response(200, "OK")
        self.send_header("Content-Length", len(content))
        self.end_headers()
        self.wfile.write(content)
        self.wfile.flush()


class TestServer(Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.server = ThreadingHTTPServer(("localhost", 8080), TestHandler)

    def run(self):
        self.server.serve_forever()

    def join(self):
        self.server.shutdown()
        self.server.server_close()
        super().join()


@contextmanager
def server_context():
    server_thread = TestServer()
    server_thread.start()
    yield server_thread
    server_thread.join()


def test_a():
    with server_context():
        requests.get('http://localhost:8080', stream=True)


def test_b(capsys):
    gc.collect()
    time.sleep(0.1)

    _stdout, stderr = capsys.readouterr()
    assert stderr == ""

Further notes:

  • Introducing gc to request garbage collection makes the issue more repeatable using cPython.
  • Therefore the theory would be: the recv_into exception relates to socket-handling (a thread?) that is deferred until the socket resources are cleaned up; that can occur outside of the unit test where those resources were created.
  • The behaviour here may be OS-specific, since some urllib3 and socket implementation functionality differs by platform.
  • There may not be much that pytest could do to handle this, since garbage-collecting during teardown of every test would likely introduce significant overhead.

I'm not sure that there's anything else to do for this particular behaviour, given that we can fix it in the unit tests by having the server set self.close_connection = True during request handling (in do_GET).

It's potentially a slightly-annoying cause of noise in server logs, but everything is behaving to-spec, as far as I can tell (maybe that could indicate an HTTP spec limitation, or under-definition of the spec somewhere - I'm not completely sure).

jayaddison added a commit to jayaddison/sphinx that referenced this pull request May 3, 2023
…nnection after response body is written, since we do not anticipate the client will read it

Ref: sphinx-doc#11340 (comment)
jayaddison added a commit to jayaddison/sphinx that referenced this pull request May 3, 2023
…nnection after response body is written, since we do not anticipate the client will read it

Ref: sphinx-doc#11340 (comment)
jayaddison added a commit to jayaddison/sphinx that referenced this pull request May 9, 2023
…nnection after response body is written, since we do not anticipate the client will read it

Ref: sphinx-doc#11340 (comment)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linkcheck builder: begin using requests.Session functionality during linkchecking
2 participants