Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: linkcheck: add client connection-pool contention coverage #11402

Conversation

jayaddison
Copy link
Contributor

Feature or Bugfix

  • Test coverage

Purpose

Detail

  • Configure an artificially-limited connection pool by patching requests.Session.get_adapter
  • Create a multi-item linkcheck workload
  • Consume one item at a time from the workload from separate threads
  • Ensure that all items were consumed and that no timeouts occurred

Relates

@jayaddison jayaddison force-pushed the issue-11324-prep/linkcheck-connection-contention-test-coverage branch from 956a0c5 to 60765d3 Compare May 8, 2023 13:17
@jayaddison
Copy link
Contributor Author

Why does the test_connection_contention test case on Windows only manage to perform 4 of the 10 linkchecks before the 5-second timeout occurs?

Comment on lines +675 to +677
# Set an upper-bound on socket timeouts globally
import socket
socket.setdefaulttimeout(5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to remove this because I think it has side-effects that outlast the unit test.

It was introduced to avoid indefinitely-blocked socket I/O when the test linkcheck client and server fail to communicate reliably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this could be a global setting for the test suite, I don’t see why an indefinitely-blocked socket I/O would be reasonable in that context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would make sense. It would also be nice if the tests could fail if that timeout limit is ever reached, because that'd almost certainly be a sign of a problem with the application code, given that we now entirely request local, non-Internet resources during tests (I believe that's true?).

Another relevant note is that, in practice, I think we already approximate this behaviour -- perhaps unintentionally -- from the time at which a linkchecker object is first constructed. That's because there is a call to the same setdefaulttimeout method here:

# set a timeout for non-responding servers
socket.setdefaulttimeout(5.0)

It does probably make sense to have a timeout in production environments, because in most cases it would be bad for a single long-delayed link to block the entire linkcheck process, but at the same time it would be good to confirm why the global timeout is in place given that there is an existing -- and configurable -- linkcheck timeout setting:

if self.config.linkcheck_timeout:
kwargs['timeout'] = self.config.linkcheck_timeout

(I also should explain that it seems OK to me that both timeouts exist: I sometimes have a tendency to remove redundancy in software to such an extent that the resulting code becomes fragile in other ways, and I'm keen that we avoid that risk here. but I think it would be good to understand the reasoning why both exist. I think it's basically to try to avoid the linkchecker ever stalling indefinitely, even in the absence of a configured linkcheck_timeout value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. My reading of the code (and https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-linkcheck_timeout) is:

  1. linkcheck_timeout=None: make sure linkcheck bails out after a reasonable amout of time (5.0 seconds), since requests won’t handle the timeout
  2. linkcheck_timeout=3.14: specify the timeout to requests

So, by default, linkcheck will not get stuck. When the default does not fit a project (developers want slower or longer timeouts), they can use linkcheck_timeout. I don’t see a way to specify no timeout, but I don’t think that’s an issue. A noteworthy surprise is that setting linkcheck_timeout=5.0 is different from using the default, as the default only matters for the socket connection, whereas requests timeout also considers the time to generate the response on the server.

I think the side-effect is not intentional but is pretty nice to have and should not be changed. There aren’t other socket.setdefaulttimeout() in the app, so linkcheck sets that default for other extensions (thinking of extlinks). My suggestion is to move the socket.setdefaulttimeout() to sphinx.application.Sphinx.__init__(). That makes the call explicit and should not break BC (since the socket.setdefaulttimeout() happened when the extensions were being setup, as setup_extension imports the module).

I believe the test suite could use the same setting as the application, or even a shorter one, considering that waiting 5 seconds is pretty long in that context. Anything above a second would probably indicate an issue IMO.

Coming back to linkcheck, we could remove the call to socket.setdefaulttimeout() (that has a side effect on the Sphinx application), and set a default of 5.0 for linkcheck_timeout, but it would break BC:

  • https://docs.python-requests.org/en/latest/user/advanced/#timeouts shows socket.setdefaulttimeout() will timeout only on what requests calls “connect”, not while reading data from the server. So a user who connected quickly but waited a long time to receive the response would see failures where it previously worked. I’m guessing we could maybe (and I don’t wanna) abuse the current implementation to pass a tuple (5.0, None) and preserve the current behavior.
  • users may call socket.setdefaulttimeout() after Sphinx.__init__(), to use their own values. That would be a bit surprising considering linkcheck_timeout exists, but who knows...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes the call explicit and should not break BC (since the socket.setdefaulttimeout() happened when the extensions were being setup, as setup_extension imports the module).

That’s actually not true. socket.setdefaulttimeout(5.0) is only called during CheckExternalLinksBuilder.__init__(), which happens when the builder is created, not when the module is imported and the builder registered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. My reading of the code (and https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-linkcheck_timeout) is:

1. `linkcheck_timeout=None`: make sure linkcheck bails out after a reasonable amout of time (5.0 seconds), since `requests` won’t handle the timeout

I'm not so sure about this - my reading of the documentation:

linkcheck_timeout

A timeout value, in seconds, for the linkcheck builder. The default is to use Python’s global socket timeout.

New in version 1.1.

... is that the unmodified global timeout for sockets is to be expected unless the operator changes the configuration value.

What we have currently is a 5-second timeout set by the application when no configuration is applied. I do think that having a timeout in production makes sense, but either the documentation or the code may need updating to bring them into consistency.

It'll take me a while longer to process the rest of your reply and to think about any further changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a separate issue soon to track resolution for this (relocation and/or removal of the socket.setdefaulttimeout calls).

@jayaddison

This comment was marked as outdated.

@jayaddison

This comment was marked as outdated.

@jayaddison

This comment was marked as outdated.

@jayaddison

This comment was marked as outdated.

@jayaddison

This comment was marked as outdated.

@jayaddison
Copy link
Contributor Author

@francoisfreitag could you check the approach I'm using in this pull request against the planned coverage?

Add a test using a pool with a single connection in the pool, and a large-but-not-too-large timeout (e.g. 1s) to exhibit the connection pool exhaustion from not response.close()-ing?

At the moment, it passes using HTTP/1.0 test webserving, as expected.

It also fails with HTTP/1.1 testserving, as expected.

However: I find that it fails (unexpectedly) when both HTTP/1.1 webservers and session-based linkchecking (that should response.close...) are enabled.

@AA-Turner AA-Turner merged commit c583e0f into sphinx-doc:master Jul 23, 2023
27 checks passed
@jayaddison
Copy link
Contributor Author

Thanks @AA-Turner!

@jayaddison jayaddison deleted the issue-11324-prep/linkcheck-connection-contention-test-coverage branch July 23, 2023 15:32
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linkcheck builder: begin using requests.Session functionality during linkchecking
3 participants