Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linkcheck builder: streamed HTTP requests do not close their local socket #11317

Closed
jayaddison opened this issue Apr 11, 2023 · 3 comments · Fixed by #11318
Closed

linkcheck builder: streamed HTTP requests do not close their local socket #11317

jayaddison opened this issue Apr 11, 2023 · 3 comments · Fixed by #11318
Labels
builder:linkcheck type:enhancement enhance or introduce a new feature

Comments

@jayaddison
Copy link
Contributor

Describe the bug

Running pytest tests/test_build_linkcheck.py emits ResourceWarnings related to unclosed sockets, such as:

tests/test_build_linkcheck.py::test_invalid_ssl
  /usr/lib/python3.11/logging/__init__.py:1572: ResourceWarning: unclosed <socket.socket [closed] fd=15, family=2, type=1, proto=6>
    next_f = f.f_back
  Enable tracemalloc to get traceback where the object was allocated.
  See https://docs.pytest.org/en/stable/how-to/capture-warnings.html#resource-warnings for more info.

This occurs when HTTP requests are made using the requests library with stream=True enabled (approximate behaviour: do not begin consuming content from the connection immediately, and provide it iteratively as it is received).

This occurs in a couple of places currently:

I'm not certain whether this implies a bug/problem. The upstream requests project mentions that connection pooling and the cost of recreating sockets (as opposed to re-using existing ones) can be significant.

I've found two possible resolutions locally (either: make requests using a context-manager, so that the __exit__ implicitly calls .close, or: call .close on the response object explicitly), and will offer a pull request with one of those.

How to Reproduce

Simple repro case:

$ pytest tests/test_build_linkcheck.py`

To trace where the resource allocations originally occurred:

$ python3 -X tracemalloc=20 -m pytest tests/test_build_linkcheck.py

Environment Information

Platform:              linux; (Linux-6.1.0-7-amd64-x86_64-with-glibc2.36)
Python version:        3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0])
Python implementation: CPython
Sphinx version:        6.2.0+/2c83af0aa
Docutils version:      0.19
Jinja2 version:        3.1.2
Pygments version:      2.15.0

Sphinx extensions

None

Additional context

Discovered while investigating #11299, although does not appear directly related.

@jayaddison
Copy link
Contributor Author

I'm not certain whether this implies a bug/problem. The upstream requests project mentions that connection pooling and the cost of recreating sockets (as opposed to re-using existing ones) can be significant.

Note: TCP connection re-use -- something that would likely provide performance benefits and resource reduction for the linkchecker -- requires use of requests session functionality, and we're not currently using that.

That's a slight tangent, because somehow I think that there could still be an additional cost introduced here by ensuring that each request socket is closed. But I think closing them is probably a sensible thing to do, and that use of sessions could be introduced separately.

@picnixz
Copy link
Member

picnixz commented Apr 14, 2023

  • In general, sessions are good if you know that you'll maintain your session for more than a single call. With sessions, you can also use them as context managers so, they will automatically close upon exit.

  • Closing the request should always be done, whether automatically or manually. It is not a good idea to leave resources opened (first, because you can have a memory leak, and secondly it might be a security issue (e.g., CWE-404)).

@jayaddison
Copy link
Contributor Author

Closing the request should always be done, whether automatically or manually. It is not a good idea to leave resources opened (first, because you can have a memory leak, and secondly it might be a security issue (e.g., CWE-404)).

That makes sense, and #11318 should perform the relevant request-and-socket closing. I'm not sure we should merge that until we determine the reason that #11299 is happening, though: it may indicate a problem here or elsewhere.

In general, sessions are good if you know that you'll maintain your session for more than a single call. With sessions, you can also use them as context managers so, they will automatically close upon exit.

Yep, I think we should begin using sessions as a followup improvement for the linkchecker. It'd be nice to be able to demonstrate how the connection-pooling behaviour changed before-and-after that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
builder:linkcheck type:enhancement enhance or introduce a new feature
Projects
None yet
4 participants