GH-103484: Fix broken links reported by linkcheck #103608

Merged
merged 5 commits into from
Apr 22, 2023


rffontenelle
Contributor

@rffontenelle rffontenelle commented Apr 18, 2023

This is another patch required to fix the current state of make linkcheck in Python Docs, see #103484.

This pull request fixes some broken links reported by make linkcheck. The backport to 3.11 has a few differences, and I already have a patch ready for it; I'm just waiting for any changes to this one.

Below are the reported errors and the solution I applied for each in this PR:

distributing/index.rst:128: [broken]  https://packaging.python.org/tutorials/packaging-projects/#creating-the-package-files: 
distributing/index.rst:127: [broken]  https://packaging.python.org/tutorials/packaging-projects/#packaging-python-projects:
distributing/index.rst:129: [broken]  https://packaging.python.org/tutorials/packaging-projects/#uploading-the-distribution-archives:
distributing/index.rst:130: [broken]  https://packaging.python.org/specifications/pypirc/:

The links are fine, but for some reason a newline in the document caused linkcheck to consider them broken, even though they are not broken in the rendered documentation. I removed that newline and this made linkcheck happy.

library/stdtypes.rst:1607: [broken] http://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G53253: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
library/stdtypes.rst:1767: [broken] https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf#G91002: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
library/stdtypes.rst:1906: [broken] https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G34078: 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte

That's sphinx-doc/sphinx#11041. I removed the anchors and added the section name next to its section number so the reader has no doubt about which section the text is referring to.
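The error text is consistent with the anchor check decoding a binary PDF body as UTF-8. The sketch below reproduces the exact message, assuming the PDF starts with the conventional `%PDF-1.x` header followed by a comment line of high-bit bytes; those leading bytes are an assumption, not taken from the Unicode PDFs themselves.

```python
# Assumed first bytes of a PDF: the version header, then a comment line of
# non-ASCII bytes that marks the file as binary.
pdf_head = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"


def try_decode(body: bytes) -> str:
    """Decode as UTF-8, the way an anchor search expecting HTML text might."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return "decoded fine"


# 0xe2 at index 10 starts a 3-byte sequence, but 0xe3 is not a continuation byte.
print(try_decode(pdf_head))
# 'utf-8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
```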

whatsnew/changelog.rst:18176: [broken] https://: Invalid URL 'https://': No host supplied

This comes from the code sample urllib.request.urlopen('https://...') in Misc/NEWS.d/3.9.0a1.rst. I added it to the ignored list as 'https:\/\/$' (the $ ensures it does not match any other link).
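A quick sketch of why the trailing `$` matters, assuming linkcheck applies each ignore pattern with `re.match` (anchored at the start of the URI but not at the end); the helper name `is_ignored` is hypothetical:

```python
import re

# The pattern described above; a raw string keeps the backslashes for re.
IGNORE = r"https:\/\/$"


def is_ignored(uri: str, pattern: str = IGNORE) -> bool:
    # re.match anchors at the start only, so "$" is what stops a prefix match.
    return re.match(pattern, uri) is not None


print(is_ignored("https://"))                # True: the bare placeholder
print(is_ignored("https://www.python.org"))  # False: real links are still checked
# Without "$" the pattern would match, and so skip, every HTTPS link:
print(re.match(r"https:\/\/", "https://www.python.org") is not None)  # True
```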

howto/urllib2.rst:457: [broken] http://www.voidspace.org.uk/python/articles/authentication.shtml: HTTPConnectionPool(host='www.voidspace.org.uk', port=80): Max retries exceeded with url: /python/articles/authentication.shtml (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1bb795e250>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

www.voidspace.org.uk is down, so I replaced the link with a Wayback Machine link. A code sample also used this broken link, so I replaced it with a valid one, http://www.python.org, following the previous example in the same file.
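Wayback Machine links follow the pattern `https://web.archive.org/web/<timestamp>/<original URL>`. A tiny helper for building them is sketched below; the name `wayback_url` and the `"2011"` timestamp in the usage line are hypothetical, not the snapshot used in this PR.

```python
def wayback_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine link for ``url``.

    ``timestamp`` is YYYYMMDDhhmmss or a prefix of it (e.g. "2011"); the
    archive typically redirects to the capture nearest that time.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


print(wayback_url("http://www.voidspace.org.uk/python/articles/authentication.shtml", "2011"))
```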

whatsnew/2.6.rst:174: [broken] http://www.upfrontsoftware.co.za: HTTPSConnectionPool(host='www.upfrontsoftware.co.za', port=443): Max retries exceeded with url: / (Caused by SSLError(CertificateError("hostname 'www.upfrontsoftware.co.za' doesn't match either of 'agibase.com', 'gazette.co.za', 'icinga.siyavula.com', 'icinga.upfronthosting.co.za', 'lists.agibase.com', 'test.agibase.com', 'upfrontsoftware.co.za', 'www.agibase.com', 'www.gazette.co.za'")))

Fixed by removing 'www', i.e. replacing the link with https://upfrontsoftware.co.za, which is one of the names listed in the certificate.
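The error message lists the names the certificate is valid for. The simplified hostname check below (exact names plus a single left-most `*.` wildcard, not Python's full `ssl` logic; `hostname_matches` is a hypothetical helper) shows why dropping `www` fixes the link:

```python
# Names copied from the CertificateError message above.
SANS = [
    "agibase.com", "gazette.co.za", "icinga.siyavula.com",
    "icinga.upfronthosting.co.za", "lists.agibase.com", "test.agibase.com",
    "upfrontsoftware.co.za", "www.agibase.com", "www.gazette.co.za",
]


def hostname_matches(host: str, names: list[str]) -> bool:
    """Simplified check: exact match, or one left-most '*.' wildcard label."""
    host = host.lower()
    for name in names:
        name = name.lower()
        if name == host:
            return True
        # "*.example.com" covers exactly one extra label, e.g. "a.example.com".
        if (name.startswith("*.") and host.endswith(name[1:])
                and host.count(".") == name.count(".")):
            return True
    return False


print(hostname_matches("www.upfrontsoftware.co.za", SANS))  # False
print(hostname_matches("upfrontsoftware.co.za", SANS))      # True
```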

using/mac.rst:20: [broken] https://developer.apple.com/documentation/macos-release-notes/macos-12_3-release-notes#Python: Anchor 'Python' not found
bugs.rst:39: [broken] https://devguide.python.org/docquality/#helping-with-documentation: Anchor 'helping-with-documentation' not found
whatsnew/3.4.rst:1962: [broken] https://devguide.python.org/coverage/#measuring-coverage-of-c-code-with-gcov-and-lcov: Anchor 'measuring-coverage-of-c-code-with-gcov-and-lcov' not found
bugs.rst:41: [broken] https://devguide.python.org/documenting/#translating: Anchor 'translating' not found
library/gc.rst:101: [broken] https://devguide.python.org/garbage_collector/#collecting-the-oldest-generation: Anchor 'collecting-the-oldest-generation' not found
using/unix.rst:69: [broken] https://devguide.python.org/setup/#get-the-source-code: Anchor 'get-the-source-code' not found

These links lead to the expected anchors without issue, so I added ignore entries for them.
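Sphinx's `linkcheck_ignore` is a list of regular expressions matched against each URI. The entries below are illustrative, in that style, and not necessarily the exact lines added to Doc/conf.py; the helper `skipped` assumes `re.match` semantics.

```python
import re

# Hypothetical linkcheck_ignore entries for hosts whose anchors the crawler
# cannot see. Requiring "#" keeps anchor-less links to these pages checked.
linkcheck_ignore = [
    r"https://developer\.apple\.com/documentation/.+?#.*",
    r"https://devguide\.python\.org.+?/#.*",
]


def skipped(uri: str) -> bool:
    return any(re.match(pattern, uri) for pattern in linkcheck_ignore)


print(skipped("https://devguide.python.org/setup/#get-the-source-code"))  # True
print(skipped("https://devguide.python.org/"))                             # False
```

The design choice here is to ignore only the anchored form of each URL, so a page that disappears entirely would still be reported.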

whatsnew/3.8.rst:77: [broken] https://en.wikipedia.org/wiki/Walrus#/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg: Anchor '/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg' not found

The link works, but linkcheck considers the #/... fragment an invalid anchor. I added it to the ignored anchors.
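Anchors are filtered separately through `linkcheck_anchors_ignore`, whose Sphinx default is `["^!"]`. The sketch assumes each pattern is applied with `re.match` to the fragment alone; the extra `/.*` pattern and the helper `anchor_to_check` are hypothetical.

```python
import re
from urllib.parse import urldefrag

# "^!" is Sphinx's default; "/.*" is a hypothetical entry for fragments that
# start with "/", like Wikipedia's media-viewer links.
linkcheck_anchors_ignore = ["^!", r"/.*"]


def anchor_to_check(uri: str):
    """Return the fragment that would be searched for, or None if ignored."""
    _, anchor = urldefrag(uri)
    if not anchor or any(re.match(p, anchor) for p in linkcheck_anchors_ignore):
        return None
    return anchor


walrus = "https://en.wikipedia.org/wiki/Walrus#/media/File:Pacific_Walrus_-_Bull_(8247646168).jpg"
print(anchor_to_check(walrus))                                            # None
print(anchor_to_check("https://docs.python.org/3/library/re.html#re.match"))  # re.match
```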

whatsnew/changelog.rst:16408: [broken] https://fishshell.com/docs/current/commands.html#source: Anchor 'source' not found

The source command is now on another page, so I updated the URL.

whatsnew/3.11.rst:1320: [broken] https://github.com/faster-cpython/ideas#published-results: Anchor 'published-results' not found

Anchors in Markdown files in GitHub repositories are not recognized by linkcheck, even though they work just fine. Hence I added this case to the ignored links.
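One plausible explanation, which is an assumption and not verified in this PR: in rendered Markdown, GitHub gives headings ids with a `user-content-` prefix and resolves the bare fragment with JavaScript, so a static search for `id="published-results"` finds nothing. The sketch collects `id`/`name` attributes in the spirit of linkcheck's anchor check; the HTML snippet is a simplified, assumed shape of GitHub's markup.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect every id/name attribute value, roughly what an anchor check looks for."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value:
                self.anchors.add(value)


# Assumed, simplified heading markup from a rendered README.
html = ('<h2><a id="user-content-published-results" '
        'href="#published-results">Published results</a></h2>')
finder = AnchorFinder()
finder.feed(html)
print("published-results" in finder.anchors)               # False
print("user-content-published-results" in finder.anchors)  # True
```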

whatsnew/changelog.rst:15577: [broken] https://importlib-metadata.readthedocs.io/en/latest/changelog%20(links).html#v1-5-0: 404 Client Error: Not Found for url: https://importlib-metadata.readthedocs.io/en/latest/changelog%20(links).html

Updated the URL to the new page containing the version history.

howto/functional.rst:1210: [broken] https://mitpress.mit.edu/sicp/: 404 Client Error: Not Found for url: https://mitpress.mit.edu/sicp/

Removing the trailing '/' solves the 404 Client Error.

However, there is another issue: the book is no longer freely available (although the Wayback Machine shows it used to be), so I updated the text to say "The book can be found at" instead of "Full text at".

whatsnew/2.7.rst:2105: [broken] https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html: 404 Client Error: Not Found for url: https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html

Used the Wayback Machine: since the paragraph mentions GDB 7, I linked to the latest GDB online docs archived in 2011.

using/windows.rst:554: [broken] https://support.enthought.com/hc/en-us/articles/360038600051-Canopy-GUI-end-of-life-transition-to-the-Enthought-Deployment-Manager-EDM-and-Visual-Studio-Code: 403 Client Error: Forbidden for url: https://support.enthought.com/hc/en-us/articles/360038600051-Canopy-GUI-end-of-life-transition-to-the-Enthought-Deployment-Manager-EDM-and-Visual-Studio-Code

It looks like crawling this website is not allowed: the link works in a browser but fails with curl or Sphinx's linkcheck. I added it to the ignored links.

library/readline.rst:20: [broken] https://tiswww.cwru.edu/php/chet/readline/rluserman.html#SEC9: Anchor 'SEC9' not found

The SEC9 anchor was about "Readline Init File" (Wayback Machine link). I updated the anchor to point to the same subject in the updated documentation.

faq/library.rst:780: [broken] https://twistedmatrix.com/trac/: 404 Client Error: Not Found for url: https://twisted.org/trac/

Updated URL to https://twisted.org/

whatsnew/2.5.rst:879: [broken] https://unix.org/version2/whatsnew/lp64_wp.html: HTTPSConnectionPool(host='unix.org', port=443): Max retries exceeded with url: /version2/whatsnew/lp64_wp.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1129)')))

The website is hosted by GoDaddy, and GoDaddy's certificate chain is not installed, which causes curl and linkcheck to fail. A web browser works, though, so I added this to the ignored links list.

whatsnew/changelog.rst:22339: [broken] https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_set_min_proto_version.html: 404 Client Error: Not Found for url: https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_set_min_proto_version.html

Updated the URL to the version 1.1.1 documentation, the closest published version to the 1.1.0 mentioned in the paragraph.

library/zipfile.rst:10: [broken] https://github.com/python/cpython/tree/main/Lib/zipfile.py: 404 Client Error: Not Found for url: https://github.com/python/cpython/tree/main/Lib/zipfile.py

zipfile has been a package since #98103. This change is post-3.11, so the backport must not include it, or linkcheck will report another 'broken' entry.

Doc/conf.py (outdated)
Doc/faq/library.rst (outdated)
@merwok
Member

merwok commented Apr 18, 2023

Good job fixing the links! I think the changes to whatsnew documents are appropriate here.

Member

@hugovk hugovk left a comment

Thanks for the PR! A few suggestions.

Doc/distributing/index.rst
Doc/library/stdtypes.rst (outdated)
Misc/NEWS.d/3.7.0b2.rst (outdated)
Misc/NEWS.d/3.8.0a1.rst (outdated)
@rffontenelle
Contributor Author

GitHub allows me to add suggestions to a batch and then commit them as a single commit. I didn't find any mention of this feature in the devguide. May I use it to simplify things?

@merwok
Member

merwok commented Apr 18, 2023

Of course

rffontenelle and others added 2 commits April 18, 2023 21:34
- Remove extra diff line in faq/library.rst (merwok)
- Use HTTPS to link Unicode 15.0.0 to solve a redirect (hugovk)
- Use wayback machine link for openssl 1.1.0 instead of linking 1.1.1, "as this text mentions a feature from 1.1.0" (hugovk)

Co-authored-by: Éric <merwok@netwok.org>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Doc/conf.py (outdated)
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Member

@hugovk hugovk left a comment

Thank you!

@rffontenelle
Contributor Author

rffontenelle commented Apr 19, 2023

NOTE: backporting to 3.11 requires the following changes:

@hugovk
Member

hugovk commented Apr 19, 2023

I've merged #103610.

Member

@AA-Turner AA-Turner left a comment

Thank you!

A

Doc/conf.py (outdated)
Doc/distributing/index.rst
Comment on lines +1211 to +1212
Gerald Jay Sussman with Julie Sussman. The book can be found at
https://mitpress.mit.edu/sicp. In this classic textbook of computer science,
Member

The MIT Press website just says that the book is out of print -- I found another resource on the MIT domain, but I don't know how long-term the link will be.

Contributor Author

There is also the Wayback Machine (available long-term), which can show the point in history when the book was freely available. I didn't link it before because I thought it wouldn't be appropriate, considering they are selling it now 🤷‍♂️

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@hugovk hugovk merged commit caed494 into python:main Apr 22, 2023
16 checks passed
@rffontenelle
Contributor Author

@hugovk Thanks for merging. Would it be possible to trigger a backport for 3.11?

@arhadthedev arhadthedev added the needs backport to 3.11 only security fixes label Apr 22, 2023
@miss-islington
Contributor

Thanks @rffontenelle for the PR, and @hugovk for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Contributor

Sorry, @rffontenelle and @hugovk, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker caed49448d195565940caf198cf0edda65ee5679 3.11

@arhadthedev
Member

@rffontenelle Could you backport it manually, please? You need:

  • pip install cherry_picker
  • the 3.11 branch:
    • add python/cpython into your remotes (as, for example, upstream) and fetch it
    • create a local branch from upstream/3.11
    • push the local branch into your origin
  • python.exe -m cherry_picker caed49448d195565940caf198cf0edda65ee5679 3.11
  • create a PR from the backport-* branch the tool will create
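The steps above can be sketched as a dry-run script that prints the commands in order and only executes them when asked. The remote name `upstream`, the helper names `backport_commands` and `run`, and running `cherry_picker` via `sys.executable` are assumptions for illustration; `git remote add` fails if the remote already exists, and opening the PR from the `backport-*` branch remains a manual step on GitHub.

```python
import subprocess
import sys


def backport_commands(sha: str, branch: str, remote: str = "upstream",
                      url: str = "https://github.com/python/cpython.git") -> list[list[str]]:
    """Return, in order, the commands for the manual backport steps."""
    return [
        [sys.executable, "-m", "pip", "install", "cherry_picker"],
        ["git", "remote", "add", remote, url],
        ["git", "fetch", remote],
        # Local branch created from the upstream maintenance branch, then pushed.
        ["git", "checkout", "-b", branch, f"{remote}/{branch}"],
        ["git", "push", "origin", branch],
        [sys.executable, "-m", "cherry_picker", sha, branch],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run(backport_commands("caed49448d195565940caf198cf0edda65ee5679", "3.11"))
```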

rffontenelle added a commit to rffontenelle/cpython that referenced this pull request Apr 22, 2023
* Doc: Fix broken links reported by linkcheck

* Apply suggestions from code review

- Remove extra diff line in faq/library.rst (merwok)
- Use HTTPS to link Unicode 15.0.0 to solve a redirect (hugovk)
- Use wayback machine link for openssl 1.1.0 instead of linking 1.1.1, "as this text mentions a feature from 1.1.0" (hugovk)

Co-authored-by: Éric <merwok@netwok.org>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>

* Doc: Make mark-up code as literal

* Doc: Alphabetize items in linkcheck_ignore

Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>

* Doc: Improve comment in sphinx conf

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

---------

Co-authored-by: Éric <merwok@netwok.org>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
(cherry picked from commit caed494)
@bedevere-bot

GH-103683 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Apr 22, 2023
@rffontenelle
Contributor Author

The cherry-pick failed because the changes to Doc/library/stdtypes.rst in main do not apply to 3.11. I forgot to mention that earlier.

@rffontenelle rffontenelle deleted the fix-broken-links branch April 22, 2023 15:30
hugovk pushed a commit that referenced this pull request Apr 23, 2023
…103683)

Co-authored-by: Éric <merwok@netwok.org>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Fix broken links reported by linkcheck (#103608)

8 participants