Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genindex: use – instead of – for xhtml compatibility #12386

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

methane
Copy link
Contributor

@methane methane commented May 19, 2024

Feature or Bugfix

  • Bugfix

Purpose

There is no – character entitiy in xhtml.
The epubcheck reports fatal error and epub readers like Books.app crashes.

Detail

Relates

Fix #12359

There is no – character entitiy in xhtml.
The epubcheck reports fatal error and epub readers like Books.app crashes.
@picnixz
Copy link
Member

picnixz commented May 21, 2024

I am a bit confused here. AFAICT, ndash is XHTML compliant according to https://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters. While I do not mind this change, could it be possible that 1) there is an issue with epubcheck? 2) there is an issue without our template where we do not declare correctly the DTD?

@methane
Copy link
Contributor Author

methane commented May 21, 2024

I am a bit confused here. AFAICT, ndash is XHTML compliant according to https://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters.

I am not an expert of xhtml too. See this article:

https://en.wikipedia.org/wiki/HTML5#XHTML5_(XML-serialized_HTML5)

XHTML5 is simply XML-serialized HTML5 data (that is, HTML5 constrained to XHTML's strict requirements, e.g., not having any unclosed tags),
[snip]
There is no DTD for XHTML5.[126]

So XHTML5 is not compatible with XHTML 1.0/1.1. Since there is no DTD, we can not use entities defined in DTDs.

@methane
Copy link
Contributor Author

methane commented May 21, 2024

I find the way to check this without epubcheck.
Open genindex-A.xhtml in build/epub directory. Or rename html/genindex-A.html to genindex-A.xhtml and open it.

I can see this error.

image

@picnixz
Copy link
Member

picnixz commented May 21, 2024

So I played a bit with that and indeed there is an issue. However, by replacing <!DOCTYPE html> with

<!DOCTYPE html 
	PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

I do not have issues any more (except that firefox tells me now that there is a stray doctype, which I think is because it's source-code viewer is only for HTML and not XHTML+XML files).

While I personally prefer changing the DOCTYPE to make it a real XHTML+XML file, I am not entirely sure that it should be the correct way to do it for EPUB files (I don't know the specs for them). Also, I assume other files might have this issue.

@chrisjsewell I reopened the issue because it's an issue for me. So either we change the entity using unicode points or we change the doctype (but this would only be for EPUB files I think?)

@chrisjsewell
Copy link
Member

I reopened the issue because it's an issue for me

yeh no problem; I guess my generic question would be, how come there is currently no failure of our CI, and can we add a test/build that does break it?

@picnixz
Copy link
Member

picnixz commented May 21, 2024

come there is currently no failure of our CI, and can we add a test/build that does break it?

I don't know why there is no failure but I don't know whether epubcheck wouldn't complain then (note that not all files have this ndash entity and so maybe epubcheck is not catching it).

@methane
Copy link
Contributor Author

methane commented May 21, 2024

come there is currently no failure of our CI, and can we add a test/build that does break it?

I don't know why there is no failure but I don't know whether epubcheck wouldn't complain then (note that not all files have this ndash entity and so maybe epubcheck is not catching it).

html_split_index is false by default. That's why genindex-A.xhtml is not generated.

https://github.com/methane/sphinx/actions/runs/9175879594/job/25229853450

@picnixz
Copy link
Member

picnixz commented May 22, 2024

@chrisjsewell @methane

Here's what I suggest:

  • For now, let's change that simple entity to what you are suggesting.
  • In a separate PR, check if there are some problems in the other EPUB files. For that, we would probably need to see the templates by ourselves... Not sure which flags should be enabled =/
  • Fix them separately, possibly changing the templates + doctypes.

The reason why I want to apply the fix now is to have Sphinx in a state that is "more or less enough" for everyone. If there are issues with other entities, I think people would just report it but for now, we have something that breaks CPython's docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index pages use &ndash; but it can not be used in ePub
3 participants