LANGUAGE environment variable inconsistently affects output of objects.inv #9778
Comments
Do the contents of the built From investigation here, it does seem like there is some kind of catalog load-timing issue going on, but I have yet to figure out where or why it occurs.
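The load-timing theory can be illustrated with a toy model. This is a hedged sketch, not Sphinx's code: CATALOGS, STATE, LazyText, LABELS, activate and demo are all hypothetical names, and real catalogs are gettext .mo files rather than dicts. It assumes labels are lazily translated, meaning the catalog lookup happens when a label is first converted to text, not when it is created. Under that assumption, two labels converted at different moments can come out in different languages.

```python
"""Toy model of lazy translation; every name here is hypothetical, not Sphinx's API."""

# Hypothetical stand-in for loaded message catalogs, keyed by language.
CATALOGS: dict = {}
# Which catalog is currently active.
STATE = {"language": None}


class LazyText:
    """Defers the catalog lookup until the label is converted with str()."""

    def __init__(self, message: str) -> None:
        self.message = message

    def __str__(self) -> str:
        catalog = CATALOGS.get(STATE["language"], {})
        # Fall back to the untranslated message when there is no entry.
        return catalog.get(self.message, self.message)


# Created at import time, before any catalog is active; loosely analogous
# to the labels held in a domain's initial_data.
LABELS = {
    "genindex": LazyText("Index"),
    "modindex": LazyText("Module Index"),
}


def activate(language: str, messages: dict) -> None:
    """Register a catalog and make it the active one."""
    CATALOGS[language] = messages
    STATE["language"] = language


def demo() -> dict:
    # Suppose a catalog picked from the environment (e.g. LANGUAGE=et) is
    # active when one label happens to be converted to text...
    activate("et", {"Index": "Indeks", "Module Index": "Mooduli indeks"})
    resolved = {"genindex": str(LABELS["genindex"])}
    # ...and the project's configured language (English, with no catalog
    # entries) takes over before the other label is converted.
    activate("en", {})
    resolved["modindex"] = str(LABELS["modindex"])
    return resolved
```

Here demo() returns {'genindex': 'Indeks', 'modindex': 'Module Index'}, the same kind of mixed result reported in this issue. Whether Sphinx's real code path follows this exact sequence is the open question.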
Improvements to A
Thank you. Without understanding much of the localization process here yet, I'm optimistic/hopeful that there might be an opportunity for a light-touch fix. I've opened draft pull request #10882 with code removal that resolves the issue when I build
I think the approach in #10882 is probably incorrect: for whatever reason, it's not translating any of the After instrumenting the That results in numerous
(some more internal monologue exposition: as I understand it, the
Ok, so: the search index dump and the object index dump retrieve their language settings from different places; the fact that those two log lines appear next to each other in the output doesn't imply a connection.
It does appear to me that all elements with So: is there a bug here?
@AA-Turner is the fact that you tagged #788 recently perhaps no coincidence? :) My current best theory for what has happened here is that at some point -- perhaps between versions 4.3.0 and 4.4.0 of That caused reproducible build check failures in Debian -- but I think that that's a symptom of a deeper problem, which is that Debian has been building a single copy of the documentation with each build (in a single language selected by the build-time environment). By comparison, for info pages and man pages, multiple languages are bundled into the same package at build time. I think that makes sense, because it means that every system (regardless of the languages that its current staff understand) receives the same content, and there can't be any sleight-of-hand regarding differing build content, as could happen if per-language builds were performed (in other words: it's in line with both good user/system internationalization practices and the goals of the Reproducible Builds project, I think). That does bring us around to the question of how to perform multi-language builds in cc @lamby
It is a coincidence! (I was cleaning up all issues without a milestone.) I'll read through your comments here later in the evening; thank you for looking into this issue. A
This sounds like an entirely plausible chain of events, although I can't recall the specifics or easily dig up any evidence from quickly going through the bugs filed against the Anyway, I suppose my personal/reproducible view is that it would be a shame if resolving this issue were blocked on implementing #788 (a good thing!), even if the fix were something of a stop-gap solution. Ironically, the success of Sphinx means that quite a lot of Debian packages are currently unreproducible because of this, so I would be quite minded to see a way of just generating English. Thank you all for your recent activity on this issue. PS. Regarding the suggestion to remove an
Thanks @lamby! Regarding the And regarding #788: distributing multi-locale documentation seems ideal, but I agree it seems hard to predict when it might arrive, so no, I don't think it should block reproducible build progress. I believe I now understand your quick-and-dirty approach. However: I don't love the idea of code that branches based on the existence of Idea / question: would it be possible to configure the the
From a sample size of one test build using I'm not sure about all the tradeoffs involved with that approach, or whether it's the correct place to put an environment variable like that, but it did appear to work around the issue. I'll cross-post this to the relevant bug.
I appreciate your perspective, and this tension is something we sometimes encounter when changing programs' behaviour based on the contents of
This would work fine for individual packages, but when I consider how Debian might apply this as a default to all packages in one fell swoop, we quickly run into a number of difficulties. To start with, there is no canonical place where a value of
A potential fix is available for review in #10949; it does follow the approach of disabling localization when the
@AA-Turner @lamby after re-reading #10949 post-merge, I'm starting to have second thoughts about that approach. Deactivating all localisation (not only Roughly speaking: I think I became too focused on fixing the immediate problem without considering all the implications (especially as the nature of the fix itself changed). I don't know for certain whether to revert the change yet, but I have prepared a branch to do that while thinking about it. It seems like support for multiple-locale display-names in the
...
Could it be that the lazy-loading was the only problem, and that disabling localization itself isn't necessary to achieve reproducible results? (that would seem ideal, if I understand correctly: no reduction-in-localisation of the
I think I should go back and reconfirm the original
Three commits in particular to test:
Checking for the problem should be possible by building the Sphinx documentation for Note: I'm also finding that a file named
🤦 After all that, I did what I probably should've done to begin with: confirm whether the issue still exists, and if not, bisect the place where it disappeared. What I found as a result of doing that today is that the issue continued to appear up-to-and-including From bisecting those commits and using For reference, the approximate build process used during each
Describe the bug
Hi,
Not entirely sure where the bug is here, but it seems like there is something up with language handling and generating the objects.inv file. The context to all this is that I'm working on Reproducible Builds, and some update has suddenly rendered a lot of packages that use Sphinx unreproducible; that is, they generate different output depending on the surrounding environment.
In particular, I discovered this by comparing two builds: the first with LANGUAGE="en_GB:en" and the second with the LANGUAGE="et_EE:et" environment variable. What happens is that all of the documentation is identical except that a single entry in the objects.inv file appears to be translated. This is despite the output including the following logging message in both builds:
(NB. "code: en" here in both builds)
Decoding this zlib-encoded file, I can see that the difference is a translation one:
... and, indeed, "Indeks" is in the Estonian .po file:
This is confusing, though: why isn't "Module Index" translated as well? "Mooduli indeks" is also there in the Estonian .po:
... so I suppose the bug here is either that "Index" gets translated whilst "Module Index" is not... or the other way around. This is why I use "inconsistently" in the title of this issue.
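To repeat the comparison above, the inventory can be decoded with the standard library alone. A minimal sketch, assuming the version-2 inventory format, where a four-line plain-text header is followed by zlib-compressed entries, one per line; parse_inventory, load_inventory and diff_entries are hypothetical names. (Sphinx can also print an inventory itself via python -m sphinx.ext.intersphinx followed by a path or URL.)

```python
import zlib
from pathlib import Path


def parse_inventory(data: bytes) -> tuple[list[str], list[str]]:
    """Split a version-2 objects.inv into (header_lines, entry_lines).

    The first four lines are a plain-text header; everything after them
    is a zlib-compressed list of entries, one per line.
    """
    offset = 0
    for _ in range(4):
        # Advance past the next newline to find the end of the header.
        offset = data.index(b"\n", offset) + 1
    header = data[:offset].decode("utf-8").splitlines()
    entries = zlib.decompress(data[offset:]).decode("utf-8").splitlines()
    return header, entries


def load_inventory(path) -> tuple[list[str], list[str]]:
    """Read and parse an objects.inv file from disk."""
    return parse_inventory(Path(path).read_bytes())


def diff_entries(entries_a: list[str], entries_b: list[str]) -> tuple[list[str], list[str]]:
    """Return (only_in_a, only_in_b), each sorted, ignoring entry order."""
    set_a, set_b = set(entries_a), set(entries_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

For example, diff_entries(load_inventory("build-en/objects.inv")[1], load_inventory("build-et/objects.inv")[1]) would list the entries that differ between two builds; the paths are placeholders. Comparing decoded entries rather than raw bytes makes the translated display name visible directly.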
Playing around with the code, I am pretty certain that the translated entry is in sphinx/domains/std.py. Could it be that the data in initial_data is being prematurely translated? Either way, though, I was expecting the documentation and its entries to be identical, regardless of the LANGUAGE environment variable.
How to Reproduce
Compare the builds made with LANGUAGE="en_GB:en" exported and with LANGUAGE="et_EE:et" exported, specifically the objects.inv file.
Expected behavior
No response
Your project
I'm using opendrop, but this will occur with any package
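Since this should occur with any package, the reproduction can be scripted against any Sphinx project. A minimal sketch, assuming Sphinx is installed for the running interpreter and srcdir contains a conf.py; build_with_language, compare_builds and the per-language output directories are hypothetical. LANGUAGE is one of the variables Python's gettext module consults (ahead of LC_ALL, LC_MESSAGES and LANG) when no explicit language list is given.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_with_language(srcdir: Path, outdir: Path, language: str) -> bytes:
    """Run a fresh HTML build with LANGUAGE set; return the raw objects.inv bytes."""
    env = dict(os.environ, LANGUAGE=language)
    subprocess.run(
        # -E: don't reuse a saved environment, so each build starts from scratch.
        [sys.executable, "-m", "sphinx", "-E", "-b", "html", str(srcdir), str(outdir)],
        check=True,
        env=env,
    )
    return (outdir / "objects.inv").read_bytes()


def compare_builds(
    srcdir,
    workdir,
    languages=("en_GB:en", "et_EE:et"),
    build=build_with_language,
) -> dict:
    """Build once per LANGUAGE value; report which inventories match the first."""
    outputs = {}
    for language in languages:
        # Separate output directory per language, e.g. <workdir>/en_GB
        outdir = Path(workdir) / language.split(":")[0]
        outputs[language] = build(Path(srcdir), outdir, language)
    reference = outputs[languages[0]]
    return {language: data == reference for language, data in outputs.items()}
```

For example, compare_builds("docs", "/tmp/langcheck") (placeholder paths) should return {"en_GB:en": True, "et_EE:et": False} on an affected version, since the Estonian build's objects.inv then differs byte-for-byte from the English one. The build step is injectable so the comparison logic can be exercised without running Sphinx.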
Screenshots
No response
OS
Linux
Python version
3.9
Sphinx version
4.2.0
Sphinx extensions
No response
Extra tools
No response
Additional context
No response