
Seemingly random CI failures of tests/test_intl.py::test_gettext_dont_rebuild_mo with Windows #11232

Closed
jfbu opened this issue Mar 10, 2023 · 15 comments · Fixed by #11435

@jfbu
Contributor

jfbu commented Mar 10, 2023

Describe the bug

I am not competent enough to understand these seemingly occasional failures:

FAILED tests/test_intl.py::test_gettext_dont_rebuild_mo - AssertionError: assert 1 == 0
 +  where 1 = <function test_gettext_dont_rebuild_mo.<locals>.get_number_of_update_targets at 0x0000022947B2A8E0>(<SphinxTestApp buildername='dummy'>)
= 1 failed, 1787 passed, 23 skipped, 5 xfailed, 28 xpassed, 6 warnings in 271.50s (0:04:31) =
Error: Process completed with exit code 1.

which I have observed a number of times recently either on PRs or direct pushes to master.

How to Reproduce

See https://github.com/sphinx-doc/sphinx/actions/runs/4383723023/jobs/7674313285 or earlier failed test of #11224 prior to merge to master: https://github.com/sphinx-doc/sphinx/actions/runs/4354037120/jobs/7608843307

As this seems random, I have no recipe for reproducing it.

Environment Information

not relevant

Sphinx extensions

No response

Additional context

No response

@jayaddison
Contributor

Yep, I've noticed this occur a few times too. Most recently here: https://github.com/sphinx-doc/sphinx/actions/runs/5006348520/jobs/8971510673?pr=11426#step:5:1930

(taking a brief look at the relevant tests/code)

@jayaddison
Contributor

So:

  • The test uses the 'dummy' builder, which does not emit any output (great, that helps narrow down the cause)
  • The most recent assertion failure occurred during phase1 of the test, before any files are removed and recreated (also good in terms of narrowing the cause)
  • The assertion is on the number of modified/changed files according to env.get_outdated_files
  • There are six locations within that method that call changed.add(...) -- these are the potential source(s)
    • I think that the first two of those can be ignored because the test uses the dummy builder and doesn't do anything to configure reread_always
    • Another one can be ignored, again because the builder is not writing to the target directory.

The remaining three call sites are two mtime-related modification checks and one OSError handler case.

I think it's unlikely that the OSError case is the cause (although it might be better to handle it by adding the relevant files to a new, fourth output tuple entry -- unknown -- rather than bundling them into changed).

In summary:

My best guess at the moment is that we're running into a filesystem timestamp consistency issue on some of the runners:

# store time of reading, for outdated files detection
# (Some filesystems have coarse timestamp resolution;
# therefore time.time() can be older than filesystem's timestamp.
# For example, FAT32 has 2sec timestamp resolution.)
self.env.all_docs[docname] = max(time.time(),
                                 path.getmtime(self.env.doc2path(docname)))

And in terms of how to fix that, for test stability purposes:

Not sure yet.
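
For reference, the mtime checks being discussed reduce to something like the following simplified sketch (a paraphrase for discussion, not the actual get_outdated_files implementation):

import os.path as path

# Simplified sketch: the two mtime-based checks that can put a document
# into `changed` -- the source-file check and the dependency check.
def is_outdated(docname, srcfile, dependencies, all_docs):
    mtime = all_docs[docname]              # time the document was last read
    try:
        newmtime = path.getmtime(srcfile)
        if newmtime > mtime:               # source changed since last read
            return True
        for dep in dependencies:
            depmtime = path.getmtime(dep)
            if depmtime > mtime:           # a dependency (e.g. a .mo file) changed
                return True
    except OSError:
        return True                        # an unreadable file is treated as changed
    return False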

@jayaddison
Contributor

It's tricky without having a way to replicate the problem reliably, but I'd like to try reverting the ... = max(time.time(), path.getmtime(...)) logic (718993a) and going back to the simpler ... = time.time() logic that preceded it. I'm repeat-CI-testing that in a bundle of changes in jayaddison#3 at the moment.

I've read one or two bugs on the CPython bugtracker that seem to indicate that Windows timestamp granularity had some issues and has been improved since 718993a was introduced, and I also think it's probably relatively rare for people to be building Sphinx documentation on low-timestamp-granularity filesystems (particularly FAT32). So I'm thinking of it as a potential tradeoff between unit test stability and (perhaps) some rare edge cases.

@jayaddison
Contributor

It's tricky without having a way to replicate the problem reliably, but I'd like to try reverting the ... = max(time.time(), path.getmtime(...)) logic (718993a) and going back to the simpler ... = time.time() logic that preceded it. I'm repeat-CI-testing that in a bundle of changes in jayaddison#3 at the moment.

Update: reverting 718993a does not fix the problem. For a situation where that commit is reverted and yet the test still fails (after a few previous successful test results), see here: https://github.com/jayaddison/sphinx/actions/runs/5038386691/jobs/9035841524?pr=3#step:5:1931

@jayaddison
Contributor

I've added some debug printout code, and here is the output from it during a recent unit test run where the failure occurred:

===================
docname:  index
===================
mtime:    1684687126.3444436
newmtime: 1684686899.1642947
changed?  False

===================
docname:  bom
===================
mtime:    1684687126.3444436
newmtime: 1684686899.1642947
changed?  False

-------------------
deppath:  C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\builder-gettext-dont-rebuild-mo\.\xx\LC_MESSAGES\bom.mo
-------------------
mtime:    1684687126.3444436
depmtime: 1684687126.3444438
changed?  True
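
Note that mtime and depmtime in the last block differ only in the final digit, roughly one representable double-precision step at this magnitude. A minimal, hypothetical illustration of how that flips the comparison:

# Hypothetical illustration only: values copied from the debug output above.
stored = 1684687126.3444436   # read time stored in env.all_docs
dep    = 1684687126.3444438   # mtime read back for the bom.mo dependency
print(dep > stored)           # True -> the .mo file is counted as changed
print(dep - stored)           # ~2.4e-07 seconds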

jayaddison added a commit to jayaddison/sphinx that referenced this issue May 21, 2023
…s seen for sphinx-doc#11232) and run a broader matrix of Python versions
@jayaddison
Contributor

This is based on a small sample size so far, but switching to use integer nanosecond-precision timestamps instead of the existing floating-point second-precision timestamps may be helping.

If that remains true after a few more rounds of testing, then I'll open a fix PR for this issue, containing the revert of 718993a and the nanosecond-timestamp switchover.
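
For concreteness, the nanosecond approach could look roughly like this (a sketch under the assumption that os.stat(...).st_mtime_ns is used; the actual change may differ):

import os

# st_mtime_ns is an integer number of nanoseconds, so comparisons become exact
# integer arithmetic rather than float comparisons with rounding artifacts.
def mtime_ns(filename: str) -> int:
    return os.stat(filename).st_mtime_ns

# e.g. the dependency check would then become something like:
# changed = mtime_ns(deppath) > stored_read_time_ns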

jayaddison added a commit to jayaddison/sphinx that referenced this issue May 21, 2023
…iness was seen for sphinx-doc#11232) and run a broader matrix of Python versions"

This reverts commit 647df0a.

@AA-Turner AA-Turner reopened this Jul 22, 2023
@jayaddison
Contributor

Because the additional informational logging added for this test failure points to the byte-order-mark (bom) file as a source of the problem, I'm wondering about these unit test code lines that rewrite that file under some circumstances.

Perhaps adding further information to indicate whether that test code logic is evaluated could confirm whether it has a causal relationship with the test failures.

@jayaddison
Contributor

What do you think about adding a debug print between these lines to determine whether that's a potential cause, @AA-Turner? (I've drafted a commit for that at jayaddison@5924a4d )
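
For illustration, the print could be as simple as the following (hypothetical; the path here is a stand-in, and the drafted commit linked above is authoritative):

from pathlib import Path

# Stand-in path for illustration; in the test this would be the catalog
# about to be rewritten.
mo_path = Path('xx/LC_MESSAGES/bom.mo')
print(f'# compiling .mo file {mo_path}')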

@jayaddison
Contributor

Taking another look at this currently; I'm going to repeat-run CI on my fork with 5924a4d in place to see whether that helps to track anything down.

jayaddison added a commit to jayaddison/sphinx that referenced this issue Sep 14, 2023
@jayaddison
Contributor

This build job failed and the additional output was:

# compiling .mo file C:\Users\runneradmin\AppData\Local\Temp\pytest-of-runneradmin\pytest-0\builder-gettext-dont-rebuild-mo\xx\LC_MESSAGES\xx\LC_MESSAGES\bom.mo

I'm not sure yet whether that helps to track this down. I'm trying to refresh my context on how all this works.

@jayaddison
Contributor

Does anyone have any thoughts on whether we could/should mark this test as xfail on non-posix platforms and reinvestigate at a later date?

@AA-Turner
Member

We have strict xfail turned on, so that would introduce stochastic failures in the other direction, unhelpfully.

A

@jayaddison
Contributor

Ok - although that only changes the default, I think - so we could specify strict=False in the marker.

I'll admit this is likely the kind of bug/issue that, once hidden, might not really be fixed later (or not until much, much later at least): subtle, possibly complicated, potentially time-consuming to track down, and seemingly low-impact for production usage (although arguably important for continuous integration reporting).
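
For concreteness, such a marker might look roughly like this (a sketch only; the condition, reason text and function signature are illustrative):

import sys
import pytest

@pytest.mark.xfail(
    sys.platform == 'win32',
    reason='flaky mtime comparison on Windows CI runners (#11232)',
    strict=False,   # a pass on Windows would not fail the run
)
def test_gettext_dont_rebuild_mo(app):   # signature abbreviated
    ...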

@picnixz
Member

picnixz commented Feb 24, 2024

Fixed in #11940

@picnixz picnixz closed this as completed Feb 24, 2024