Test failures: Python 3.12-dev: pycode.parser.DefinitionFinder line numbering assertion failures #11436

jayaddison · 2023-05-22T16:52:32Z

Describe the bug

With recent development builds of Python 3.12, a few parsing-related tests have begun failing. The affected code is built upon Python's tokenize module and is implemented in the pycode.parser.DefinitionFinder class.

The failures appear to relate to the logic that determines the ending line number for definitions.

Example build logs:

cc @picnixz

How to Reproduce

Minimal repro steps pending; in the meantime, this appears reproducible for most pushes/pull requests that initiate GitHub Actions unit test workflows from this repository.

Environment Information

Python 3.12.0a7+

(sourced from: https://github.com/sphinx-doc/sphinx/actions/runs/4974642742/jobs/8901224179?pr=11423#step:5:10)

Sphinx extensions

N/A

Additional context

Some discussion from today (2023-05-22) in #11435.


mgmacias95 · 2023-05-22T17:29:34Z

A consequence of porting the Python-based implementation of the tokenizer to C is that DEDENT tokens are now limited to lines that exist in the file. For example, the following Python code:

if foo:
    if bar:
        pass

Produces the following output before the change:

$ python -m tokenize < test.py 
1,0-1,2:            NAME           'if'           
1,3-1,6:            NAME           'foo'          
1,6-1,7:            OP             ':'            
1,7-1,8:            NEWLINE        '\n'           
2,0-2,4:            INDENT         '    '         
2,4-2,6:            NAME           'if'           
2,7-2,10:           NAME           'bar'          
2,10-2,11:          OP             ':'            
2,11-2,12:          NEWLINE        '\n'           
3,0-3,8:            INDENT         '        '     
3,8-3,12:           NAME           'pass'         
3,12-3,13:          NEWLINE        '\n'           
4,0-4,0:            DEDENT         ''             
4,0-4,0:            DEDENT         ''             
4,0-4,0:            ENDMARKER      ''

And produces this output after the change:

$ python.exe -m tokenize < lel.py 
1,0-1,2:            NAME           'if'           
1,3-1,6:            NAME           'foo'          
1,6-1,7:            OP             ':'            
1,7-1,8:            NEWLINE        '\n'           
2,0-2,4:            INDENT         '    '         
2,4-2,6:            NAME           'if'           
2,7-2,10:           NAME           'bar'          
2,10-2,11:          OP             ':'            
2,11-2,12:          NEWLINE        '\n'           
3,0-3,8:            INDENT         '        '     
3,8-3,12:           NAME           'pass'         
3,12-3,13:          NEWLINE        '\n'           
3,13-3,13:          DEDENT         ''             
3,13-3,13:          DEDENT         ''             
4,0-4,0:            ENDMARKER      ''

Notice the difference in the last two DEDENT tokens. In the first output, they are reported on line 4; in the second, they are reported on line 3. Note that the file has only 3 lines.
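For anyone who wants to check this on their own interpreter, here is a minimal sketch that uses the standard tokenize module to list where the DEDENT tokens are reported. The helper name dedent_positions is made up for illustration, and the reported line numbers depend on the Python build in use, so no expected output is shown.

```python
import io
import tokenize

# The same three-line sample as above.
SOURCE = "if foo:\n    if bar:\n        pass\n"


def dedent_positions(source):
    """Return the (line, column) start position of every DEDENT token."""
    readline = io.StringIO(source).readline
    return [
        tok.start
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.DEDENT
    ]


positions = dedent_positions(SOURCE)
print(positions)
```

Two positions are printed in either case; only the line number of each position differs between the two behaviours described above.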

Looking at this code, it seems that subtracting 1 is no longer necessary:

sphinx/sphinx/pycode/parser.py

Lines 528 to 530 in d48cc78

    
            end_pos = self.current.end[0] - 1
            while emptyline_re.match(self.get_line(end_pos)):
                end_pos -= 1

jayaddison · 2023-05-22T18:06:18Z

That is brilliant - thank you very much, @mgmacias95!

…e references on py3.12+ This adjustment is no longer required since accurate line-end information is emitted as of Python 3.12.0a7 Refs: sphinx-doc#11436


jayaddison · 2023-05-22T19:43:37Z

I seem to be thrashing around at potential solutions here. I don't have a local install of 3.12.0a7, so I've been leaning on the continuous integration results, but three attempts is the limit of how much of my misunderstanding (and GitHub's CPU time) I'm willing to spend. I should take a rest and re-read the behaviour changes (and/or let someone else step in with a better fix).

jayaddison · 2023-05-23T10:32:29Z

With the change from 95985e3 applied (essentially: add a special case for py3.12+ when dedent tokens are found on the ending line of the file), 8 of the 12 failing tests begin passing.

One of the remaining tests is test_LiteralIncludeReader_pyobject2, where code is tokenized and then we attempt to filter the output to include only the Bar object definition.

Again from commit 95985e3, that test fails because some unexpected trailing code is included in the output (diff as found by pytest shown below):

    class Bar:
        def baz():
            pass
  + 
  + # comment after Bar class definition
  + def bar(): pass

As an experiment, I ran python -m tokenize on the relevant test code file (tests/roots/test-directive-code/literal.inc) under each Python version in the matrix that Sphinx uses for testing. The output of that tokenization appears to be identical in all cases:

# copied and pasted the output of each 'Tokenize sample file (for issue 11436)' step into local files
$ sha256sum sphinx-11436-py3*
a2fa3da51720b54caef9f539338a7a220aa4c43c8874675698f684d544c9570d  sphinx-11436-py310
a2fa3da51720b54caef9f539338a7a220aa4c43c8874675698f684d544c9570d  sphinx-11436-py311
a2fa3da51720b54caef9f539338a7a220aa4c43c8874675698f684d544c9570d  sphinx-11436-py312
a2fa3da51720b54caef9f539338a7a220aa4c43c8874675698f684d544c9570d  sphinx-11436-py38
a2fa3da51720b54caef9f539338a7a220aa4c43c8874675698f684d544c9570d  sphinx-11436-py39
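A comparable check can be sketched in Python itself. This is only an illustration: the function name token_digest and the inline SAMPLE string are hypothetical, and the sketch hashes a repr of each token rather than the text printed by python -m tokenize, so its digests will not match the ones listed above.

```python
import hashlib
import io
import tokenize


def token_digest(source):
    """Hash the token stream of `source` so that runs can be compared."""
    digest = hashlib.sha256()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Include positions, since line numbering is what is being compared.
        record = (tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        digest.update(repr(record).encode("utf-8"))
    return digest.hexdigest()


# Hypothetical stand-in for the test file; not its real contents.
SAMPLE = (
    "class Bar:\n"
    "    def baz():\n"
    "        pass\n"
    "\n"
    "# comment after Bar class definition\n"
    "def bar(): pass\n"
)
print(token_digest(SAMPLE))
```

Running the same script under each interpreter and comparing the printed digests would show whether token positions differ between versions for that input.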


jayaddison · 2023-05-23T15:26:56Z

It seems that the 'navigate back one line' logic in the DefinitionFinder (subtract one from the dedent line position) exists because dedent tokens often (always?) have an end line number that is associated with a non-empty line of code. That sort of makes sense: most/all indented code blocks in Python have to be implicitly closed by a subsequent, less-indented code line.

Since the code intends to find the start/end range of the preceding code block, it should always omit that non-empty line, and then also omit any empty lines between there and the end of the definition that is being processed.
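The reasoning above can be sketched as a small standalone function. This is not the Sphinx implementation: the names EMPTYLINE_RE and definition_end_line are invented for illustration, and treating comment-only lines as empty is an assumption.

```python
import re

# Assumption: blank lines and comment-only lines both count as "empty".
EMPTYLINE_RE = re.compile(r"^\s*(#.*)?$")


def definition_end_line(lines, dedent_line):
    """Return the 1-based last line of the block closed by a DEDENT.

    `dedent_line` is the 1-based line on which the DEDENT was reported.
    """
    end = dedent_line - 1  # omit the line that carries the DEDENT
    # Walk back over empty lines that trail the definition.
    while end > 0 and EMPTYLINE_RE.match(lines[end - 1]):
        end -= 1
    return end


LINES = [
    "class Bar:",
    "    def baz():",
    "        pass",
    "",
    "# comment after Bar class definition",
    "def bar(): pass",
]
# The DEDENT closing Bar is reported on line 6 ("def bar(): pass").
bar_end = definition_end_line(LINES, 6)
print(bar_end)  # 3

# At end of file the DEDENT has no following code line to sit on.
EOF_LINES = ["if foo:", "    if bar:", "        pass"]
print(definition_end_line(EOF_LINES, 4))  # 3 (DEDENT on the line past the end)
print(definition_end_line(EOF_LINES, 3))  # 2 (DEDENT on the last real line)
```

The last two calls show why the position of the DEDENT matters: when it is reported on the final line of the file instead of one line past it, the same subtraction cuts off the last line of the block.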

The potential fix in #11440 attempts to handle the situation without relying on Python version number checks.

jayaddison · 2023-05-28T19:28:38Z

I think this has been resolved by an adjustment from python/cpython#104980 (could you double-check me on that @mgmacias95?).

jayaddison mentioned this issue May 22, 2023

gh-102856: Python tokenizer implementation for PEP 701 python/cpython#104323

Merged

jayaddison added a commit to jayaddison/sphinx that referenced this issue May 22, 2023

Compatibility: pycode parser: adjust dedent-token handling for py3.12+

95985e3

Refs: sphinx-doc#11436

jayaddison added a commit to jayaddison/sphinx that referenced this issue May 23, 2023

Compatibility: pycode parser: adjust dedent-token handling for py3.12+

1417227

Refs: sphinx-doc#11436

jayaddison mentioned this issue May 23, 2023

Compatibility: pycode parser: adjust dedent-token handling to maintain py3.12+ compatibility #11440

Closed

jayaddison closed this as completed May 28, 2023

jayaddison mentioned this issue May 28, 2023

W391: spurious warnings with python 3.12 beta PyCQA/pycodestyle#1142

Closed

github-actions bot locked as resolved and limited conversation to collaborators Jun 28, 2023
