
Test failures with Python 3.12.0b1 #1005

Closed

mgorny opened this issue May 25, 2023 · 8 comments

@mgorny (Contributor) commented May 25, 2023

Overview Description

The test suite fails when run with Python 3.12.0b1:

FAILED tests/messages/test_extract.py::ExtractPythonTestCase::test_utf8_message_with_utf8_bom -   File "<string>", line 1
FAILED tests/messages/test_extract.py::ExtractPythonTestCase::test_utf8_message_with_utf8_bom_and_magic_comment -   File "<string>", line 1
FAILED tests/messages/test_extract.py::ExtractPythonTestCase::test_utf8_raw_strings_match_unicode_strings -   File "<string>", line 1
FAILED tests/messages/test_extract.py::ExtractTestCase::test_f_strings - AssertionError: assert 3 == 4
FAILED tests/messages/test_extract.py::ExtractTestCase::test_f_strings_non_utf8 - assert 0 == 1

Furthermore, tox -e py312 fails by default because of the missing distutils module (installing setuptools works around that, but the distutils usage should be removed altogether).

Steps to Reproduce

  1. tox -e py312

Actual Results

________________________________________ ExtractPythonTestCase.test_utf8_message_with_utf8_bom ________________________________________

self = <tests.messages.test_extract.ExtractPythonTestCase testMethod=test_utf8_message_with_utf8_bom>

        def test_utf8_message_with_utf8_bom(self):
            buf = BytesIO(codecs.BOM_UTF8 + """
    # NOTE: hello
    msg = _('Bonjour à tous')
    """.encode('utf-8'))
>           messages = list(extract.extract_python(buf, ('_',), ['NOTE:'], {}))

tests/messages/test_extract.py:367: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
babel/messages/extract.py:500: in extract_python
    for tok, value, (lineno, _), _, _ in tokens:
/usr/lib/python3.12/tokenize.py:451: in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = "\ufeff\n# NOTE: hello\nmsg = _('Bonjour à tous')\n", extra_tokens = True

    def _generate_tokens_from_c_tokenizer(source, extra_tokens=False):
        """Tokenize a source reading Python code as unicode strings using the internal C tokenizer"""
        import _tokenize as c_tokenizer
>       for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
E         File "<string>", line 1
E           
E           ^
E       SyntaxError: invalid non-printable character U+FEFF

/usr/lib/python3.12/tokenize.py:542: SyntaxError
_______________________________ ExtractPythonTestCase.test_utf8_message_with_utf8_bom_and_magic_comment _______________________________

self = <tests.messages.test_extract.ExtractPythonTestCase testMethod=test_utf8_message_with_utf8_bom_and_magic_comment>

        def test_utf8_message_with_utf8_bom_and_magic_comment(self):
            buf = BytesIO(codecs.BOM_UTF8 + """# -*- coding: utf-8 -*-
    # NOTE: hello
    msg = _('Bonjour à tous')
    """.encode('utf-8'))
>           messages = list(extract.extract_python(buf, ('_',), ['NOTE:'], {}))

tests/messages/test_extract.py:376: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
babel/messages/extract.py:500: in extract_python
    for tok, value, (lineno, _), _, _ in tokens:
/usr/lib/python3.12/tokenize.py:451: in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = "\ufeff# -*- coding: utf-8 -*-\n# NOTE: hello\nmsg = _('Bonjour à tous')\n", extra_tokens = True

    def _generate_tokens_from_c_tokenizer(source, extra_tokens=False):
        """Tokenize a source reading Python code as unicode strings using the internal C tokenizer"""
        import _tokenize as c_tokenizer
>       for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
E         File "<string>", line 1
E           # -*- coding: utf-8 -*-
E           ^
E       SyntaxError: invalid non-printable character U+FEFF

/usr/lib/python3.12/tokenize.py:542: SyntaxError
__________________________________ ExtractPythonTestCase.test_utf8_raw_strings_match_unicode_strings __________________________________

self = <tests.messages.test_extract.ExtractPythonTestCase testMethod=test_utf8_raw_strings_match_unicode_strings>

        def test_utf8_raw_strings_match_unicode_strings(self):
            buf = BytesIO(codecs.BOM_UTF8 + """
    msg = _('Bonjour à tous')
    msgu = _(u'Bonjour à tous')
    """.encode('utf-8'))
>           messages = list(extract.extract_python(buf, ('_',), ['NOTE:'], {}))

tests/messages/test_extract.py:393: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
babel/messages/extract.py:500: in extract_python
    for tok, value, (lineno, _), _, _ in tokens:
/usr/lib/python3.12/tokenize.py:451: in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

source = "\ufeff\nmsg = _('Bonjour à tous')\nmsgu = _(u'Bonjour à tous')\n", extra_tokens = True

    def _generate_tokens_from_c_tokenizer(source, extra_tokens=False):
        """Tokenize a source reading Python code as unicode strings using the internal C tokenizer"""
        import _tokenize as c_tokenizer
>       for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
E         File "<string>", line 1
E           
E           ^
E       SyntaxError: invalid non-printable character U+FEFF

/usr/lib/python3.12/tokenize.py:542: SyntaxError
___________________________________________________ ExtractTestCase.test_f_strings ____________________________________________________

self = <tests.messages.test_extract.ExtractTestCase testMethod=test_f_strings>

        def test_f_strings(self):
            buf = BytesIO(br"""
    t1 = _('foobar')
    t2 = _(f'spameggs' f'feast')  # should be extracted; constant parts only
    t2 = _(f'spameggs' 'kerroshampurilainen')  # should be extracted (mixing f with no f)
    t3 = _(f'''whoa! a '''  # should be extracted (continues on following lines)
    f'flying shark'
        '... hello'
    )
    t4 = _(f'spameggs {t1}')  # should not be extracted
    """)
            messages = list(extract.extract('python', buf, extract.DEFAULT_KEYWORDS, [], {}))
>           assert len(messages) == 4
E           AssertionError: assert 3 == 4
E            +  where 3 = len([(2, 'foobar', [], None), (4, 'kerroshampurilainen', [], None), (5, '... hello', [], None)])

tests/messages/test_extract.py:544: AssertionError
_______________________________________________ ExtractTestCase.test_f_strings_non_utf8 _______________________________________________

self = <tests.messages.test_extract.ExtractTestCase testMethod=test_f_strings_non_utf8>

        def test_f_strings_non_utf8(self):
            buf = BytesIO(b"""
    # -- coding: latin-1 --
    t2 = _(f'\xe5\xe4\xf6' f'\xc5\xc4\xd6')
    """)
            messages = list(extract.extract('python', buf, extract.DEFAULT_KEYWORDS, [], {}))
>           assert len(messages) == 1
E           assert 0 == 1
E            +  where 0 = len([])

tests/messages/test_extract.py:556: AssertionError

Expected Results

Passing tests (or at least passing as well as py3.11 did).

Reproducibility

Always.

Additional Information

Confirmed with git 8b152db.

@mgorny (Contributor, Author) commented May 28, 2023

I've dug into this a bit, since there were some regressions in Python 3.12's tokenizer, but this doesn't seem to be one of them. From what I can see, Babel decodes the BOM into U+FEFF and then passes it on to generate_tokens().

Note that in Python 3.11 this returned ERRORTOKEN:

>>> list(tokenize.generate_tokens(io.StringIO('\ufeff\n').readline))
[TokenInfo(type=60 (ERRORTOKEN), string='\ufeff', start=(1, 0), end=(1, 1), line='\ufeff\n'), TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='\ufeff\n'), TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]

whereas in Python 3.12 it raises a SyntaxError:

>>> list(tokenize.generate_tokens(io.StringIO('\ufeff\n').readline))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.12/tokenize.py", line 451, in _tokenize
    for token in _generate_tokens_from_c_tokenizer(source, extra_tokens=True):
  File "/usr/lib/python3.12/tokenize.py", line 542, in _generate_tokens_from_c_tokenizer
    for info in c_tokenizer.TokenizerIter(source, extra_tokens=extra_tokens):
  File "<string>", line 1
    
    ^
SyntaxError: invalid non-printable character U+FEFF

CPython itself strips the BOM as part of encoding detection, before it starts decoding the source for tokenization. Babel probably needs to do the same.
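
A minimal sketch of that approach (not Babel's actual fix; tokenize_without_bom is a name made up for illustration): strip the UTF-8 BOM from the raw bytes before decoding, so that U+FEFF never reaches the tokenizer.

import codecs
import io
import tokenize

def tokenize_without_bom(raw, encoding='utf-8'):
    # Mirror CPython's encoding detection: drop a leading UTF-8 BOM
    # before the decoded text is handed to the tokenizer.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode(encoding)
    return list(tokenize.generate_tokens(io.StringIO(text).readline))

# The BOM-prefixed source from the failing tests now tokenizes cleanly:
for tok in tokenize_without_bom(codecs.BOM_UTF8 + b"msg = _('Bonjour \xc3\xa0 tous')\n"):
    print(tok)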

@vstinner

Are you still able to reproduce the issue with the just-released Python 3.12.0rc3? The issue was created on May 25, so I suppose Python 3.12.0 beta 1 was what was tested, but bugs have been fixed in the meantime.

I get different behavior with the #1005 (comment) example and Python 3.12.0rc2.

bug.py:

import io, tokenize
print(list(tokenize.generate_tokens(io.StringIO('\ufeff\n').readline)))

Output:

$ python3.11 bug.py 
[TokenInfo(type=60 (ERRORTOKEN), string='\ufeff', start=(1, 0), end=(1, 1), line='\ufeff\n'), TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='\ufeff\n'), TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]

$ python3.12 bug.py 
[TokenInfo(type=1 (NAME), string='\ufeff', start=(1, 0), end=(1, 1), line='\ufeff\n'), TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='\ufeff\n'), TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]

$ python3.11 -VV
Python 3.11.5 (main, Aug 28 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)]

$ python3.12 -VV
Python 3.12.0rc2 (main, Sep  6 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)]

I don't get a SyntaxError.

Using the REPL:

$ python3.12
Python 3.12.0rc2 (main, Sep  6 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tokenize, io
>>> list(tokenize.generate_tokens(io.StringIO('\ufeff\n').readline))
[TokenInfo(type=1 (NAME), string='\ufeff', start=(1, 0), end=(1, 1), line='\ufeff\n'), TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 1), end=(1, 2), line='\ufeff\n'), TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]

@mgorny (Contributor, Author) commented Sep 20, 2023

The first three failures seem to be gone. These two seem to remain (plus the missing setuptools dependency):

___________________________________________________ ExtractTestCase.test_f_strings ____________________________________________________

self = <tests.messages.test_extract.ExtractTestCase testMethod=test_f_strings>

        def test_f_strings(self):
            buf = BytesIO(br"""
    t1 = _('foobar')
    t2 = _(f'spameggs' f'feast')  # should be extracted; constant parts only
    t2 = _(f'spameggs' 'kerroshampurilainen')  # should be extracted (mixing f with no f)
    t3 = _(f'''whoa! a '''  # should be extracted (continues on following lines)
    f'flying shark'
        '... hello'
    )
    t4 = _(f'spameggs {t1}')  # should not be extracted
    """)
            messages = list(extract.extract('python', buf, extract.DEFAULT_KEYWORDS, [], {}))
>           assert len(messages) == 4
E           AssertionError: assert 3 == 4
E            +  where 3 = len([(2, 'foobar', [], None), (4, 'kerroshampurilainen', [], None), (5, '... hello', [], None)])

tests/messages/test_extract.py:544: AssertionError
_______________________________________________ ExtractTestCase.test_f_strings_non_utf8 _______________________________________________

self = <tests.messages.test_extract.ExtractTestCase testMethod=test_f_strings_non_utf8>

        def test_f_strings_non_utf8(self):
            buf = BytesIO(b"""
    # -- coding: latin-1 --
    t2 = _(f'\xe5\xe4\xf6' f'\xc5\xc4\xd6')
    """)
            messages = list(extract.extract('python', buf, extract.DEFAULT_KEYWORDS, [], {}))
>           assert len(messages) == 1
E           assert 0 == 1
E            +  where 0 = len([])

tests/messages/test_extract.py:556: AssertionError

@vstinner

To make distutils available on Python 3.12, you can use this change:

diff --git a/tox.ini b/tox.ini
index 11cca0c..7c4d56a 100644
--- a/tox.ini
+++ b/tox.ini
@@ -11,6 +11,7 @@ deps =
     backports.zoneinfo;python_version<"3.9"
     tzdata;sys_platform == 'win32'
     pytz: pytz
+    setuptools;python_version>="3.12"
 allowlist_externals = make
 commands = make clean-cldr test
 setenv =

@encukou (Contributor) commented Sep 21, 2023

Here's a PR for the f-string parsing: #1027

@akx (Member) commented Oct 1, 2023

#1027 was just merged and we're now running CI on 3.12 too as of #1028. Thanks all! ❤️

akx closed this as completed Oct 1, 2023

@akx (Member) commented Oct 3, 2023

Released in https://pypi.org/project/Babel/2.13.0/ just now 🎉

@oprypin (Contributor) commented Oct 7, 2023

Regarding #1005 (comment), adding the "setuptools" dependency only for CI was not the correct solution: it is the package itself that depends on it, so the CI of other projects (and actual local usage) will still break. I opened issue #1031 and a pull request accordingly.
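
A hypothetical sketch of declaring that runtime dependency (the actual pull request may well solve it differently, e.g. by removing the distutils usage entirely): a PEP 508 environment marker in setup.py limits the dependency to Python 3.12+, where distutils is no longer in the stdlib.

from setuptools import setup

setup(
    name='Babel',
    install_requires=[
        # distutils was removed from the stdlib in Python 3.12;
        # setuptools provides a drop-in replacement for it.
        'setuptools; python_version >= "3.12"',
    ],
)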
