Comparing changes

base repository: adrienverge/yamllint
base: v1.36.2
head repository: adrienverge/yamllint
compare: v1.37.0
  • 10 commits
  • 17 files changed
  • 2 contributors

Commits on Mar 21, 2025

  1. CI: Publish each master commit with a unique version on TestPyPI

    This follows commit 7d52df7 "CI: Ignore version duplicates when
    publishing to TestPyPI" with a better design:
    - Only publish builds from `master` branch on TestPyPI.
    - Version every non-tag commit with a `.devN` suffix, e.g. `1.36.2.dev1`.
      This prevents duplicates on TestPyPI.
    - `twine check` built packages.
    
    See discussion at
    #721 (comment)
    for more details.
    adrienverge committed Mar 21, 2025
    325fafa
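The `.devN` versioning described in this commit can be sketched in Python. This is only an illustration under stated assumptions, not the workflow's actual code (which uses `sed` and `cut`): `dev_version` is a hypothetical helper, and the `git describe --tags` output is assumed to have the usual `<tag>-<N>-g<sha>` shape, where N is the number of commits since the last tag.

```python
import re


def dev_version(app_version: str, git_describe: str) -> str:
    """Append a .devN suffix, where N is the commit count since the last tag.

    git_describe is assumed to look like 'v1.36.2-3-g325fafa', i.e. the
    output of `git describe --tags` on a commit that is not itself tagged.
    """
    match = re.fullmatch(r'.+-(\d+)-g[0-9a-f]+', git_describe)
    if match is None:
        raise ValueError(f'unexpected describe output: {git_describe!r}')
    return f'{app_version}.dev{match.group(1)}'
```

Because N grows with every commit on top of the tag, each non-tag build gets a distinct version, which is what avoids duplicate uploads.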

Commits on Mar 23, 2025

  1. tests: Use correct encoding for path

    Before this change, build_temp_workspace() would always encode a path
    using UTF-8 and the strict error handler [1]. Most of the time, this is
    fine, but systems do not necessarily use UTF-8 and the strict error
    handler for paths [2].
    
    [1]: <https://docs.python.org/3.12/library/stdtypes.html#str.encode>
    [2]: <https://docs.python.org/3.12/glossary.html#term-filesystem-encoding-and-error-handler>
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    5f57f9e
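A minimal sketch of the idea behind this fix: `os.fsencode()` applies the filesystem encoding and error handler, whereas `str.encode('utf-8')` always uses UTF-8 with the strict handler. `encode_path` is a hypothetical wrapper used only for illustration.

```python
import os
import sys


def encode_path(path: str) -> bytes:
    """Encode a path the way the operating system interface expects."""
    # os.fsencode() uses the filesystem encoding and error handler,
    # which are not necessarily UTF-8 and 'strict'.
    return os.fsencode(path)


if __name__ == '__main__':
    # Show which encoding and error handler this interpreter uses for paths.
    print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
    print(encode_path(os.path.join('yamllint-tests-example', 'a.yaml')))
```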
  2. tests: Restore stdout

    Before this commit, test_run_default_format_output_in_tty() changed the
    value of sys.stdout, but it would never change it back to the original
    value. This commit makes sure that it gets changed back.
    
    At the moment, this commit doesn’t make a user-visible difference. A
    future commit will add a new test named
    test_ignored_from_file_with_multiple_encodings(). That new test requires
    that stdout gets restored, or else it will fail.
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    82a57b7
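The pattern this commit describes, replacing `sys.stdout` and always putting it back, can be sketched as follows. `run_with_fake_stdout` is a hypothetical helper for illustration, not the code used in yamllint's test suite.

```python
import io
import sys


def run_with_fake_stdout(func):
    """Call func() with sys.stdout replaced, then put the original back."""
    original_stdout = sys.stdout
    sys.stdout = io.StringIO()
    try:
        func()
        return sys.stdout.getvalue()
    finally:
        # Runs even if func() raises, so later tests see the real stdout.
        sys.stdout = original_stdout
```

The standard library's `contextlib.redirect_stdout` offers the same guarantee as a context manager.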
  3. tests: Move code for deleting env vars to __init__

    The motivation behind this change is to make it easier to create a
    future commit. That future commit will make yamllint change its behavior
    if an environment variable named YAMLLINT_FILE_ENCODING is found. That
    new environment variable will potentially cause interference with many
    different tests.
    
    Before this change, environment variables would only be deleted when the
    tests.test_cli module was used. At the moment, it’s OK to do that
    because that’s the only test module that will fail if certain
    environment variables are set. Once yamllint is updated to look for the
    YAMLLINT_FILE_ENCODING variable, pretty much every test will be likely
    to fail if YAMLLINT_FILE_ENCODING is set to certain values. This
    change makes the code for deleting environment variables get run for all
    tests (not just tests.test_cli).
    
    As an alternative, we could have kept most of the code for deleting
    environment variables in tests/test_cli.py, and only included code for
    deleting YAMLLINT_FILE_ENCODING in tests/__init__.py. I decided to put
    all of the environment variable deletion code in tests/__init__.py in
    order to make things more consistent and easier to understand.
    
    I had also considered adding a function for deleting environment
    variables to tests/common.py and then adding this to every test module
    that needs to have environment variables deleted:
    
    	from tests.common import remove_env_vars_that_might_interfere
    	setUpModule = remove_env_vars_that_might_interfere()
    
    I decided to not do that because pretty much every single test module
    will fail if YAMLLINT_FILE_ENCODING is set to certain values, and
    there are a lot of test modules.
    Jayman2000 authored and adrienverge committed Mar 23, 2025

    Unverified

    This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
    0b3abe5
  4. decoder: Autodetect encoding of most YAML files

    Before this change, yamllint would open YAML files using open()’s
    default encoding. As long as UTF-8 mode isn’t enabled, open() defaults
    to using the system’s locale encoding [1][2]. This can cause problems in
    multiple different scenarios.
    
    The first scenario involves linting UTF-8 YAML files on Linux systems.
    Most of the time, the locale encoding on Linux systems is set to UTF-8
    [3][4], but it can be set to something else [5]. In the unlikely event
    that someone was using Linux with a locale encoding other than UTF-8,
    there was a chance that yamllint would crash with a UnicodeDecodeError.
    
    The second scenario involves linting UTF-8 YAML files on Windows
    systems. The locale encoding on Windows systems is the system’s ANSI
    code page [6]. The ANSI code page on Windows systems is NOT set to UTF-8
    by default [7]. In the very likely event that someone was using Windows
    with a locale encoding other than UTF-8, there was a chance that
    yamllint would crash with a UnicodeDecodeError.
    
    Additionally, using open()’s default encoding is a violation of the YAML
    spec. Chapter 5.2 says:
    
    	“On input, a YAML processor must support the UTF-8 and UTF-16
    	character encodings. For JSON compatibility, the UTF-32
    	encodings must also be supported.
    
    	If a character stream begins with a byte order mark, the
    	character encoding will be taken to be as indicated by the byte
    	order mark. Otherwise, the stream must begin with an ASCII
    	character. This allows the encoding to be deduced by the pattern
    	of null (x00) characters.” [8]
    
    In most cases, this change fixes all of those problems by implementing
    the YAML spec’s character encoding detection algorithm. Now, as long as
    YAML files begin with either a byte order mark or an ASCII character,
    yamllint will (in most cases) automatically detect them as being UTF-8,
    UTF-16 or UTF-32. Other character encodings are not supported at the
    moment.
    
    Even with this change, there is one specific situation where yamllint
    still uses the wrong character encoding. Specifically, this
    change does not affect the character encoding used for stdin. This means
    that at the moment, these two commands may use different character
    encodings when decoding file.yaml:
    
    	$ yamllint file.yaml
    	$ cat file.yaml | yamllint -
    
    A future commit will update yamllint so that it uses the same character
    encoding detection algorithm for stdin.
    
    It’s possible that this change will break things for existing yamllint
    users. This change allows users to use the YAMLLINT_FILE_ENCODING
    environment variable to override the autodetection algorithm just in case
    they’ve been using yamllint on weird nonstandard YAML files.
    
    Credit for the idea of having tests with pre-encoded strings and having
    an environment variable for overriding the character encoding
    autodetection algorithm goes to @adrienverge [9].
    
    Fixes #218. Fixes #238. Fixes #347.
    
    [1]: <https://docs.python.org/3.12/library/functions.html#open>
    [2]: <https://docs.python.org/3.12/library/os.html#utf8-mode>
    [3]: <https://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html>
    [4]: <https://wiki.musl-libc.org/functional-differences-from-glibc.html#Character-sets-and-locale>
    [5]: <https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/SUPPORTED;h=c8b63cc2fe2b4547f2fb1bff6193da68d70bd563;hb=36f2487f13e3540be9ee0fb51876b1da72176d3f>
    [6]: <https://docs.python.org/3.12/glossary.html#term-locale-encoding>
    [7]: <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>
    [8]: <https://yaml.org/spec/1.2.2/#52-character-encodings>
    [9]: <#630 (comment)>
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    a53fa80
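The detection rule quoted from the YAML spec in this commit message can be sketched as follows. This is a minimal illustration and not yamllint's actual decoder: `detect_encoding` is a hypothetical name, and the byte patterns are taken from the table in section 5.2 of the YAML 1.2.2 spec. The UTF-32 checks must come before the UTF-16 checks, because the UTF-32LE byte order mark starts with the UTF-16LE one.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess a Python codec name from the first bytes of a YAML stream."""
    # Byte order marks: the generic utf_32/utf_16 codecs consume the BOM
    # and use it to pick the byte order.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return 'utf_32'
    # No BOM: an ASCII first character shows up as a pattern of null bytes.
    if len(data) >= 4 and data[:3] == b'\x00\x00\x00':
        return 'utf_32_be'
    if len(data) >= 4 and data[1:4] == b'\x00\x00\x00':
        return 'utf_32_le'
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return 'utf_16'
    if len(data) >= 2 and data[0] == 0:
        return 'utf_16_be'
    if len(data) >= 2 and data[1] == 0:
        return 'utf_16_le'
    if data.startswith(codecs.BOM_UTF8):
        return 'utf_8_sig'
    # Default when nothing else matches.
    return 'utf_8'
```

A caller would then decode with `data.decode(detect_encoding(data))`. A file that starts with a non-ASCII character and has no byte order mark falls outside these patterns, which is the limitation the commit message mentions.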
  5. decoder: Autodetect decoding of stdin

    Before this change, yamllint would use a character encoding
    autodetection algorithm in order to determine the character encoding of
    all YAML files that it processed, unless the YAML file was sent to
    yamllint via stdin. This change makes it so that yamllint always uses
    the character encoding detection algorithm, even if the YAML file is
    sent to yamllint via stdin.
    
    Before this change, one of yamllint’s tests would replace sys.stdin with
    a StringIO object. This change makes it so that that test replaces
    sys.stdin with a file object instead of a StringIO object. Before this
    change, it was OK to use a StringIO object because yamllint never tried
    to access sys.stdin.buffer. It’s no longer OK to use a StringIO because
    yamllint now tries to access sys.stdin.buffer. File objects do have a
    buffer attribute, so we can use a file object instead.
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    8e3a3b3
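The difference between a `StringIO` and a stream with a `buffer` attribute can be sketched as below. This is an illustration only: `read_all_bytes` is a hypothetical helper, and an `io.TextIOWrapper` over `io.BytesIO` stands in for the file object that the commit message says the test uses.

```python
import io


def read_all_bytes(text_stream) -> bytes:
    """Return the raw bytes behind a text stream such as sys.stdin."""
    # Text streams backed by bytes expose them through .buffer;
    # io.StringIO has no such attribute.
    return text_stream.buffer.read()


# A stand-in for sys.stdin that carries UTF-16 encoded bytes.
fake_stdin = io.TextIOWrapper(
    io.BytesIO('key: value\n'.encode('utf_16')),
    encoding='utf_8',
)
raw = read_all_bytes(fake_stdin)
```

Reading the bytes rather than the already-decoded text is what lets the encoding be detected after the fact.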
  6. decoder: Autodetect encoding for ignore-from-file

    Before this change, yamllint would decode files on the ignore-from-file
    list using open()’s default encoding [1][2]. This can cause decoding to
    fail in some situations (see the previous commit message for details).
    
    This change makes yamllint automatically detect the encoding for files
    on the ignore-from-file list. It uses the same algorithm that it uses
    for detecting the encoding of YAML files, so the same limitations apply:
    files must use UTF-8, UTF-16 or UTF-32 and they must begin with either a
    byte order mark or an ASCII character.
    
    [1]: <https://docs.python.org/3.12/library/fileinput.html#fileinput.input>
    [2]: <https://docs.python.org/3.12/library/fileinput.html#fileinput.FileInput>
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    fd58e6b
  7. tests: Stop using open()’s default encoding

    In general, using open()’s default encoding is a mistake [1]. This
    change makes sure that every time open() is called, the encoding
    parameter is specified. Specifically, it makes it so that all tests
    succeed when run like this:
    
    	python -X warn_default_encoding -W error::EncodingWarning -m unittest discover
    
    [1]: <https://peps.python.org/pep-0597/#using-the-default-encoding-is-a-common-mistake>
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    4d7be6d
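The effect of the two interpreter options in the command above can be observed with a small experiment. This is a sketch, assuming Python 3.10 or later (where `EncodingWarning` exists); `run_strict` and the two snippets are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_strict(code: str) -> int:
    """Run code in a child interpreter that turns EncodingWarning into an error."""
    result = subprocess.run(
        [sys.executable, '-X', 'warn_default_encoding',
         '-W', 'error::EncodingWarning', '-c', code],
        capture_output=True,
    )
    return result.returncode


# Relies on open()'s default encoding: expected to fail under these flags.
implicit = 'import os; open(os.devnull).close()'
# Names the encoding explicitly: expected to succeed.
explicit = "import os; open(os.devnull, encoding='utf_8').close()"
```

On a recent interpreter, `run_strict(implicit)` should return a nonzero exit status while `run_strict(explicit)` returns 0.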
  8. CI: Fail when open()’s default encoding is used

    The previous few commits have removed all calls to open() that use its
    default encoding. That being said, it’s still possible that code added
    in the future will contain that same mistake. This commit makes it so
    that the CI test job will fail if that mistake is made again.
    
    Unfortunately, it doesn’t look like coverage.py allows you to specify -X
    options [1] or warning filters [2] when running your tests [3]. To work
    around this problem, I’m running all of the Python code, including
    coverage.py itself, with -X warn_default_encoding and
    -W error::EncodingWarning. As a result, the CI test job will also fail
    if coverage.py uses open()’s default encoding. Hopefully, coverage.py
    won’t do that. If it does, then we can always temporarily revert this
    commit.
    
    [1]: <https://docs.python.org/3.12/using/cmdline.html#cmdoption-X>
    [2]: <https://docs.python.org/3.12/using/cmdline.html#cmdoption-W>
    [3]: <https://coverage.readthedocs.io/en/7.4.0/cmd.html#execution-coverage-run>
    Jayman2000 authored and adrienverge committed Mar 23, 2025
    8323394
  9. yamllint version 1.37.0

    adrienverge committed Mar 23, 2025
    be92e15
18 changes: 14 additions & 4 deletions .github/workflows/publish.yaml
@@ -8,19 +8,29 @@ jobs:
   build:
     name: Build distribution package
     runs-on: ubuntu-latest

     steps:
       - uses: actions/checkout@v4
         with:
           persist-credentials: false
+      - name: Fetch tags
+        if: github.ref_type != 'tag'
+        run: git fetch --prune --unshallow --tags
       - name: Set up Python
         uses: actions/setup-python@v5
         with:
           python-version: "3.x"
       - name: Install pypa/build
-        run: python -m pip install --user build
+        run: python -m pip install build twine
+      - name: Add '.devN' to version for non-tag builds
+        if: github.ref_type != 'tag'
+        run:
+          sed -i
+          "/^APP_VERSION = /s/'$/.dev$(git describe --tags | cut -d- -f2)'/"
+          yamllint/__init__.py
       - name: Build a binary wheel and a source tarball
         run: python -m build
+      - name: Twine check the distribution packages
+        run: python -Im twine check --strict dist/yamllint-*
       - name: Store the distribution packages
         uses: actions/upload-artifact@v4
         with:
@@ -29,6 +39,7 @@ jobs:

   publish-to-testpypi:
     name: Publish distribution package to TestPyPI
+    if: github.ref_name == github.event.repository.default_branch
     needs: build
     runs-on: ubuntu-latest
     environment:
@@ -46,11 +57,10 @@ jobs:
         uses: pypa/gh-action-pypi-publish@release/v1
         with:
           repository-url: https://test.pypi.org/legacy/
-          skip-existing: true

   publish-to-pypi:
     name: Publish distribution package to PyPI
-    if: startsWith(github.ref, 'refs/tags/') # only for tags
+    if: github.ref_type == 'tag'
     needs: build
     runs-on: ubuntu-latest
     environment:
9 changes: 8 additions & 1 deletion .github/workflows/tests.yaml
@@ -57,6 +57,13 @@ jobs:
       - run: pip install .
       # https://github.com/AndreMiras/coveralls-python-action/issues/18
       - run: echo -e "[run]\nrelative_files = True" > .coveragerc
-      - run: coverage run -m unittest discover
+      - run: >-
+          python
+          -X warn_default_encoding
+          -W error::EncodingWarning
+          -m coverage
+          run
+          -m unittest
+          discover
       - name: Coveralls
         uses: AndreMiras/coveralls-python-action@develop
6 changes: 6 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,12 @@
Changelog
=========

1.37.0 (2025-03-23)
-------------------

- Automatically detect Unicode character encoding of files
- Publish pushes to master branch to TestPyPI

1.36.2 (2025-03-17)
-------------------

36 changes: 36 additions & 0 deletions docs/character_encoding.rst
@@ -0,0 +1,36 @@
Character Encoding
==================

When yamllint reads a file (whether it’s a configuration file or a file to
lint), yamllint will try to automatically detect that file’s character
encoding. In order for the automatic detection to work properly, files must
follow these two rules (see `this section of the YAML specification for details
<https://yaml.org/spec/1.2.2/#52-character-encodings>`_):

* The file must be encoded in UTF-8, UTF-16 or UTF-32.

* The file must begin with either a byte order mark or an ASCII character.

Override character encoding
---------------------------

Previous versions of yamllint did not try to autodetect the character encoding
of files. Previous versions of yamllint assumed that files used the current
locale’s character encoding. This meant that older versions of yamllint would
sometimes correctly decode files that didn’t follow those two rules. For the
sake of backwards compatibility, the current version of yamllint allows you to
disable automatic character encoding detection by setting the
``YAMLLINT_FILE_ENCODING`` environment variable. If you set the
``YAMLLINT_FILE_ENCODING`` environment variable to `the name of one of
Python’s standard character encodings
<https://docs.python.org/3/library/codecs.html#standard-encodings>`_, then
yamllint will use that character encoding instead of trying to autodetect the
character encoding.

The ``YAMLLINT_FILE_ENCODING`` environment variable should only be used as a
stopgap solution. If you need to use ``YAMLLINT_FILE_ENCODING``, then you
should really update your YAML files so that their character encoding can
automatically be detected, or else you may run into compatibility problems.
Future versions of yamllint may remove support for the
``YAMLLINT_FILE_ENCODING`` environment variable, and other YAML processors may
misinterpret your YAML files.
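The advice above, updating files so that their encoding can be detected, can be sketched as a one-off conversion to UTF-8. This is only an illustration: `reencode_as_utf8` is a hypothetical helper, and the caller is assumed to know the encoding the file currently uses.

```python
from pathlib import Path


def reencode_as_utf8(path, source_encoding: str) -> None:
    """Rewrite a file as UTF-8, given the encoding it is known to use."""
    file = Path(path)
    # Decode with the known legacy encoding, then write the text back as UTF-8.
    text = file.read_bytes().decode(source_encoding)
    file.write_bytes(text.encode('utf_8'))
```

For example, `reencode_as_utf8('config.yaml', 'latin_1')` would convert a Latin-1 file in place. Once the file is UTF-8 and begins with an ASCII character or a byte order mark, the override variable should no longer be needed for it.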
4 changes: 4 additions & 0 deletions docs/configuration.rst
@@ -228,6 +228,10 @@ or:
.. note:: However, this is mutually exclusive with the ``ignore`` key.

.. note:: Files on the ``ignore-from-file`` list should use either UTF-8,
   UTF-16 or UTF-32. See :doc:`Character Encoding <character_encoding>` for
   details and workarounds.

If you need to know the exact list of files that yamllint would process,
without really linting them, you can use ``--list-files``:

1 change: 1 addition & 0 deletions docs/index.rst
@@ -27,3 +27,4 @@ Table of contents
development
text_editors
integration
character_encoding
20 changes: 20 additions & 0 deletions tests/__init__.py
@@ -1,4 +1,5 @@
 # Copyright (C) 2016 Adrien Vergé
+# Copyright (C) 2025 Jason Yundt
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -14,5 +15,24 @@
 # along with this program. If not, see <http://www.gnu.org/licenses/>.

 import locale
+import os

 locale.setlocale(locale.LC_ALL, 'C')
+env_vars_that_could_interfere_with_tests = (
+    'YAMLLINT_FILE_ENCODING',
+    # yamllint uses these environment variables to find a config file.
+    'YAMLLINT_CONFIG_FILE',
+    'XDG_CONFIG_HOME',
+    # These variables are used to determine where the user’s home
+    # directory is. See
+    # https://docs.python.org/3/library/os.path.html#os.path.expanduser
+    'HOME',
+    'USERPROFILE',
+    'HOMEPATH',
+    'HOMEDRIVE'
+)
+for name in env_vars_that_could_interfere_with_tests:
+    try:
+        del os.environ[name]
+    except KeyError:
+        pass
188 changes: 154 additions & 34 deletions tests/common.py
@@ -1,4 +1,5 @@
 # Copyright (C) 2016 Adrien Vergé
+# Copyright (C) 2023–2025 Jason Yundt
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -13,20 +14,173 @@
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.

+import codecs
 import contextlib
 from io import StringIO
 import os
 import shutil
 import sys
 import tempfile
 import unittest
+import warnings
+from codecs import CodecInfo

 import yaml

 from yamllint import linter
 from yamllint.config import YamlLintConfig


+# Encoding related stuff:
+UTF_CODECS = (
+    'utf_32_be',
+    'utf_32_be_sig',
+    'utf_32_le',
+    'utf_32_le_sig',
+    'utf_16_be',
+    'utf_16_be_sig',
+    'utf_16_le',
+    'utf_16_le_sig',
+    'utf_8',
+    'utf_8_sig'
+)
+
+
+def encode_utf_32_be_sig(obj):
+    return (
+        codecs.BOM_UTF32_BE + codecs.encode(obj, 'utf_32_be', 'strict'),
+        len(obj)
+    )
+
+
+def encode_utf_32_le_sig(obj):
+    return (
+        codecs.BOM_UTF32_LE + codecs.encode(obj, 'utf_32_le', 'strict'),
+        len(obj)
+    )
+
+
+def encode_utf_16_be_sig(obj):
+    return (
+        codecs.BOM_UTF16_BE + codecs.encode(obj, 'utf_16_be', 'strict'),
+        len(obj)
+    )
+
+
+def encode_utf_16_le_sig(obj):
+    return (
+        codecs.BOM_UTF16_LE + codecs.encode(obj, 'utf_16_le', 'strict'),
+        len(obj)
+    )
+
+
+test_codec_infos = {
+    'utf_32_be_sig':
+    CodecInfo(encode_utf_32_be_sig, codecs.getdecoder('utf_32')),
+    'utf_32_le_sig':
+    CodecInfo(encode_utf_32_le_sig, codecs.getdecoder('utf_32')),
+    'utf_16_be_sig':
+    CodecInfo(encode_utf_16_be_sig, codecs.getdecoder('utf_16')),
+    'utf_16_le_sig':
+    CodecInfo(encode_utf_16_le_sig, codecs.getdecoder('utf_16')),
+}
+
+
+def register_test_codecs():
+    codecs.register(test_codec_infos.get)
+
+
+def unregister_test_codecs():
+    if sys.version_info >= (3, 10, 0):
+        codecs.unregister(test_codec_infos.get)
+    else:
+        warnings.warn(
+            "This version of Python doesn’t allow us to unregister codecs.",
+            stacklevel=1
+        )
+
+
+def is_test_codec(codec):
+    return codec in test_codec_infos.keys()
+
+
+def test_codec_built_in_equivalent(test_codec):
+    return_value = test_codec
+    for suffix in ('_sig', '_be', '_le'):
+        return_value = return_value.replace(suffix, '')
+    return return_value
+
+
+def uses_bom(codec):
+    for suffix in ('_32', '_16', '_sig'):
+        if codec.endswith(suffix):
+            return True
+    return False
+
+
+def encoding_detectable(string, codec):
+    """
+    Returns True if encoding can be detected after string is encoded
+    Encoding detection only works if you’re using a BOM or the first character
+    is ASCII. See yamllint.decoder.auto_decode()’s docstring.
+    """
+    return uses_bom(codec) or (len(string) > 0 and string[0].isascii())
+
+
+# Workspace related stuff:
+class Blob:
+    def __init__(self, text, encoding):
+        self.text = text
+        self.encoding = encoding
+
+
+def build_temp_workspace(files):
+    tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
+
+    for path, content in files.items():
+        path = os.fsencode(os.path.join(tempdir, path))
+        if not os.path.exists(os.path.dirname(path)):
+            os.makedirs(os.path.dirname(path))
+
+        if isinstance(content, list):
+            os.mkdir(path)
+        elif isinstance(content, str) and content.startswith('symlink://'):
+            os.symlink(content[10:], path)
+        else:
+            if isinstance(content, Blob):
+                content = content.text.encode(content.encoding)
+            elif isinstance(content, str):
+                content = content.encode('utf_8')
+            with open(path, 'wb') as f:
+                f.write(content)
+
+    return tempdir
+
+
+@contextlib.contextmanager
+def temp_workspace(files):
+    """Provide a temporary workspace that is automatically cleaned up."""
+    backup_wd = os.getcwd()
+    wd = build_temp_workspace(files)
+
+    try:
+        os.chdir(wd)
+        yield
+    finally:
+        os.chdir(backup_wd)
+        shutil.rmtree(wd)
+
+
+def temp_workspace_with_files_in_many_codecs(path_template, text):
+    workspace = {}
+    for codec in UTF_CODECS:
+        if encoding_detectable(text, codec):
+            workspace[path_template.format(codec)] = Blob(text, codec)
+    return workspace
+
+
+# Miscellaneous stuff:
 class RuleTestCase(unittest.TestCase):
     def build_fake_config(self, conf):
         if conf is None:
@@ -81,37 +235,3 @@ def __exit__(self, *exc_info):
     @property
     def returncode(self):
         return self._raises_ctx.exception.code
-
-
-def build_temp_workspace(files):
-    tempdir = tempfile.mkdtemp(prefix='yamllint-tests-')
-
-    for path, content in files.items():
-        path = os.path.join(tempdir, path).encode('utf-8')
-        if not os.path.exists(os.path.dirname(path)):
-            os.makedirs(os.path.dirname(path))
-
-        if isinstance(content, list):
-            os.mkdir(path)
-        elif isinstance(content, str) and content.startswith('symlink://'):
-            os.symlink(content[10:], path)
-        else:
-            mode = 'wb' if isinstance(content, bytes) else 'w'
-            with open(path, mode) as f:
-                f.write(content)
-
-    return tempdir
-
-
-@contextlib.contextmanager
-def temp_workspace(files):
-    """Provide a temporary workspace that is automatically cleaned up."""
-    backup_wd = os.getcwd()
-    wd = build_temp_workspace(files)
-
-    try:
-        os.chdir(wd)
-        yield
-    finally:
-        os.chdir(backup_wd)
-        shutil.rmtree(wd)