Python 3 port uses Unicode to represent byte strings #537
Comments
The most reasonable solution I can think of that retains backward compatibility is to add `encoding` and `errors` properties to `Repository`, defaulting to the filesystem encoding and `"strict"` but overridable (even to `None`).
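To make the proposal concrete, here is a minimal sketch of how such settings might behave. `RepoTextSettings` is a hypothetical stand-in, not part of pygit2; it assumes that an encoding of `None` means "hand back the raw bytes untouched".

```python
import sys
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class RepoTextSettings:
    """Hypothetical holder for the proposed per-repository settings."""

    # Defaults mirror the proposal: filesystem encoding and "strict".
    encoding: Optional[str] = field(default_factory=sys.getfilesystemencoding)
    errors: str = "strict"

    def decode(self, raw: bytes) -> Union[str, bytes]:
        # Assumption: encoding=None disables decoding entirely.
        if self.encoding is None:
            return raw
        return raw.decode(self.encoding, self.errors)
```

With this shape, existing callers keep getting `str` by default, while a caller that needs to handle arbitrary names could opt into bytes by setting the encoding to `None`, or into lossless text by choosing the `surrogateescape` error handler.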
Hi, I can confirm this issue. This also happens for mis-encoded metadata in commits, like commit dcb71129841e5821c0cbbdd4017a6f202f180108 in the Linux kernel (look at the author name): Reconstruction:
Raises:
My local workaround is to use the raw members of the classes, and then let them through:
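The original snippet for this workaround is not preserved here. As a rough illustration only, a helper along the following lines could be applied to bytes-valued members; the name `lenient_text` and the Latin-1 fallback are assumptions made for this sketch, not the commenter's code.

```python
def lenient_text(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode metadata bytes as UTF-8, falling back to a single-byte codec.

    Latin-1 maps every byte to a code point, so the fallback never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)
```

It would be called on whatever bytes attribute the object exposes (for example a `raw_name`-style member, assuming one is available), so mis-encoded author names come through as text instead of raising.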
At least for paths, to convert a C string into a Python `str` object, it can be used … Also, it would be natural for the conversion from a path represented as a Python `bytes` object to a Python `str` object, e.g. in …
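The function names in the comment above are not preserved. One standard-library mechanism for this kind of lossless path conversion is the `surrogateescape` error handler, which `os.fsdecode` and `os.fsencode` apply on POSIX systems; whether that is what the commenter meant is an assumption. A small sketch with an explicit codec, so the result does not depend on the local filesystem encoding:

```python
# A path containing a Latin-1 byte that is not valid UTF-8.
raw_path = b"docs/caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate
# instead of raising, so decoding always succeeds.
text_path = raw_path.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_path.encode("utf-8", "surrogateescape")
```

The resulting `str` cannot be printed or encoded strictly, but it can be passed around and converted back without losing the original bytes.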
# 1.15.0 (2024-05-18)

- Many deprecated features have been removed, see below
- Upgrade to libgit2 v1.8.1
- New `push_options` optional argument in `Repository.push(...)` [#1282](libgit2/pygit2#1282)
- New support comparison of `Oid` with text string
- Fix `CheckoutNotify.IGNORED` [#1288](libgit2/pygit2#1288)
- Use default error handler when decoding/encoding paths [#537](libgit2/pygit2#537)
- Remove setuptools runtime dependency [#1281](libgit2/pygit2#1281)
- Coding style with ruff [#1280](libgit2/pygit2#1280)
- Add wheels for ppc64le [#1279](libgit2/pygit2#1279)
- Fix tests on EPEL8 builds for s390x [#1283](libgit2/pygit2#1283)

Deprecations:

- Deprecate `IndexEntry.hex`, use `str(IndexEntry.id)`

Breaking changes:

- Remove deprecated `oid.hex`, use `str(oid)`
- Remove deprecated `object.hex`, use `str(object.id)`
- Remove deprecated `object.oid`, use `object.id`
- Remove deprecated `Repository.add_submodule(...)`, use `Repository.submodules.add(...)`
- Remove deprecated `Repository.lookup_submodule(...)`, use `Repository.submodules[...]`
- Remove deprecated `Repository.init_submodules(...)`, use `Repository.submodules.init(...)`
- Remove deprecated `Repository.update_submodule(...)`, use `Repository.submodules.update(...)`
- Remove deprecated constants `GIT_OBJ_XXX`, use `ObjectType`
- Remove deprecated constants `GIT_REVPARSE_XXX`, use `RevSpecFlag`
- Remove deprecated constants `GIT_REF_XXX`, use `ReferenceType`
- Remove deprecated `ReferenceType.OID`, use instead `ReferenceType.DIRECT`
- Remove deprecated `ReferenceType.LISTALL`, use instead `ReferenceType.ALL`
- Remove deprecated support for passing dicts to repository's `merge(...)`, `merge_commits(...)` and `merge_trees(...)`. Instead pass `MergeFlag` for `flags`, and `MergeFileFlag` for `file_flags`.
- Remove deprecated support for passing a string for the favor argument to repository's `merge(...)`, `merge_commits(...)` and `merge_trees(...)`. Instead pass `MergeFavor`.
In the latest release we're using
pygit2, when built for Python 3, treats paths as Unicode and will fail if a path isn't decodable as the filesystem encoding. But Git paths are byte strings, not Unicode strings. This includes refs, so repos with branch names containing non-UTF-8 sequences are completely unusable on most systems:
pygit2's behaviour under Python 2 is correct: `listall_references` and other APIs return byte strings, as they are in the Git model. I don't see why the behaviour should differ by Python version, since both types exist in both languages. It seems to me that the default low-level API should return byte strings, to match the underlying model and handle all cases, and convenience wrappers that return Unicode strings could be added if people actually want them. As it stands, some perfectly valid Git repos are unusable except on Python 2.
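The layering suggested here can be sketched without pygit2 at all. Everything below is hypothetical: `REFS` is an in-memory stand-in for a ref store, and the two functions only illustrate the split between a bytes-returning low-level call and an optional text wrapper.

```python
from typing import Dict, List

# Hypothetical stand-in for a repository's refs; not pygit2's API.
REFS: Dict[bytes, str] = {
    b"refs/heads/main": "1111111",
    b"refs/heads/caf\xe9": "2222222",  # Latin-1 byte, invalid as UTF-8
}


def listall_references_raw() -> List[bytes]:
    """Low-level call: names exactly as stored, so it cannot fail to decode."""
    return sorted(REFS)


def listall_references_text(encoding: str = "utf-8",
                            errors: str = "strict") -> List[str]:
    """Convenience wrapper: decodes names, and may raise on undecodable ones."""
    return [name.decode(encoding, errors) for name in listall_references_raw()]
```

In this arrangement a single undecodable branch name only affects callers who asked for text, and they can still choose a more forgiving `errors` value; the bytes-level call keeps every repository usable.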