Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Wrong way to get list of supported encodings #323

Closed
nickspring opened this issue Aug 20, 2023 · 7 comments · Fixed by #328
Closed

[BUG] Wrong way to get list of supported encodings #323

nickspring opened this issue Aug 20, 2023 · 7 comments · Fixed by #328

Comments

@nickspring
Copy link

Describe the bug
It looks like you get list of encodings from encoding.aliases module.
aliases(as one would/should expect) contains several cases where different keys are mapped to the same value e.g. 1252 and windows_1252 are both mapped to cp1252. You could save time if instead of aliases.keys() you use set(aliases.values()).

BUT THERE'S A WORSE PROBLEM: aliases don't contain codecs that don't have aliases (like cp856, cp874, cp875, cp737, and koi8_u).

To Reproduce
List of encodings, supported by this library:
https://charset-normalizer.readthedocs.io/en/latest/user/support.html#supported-encodings
It's declared that the library supports all encodings which are supported by Python.
But there is no, for example, KOI8-U.

Expected behavior
But KOI8-U is supported (but just it doesn't have an alias):
https://docs.python.org/3.11/library/codecs.html#standard-encodings

@nickspring nickspring added bug Something isn't working help wanted Extra attention is needed labels Aug 20, 2023
@Ousret
Copy link
Owner

Ousret commented Aug 27, 2023

Hello,

It looks like you get list of encodings from encoding.aliases module.

Yes.

You could save time if instead of aliases.keys() you use set(aliases.values()).

Already is the case.

BUT THERE'S A WORSE PROBLEM: aliases don't contain codecs that don't have aliases (like cp856, cp874, cp875, cp737, and koi8_u).

IT IS TRUE WE MISSED 9 ENCODINGS! 💯 (Please avoid caps lock in the future)

Otherwise, it is a good catch. They are still very kind rare encoding (depending on various factors), but still, valuable. Will add a patch to include them.

Regards,

@nickspring
Copy link
Author

nickspring commented Aug 27, 2023

Sorry, it was copied from Stackoverflow :) I didn't realize that there is an information about set(aliases.values()) and Caps lock is from the original post. Actually I'm working on porting this library to other language and go through all sources.

@Ousret
Copy link
Owner

Ousret commented Aug 27, 2023

Actually I'm working on porting this library to other language and go through all sources.

Good! I am interested in any progress or proof of concept. Is it rust? golang? c?

@nickspring
Copy link
Author

It's Rust. At the time I have ported cd/md/utils/models. So api and cli are waiting for implementation (ETA 1-3 weeks).

@nickspring
Copy link
Author

It's Rust. At the time I have ported cd/md/utils/models. So api and cli are waiting for implementation (ETA 1-3 weeks).

Sorry for offtopic, I can't find any other possibility to contact you. https://github.com/nickspring/charset-normalizer-rs there is a Rust version of library. I tried to port with maximum compatibility. I believe I did everything correctly with copyrights and licences :) I would be grateful if you add link to th Rust version.

@Ousret Ousret removed bug Something isn't working help wanted Extra attention is needed labels Sep 30, 2023
@Ousret
Copy link
Owner

Ousret commented Sep 30, 2023

Good work. I briefly tested it and it seems fine.
The immediate remark is that it did not build on alpine due to some incompatible dependencies.

I can mention it, of course.

@nickspring
Copy link
Author

Good work. I briefly tested it and it seems fine. The immediate remark is that it did not build on alpine due to some incompatible dependencies.

I can mention it, of course.

apk add musl-dev should help.

Whole command:

docker run --rm -it -v .:/app -w /app --entrypoint sh rust:1.71.0-alpine -c 'apk add musl-dev && cargo run --bin performance --features performance --release'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants