from charset_normalizer import from_bytes

my_byte_str = 'Bсеки човек има право на образование.'.encode('cp1251')

# Assign return value so we can fully exploit result
result = from_bytes(
    my_byte_str
).best()

print(result.encoding)  # cp1251
I'm updating the charset-normalizer package in OpenWrt (with Python 3.11.6) and tried the example in https://charset-normalizer.readthedocs.io/en/latest/user/handling_result.html#handling-result. In 3.3.0 this would print cp1251, but in 3.3.1 this prints cp1257 (str(result) returns 'Bńåźč ÷īāåź čģą ļšąāī ķą īįšąēīāąķčå.').

I also tried the French phrase from https://charset-normalizer.readthedocs.io/en/latest/index.html#introduction, and from_bytes(my_byte_str).best() also has the encoding cp1257.

I have compiled the package for arm, aarch64 and x86_64 and I get the same results.
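For what it's worth, the wrong output looks like the same bytes read through a different single-byte code page rather than corrupted data. Below is a minimal, stdlib-only sketch of that assumption: it does not call charset-normalizer at all, and the names `original`, `as_cp1251`, `as_cp1257` and `round_trips` are only illustrative. It decodes the cp1251 payload with both codecs to show that both decodings succeed.

```python
# Stdlib-only illustration; charset-normalizer itself is not involved here.
original = 'Bсеки човек има право на образование.'
my_byte_str = original.encode('cp1251')

# Every byte of this particular payload is also defined in cp1257,
# so decoding succeeds but yields Baltic letters instead of Cyrillic.
as_cp1251 = my_byte_str.decode('cp1251')
as_cp1257 = my_byte_str.decode('cp1257')

print(as_cp1251)
print(as_cp1257)

# Re-encoding the cp1257 text gives back the exact same bytes, so
# "does it decode cleanly?" cannot tell the two code pages apart.
round_trips = as_cp1257.encode('cp1257') == my_byte_str
print(round_trips)
```

If that reading is right, the choice between cp1251 and cp1257 for this input comes down entirely to the detector's language and noise heuristics, which would fit a regression between 3.3.0 and 3.3.1.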