KeyError in BetterFSM::FSMInfo when input FSM alphabet contains UTF-8 characters that end with \xb8\x80
#833
Comments
@m0g1cian opened an upstream issue: numba/numba#9542. Per the thread, it appears to be an upstream bug on the numba side. There are a few options here:
I made a local patch to fix this issue in outlines. It basically makes numba typed Dict or List always use … I'll make a PR soon.
Describe the issue as clearly as possible:
Update 2
Can confirm there's something wrong with Numba's Typed Dict implementation. Check issue here
Update
When `outlines` builds `BetterFSM` from a reference FSM (e.g. from `interegular`), if the reference FSM contains the Chinese character "一", the corresponding `numba.typed.Dict` used by `BetterFSM::alphabet_symbol_map` somehow converts this character into an empty string, causing a KeyError whenever `__getitem__` is triggered.
Steps/code to reproduce the bug:
debug_keyerror.py
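The byte pattern named in the title can be checked with the standard library alone. The snippet below is not the original `debug_keyerror.py`; it is a minimal sketch that only confirms the UTF-8 encoding of "一" (U+4E00) ends with `\xb8\x80`. The helper name `ends_with_suffix` is hypothetical and introduced here purely for illustration.

```python
# Stand-alone check using only the standard library (no outlines, no numba).
char = "一"  # U+4E00, the character named in the report

utf8_bytes = char.encode("utf-8")
print(utf8_bytes)  # b'\xe4\xb8\x80'


def ends_with_suffix(s: str, suffix: bytes = b"\xb8\x80") -> bool:
    """Return True if the UTF-8 encoding of `s` ends with `suffix`."""
    return s.encode("utf-8").endswith(suffix)


print(ends_with_suffix(char))      # True
print(ends_with_suffix("\u1e00"))  # True: U+1E00 encodes as b'\xe1\xb8\x80'
```

A helper like this could be used to scan an alphabet for other characters matching the pattern before building the FSM.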
Some insight:
- Print `(k, v)` in `alphabet_symbol_mapping_items` before `create_fsm_info()` (right after `outlines.fsm.regex.py::96`)
- Print `(k, v)` in `alphabet_symbol_mapping_items` in `create_fsm_info()` when building `alphabet_symbol_map` (right after `outlines.fsm.regex.py::139`)
Expected result:
I was able to get the expected result after tweaking two places:
- `outlines.fsm.regex.py::112`: change `nb_unichar_2_type = numba.types.UnicodeCharSeq(2)` to `nb_unichar_2_type = numba.types.unicode_type`
- `outlines.fsm.regex.py::89`: change `alphabet_symbol_mapping_items` to a simple Python list: `alphabet_symbol_mapping_items = list((k, v) for k, v in self.alphabet._symbol_mapping.items() if k != anything_else)`
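The second tweak can be sketched in isolation with plain Python. Here `symbol_mapping` and `anything_else` are hypothetical stand-ins for `self.alphabet._symbol_mapping` and the sentinel used by the reference FSM; nothing below touches outlines or numba, so it only illustrates the shape of the list, not the fix itself.

```python
# Hypothetical stand-ins, assumed for illustration only:
# `anything_else` plays the role of the sentinel key, and
# `symbol_mapping` plays the role of self.alphabet._symbol_mapping.
anything_else = object()
symbol_mapping = {"a": 0, "一": 1, anything_else: 2}

# Build a plain Python list of (symbol, index) pairs, skipping the sentinel.
alphabet_symbol_mapping_items = [
    (k, v) for k, v in symbol_mapping.items() if k != anything_else
]

# With ordinary Python strings as keys, the character survives intact
# and can be looked up again.
lookup = dict(alphabet_symbol_mapping_items)
print(lookup["一"])  # 1
```

Keeping the pairs as ordinary Python `str` objects means no fixed-width character type is involved at this stage.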
Error message:
Outlines/Python version information:
Version information