For reference, here is our word splitting algorithm, which I believe I forked out of another crate (this seems to be a fairly common problem). The key part is the classify function. It has a wider problem: it only considers ASCII lowercase and uppercase letters to be characters. I'm assuming we'll need to add a "continuation" mode for characters that are neither lowercase nor uppercase. I'm unsure exactly which characters should be part of this continuation class. We'd likely want anything in XID to be considered a character. As for things like accents, I'm still not sure.
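To make the "continuation" idea concrete, here is a minimal sketch, not the crate's actual code. The names `CharClass`, `classify`, and `split_words` are hypothetical, and it uses the standard library's `char::is_alphabetic` as a rough stand-in for XID membership, which is an assumption rather than an exact match. It only splits on non-letters and on a lowercase-to-uppercase transition.

```rust
/// Hypothetical character classes for word splitting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CharClass {
    Lower,
    Upper,
    /// Letters that are neither lowercase nor uppercase (e.g. CJK).
    Continuation,
    /// Anything that separates words (underscores, digits, punctuation).
    Other,
}

/// Classify using Unicode-aware std methods instead of ASCII-only checks.
/// Assumption: `is_alphabetic` approximates "is in XID"; it is not identical.
fn classify(c: char) -> CharClass {
    if c.is_lowercase() {
        CharClass::Lower
    } else if c.is_uppercase() {
        CharClass::Upper
    } else if c.is_alphabetic() {
        CharClass::Continuation
    } else {
        CharClass::Other
    }
}

/// Split an identifier into words at separators and at lower -> upper
/// boundaries. A continuation character never starts a new word on its own.
fn split_words(ident: &str) -> Vec<&str> {
    let mut words = Vec::new();
    let mut start: Option<usize> = None;
    let mut prev = CharClass::Other;

    for (i, c) in ident.char_indices() {
        let class = classify(c);
        if class == CharClass::Other {
            // A separator closes the current word, if any.
            if let Some(s) = start.take() {
                words.push(&ident[s..i]);
            }
        } else if let Some(s) = start {
            // camelCase boundary: close the word and start a new one here.
            if prev == CharClass::Lower && class == CharClass::Upper {
                words.push(&ident[s..i]);
                start = Some(i);
            }
        } else {
            start = Some(i);
        }
        prev = class;
    }
    // Flush the trailing word.
    if let Some(s) = start {
        words.push(&ident[s..]);
    }
    words
}
```

With this sketch, a precomposed `ë` counts as a lowercase letter, so `noël` stays one word. Combining accents (a plain `e` followed by a separate diacritic mark) are not handled here and remain the open question mentioned above.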
Overlooked something but it turned out to not be a problem. We correctly identify that noël is one identifier, so this is only in our word splitting.
That does offer a short-term workaround: we could just say any identifier with non-ASCII characters doesn't get split but instead always gets accepted.
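A rough sketch of that workaround, under the assumption that splitting happens in a single function that returns the words to spell-check. The names `has_non_ascii` and `words_to_check` are made up for illustration, and the underscore-only split is a placeholder for the real splitter.

```rust
/// True when the identifier contains at least one non-ASCII character.
fn has_non_ascii(ident: &str) -> bool {
    !ident.is_ascii()
}

/// Return the words that should be spell-checked.
/// Identifiers with non-ASCII characters are accepted as-is, so they
/// yield no words. The ASCII path splits on underscores only, as a
/// placeholder for the real word splitting.
fn words_to_check(ident: &str) -> Vec<&str> {
    if has_non_ascii(ident) {
        return Vec::new();
    }
    ident.split('_').filter(|w| !w.is_empty()).collect()
}
```

The trade-off is that typos inside such identifiers go unreported until the splitter itself understands non-ASCII letters.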
I got this error:
Which I assume comes from mis-splitting noël into noe and something else.