This library looks great.
I tried to add it to https://github.com/transitive-bullshit/compare-tokenizers, but kept running into various ESM import issues.
I'd love to compare it to the other Node.js tokenizers on a consistent test set for both accuracy and speed.
Also, the one thing this library currently seems to be missing (from what I could tell; I wasn't able to get it working in my test bed) is a function that returns the tokenizer for a given model name. I know the examples show you can do this statically via imports, but for a lot of libraries the model needs to be selectable at runtime.
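Something like the sketch below is what I have in mind; the per-model import paths and the `getEncoderForModel` helper are illustrative guesses on my part, not the package's actual API.

```ts
// Illustrative only: the per-model subpath imports and the helper name are
// assumptions, not gpt-tokenizer's confirmed API.
import { encode as encodeGpt4 } from 'gpt-tokenizer/model/gpt-4'
import { encode as encodeGpt35Turbo } from 'gpt-tokenizer/model/gpt-3.5-turbo'

const encodersByModel: Record<string, (text: string) => number[]> = {
  'gpt-4': encodeGpt4,
  'gpt-3.5-turbo': encodeGpt35Turbo
}

// Resolve an encoder from a model name that is only known at runtime
// (e.g. read from a config file or an API request).
export function getEncoderForModel(model: string): (text: string) => number[] {
  const encoder = encodersByModel[model]
  if (!encoder) {
    throw new Error(`No tokenizer registered for model: ${model}`)
  }
  return encoder
}

// Usage: the model string can come from anywhere at runtime.
const tokens = getEncoderForModel('gpt-4')('hello world')
console.log(tokens.length)
```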
Thanks!
Thanks @transitive-bullshit!
I saw the issue with default imports and fixed it; the latest version should include the fix.
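For reference, the failure mode was roughly the usual CJS/ESM default-export mismatch; the snippet below is illustrative rather than the exact error from the comparison repo.

```ts
// Under Node ESM, a default import from a CommonJS build without a proper
// default export can resolve to undefined:
//   import tokenizer from 'gpt-tokenizer' // tokenizer === undefined
// Named imports were a workaround in the meantime:
import { encode, decode } from 'gpt-tokenizer'

const tokens = encode('hello world')
console.log(tokens.length, decode(tokens))
```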
Submitted a PR to your comparison repo: transitive-bullshit/compare-tokenizers#3.
I see there's some room for improvement in my package's performance.
I believe the extra safety checks in gpt-tokenizer are what's slowing it down currently.
I'll try to close the gap by making the safety check (allowedSpecialTokens) optional.
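The rough idea is sketched below; the names and shapes are illustrative, not the package's actual internals.

```ts
// Sketch only: the safety check scans the input for special-token strings
// that weren't explicitly allowed; letting callers opt out of it saves one
// extra pass over the text.
const SPECIAL_TOKENS = ['<|endoftext|>', '<|im_start|>', '<|im_end|>']

interface EncodeOptions {
  // Assumption: passing 'all' opts out of the scan entirely.
  allowedSpecialTokens?: Set<string> | 'all'
}

function assertNoDisallowedSpecialTokens(text: string, options: EncodeOptions): void {
  if (options.allowedSpecialTokens === 'all') return // opt-out: skip the pass
  const allowed = options.allowedSpecialTokens ?? new Set<string>()
  for (const token of SPECIAL_TOKENS) {
    if (!allowed.has(token) && text.includes(token)) {
      throw new Error(`Input contains disallowed special token: ${token}`)
    }
  }
}
```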