comparison to other tokenizers #11

transitive-bullshit · 2023-05-27T04:22:12Z

This library looks great.

I tried to add it to https://github.com/transitive-bullshit/compare-tokenizers, but kept running into various ESM import issues.

I'd love to compare it to the other node.js tokenizers on a consistent test set for both accuracy and speed.

Also, the one thing this library is missing currently (from what I could tell; I wasn't able to get it working in my test bed) is a dynamic function to return the tokenizer given a model name. I know the examples show you can do this statically using imports, but for a lot of libraries, the model needs to be customizable at runtime.

Thanks!
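To make the request above concrete, here is a minimal sketch of a runtime lookup that returns a tokenizer for a given model name. Everything in it is a hypothetical illustration: the `Tokenizer` type, the `getTokenizerForModel` function, the `model-a`/`model-b` names, and the two toy tokenizers are assumptions made for this example, not part of gpt-tokenizer's API.

```typescript
// Hypothetical sketch: none of these names come from gpt-tokenizer itself.
type Tokenizer = {
  encode(text: string): number[]
}

// Toy stand-ins so the example runs on its own. A real registry would
// wrap each model's actual encoder instead.
const charTokenizer: Tokenizer = {
  // One "token" per UTF-16 code unit.
  encode: (text) => text.split('').map((ch) => ch.charCodeAt(0)),
}

const wordTokenizer: Tokenizer = {
  // One "token" per whitespace-separated word, represented by its length.
  encode: (text) =>
    text
      .split(/\s+/)
      .filter((word) => word.length > 0)
      .map((word) => word.length),
}

// Model name -> factory. Factories keep construction lazy, so a tokenizer
// is only built when its model is actually requested.
const registry = new Map<string, () => Tokenizer>([
  ['model-a', () => charTokenizer],
  ['model-b', () => wordTokenizer],
])

function getTokenizerForModel(modelName: string): Tokenizer {
  const factory = registry.get(modelName)
  if (factory === undefined) {
    throw new Error(`Unknown model: ${modelName}`)
  }
  return factory()
}

// The model name can now come from configuration or user input at runtime.
const tokens = getTokenizerForModel('model-a').encode('hi')
```

Keeping the lookup behind a single function means callers never need a static import per model, which is the limitation described above.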

niieani · 2023-06-01T08:03:55Z

Thanks @transitive-bullshit!
I saw the issue with default imports and fixed it. Latest version should have it fixed.

Submitted a PR to your comparison repo: transitive-bullshit/compare-tokenizers#3.
I see there's some room for improvement in my package regarding performance.
I believe the extra safety features of gpt-tokenizer are what's slowing it down currently.
I'll try to get it down by making the safety (allowedSpecialTokens) optional.
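As a rough illustration of what making that check optional could look like, here is a minimal sketch. It is not gpt-tokenizer's implementation: the `encode` function, the `EncodeOptions` shape, the `skipSpecialTokenCheck` flag, and the toy character-code encoding are all assumptions made for this example. Only the `allowedSpecialTokens` name comes from the comment above.

```typescript
// Hypothetical sketch, not the library's actual code.
const SPECIAL_TOKENS: readonly string[] = ['<|endoftext|>']

interface EncodeOptions {
  // Special tokens the caller explicitly permits in the input.
  allowedSpecialTokens?: ReadonlySet<string>
  // Assumed opt-out flag: skip the scan entirely for trusted input.
  skipSpecialTokenCheck?: boolean
}

// Scans the input once per special token and throws on the first
// one that appears without being allowed.
function assertNoDisallowedSpecialTokens(
  text: string,
  allowed: ReadonlySet<string>,
): void {
  for (const token of SPECIAL_TOKENS) {
    if (!allowed.has(token) && text.indexOf(token) !== -1) {
      throw new Error(`Disallowed special token in input: ${token}`)
    }
  }
}

function encode(text: string, options: EncodeOptions = {}): number[] {
  if (!options.skipSpecialTokenCheck) {
    assertNoDisallowedSpecialTokens(
      text,
      options.allowedSpecialTokens ?? new Set<string>(),
    )
  }
  // Toy encoding (one code unit per token) standing in for real BPE.
  return text.split('').map((ch) => ch.charCodeAt(0))
}

// Safe by default; callers who trust their input can skip the scan.
const fast = encode('hello', { skipSpecialTokenCheck: true })
```

The default stays safe, so the extra scan is only avoided when a caller opts out explicitly.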

github-actions · 2023-06-01T08:05:48Z

🎉 This issue has been resolved in version 2.1.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

niieani closed this as completed in 2a55474 Jun 1, 2023

github-actions bot added the released label Jun 1, 2023

niieani mentioned this issue Jun 1, 2023

add gpt-tokenizer to comparison transitive-bullshit/compare-tokenizers#3

Merged
