comparison to other tokenizers #11

Closed
transitive-bullshit opened this issue May 27, 2023 · 2 comments
@transitive-bullshit

This library looks great.

I tried to add it to https://github.com/transitive-bullshit/compare-tokenizers, but kept running into various ESM import issues.

I'd love to compare it to the other node.js tokenizers on a consistent test set for both accuracy and speed.

Also, the one thing this library is missing currently (from what I could tell; I wasn't able to get it working in my test bed) is a dynamic function to return the tokenizer given a model name. I know the examples show you can do this statically using imports, but for a lot of libraries, the model needs to be customizable at runtime.
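To make the request concrete, here is a minimal sketch of what a runtime lookup could look like. Every name in it (`Tokenizer`, `getTokenizerForModel`, the two maps, and the code-point stand-in encoder) is hypothetical and not taken from gpt-tokenizer; only the model-to-encoding pairs reflect the published OpenAI encodings.

```typescript
// Hypothetical sketch: none of these names come from gpt-tokenizer itself.
type Encode = (text: string) => number[];

interface Tokenizer {
  encoding: string;
  encode: Encode;
}

// Stand-in encoder (one number per code point). A real registry would
// point at the library's actual per-encoding BPE implementations.
const byCodePoint: Encode = (text) =>
  Array.from(text, (ch) => ch.codePointAt(0) ?? 0);

// Encoding name -> tokenizer implementation.
const encodings = new Map<string, Tokenizer>([
  ['cl100k_base', { encoding: 'cl100k_base', encode: byCodePoint }],
  ['p50k_base', { encoding: 'p50k_base', encode: byCodePoint }],
]);

// Model name -> encoding name, resolved at runtime rather than at import time.
const modelToEncoding = new Map<string, string>([
  ['gpt-4', 'cl100k_base'],
  ['gpt-3.5-turbo', 'cl100k_base'],
  ['text-davinci-003', 'p50k_base'],
]);

function getTokenizerForModel(model: string): Tokenizer {
  const encodingName = modelToEncoding.get(model);
  const tokenizer =
    encodingName === undefined ? undefined : encodings.get(encodingName);
  if (tokenizer === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return tokenizer;
}
```

A caller would then pass a model name that is only known at runtime, for example `getTokenizerForModel(userConfig.model).encode(text)`, where `userConfig` is likewise an assumed name.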

Thanks!

@niieani niieani closed this as completed in 2a55474 Jun 1, 2023
@niieani
Owner

niieani commented Jun 1, 2023

Thanks @transitive-bullshit!
I saw the issue with default imports and fixed it. The latest version should include the fix.

Submitted a PR to your comparison repo: transitive-bullshit/compare-tokenizers#3.
I see there's some room for improvement in my package regarding performance.
I believe the extra safety features of gpt-tokenizer are what's slowing it down currently.
I'll try to reduce that overhead by making the safety check (allowedSpecialTokens) optional.
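As a rough illustration of that idea, the sketch below skips the special-token scan entirely unless the caller opts in. It is not gpt-tokenizer's implementation: `encodeWithOptionalCheck`, `EncodeOptions`, `SPECIAL_TOKENS`, and the code-point stand-in for BPE encoding are all assumptions made for the example.

```typescript
// Hypothetical names throughout; this is not gpt-tokenizer's implementation.
const SPECIAL_TOKENS: readonly string[] = ['<|endoftext|>'];

interface EncodeOptions {
  // When omitted, the scan for special tokens is skipped entirely,
  // so callers who do not need the safety check pay nothing for it.
  allowedSpecialTokens?: ReadonlySet<string>;
}

function encodeWithOptionalCheck(
  text: string,
  options: EncodeOptions = {},
): number[] {
  const { allowedSpecialTokens } = options;

  if (allowedSpecialTokens !== undefined) {
    // Opt-in safety path: reject special tokens the caller did not allow.
    for (const token of SPECIAL_TOKENS) {
      if (text.includes(token) && !allowedSpecialTokens.has(token)) {
        throw new Error(`Disallowed special token: ${token}`);
      }
    }
  }

  // Stand-in for the real BPE encoding step (one number per code point).
  return Array.from(text, (ch) => ch.codePointAt(0) ?? 0);
}
```

With this shape, the fast path is the default and the check runs only when `allowedSpecialTokens` is supplied, even as an empty set.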

@github-actions

github-actions bot commented Jun 1, 2023

🎉 This issue has been resolved in version 2.1.1 🎉

The release is available on:

Your semantic-release bot 📦🚀
