How to write custom Wordpiece class? #1525

Open
xinyinan9527 opened this issue May 9, 2024 · 2 comments

xinyinan9527 commented May 9, 2024

My aim is to get a "tokenizer.json" for the rwkv5 model, but its tokenizer is implemented as a slow tokenizer (class PreTrainedTokenizer).
I want to convert the slow tokenizer to a fast tokenizer. That requires "tokenizer = Tokenizer(WordPiece())", but rwkv5 has its own Wordpiece file.
So I want to create a custom Wordpiece class.

Here is the code:

from tokenizers.models import Model
class MyWordpiece(Model):
    def __init__(self,vocab,unk_token):
        self.vocab = vocab
        self.unk_token = unk_token



test = MyWordpiece('./vocab.txt',"<s>")

Running it fails with:

Traceback (most recent call last):
  File "test.py", line 78, in <module>
    test = MyWordpiece('./vocab.txt',"<s>")
TypeError: Model.__new__() takes 0 positional arguments but 2 were given

github-actions bot commented Jun 9, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Jun 9, 2024

ArthurZucker (Collaborator) commented

Hey! That is not really the way to do it! Are you still interested in having the fast version?
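
For anyone landing here with the same TypeError: the classes in tokenizers.models are backed by the Rust implementation, so as far as I know, subclassing Model in Python and passing your own constructor arguments is not a supported path, which is what the traceback above is reporting. A minimal sketch of the usual route is to hand the vocabulary to the built-in WordPiece model and save the resulting Tokenizer. The toy vocabulary, the "[UNK]" token, and the Whitespace pre-tokenizer below are illustrative assumptions, not taken from rwkv5, and this only helps if the rwkv5 vocabulary actually follows WordPiece conventions (such as the "##" continuation prefix), which this thread does not confirm.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary (an assumption for illustration; not the rwkv5 vocab).
vocab = {"[UNK]": 0, "hello": 1, "world": 2, "##s": 3}

# Use the built-in WordPiece model instead of subclassing Model.
tokenizer = Tokenizer(WordPiece(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

encoding = tokenizer.encode("hello worlds")
print(encoding.tokens)

# Serialize to a tokenizer.json and load it back to check the round trip.
path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tokenizer.save(path)
reloaded = Tokenizer.from_file(path)
print(reloaded.encode("hello worlds").ids)
```

If the vocabulary lives in a vocab.txt with one token per line, WordPiece.from_file("./vocab.txt", unk_token="<s>") should build the same kind of model, as long as the unk token you pass actually appears in that file.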

github-actions bot removed the Stale label Jun 12, 2024