Basic Samplers? #61

Closed
ArEnSc opened this issue May 17, 2023 · 4 comments

ArEnSc commented May 17, 2023

Hey, I noticed that this project doesn't seem to include samplers in C. Would they be difficult to implement? Why not just copy the llama samplers? Likely a stupid question! Sorry, I am not a C++ or ggml pro.

@saharNooby
Collaborator

The issue with having complete inference on the C side is implementing the tokenizer. I tried to implement it, but got stuck on Unicode NFC normalization and proper regexes with Unicode support. The C/C++ ecosystem is not pretty for such tasks if the goal is minimal, single-file code.

A tokenizer in C can be approximated: we could ignore non-Latin characters and normalization, for example. But I would rather have no tokenizer at all than a half-working one.

Without a properly working tokenizer, I see no value in having sampling code in C. What's the point of sampling tokens if you still need to go to Python for decoding? Better then to do the sampling in Python too...

BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist
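
To illustrate why the NFC normalization step mentioned above matters for a tokenizer, here is a minimal Python sketch. It only uses the standard `unicodedata` module; the variable names are illustrative and are not taken from the linked gist. Two strings that render identically can have different bytes, so a byte-level tokenizer would produce different tokens for them unless the input is normalized first.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Same rendering, different code points and different UTF-8 bytes.
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'

# NFC normalization maps both spellings to the same precomposed form,
# so the tokenizer sees identical input either way.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)  # True
```

Python ships the Unicode tables this needs; a minimal single-file C implementation would have to embed or generate them itself.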

@LoganDark
Contributor

LoganDark commented May 22, 2023

> BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist

thank you so much for this <3 the test cases are awesome!!

[Attachments: an image and a video, Recording.2023-05-22.020516.mp4]

@LoganDark
Contributor

LoganDark commented Jun 7, 2023

> The issue with having complete inference on the C side is implementing the tokenizer. I tried to implement it, but got stuck on Unicode NFC normalization and proper regexes with Unicode support. The C/C++ ecosystem is not pretty for such tasks if the goal is minimal, single-file code.
>
> A tokenizer in C can be approximated: we could ignore non-Latin characters and normalization, for example. But I would rather have no tokenizer at all than a half-working one.
>
> Without a properly working tokenizer, I see no value in having sampling code in C. What's the point of sampling tokens if you still need to go to Python for decoding? Better then to do the sampling in Python too...
>
> BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist

now that the world tokenizer exists, it is feasible to have a tokenizer implementation in rwkv.cpp (it is only around 100 lines). I just finished a proof of concept here but I swear you do not want to look at those header files that are being included. it is one stage of madness away from deterministic finite automata (which would take linear time but probably use gigabytes of memory).

I'm still in love with the end result that requires no runtime parsing and is probably an order of magnitude faster than the python trie implementation, but maybe I sacrificed a lot to get there
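
For readers unfamiliar with the trie approach mentioned above, here is a rough Python sketch of greedy longest-match tokenization over bytes. It assumes the world tokenizer works this way (pick the longest vocabulary entry matching at the current position, then advance); the `TrieNode`, `build_trie`, and `encode` names and the toy vocabulary are hypothetical and are not taken from rwkv.cpp or the proof of concept.

```python
class TrieNode:
    __slots__ = ("children", "token_id")

    def __init__(self):
        self.children = {}    # maps a byte value (int) to a child TrieNode
        self.token_id = None  # set when a vocabulary entry ends at this node


def build_trie(vocab):
    """Build a byte-level trie from a {bytes: token_id} mapping."""
    root = TrieNode()
    for token_bytes, token_id in vocab.items():
        node = root
        for b in token_bytes:
            node = node.children.setdefault(b, TrieNode())
        node.token_id = token_id
    return root


def encode(root, data):
    """Greedily emit the longest matching token at each position."""
    ids = []
    pos = 0
    while pos < len(data):
        node = root
        best_id, best_end = None, pos
        i = pos
        # Walk the trie as far as the input allows, remembering the
        # last position where a complete token ended.
        while i < len(data) and data[i] in node.children:
            node = node.children[data[i]]
            i += 1
            if node.token_id is not None:
                best_id, best_end = node.token_id, i
        if best_id is None:
            raise ValueError(f"no token matches the byte at offset {pos}")
        ids.append(best_id)
        pos = best_end
    return ids


# Toy vocabulary (hypothetical); a real vocabulary would be loaded from a file.
toy_vocab = {b"a": 1, b"b": 2, b"ab": 3, b"abc": 4, b"c": 5}
root = build_trie(toy_vocab)
print(encode(root, b"abcab"))  # [4, 3]
```

Because matching works on raw bytes, this scheme needs no regexes and no Unicode normalization, which is what makes a compact C port plausible.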

both top-p and top-k token sampling would be dead simple from there, and would make it possible to cut out python entirely, given already-preprocessed model files.
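
As a rough idea of how little code combined top-k and top-p (nucleus) sampling needs, here is a sketch in Python using only the standard library. The `sample_token` function, its parameters, and its defaults are hypothetical and are not part of rwkv.cpp; it assumes `logits` is a plain list of floats and `temperature` is positive.

```python
import math
import random


def sample_token(logits, top_k=0, top_p=1.0, temperature=1.0, rng=random):
    """Sample a token id from logits with optional top-k and top-p filtering."""
    # Softmax with temperature, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ordered from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices accepts unnormalized weights, so no renormalization is needed.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]
print(sample_token(logits, top_k=1))  # 0, since only the top token survives
```

A C version would follow the same steps: softmax, sort, truncate, then draw from the remaining weights.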

@saharNooby
Collaborator

Duplicated by "Can we have an example of pure C ++?" (#112); pure C++ inference implies tokenizing and basic sampling.

saharNooby closed this as not planned on Sep 23, 2023