Basic Samplers? #61

ArEnSc · 2023-05-17T14:56:32Z

Hey, I noticed this doesn't seem to include samplers in C. Would they be difficult to implement? Why not just copy the llama.cpp samplers? Likely a stupid question! Sorry, I'm not a C++ or ggml pro.

saharNooby · 2023-05-19T09:26:07Z

The issue with having complete inference on the C side is implementing the tokenizer. I tried to implement it, but got stuck on Unicode NFC normalization and proper regexes with Unicode support: the C/C++ ecosystem is not pretty for such tasks if the goal is minimal, single-file code.

An approximate tokenizer could be implemented in C: we could ignore non-Latin characters and normalization, for example. But I would rather have no tokenizer at all than a half-working one.

Without a properly working tokenizer, I see no value in having sampling code in C: what's the point of sampling tokens if you still need to go to Python for decoding? Better then to do sampling in Python too...

BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist
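For anyone attempting the port, the core of byte-level BPE is a merge loop: start from single bytes and repeatedly merge the adjacent pair with the lowest merge rank until no ranked pair remains. The C++ below is a minimal sketch of that loop only. It skips the Unicode regex pre-tokenization and normalization discussed above, and `MergeRanks` and `bpe_merge` are hypothetical names, not taken from the gist or from rwkv.cpp.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical merge table: maps a pair of adjacent symbols to its merge rank
// (lower rank = learned earlier = merged first).
using MergeRanks = std::map<std::pair<std::string, std::string>, int>;

// Apply BPE merges to one pre-tokenized word, starting from single bytes.
std::vector<std::string> bpe_merge(const std::string& word, const MergeRanks& ranks) {
    std::vector<std::string> parts;
    for (char c : word) parts.emplace_back(1, c);

    const int no_rank = std::numeric_limits<int>::max();
    while (parts.size() > 1) {
        // Find the leftmost adjacent pair with the lowest rank.
        int best_rank = no_rank;
        std::size_t best_i = 0;
        for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
            auto it = ranks.find({parts[i], parts[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i = i;
            }
        }
        if (best_rank == no_rank) break;  // no mergeable pair left

        // Merge the winning pair in place.
        parts[best_i] += parts[best_i + 1];
        parts.erase(parts.begin() + static_cast<std::ptrdiff_t>(best_i) + 1);
    }
    return parts;
}
```

With ranks `(l,o)=0`, `(lo,w)=1`, `(e,r)=2`, the word `lower` ends up as `low` + `er`; a real port would then look each resulting symbol up in the vocabulary to get token ids.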

LoganDark · 2023-05-22T09:00:26Z

> BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist

Thank you so much for this <3 The test cases are awesome!!

(Attached video: Recording.2023-05-22.020516.mp4)

LoganDark · 2023-06-07T03:23:51Z

> The issue with having complete inference on the C side is implementing the tokenizer. I tried to implement it, but got stuck on Unicode NFC normalization and proper regexes with Unicode support: the C/C++ ecosystem is not pretty for such tasks if the goal is minimal, single-file code.
>
> An approximate tokenizer could be implemented in C: we could ignore non-Latin characters and normalization, for example. But I would rather have no tokenizer at all than a half-working one.
>
> Without a properly working tokenizer, I see no value in having sampling code in C: what's the point of sampling tokens if you still need to go to Python for decoding? Better then to do sampling in Python too...
>
> BTW, if someone wants to take a shot at implementing a proper BPE tokenizer, here is a pure Python implementation that can be ported: gist

Now that the World tokenizer exists, it is feasible to have a tokenizer implementation in rwkv.cpp (it is only around 100 lines). I just finished a proof of concept here, but I swear you do not want to look at the header files it includes. It is one stage of madness away from a deterministic finite automaton (which would run in linear time but would probably use gigabytes of memory).

I'm still in love with the end result, which requires no runtime parsing and is probably an order of magnitude faster than the Python trie implementation, but maybe I sacrificed a lot to get there.
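For comparison, a runtime-built trie keeps the encoding loop short. The sketch below assumes, as in the Python trie implementation, that encoding greedily takes the longest vocabulary entry matching at the current byte position. `TrieTokenizer` is a hypothetical class, not the proof of concept linked above: it builds the trie at startup from an in-memory vocabulary (token id = vector index, also an assumption) instead of using precomputed headers. Decoding is just concatenating each token's bytes, which is what lets Python drop out of the decode step.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Byte-level trie node; token_id is -1 when no vocabulary entry ends here.
struct TrieNode {
    int token_id = -1;
    std::map<unsigned char, std::unique_ptr<TrieNode>> children;
};

class TrieTokenizer {
public:
    // Hypothetical: vocab[i] holds the raw bytes of token id i.
    explicit TrieTokenizer(const std::vector<std::string>& vocab) : vocab_(vocab) {
        for (std::size_t id = 0; id < vocab.size(); ++id) {
            TrieNode* node = &root_;
            for (unsigned char b : vocab[id]) {
                auto& child = node->children[b];
                if (!child) child = std::make_unique<TrieNode>();
                node = child.get();
            }
            node->token_id = static_cast<int>(id);
        }
    }

    // Greedy longest-prefix match: at each position, emit the longest matching entry.
    std::vector<int> encode(const std::string& text) const {
        std::vector<int> ids;
        std::size_t pos = 0;
        while (pos < text.size()) {
            const TrieNode* node = &root_;
            int best_id = -1;
            std::size_t best_len = 0;
            for (std::size_t i = pos; i < text.size(); ++i) {
                auto it = node->children.find(static_cast<unsigned char>(text[i]));
                if (it == node->children.end()) break;
                node = it->second.get();
                if (node->token_id >= 0) {
                    best_id = node->token_id;
                    best_len = i - pos + 1;
                }
            }
            // A vocabulary containing every single byte would never hit this.
            if (best_id < 0) throw std::invalid_argument("byte not covered by vocabulary");
            ids.push_back(best_id);
            pos += best_len;
        }
        return ids;
    }

    // Decoding concatenates the raw bytes of each token.
    std::string decode(const std::vector<int>& ids) const {
        std::string out;
        for (int id : ids) out += vocab_.at(static_cast<std::size_t>(id));
        return out;
    }

private:
    TrieNode root_;
    std::vector<std::string> vocab_;
};
```

With the toy vocabulary `{"a", "b", "ab", "abc", "c"}`, `encode("abcab")` yields `{3, 2}` (`abc`, then `ab`), and `decode` restores the original bytes.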

Both top-p and top-k token sampling would be dead simple from there, and together with the tokenizer they would make it possible to cut out Python entirely, given already-preprocessed model files.
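Once token ids live on the C/C++ side, top-k and top-p sampling amount to sorting the candidates, applying a softmax, and truncating. This is a minimal sketch, assuming raw logits arrive as a `std::vector<float>` and applying temperature, then top-k, then top-p on the renormalized top-k probabilities. `sample_logits` and its parameter conventions are hypothetical, not the rwkv.cpp API; other implementations order these steps differently, which changes the resulting distribution slightly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Sample a token id from raw logits with temperature, top-k and top-p (nucleus) filtering.
// top_k <= 0 disables top-k; top_p >= 1 disables top-p. Assumes temperature > 0.
int sample_logits(const std::vector<float>& logits, float temperature, int top_k,
                  float top_p, std::mt19937& rng) {
    if (logits.empty()) throw std::invalid_argument("empty logits");

    // Token ids sorted by logit, highest first.
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(), [&](int a, int b) { return logits[a] > logits[b]; });

    // Top-k: keep only the k highest-scoring candidates.
    std::size_t keep = idx.size();
    if (top_k > 0 && static_cast<std::size_t>(top_k) < keep) keep = static_cast<std::size_t>(top_k);

    // Temperature-scaled softmax over the kept candidates (max subtracted for stability).
    std::vector<double> probs(keep);
    const double max_logit = logits[idx[0]];
    double sum = 0.0;
    for (std::size_t i = 0; i < keep; ++i) {
        probs[i] = std::exp((logits[idx[i]] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;

    // Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    if (top_p < 1.0f) {
        double cum = 0.0;
        for (std::size_t i = 0; i < keep; ++i) {
            cum += probs[i];
            if (cum >= top_p) {
                keep = i + 1;
                break;
            }
        }
        probs.resize(keep);
    }

    // discrete_distribution renormalizes the surviving weights itself.
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return idx[static_cast<std::size_t>(dist(rng))];
}
```

Setting `top_k = 1` (or a very small `top_p`) reduces this to greedy decoding, which is a handy sanity check when wiring it up to the model's output logits.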

saharNooby · 2023-09-23T13:37:39Z

Duplicated by "Can we have an example of pure C ++? #112": pure C++ inference implies tokenizing and basic sampling.

saharNooby closed this as not planned on Sep 23, 2023.
