This module is not ready for CJK characters #16

mashihua · 2023-06-08T07:15:04Z

We found that this module is not ready for CJK characters. For example, when we type the following text: ここに内容を入力すると、消費されるメダルの数が計算されます。

OpenAI shows:

This module shows:

The token count differs from OpenAI's.

xnohat · 2023-06-08T20:46:32Z

Above, you use the GPT-3 encoder; below, you use the cl100k_base encoder for GPT-3.5 and GPT-4.
These are two different token encoders, so they produce two different sets of tokens.
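
To illustrate why two encoders can disagree on the same CJK text, here is a minimal sketch of byte-level BPE in Python. The merge tables `MERGES_SMALL` and `MERGES_RICH` and the helper `bpe_encode` are hypothetical and made up for this example; they are not the real GPT-3 or cl100k_base vocabularies, and this is not how this module is implemented. The only point is that each kana is 3 bytes in UTF-8, so the token count depends on which byte merges a vocabulary happens to contain.

```python
def bpe_encode(text, merges):
    """Toy byte-level BPE: start from single UTF-8 bytes, apply merges in order."""
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    for left, right in merges:
        merged = []
        i = 0
        while i < len(tokens):
            # Join an adjacent (left, right) pair into one token.
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


# Hypothetical merge tables (assumptions, not real vocabularies).
# The small one only knows the shared 2-byte hiragana prefix E3 81.
MERGES_SMALL = [(b"\xe3", b"\x81")]
# The rich one also has full 3-byte merges for the kana "こ" and "に".
MERGES_RICH = [
    (b"\xe3", b"\x81"),
    (b"\xe3\x81", b"\x93"),  # "こ" = E3 81 93
    (b"\xe3\x81", b"\xab"),  # "に" = E3 81 AB
]

text = "ここに"
small_tokens = bpe_encode(text, MERGES_SMALL)
rich_tokens = bpe_encode(text, MERGES_RICH)

print(len(text.encode("utf-8")), "bytes")
print(len(small_tokens), "tokens with the small table")
print(len(rich_tokens), "tokens with the rich table")
```

Both encodings decode back to the same bytes; only the number of tokens differs. That is why token counts should only be compared when both tools use the same encoding.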

foloinfo · 2023-07-17T23:43:46Z

I checked the output for the same string with p50k_base, and it seems to give the same result as the OpenAI Tokenizer.
I also tested a longer string (800 characters), and the number of tokens was the same.
I think it works fine with CJK.
