
kv_override issue with string values #1487

Closed
Labels: bug (Something isn't working)

@Erhan1706 opened this issue May 26, 2024 · 4 comments
Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Expected llama-cpp-python to correctly override the model metadata when passing {"tokenizer.ggml.pre": "llama3"} via kv_overrides.

Current Behavior

The string value for the override always appears to be empty when running the model, as the log line validate_override: Using metadata override (  str) 'tokenizer.ggml.pre' =  indicates, so the model ends up using the default pre-tokenizer instead of the llama3 one.
Example output:

llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from ./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = .
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                          llama.block_count u32              = 32
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 15
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,128256]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
validate_override: Using metadata override (  str) 'tokenizer.ggml.pre' = 
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:                                             
llm_load_vocab: ************************************        
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!        
llm_load_vocab: CONSIDER REGENERATING THE MODEL             
llm_load_vocab: ************************************   
...

Environment and Context

  • WSL with Ubuntu 20.04

$ lscpu

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      48 bits physical, 48 bits virtual
CPU(s):                             16
On-line CPU(s) list:                0-15
Thread(s) per core:                 2
Core(s) per socket:                 8
Socket(s):                          1
Vendor ID:                          AuthenticAMD
CPU family:                         25
Model:                              80
Model name:                         AMD Ryzen 9 5900HX with Radeon Graphics
Stepping:                           0
CPU MHz:                            3293.809
BogoMIPS:                           6587.61
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          256 KiB
L1i cache:                          256 KiB
L2 cache:                           4 MiB
L3 cache:                           16 MiB

$ uname -a

Linux LAPTOP 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.8.10

$ make --version
GNU Make 4.2.1
Built for x86_64-pc-linux-gnu

$ g++ --version
g++ (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0

Failure Information (for bugs)

Steps to Reproduce

I'm running the following code:

from llama_cpp import Llama

my_model_path = "./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
CONTEXT_SIZE = 8000
model = Llama(model_path=my_model_path, kv_overrides={"tokenizer.ggml.pre": "llama3"}, n_ctx=CONTEXT_SIZE)

Findings

I stepped through the code with a debugger, and the problem seems to be in the following lines:

ctypes.memmove(
  self._kv_overrides_array[i].value.str_value,
  v_bytes,
  min(len(v_bytes), 128),
)

For some reason, memmove is not actually copying the string into the override struct.
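One plausible explanation (a minimal, self-contained sketch; the simplified Value struct below is hypothetical and only mimics the str_value field of llama.cpp's llama_model_kv_override): in ctypes, reading a c_char array field of a structure returns a detached bytes copy rather than a view into the structure's memory, so memmove ends up writing into a temporary object. Writing through the field's actual address behaves as expected:

import ctypes

# Hypothetical stand-in for the override struct: only the 128-byte
# string field matters for this demonstration.
class Value(ctypes.Structure):
    _fields_ = [("str_value", ctypes.c_char * 128)]

v = Value()
v_bytes = b"llama3"

# Pitfall: reading a c_char array field yields a detached bytes copy,
# so memmove(v.str_value, ...) would target a temporary object and
# leave the structure's own buffer zeroed.
print(type(v.str_value))  # <class 'bytes'>

# Writing through the field's real address inside the struct works:
offset = type(v).str_value.offset
ctypes.memmove(ctypes.addressof(v) + offset, v_bytes,
               min(len(v_bytes), 127))  # keep a trailing NUL byte
print(v.str_value)  # b'llama3'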

@abetlen added the bug label on May 26, 2024
@abetlen (Owner) commented May 29, 2024

@Erhan1706 thanks for reporting, I pushed a fix and it should be in the next release.

@Spider-netizen commented

Hey @abetlen,

llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************

I'm also getting this warning when running llama 3-based models. I don't pass any kv_overrides. Is this issue related to the library, or am I missing something?

Thanks for the great work.

@dgengler6 commented May 29, 2024

I had the same error when running llama-cpp-python with the PrunaAI/Meta-Llama-Guard-2-8B-GGUF-smashed quantized model.

From what I understood, the issue is caused by a bug in the upstream llama.cpp library that affected GGUF file generation, so the tokenizer.ggml.pre metadata was wrongly formatted. (source)

@Spider-netizen I think that passing kv_overrides will actually solve your issue (once the fix is released, or if you change the source code yourself).
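A minimal sketch of that workaround (the model path below is illustrative, not an exact filename):

from llama_cpp import Llama

# Force the llama3 pre-tokenizer when the GGUF metadata lacks a
# usable tokenizer.ggml.pre entry.
model = Llama(
    model_path="./Meta-Llama-Guard-2-8B.Q4_K_M.gguf",
    kv_overrides={"tokenizer.ggml.pre": "llama3"},
)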

@Spider-netizen commented

Thanks @dgengler6. I'll give it a try. Appreciate it.
