error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'qwen2' #4529

Anorid opened this issue May 20, 2024 · 16 comments
Labels: bug (Something isn't working)

Comments

@Anorid

Anorid commented May 20, 2024

What is the issue?

I carefully followed the README documentation when trying this, but something went wrong:

time=2024-05-20T10:06:02.688+08:00 level=INFO source=server.go:320 msg="starting llama server" cmd="/tmp/ollama2132883000/runners/cuda_v11/ollama_llama_server --model /root/autodl-tmp/models/blobs/sha256-1c751709783923dab2b876d5c5c2ca36d4e205cfef7d88988df45752cb91f245 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 41 --parallel 1 --port 33525"
time=2024-05-20T10:06:02.690+08:00 level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-20T10:06:02.690+08:00 level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-20T10:06:02.691+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="952d03d" tid="140401842012160" timestamp=1716170762
INFO [main] system info | n_threads=64 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140401842012160" timestamp=1716170762 total_threads=128
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="127" port="33525" tid="140401842012160" timestamp=1716170762
llama_model_loader: loaded meta data with 21 key-value pairs and 483 tensors from /root/autodl-tmp/models/blobs/sha256-1c751709783923dab2b876d5c5c2ca36d4e205cfef7d88988df45752cb91f245 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.name str = merge5-1
llama_model_loader: - kv 2: qwen2.block_count u32 = 40
llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
llama_model_loader: - kv 4: qwen2.embedding_length u32 = 5120
llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 13696
llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 40
llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 40
llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 201 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
time=2024-05-20T10:06:02.944+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
llama_load_model_from_file: exception loading model
terminate called after throwing an instance of 'std::runtime_error'
what(): error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
time=2024-05-20T10:06:03.285+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
time=2024-05-20T10:06:03.535+08:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "
[GIN] 2024/05/20 - 10:06:03 | 500 | 2.178464527s | 127.0.0.1 | POST "/api/chat"
time=2024-05-20T10:06:07.831+08:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=41 memory.available="47.3 GiB" memory.required.full="9.7 GiB" memory.required.partial="9.7 GiB" memory.required.kv="1.6 GiB" memory.weights.total="7.2 GiB" memory.weights.repeating="6.6 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="307.0 MiB" memory.graph.partial="916.1 MiB"
time=2024-05-20T10:06:07.832+08:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=41 memory.available="47.3 GiB" memory.required.full="9.7 GiB" memory.required.partial="9.7 GiB" memory.required.kv="1.6 GiB" memory.weights.total="7.2 GiB" memory.weights.repeating="6.6 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="307.0 MiB" memory.graph.partial="916.1 MiB"
time=2024-05-20T10:06:07.832+08:00 level=INFO source=server.go:320 msg="starting llama server" cmd="/tmp/ollama2132883000/runners/cuda_v11/ollama_llama_server --model /root/autodl-tmp/models/blobs/sha256-1c751709783923dab2b876d5c5c2ca36d4e205cfef7d88988df45752cb91f245 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 41 --parallel 1 --port 43339"
time=2024-05-20T10:06:07.833+08:00 level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-05-20T10:06:07.833+08:00 level=INFO source=server.go:504 msg="waiting for llama runner to start responding"
time=2024-05-20T10:06:07.833+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="952d03d" tid="140283378036736" timestamp=1716170767
INFO [main] system info | n_threads=64 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140283378036736" timestamp=1716170767 total_threads=128
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="127" port="43339" tid="140283378036736" timestamp=1716170767
llama_model_loader: loaded meta data with 21 key-value pairs and 483 tensors from /root/autodl-tmp/models/blobs/sha256-1c751709783923dab2b876d5c5c2ca36d4e205cfef7d88988df45752cb91f245 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.name str = merge5-1
llama_model_loader: - kv 2: qwen2.block_count u32 = 40
llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
llama_model_loader: - kv 4: qwen2.embedding_length u32 = 5120
llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 13696
llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 40
llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 40
llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 201 tensors
llama_model_loader: - type q4_0: 281 tensors
llama_model_loader: - type q6_K: 1 tensors
time=2024-05-20T10:06:08.085+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
llama_load_model_from_file: exception loading model
terminate called after throwing an instance of 'std::runtime_error'
what(): error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
time=2024-05-20T10:06:08.437+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server error"
time=2024-05-20T10:06:08.656+08:00 level=WARN source=sched.go:512 msg="gpu VRAM usage didn't recover within timeout" seconds=5.120574757
time=2024-05-20T10:06:08.688+08:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) "

The official Qwen1.5 models from 4B to 72B are already provided, so this tokenizer should be supported as well.

OS

Linux

GPU

Nvidia

CPU

Other

Ollama version

client version is 0.1.38

Anorid added the bug label on May 20, 2024
@Anorid (Author)

Anorid commented May 20, 2024

(Screenshot: the GGUF file and the information for the imported model.)

@liduang

liduang commented May 20, 2024

I have also encountered this problem, and I think the issue is here:
May 20 17:54:48 localhost.localdomain ollama[11885]: llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
I suspect it conflicts with llama.cpp's recent update.

7114

@GitTurboy

I got the same error on Windows:
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from D:\lamaModels\blobs\sha256-6b22d907af67d494c1194b1bd688423945b4d3009bded2e5ecbc88d426b0c5a3 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.name str = Qwen1___5-1___8B-Chat
llama_model_loader: - kv 2: qwen2.block_count u32 = 24
llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
llama_model_loader: - kv 4: qwen2.embedding_length u32 = 2048
llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 5504
llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 16
llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 16
llama_model_loader: - kv 8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 19: tokenizer.chat_template str = {% for message in messages %}{% if lo...
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 121 tensors
llama_model_loader: - type f16: 170 tensors
time=2024-05-20T16:44:58.427+08:00 level=INFO source=server.go:540 msg="waiting for server to become available" status="llm server loading model"
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
llama_load_model_from_file: exception loading model
time=2024-05-20T16:44:58.698+08:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000409 "

@binganao

As a temporary workaround, you can do what I did: when converting the model with convert-hf-to-gguf.py, comment out this line:

(Screenshot of the line to comment out.)

@Treedy2020

The specific cause may be that llama.cpp/convert-hf-to-gguf.py picked up issues during its rapid iteration. I hit the same problem when exporting and quantizing qwen2 with the latest llama.cpp, but GGUF models exported and quantized with an older version of llama.cpp are usable. You can try modifying that file as @binganao did, or simply roll back llama.cpp and try again:

cd llama.cpp 
git reset --hard 46e12c4692a37bdd31a0432fc5153d7d22bc7f72

Check this release for details. Then import and re-quantize the ModelScope/HF folder of qwen2 according to the official ollama documentation. Hopefully this solves your problem.
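
For reference, here is a minimal sketch of the convert / quantize / import flow described above, assuming a local HF or ModelScope checkout of the model; all paths, output names, and the my-qwen model name are placeholders, and the quantize binary name varies across llama.cpp versions:

# Convert the HF/ModelScope folder to GGUF (script name as of this llama.cpp revision)
python convert-hf-to-gguf.py /path/to/Qwen1.5-14B-Chat --outtype f16 --outfile qwen.f16.gguf

# Quantize it (build the llama.cpp tools first with `make`)
./quantize qwen.f16.gguf qwen.q4_0.gguf q4_0

# Import into ollama with a minimal Modelfile
echo "FROM ./qwen.q4_0.gguf" > Modelfile
ollama create my-qwen -f Modelfile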

@xianyuxm

(Quoting @Treedy2020's rollback suggestion above.)

I tried binganao's method, but it didn't work. However, following your suggestion to roll back to a previous version successfully resolved the issue. Thank you!

@bartowski1182

I just tried a Qwen2 model I made recently with llama.cpp ./main and it loaded and generated with no issues. Are we sure this isn't ollama needing an update?
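
A quick way to run that same sanity check is to load the GGUF directly with llama.cpp, independent of ollama (the model path is a placeholder; older builds ship the binary as ./main, newer ones as llama-cli):

./main -m /path/to/model.gguf -p "Hello" -n 16   # if this loads and generates, the GGUF itself is fine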

@tk19911120

I had the same issue when exporting and quantizing qwen1.5-7b-chat (Error: llama runner process has terminated: signal: aborted (core dumped)). I tried Treedy2020's method (sudo git reset --hard 46e12c4692a37bdd31a0432fc5153d7d22bc7f72), which solved the issue.
ollama version is 0.1.37

@pdevine (Contributor)

pdevine commented May 30, 2024

The problem was that llama.cpp changed how the tokenizer worked because of changes w/ llama3 tokenization. This should be fixed in 0.1.39 though, so I'll go ahead and close the issue. @Anorid LMK if it's still persisting and I can reopen.
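
For anyone hitting this on an older install, a minimal sketch of picking up the fix, assuming the standard Linux install script (adjust for your OS or package manager):

# Reinstall/upgrade ollama, then verify the client version
curl -fsSL https://ollama.com/install.sh | sh
ollama --version   # should report 0.1.39 or later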

pdevine closed this as completed May 30, 2024
@markg85

markg85 commented Jun 7, 2024

Could this be re-opened?
I have the very same issue too.

Jun 07 02:14:13 newphobos ollama[4528]: {"function":"server_params_parse","level":"INFO","line":2604,"msg":"logging to file is disabled.","tid":"129450009160768","timestamp":1717719253}
Jun 07 02:14:13 newphobos ollama[4528]: {"build":1,"commit":"952d03d","function":"main","level":"INFO","line":2821,"msg":"build info","tid":"129450009160768","timestamp":1717719253}
Jun 07 02:14:13 newphobos ollama[4528]: {"function":"main","level":"INFO","line":2828,"msg":"system info","n_threads":16,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"129450009160768","timestamp":1717719253,"total_threads":32}
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: loaded meta data with 21 key-value pairs and 339 tensors from /var/lib/ollama/.ollama/models/blobs/sha256-43f7a214e5329f672bb05404cfba1913cbb70fdaa1a17497224e1925046b0ed5 (version GGUF V3 (latest))
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   0:                       general.architecture str              = qwen2
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   1:                               general.name str              = Qwen2-7B-Instruct
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   2:                          qwen2.block_count u32              = 28
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   3:                       qwen2.context_length u32              = 32768
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   4:                     qwen2.embedding_length u32              = 3584
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   5:                  qwen2.feed_forward_length u32              = 18944
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   6:                 qwen2.attention.head_count u32              = 28
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   7:              qwen2.attention.head_count_kv u32              = 4
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   8:                       qwen2.rope.freq_base f32              = 1000000.000000
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv   9:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  10:                          general.file_type u32              = 2
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = gpt2
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  12:                         tokenizer.ggml.pre str              = qwen2
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 151645
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  17:            tokenizer.ggml.padding_token_id u32              = 151643
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 151643
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  19:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - kv  20:               general.quantization_version u32              = 2
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - type  f32:  141 tensors
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - type q4_0:  197 tensors
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_loader: - type q6_K:    1 tensors
Jun 07 02:14:13 newphobos ollama[4379]: llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'
Jun 07 02:14:13 newphobos ollama[4379]: llama_load_model_from_file: exception loading model
Jun 07 02:14:13 newphobos ollama[4379]: terminate called after throwing an instance of 'std::runtime_error'
Jun 07 02:14:13 newphobos ollama[4379]:   what():  error loading model vocabulary: unknown pre-tokenizer type: 'qwen2'

Now there's something strange going on too.

❯ ollama --version
ollama version is 0.1.34

While I have 0.1.41 installed (Arch Linux):

❯ pacman -Qi ollama
Name            : ollama-rocm
Version         : 0.1.41-1
Description     : Create, run and share large language models (LLMs) with ROCm
Architecture    : x86_64
URL             : https://github.com/ollama/ollama
Licenses        : MIT
Groups          : None
Provides        : ollama
Depends On      : hipblas
Optional Deps   : None
Required By     : None
Optional For    : None
Conflicts With  : ollama
Replaces        : None
Installed Size  : 66.50 MiB
Packager        : Lukas Fleischer <lfleischer@archlinux.org>
Build Date      : Sun 02 Jun 2024 17:51:45 CEST
Install Date    : Fri 07 Jun 2024 02:22:08 CEST
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

So upon further inspection, this is how it's built:
https://gitlab.archlinux.org/archlinux/packaging/packages/ollama/-/blob/main/PKGBUILD?ref_type=heads

Which builds tag 476fb8e, which is the 0.1.41 tag: https://github.com/ollama/ollama/releases/tag/v0.1.41

The llama.cpp version is tag ggerganov/llama.cpp@5921b8f, which is just a week old.

Am I missing something here to get qwen2 working?
The version thing is definitely weird, but that might be its own bug?

pdevine reopened this Jun 7, 2024
@cyp0633

cyp0633 commented Jun 7, 2024

Now there's something strange going on too.

❯ ollama --version
ollama version is 0.1.34

Did you reboot your machine or do sudo systemctl restart ollama after upgrading? The running ollama service is not automatically upgraded.
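
A short sketch of that check on a systemd-based install (journalctl is just one way to see what the running server reports):

sudo systemctl restart ollama
ollama --version
journalctl -u ollama -n 20 --no-pager   # the server typically logs its version when it starts listening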

@markg85

markg85 commented Jun 7, 2024

@cyp0633 yes! :)

I did both (and a couple of times); it didn't help.
Let's not spend too much time on the version thing, but let's check one thing.

Could someone else run ollama --version on a 0.1.41 release and post your result here? If anyone else has this bug too (wrong version number for the release you're using), I'll open a new issue for it. If it can't be reproduced and the command matches your install, then something is seriously wrong with my setup and I'll have to dig deep to figure it out.

@I321065

I321065 commented Jun 7, 2024

The same issue happened to me.

@markg85

markg85 commented Jun 7, 2024

The issue can be closed again.
I had installed ollama using the script on the ollama site, and I also had it installed through my package manager.

Removing the one installed through the script made things work; the version is as expected now.
100% user error, sorry for the noise!
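
For anyone else who ends up with two installs, a quick way to spot the duplicate (paths below are the usual defaults, not guarantees):

which -a ollama                  # lists every ollama binary on PATH, in resolution order
pacman -Qo /usr/bin/ollama       # on Arch: shows which package owns the packaged binary
ls -l /usr/local/bin/ollama      # the install script usually puts its copy here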

@rallg0535

Updating ollama to version 0.1.42 fixed it.

@Fau57

Fau57 commented Jun 10, 2024

I was using LM Studio and just had to update, by the way.
