
mpt30b not showing any response. #5

Open · heisenbergwasuncertain opened this issue Jun 28, 2023 · 8 comments

Comments

@heisenbergwasuncertain

During inference, after the user input, the model waits for a few seconds but doesn't respond with anything; it just returns empty. I'm using it on a Dell OptiPlex 7070 Micro with an Intel Core i7-9700T (8 cores) and 32 GB RAM.

@rodrigofarias-MECH commented Jun 28, 2023

I'm having the same problem. Processing goes to 100% for a few seconds but returns empty answers, with RAM usage around 24 GB.
I tested in VS Code and in cmd; same behaviour.
I've tried to debug, but the generator variable had no string text inside it at all.

I'm running the mpt-30b-chat.ggmlv0.q5_1.bin model instead of the default q4_0.

PC: Ryzen 5900X and 32 GB RAM.

@abacaj (Owner) commented Jun 28, 2023

For cases like this I recommend Docker because of environment issues. I have Windows as well; here's how I run it.

Use a container like so:

docker run -it -w /transformers --mount type=volume,source=transformers,target=/transformers python:3.11.4 /bin/bash
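
(A note on the flags, in case they're unfamiliar: -it gives you an interactive shell, -w /transformers sets the working directory, and the named volume keeps the cloned repo and downloaded weights around between container runs.)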

Clone the repo:

git clone git@github.com:abacaj/mpt-30B-inference.git
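
(The git@ form needs SSH keys available inside the container; if you don't have them set up there, the HTTPS form should work the same: git clone https://github.com/abacaj/mpt-30B-inference.git)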

Follow the directions in the readme for the rest: https://github.com/abacaj/mpt-30B-inference#setup.
I just ran through this process once again now, and it works; I can get the model to generate correctly on my Ryzen/Windows machine:

[screenshot of the model generating a response]

@rodrigofarias-MECH commented Jun 28, 2023

> For cases like this I recommend Docker because of environment issues. [...]

Thank you.
I've created a conda env, installed the requirements, and manually downloaded two models (q5_1 and q4_1). Any hint on why the responses come back empty? I'd really prefer not to use a container.

Great work, by the way!

@abacaj (Owner) commented Jun 28, 2023

> I've created a conda env, installed the requirements, and manually downloaded two models (q5_1 and q4_1). Any hint on why the responses come back empty? [...]

Likely has to do with the ctransformers library, since that is how the bindings work from Python -> ggml (though I'm not certain of it).
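
One way to narrow that down is to call ctransformers directly, outside the repo's chat loop. A minimal sketch, assuming a local GGML MPT file (the path below is an example; adjust it to whichever quantization you downloaded):

# minimal ctransformers sanity check, independent of inference.py
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/mpt-30b-chat.ggmlv0.q4_0.bin",  # example path, adjust to your file
    model_type="mpt",
)

# if this prints nothing, the problem is in the bindings / model load,
# not in the repo's chat script
print(llm("What is the capital of France?", max_new_tokens=32))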

@mzubairumt

I have observed that when processing user queries, the CPU usage increases but I do not receive a response:

[user]: What is the capital of France?
[assistant]:
[user]:

@mzubairumt

> For cases like this I recommend Docker because of environment issues. [...]

python3 inference.py
Fetching 1 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3584.88it/s]
GGML_ASSERT: /home/runner/work/ctransformers/ctransformers/models/ggml/ggml.c:4103: ctx->mem_buffer != NULL
Aborted
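
(For what it's worth, that GGML_ASSERT fires when ggml can't allocate its context buffer, i.e. the underlying malloc returned NULL. That usually means the process ran out of memory; the 30B q5_1 weights alone are on the order of 20 GB before any scratch buffers, so a 32 GB machine is already close to the limit.)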

@mzubairumt

Issue fixed. Replace the files with the ones from https://github.com/mzubair31102/llama2.git

@renanfferreira

I'm also facing this issue on Windows.
However, the main problem is that when I run this in a container, it produces very slow responses.
