
generate_completions feedback #11

Open
bennmann opened this issue Apr 3, 2023 · 1 comment

Comments


bennmann commented Apr 3, 2023

generate_completions seems to be very bad at narration of any meaningful length (past about 200 words), often hallucinating or repeating passages (tested with the 14B Raven instruct 6 model, FP16 quantized to 4-bit, ~8 GB).

Also, general usage could easily be improved with a while loop.

It would be good to implement a simple while loop so the model does not leave RAM when someone wants to generate more, and to add some of the repetition penalty logic (GEN_alpha_presence = 0.2 # Presence Penalty, GEN_alpha_frequency = 0.2 # Frequency Penalty) that BlinkDL uses in ChatRWKV.
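The penalty logic mentioned above could be sketched roughly like this. This is only an illustration of how ChatRWKV-style presence/frequency penalties work, not the actual ChatRWKV implementation; the function name and the `token_counts` bookkeeping are assumptions, while `GEN_alpha_presence` and `GEN_alpha_frequency` are the names from ChatRWKV.

```python
# Sketch of presence/frequency repetition penalties, assuming we track
# how many times each token id has appeared in the output so far.
GEN_alpha_presence = 0.2   # Presence Penalty
GEN_alpha_frequency = 0.2  # Frequency Penalty

def apply_repetition_penalty(logits, token_counts):
    """Lower the logits of tokens that were already generated.

    logits: mutable sequence of per-token logits (modified in place).
    token_counts: dict mapping token id -> occurrences so far.
    """
    for token_id, count in token_counts.items():
        # Flat penalty for having appeared at all, plus a penalty
        # that grows with how often the token has appeared.
        logits[token_id] -= GEN_alpha_presence + count * GEN_alpha_frequency
    return logits
```

Sampling would then proceed from the adjusted logits, and `token_counts` would be updated with each newly sampled token.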

It's pretty fast on AVX2! This is an awesome repo, and thank you for your work.

@saharNooby (Collaborator)

Do you use Q4_X (quantized weights)? If so, please test FP16 for quality, and maybe check out #12.

> the model does not leave RAM if someone wants to generate more

We have an example of interactive mode in chat_with_bot.py; it could probably be extended.

For other improvements, PRs are very welcome :)
