Cast to fp32 if using bf16 weights on cpu during merge_and_unload
#1978
Conversation
@BenjaminBossan Our team found this bug through LoRA runs that were hanging for a very long time on certain CPU types and addressed it with this simple fix. Would be great if you could take a look. Thanks!
Thanks for working on this fix. I think this should be okay to merge, but I would like to check if I can replicate the slowness. Do you have an example that I could check?
It's dependent on the CPU, but instantiating any LoraModel, converting it to fp16 (should be fast) and bf16 (should be slow), and calling `merge_and_unload()` should reproduce it.
Actually @BenjaminBossan, here's a script which worked for me. Running it locally on my Mac (current peft v0.12.0), bf16 clearly takes way longer than fp16 due to the lack of fast bf16 matmul support on many CPUs. Script:
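A minimal sketch of what such a reproduction script could look like (the original script and its timings are not preserved above; the base model, LoRA config, and comments here are illustrative assumptions, not the author's exact code):

```python
import time

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model


def time_merge(dtype):
    # Hypothetical small base model; any model wrapped with LoRA adapters works.
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=dtype)
    model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))
    start = time.perf_counter()
    model.merge_and_unload()  # merging happens on CPU here
    return time.perf_counter() - start


print(f"fp16 merge took {time_merge(torch.float16):.2f}s")   # fast on most CPUs
print(f"bf16 merge took {time_merge(torch.bfloat16):.2f}s")  # can be far slower without native bf16 matmul
```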
Thanks for providing the example. On one machine, I could not see any slowdown with bf16; on the other there was a factor of 100. Regarding the comment above the code: it is not quite correct anymore, right? Originally, we had to cast to fp32 because there was an error with fp16 on CPU (which I think is fixed in newer PyTorch versions). Would it make sense to add a comment about accelerating the operation on some CPUs?
Sure, let me change that. |
@BenjaminBossan updated! |
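For readers following along, the kind of change being discussed looks roughly like this (a sketch with simplified merge logic and made-up function and tensor names, not the actual diff in this PR):

```python
import torch


def merge_lora_delta(base_weight, lora_B, lora_A, scaling):
    # Sketch: bf16 matmuls can be very slow on CPUs without native bf16
    # support, so upcast to fp32 for the merge and cast back afterwards.
    orig_dtype = base_weight.dtype
    needs_cast = base_weight.device.type == "cpu" and orig_dtype == torch.bfloat16
    if needs_cast:
        base_weight, lora_B, lora_A = base_weight.float(), lora_B.float(), lora_A.float()
    merged = base_weight + (lora_B @ lora_A) * scaling
    return merged.to(orig_dtype) if needs_cast else merged
```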
Thanks for this improvement, LGTM.
@BenjaminBossan When will the next release be?
Sorry, no release soon, as we had a release just last week. You could install from source in the meantime.
Should address #1977