
Support QBits kernel for CPU device #660

Open
wants to merge 1 commit into base: main

Conversation

PenghuiCheng

Based on the suggestion in #597, we have implemented inference of GPTQ models on CPU devices. This PR adds support for Weight-Only quantization on CPU devices and inference with the QBits backend. The QBits backend provides a 'bestla' kernel for the CPU GEMM op, and QBits is a module of the intel-extension-for-transformers package.
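For readers unfamiliar with Weight-Only quantization: only the weights are stored in low precision (e.g. INT4) with a float scale per group, while activations stay in floating point and weights are dequantized on the fly for the GEMM. The sketch below is purely illustrative pure Python showing that idea; it is not the bestla/QBits implementation:

```python
# Illustrative sketch of symmetric per-group INT4 weight-only quantization.
# This is NOT the bestla/QBits kernel -- just the underlying idea: weights
# become small signed integers plus one float scale per group, and are
# dequantized back to float before (or inside) the GEMM.

def quantize_group(weights, bits=4):
    """Quantize one group of float weights to signed ints plus a scale."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for INT4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from ints and the group scale."""
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.05]
q, s = quantize_group(w)
w_hat = dequantize_group(q, s)
# Each recovered weight is within half a quantization step (s ~= 0.1) of
# the original, which is what makes 4-bit storage usable for inference.
```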

Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
@Qubitium
Contributor

Qubitium commented Apr 29, 2024

@PenghuiCheng Awesome. Does QBits support any x86_64 CPU, including AMD and non-validated Intel parts? Also, is this limited to only some recent-generation chips? I only see two classes of Intel CPU validated. Obviously Intel may not validate non-Intel chips such as AMD, but have the instructions also been tested on AMD?

Intel Xeon Scalable Processors ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)
Intel Xeon CPU Max Series ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)

@PenghuiCheng
Copy link
Author

@PenghuiCheng Awesome. Does QBits support any x86_64 CPU, including AMD and non-validated Intel parts? Also, is this limited to only some recent-generation chips? I only see two classes of Intel CPU validated. Obviously Intel may not validate non-Intel chips such as AMD, but have the instructions also been tested on AMD?

Intel Xeon Scalable Processors ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)
Intel Xeon CPU Max Series ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)

For non-Intel devices, we believe the code will work well as long as the CPU's instructions are compliant with Intel CPU ISAs.
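One cheap way to sanity-check that ISA compliance on a given x86_64 machine is to inspect the CPU feature flags the kernel reports. The sketch below is a Linux-only illustration; the tier groupings are my assumptions about which feature families typically matter for CPU GEMM kernels (AVX2 as a baseline, AVX512 and AMX for faster paths), not an official QBits support matrix:

```python
# Hedged sketch: inspect /proc/cpuinfo (Linux) for the x86_64 ISA features
# that CPU int-GEMM kernels commonly dispatch on. The tier->flags mapping
# below is an illustrative assumption, not the actual QBits dispatch table.

def cpu_flags():
    """Return the set of CPU feature flags reported by the kernel (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()  # non-Linux or unreadable: report nothing rather than fail

def qbits_isa_support(flags=None):
    """Report which (assumed) kernel tiers this CPU's flags satisfy."""
    flags = cpu_flags() if flags is None else flags
    tiers = {
        "avx2": {"avx2"},
        "avx512": {"avx512f", "avx512bw", "avx512vl"},
        "amx": {"amx_tile", "amx_int8"},
    }
    return {tier: required <= flags for tier, required in tiers.items()}
```

An AMD Zen 4 part, for example, advertises the AVX512 flags above even though it is not in Intel's validated list, which is exactly the "compliant ISA" case described here.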

@PenghuiCheng
Author

@Qubitium Hi, we are not sure if we have done everything appropriately, but we look forward to your review. Please let us know if there is anything we can do to improve it 😄

@Qubitium
Contributor

@PenghuiCheng Thank you for your patience. I do not have the authority to approve this PR. Both @PanQiWei and @fxmarty are extremely busy at the moment, so I am not sure about the timeline.

There is discussion in the issues about completely replacing the stagnant qigen code with QBits code. It would be of great benefit to the GPTQ community to have Intel come on board to actively support x86_64 CPU kernels.

Can you test the state of the qigen code in AutoGPTQ main? I have a feeling it is in a semi-inoperable state. If qigen is not fully functional, my suggestion is to modify the PR and have QBits completely take over all CPU paths (remove and replace all qigen code).
