
Support QBits kernel for CPU device #660

Open
wants to merge 1 commit into base: main

Conversation

PenghuiCheng

Based on the suggestion in #597, we have implemented inference of GPTQ models on CPU devices. This PR adds support for Weight-Only quantization on CPU devices and inference with the QBits backend. The QBits backend provides a 'bestla' kernel for the CPU GEMM op, and QBits is a module of the intel-extension-for-transformers package.
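For readers unfamiliar with Weight-Only quantization: only the weights are stored in low precision (e.g. INT4) with a float scale per group, while activations stay in floating point and weights are dequantized on the fly for the GEMM. The sketch below is purely illustrative pure Python showing that idea; it is not the bestla/QBits implementation:

```python
# Illustrative sketch of symmetric per-group INT4 weight-only quantization.
# This is NOT the bestla/QBits kernel -- just the underlying idea: weights
# become small signed integers plus one float scale per group, and are
# dequantized back to float before (or inside) the GEMM.

def quantize_group(weights, bits=4):
    """Quantize one group of float weights to signed ints plus a scale."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for INT4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from ints and the group scale."""
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.05]
q, s = quantize_group(w)
w_hat = dequantize_group(q, s)
# Each recovered weight is within half a quantization step (s ~= 0.1) of
# the original, which is what makes 4-bit storage usable for inference.
```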

Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
@Qubitium
Contributor

Qubitium commented Apr 29, 2024

@PenghuiCheng Awesome. Does QBits support any x86_64 CPU, including AMD and non-validated Intel parts? Also, is this limited to only some recent-generation chips? I only see two classes of Intel CPU validated. Obviously Intel may not validate non-Intel chips such as AMD, but have the instructions also been tested on AMD?

Intel Xeon Scalable Processors ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)
Intel Xeon CPU Max Series ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)

@PenghuiCheng
Copy link
Author

@PenghuiCheng Awesome. Does QBits support any x86_64 CPU, including AMD and non-validated Intel parts? Also, is this limited to only some recent-generation chips? I only see two classes of Intel CPU validated. Obviously Intel may not validate non-Intel chips such as AMD, but have the instructions also been tested on AMD?

Intel Xeon Scalable Processors ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)
Intel Xeon CPU Max Series ✔ (INT8, FP8) ✔ (INT4, FP4, NF4)

For non-Intel devices, we believe the code will work well as long as the CPU's instructions are compliant with Intel CPU ISAs.
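One cheap way to sanity-check that ISA compliance on a given x86_64 machine is to inspect the CPU feature flags the kernel reports. The sketch below is a Linux-only illustration; the tier groupings are my assumptions about which feature families typically matter for CPU GEMM kernels (AVX2 as a baseline, AVX512 and AMX for faster paths), not an official QBits support matrix:

```python
# Hedged sketch: inspect /proc/cpuinfo (Linux) for the x86_64 ISA features
# that CPU int-GEMM kernels commonly dispatch on. The tier->flags mapping
# below is an illustrative assumption, not the actual QBits dispatch table.

def cpu_flags():
    """Return the set of CPU feature flags reported by the kernel (Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()  # non-Linux or unreadable: report nothing rather than fail

def qbits_isa_support(flags=None):
    """Report which (assumed) kernel tiers this CPU's flags satisfy."""
    flags = cpu_flags() if flags is None else flags
    tiers = {
        "avx2": {"avx2"},
        "avx512": {"avx512f", "avx512bw", "avx512vl"},
        "amx": {"amx_tile", "amx_int8"},
    }
    return {tier: required <= flags for tier, required in tiers.items()}
```

An AMD Zen 4 part, for example, advertises the AVX512 flags above even though it is not in Intel's validated list, which is exactly the "compliant ISA" case described here.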

@PenghuiCheng
Author

@Qubitium Hi, we are not sure if we have done everything appropriately, but we look forward to your review. Please let us know if there is anything we can do to improve it 😄

@Qubitium
Contributor

@PenghuiCheng Thank you for your patience. I do not have the authority to approve this PR. Both @PanQiWei and @fxmarty are extremely busy at the moment, so I am not sure about the timeline.

There is discussion in the issues about completely replacing the stagnant qigen code with QBits code. It would be of great benefit to the GPTQ community to have Intel come on board to actively support x86_64 CPU kernels.

Can you test the state of the qigen code in AutoGPTQ main? I have a feeling it is in a semi-inoperable state. If qigen is not fully functional, my suggestion is to modify the PR and have QBits completely take over all CPU paths (remove and replace all qigen code).
