sync : ggml #2991

ggerganov · 2025-04-02T12:14:28Z

No description provided.

* Vulkan: Add DP4A MMQ and Q8_1 quantization shader
* Add q4_0 x q8_1 matrix matrix multiplication support
* Vulkan: Add int8 coopmat MMQ support
* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code
* Add GL_EXT_integer_dot_product check
* Remove ggml changes, fix mmq pipeline picker
* Remove ggml changes, restore Intel coopmat behaviour
* Fix glsl compile attempt when integer vec dot is not supported
* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq
* Remove redundant comment
* Fix integer dot check
* Fix compile issue with unsupported int dot glslc
* Update Windows build Vulkan SDK version
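For readers unfamiliar with the terms in the list above, here is a minimal sketch of the idea behind a DP4A-style integer dot product applied to 8-bit quantized blocks. It is not the Vulkan shader from this change: the function names (`pack_int8x4`, `dp4a`, `quantize_block_q8`, `block_dot`) and the block layout (one float scale plus a list of int8 values) are assumptions made for illustration, and the real Q8_1 format may carry additional per-block fields that are omitted here.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def dp4a(a_packed, b_packed, acc):
    """Multiply four packed int8 lanes pairwise and add the sum to an integer accumulator."""
    a = struct.unpack("<4b", struct.pack("<I", a_packed))
    b = struct.unpack("<4b", struct.pack("<I", b_packed))
    return acc + sum(x * y for x, y in zip(a, b))


def quantize_block_q8(values):
    """Quantize a block of floats to int8 with one shared scale (hypothetical layout)."""
    amax = max(abs(v) for v in values)
    d = amax / 127.0 if amax > 0 else 1.0
    qs = [max(-127, min(127, round(v / d))) for v in values]
    return d, qs


def block_dot(d_a, qs_a, d_b, qs_b):
    """Dot product of two quantized blocks: integer accumulation, one float rescale at the end."""
    acc = 0
    for i in range(0, len(qs_a), 4):
        acc = dp4a(pack_int8x4(qs_a[i:i + 4]), pack_int8x4(qs_b[i:i + 4]), acc)
    return d_a * d_b * acc


if __name__ == "__main__":
    a = [(i - 16) / 16 for i in range(32)]
    b = [((i * 7) % 32 - 16) / 16 for i in range(32)]
    exact = sum(x * y for x, y in zip(a, b))
    approx = block_dot(*quantize_block_q8(a), *quantize_block_q8(b))
    print(exact, approx)
```

The point of the sketch is that the inner loop works only on integers, four lanes at a time, and the floating-point scales are applied once per block pair. Block lengths are assumed to be a multiple of four.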

* faster ssm_scan
* delete unused comment
* clang format
* add space
* modify unnecessary calculations
* faster ssm conv implementation
* modify file name with dash

* Rename oneMKL Interface to oneMath
* Use oneMath for Intel vendor
* Rename occurrences to mkl
* clang-format
* Silence verbose warnings
* Set oneMath HIP_TARGETS
* Fix silence warnings
* Remove step to build oneMath from build instructions
* Use fixed oneMath version
* Remove INTEL_CPU
* Fold CMake oneDNN conditions
* Use Intel oneMKL for Intel devices
* Improve CMake message
* Link against MKL::MKL_SYCL::BLAS only
* Move oneMath documentation to Nvidia and AMD sections

* Fix clang warning in gguf_check_reserved_keys
* Fix typo

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* metal : use F32 prec in FA kernels
  ggml-ci
* cont : fix FA vec kernel
  ggml-ci

issue: CodeLinaro/llama.cpp#17 (comment)

This patch fixes the memory allocation size so that it does not exceed the maximum size supported by the OpenCL device.

* [CANN] get_rows and dup optimization.
* [CANN] GET_ROWS and CPY/DUP optimization
* [CANN] code style adjustment
* [CANN] code style adjustment
* [CANN] code style adjustment
* [CANN] code style adjustment

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>

ggml-ci

ggerganov and others added 11 commits April 2, 2025 15:13

cmake : fix whitespace (llama/0)

fc0b844

ggml : faster ssm scan (llama/10558)

d9b53a0

* faster ssm_scan
* delete unused comment
* clang format
* add space
* modify unnecessary calculations
* faster ssm conv implementation
* modify file name with dash

SYCL: switch to SYCL namespace (llama/12674)

5014a78

vulkan: fix build when glslc doesn't support coopmat (llama/12683)

690fa65

metal : use F32 prec in FA kernels (llama/12688)

ee44a0e

* metal : use F32 prec in FA kernels
  ggml-ci
* cont : fix FA vec kernel
  ggml-ci

opencl : fix memory allocation size (llama/12649)

e8c0a27

issue: CodeLinaro/llama.cpp#17 (comment)

This patch fixes the memory allocation size so that it does not exceed the maximum size supported by the OpenCL device.

sync : ggml


7ac68eb

ggml-ci

danbev approved these changes Apr 2, 2025


ggerganov merged commit ad4e350 into master Apr 2, 2025
60 checks passed

ggerganov deleted the sync-ggml-25-04-02 branch April 2, 2025 12:52
