
[Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU) #12359

Merged

merged 1 commit into vllm-project:main on Feb 5, 2025

Conversation

@scsudhak-intel (Contributor) commented on Jan 23, 2025

FusedSDPA is a Gaudi-specific implementation of flash attention: https://docs.habana.ai/en/v1.19.1/PyTorch/Model_Optimization_PyTorch/Optimization_in_PyTorch_Models.html

This PR enables support for FusedSDPA on Intel Gaudi accelerators.
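
For readers unfamiliar with the kernel, a minimal usage sketch follows. The import path and `FusedSDPA.apply` calling convention are taken from the Habana documentation linked above and may vary across SynapseAI releases; the tensor shapes are purely illustrative and assume an HPU-enabled PyTorch install.

```python
import torch

# FusedSDPA ships with Habana's PyTorch bridge; this import path follows the
# Habana docs and may differ between SynapseAI releases.
from habana_frameworks.torch.hpex.kernels import FusedSDPA

# Illustrative shapes: (batch, heads, seq_len, head_dim) on the HPU device.
q = torch.randn(1, 8, 128, 64, device="hpu", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="hpu", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="hpu", dtype=torch.bfloat16)

# Analogue of torch.nn.functional.scaled_dot_product_attention:
# FusedSDPA.apply(query, key, value, attn_mask, dropout_p, is_causal)
out = FusedSDPA.apply(q, k, v, None, 0.0, True)
```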


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@scsudhak-intel marked this pull request as ready for review on January 28, 2025 at 09:36
Comment on lines 134 to 136
```python
HPUFusedSDPA = FusedSDPA
self.fused_scaled_dot_product_attention = None if HPUFusedSDPA is None \
    else ModuleFusedSDPA(HPUFusedSDPA)
```
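
The `None` check above exists because the kernel import is expected to be guarded; a minimal sketch of such a guard is shown below (the exact import handling in the vLLM HPU backend may be wrapped differently):

```python
# Guarded import: FusedSDPA is only importable when Habana's PyTorch bridge
# is installed, so fall back to None on non-Gaudi environments (sketch).
try:
    from habana_frameworks.torch.hpex.kernels import FusedSDPA
except ImportError:
    FusedSDPA = None
```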
Contributor

This does not consider VLLM_PROMPT_USE_FUSEDSDPA, defined in hpu_model_runner.py:295. Doesn't this break the default scenario, considering that the non-FusedSDPA case requires different attention-bias handling than the default implementation?
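
For context, a sketch of what folding that flag into the construction might look like, continuing the quoted snippet's class context (hypothetical wiring; the actual flag handling in hpu_model_runner.py may differ):

```python
import os

# Hypothetical sketch: build the fused wrapper only when the
# VLLM_PROMPT_USE_FUSEDSDPA flag is enabled and the kernel import succeeded.
use_fused_sdpa = os.environ.get("VLLM_PROMPT_USE_FUSEDSDPA",
                                "0").lower() in ("1", "true")
self.fused_scaled_dot_product_attention = (
    ModuleFusedSDPA(FusedSDPA)
    if use_fused_sdpa and FusedSDPA is not None
    else None
)
```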

Contributor Author

Incorporated the suggested changes as per our offline discussion.

Member

@mgoin left a comment

LGTM

@mgoin added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Feb 4, 2025
@mgoin enabled auto-merge (squash) on February 4, 2025 at 16:22
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
auto-merge was automatically disabled February 5, 2025 04:55

Head branch was pushed to by a user without write access

@simon-mo merged commit af8486d into vllm-project:main on Feb 5, 2025
45 of 48 checks passed
Labels

ready (ONLY add when PR is ready to merge/full CI is needed)

4 participants