
Paged Attention support for FA3 #1268
Merged: 1 commit merged on Nov 10, 2024

Conversation

kadeng (Contributor) commented Oct 10, 2024

Adds support for Paged Attention / block tables to the Flash-Attention 3 kernel.

Limits:

Test Plan:

cd hopper
pytest test_flash_attn.py -k "test_flash_attn_varlen_paged"  -s

Update
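
For context, here is a minimal sketch of what a paged KV cache plus block table looks like from the caller's side. The cache layout, shapes, and the `block_table` argument name are assumptions inferred from the PR description, not the verified hopper interface signature; the final kernel call is left as a hypothetical comment.

```python
# Sketch only (requires a CUDA device): paged KV cache layout and a block table,
# as assumed from the PR description. The FA3 call at the end is hypothetical.
import torch

num_blocks, block_size, nheads_k, d = 64, 256, 8, 128
# K/V live in fixed-size physical blocks instead of one contiguous buffer per sequence.
k_cache = torch.randn(num_blocks, block_size, nheads_k, d,
                      device="cuda", dtype=torch.bfloat16)
v_cache = torch.randn_like(k_cache)

# block_table[b, i] is the physical block that holds tokens
# [i * block_size, (i + 1) * block_size) of sequence b.
batch_size, max_blocks_per_seq = 2, 4
block_table = torch.randint(0, num_blocks, (batch_size, max_blocks_per_seq),
                            dtype=torch.int32, device="cuda")

# Hypothetical call shape (argument name assumed, not the actual signature):
# out = flash_attn_varlen_func(q, k_cache, v_cache,
#                              cu_seqlens_q=cu_seqlens_q, max_seqlen_q=max_seqlen_q,
#                              ..., block_table=block_table)
```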

kadeng marked this pull request as ready for review on October 10, 2024 16:30
kadeng force-pushed the main branch 2 times, most recently from a5bac6b to 956692c on October 17, 2024 12:46
kadeng (Contributor, Author) commented Oct 22, 2024 via email

alexngng commented:

Thank you for your response!

kadeng (Contributor, Author) commented Oct 29, 2024

I was investigating a flaky test failure on this PR and narrowed it down to a preexisting issue: the flash attention varlen implementation does not yet work correctly for head dimension d=256. I have disabled testing for d=256 for now, mirroring how it is handled in hopper/test_flash_attn.py::test_flash_attn_varlen_output.
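
For illustration, disabling a single parameter value in pytest can be done with a skip mark on that case; the head-dim values and test name below are illustrative, not copied from hopper/test_flash_attn.py.

```python
# Illustrative sketch: keep d=256 in the parameter list but skip it until the
# preexisting varlen issue at d=256 is resolved. Values/names are assumptions.
import pytest


@pytest.mark.parametrize(
    "d",
    [64, 128, 192,
     pytest.param(256, marks=pytest.mark.skip(reason="varlen path not yet correct for d=256"))],
)
def test_flash_attn_varlen_paged(d):
    ...  # build paged K/V cache, run the kernel, compare against a reference
```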
