Project URL
https://pypi.org/project/nm-vllm/
Does this project already exist?
New Limit
400 MB
Update issue title
Which indexes
PyPI
About the project
vLLM is a fast and easy-to-use library for LLM inference and serving; it already received a file size limit increase to 400 MB in issue #3792. nm-vllm is an enterprise-supported fork of vLLM that needs a similar file size limit because of the number of compiled kernels it ships.
Reasons for the request
Pre-compiling these kernels means users can deploy quickly and deterministically, rather than having to set up a compilation environment wherever they deploy. As we extend our optimized inference to more hardware platforms, the binary size will grow, so we would like to follow the standard that vLLM sets.
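For illustration, a minimal sketch of the deployment flow the precompiled wheels enable, assuming the package installs as `nm-vllm` and exposes the same `vllm` Python API as upstream vLLM (the model name below is just a placeholder):

```python
# Install the prebuilt wheel; no CUDA toolchain or compile step is needed
# because the kernels ship precompiled inside the package:
#   pip install nm-vllm

# Assumption: nm-vllm exposes the upstream vllm Python API.
from vllm import LLM, SamplingParams

# Load a small model and run one generation to verify the install.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.outputs[0].text)
```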
Code of Conduct
I agree to follow the PSF Code of Conduct