
File Limit Request: nm-vllm - 400 MiB #4076

Open · 3 tasks done
mgoin opened this issue May 20, 2024 · 0 comments
mgoin commented May 20, 2024

Project URL

https://pypi.org/project/nm-vllm/

Does this project already exist?

  • Yes

New Limit

400 MB

Update issue title

  • I have updated the title.

Which indexes

PyPI

About the project

vLLM is a fast and easy-to-use library for LLM inference and serving that already has a file limit increase to 400 MB (issue #3792). nm-vllm is an enterprise-supported fork of vLLM that requires a similar file size limit because of the number of compiled kernels it ships.

Reasons for the request

Pre-compiling these kernels means that users can deploy quickly and deterministically, rather than needing to set up a compilation environment wherever they deploy. As we extend our optimized inference to more hardware platforms, the binary size will grow, so we would like to follow the standard that vLLM sets.
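
For context, a minimal sketch of what pre-built wheels enable: installation and serving with no local compilation step. This assumes nm-vllm keeps the upstream vLLM Python API (importable as `vllm`); the model name and sampling settings are illustrative only.

```python
# Assumes nm-vllm mirrors the upstream vLLM API (imported as `vllm`).
# Install with: pip install nm-vllm
from vllm import LLM, SamplingParams

# Because the wheel ships pre-compiled kernels, this runs without a
# local CUDA/C++ toolchain or build step.
llm = LLM(model="facebook/opt-125m")  # illustrative model name
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```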

Code of Conduct

  • I agree to follow the PSF Code of Conduct