We will fine-tune a 70B model that supports a long context of up to 800k tokens. Can vLLM support inference for this model?
Hi @yunll, yes. If you have enough GPU memory available for a context length that large, it will run. I tested Mistral 128k last week and was able to use its full context length. Model for reference: https://huggingface.co/NousResearch/Yarn-Mistral-7b-128k
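For reference, here is a minimal sketch of loading a long-context model with vLLM's offline `LLM` API. The model name comes from the reply above; the `max_model_len` and `tensor_parallel_size` values are illustrative assumptions, not tested settings for an 800k-token 70B model:

```python
from vllm import LLM, SamplingParams

# Illustrative sketch: serving a long-context model with vLLM.
# Note: KV-cache memory grows linearly with context length, so an
# 800k context on a 70B model needs far more GPU memory than this
# 128k 7B example.
llm = LLM(
    model="NousResearch/Yarn-Mistral-7b-128k",
    max_model_len=131072,    # cap the context at the model's full 128k
    tensor_parallel_size=1,  # increase to shard a larger model (e.g. 70B) across GPUs
)

params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Summarize the following document: ..."], params)
print(outputs[0].outputs[0].text)
```

If vLLM cannot fit the KV cache for the requested `max_model_len` in available GPU memory, it will fail at startup, so that is a quick way to check whether a given context length is feasible on your hardware.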