
[BUG] serving pod launched by Arena is not handling SIGTERM signal #1077

Open
TrafalgarZZZ opened this issue Apr 26, 2024 · 0 comments
TrafalgarZZZ commented Apr 26, 2024

I'm running a KServe serving service with Arena, using the following command:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"

When I try to delete the InferenceService created by Arena, I find that the pod gets stuck in the Terminating state for a very long time. It seems that my python3 process never receives the SIGTERM signal, so the Pod keeps terminating until it reaches the termination grace period, which is set to 300s by default.
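A likely cause (my assumption, not confirmed from the Arena source): when the serving command is passed as a single string, the container runtime typically runs it via `/bin/sh -c`, so the shell, not python3, becomes PID 1 and receives SIGTERM, and a plain `sh` does not forward signals to its children. Prefixing the command with `exec` would make python3 replace the shell and receive SIGTERM directly. The key property of `exec` can be demonstrated like this:

```shell
# `exec` replaces the current shell process instead of forking a child,
# so the replaced command keeps the same PID (and thus stays PID 1 in a
# container). Both lines below print the same PID:
sh -c 'echo $$; exec sh -c "echo \$\$"'
```

Applied to the command above, that would mean writing `"exec python3 -m vllm.entrypoints.openai.api_server ..."` as the serving command, assuming Arena passes the string through to a shell unchanged.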

IMO, the ability to handle the SIGTERM signal is necessary for serving Pods, because they may rely on it to stop gracefully (e.g. refuse incoming requests and wait for running requests to finish).
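For illustration, a graceful-shutdown hook on the application side might look like the minimal sketch below (the names `shutdown_event` and `handle_sigterm` are my own, not vLLM or KServe APIs): on SIGTERM the process flips a flag so its serving loop can drain in-flight requests instead of being killed at the grace-period deadline.

```python
import signal
import threading

# Illustrative sketch: a flag the serving loop can poll to start draining.
shutdown_event = threading.Event()

def handle_sigterm(signum, frame):
    # Mark the server as draining; the serving loop should check this flag,
    # stop accepting new requests, and exit once running requests complete.
    shutdown_event.set()

# Register the handler for SIGTERM (must run in the main thread).
signal.signal(signal.SIGTERM, handle_sigterm)
```

This only helps if the python3 process actually receives the signal in the first place, i.e. it runs as PID 1 or behind an entrypoint that forwards signals — which is exactly what seems to be missing here.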
