
[BUG] serving pod launched by Arena is not handling SIGTERM signal #1077

Open
TrafalgarZZZ opened this issue Apr 26, 2024 · 0 comments
TrafalgarZZZ commented Apr 26, 2024

I'm running a KServe serving service with Arena, using the following command:

arena serve kserve \
    --name=qwen \
    --image=vllm/vllm-openai:0.4.1 \
    --gpus=1 \
    --cpu=4 \
    --memory=20Gi \
    --min-replicas 0 \
    --data="llm-model:/mnt/" \
    "python3 -m vllm.entrypoints.openai.api_server --port 8080 --trust-remote-code --served-model-name qwen --model /mnt/models/Qwen-7B-Chat --gpu-memory-utilization 0.95"

When I try to delete the InferenceService created by Arena, I find that the pod gets stuck in the Terminating state for a very long time. It seems that my python3 process never receives the SIGTERM signal, so the Pod keeps terminating until it reaches the termination grace period, which is set to 300s by default.
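A likely cause (my assumption, not confirmed from the Arena source): when the serving command is passed as a single string, the container runtime typically runs it via `/bin/sh -c`, so the shell, not python3, becomes PID 1 and receives SIGTERM, and a plain `sh` does not forward signals to its children. Prefixing the command with `exec` would make python3 replace the shell and receive SIGTERM directly. The key property of `exec` can be demonstrated like this:

```shell
# `exec` replaces the current shell process instead of forking a child,
# so the replaced command keeps the same PID (and thus stays PID 1 in a
# container). Both lines below print the same PID:
sh -c 'echo $$; exec sh -c "echo \$\$"'
```

Applied to the command above, that would mean writing `"exec python3 -m vllm.entrypoints.openai.api_server ..."` as the serving command, assuming Arena passes the string through to a shell unchanged.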

IMO, the ability to handle the SIGTERM signal is necessary for serving Pods, because they may rely on it to stop gracefully (e.g. refuse incoming requests and wait for running requests to finish).
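For illustration, a graceful-shutdown hook on the application side might look like the minimal sketch below (the names `shutdown_event` and `handle_sigterm` are my own, not vLLM or KServe APIs): on SIGTERM the process flips a flag so its serving loop can drain in-flight requests instead of being killed at the grace-period deadline.

```python
import signal
import threading

# Illustrative sketch: a flag the serving loop can poll to start draining.
shutdown_event = threading.Event()

def handle_sigterm(signum, frame):
    # Mark the server as draining; the serving loop should check this flag,
    # stop accepting new requests, and exit once running requests complete.
    shutdown_event.set()

# Register the handler for SIGTERM (must run in the main thread).
signal.signal(signal.SIGTERM, handle_sigterm)
```

This only helps if the python3 process actually receives the signal in the first place, i.e. it runs as PID 1 or behind an entrypoint that forwards signals — which is exactly what seems to be missing here.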
