```python
# Apply batch inference for all input data.
ds = ds.map_batches(
    LLMPredictor,
    # Set the concurrency to the number of LLM instances.
    concurrency=10,
    # Specify the number of GPUs required per LLM instance.
    # NOTE: Do NOT set `num_gpus` when using vLLM with tensor-parallelism
    # (i.e., `tensor_parallel_size`).
    num_gpus=1,
    # Specify the batch size for inference.
    batch_size=32,
)
```
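The `LLMPredictor` passed to `map_batches` above is not shown in the snippet. In Ray Data's stateful-transform pattern it is a callable class: the constructor runs once per actor (where the real version would build a `vllm.LLM` instance), and `__call__` receives each batch as a dict of column names to NumPy arrays and returns a dict of the same shape. A minimal sketch of that contract, using a hypothetical placeholder model instead of vLLM:

```python
import numpy as np


class LLMPredictor:
    """Callable class that Ray Data instantiates once per actor."""

    def __init__(self):
        # Placeholder for the expensive, once-per-actor setup; the real
        # version would do something like:
        #   self.llm = LLM(model=..., tensor_parallel_size=...)
        self.model = lambda prompts: [p.upper() for p in prompts]

    def __call__(self, batch: dict) -> dict:
        # Ray Data passes each batch as a dict of column -> numpy array.
        prompts = [str(p) for p in batch["text"]]
        batch["generated"] = np.array(self.model(prompts))
        return batch


# Standalone usage (Ray would normally call this inside an actor):
predictor = LLMPredictor()
out = predictor({"text": np.array(["hello", "world"])})
```

The `"text"` column name and the uppercase stand-in model are assumptions for illustration; only the callable-class shape matches the `map_batches` call above.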
**GodHforever** changed the title to *[Usage]: why can't I set GPU nums while using "tensor_parallel_size"?* on May 17, 2024
### Your current environment

### How would you like to use vllm

I noticed there is an annotation in the example (the NOTE about `num_gpus`), but why can't I set the number of GPUs when using `tensor_parallel_size`?