[Frontend] Support embeddings in the run_batch API #7132
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge). To run full CI, you can do one of these:
Force-pushed from 61b92fe to 54acbc9
Looks like the way CI runs formatting is not totally consistent with simply running format.sh
Force-pushed from bc923d2 to 10f4fcb
Looks great, do you mind adding a simple example to examples/offline_inference_openai.md?
Thanks @wuisawesome
Yep!
Co-authored-by: Simon Mo <simon.mo@hey.com> Signed-off-by: Alvant <alvasian@yandex.ru>
Co-authored-by: Simon Mo <simon.mo@hey.com> Signed-off-by: LeiWang1999 <leiwang1999@outlook.com>
Currently run_batch, the offline batching API, only supports chat completion. This PR adds embedding support to run_batch. It also adds support for empty lines (or lines containing only whitespace) in the run_batch input file.
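For reference, here is a minimal sketch (not taken from the PR itself) of what an embedding batch job could look like with this change. The model name, file paths, and custom_id are placeholder assumptions; the per-line request format follows the OpenAI-style batch shape ("method", "url", "body") that run_batch consumes, and the -i/-o/--model flags are the ones used in vLLM's offline batch example docs.

```python
# Sketch of an embedding batch job for run_batch; names and paths are
# placeholders, not taken from the PR.
import json
import subprocess

requests = [
    {
        "custom_id": "embedding-1",
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {
            "model": "intfloat/e5-mistral-7b-instruct",
            "input": "Hello world!",
        },
    },
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
    # A blank line, which this PR makes run_batch tolerate in the input file.
    f.write("\n")

# Invoke the offline batch runner on the JSONL file.
subprocess.run(
    [
        "python", "-m", "vllm.entrypoints.openai.run_batch",
        "-i", "batch_input.jsonl",
        "-o", "batch_output.jsonl",
        "--model", "intfloat/e5-mistral-7b-instruct",
    ],
    check=True,
)
```

Under these assumptions, each line of batch_output.jsonl should hold the embedding response for the matching custom_id, and the deliberately blank line in the input is skipped rather than rejected.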