
Would it be possible to support n_probs / logprobs in chat completion API? #409

Open
cbowdon opened this issue May 10, 2024 · 0 comments

cbowdon commented May 10, 2024

Hi, first of all, thank you so much for llamafile. I'm very conscious of data privacy and wary of being locked in to OpenAI, so llamafile is amazing.

There is a small disparity between the `/completions` endpoint and `/v1/chat/completions`: the latter doesn't seem to support `n_probs`.

Here's an example of `n_probs` having no effect:

```python
import httpx

def chat(prompt):
    # Send an OpenAI-style chat completion request to the local llamafile server.
    res = httpx.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "LLaMA_CPP",
            "messages": [
                {"role": "user", "content": prompt}
            ],
            "n_predict": 1,
            "n_probs": 3,  # silently ignored by this endpoint
        },
        timeout=30,
    )
    return res.json()

chat("Say 'true'. Just say 'true'. Do not say anything except 'true'.")
```

Output:

```python
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'message': {'content': 'true', 'role': 'assistant'}}],
 'created': 1715328516,
 'id': 'chatcmpl-1aS707tCzO40Q2gIt1zZEYfg7etlQSMZ',
 'model': 'LLaMA_CPP',
 'object': 'chat.completion',
 'usage': {'completion_tokens': 7, 'prompt_tokens': 46, 'total_tokens': 53}}
```
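Note there is no `logprobs` field anywhere in that response. For contrast, this is roughly the shape an OpenAI-compatible response would carry under `choices[0].logprobs`, based on OpenAI's documented format (a sketch only; the tokens and values below are made up):

```python
# Illustrative sketch of an OpenAI-style chat completion response with
# logprobs=true and top_logprobs=2; the numbers here are invented.
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'message': {'content': 'true', 'role': 'assistant'},
              'logprobs': {'content': [
                  {'token': 'true',
                   'logprob': -0.01,        # log-probability of the sampled token
                   'top_logprobs': [        # the top_logprobs most likely candidates
                       {'token': 'true', 'logprob': -0.01},
                       {'token': 'True', 'logprob': -4.8}]}]}}],
 'object': 'chat.completion'}
```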

Sadly, the OpenAI `logprobs` and `top_logprobs` parameters didn't work either. This appears to be because they are not mapped here, in the OpenAI compatibility function:

https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/oai.h#L20

I'm not brave or competent enough to try to make the change myself, but I think the necessary logic would be:

```cpp
// Map the OpenAI logprobs/top_logprobs parameters onto llama.cpp's n_probs:
// top_logprobs wins if set, logprobs alone means "return the sampled token's
// probability", and neither means no probabilities at all.
bool logprobs     = json_value(body, "logprobs", false);
int  top_logprobs = json_value(body, "top_logprobs", 0);

int n_probs = 0;
if (top_logprobs > 0) {
    n_probs = top_logprobs;
} else if (logprobs) {
    n_probs = 1;
}

llama_params["n_probs"] = n_probs;
```

That should emulate the OpenAI behaviour described in the OpenAI API documentation.
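With that mapping in place, I'd expect a standard OpenAI-style request to start returning probabilities. A sketch of what that request would look like (this does not work against the current server; it assumes the mapping above has been added):

```python
import httpx

# Hypothetical once logprobs/top_logprobs are mapped to n_probs.
res = httpx.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "LLaMA_CPP",
        "messages": [{"role": "user", "content": "Say 'true'."}],
        "logprobs": True,
        "top_logprobs": 3,  # would map to n_probs = 3
    },
    timeout=30,
)
print(res.json()["choices"][0].get("logprobs"))
```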

Please consider supporting this; it would be very convenient. At the moment I'm working around it by manually calling `/completion` with a chat template, as sketched below.
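For reference, here is a minimal sketch of that workaround: apply the model's chat template by hand and call `/completion`, which does honour `n_probs`. The ChatML-style template below is an assumption; the correct template depends on the model baked into the llamafile.

```python
import httpx

def chat_with_probs(prompt):
    # Assumption: a ChatML-style template; substitute whatever your model expects.
    templated = (
        "<|im_start|>user\n" + prompt + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    res = httpx.post(
        "http://localhost:8080/completion",
        json={"prompt": templated, "n_predict": 1, "n_probs": 3},
        timeout=30,
    )
    # llama.cpp's /completion endpoint reports per-token candidates under
    # "completion_probabilities" when n_probs > 0.
    return res.json()["completion_probabilities"]
```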
