
Would it be possible to support n_probs / logprobs in chat completion API? #409

Open
cbowdon opened this issue May 10, 2024 · 0 comments

cbowdon commented May 10, 2024

Hi, first of all, thank you so much for llamafile. I'm very conscious of data privacy and wary of being locked in to OpenAI, so llamafile is amazing.

There is a small disparity between the `/completions` endpoint and `/v1/chat/completions`: the latter doesn't seem to support `n_probs`.

Here's an example of `n_probs` having no effect:

```python
import httpx

def chat(prompt):
    # Send an OpenAI-style chat completion request to the local llamafile server.
    res = httpx.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "LLaMA_CPP",
            "messages": [
                {"role": "user", "content": prompt}
            ],
            "n_predict": 1,
            "n_probs": 3,  # silently ignored by this endpoint
        },
        timeout=30,
    )
    return res.json()

chat("Say 'true'. Just say 'true'. Do not say anything except 'true'.")
```

Output:

```python
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'message': {'content': 'true', 'role': 'assistant'}}],
 'created': 1715328516,
 'id': 'chatcmpl-1aS707tCzO40Q2gIt1zZEYfg7etlQSMZ',
 'model': 'LLaMA_CPP',
 'object': 'chat.completion',
 'usage': {'completion_tokens': 7, 'prompt_tokens': 46, 'total_tokens': 53}}
```
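Note there is no `logprobs` field anywhere in that response. For contrast, this is roughly the shape an OpenAI-compatible response would carry under `choices[0].logprobs`, based on OpenAI's documented format (a sketch only; the tokens and values below are made up):

```python
# Illustrative sketch of an OpenAI-style chat completion response with
# logprobs=true and top_logprobs=2; the numbers here are invented.
{'choices': [{'finish_reason': 'stop',
              'index': 0,
              'message': {'content': 'true', 'role': 'assistant'},
              'logprobs': {'content': [
                  {'token': 'true',
                   'logprob': -0.01,        # log-probability of the sampled token
                   'top_logprobs': [        # the top_logprobs most likely candidates
                       {'token': 'true', 'logprob': -0.01},
                       {'token': 'True', 'logprob': -4.8}]}]}}],
 'object': 'chat.completion'}
```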

Sadly, the OpenAI `logprobs` and `top_logprobs` parameters didn't work either. This appears to be because they are not mapped here, in the OpenAI compatibility function:

https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/oai.h#L20

I'm not brave or competent enough to try to make the change myself, but I think the necessary logic would be:

```cpp
// Map the OpenAI logprobs/top_logprobs parameters onto llama.cpp's n_probs:
// top_logprobs wins if set, logprobs alone means "return the sampled token's
// probability", and neither means no probabilities at all.
bool logprobs     = json_value(body, "logprobs", false);
int  top_logprobs = json_value(body, "top_logprobs", 0);

int n_probs = 0;
if (top_logprobs > 0) {
    n_probs = top_logprobs;
} else if (logprobs) {
    n_probs = 1;
}

llama_params["n_probs"] = n_probs;
```

That should emulate the OpenAI behaviour described in the OpenAI API documentation.
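With that mapping in place, I'd expect a standard OpenAI-style request to start returning probabilities. A sketch of what that request would look like (this does not work against the current server; it assumes the mapping above has been added):

```python
import httpx

# Hypothetical once logprobs/top_logprobs are mapped to n_probs.
res = httpx.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "LLaMA_CPP",
        "messages": [{"role": "user", "content": "Say 'true'."}],
        "logprobs": True,
        "top_logprobs": 3,  # would map to n_probs = 3
    },
    timeout=30,
)
print(res.json()["choices"][0].get("logprobs"))
```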

Please consider supporting this; it would be very convenient. At the moment I'm working around it by manually calling `/completion` with a chat template, as sketched below.
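For reference, here is a minimal sketch of that workaround: apply the model's chat template by hand and call `/completion`, which does honour `n_probs`. The ChatML-style template below is an assumption; the correct template depends on the model baked into the llamafile.

```python
import httpx

def chat_with_probs(prompt):
    # Assumption: a ChatML-style template; substitute whatever your model expects.
    templated = (
        "<|im_start|>user\n" + prompt + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    res = httpx.post(
        "http://localhost:8080/completion",
        json={"prompt": templated, "n_predict": 1, "n_probs": 3},
        timeout=30,
    )
    # llama.cpp's /completion endpoint reports per-token candidates under
    # "completion_probabilities" when n_probs > 0.
    return res.json()["completion_probabilities"]
```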
