(EAI-653): Run LLM-as-a-judge evals on all rated messages #594


Merged: 31 commits merged into main on Jan 23, 2025

Conversation

mongodben
Collaborator

@mongodben mongodben commented Jan 15, 2025

Jira: https://jira.mongodb.org/browse/EAI-653

Changes

  • Run LLM-as-a-judge evals on all rated messages
  • Standardize updateTracing func for all routes

Notes

  • Note: there'll be a bit of re-running LLM-as-a-judge evals on rated messages that have already been LLM-as-a-judge evaluated. Given that approx 2% of messages are rated and we're running evals on 10% of all messages, evals will be rerun on 0.2% of all messages (0.1 × 0.02).
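The overlap estimate in the note can be sketched as a back-of-envelope calculation (illustrative only; the fractions are the approximate figures quoted in the note, and independence of rating and eval sampling is assumed):

```typescript
// Back-of-envelope check of the eval re-run overlap.
const ratedFraction = 0.02; // ~2% of messages get a user rating
const sampledFraction = 0.1; // online evals run on 10% of all messages

// Assuming rating and eval sampling are independent, the expected
// fraction of messages that get evaluated twice is the product:
const rerunFraction = ratedFraction * sampledFraction; // ~0.002, i.e. 0.2%
```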

@mongodben mongodben changed the base branch from main to EAI-625 January 16, 2025 21:37
Base automatically changed from EAI-625 to main January 21, 2025 16:26
@mongodben mongodben marked this pull request as ready for review January 21, 2025 17:36
Collaborator

@nlarew nlarew left a comment


LGTM

re: the re-run evals, doesn't seem like a big deal ultimately. That said, should we store info about evals in the conversations collection? e.g. a flag (e.g. Message.hasEval) or pointers to the evals in Braintrust (e.g. Message.evals = ["<link to braintrust>"])?
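The fields suggested above could look something like this on the message schema (a hypothetical sketch of the reviewer's suggestion, not the actual schema; the field names are the examples from the comment):

```typescript
// Hypothetical shape for eval metadata on a conversation message.
// Both names come from the suggestion above and are illustrative only.
interface MessageEvalInfo {
  hasEval?: boolean; // flag: an LLM-as-a-judge eval exists for this message
  evals?: string[]; // links to the corresponding eval runs in Braintrust
}
```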

@mongodben
Collaborator Author

> re: the re-run evals, doesn't seem like a big deal ultimately. That said, should we store info about evals in the conversations collection? e.g. a flag (e.g. Message.hasEval) or pointers to the evals in Braintrust (e.g. Message.evals = ["<link to braintrust>"])?

that's an interesting idea, but I'd rather not, so that the tracing/online evals can be fully non-mutative and just exist on top of the existing behavior

@mongodben mongodben merged commit 0bf7a5c into main Jan 23, 2025
1 check passed
@mongodben mongodben deleted the EAI-653 branch January 23, 2025 14:24