(EAI-506): Refactor full response generation as stateless function #489
Conversation
const miscCases = getConversationsEvalCasesFromYaml(
  fs.readFileSync(path.resolve(basePath, "conversations.yml"), "utf8")
);
const faqCases = getConversationsEvalCasesFromYaml(
  fs.readFileSync(path.resolve(basePath, "faq_conversations.yml"), "utf8")
);
When do you see us putting a set of evaluation cases in this file vs. the processors/*.eval.ts approach?

I'm a bit confused about the name of the file too. Could we call it e.g. conversations.eval.ts, or is there something particular about config.ts that we're evaluating here?
> When do you see us putting a set of evaluation cases in this file vs. the processors/*.eval.ts approach?

i see this file as our "end-to-end" evaluation. in contrast, the processors/*.eval.ts evals are for each AI component. i see these as more like "unit" evaluations.
> I'm a bit confused about the name of the file too. Could we call it e.g. conversations.eval.ts, or is there something particular about config.ts that we're evaluating here?

we can def change the name. i was also not loving it; just put it as-is b/c it's largely evaluating the stuff in the config.ts file. i think conversations.eval.ts makes sense here.

more philosophically, i think there's a larger conversation we ought to have about:
- how to programmatically run the evals
- how to structure evals for different features. for example, for the OM/CM stuff you're doing, i think we agree that we should accompany that work w/ some evals. but where do we put them? in this conversations.evals.ts file/evaluation? create a separate file? etc.
my $.02 is: put them in the same yaml files and include tags for the relevant products, then run the whole eval suite in the conversations.evals.ts file. see how we perform on the new eval cases and make sure there's no regression elsewhere.
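The tag-and-filter approach could look something like this; a minimal sketch assuming a simple `EvalCase` shape (none of these names or tag values come from the repo):

```typescript
// Sketch of the tagging idea: keep every case in the shared YAML files,
// tag each case with the products it covers, and filter at run time.
// The EvalCase shape and the tag values are assumptions for illustration.
type EvalCase = { name: string; tags?: string[] };

function filterCasesByTags(cases: EvalCase[], tags: string[]): EvalCase[] {
  // An empty tag list means "run the whole suite" (the no-regression check).
  if (tags.length === 0) return cases;
  return cases.filter((c) => c.tags?.some((t) => tags.includes(t)));
}

const cases: EvalCase[] = [
  { name: "faq: connection string", tags: ["faq"] },
  { name: "om: backup config", tags: ["om", "cm"] },
  { name: "misc: greeting" }, // untagged → only runs in the full suite
];

// Run only the OM/CM cases, or everything when no tags are given.
const omCases = filterCasesByTags(cases, ["om"]);
```

New feature work would then add tagged cases to the existing YAML and run either the filtered subset or the full suite.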
const ConversationContextRelevancy: ConversationEvalScorer = async (args) => {
  return ContextRelevancy(getConversationRagasConfig(args, judgeModelConfig));
};
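The per-metric wrapper above could also be produced by a small factory instead of being repeated by hand; a sketch with stand-in types modeled on the snippet (the real `ConversationEvalScorer` and `getConversationRagasConfig` live in the repo and may differ):

```typescript
// Sketch: collapse the repeated per-metric wrappers into one factory.
// Types and the config helper below are stand-ins modeled on the snippet
// above, not the repo's real definitions.
type ConversationEvalArgs = { input: string; output: string; context: string[] };
type RagasScorer = (config: unknown) => Promise<{ score: number }>;

// Stand-in for the real getConversationRagasConfig helper.
const getConversationRagasConfig = (args: ConversationEvalArgs, judge: unknown) => ({
  ...args,
  judge,
});

// Given a Ragas scorer and a judge model config, return a conversation scorer.
function makeConversationScorer(scorer: RagasScorer, judgeModelConfig: unknown) {
  return async (args: ConversationEvalArgs) =>
    scorer(getConversationRagasConfig(args, judgeModelConfig));
}
```

With this, `ConversationContextRelevancy` would just be `makeConversationScorer(ContextRelevancy, judgeModelConfig)`.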
LLM rate limits hit (HTTP status 429). the error is somewhere in the UI, but you have to look around/scroll a bit for it.
Not a blocker for this PR - do we have retries for this? maybe a good future work EAI?
by not running tests in parallel we should be able to avoid this... fairly annoying for spiky workloads like this, but ai magic ain't infinite (yet)
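For the retry question above, a hedged sketch of what a backoff wrapper around a rate-limited call could look like. The error shape (an object carrying `status: 429`) and the attempt/delay numbers are assumptions, not the actual SDK behavior:

```typescript
// Sketch: retry a rate-limited async call with exponential backoff.
// The `status: 429` error shape is an assumed convention for the LLM client.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit = err?.status === 429;
      if (!isRateLimit || attempt >= maxAttempts) throw err;
      // Exponential backoff: 1s, 2s, 4s, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A scorer call could then be wrapped as e.g. `withRetry(() => ContextRelevancy(config))`, though running the evals serially may make this unnecessary.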
LGTM % merge conflict fix & some non-blocking discussion points, maybe we can talk about them in standup soon
Co-authored-by: Nick Larew <nick.larew@mongodb.com>
Jira: https://jira.mongodb.org/browse/EAI-506
Changes
"^0.1.0" b/c it was causing issues w/ the braintrust CLI (see CLI Error: [ERROR] Could not resolve "readline/promises", braintrustdata/braintrust-sdk#356)

Notes