Docs: Revamp Extraction Use Case (#18588)
Revamp the extraction use case documentation
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Showing 9 changed files with 1,900 additions and 831 deletions.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "raw", | ||
"id": "913dd5a2-24d1-4f8e-bc15-ab518483eef9", | ||
"metadata": {}, | ||
"source": [ | ||
"---\n", | ||
"title: Guidelines\n", | ||
"sidebar_position: 5\n", | ||
"---" | ||
] | ||
}, | ||
{ | ||
"attachments": {}, | ||
"cell_type": "markdown", | ||
"id": "9e161a8a-fcf0-4d55-933e-da271ce28d7e", | ||
"metadata": {}, | ||
"source": [ | ||
"The quality of extraction results depends on many factors. \n", | ||
"\n", | ||
"Here is a set of guidelines to help you squeeze out the best performance from your models:\n", | ||
"\n", | ||
"* Set the model temperature to `0`.\n", | ||
"* Improve the prompt. The prompt should be precise and to the point.\n", | ||
"* Document the schema: Add descriptions to the schema and its fields so the LLM has more information to work with.\n", | ||
"* Provide reference examples! Diverse examples can help, including examples where nothing should be extracted.\n", | ||
"* If you have a lot of examples, use a retriever to retrieve the most relevant examples.\n", | ||
"* Benchmark with the best available LLM/Chat Model (e.g., gpt-4, claude-3) -- check with the model provider which one is the latest and greatest!\n", | ||
"* If the schema is very large, try breaking it into multiple smaller schemas, running separate extractions, and merging the results.\n", | ||
"* Make sure that the schema allows the model to REJECT extracting information. If it doesn't, the model will be forced to make up information!\n", | ||
"* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n", | ||
"\n", | ||
"## Benchmark\n", | ||
"\n", | ||
"* Create and benchmark data for your use case using [LangSmith 🦜️🛠️](https://docs.smith.langchain.com/).\n", | ||
"* Is your LLM good enough? Use [langchain-benchmarks 🦜💯](https://github.com/langchain-ai/langchain-benchmarks) to test your LLM against existing datasets.\n", | ||
"\n", | ||
"## Keep in mind! 😶🌫️\n", | ||
"\n", | ||
"* LLMs are great, but they are not required for all cases! If you’re extracting information from a single structured source (e.g., LinkedIn), using an LLM is not a good idea: traditional web scraping will be much cheaper and more reliable.\n", | ||
"\n", | ||
"* **Human in the loop**: If you need **perfect quality**, you'll likely need to plan on having a human in the loop -- even the best LLMs will make mistakes when dealing with complex extraction tasks." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.2" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
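
Two of the guidelines in the notebook above (splitting a large schema into smaller ones, and letting the model decline to extract) can be sketched in plain Python. This is a minimal illustration rather than the approach used in the commit: the `Contact` and `Employment` schemas and the `merge_extractions` helper are hypothetical names, and the two instances stand in for the results of separate extraction calls, so no LLM is invoked.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Contact:
    """Hypothetical sub-schema; every field is optional so a model can return None."""
    name: Optional[str] = None
    email: Optional[str] = None


@dataclass
class Employment:
    """Hypothetical second sub-schema, filled by a separate extraction call."""
    company: Optional[str] = None
    title: Optional[str] = None


def merge_extractions(*parts) -> dict:
    """Merge sub-schema results into one record, dropping fields left as None."""
    merged = {}
    for part in parts:
        for key, value in asdict(part).items():
            if value is not None:
                merged[key] = value
    return merged


# Stand-ins for the outputs of two separate extraction calls (assumed data).
contact = Contact(name="Ada Lovelace")  # no email was present in the text
employment = Employment()               # nothing to extract at all
record = merge_extractions(contact, employment)
```

Because every field defaults to `None`, a sub-schema can come back empty instead of forcing a made-up value, and the merge step keeps only what was actually found.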
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
label: 'How-To Guides' | ||
position: 1 |