Docs: Revamp Extraction Use Case #18588

Merged
merged 31 commits into from
Mar 6, 2024
831 changes: 0 additions & 831 deletions docs/docs/use_cases/extraction.ipynb

This file was deleted.

68 changes: 68 additions & 0 deletions docs/docs/use_cases/extraction/guidelines.ipynb
@@ -0,0 +1,68 @@
{
"cells": [
{
"cell_type": "raw",
"id": "913dd5a2-24d1-4f8e-bc15-ab518483eef9",
"metadata": {},
"source": [
"---\n",
"title: Guidelines\n",
"sidebar_position: 5\n",
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9e161a8a-fcf0-4d55-933e-da271ce28d7e",
"metadata": {},
"source": [
"The quality of extraction results depends on many factors. \n",
"\n",
"Here is a set of guidelines to help you squeeze out the best performance from your models:\n",
"\n",
"* Set the model temperature to `0`.\n",
"* Improve the prompt. The prompt should be precise and to the point.\n",
"* Document the schema: Make sure the schema is documented to provide more information to the LLM.\n",
"* Provide reference examples! Diverse examples can help, including examples where nothing should be extracted.\n",
"* If you have a lot of examples, use a retriever to retrieve the most relevant examples.\n",
"* Benchmark with the best available LLM/Chat Model (e.g., gpt-4, claude-3, etc) -- check with the model provider which one is the latest and greatest!\n",
"* If the schema is very large, try breaking it into multiple smaller schemas, run separate extractions and merge the results.\n",
"* Make sure that the schema allows the model to REJECT extracting information. If it doesn't, the model will be forced to make up information!\n",
"* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n",
"\n",
"## Benchmark\n",
"\n",
"* Create and benchmark data for your use case using [LangSmith 🦜️🛠️](https://docs.smith.langchain.com/).\n",
"* Is your LLM good enough? Use [langchain-benchmarks 🦜💯 ](https://github.com/langchain-ai/langchain-benchmarks) to test out your LLM using existing datasets.\n",
"\n",
"## Keep in mind! 😶‍🌫️\n",
"\n",
"* LLMs are great, but are not required for all cases! If you’re extracting information from a single structured source (e.g., linkedin), using an LLM is not a good idea – traditional web-scraping will be much cheaper and reliable.\n",
"\n",
"* **human in the loop** If you need **perfect quality**, you'll likely need to plan on having a human in the loop -- even the best LLMs will make mistakes when dealing with complex extraction tasks."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
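The guidelines in the notebook above are prose only. The snippet below is a minimal, library-free sketch in plain Python (no LangChain or model calls) of three of them. All names (`Person`, `select_examples`, `merge_partials`) are hypothetical and not part of any LangChain API. It shows a documented schema whose fields can stay empty so the model is allowed to extract nothing, a token-overlap (Jaccard) similarity used here only as a simple stand-in for a real retriever when picking the most relevant reference examples, and a merge step for the outputs of several smaller-schema extractions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Information about a person mentioned in the text (hypothetical schema).

    Every field defaults to None so the model is allowed to extract
    nothing instead of inventing a value.
    """

    name: Optional[str] = None  # The person's name, if stated in the text.
    hair_color: Optional[str] = None  # Hair color, if stated in the text.
    height_m: Optional[float] = None  # Height in meters, if stated in the text.


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for a crude similarity score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_examples(query: str, examples: list[str], k: int = 2) -> list[str]:
    """Return the k examples with the highest Jaccard token overlap with query.

    A stand-in (assumption) for an embedding-based retriever over reference examples.
    """
    query_tokens = _tokens(query)

    def score(example: str) -> float:
        example_tokens = _tokens(example)
        union = query_tokens | example_tokens
        return len(query_tokens & example_tokens) / len(union) if union else 0.0

    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(examples, key=score, reverse=True)[:k]


def merge_partials(partials: list[dict]) -> tuple[dict, list[str]]:
    """Merge results of several smaller-schema extraction runs.

    The first non-None value for each key wins; keys where two runs
    disagree are reported as conflicts instead of being overwritten.
    """
    merged: dict = {}
    conflicts: list[str] = []
    for part in partials:
        for key, value in part.items():
            if value is None:  # The run declined to extract this field.
                continue
            if key not in merged:
                merged[key] = value
            elif merged[key] != value and key not in conflicts:
                conflicts.append(key)
    return merged, conflicts


if __name__ == "__main__":
    examples = [
        "Alan is 1.8 meters tall.",
        "The weather is sunny today.",  # Negative example: nothing to extract.
        "Maria has red hair.",
    ]
    print(select_examples("Bob has brown hair", examples, k=1))
    # → ['Maria has red hair.']

    # Two runs over smaller schemas (appearance, then measurements).
    merged, conflicts = merge_partials(
        [
            {"name": "Bob", "hair_color": "brown", "height_m": None},
            {"name": "Bob", "height_m": 1.75},
        ]
    )
    print(Person(**merged), conflicts)
    # → Person(name='Bob', hair_color='brown', height_m=1.75) []
```

Reporting conflicting keys instead of silently overwriting them leaves room for the verification/correction step, or a human reviewer, to resolve disagreements between runs.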
2 changes: 2 additions & 0 deletions docs/docs/use_cases/extraction/how_to/_category_.yml
@@ -0,0 +1,2 @@
label: 'How-To Guides'
position: 1