Docs: Revamp Extraction Use Case (langchain-ai#18588)
Revamp the extraction use case documentation

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2 people authored and gkorland committed Mar 30, 2024
1 parent 78e7630 commit a18cdef
Showing 9 changed files with 1,900 additions and 831 deletions.
831 changes: 0 additions & 831 deletions docs/docs/use_cases/extraction.ipynb

This file was deleted.

68 changes: 68 additions & 0 deletions docs/docs/use_cases/extraction/guidelines.ipynb
@@ -0,0 +1,68 @@
{
"cells": [
{
"cell_type": "raw",
"id": "913dd5a2-24d1-4f8e-bc15-ab518483eef9",
"metadata": {},
"source": [
"---\n",
"title: Guidelines\n",
"sidebar_position: 5\n",
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9e161a8a-fcf0-4d55-933e-da271ce28d7e",
"metadata": {},
"source": [
"The quality of extraction results depends on many factors. \n",
"\n",
"Here is a set of guidelines to help you squeeze out the best performance from your models:\n",
"\n",
"* Set the model temperature to `0`.\n",
"* Improve the prompt. The prompt should be precise and to the point.\n",
"* Document the schema: Make sure the schema is documented to provide more information to the LLM.\n",
"* Provide reference examples! Diverse examples can help, including examples where nothing should be extracted.\n",
"* If you have a lot of examples, use a retriever to retrieve the most relevant examples.\n",
"* Benchmark with the best available LLM/Chat Model (e.g., gpt-4, claude-3, etc.) -- check with the model provider which one is the latest and greatest!\n",
"* If the schema is very large, try breaking it into multiple smaller schemas, running separate extractions, and merging the results.\n",
"* Make sure that the schema allows the model to REJECT extracting information. If it doesn't, the model will be forced to make up information!\n",
"* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n",
"\n",
"## Benchmark\n",
"\n",
"* Create a dataset for your use case and benchmark against it using [LangSmith 🦜️🛠️](https://docs.smith.langchain.com/).\n",
"* Is your LLM good enough? Use [langchain-benchmarks 🦜💯](https://github.com/langchain-ai/langchain-benchmarks) to test out your LLM using existing datasets.\n",
"\n",
"## Keep in mind! 😶‍🌫️\n",
"\n",
"* LLMs are great, but are not required for all cases! If you’re extracting information from a single structured source (e.g., LinkedIn), using an LLM is not a good idea; traditional web-scraping will be much cheaper and more reliable.\n",
"\n",
"* **Human in the loop**: If you need **perfect quality**, you'll likely need to plan for a human in the loop, since even the best LLMs will make mistakes when dealing with complex extraction tasks."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
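Three of the guidelines in `guidelines.ipynb` (document the schema, let the model decline to extract, and split a large schema into smaller ones whose results are merged) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the commit: plain `dataclasses` stand in for whatever schema classes you actually use, and `Person`, `Employment`, `describe_schema`, `extract_and_merge`, and the two `fake_*` extractors are hypothetical names. The extractors are stubs; no LLM is called.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Callable, List, Optional


@dataclass
class Person:
    """Basic facts about a person mentioned in the text."""

    # Every field defaults to None, so "not mentioned" is a valid answer
    # and the model is never forced to invent a value.
    name: Optional[str] = field(
        default=None, metadata={"description": "Full name of the person"}
    )
    hair_color: Optional[str] = field(
        default=None, metadata={"description": "Hair color, only if stated"}
    )


@dataclass
class Employment:
    """Employment details for the person mentioned in the text."""

    employer: Optional[str] = field(
        default=None, metadata={"description": "Name of the current employer"}
    )


def describe_schema(schema: type) -> str:
    """Render a schema's documentation as text that could go into a prompt."""
    lines = [f"{schema.__name__}: {schema.__doc__}"]
    for f in fields(schema):
        lines.append(f"- {f.name}: {f.metadata['description']}")
    return "\n".join(lines)


def extract_and_merge(text: str, extractors: List[Callable[[str], object]]) -> dict:
    """Run one extractor per small schema and merge the non-empty fields."""
    merged: dict = {}
    for extract in extractors:
        result = asdict(extract(text))
        merged.update({k: v for k, v in result.items() if v is not None})
    return merged


# Hypothetical stand-ins for LLM-backed extractors (no model is called here).
def fake_person_extractor(text: str) -> Person:
    return Person(name="Alice") if "Alice" in text else Person()


def fake_employment_extractor(text: str) -> Employment:
    return Employment(employer="Acme") if "Acme" in text else Employment()


extractors = [fake_person_extractor, fake_employment_extractor]
print(describe_schema(Person))
print(extract_and_merge("Alice works at Acme.", extractors))
print(extract_and_merge("The weather is nice.", extractors))
```

Because every field is optional, a text with nothing to extract yields an empty result rather than made-up values, and the merge step simply skips fields that an extractor left unset.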
2 changes: 2 additions & 0 deletions docs/docs/use_cases/extraction/how_to/_category_.yml
@@ -0,0 +1,2 @@
label: 'How-To Guides'
position: 1
