Docs: Revamp Extraction Use Case (langchain-ai#18588)

Revamp the extraction use case documentation

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
FalkorDB · Mar 30, 2024 · a18cdef
1 parent 78e7630
commit a18cdef

Showing 9 changed files with 1,900 additions and 831 deletions.
diff --git a/docs/docs/use_cases/extraction.ipynb b/docs/docs/use_cases/extraction.ipynb
diff --git a/docs/docs/use_cases/extraction/guidelines.ipynb b/docs/docs/use_cases/extraction/guidelines.ipynb
@@ -0,0 +1,68 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "id": "913dd5a2-24d1-4f8e-bc15-ab518483eef9",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "title: Guidelines\n",
+    "sidebar_position: 5\n",
+    "---"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "9e161a8a-fcf0-4d55-933e-da271ce28d7e",
+   "metadata": {},
+   "source": [
+    "The quality of extraction results depends on many factors. \n",
+    "\n",
+    "Here is a set of guidelines to help you squeeze out the best performance from your models:\n",
+    "\n",
+    "* Set the model temperature to `0`.\n",
+    "* Improve the prompt. The prompt should be precise and to the point.\n",
+    "* Document the schema: Describe each field so that the LLM has more information about what to extract.\n",
+    "* Provide reference examples! Diverse examples can help, including examples where nothing should be extracted.\n",
+    "* If you have a lot of examples, use a retriever to retrieve the most relevant examples.\n",
+    "* Benchmark with the best available LLM/Chat Model (e.g., gpt-4, claude-3, etc.) -- check with the model provider which one is the latest and greatest!\n",
+    "* If the schema is very large, try breaking it into multiple smaller schemas, running separate extractions, and merging the results.\n",
+    "* Make sure that the schema allows the model to REJECT extracting information. If it doesn't, the model will be forced to make up information!\n",
+    "* Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).\n",
+    "\n",
+    "## Benchmark\n",
+    "\n",
+    "* Create benchmark data for your use case and evaluate against it using [LangSmith 🦜️🛠️](https://docs.smith.langchain.com/).\n",
+    "* Is your LLM good enough? Use [langchain-benchmarks 🦜💯](https://github.com/langchain-ai/langchain-benchmarks) to test out your LLM using existing datasets.\n",
+    "\n",
+    "## Keep in mind! 😶‍🌫️\n",
+    "\n",
+    "* LLMs are great, but are not required for all cases! If you’re extracting information from a single structured source (e.g., LinkedIn), using an LLM is not a good idea – traditional web-scraping will be much cheaper and more reliable.\n",
+    "\n",
+    "* **Human in the loop**: If you need **perfect quality**, you'll likely need to plan on having a human in the loop -- even the best LLMs will make mistakes when dealing with complex extraction tasks."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
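Review note on `guidelines.ipynb` (not part of this commit): two of the guidelines in the notebook, letting the schema reject extraction and splitting a large schema and merging the results, may be easier to picture with a small sketch. The example below uses only the Python standard library. The `Person` schema, its fields, and the `merge_partial` helper are hypothetical names chosen for illustration, and no LLM call is shown.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Person:
    """Hypothetical extraction schema (an assumption, not from the notebook).

    Every field defaults to None, so "nothing found" is a valid answer and
    the model is never forced to invent a value.
    """

    name: Optional[str] = None        # Full name, if stated in the text
    hair_color: Optional[str] = None  # Hair color, only if explicitly mentioned
    height_m: Optional[float] = None  # Height in meters, if given


def merge_partial(*parts: Person) -> Person:
    """Merge results of separate extractions; the first non-None value per field wins."""
    merged = Person()
    for part in parts:
        for f in fields(Person):
            if getattr(merged, f.name) is None:
                setattr(merged, f.name, getattr(part, f.name))
    return merged


# Pretend these came from two smaller extraction passes over the same text.
first = Person(name="Alan Smith")
second = Person(height_m=1.83)
result = merge_partial(first, second)
```

Here `hair_color` stays `None` because neither pass found it, which is the "reject" outcome the guideline asks the schema to allow. A real pipeline would likely express the same idea with whatever schema mechanism its LLM integration expects.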
diff --git a/docs/docs/use_cases/extraction/how_to/_category_.yml b/docs/docs/use_cases/extraction/how_to/_category_.yml
@@ -0,0 +1,2 @@
+label: 'How-To Guides'
+position: 1