feat: add bedrock anthropic for token usage counting

feat: update

fix: fix spell

docs: remove unnecessary args from the pip install (langchain-ai#19823)

**Description:** An extra `U` argument in the pip install instructions for the MediaWiki Dump document loader caused package installation to fail. Removing the argument fixes the install command.
**Issue:** langchain-ai#19820
**Dependencies:** No dependency change required
**Twitter handle:** [@vardhaman722](https://twitter.com/vardhaman722)

Update cross_encoder_reranker.ipynb (langchain-ai#19846)

HuggingFace -> Hugging Face

core: generate mermaid syntax and render visual graph (langchain-ai#19599)

- **Description:** Add functionality to generate Mermaid syntax and render flowcharts from graph data. This includes support for custom node colors and edge curve styles, as well as the ability to export the generated graphs to PNG images using either the Mermaid.INK API or Pyppeteer for local rendering.
- **Dependencies:** `pyppeteer` is an optional dependency, needed only when rendering is done locally with Pyppeteer and JavaScript code.

---------

Co-authored-by: Angel Igareta <angel.igareta@klarna.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>

feat: fix

feat: update

fix: fix lint

fix: fix lint

fix: fix lint

ai21[patch]: release 0.1.3 (langchain-ai#19867)

👥 Update LangChain people data (langchain-ai#19858)

Co-authored-by: github-actions <github-actions@github.com>

community[patch]: Revert "Fix the bug that Chroma does not specify `embedding_function` (langchain-ai#19277)" (langchain-ai#19866)

This reverts commit 7042934.

Fixes langchain-ai#19848

fix: fix lint
Sukitly · Apr 1, 2024 · ffb2ccb
1 parent 003c98e
commit ffb2ccb

Showing 13 changed files with 1,530 additions and 558 deletions.
diff --git a/docs/data/people.yml b/docs/data/people.yml
diff --git a/docs/docs/integrations/document_loaders/mediawikidump.ipynb b/docs/docs/integrations/document_loaders/mediawikidump.ipynb
@@ -24,10 +24,10 @@
    "outputs": [],
    "source": [
     "# mediawiki-utilities supports XML schema 0.11 in unmerged branches\n",
-    "%pip install --upgrade --quiet  U git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11\n",
+    "%pip install --upgrade --quiet git+https://github.com/mediawiki-utilities/python-mwtypes@updates_schema_0.11\n",
     "# mediawiki-utilities mwxml has a bug, fix PR pending\n",
-    "%pip install --upgrade --quiet  U git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11\n",
-    "%pip install --upgrade --quiet  U mwparserfromhell"
+    "%pip install --upgrade --quiet git+https://github.com/gdedrouas/python-mwxml@xml_format_0.11\n",
+    "%pip install --upgrade --quiet mwparserfromhell"
    ]
   },
   {

diff --git a/docs/docs/integrations/document_transformers/cross_encoder_reranker.ipynb b/docs/docs/integrations/document_transformers/cross_encoder_reranker.ipynb
@@ -7,11 +7,11 @@
    "source": [
     "# Cross Encoder Reranker\n",
     "\n",
-    "This notebook shows how to implement reranker in a retriever with your own cross encoder from [HuggingFace cross encoder models](https://huggingface.co/cross-encoder) or HuggingFace models that implements cross encoder function ([example: BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)). `SagemakerEndpointCrossEncoder` enables you to use these HuggingFace models loaded on Sagemaker.\n",
+    "This notebook shows how to implement reranker in a retriever with your own cross encoder from [Hugging Face cross encoder models](https://huggingface.co/cross-encoder) or Hugging Face models that implements cross encoder function ([example: BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)). `SagemakerEndpointCrossEncoder` enables you to use these HuggingFace models loaded on Sagemaker.\n",
     "\n",
     "This builds on top of ideas in the [ContextualCompressionRetriever](/docs/modules/data_connection/retrievers/contextual_compression/). Overall structure of this document came from [Cohere Reranker documentation](/docs/integrations/retrievers/cohere-reranker.ipynb).\n",
     "\n",
-    "For more about why cross encoder can be used as reranking mechanism in conjunction with embeddings for better retrieval, refer to [HuggingFace Cross-Encoders documentation](https://www.sbert.net/examples/applications/cross-encoder/README.html)."
+    "For more about why cross encoder can be used as reranking mechanism in conjunction with embeddings for better retrieval, refer to [Hugging Face Cross-Encoders documentation](https://www.sbert.net/examples/applications/cross-encoder/README.html)."
    ]
   },
   {
@@ -173,11 +173,11 @@
    "id": "419a2bf3-de4b-4c4d-9a40-4336552f604c",
    "metadata": {},
    "source": [
-    "## Uploading HuggingFace model to SageMaker endpoint\n",
+    "## Uploading Hugging Face model to SageMaker endpoint\n",
     "\n",
     "Refer to [this article](https://www.philschmid.de/custom-inference-huggingface-sagemaker) for general guideline. Here is a simple `inference.py` for creating an endpoint that works with `SagemakerEndpointCrossEncoder`.\n",
     "\n",
-    "It downloads HuggingFace model on the fly, so you do not need to keep the model artifacts such as `pytorch_model.bin` in your `model.tar.gz`."
+    "It downloads Hugging Face model on the fly, so you do not need to keep the model artifacts such as `pytorch_model.bin` in your `model.tar.gz`."
    ]
   },
   {

diff --git a/libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py b/libs/community/langchain_community/callbacks/bedrock_anthropic_callback.py
@@ -0,0 +1,108 @@
+import threading
+from typing import Any, Dict, List, Union
+
+from langchain_core.callbacks import BaseCallbackHandler
+from langchain_core.outputs import LLMResult
+
+MODEL_COST_PER_1K_INPUT_TOKENS = {
+    "anthropic.claude-instant-v1": 0.0008,
+    "anthropic.claude-v2": 0.008,
+    "anthropic.claude-v2:1": 0.008,
+    "anthropic.claude-3-sonnet-20240229-v1:0": 0.003,
+    "anthropic.claude-3-haiku-20240307-v1:0": 0.00025,
+}
+
+MODEL_COST_PER_1K_OUTPUT_TOKENS = {
+    "anthropic.claude-instant-v1": 0.0024,
+    "anthropic.claude-v2": 0.024,
+    "anthropic.claude-v2:1": 0.024,
+    "anthropic.claude-3-sonnet-20240229-v1:0": 0.015,
+    "anthropic.claude-3-haiku-20240307-v1:0": 0.00125,
+}
+
+
+def _get_anthropic_claude_token_cost(
+    prompt_tokens: int, completion_tokens: int, model_id: Union[str, None]
+) -> float:
+    """Get the cost of tokens for the Claude model."""
+    if not model_id:
+        raise ValueError("Model name is required to calculate cost.")
+    return (prompt_tokens / 1000) * MODEL_COST_PER_1K_INPUT_TOKENS[model_id] + (
+        completion_tokens / 1000
+    ) * MODEL_COST_PER_1K_OUTPUT_TOKENS[model_id]
+
+
+class BedrockAnthropicTokenUsageCallbackHandler(BaseCallbackHandler):
+    """Callback Handler that tracks bedrock anthropic info."""
+
+    total_tokens: int = 0
+    prompt_tokens: int = 0
+    completion_tokens: int = 0
+    successful_requests: int = 0
+    total_cost: float = 0.0
+
+    def __init__(self) -> None:
+        super().__init__()
+        self._lock = threading.Lock()
+
+    def __repr__(self) -> str:
+        return (
+            f"Tokens Used: {self.total_tokens}\n"
+            f"\tPrompt Tokens: {self.prompt_tokens}\n"
+            f"\tCompletion Tokens: {self.completion_tokens}\n"
+            f"Successful Requests: {self.successful_requests}\n"
+            f"Total Cost (USD): ${self.total_cost}"
+        )
+
+    @property
+    def always_verbose(self) -> bool:
+        """Whether to call verbose callbacks even if verbose is False."""
+        return True
+
+    def on_llm_start(
+        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
+    ) -> None:
+        """Print out the prompts."""
+        pass
+
+    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
+        """Print out the token."""
+        pass
+
+    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
+        """Collect token usage."""
+        if response.llm_output is None:
+            return None
+
+        if "usage" not in response.llm_output:
+            with self._lock:
+                self.successful_requests += 1
+            return None
+
+        # compute tokens and cost for this request
+        token_usage = response.llm_output["usage"]
+        completion_tokens = token_usage.get("completion_tokens", 0)
+        prompt_tokens = token_usage.get("prompt_tokens", 0)
+        total_tokens = token_usage.get("total_tokens", 0)
+        model_id = response.llm_output.get("model_id", None)
+        total_cost = _get_anthropic_claude_token_cost(
+            prompt_tokens=prompt_tokens,
+            completion_tokens=completion_tokens,
+            model_id=model_id,
+        )
+
+        # update shared state behind lock
+        with self._lock:
+            self.total_cost += total_cost
+            self.total_tokens += total_tokens
+            self.prompt_tokens += prompt_tokens
+            self.completion_tokens += completion_tokens
+            self.successful_requests += 1
+
+    def __copy__(self) -> "BedrockAnthropicTokenUsageCallbackHandler":
+        """Return a copy of the callback handler."""
+        return self
+
+    def __deepcopy__(self, memo: Any) -> "BedrockAnthropicTokenUsageCallbackHandler":
+        """Return a deep copy of the callback handler."""
+        return self
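
Note on the cost formula in `_get_anthropic_claude_token_cost`: the function indexes both price tables directly, so a model ID missing from either table raises a bare `KeyError`. The sketch below reproduces the same arithmetic in a standalone form and, as one possible variation, reports a missing pricing entry explicitly. The helper name `estimate_cost` and the shortened table names are hypothetical and not part of this commit; the prices are copied from the tables in the new file.

```python
import math
from typing import Optional

# Prices copied from the tables in the diff (USD per 1,000 tokens).
# Only two models are listed here to keep the sketch short.
INPUT_COST_PER_1K = {
    "anthropic.claude-instant-v1": 0.0008,
    "anthropic.claude-3-haiku-20240307-v1:0": 0.00025,
}
OUTPUT_COST_PER_1K = {
    "anthropic.claude-instant-v1": 0.0024,
    "anthropic.claude-3-haiku-20240307-v1:0": 0.00125,
}


def estimate_cost(
    prompt_tokens: int, completion_tokens: int, model_id: Optional[str]
) -> float:
    """Hypothetical helper: same formula as the commit, with an explicit
    error for model IDs that have no pricing entry."""
    if not model_id:
        raise ValueError("A model ID is required to calculate cost.")
    if model_id not in INPUT_COST_PER_1K or model_id not in OUTPUT_COST_PER_1K:
        raise ValueError(f"No pricing entry for model ID {model_id!r}.")
    input_cost = (prompt_tokens / 1000) * INPUT_COST_PER_1K[model_id]
    output_cost = (completion_tokens / 1000) * OUTPUT_COST_PER_1K[model_id]
    return input_cost + output_cost


# Same token counts as the unit test in this commit: 2 prompt, 1 completion.
cost = estimate_cost(2, 1, "anthropic.claude-instant-v1")
# 2/1000 * 0.0008 + 1/1000 * 0.0024 is approximately 0.000004 USD.
print(f"{cost:.7f}")
```

With the token counts used in the unit test, the cost is tiny but positive, which is consistent with the `assert cb.total_cost > 0` check there.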
diff --git a/libs/community/langchain_community/callbacks/manager.py b/libs/community/langchain_community/callbacks/manager.py
@@ -10,6 +10,9 @@
 
 from langchain_core.tracers.context import register_configure_hook
 
+from langchain_community.callbacks.bedrock_anthropic_callback import (
+    BedrockAnthropicTokenUsageCallbackHandler,
+)
 from langchain_community.callbacks.openai_info import OpenAICallbackHandler
 from langchain_community.callbacks.tracers.comet import CometTracer
 from langchain_community.callbacks.tracers.wandb import WandbTracer
@@ -19,14 +22,18 @@
 openai_callback_var: ContextVar[Optional[OpenAICallbackHandler]] = ContextVar(
     "openai_callback", default=None
 )
-wandb_tracing_callback_var: ContextVar[Optional[WandbTracer]] = ContextVar(  # noqa: E501
+bedrock_anthropic_callback_var: (ContextVar)[
+    Optional[BedrockAnthropicTokenUsageCallbackHandler]
+] = ContextVar("bedrock_anthropic_callback", default=None)
+wandb_tracing_callback_var: ContextVar[Optional[WandbTracer]] = ContextVar(
     "tracing_wandb_callback", default=None
 )
 comet_tracing_callback_var: ContextVar[Optional[CometTracer]] = ContextVar(  # noqa: E501
     "tracing_comet_callback", default=None
 )
 
 register_configure_hook(openai_callback_var, True)
+register_configure_hook(bedrock_anthropic_callback_var, True)
 register_configure_hook(
     wandb_tracing_callback_var, True, WandbTracer, "LANGCHAIN_WANDB_TRACING"
 )
@@ -53,6 +60,26 @@ def get_openai_callback() -> Generator[OpenAICallbackHandler, None, None]:
     openai_callback_var.set(None)
 
 
+@contextmanager
+def get_bedrock_anthropic_callback() -> \
+        Generator[BedrockAnthropicTokenUsageCallbackHandler, None, None]:
+    """Get the Bedrock anthropic callback handler in a context manager.
+    which conveniently exposes token and cost information.
+
+    Returns:
+        BedrockAnthropicTokenUsageCallbackHandler:
+            The Bedrock anthropic callback handler.
+
+    Example:
+        >>> with get_bedrock_anthropic_callback() as cb:
+        ...     # Use the Bedrock anthropic callback handler
+    """
+    cb = BedrockAnthropicTokenUsageCallbackHandler()
+    bedrock_anthropic_callback_var.set(cb)
+    yield cb
+    bedrock_anthropic_callback_var.set(None)
+
+
 @contextmanager
 def wandb_tracing_enabled(
     session_name: str = "default",
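
Note on `get_bedrock_anthropic_callback`: like `get_openai_callback`, it clears the context variable after the `yield` without a `try`/`finally`, so an exception raised inside the `with` body skips the reset. The following is a minimal standalone sketch of the same context-variable pattern with the reset made unconditional. `UsageCounter`, `usage_counter_var`, `get_usage_counter`, and `record_request` are hypothetical stand-ins, not names from this commit, and the sketch uses only the standard library.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Generator, Optional


class UsageCounter:
    """Hypothetical stand-in for the token usage callback handler."""

    def __init__(self) -> None:
        self.successful_requests = 0


usage_counter_var: ContextVar[Optional[UsageCounter]] = ContextVar(
    "usage_counter", default=None
)


@contextmanager
def get_usage_counter() -> Generator[UsageCounter, None, None]:
    """Expose a fresh counter for the duration of the with block."""
    counter = UsageCounter()
    token = usage_counter_var.set(counter)
    try:
        yield counter
    finally:
        # Runs even if the with body raises, restoring the previous value.
        usage_counter_var.reset(token)


def record_request() -> None:
    """Increment the active counter, if any (mimics a callback firing)."""
    counter = usage_counter_var.get()
    if counter is not None:
        counter.successful_requests += 1


with get_usage_counter() as cb:
    record_request()
    record_request()

record_request()  # No counter is active here, so nothing is recorded.
print(cb.successful_requests)
```

Using `reset(token)` instead of `set(None)` also restores an outer counter when the context managers are nested, which may or may not be the behavior wanted here.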

diff --git a/libs/community/langchain_community/chat_models/bedrock.py b/libs/community/langchain_community/chat_models/bedrock.py
@@ -308,7 +308,7 @@ def _combine_llm_outputs(self, llm_outputs: List[Optional[dict]]) -> dict:
         final_output = {}
         for output in llm_outputs:
             output = output or {}
-            usage = output.pop("usage", {})
+            usage = output.get("usage", {})
             for token_type, token_count in usage.items():
                 final_usage[token_type] += token_count
             final_output.update(output)
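
Note on the `pop` to `get` change: `dict.pop` removes the `usage` key from each per-call output, while `dict.get` reads it and leaves the dictionary intact. Presumably the change keeps `usage` visible to later consumers such as the new callback handler, although the commit message does not say so. A small standalone sketch of the difference follows; `combine_usage` and its `mutate` flag are hypothetical and exist only for this comparison.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def combine_usage(llm_outputs: List[Optional[dict]], mutate: bool) -> Dict[str, int]:
    """Sum token counts across outputs, reading `usage` via pop or get."""
    final_usage: Dict[str, int] = defaultdict(int)
    for output in llm_outputs:
        output = output or {}
        if mutate:
            usage = output.pop("usage", {})  # removes the key from the dict
        else:
            usage = output.get("usage", {})  # leaves the dict untouched
        for token_type, token_count in usage.items():
            final_usage[token_type] += token_count
    return dict(final_usage)


outputs_a = [{"usage": {"prompt_tokens": 2, "completion_tokens": 1}}]
outputs_b = [{"usage": {"prompt_tokens": 2, "completion_tokens": 1}}]

totals_pop = combine_usage(outputs_a, mutate=True)
totals_get = combine_usage(outputs_b, mutate=False)

# The totals match, but only the `get` variant preserves the input dicts.
print(totals_pop == totals_get)
print("usage" in outputs_a[0], "usage" in outputs_b[0])
```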

diff --git a/libs/community/langchain_community/vectorstores/chroma.py b/libs/community/langchain_community/vectorstores/chroma.py
@@ -28,7 +28,6 @@
     import chromadb.config
     from chromadb.api.types import ID, OneOrMany, Where, WhereDocument
 
-
 logger = logging.getLogger()
 DEFAULT_K = 4  # Number of Documents to return.
 
@@ -81,7 +80,6 @@ def __init__(
         try:
             import chromadb
             import chromadb.config
-            from chromadb.utils import embedding_functions
         except ImportError:
             raise ImportError(
                 "Could not import chromadb python package. "
@@ -124,12 +122,10 @@ def __init__(
                 _client_settings.persist_directory or persist_directory
             )
 
-        self._embedding_function = (
-            embedding_function or embedding_functions.DefaultEmbeddingFunction()
-        )
+        self._embedding_function = embedding_function
         self._collection = self._client.get_or_create_collection(
             name=collection_name,
-            embedding_function=self._embedding_function,
+            embedding_function=None,
             metadata=collection_metadata,
         )
         self.override_relevance_score_fn = relevance_score_fn

diff --git a/libs/community/tests/unit_tests/callbacks/test_callback_manager.py b/libs/community/tests/unit_tests/callbacks/test_callback_manager.py
@@ -7,6 +7,7 @@
 from langchain_core.tracers.langchain import LangChainTracer, wait_for_all_tracers
 
 from langchain_community.callbacks import get_openai_callback
+from langchain_community.callbacks.manager import get_bedrock_anthropic_callback
 from langchain_community.llms.openai import BaseOpenAI
 
 
@@ -77,6 +78,37 @@ def test_callback_manager_configure_context_vars(
                     )
                     mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
 
+                    # The callback handler has been updated
+                    assert cb.successful_requests == 1
+                    assert cb.total_tokens == 3
+                    assert cb.prompt_tokens == 2
+                    assert cb.completion_tokens == 1
+                    assert cb.total_cost > 0
+
+                with get_bedrock_anthropic_callback() as cb:
+                    # This is a new empty callback handler
+                    assert cb.successful_requests == 0
+                    assert cb.total_tokens == 0
+
+                    # configure adds this bedrock anthropic cb,
+                    # but doesn't modify the group manager
+                    mngr = CallbackManager.configure(group_manager)
+                    assert mngr.handlers == [tracer, cb]
+                    assert group_manager.handlers == [tracer]
+
+                    response = LLMResult(
+                        generations=[],
+                        llm_output={
+                            "usage": {
+                                "prompt_tokens": 2,
+                                "completion_tokens": 1,
+                                "total_tokens": 3,
+                            },
+                            "model_id": "anthropic.claude-instant-v1"
+                        },
+                    )
+                    mngr.on_llm_start({}, ["prompt"])[0].on_llm_end(response)
+
                     # The callback handler has been updated
                     assert cb.successful_requests == 1
                     assert cb.total_tokens == 3