voyageai[patch]: init package #19098

Merged on Mar 15, 2024 (43 commits; diff below shows changes from 40 commits)

Commits
0ff1332 Change model to a required argument (fodizoltan, Feb 15, 2024)
be7d6cc Removed the model attribute declaration as it isn't use (fodizoltan, Feb 15, 2024)
146f0f1 Referring to the documentation (fodizoltan, Feb 16, 2024)
4de3553 - model:str (fodizoltan, Feb 16, 2024)
6e53bf9 Merge pull request #3 from voyage-ai/voyage_2_model (thomas0809, Feb 16, 2024)
2275b99 Reformatting voyage (fodizoltan, Feb 16, 2024)
1041892 Merge pull request #4 from voyage-ai/voyage_2_model (thomas0809, Feb 19, 2024)
af41f94 Correct validator and log a warning if no model is defined (fodizoltan, Feb 22, 2024)
0d3ae1a Add test for handling the default value and small code correction (fodizoltan, Feb 22, 2024)
6a789d3 Add test for handling the default value and small code correction (fodizoltan, Feb 22, 2024)
59efab5 Corrections, due to comments (and more) (fodizoltan, Feb 26, 2024)
e9bac0e VoyageAI package init (fodizoltan, Feb 26, 2024)
7fd85d1 v0.0.1 (fodizoltan, Feb 26, 2024)
e37f8af Corrections (fodizoltan, Feb 27, 2024)
5617a10 Corrections (fodizoltan, Feb 27, 2024)
de61f28 Adding tests (fodizoltan, Feb 27, 2024)
83da26e Loop and remove bad test (fodizoltan, Feb 28, 2024)
9bfb0a0 Finalizing the package (fodizoltan, Feb 28, 2024)
2b9a1d7 Merge branch 'langchain-ai:master' into master (thomas0809, Feb 28, 2024)
853293c Merge pull request #5 from voyage-ai/voyage_2_model (thomas0809, Feb 28, 2024)
9eddafd Update truncation default value (thomas0809, Feb 28, 2024)
9d59a2f Update argument type for lint (thomas0809, Feb 29, 2024)
c463b1d Implemented async (fodizoltan, Mar 3, 2024)
969cbb2 Revert changes in pyproject.toml (fodizoltan, Mar 3, 2024)
cf71a5f Change in LICENSE (fodizoltan, Mar 3, 2024)
c8f12d2 Revert poetry.lock changes (fodizoltan, Mar 3, 2024)
1966be3 Revert pyproject.toml changes (fodizoltan, Mar 3, 2024)
b908f18 Simple for loop (fodizoltan, Mar 6, 2024)
8599e6a Remove default values (fodizoltan, Mar 6, 2024)
a0c6b06 Merge branch 'master' into voyageai_package (fzowl, Mar 6, 2024)
84b3a7f Merge pull request #6 from voyage-ai/voyageai_package (thomas0809, Mar 6, 2024)
747c6d2 Merge branch 'langchain-ai:master' into master (thomas0809, Mar 6, 2024)
7b65429 Correct some lint issues (fodizoltan, Mar 6, 2024)
067bb40 Merge pull request #7 from voyage-ai/voyageai_package_correction (thomas0809, Mar 6, 2024)
57a3fd7 Poetry lock (fodizoltan, Mar 6, 2024)
dd3a6da Merge pull request #8 from voyage-ai/voyageai_package_correction (thomas0809, Mar 6, 2024)
f804eec One small correction in the ipynb file (fodizoltan, Mar 7, 2024)
f92b1dc Merge pull request #9 from voyage-ai/voyageai_package_correction (thomas0809, Mar 7, 2024)
d7996ea Merge remote-tracking branch 'origin/master' into erick/voyageai-patc… (efriis, Mar 15, 2024)
6391186 fixes (efriis, Mar 15, 2024)
ad2aec1 deprecate community voyage in favor of package (efriis, Mar 15, 2024)
6bae4e4 license (efriis, Mar 15, 2024)
a5bb732 fix integration test (efriis, Mar 15, 2024)
1 change: 1 addition & 0 deletions .github/workflows/_integration_test.yml
@@ -75,6 +75,7 @@ jobs:
ES_API_KEY: ${{ secrets.ES_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
+ VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
run: |
make integration_tests
1 change: 1 addition & 0 deletions .github/workflows/_release.yml
@@ -196,6 +196,7 @@ jobs:
ES_API_KEY: ${{ secrets.ES_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
+ VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
run: make integration_tests
working-directory: ${{ inputs.working-directory }}
24 changes: 24 additions & 0 deletions docs/docs/integrations/platforms/voyageai.mdx
@@ -0,0 +1,24 @@
# VoyageAI

All functionality related to VoyageAI

>[VoyageAI](https://www.voyageai.com/) builds embedding models, customized for your domain and company, for better retrieval quality.

## Installation and Setup

Install the integration package with
```bash
pip install langchain-voyageai
```

Get a VoyageAI API key and set it as an environment variable (`VOYAGE_API_KEY`).


## Text Embedding Model

See a [usage example](/docs/integrations/text_embedding/voyageai)

```python
from langchain_voyageai import VoyageAIEmbeddings
```
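The page pitches Voyage models for retrieval. A hedged sketch of how the outputs of the two methods are typically combined: embed the query with `embed_query`, the corpus with `embed_documents`, then rank documents by cosine similarity. The vectors below are made-up 2-D toys so the example runs offline (real Voyage embeddings are far longer), and since this page does not say whether they come normalized, the sketch divides by the norms explicitly. `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of `langchain_voyageai`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_documents(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    # Return (document index, score) pairs, most similar first.
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embed_query / embed_documents output (hypothetical).
query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
for idx, score in rank_documents(query_vec, doc_vecs):
    print(idx, round(score, 3))
# 2 1.0
# 1 0.707
# 0 0.0
```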
6 changes: 3 additions & 3 deletions docs/docs/integrations/text_embedding/voyageai.ipynb
@@ -9,7 +9,7 @@
"\n",
">[Voyage AI](https://www.voyageai.com/) provides cutting-edge embedding/vectorizations models.\n",
"\n",
- "Let's load the Voyage Embedding class."
+ "Let's load the Voyage Embedding class. (Install the LangChain partner package with `pip install langchain-voyageai`)"
]
},
{
@@ -19,7 +19,7 @@
"metadata": {},
"outputs": [],
"source": [
- "from langchain_community.embeddings import VoyageEmbeddings"
+ "from langchain_voyageai import VoyageAIEmbeddings"
]
},
{
@@ -37,7 +37,7 @@
"metadata": {},
"outputs": [],
"source": [
- "embeddings = VoyageEmbeddings(\n",
+ "embeddings = VoyageAIEmbeddings(\n",
" voyage_api_key=\"[ Your Voyage API key ]\", model=\"voyage-2\"\n",
")"
]
1 change: 1 addition & 0 deletions libs/partners/voyageai/.gitignore
@@ -0,0 +1 @@
__pycache__
21 changes: 21 additions & 0 deletions libs/partners/voyageai/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Voyage AI
Review thread on the copyright line:
Member Author: Going to update this to LangChain to match other packages, and happy to discuss! Just want to make sure I'm not doing something that will create issues for folks using langchain-x packages.
Contributor: Sounds good to me!

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
57 changes: 57 additions & 0 deletions libs/partners/voyageai/Makefile
@@ -0,0 +1,57 @@
.PHONY: all format lint test tests integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
integration_test integration_tests: TEST_FILE=tests/integration_tests/

test tests integration_test integration_tests:
	poetry run pytest $(TEST_FILE)


######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/partners/voyageai --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=langchain_voyageai
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
	poetry run ruff .
	poetry run ruff format $(PYTHON_FILES) --diff
	poetry run ruff --select I $(PYTHON_FILES)
	mkdir -p $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)

format format_diff:
	poetry run ruff format $(PYTHON_FILES)
	poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
	poetry run codespell --toml pyproject.toml

spell_fix:
	poetry run codespell --toml pyproject.toml -w

check_imports: $(shell find langchain_voyageai -name '*.py')
	poetry run python ./scripts/check_imports.py $^

######################
# HELP
######################

help:
	@echo '----'
	@echo 'check_imports				- check imports'
	@echo 'format                       - run code formatters'
	@echo 'lint                         - run linters'
	@echo 'test                         - run unit tests'
	@echo 'tests                        - run unit tests'
	@echo 'test TEST_FILE=<test_file>   - run all tests in file'
21 changes: 21 additions & 0 deletions libs/partners/voyageai/README.md
@@ -0,0 +1,21 @@
# langchain-voyageai

This package contains the LangChain integrations for VoyageAI through their `voyageai` client package.

## Installation and Setup

- Install the LangChain partner package
```bash
pip install langchain-voyageai
```
- Get a VoyageAI API key and set it as an environment variable (`VOYAGE_API_KEY`), or pass it directly as the `voyage_api_key` parameter of `VoyageAIEmbeddings`.

## Text Embedding Model

See a [usage example](https://python.langchain.com/docs/integrations/text_embedding/voyageai)

```python
from langchain_voyageai import VoyageAIEmbeddings
```
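The README offers two ways to supply the key. A minimal sketch of the precedence that `VoyageAIEmbeddings.validate_environment` applies between them: an explicit `voyage_api_key` wins, otherwise `VOYAGE_API_KEY` is read from the environment, and an empty string counts as "not set". `resolve_voyage_api_key` is a hypothetical helper, not part of `langchain_voyageai`; the real class additionally wraps the key in a `SecretStr` and passes `None` to the `voyageai` client when nothing is found (what happens then is up to the `voyageai` package).

```python
import os
from typing import Mapping, Optional


def resolve_voyage_api_key(
    explicit_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the key lookup in validate_environment.

    Uses `or`, like `values.get("voyage_api_key") or os.getenv(...)` in the
    class, so an empty explicit key falls through to the environment and an
    empty environment value ends up as None.
    """
    env = os.environ if environ is None else environ
    return explicit_key or env.get("VOYAGE_API_KEY") or None


# Pass a plain dict instead of os.environ so the example is deterministic.
print(resolve_voyage_api_key("explicit-key", {"VOYAGE_API_KEY": "env-key"}))
print(resolve_voyage_api_key(None, {"VOYAGE_API_KEY": "env-key"}))
print(resolve_voyage_api_key(None, {}))
```

The three calls print `explicit-key`, `env-key`, and `None`: the parameter overrides the environment variable, and with neither set there is no key.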
5 changes: 5 additions & 0 deletions libs/partners/voyageai/langchain_voyageai/__init__.py
@@ -0,0 +1,5 @@
from langchain_voyageai.embeddings import VoyageAIEmbeddings

__all__ = [
    "VoyageAIEmbeddings",
]
130 changes: 130 additions & 0 deletions libs/partners/voyageai/langchain_voyageai/embeddings.py
@@ -0,0 +1,130 @@
import logging
import os
from typing import Iterable, List, Optional

import voyageai # type: ignore
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import (
    BaseModel,
    Extra,
    Field,
    SecretStr,
    root_validator,
)
from langchain_core.utils import convert_to_secret_str

logger = logging.getLogger(__name__)


class VoyageAIEmbeddings(BaseModel, Embeddings):
    """VoyageAIEmbeddings embedding model.

    Example:
        .. code-block:: python

            from langchain_voyageai import VoyageAIEmbeddings

            model = VoyageAIEmbeddings(model="voyage-2")
    """

    _client: voyageai.Client = Field(exclude=True)
    _aclient: voyageai.client_async.AsyncClient = Field(exclude=True)
    model: str
    batch_size: int
    show_progress_bar: bool = False
    truncation: Optional[bool] = None
    voyage_api_key: Optional[SecretStr] = None

    class Config:
        extra = Extra.forbid

    @root_validator(pre=True)
    def default_values(cls, values: dict) -> dict:
        """Set default batch size based on model"""

        model = values.get("model")
        batch_size = values.get("batch_size")
        if batch_size is None:
            values["batch_size"] = 72 if model in ["voyage-2", "voyage-02"] else 7
        return values

    @root_validator()
    def validate_environment(cls, values: dict) -> dict:
        """Validate that VoyageAI credentials exist in environment."""
        voyage_api_key = values.get("voyage_api_key") or os.getenv(
            "VOYAGE_API_KEY", None
        )
        if voyage_api_key:
            api_key_secretstr = convert_to_secret_str(voyage_api_key)
            values["voyage_api_key"] = api_key_secretstr

            api_key_str = api_key_secretstr.get_secret_value()
        else:
            api_key_str = None
        values["_client"] = voyageai.Client(api_key=api_key_str)
        values["_aclient"] = voyageai.client_async.AsyncClient(api_key=api_key_str)
        return values

    def _get_batch_iterator(self, texts: List[str]) -> Iterable:
        if self.show_progress_bar:
            try:
                from tqdm.auto import tqdm  # type: ignore
            except ImportError as e:
                raise ImportError(
                    "Must have tqdm installed if `show_progress_bar` is set to True. "
                    "Please install with `pip install tqdm`."
                ) from e

            _iter = tqdm(range(0, len(texts), self.batch_size))
        else:
            _iter = range(0, len(texts), self.batch_size)  # type: ignore

        return _iter

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs."""
        embeddings: List[List[float]] = []

        _iter = self._get_batch_iterator(texts)
        for i in _iter:
            embeddings.extend(
                self._client.embed(
                    texts[i : i + self.batch_size],
                    model=self.model,
                    input_type="document",
                    truncation=self.truncation,
                ).embeddings
            )

        return embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed query text."""
        return self._client.embed(
            [text], model=self.model, input_type="query", truncation=self.truncation
        ).embeddings[0]

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        """Asynchronously embed search docs, one batch at a time."""
        embeddings: List[List[float]] = []

        _iter = self._get_batch_iterator(texts)
        for i in _iter:
            r = await self._aclient.embed(
                texts[i : i + self.batch_size],
                model=self.model,
                input_type="document",
                truncation=self.truncation,
            )
            embeddings.extend(r.embeddings)

        return embeddings

    async def aembed_query(self, text: str) -> List[float]:
        """Asynchronously embed query text."""
        r = await self._aclient.embed(
            [text],
            model=self.model,
            input_type="query",
            truncation=self.truncation,
        )
        return r.embeddings[0]
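The batching in `embed_documents` and `aembed_documents` can be exercised offline. A hedged sketch, assuming a hypothetical `FakeAsyncClient` in place of the `voyageai` async client: it only mimics the `embed(texts, model=..., input_type=..., truncation=...)` call shape used in this file and returns an object with `.embeddings`. `default_batch_size` and `iter_batches` restate the PR's rules; `aembed_documents_concurrently` is a swapped-in alternative, not what the PR does, that sends all batches at once with `asyncio.gather` instead of awaiting them one by one.

```python
import asyncio
from dataclasses import dataclass
from typing import Iterator, List, Optional


def default_batch_size(model: str) -> int:
    # Same rule as VoyageAIEmbeddings.default_values: 72 texts per request
    # for voyage-2 / voyage-02, 7 for any other model name.
    return 72 if model in ("voyage-2", "voyage-02") else 7


def iter_batches(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    # Same slicing as _get_batch_iterator plus texts[i : i + batch_size].
    for i in range(0, len(texts), batch_size):
        yield texts[i : i + batch_size]


@dataclass
class FakeResult:
    embeddings: List[List[float]]


class FakeAsyncClient:
    """Hypothetical offline stand-in for the voyageai async client.

    Each "embedding" is just [len(text)], so results are easy to check.
    """

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    async def embed(
        self,
        texts: List[str],
        model: str,
        input_type: str,
        truncation: Optional[bool] = None,
    ) -> FakeResult:
        self.calls.append(list(texts))  # record each request's batch
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return FakeResult([[float(len(t))] for t in texts])


async def aembed_documents_concurrently(
    client: FakeAsyncClient, texts: List[str], model: str, batch_size: int
) -> List[List[float]]:
    # Start one request per batch concurrently. asyncio.gather returns results
    # in the order the awaitables were passed, so output order matches input.
    results = await asyncio.gather(
        *(
            client.embed(batch, model=model, input_type="document")
            for batch in iter_batches(texts, batch_size)
        )
    )
    return [vec for result in results for vec in result.embeddings]


texts = ["a" * n for n in range(1, 11)]  # 10 texts of length 1..10
client = FakeAsyncClient()
model = "some-other-model"  # hypothetical name, so the batch size is 7
vectors = asyncio.run(
    aembed_documents_concurrently(client, texts, model, default_batch_size(model))
)
print([len(batch) for batch in client.calls])  # [7, 3]
print(vectors[:3])  # [[1.0], [2.0], [3.0]]
```

The PR's sequential loop issues one request at a time and plays well with the optional tqdm progress bar; the concurrent variant trades that for lower wall-clock time on large inputs, and in practice would likely need a cap (for example an `asyncio.Semaphore`) to stay within API rate limits.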
Empty file.