voyageai[patch]: init package #19098
Merged
Changes from 40 commits
0ff1332 Change model to a required argument (fodizoltan)
be7d6cc Removed the model attribute declaration as it isn't use (fodizoltan)
146f0f1 Referring to the documentation (fodizoltan)
4de3553 - model:str (fodizoltan)
6e53bf9 Merge pull request #3 from voyage-ai/voyage_2_model (thomas0809)
2275b99 Reformatting voyage (fodizoltan)
1041892 Merge pull request #4 from voyage-ai/voyage_2_model (thomas0809)
af41f94 Correct validator and log a warning if no model is defined (fodizoltan)
0d3ae1a Add test for handling the default value and small code correction (fodizoltan)
6a789d3 Add test for handling the default value and small code correction (fodizoltan)
59efab5 Corrections, due to comments (and more) (fodizoltan)
e9bac0e VoyageAI package init (fodizoltan)
7fd85d1 v0.0.1 (fodizoltan)
e37f8af Corrections (fodizoltan)
5617a10 Corrections (fodizoltan)
de61f28 Adding tests (fodizoltan)
83da26e Loop and remove bad test (fodizoltan)
9bfb0a0 Finalizing the package (fodizoltan)
2b9a1d7 Merge branch 'langchain-ai:master' into master (thomas0809)
853293c Merge pull request #5 from voyage-ai/voyage_2_model (thomas0809)
9eddafd Update truncation default value (thomas0809)
9d59a2f Update argument type for lint (thomas0809)
c463b1d Implemented async (fodizoltan)
969cbb2 Revert changes in pyproject.toml (fodizoltan)
cf71a5f Change in LICENSE (fodizoltan)
c8f12d2 Revert poetry.lock changes (fodizoltan)
1966be3 Revert pyproject.toml changes (fodizoltan)
b908f18 Simple for loop (fodizoltan)
8599e6a Remove default values (fodizoltan)
a0c6b06 Merge branch 'master' into voyageai_package (fzowl)
84b3a7f Merge pull request #6 from voyage-ai/voyageai_package (thomas0809)
747c6d2 Merge branch 'langchain-ai:master' into master (thomas0809)
7b65429 Correct some lint issues (fodizoltan)
067bb40 Merge pull request #7 from voyage-ai/voyageai_package_correction (thomas0809)
57a3fd7 Poetry lock (fodizoltan)
dd3a6da Merge pull request #8 from voyage-ai/voyageai_package_correction (thomas0809)
f804eec One small correction in the ipynb file (fodizoltan)
f92b1dc Merge pull request #9 from voyage-ai/voyageai_package_correction (thomas0809)
d7996ea Merge remote-tracking branch 'origin/master' into erick/voyageai-patc… (efriis)
6391186 fixes (efriis)
ad2aec1 deprecate community voyage in favor of package (efriis)
6bae4e4 license (efriis)
a5bb732 fix integration test (efriis)
@@ -0,0 +1,24 @@
# VoyageAI

All functionality related to VoyageAI

>[VoyageAI](https://www.voyageai.com/) Voyage AI builds embedding models
> customized for your domain and company, for better retrieval quality.

## Installation and Setup

Install the integration package with
```bash
pip install langchain-voyageai
```

Get a VoyageAI API key and set it as an environment variable (`VOYAGE_API_KEY`)


## Text Embedding Model

See a [usage example](/docs/integrations/text_embedding/voyageai)

```python
from langchain_voyageai import VoyageAIEmbeddings
```
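As a rough sketch of how the imported class might be used, the helper below embeds a list of documents and one query. It assumes the `langchain-voyageai` package from this PR is installed, that `VOYAGE_API_KEY` is set, and that `model` must be passed explicitly (the embeddings class in this PR declares it without a default). The helper name `embed_corpus_and_query` is hypothetical, and `"voyage-2"` is simply one of the model names that appears in the package's validator.

```python
from typing import List, Tuple


def embed_corpus_and_query(
    documents: List[str], query: str, model: str = "voyage-2"
) -> Tuple[List[List[float]], List[float]]:
    """Embed documents and a query with VoyageAIEmbeddings (hypothetical helper)."""
    # Imported lazily so this module still loads where the package is absent.
    from langchain_voyageai import VoyageAIEmbeddings

    # The API key is read from VOYAGE_API_KEY unless voyage_api_key is passed.
    embedder = VoyageAIEmbeddings(model=model)
    doc_vectors = embedder.embed_documents(documents)
    query_vector = embedder.embed_query(query)
    return doc_vectors, query_vector
```

Calling `embed_corpus_and_query(["first doc", "second doc"], "a question")` would issue real API requests, so it needs network access and a valid key.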
@@ -0,0 +1 @@
__pycache__
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Voyage AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,57 @@
.PHONY: all format lint test tests integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
integration_test integration_tests: TEST_FILE=tests/integration_tests/

test tests integration_test integration_tests:
	poetry run pytest $(TEST_FILE)


######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/partners/voyageai --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=langchain_voyageai
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
	poetry run ruff .
	poetry run ruff format $(PYTHON_FILES) --diff
	poetry run ruff --select I $(PYTHON_FILES)
	mkdir $(MYPY_CACHE); poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE)

format format_diff:
	poetry run ruff format $(PYTHON_FILES)
	poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
	poetry run codespell --toml pyproject.toml

spell_fix:
	poetry run codespell --toml pyproject.toml -w

check_imports: $(shell find langchain_voyageai -name '*.py')
	poetry run python ./scripts/check_imports.py $^

######################
# HELP
######################

help:
	@echo '----'
	@echo 'check_imports				- check imports'
	@echo 'format                       - run code formatters'
	@echo 'lint                         - run linters'
	@echo 'test                         - run unit tests'
	@echo 'tests                        - run unit tests'
	@echo 'test TEST_FILE=<test_file>   - run all tests in file'
@@ -0,0 +1,21 @@
# langchain-voyageai

This package contains the LangChain integrations for VoyageAI through their `voyageai` client package.

## Installation and Setup

- Install the LangChain partner package
```bash
pip install langchain-voyageai
```
- Get a VoyageAI API key and set it as an environment variable (`VOYAGE_API_KEY`) or use the API key as a parameter in the Client.



## Text Embedding Model

See a [usage example](https://python.langchain.com/docs/integrations/text_embedding/voyageai)

```python
from langchain_voyageai import VoyageAIEmbeddings
```
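The setup step in this README allows the key to come either from `VOYAGE_API_KEY` or from an explicit parameter. A minimal standard-library sketch of that precedence is shown here; the function name `resolve_voyage_api_key` and its `env` argument are hypothetical and only illustrate the lookup order (explicit value first, then the environment), they are not part of the package.

```python
import os
from typing import Mapping, Optional


def resolve_voyage_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit key if given, else VOYAGE_API_KEY, else None."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    source = os.environ if env is None else env
    # An explicit, non-empty argument wins over the environment variable.
    return explicit_key or source.get("VOYAGE_API_KEY") or None
```

Returning `None` when nothing is configured leaves it to the caller to decide whether a missing key is an error.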
@@ -0,0 +1,5 @@
from langchain_voyageai.embeddings import VoyageAIEmbeddings

__all__ = [
    "VoyageAIEmbeddings",
]
130 changes: 130 additions & 0 deletions
libs/partners/voyageai/langchain_voyageai/embeddings.py
@@ -0,0 +1,130 @@
import logging
import os
from typing import Iterable, List, Optional

import voyageai  # type: ignore
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import (
    BaseModel,
    Extra,
    Field,
    SecretStr,
    root_validator,
)
from langchain_core.utils import convert_to_secret_str

logger = logging.getLogger(__name__)


class VoyageAIEmbeddings(BaseModel, Embeddings):
    """VoyageAIEmbeddings embedding model.

    Example:
        .. code-block:: python

            from langchain_voyageai import VoyageAIEmbeddings

            model = VoyageAIEmbeddings(model="voyage-2")
    """

    _client: voyageai.Client = Field(exclude=True)
    _aclient: voyageai.client_async.AsyncClient = Field(exclude=True)
    model: str
    batch_size: int
    show_progress_bar: bool = False
    truncation: Optional[bool] = None
    voyage_api_key: Optional[SecretStr] = None

    class Config:
        extra = Extra.forbid

    @root_validator(pre=True)
    def default_values(cls, values: dict) -> dict:
        """Set default batch size based on model"""

        model = values.get("model")
        batch_size = values.get("batch_size")
        if batch_size is None:
            print("batch size", batch_size)
            values["batch_size"] = 72 if model in ["voyage-2", "voyage-02"] else 7
        return values

    @root_validator()
    def validate_environment(cls, values: dict) -> dict:
        """Validate that VoyageAI credentials exist in environment."""
        voyage_api_key = values.get("voyage_api_key") or os.getenv(
            "VOYAGE_API_KEY", None
        )
        if voyage_api_key:
            api_key_secretstr = convert_to_secret_str(voyage_api_key)
            values["voyage_api_key"] = api_key_secretstr

            api_key_str = api_key_secretstr.get_secret_value()
        else:
            api_key_str = None
        values["_client"] = voyageai.Client(api_key=api_key_str)
        values["_aclient"] = voyageai.client_async.AsyncClient(api_key=api_key_str)
        return values

    def _get_batch_iterator(self, texts: List[str]) -> Iterable:
        if self.show_progress_bar:
            try:
                from tqdm.auto import tqdm  # type: ignore
            except ImportError as e:
                raise ImportError(
                    "Must have tqdm installed if `show_progress_bar` is set to True. "
                    "Please install with `pip install tqdm`."
                ) from e

            _iter = tqdm(range(0, len(texts), self.batch_size))
        else:
            _iter = range(0, len(texts), self.batch_size)  # type: ignore

        return _iter

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed search docs."""
        embeddings: List[List[float]] = []

        _iter = self._get_batch_iterator(texts)
        for i in _iter:
            embeddings.extend(
                self._client.embed(
                    texts[i : i + self.batch_size],
                    model=self.model,
                    input_type="document",
                    truncation=self.truncation,
                ).embeddings
            )

        return embeddings

    def embed_query(self, text: str) -> List[float]:
        """Embed query text."""
        return self._client.embed(
            [text], model=self.model, input_type="query", truncation=self.truncation
        ).embeddings[0]

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        embeddings: List[List[float]] = []

        _iter = self._get_batch_iterator(texts)
        for i in _iter:
            r = await self._aclient.embed(
                texts[i : i + self.batch_size],
                model=self.model,
                input_type="document",
                truncation=self.truncation,
            )
            embeddings.extend(r.embeddings)

        return embeddings

    async def aembed_query(self, text: str) -> List[float]:
        r = await self._aclient.embed(
            [text],
            model=self.model,
            input_type="query",
            truncation=self.truncation,
        )
        return r.embeddings[0]
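The batching logic in `embeddings.py` can be read in isolation from the Voyage client. The sketch below restates it with only the standard library: the model-dependent default batch size (72 for `voyage-2` / `voyage-02`, otherwise 7, as in the validator) and the `range(0, len(texts), batch_size)` slicing used by `embed_documents`. The names `default_batch_size`, `iter_batches`, `embed_in_batches` and `fake_embed` are hypothetical, and `fake_embed` is a stand-in for the real client call, not the Voyage API.

```python
from typing import Callable, Iterator, List, Optional

Vector = List[float]


def default_batch_size(model: str) -> int:
    """Mirror the validator's default: 72 for voyage-2 / voyage-02, else 7."""
    return 72 if model in ("voyage-2", "voyage-02") else 7


def iter_batches(texts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]


def embed_in_batches(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    model: str,
    batch_size: Optional[int] = None,
) -> List[Vector]:
    """Embed texts batch by batch, preserving the input order."""
    size = batch_size if batch_size is not None else default_batch_size(model)
    embeddings: List[Vector] = []
    for batch in iter_batches(texts, size):
        # One call per batch; results are appended in order.
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed(batch: List[str]) -> List[Vector]:
    """Stand-in for a client call: one 1-d vector per text (its length)."""
    return [[float(len(text))] for text in batch]
```

Passing the embedding call in as a function keeps the slicing testable offline; the asynchronous methods in the diff follow the same loop with an awaited client call per batch.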
Empty file.
Going to update this to LangChain to match other packages, and happy to discuss! Just want to make sure I'm not doing something that will create issues for folks using langchain-x packages.
Sounds good to me!