community: Improved deeplake.py init documentation #17549

Merged Feb 21, 2024 (2 commits)
Changes from 1 commit
32 changes: 13 additions & 19 deletions libs/community/langchain_community/vectorstores/deeplake.py
@@ -60,7 +60,7 @@
         embedding: Optional[Embeddings] = None,
         embedding_function: Optional[Embeddings] = None,
         read_only: bool = False,
-        ingestion_batch_size: int = 1000,
+        ingestion_batch_size: int = 1024,
         num_workers: int = 0,
         verbose: bool = True,
         exec_option: Optional[str] = None,
@@ -85,8 +85,12 @@
             ... )

         Args:
-            dataset_path (str): Path to existing dataset or where to create
-                a new one. Defaults to _LANGCHAIN_DEFAULT_DEEPLAKE_PATH.
+            dataset_path (str): The full path for storing to the Deep Lake Vector Store. It can be:
+                - a Deep Lake cloud path of the form ``hub://org_id/dataset_name``. Requires registration with Deep Lake.
+                - an s3 path of the form ``s3://bucketname/path/to/dataset``. Credentials are required in either the environment or passed to the creds argument.
+                - a local file system path of the form ``./path/to/dataset`` or ``~/path/to/dataset`` or ``path/to/dataset``.
+                - a memory path of the form ``mem://path/to/dataset`` which doesn't save the dataset but keeps it in memory instead. Should be used only for testing as it does not persist.
+                Defaults to _LANGCHAIN_DEFAULT_DEEPLAKE_PATH.

Check failure (GitHub Actions / cd libs/community / make lint #3.11): Ruff (E501) line too long in langchain_community/vectorstores/deeplake.py at lines 88-92 (99, 121, 161, 125, 188 > 88).

             token (str, optional): Activeloop token, for fetching credentials
                 to the dataset at path if it is a Deep Lake dataset.
                 Tokens are normally autogenerated. Optional.
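For context on the four dataset_path forms the new docstring enumerates, here is a minimal sketch (not part of this diff) of how each would be passed to the constructor, assuming langchain-community and deeplake are installed; FakeEmbeddings and all paths are placeholders:

```python
# Sketch: exercising the dataset_path forms documented above.
# All paths and credentials are placeholders, not real resources.
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import DeepLake

embedding = FakeEmbeddings(size=128)  # stand-in for a real embedding model

# Local file system path: persisted on disk.
local_store = DeepLake(dataset_path="./my_dataset", embedding=embedding)

# In-memory path: nothing is persisted; intended for tests only.
mem_store = DeepLake(dataset_path="mem://my_dataset", embedding=embedding)

# Deep Lake cloud path: requires an Activeloop account and token.
# cloud_store = DeepLake(
#     dataset_path="hub://org_id/dataset_name", embedding=embedding
# )

# S3 path: credentials come from the environment or the creds argument.
# s3_store = DeepLake(
#     dataset_path="s3://bucketname/path/to/dataset",
#     embedding=embedding,
#     creds={"aws_access_key_id": "...", "aws_secret_access_key": "..."},
# )
```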
@@ -98,26 +102,16 @@
             read_only (bool): Open dataset in read-only mode. Default is False.
             ingestion_batch_size (int): During data ingestion, data is divided
                 into batches. Batch size is the size of each batch.
-                Default is 1000.
+                Default is 1024.
             num_workers (int): Number of workers to use during data ingestion.
                 Default is 0.
             verbose (bool): Print dataset summary after each operation.
                 Default is True.
-            exec_option (str, optional): DeepLakeVectorStore supports 3 ways to perform
-                searching - "python", "compute_engine", "tensor_db" and auto.
-                Default is None.
-                - ``auto``- Selects the best execution method based on the storage
-                    location of the Vector Store. It is the default option.
-                - ``python`` - Pure-python implementation that runs on the client.
-                    WARNING: using this with big datasets can lead to memory
-                    issues. Data can be stored anywhere.
-                - ``compute_engine`` - C++ implementation of the Deep Lake Compute
-                    Engine that runs on the client. Can be used for any data stored in
-                    or connected to Deep Lake. Not for in-memory or local datasets.
-                - ``tensor_db`` - Hosted Managed Tensor Database that is
-                    responsible for storage and query execution. Only for data stored in
-                    the Deep Lake Managed Database. Use runtime = {"db_engine": True}
-                    during dataset creation.
+            exec_option (str, optional): Default method for search execution. It could be either ``"auto"``, ``"python"``, ``"compute_engine"`` or ``"tensor_db"``. Defaults to ``"auto"``. If None, it's set to "auto".
+                - ``auto``- Selects the best execution method based on the storage location of the Vector Store. It is the default option.
+                - ``python`` - Pure-python implementation that runs on the client and can be used for data stored anywhere. WARNING: using this option with big datasets is discouraged because it can lead to memory issues.
+                - ``compute_engine`` - Performant C++ implementation of the Deep Lake Compute Engine that runs on the client and can be used for any data stored in or connected to Deep Lake. It cannot be used with in-memory or local datasets.
+                - ``tensor_db`` - Performant and fully-hosted Managed Tensor Database that is responsible for storage and query execution. Only available for data stored in the Deep Lake Managed Database. Store datasets in this database by specifying runtime = {"tensor_db": True} during dataset creation.

Check failure (GitHub Actions / cd libs/community / make lint #3.11): Ruff (E501) line too long in langchain_community/vectorstores/deeplake.py at lines 110-114 (216, 138, 221, 242, 305 > 88).

             runtime (Dict, optional): Parameters for creating the Vector Store in
                 Deep Lake's Managed Tensor Database. Not applicable when loading an
                 existing Vector Store. To create a Vector Store in the Managed Tensor
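Since exec_option and runtime interact (per the docstring, ``tensor_db`` only works for datasets created in the Managed Tensor Database), here is a short hedged sketch, not part of this diff, of the execution-backend and ingestion parameters the PR documents; the hub:// path and dataset names are placeholders:

```python
# Sketch: choosing a search execution backend, per the docstring above.
# exec_option="python" works for data stored anywhere but is discouraged
# for large datasets; "tensor_db" requires runtime={"tensor_db": True}
# at dataset creation time.
from langchain_community.embeddings import FakeEmbeddings
from langchain_community.vectorstores import DeepLake

embedding = FakeEmbeddings(size=128)  # stand-in for a real embedding model

# Pure-python search on a local dataset; ingestion tuned with the
# documented ingestion_batch_size (default 1024) and num_workers (default 0).
store = DeepLake(
    dataset_path="./my_dataset",
    embedding=embedding,
    exec_option="python",
    ingestion_batch_size=1024,
    num_workers=0,
)

# Managed Tensor Database: create with runtime={"tensor_db": True} and
# query with exec_option="tensor_db" (placeholder org/dataset path).
# managed_store = DeepLake(
#     dataset_path="hub://org_id/dataset_name",
#     embedding=embedding,
#     runtime={"tensor_db": True},
#     exec_option="tensor_db",
# )
```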