Google Cloud Enterprise Search retriever #7857

Merged
merged 5 commits into from
Jul 19, 2023

@jarokaz jarokaz commented Jul 17, 2023

No description provided.


@dosubot dosubot bot added the 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features label Jul 17, 2023
@jarokaz (Contributor Author) commented Jul 17, 2023

This PR adds a LangChain retriever that encapsulates Enterprise Search on Google Cloud Gen App Builder.

https://cloud.google.com/enterprise-search

@baskaryan (Collaborator) left a comment


one small comment, otherwise looks awesome. thanks @jarokaz!!

"""Filter expression."""
get_extractive_answers: bool = False
"""If True the retriever will return Extractive Answers. Otherwise Extractive Segments."""
max_documents: int = Field(default=5, ge=1, le=100)

@baskaryan (Collaborator) commented on the lines above:

most retriever / doc store interfaces refer to this as top_k, could we change for consistency?

and out of curiosity, are the bounds enforced on the server side as well? if not, do we need them?

@jarokaz (Contributor Author) replied:

@baskaryan

This setting differs from top_k because the underlying retrieval mechanism differs from that of many other vector stores. The max_documents, max_extractive_answer_count, and max_extractive_segment_count settings together determine how many matching "chunks" are returned. max_documents controls how many top-ranked documents, as ingested by the Enterprise Search engine, are used to prepare extractive answers or extractive segments. When you populate Enterprise Search, you ingest full documents (rather than chunks), and the engine does the chunking and embedding generation in the backend.

For example, if you set max_documents to 3 and max_extractive_answer_count to 3, you can get up to 9 LangChain documents.

Once the retriever is in main, we plan to publish a number of code samples that show how to use different LangChain retrieval chains and agents with Enterprise Search. In these samples we plan to go into considerable detail on different strategies for ingesting and querying both structured and unstructured data with Enterprise Search.
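
The arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration, not code from the PR: `RetrieverSettings` and `max_langchain_documents` are hypothetical names, the defaults other than `max_documents=5` are assumptions, and it assumes that segments multiply with documents the same way answers do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrieverSettings:
    """Hypothetical stand-in for the retriever's settings (not the real class)."""

    max_documents: int = 5                 # default taken from the reviewed diff
    max_extractive_answer_count: int = 1   # assumed default
    max_extractive_segment_count: int = 1  # assumed default
    get_extractive_answers: bool = False


def max_langchain_documents(settings: RetrieverSettings) -> int:
    """Upper bound on returned chunks: top-ranked documents x chunks per document."""
    per_document = (
        settings.max_extractive_answer_count
        if settings.get_extractive_answers
        else settings.max_extractive_segment_count
    )
    return settings.max_documents * per_document


if __name__ == "__main__":
    settings = RetrieverSettings(
        max_documents=3,
        max_extractive_answer_count=3,
        get_extractive_answers=True,
    )
    print(max_langchain_documents(settings))
```

The result is an upper bound only: the engine may return fewer chunks when fewer documents match or a document yields fewer answers.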

fmt

@baskaryan (Collaborator) commented:

lgtm, thanks @jarokaz!!

@afirstenberg

@jarokaz - I just spoke to Kristopher Overholt, and he suggested I talk to you about working on this for LangChainJS, if work isn't already underway, and about anything I should be aware of in doing so.

baskaryan pushed a commit that referenced this pull request Jul 28, 2023
…se Search retriever (#8369)

Followup to #7857

- Changes `_convert_search_response()` to use object attributes instead
of converting to dictionary
- Simplifies logic for readability
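
The first change listed in that follow-up, reading object attributes rather than converting to a dictionary first, might look roughly like the sketch below. The `Chunk` and `Result` classes and both function names are hypothetical stand-ins for illustration; they are not the actual search response types or the body of `_convert_search_response()`.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical: one extractive answer or segment."""

    content: str


@dataclass
class Result:
    """Hypothetical: one ranked source document with its chunks."""

    doc_id: str
    chunks: List[Chunk] = field(default_factory=list)


def convert_via_dict(results: List[Result]) -> List[Tuple[str, str]]:
    """Dictionary style: copy each object into nested dicts, then index by key."""
    pairs = []
    for result in results:
        data = asdict(result)  # recursively converts the dataclass to dicts
        for chunk in data["chunks"]:
            pairs.append((data["doc_id"], chunk["content"]))
    return pairs


def convert_via_attributes(results: List[Result]) -> List[Tuple[str, str]]:
    """Attribute style: read the fields directly, with no intermediate copy."""
    return [(r.doc_id, c.content) for r in results for c in r.chunks]
```

Both functions produce the same pairs; the attribute version skips the intermediate conversion and is shorter to read.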