Memory - Qdrant Vector Database Connector #210
… to SK branch (#83)

### Motivation and Context
To provide an SK connector for the Qdrant vector database using the kernel's memory architecture.

This pull request adds the ability for the Semantic Kernel to persist embeddings/memory to external vector databases like Qdrant. It includes modifications and additions that integrate with the current SK Memory architecture and with the SDKs/APIs of various vector databases. The changes to the VectorDB skill/connectors are larger than initially estimated.

Please help reviewers and future users, providing the following information:
1. Why is this change required? It allows SK developers/users to persist and search embeddings/memory in a vector database like Qdrant.
2. What problem does it solve? It adds long-term memory/embedding storage in vector databases to the Semantic Kernel.
3. What scenario does it contribute to? Long-term, scalable storage, retrieval, search, and filtering of embeddings.
4. If it fixes an open issue, please link to the issue here. N/A

### Description
This PR currently includes a connector for the Qdrant VectorDB only.

Out of scope: this PR removes the initial Milvus VectorDB addition and the generic VectorDB client interfaces intended for consistency across external vector databases. That concept will be provided in a forthcoming design and PR.

**Addition and Modification of custom SK Qdrant.Dotnet SDK**
* Removal of VectorRecord and VectorMetaData, replaced with VectorRecordData, which inherits from IEmbeddingwithMetadata, IDataStore, IEmbeddingIndex
* Adding FetchVectorsRequest and FetchVectorsResponse classes for calling the Points/Scroll API of the Qdrant vector database to get ALL vectors *without* a vector ID, by collection name only. Initially not in the original custom SDK
* Update of the QdrantDB constructor to add a port, so that both the Qdrant REST and gRPC APIs (which expose the same calls, per Qdrant) are supported for performance and binary data
* Adding a Points internal class to support various Qdrant Points API calls
* Adding a default vector size, matching the Ada embedding model, used when none is passed in
* Update of IVectorDbCollection.cs to support the VectorRecordData class and DataEntry<VectorRecordData>
* Adding a GetAllVectorsAsync method to QdrantCollections
* Update of QdrantCollections.cs to support the updated IVectorDBCollections interface
* Changing almost all signatures of Qdrant methods to return DataEntry<VectorRecordData<float>> instead of VectorRecord
* Changing SearchVectorsResponse.cs
* Adding FetchAllCollectionNamesRequest and FetchAllCollectionNamesResponse classes for calling the Qdrant List Collections API to get the names of all collections. Initially not in the SDK

**Additions to Skills.Memory.VectorDB**
* Adding new namespace: Skills.Memory.VectorDB
* Adding a Qdrant VectorDB client for SK, QdrantVectorDB
* Adding an ILongTermMemoryStore interface for VectorDB
* Creating/adding the Qdrant memory store class, QdrantMemoryStore.cs, with new methods for connecting to a Qdrant DB in the cloud and retrieving collections and embeddings from it

These notes should help reviewers understand how the code works. Thanks!

***Note***
* This builds, but with several warnings: the existing, unchanged SDK code has its own Logger and other functionality that SK now provides, and I am requesting review of their possible removal. Based on comments in the fork, several of these have been addressed.
* Question about whether and how GetCollections should retrieve ALL collections in an external database, which could be a significant data request since external vector databases store embeddings. I would like to discuss a possible SK-established limit on the pull.
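The FetchVectorsRequest/FetchVectorsResponse bullet relies on Qdrant's Points/Scroll API, which returns a collection one page at a time. Getting ALL vectors means following the returned offset until it runs out. Below is a minimal Python sketch of that loop, not the PR's C# code. The request fields (`limit`, `offset`, `with_payload`, `with_vector`) and response fields (`points`, `next_page_offset`) follow Qdrant's REST scroll endpoint as we understand it, and should be checked against the Qdrant version you target. `ScrollFn`, `scroll_all_points`, and `fake_scroll_over` are hypothetical names, and the HTTP call is replaced by an in-memory fake.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: POSTs `body` to /collections/{collection}/points/scroll
# and returns the decoded "result" object of the response.
ScrollFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_scroll_body(limit: int, offset: Optional[Any] = None) -> Dict[str, Any]:
    """Build one scroll request body that asks for payloads and vectors."""
    body: Dict[str, Any] = {"limit": limit, "with_payload": True, "with_vector": True}
    if offset is not None:
        # The first page has no offset; later pages resume from next_page_offset.
        body["offset"] = offset
    return body


def scroll_all_points(collection: str, scroll: ScrollFn, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every point in `collection`, following next_page_offset until it is null."""
    offset: Optional[Any] = None
    while True:
        result = scroll(collection, build_scroll_body(page_size, offset))
        yield from result.get("points", [])
        offset = result.get("next_page_offset")
        if offset is None:
            return


def fake_scroll_over(points: List[Dict[str, Any]]) -> ScrollFn:
    """In-memory stand-in for the HTTP call; uses list indexes as offsets
    (real Qdrant offsets are point IDs)."""
    def scroll(collection: str, body: Dict[str, Any]) -> Dict[str, Any]:
        start = body.get("offset", 0)
        end = start + body["limit"]
        next_offset = end if end < len(points) else None
        return {"points": points[start:end], "next_page_offset": next_offset}
    return scroll


# Usage: five points fetched two at a time still come back in full.
all_points = list(scroll_all_points("memories", fake_scroll_over([{"id": i} for i in range(5)]), page_size=2))
```

Exposing this as a generator lets callers stop early. That partly addresses the note's concern about pulling every record from a large collection.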
Adding a call to the Qdrant REST API to get collection names.
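Qdrant's REST API lists collections via `GET /collections`. As far as we know, the response body has the shape `{"result": {"collections": [{"name": ...}]}, "status": ..., "time": ...}`. The Python sketch below extracts just the names from such a body. `parse_collection_names` is a hypothetical helper, not the PR's FetchAllCollectionNamesResponse, and the sample body is made up.

```python
import json
from typing import List


def parse_collection_names(response_text: str) -> List[str]:
    """Extract collection names from a GET /collections response body."""
    data = json.loads(response_text)
    # Tolerate missing keys so an empty database yields an empty list.
    collections = data.get("result", {}).get("collections", [])
    return [collection["name"] for collection in collections]


sample = '{"result": {"collections": [{"name": "aboutMe"}, {"name": "docs"}]}, "status": "ok", "time": 0.0001}'
print(parse_collection_names(sample))  # ['aboutMe', 'docs']
```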
…192) Updating the way metadata is serialized in memory. Updating tests and examples.
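The commit does not say what the new metadata serialization looks like. The Python sketch below is a hedged illustration only. It round-trips a hypothetical metadata record through a JSON string payload and wraps any parse failure in a single error type, echoing the `UnableToDeserializeMetadata` name this PR introduces in a later commit. `MemoryMetadata`, its fields, and `MetadataDeserializationError` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


class MetadataDeserializationError(ValueError):
    """Hypothetical stand-in for an 'unable to deserialize metadata' error."""


@dataclass
class MemoryMetadata:
    # Hypothetical fields; the PR does not list the actual metadata schema.
    id: str
    text: str
    description: str = ""


def serialize_metadata(meta: MemoryMetadata) -> str:
    """Store metadata as a JSON string in the point payload."""
    return json.dumps(asdict(meta))


def deserialize_metadata(payload: str) -> MemoryMetadata:
    """Parse a payload back into metadata, raising one error type on failure."""
    try:
        # TypeError covers non-object JSON and missing or unexpected fields.
        return MemoryMetadata(**json.loads(payload))
    except (json.JSONDecodeError, TypeError) as exc:
        raise MetadataDeserializationError(str(exc)) from exc
```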
Removing the "Internal" folder and moving all files up a level. Moving DTOs from Internal/DataModels to the Http/ApiSchema folder. Updating all namespaces accordingly. Plus formatting and using-directive fixes.
Adding new dotnet syntax example.
- Removing extraneous Test classes (bad merge) and fixing build warnings
- Simplifying HttpRequest.GetJsonContent
- Cleaning up a few more warnings, simplifying some calls
Please update the PR description with Motivation & Context as well as the changes made.
Please update the contribution checklist.
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/Skills.Memory.Qdrant.csproj
Outdated
Address PR Feedback

---------

Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
Co-authored-by: Abby Harrison <abharris@microsoft.com>
Hooray getting this capability in!
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantVectorDbClient.cs
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/Skills.Memory.Qdrant.csproj
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantVectorDbClient.cs
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantVectorDbClient.cs
Outdated
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantVectorDbClient.cs
- Added a similarity search example to Example19_Qdrant
- Piped the similarity threshold down to the Qdrant GetNearestInCollection method
- Renamed UnableToSerializeRecordPayload -> UnableToDeserializeRecordPayload
- Renamed UnableToSerializeMetadata -> UnableToDeserializeMetadata
- Added QDRANT secret names to kernel-syntax-examples README

---------

Co-authored-by: Abby Harrison <abharris@microsoft.com>
Co-authored-by: Dan Marshall <danmar@microsoft.com>
Co-authored-by: Shawn Callegari <36091529+shawncal@users.noreply.github.com>
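Piping the similarity threshold down to GetNearestInCollection matters because the order of operations changes the result. If only the top N hits are fetched and then filtered by score, fewer than N relevant hits may survive. Filtering first makes the limit count only relevant hits. Qdrant's search endpoint accepts a `score_threshold` field for this. The Python sketch below reproduces the same semantics locally, assuming cosine similarity as the metric. `nearest_above_threshold` and `min_relevance` are hypothetical names, not the SK or Qdrant API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def nearest_above_threshold(
    query: Sequence[float],
    records: Dict[str, Sequence[float]],
    limit: int,
    min_relevance: float,
) -> List[Tuple[str, float]]:
    """Return up to `limit` (key, score) pairs with score >= min_relevance, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in records.items()]
    # Dropping low scores before truncating means `limit` counts only relevant hits.
    kept = [pair for pair in scored if pair[1] >= min_relevance]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


records = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = nearest_above_threshold([1.0, 0.0], records, limit=10, min_relevance=0.5)
```

Here `b` is orthogonal to the query (score 0.0) and is dropped. `a` (1.0) and `c` (about 0.707) are kept, best first.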
- Set the QdrantMemoryStore vector size at instance creation. Otherwise, the store is forced to use 1536 dimensions, which is very limiting.

---------

Co-authored-by: Abby Harrison <abharris@microsoft.com>
Co-authored-by: Shawn Callegari <36091529+shawncal@users.noreply.github.com>
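The default of 1536 matches the output dimension of OpenAI's text-embedding-ada-002 model. Fixing the vector size per store instance lets other embedding models work while keeping that default. The Python sketch below shows the idea with a hypothetical in-memory `VectorSizedStore`; it is not the QdrantMemoryStore constructor. The store rejects vectors whose dimension does not match the size chosen at creation.

```python
from typing import Dict, List, Optional

ADA_EMBEDDING_SIZE = 1536  # output dimension of OpenAI's text-embedding-ada-002


class VectorSizedStore:
    """Hypothetical in-memory store whose vector size is fixed when it is created."""

    def __init__(self, vector_size: int = ADA_EMBEDDING_SIZE) -> None:
        if vector_size <= 0:
            raise ValueError("vector_size must be positive")
        self.vector_size = vector_size
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, key: str, vector: List[float]) -> None:
        # Reject mismatched embeddings early rather than at query time.
        if len(vector) != self.vector_size:
            raise ValueError(f"expected {self.vector_size} dimensions, got {len(vector)}")
        self._vectors[key] = list(vector)

    def get(self, key: str) -> Optional[List[float]]:
        return self._vectors.get(key)
```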
Agree with Devis re: the client not being tied to the kernel. It would be okay to address this in a fast follow with other planned changes.
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantMemoryStore.cs
Outdated
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/QdrantMemoryStore.cs
Summary: The QdrantMemoryStore class was not properly checking if the vector data returned by the Qdrant client had a value, which could cause a null reference exception when trying to access it. This commit adds a null check and unwraps the optional value before returning it. This ensures that the memory store returns a valid embedding with metadata or null if not found.
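The fix described above is a check-before-unwrap pattern: test whether the client actually returned a value before reading its fields, and return null (here `None`) otherwise. The Python sketch below illustrates it. `EmbeddingWithMetadata`, `Lookup`, and `get_record` are hypothetical stand-ins, not the PR's C# types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EmbeddingWithMetadata:
    # Hypothetical record type standing in for the store's return value.
    embedding: List[float]
    metadata: Dict[str, str]


# Hypothetical client lookup: returns None when the point is not found.
Lookup = Callable[[str, str], Optional[EmbeddingWithMetadata]]


def get_record(collection: str, key: str, lookup: Lookup) -> Optional[EmbeddingWithMetadata]:
    """Return a copy of the stored record, or None when the client returned no value."""
    result = lookup(collection, key)
    if result is None:
        # Check for a missing value before touching its fields,
        # instead of dereferencing it unconditionally.
        return None
    # Past the check the value is known to be present, so it is safe to read.
    return EmbeddingWithMetadata(embedding=list(result.embedding), metadata=dict(result.metadata))
```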
Summary:
- Removed unused parameter collectionVectorSize from CreateCollectionAsync method
- Removed trailing whitespace from SearchVectorsRequest class
Completed the updated description.
dotnet/src/SemanticKernel.Skills/Skills.Memory.Qdrant/Diagnostics/IValidatable.cs
awesome work, thanks!
### Motivation and Context
This pull request adds the ability for the Semantic Kernel to persist embeddings/memory to external vector databases like Qdrant. It includes modifications and additions that integrate with the current SK Memory architecture and with the SDKs/APIs of various vector databases.

**Why is this change required?** It allows SK developers/users to persist and search embeddings/memory in a vector database like Qdrant.

**What problem does it solve?** It adds long-term memory/embedding storage in vector databases to the Semantic Kernel.

**What scenario does it contribute to?** Long-term, scalable storage, retrieval, search, and filtering of embeddings.

**If it fixes an open issue, please link to the issue here.** N/A

### Description
This PR currently includes a connector for the Qdrant VectorDB only. It removes the initial Milvus VectorDB addition and the VectorDB client interfaces intended for consistency across external vector databases; those will be provided in a forthcoming PR.
- Addition and modification of the Qdrant.Dotnet SDK
- Addition of new namespace Skills.Memory.QdrantDB
- Creating/adding the Qdrant memory client class and QdrantMemoryStore.cs, with methods for connecting to a Qdrant vector database in the cloud and retrieving collections and embeddings from it
Contribution Checklist
dotnet format