Memory - Qdrant Vector Database Connector #210

Merged
merged 16 commits into from
Mar 30, 2023

Conversation

tawalke
Contributor

@tawalke tawalke commented Mar 29, 2023

Motivation and Context

This pull request adds the ability for the Semantic Kernel (SK) to persist embeddings/memory to external vector databases such as Qdrant. It includes modifications and additions that integrate with the current SK Memory architecture and with the SDKs/APIs of various vector databases.

Why is this change required?
This change is required in order to allow SK developers/users to persist and search for embeddings/memory from a Vector Database like Qdrant.

What problem does it solve?
Adds long-term memory/embedding storage in a vector database to the Semantic Kernel.

What scenario does it contribute to?
Scenario: long-term, scalable memory storage, retrieval, search, and filtering of embeddings.

If it fixes an open issue, please link to the issue here.
N/A

Description

This PR currently includes a connector for the Qdrant vector database only. It removes the initial Milvus VectorDB addition and the VectorDB client interfaces intended for consistency across external vector databases; these will be provided in a forthcoming PR.

  • Addition and Modification of Qdrant.Dotnet SDK
  • Addition of new namespace Skills.Memory.QdrantDB
  • Addition of a Qdrant memory client class and QdrantMemoryStore.cs, with methods for connecting to a Qdrant vector database in the cloud and retrieving collections and embeddings from it.

Contribution Checklist


tawalke and others added 6 commits March 28, 2023 14:36

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
… to SK branch (#83)

### Motivation and Context

Provides an SK connector for the Qdrant vector database using the
kernel's memory architecture.

This pull request adds the ability for the Semantic Kernel to persist
embeddings/memory to external vector databases such as Qdrant. It
includes modifications and additions that integrate with the current SK
Memory architecture and with the SDKs/APIs of various vector databases.
The VectorDB skill/connectors changed significantly, likely more than
initially estimated.

1. Why is this change required?
This change is required in order to allow SK developers/users to persist
and search for embeddings/memory from a Vector Database like Qdrant.

2. What problem does it solve?
Adds long-term memory/embedding storage in a vector database to the
Semantic Kernel.

3. What scenario does it contribute to?
Scenario: long-term, scalable memory storage, retrieval, search, and
filtering of embeddings.

4. If it fixes an open issue, please link to the issue here.
N/A

### Description

This PR currently includes a connector for the Qdrant vector database
only. Out of scope: this PR removes the initial Milvus VectorDB addition
and the generic VectorDB client interfaces intended for consistency
across external vector databases. That concept will be provided in a
forthcoming design and PR.

**Addition and Modification of custom SK Qdrant.Dotnet SDK**
* Removed VectorRecord and VectorMetaData, replacing them with
VectorRecordData, which inherits from IEmbeddingwithMetadata,
IDataStore, and IEmbeddingIndex
* Added FetchVectorsRequest and FetchVectorsResponse classes for calling
the Points/Scroll API of the Qdrant vector database to get ALL vectors
*without* a vector ID, by collection name only. These were not in the
original custom SDK
* Updated the QdrantDB constructor to accept a port, so that both the
REST and gRPC Qdrant APIs (which expose the same calls, per Qdrant) are
supported for performance and binary data
* Added an internal Points class to support various Qdrant Points API
calls
* Added a default vector size, used when none is passed in, that matches
the ADA model.
* Updated IVectorDbCollection.cs to support the VectorRecordData class
and DataEntry<VectorRecordData>
* Added a GetAllVectorsAsync method to QdrantCollections
* Updated QdrantCollections.cs to support the updated
IVectorDBCollections interface.
* Changed almost all signatures of Qdrant methods to return
DataEntry<VectorRecordData<float>> instead of VectorRecord
* Changed SearchVectorsResponse.cs
* Added FetchAllCollectionNamesRequest and
FetchAllCollectionNamesResponse classes for calling the List Collections
API of the Qdrant vector database to get all collection names. These
were not in the initial SDK
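
The Points/Scroll call mentioned above can be sketched as a paging loop.
This is a minimal, language-agnostic illustration in Python, not the
connector's C# code. It assumes the Qdrant REST endpoint
`POST /collections/{name}/points/scroll` and its `limit`, `offset`,
`with_payload`, `with_vector`, and `next_page_offset` fields; the `post`
callable and both function names are hypothetical stand-ins for an HTTP
client.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (path, json_body) -> decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_scroll_request(limit: int, offset: Optional[Any] = None) -> Dict[str, Any]:
    """Build a scroll request body; no point IDs are needed."""
    body: Dict[str, Any] = {"limit": limit, "with_payload": True, "with_vector": True}
    if offset is not None:
        # Resume from the point ID returned by the previous page.
        body["offset"] = offset
    return body


def fetch_all_vectors(post: PostFn, collection: str, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every point in a collection, one page at a time."""
    path = f"/collections/{collection}/points/scroll"
    offset: Optional[Any] = None
    while True:
        result = post(path, build_scroll_request(page_size, offset))["result"]
        yield from result["points"]
        offset = result.get("next_page_offset")
        if offset is None:  # no further pages
            break
```

Yielding page by page keeps memory bounded, which matters because a
collection of embeddings can be large.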

**Additions to Skills.Memory.VectorDB**
* Added a new namespace: Skills.Memory.VectorDB
* Added a Qdrant VectorDB client for SK, QdrantVectorDB
* Added an ILongTermMemoryStore interface for VectorDB
* Added a Qdrant memory store class, QdrantMemoryStore.cs, with new
methods for connecting to a Qdrant DB in the cloud and retrieving
collections and embeddings from it.

***Note***
* This builds, but with several warnings: the existing, unchanged SDK
code has its own Logger and other functionality that SK now provides,
and I am requesting review of its possible removal. Based upon comments
in the fork, several of these have been addressed.
* Question about whether, and how, GetCollections should retrieve ALL
collections in an external database, which could be a significant data
request because external vector databases store embeddings. I would
like to discuss a possible SK-established limit on the pull. Added a
call to the Qdrant REST API for getting collection names.

…192)

Updating the way metadata is serialized in memory.
Updating tests and examples.

Removing "Internal" folder, bumping all files up a level.
Moving DTOs from Internal/DataModels to Http/ApiSchema folder.
Updating all namespaces accordingly.

+Formatting fixes, usings fixes.

Adding new dotnet syntax example.

- Removing extraneous Test classes (bad merge) and fixing build
warnings.
- Simplifying HttpRequest.GetJsonContent
- Cleaning up a few more warnings, simplifying some calls
@tawalke tawalke changed the title from "Memory Qdrant PR" to "Memory - Qdrant Vector Database" Mar 29, 2023
@awharrison-28
Contributor

Please update the PR description with Motivation & Context as well as Changes made

@tawalke tawalke changed the title from "Memory - Qdrant Vector Database" to "Memory - Qdrant Vector Database Connector" Mar 29, 2023
@awharrison-28 awharrison-28 added the "enhancement", "PR: ready for review", and ".NET" labels Mar 29, 2023
@awharrison-28
Contributor

Please update contribution checklist

Address PR Feedback

---------

Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
Co-authored-by: Abby Harrison <abharris@microsoft.com>

Member

@lemillermicrosoft lemillermicrosoft left a comment

Hooray getting this capability in!

- Added a similarity search example to Example19_Qdrant
- Piped the similarity threshold down to the Qdrant
GetNearestInCollection method
- Renamed UnableToSerializeRecordPayload ->
UnableToDeserializeRecordPayload
- Renamed UnableToSerializeMetadata -> UnableToDeserializeMetadata
- Added QDRANT secret names to kernel-syntax-examples README
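
Passing a similarity threshold down to a nearest-neighbour search can be
sketched as below. This is an illustrative Python sketch rather than the
connector's code: it assumes the Qdrant REST search endpoint
(`POST /collections/{name}/points/search`) and its `vector`, `limit`,
`with_payload`, and `score_threshold` fields. The function name and the
`min_relevance` parameter are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_search_request(
    vector: List[float],
    limit: int = 1,
    min_relevance: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a search body, forwarding the caller's threshold if one is set."""
    body: Dict[str, Any] = {
        "vector": list(vector),
        "limit": limit,
        "with_payload": True,
    }
    if min_relevance is not None:
        # Let the database drop low-scoring hits instead of filtering client-side.
        body["score_threshold"] = min_relevance
    return body
```

Filtering on the server avoids transferring results that the caller
would discard anyway.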

---------

Co-authored-by: Abby Harrison <abharris@microsoft.com>
Co-authored-by: Dan Marshall <danmar@microsoft.com>
Co-authored-by: Shawn Callegari <36091529+shawncal@users.noreply.github.com>
shawncal and others added 2 commits March 29, 2023 20:31

- Set QdrantMemoryStore vectorsize at instance creation. Otherwise, the
store is forced to use 1536 which is very limiting.
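
The idea of fixing the vector size when the store is created, rather
than hard-coding 1536, can be sketched as follows. This is a Python
illustration under stated assumptions, not the QdrantMemoryStore code:
the class and method names are hypothetical, and the request body
follows the Qdrant REST create-collection call
(`PUT /collections/{name}`) with its `vectors.size` and
`vectors.distance` fields.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MemoryStoreConfig:
    """Hypothetical store settings chosen once, at construction time."""

    vector_size: int          # must match the embedding model's output length
    distance: str = "Cosine"  # similarity metric used by the collection

    def __post_init__(self) -> None:
        if self.vector_size <= 0:
            raise ValueError("vector_size must be a positive integer")

    def create_collection_body(self) -> Dict[str, Any]:
        """Body for creating a collection sized for this store."""
        return {"vectors": {"size": self.vector_size, "distance": self.distance}}
```

A store built with `MemoryStoreConfig(vector_size=384)` would then
create collections for 384-dimensional embeddings, while
`MemoryStoreConfig(vector_size=1536)` reproduces the previous fixed
size.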

---------

Co-authored-by: Abby Harrison <abharris@microsoft.com>
Co-authored-by: Shawn Callegari <36091529+shawncal@users.noreply.github.com>
Member

@lemillermicrosoft lemillermicrosoft left a comment

Agree with Devis re: Client not being tied to kernel. Would be okay addressing in a fast follow with other planned changes.

…moryStore.cs
      Summary: The QdrantMemoryStore class was not properly checking if the vector data returned by the Qdrant client had a value, which could cause a null reference exception when trying to access it. This commit adds a null check and unwraps the optional value before returning it. This ensures that the memory store returns a valid embedding with metadata or null if not found.
      Summary:
      - Removed unused parameter collectionVectorSize from CreateCollectionAsync method
      - Removed trailing whitespace from SearchVectorsRequest class
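
The null check described in the first summary can be illustrated with a
short Python sketch. It is not the C# fix itself: the function name and
the `vector`/`payload` record shape are assumptions chosen to mirror a
Qdrant point.

```python
from typing import Any, Dict, Optional


def unwrap_record(found: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return an embedding with metadata, or None when nothing usable was found."""
    if found is None:
        # The client returned no record for this key.
        return None
    vector = found.get("vector")
    if vector is None:
        # A record without vector data cannot be turned into an embedding.
        return None
    return {"embedding": list(vector), "metadata": found.get("payload") or {}}
```

Callers then handle a single `None` case instead of risking a failure
when they dereference a missing value.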
@tawalke
Contributor Author

tawalke commented Mar 30, 2023

> Please update the PR description with Motivation & Context as well as Changes made

Completed; the description has been updated.

Contributor

@dluc dluc left a comment

awesome work, thanks!

@dluc dluc merged commit 0963e22 into main Mar 30, 2023
@dluc dluc deleted the memory-qdrant branch April 4, 2023 20:22
dehoward pushed a commit to lemillermicrosoft/semantic-kernel that referenced this pull request Jun 1, 2023