-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: milvus plugin #156
feat: milvus plugin #156
Conversation
WalkthroughThe changes introduce a new Milvus plugin within the Genkit framework, enabling indexing and retrieval of data using the Milvus vector database. Key components include a Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (5)
plugins/milvus/README.md (2)
1-10
: Enhance introduction and installation documentation.Consider adding:
- A brief explanation of what vector databases are and their use cases
- Required peer dependencies (if any)
- Supported Milvus versions
54-84
: Enhance usage documentation with more examples.Consider adding:
- TypeScript type information for parameters and return values
- Error handling examples
- Complete example with actual document structure
- Example of how to query with different similarity metrics
plugins/milvus/tests/index_test.ts (2)
31-34
: Add documentation for client parameters.The empty client parameters will cause most tests to be skipped. Consider adding:
- A comment explaining how to configure these parameters
- Instructions for setting up test credentials
- Environment variable support for CI/CD
const clientParams = { - address: '', - token: '', + address: process.env.MILVUS_ADDRESS || '', + token: process.env.MILVUS_TOKEN || '', }; + +// Note: Tests requiring a Milvus connection will be skipped if MILVUS_ADDRESS is not set +// To run all tests locally: +// 1. Start a Milvus instance +// 2. Set MILVUS_ADDRESS (e.g., 'localhost:19530') +// 3. Set MILVUS_TOKEN if authentication is enabled
105-107
: Strengthen assertions for search results.The current assertion only checks the error code and array length. Consider adding more comprehensive checks:
- Type validation for search results
- Verification of result structure
- Score threshold validation
- assert( - result.status.error_code === 'Success' && result.results.length >= 0 - ); + assert(result.status.error_code === 'Success', 'Search should succeed'); + assert(Array.isArray(result.results), 'Results should be an array'); + assert(result.results.length > 0, 'Should return at least one result'); + + // Verify result structure + const firstResult = result.results[0]; + assert('score' in firstResult, 'Result should contain a score'); + assert('id' in firstResult, 'Result should contain an id'); + assert(firstResult.score >= 0 && firstResult.score <= 1, 'Score should be between 0 and 1');plugins/milvus/src/index.ts (1)
297-305
: Handle Sensitive Information in Error LoggingIn the
deleteMilvusCollection
function, logging the error withconsole.error
might expose sensitive information. Consider sanitizing the error message or logging less sensitive details to avoid potential information leakage.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (6)
plugins/milvus/README.md
(1 hunks)plugins/milvus/package.json
(1 hunks)plugins/milvus/src/index.ts
(1 hunks)plugins/milvus/tests/index_test.ts
(1 hunks)plugins/milvus/tsconfig.json
(1 hunks)plugins/milvus/tsup.config.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- plugins/milvus/package.json
- plugins/milvus/tsconfig.json
- plugins/milvus/tsup.config.ts
🧰 Additional context used
🪛 LanguageTool
plugins/milvus/README.md
[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName
: Specified database - clientParams
: I...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database -
clientParams`: If you're not running your Milvus serve...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions
: Use this parameter to pass options to t...
(UNLIKELY_OPENING_PUNCTUATION)
🔇 Additional comments (4)
plugins/milvus/README.md (1)
1-84
: 🛠️ Refactor suggestion
Add essential documentation sections.
Consider adding the following sections to make the documentation more comprehensive:
- Troubleshooting guide with common issues and solutions
- Migration guide for users coming from similar vector database plugins
- Performance considerations and best practices
- Security considerations when connecting to remote Milvus instances
🧰 Tools
🪛 LanguageTool
[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName
: Specified database - clientParams
: I...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database -
clientParams`: If you're not running your Milvus serve...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions
: Use this parameter to pass options to t...
(UNLIKELY_OPENING_PUNCTUATION)
plugins/milvus/tests/index_test.ts (2)
1-28
: LGTM! Proper license and imports.
The license header is complete, and the imports are appropriately structured using built-in Node.js test modules.
37-54
: Verify vector dimensions match collection configuration.
The mock entries use 4-dimensional vectors, but there's no validation to ensure this matches the collection's configuration. Consider:
- Adding a constant for the vector dimension
- Validating mock data dimensions
+const VECTOR_DIMENSION = 4;
+
+// Validate vector dimensions
+const validateVectorDimension = (vector: number[]) => {
+ if (vector.length !== VECTOR_DIMENSION) {
+ throw new Error(`Vector dimension must be ${VECTOR_DIMENSION}`);
+ }
+};
+
const mockEntries = [
{
vector: [0.1, 0.2, 0.3, 0.4],
document: 'This is a test document',
metadata: {
✅ Verification successful
Let me check the actual dimension configurations in these files to verify if there's consistency.
Let me check how this test file is using these mock entries and if there's any connection to the embedders.
Mock vector dimensions correctly match collection configuration
The mock entries with 4-dimensional vectors are correctly aligned with the collection configuration. The test file explicitly creates a Milvus collection with dimension: 4
and uses consistent 4-dimensional vectors in the mock data for testing insert and search operations. No changes are needed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify vector dimension consistency across the codebase
rg -l "dimension.*4" --type ts
Length of output: 131
Script:
#!/bin/bash
# Check the context around dimension configurations
rg "dimension.*4" -B 2 -A 2 --type ts
Length of output: 1842
Script:
#!/bin/bash
# Get more context around the test file usage
cat plugins/milvus/tests/index_test.ts
Length of output: 3210
plugins/milvus/src/index.ts (1)
105-159
: milvusRetriever
Function Implementation Looks Good
The milvusRetriever
function is well-implemented, correctly handles collection existence, and performs querying appropriately. Good job on integrating embedding and retrieval logic.
describe('Test Genkit Milvus Plugin', () => { | ||
it('should create a retriever', async () => { | ||
const retriever = milvusRetrieverRef({ | ||
collectionName: COLLECTION_NAME, | ||
}); | ||
assert(retriever.name === `milvus/${COLLECTION_NAME}`); | ||
}); | ||
|
||
it('should create an indexer', async () => { | ||
const indexer = milvusIndexerRef({ | ||
collectionName: COLLECTION_NAME, | ||
}); | ||
assert(indexer.name === `milvus/${COLLECTION_NAME}`); | ||
}); | ||
|
||
if (clientParams.address) { | ||
it('should create a collection', async () => { | ||
const isCollectionExists = await hasMilvusCollection({ | ||
collectionName: COLLECTION_NAME, | ||
clientParams, | ||
}); | ||
if (isCollectionExists) { | ||
console.log('Collection already exists'); | ||
assert(true); | ||
return; | ||
} | ||
const result = await createMilvusCollection({ | ||
collectionName: COLLECTION_NAME, | ||
dimension: 4, | ||
clientParams, | ||
}); | ||
assert(result.error_code === 'Success'); | ||
}); | ||
|
||
it('should insert data', async () => { | ||
const result = await insertMilvusData({ | ||
collectionName: COLLECTION_NAME, | ||
entries: mockEntries, | ||
clientParams, | ||
}); | ||
assert(result.insert_cnt === `${mockEntries.length}`); | ||
}); | ||
|
||
it('should search data', async () => { | ||
const result = await searchMilvusData({ | ||
collectionName: COLLECTION_NAME, | ||
query: mockEntries[0].vector, | ||
clientParams, | ||
}); | ||
assert( | ||
result.status.error_code === 'Success' && result.results.length >= 0 | ||
); | ||
}); | ||
|
||
it('should delete a collection', async () => { | ||
const result = await deleteMilvusCollection({ | ||
collectionName: COLLECTION_NAME, | ||
clientParams, | ||
}); | ||
assert(result.error_code === 'Success'); | ||
}); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance test robustness and coverage.
The test suite could benefit from several improvements:
- Add cleanup in an
afterAll
hook to ensure collection deletion - Add timeout handling for Milvus operations
- Include negative test cases
- Add more assertions for edge cases
Here's a suggested improvement:
describe('Test Genkit Milvus Plugin', () => {
+ // Set timeout for long-running operations
+ jest.setTimeout(30000);
+
+ // Ensure cleanup after all tests
+ afterAll(async () => {
+ if (clientParams.address) {
+ try {
+ await deleteMilvusCollection({
+ collectionName: COLLECTION_NAME,
+ clientParams,
+ });
+ } catch (error) {
+ console.error('Cleanup failed:', error);
+ }
+ }
+ });
+
// Existing tests...
+
+ // Add negative test cases
+ if (clientParams.address) {
+ it('should handle invalid vector dimensions', async () => {
+ const invalidEntry = {
+ vector: [0.1, 0.2], // Wrong dimension
+ document: 'Invalid vector',
+ metadata: { text: 'Invalid vector', pos: 1 },
+ };
+
+ await assert.rejects(
+ async () => {
+ await insertMilvusData({
+ collectionName: COLLECTION_NAME,
+ entries: [invalidEntry],
+ clientParams,
+ });
+ },
+ /dimension/i
+ );
+ });
+ }
});
Committable suggestion skipped: line range outside the PR's diff.
address: configOrAddress, | ||
token: '', // Default token is an empty string | ||
username: '', // Default username is an empty string | ||
password: '', // Default password is an empty string | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid Hardcoding Empty Credentials
The getDefaultConfig
function sets default credentials (token
, username
, password
) to empty strings. This may pose security risks if default credentials are inadvertently used in production. Recommend requiring explicit credentials or throwing an error when they're not provided.
de272c8
to
80be313
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (7)
plugins/milvus/README.md (3)
1-10
: Add prerequisites and version compatibility information.Consider enhancing the introduction section with:
- Required Milvus version compatibility
- Prerequisites (e.g., running Milvus instance, minimum Node.js version)
- Link to Milvus installation guide for new users
54-84
: Enhance usage examples.Consider adding:
- Error handling examples and common error scenarios
- Search parameters documentation (e.g., top_k, search radius)
- Batch processing examples for efficient indexing
- Example of how to delete or update indexed documents
1-84
: Add security and performance considerations.Please add sections covering:
- Security best practices:
- Token management
- Network security recommendations
- Access control configuration
- Performance optimization:
- Connection pooling
- Batch size recommendations
- Index optimization guidelines
🧰 Tools
🪛 LanguageTool
[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: -dbName
: Specified database -clientParams
: I...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database -
clientParams`: If you're not running your Milvus serve...(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` -embedderOptions
: Use this parameter to pass options to t...(UNLIKELY_OPENING_PUNCTUATION)
plugins/milvus/src/index.ts (4)
206-236
: Add retry mechanism and timeout configuration.Collection management operations could benefit from:
- Retry mechanism for transient failures
- Configurable timeout
- Error handling with specific error types
export async function createMilvusCollection(params: { collectionName: string; dimension: number; dbName?: string; clientParams?: MilvusClientParams; + timeout?: number; + maxRetries?: number; }) { const { clientParams, collectionName, dimension, dbName } = params; const milvus = getMilvusInstance(clientParams); - return milvus.createCollection({ - collection_name: collectionName, - db_name: dbName, - dimension, - auto_id: true, - enable_dynamic_field: true, - }); + const retries = params.maxRetries ?? 3; + let lastError; + + for (let i = 0; i < retries; i++) { + try { + return await milvus.createCollection({ + collection_name: collectionName, + db_name: dbName, + dimension, + auto_id: true, + enable_dynamic_field: true, + timeout: params.timeout, + }); + } catch (error) { + lastError = error; + if (i < retries - 1) { + await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000)); + } + } + } + throw new Error(`Failed to create collection after ${retries} attempts: ${lastError}`);
238-272
: Add input validation and rate limiting.Data operations could benefit from:
- Input validation for entries and query parameters
- Rate limiting mechanism
- Metrics collection for monitoring
Consider implementing a rate limiter utility:
class RateLimiter { private queue: Array<() => Promise<any>> = []; private processing = false; constructor(private rateLimit: number, private interval: number) {} async execute<T>(fn: () => Promise<T>): Promise<T> { return new Promise((resolve, reject) => { this.queue.push(async () => { try { resolve(await fn()); } catch (error) { reject(error); } }); this.process(); }); } private async process() { if (this.processing) return; this.processing = true; while (this.queue.length > 0) { const batch = this.queue.splice(0, this.rateLimit); await Promise.all(batch.map(fn => fn())); if (this.queue.length > 0) { await new Promise(resolve => setTimeout(resolve, this.interval)); } } this.processing = false; } }
274-291
: Enhance error handling and cleanup.The collection deletion could be improved with:
- Specific error types
- Cleanup of related resources
- Logging of deletion attempts
export async function deleteMilvusCollection(params: { collectionName: string; dbName?: string; clientParams?: MilvusClientParams; }) { const { clientParams, collectionName, dbName } = params; const milvus = getMilvusInstance(clientParams); try { + console.log(`Attempting to delete collection: ${collectionName}`); + + // Release collection resources first + await milvus.releaseCollection({ + collection_name: collectionName, + db_name: dbName, + }); + return await milvus.dropCollection({ collection_name: collectionName, db_name: dbName, }); } catch (error) { - console.error('Failed to delete Milvus collection:', error); + if (error.message?.includes('collection not exists')) { + console.warn(`Collection ${collectionName} does not exist`); + return; + } + console.error(`Failed to delete collection ${collectionName}:`, error); throw error; } }
1-323
: Consider implementing monitoring and observability.To improve operational visibility, consider adding:
- Prometheus metrics for operation latencies and error rates
- OpenTelemetry tracing for debugging and performance analysis
- Health check endpoints for the Milvus connection
- Structured logging with correlation IDs
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (6)
plugins/milvus/README.md
(1 hunks)plugins/milvus/package.json
(1 hunks)plugins/milvus/src/index.ts
(1 hunks)plugins/milvus/tests/index_test.ts
(1 hunks)plugins/milvus/tsconfig.json
(1 hunks)plugins/milvus/tsup.config.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/milvus/package.json
🚧 Files skipped from review as they are similar to previous changes (3)
- plugins/milvus/tests/index_test.ts
- plugins/milvus/tsconfig.json
- plugins/milvus/tsup.config.ts
🧰 Additional context used
🪛 LanguageTool
plugins/milvus/README.md
[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName
: Specified database - clientParams
: I...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database -
clientParams`: If you're not running your Milvus serve...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions
: Use this parameter to pass options to t...
(UNLIKELY_OPENING_PUNCTUATION)
🔇 Additional comments (3)
plugins/milvus/src/index.ts (3)
1-40
: LGTM! Well-structured imports and schema definitions.
The imports, schema definitions, and interface declarations are well-organized and properly typed.
42-84
: LGTM! Clean plugin registration and reference implementations.
The plugin registration and reference implementations are well-structured and type-safe.
293-307
: LGTM! Collection existence check implementation.
The implementation for checking collection existence is clean and well-structured.
{ | ||
collectionName: 'collection_01', | ||
embedder: textEmbeddingGecko, | ||
}, | ||
]), | ||
], | ||
// ... | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add technical configuration details.
The configuration section should include:
- Required collection schema (dimension size, field types, etc.)
- Collection creation behavior (auto-create vs. manual setup)
- Performance-related settings (index type, metric type)
- Example showing all possible configuration options
/** | ||
* Configures a Milvus vector store indexer. | ||
*/ | ||
export function milvusIndexer<EmbedderCustomOptions extends z.ZodTypeAny>( | ||
params: MilvusPluginOptions<EmbedderCustomOptions> | ||
) { | ||
const { collectionName, embedder, embedderOptions, dbName } = params; | ||
const milvusConfig = params.clientParams ?? getDefaultConfig(); | ||
|
||
return defineIndexer( | ||
{ | ||
name: `milvus/${collectionName}`, | ||
configSchema: MilvusIndexerOptionsSchema.optional(), | ||
}, | ||
async (docs) => { | ||
const embeddings = await Promise.all( | ||
docs.map((doc) => | ||
embed({ | ||
embedder, | ||
content: doc, | ||
options: embedderOptions, | ||
}) | ||
) | ||
); | ||
|
||
const entries = embeddings.map((value, i) => { | ||
const metadata: RowData = { | ||
...docs[i].metadata, | ||
}; | ||
|
||
return { | ||
vector: value, | ||
document: docs[i].text(), | ||
metadata, | ||
}; | ||
}); | ||
|
||
const isCollectionExists = await hasMilvusCollection({ | ||
collectionName, | ||
dbName, | ||
clientParams: milvusConfig, | ||
}); | ||
if (!isCollectionExists) { | ||
await createMilvusCollection({ | ||
dbName, | ||
collectionName, | ||
dimension: embeddings[0].length, | ||
clientParams: milvusConfig, | ||
}); | ||
} | ||
|
||
await insertMilvusData({ | ||
collectionName: collectionName, | ||
dbName, | ||
entries, | ||
clientParams: milvusConfig, | ||
}); | ||
} | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add batch processing and error handling.
The parallel processing of documents could benefit from:
- Batch processing to prevent memory issues with large document sets
- Validation for empty embeddings array
- Error handling for failed embeddings
async (docs) => {
+ if (!docs.length) {
+ return;
+ }
+
+ const BATCH_SIZE = 1000;
+ for (let i = 0; i < docs.length; i += BATCH_SIZE) {
+ const batch = docs.slice(i, i + BATCH_SIZE);
const embeddings = await Promise.all(
- docs.map((doc) =>
+ batch.map((doc) =>
embed({
embedder,
content: doc,
options: embedderOptions,
})
)
);
+ if (!embeddings.length || !embeddings[0].length) {
+ throw new Error('Failed to generate embeddings');
+ }
Committable suggestion skipped: line range outside the PR's diff.
/** | ||
* Configures a Milvus vector store retriever. | ||
*/ | ||
export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>( | ||
params: MilvusPluginOptions<EmbedderCustomOptions> | ||
) { | ||
const { collectionName, embedder, embedderOptions, dbName } = params; | ||
const milvusConfig = params.clientParams ?? getDefaultConfig(); | ||
|
||
return defineRetriever( | ||
{ | ||
name: `milvus/${collectionName}`, | ||
configSchema: MilvusRetrieverOptionsSchema, | ||
}, | ||
async (content, options) => { | ||
const queryEmbeddings = await embed({ | ||
embedder, | ||
content, | ||
options: embedderOptions, | ||
}); | ||
|
||
const isCollectionExists = await hasMilvusCollection({ | ||
collectionName, | ||
dbName, | ||
clientParams: milvusConfig, | ||
}); | ||
if (!isCollectionExists) { | ||
await createMilvusCollection({ | ||
dbName, | ||
collectionName, | ||
dimension: queryEmbeddings.length, | ||
clientParams: milvusConfig, | ||
}); | ||
} | ||
|
||
const response = await searchMilvusData({ | ||
collectionName, | ||
dbName, | ||
query: queryEmbeddings, | ||
limit: options.limit, | ||
filter: JSON.stringify(options.filter), | ||
topk: options.k, | ||
clientParams: milvusConfig, | ||
}); | ||
|
||
return { | ||
documents: response.results | ||
.map((result) => result['$meta']) | ||
.filter( | ||
(m): m is { document: string; metadata: Record<string, any> } => !!m | ||
) | ||
.map((m) => { | ||
return Document.fromText(m.document, m.metadata).toJSON(); | ||
}), | ||
}; | ||
} | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling and validation.
Consider adding:
- Error handling for embedding generation
- Validation for empty query embeddings
- Validation that
queryEmbeddings.length > 0
before creating collection
async (content, options) => {
+ if (!content) {
+ throw new Error('Content is required for embedding');
+ }
+
const queryEmbeddings = await embed({
embedder,
content,
options: embedderOptions,
});
+
+ if (!queryEmbeddings || queryEmbeddings.length === 0) {
+ throw new Error('Failed to generate embeddings');
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/** | |
* Configures a Milvus vector store retriever. | |
*/ | |
export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>( | |
params: MilvusPluginOptions<EmbedderCustomOptions> | |
) { | |
const { collectionName, embedder, embedderOptions, dbName } = params; | |
const milvusConfig = params.clientParams ?? getDefaultConfig(); | |
return defineRetriever( | |
{ | |
name: `milvus/${collectionName}`, | |
configSchema: MilvusRetrieverOptionsSchema, | |
}, | |
async (content, options) => { | |
const queryEmbeddings = await embed({ | |
embedder, | |
content, | |
options: embedderOptions, | |
}); | |
const isCollectionExists = await hasMilvusCollection({ | |
collectionName, | |
dbName, | |
clientParams: milvusConfig, | |
}); | |
if (!isCollectionExists) { | |
await createMilvusCollection({ | |
dbName, | |
collectionName, | |
dimension: queryEmbeddings.length, | |
clientParams: milvusConfig, | |
}); | |
} | |
const response = await searchMilvusData({ | |
collectionName, | |
dbName, | |
query: queryEmbeddings, | |
limit: options.limit, | |
filter: JSON.stringify(options.filter), | |
topk: options.k, | |
clientParams: milvusConfig, | |
}); | |
return { | |
documents: response.results | |
.map((result) => result['$meta']) | |
.filter( | |
(m): m is { document: string; metadata: Record<string, any> } => !!m | |
) | |
.map((m) => { | |
return Document.fromText(m.document, m.metadata).toJSON(); | |
}), | |
}; | |
} | |
); | |
} | |
/** | |
* Configures a Milvus vector store retriever. | |
*/ | |
export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>( | |
params: MilvusPluginOptions<EmbedderCustomOptions> | |
) { | |
const { collectionName, embedder, embedderOptions, dbName } = params; | |
const milvusConfig = params.clientParams ?? getDefaultConfig(); | |
return defineRetriever( | |
{ | |
name: `milvus/${collectionName}`, | |
configSchema: MilvusRetrieverOptionsSchema, | |
}, | |
async (content, options) => { | |
if (!content) { | |
throw new Error('Content is required for embedding'); | |
} | |
const queryEmbeddings = await embed({ | |
embedder, | |
content, | |
options: embedderOptions, | |
}); | |
if (!queryEmbeddings || queryEmbeddings.length === 0) { | |
throw new Error('Failed to generate embeddings'); | |
} | |
const isCollectionExists = await hasMilvusCollection({ | |
collectionName, | |
dbName, | |
clientParams: milvusConfig, | |
}); | |
if (!isCollectionExists) { | |
await createMilvusCollection({ | |
dbName, | |
collectionName, | |
dimension: queryEmbeddings.length, | |
clientParams: milvusConfig, | |
}); | |
} | |
const response = await searchMilvusData({ | |
collectionName, | |
dbName, | |
query: queryEmbeddings, | |
limit: options.limit, | |
filter: JSON.stringify(options.filter), | |
topk: options.k, | |
clientParams: milvusConfig, | |
}); | |
return { | |
documents: response.results | |
.map((result) => result['$meta']) | |
.filter( | |
(m): m is { document: string; metadata: Record<string, any> } => !!m | |
) | |
.map((m) => { | |
return Document.fromText(m.document, m.metadata).toJSON(); | |
}), | |
}; | |
} | |
); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @zhanshuyou, thanks for your contribution!
Looks LGTM to me. I'll take care of the release.
@all-contributors add @zhanshuyou as contributor for code, docs, tests |
I've put up a pull request to add @zhanshuyou! 🎉 |
3bf84b7
to
a449367
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for these contributions!!
Before you submit a pull request, please make sure you have read and understood the contribution guidelines and the code of conduct.
This pull request is related to:
I have checked the following:
Description:
The Milvus plugin provides indexer and retriever implementatons that use the Milvus vector database.
Related issues:
Please, link the related issues that this pull request will resolve here (if any).
Summary by CodeRabbit
New Features
Documentation
Tests
Chores