feat: milvus plugin #156

zhanshuyou · 2024-11-07T06:54:50Z

Before you submit a pull request, please make sure you have read and understood the contribution guidelines and the code of conduct.

This pull request is related to:

A bug
A new feature
Documentation
Other (please specify)

I have checked the following:

I have read and understood the contribution guidelines and the code of conduct;
I have added new tests (for bug fixes/features);
I have added/updated the documentation (for bug fixes / features).

Description:
The Milvus plugin provides indexer and retriever implementatons that use the Milvus vector database.

Related issues:
Please, link the related issues that this pull request will resolve here (if any).

Summary by CodeRabbit

New Features
- Introduced a new Milvus plugin for indexing and retrieving data with the Milvus vector database.
- Added functionalities for creating retrievers and indexers, along with collection management.
- Provided examples and detailed instructions in the README for easy setup and usage.
Documentation
- Updated README to include installation instructions, configuration details, and usage examples for the new plugin.
Tests
- Added a suite of unit tests to validate the functionality of the Milvus plugin, including retriever and indexer operations.
Chores
- Created necessary configuration files for TypeScript and build settings specific to the Milvus plugin.

coderabbitai · 2024-11-07T06:54:58Z

Walkthrough

The changes introduce a new Milvus plugin within the Genkit framework, enabling indexing and retrieval of data using the Milvus vector database. Key components include a README.md detailing installation and configuration, a package.json defining dependencies and scripts, and an index.ts file implementing core functionalities such as retrievers and indexers. Additional files include a test suite for unit testing the plugin's features, TypeScript configuration, and build configuration files. Overall, the updates establish a structured integration of the Milvus database into the Genkit ecosystem.

Changes

File	Change Summary
plugins/milvus/README.md	Added documentation for the new Milvus plugin, including installation, configuration, and usage examples.
plugins/milvus/package.json	Created `package.json` for the `genkitx-milvus` plugin, specifying metadata, dependencies, and scripts.
plugins/milvus/src/index.ts	Introduced core functionality for the Milvus plugin, including schemas, retriever, indexer, and utility functions.
plugins/milvus/tests/index_test.ts	Added unit tests for the Milvus plugin, covering retriever and indexer creation, collection management, and data operations.
plugins/milvus/tsconfig.json	Created TypeScript configuration file extending the base config and including the `src` directory.
plugins/milvus/tsup.config.ts	Introduced build configuration for the Milvus plugin, extending common settings and defining build options.

Poem

In the garden of code, a new plugin blooms,
With Milvus at heart, it dispels all the glooms.
Indexing and searching, oh what a delight,
Genkit and Milvus, a powerful sight!
So hop to the rhythm, let data take flight,
In the world of vectors, we dance through the night! 🐇✨

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (5)

plugins/milvus/README.md (2)

1-10: Enhance introduction and installation documentation.

Consider adding:

A brief explanation of what vector databases are and their use cases

Required peer dependencies (if any)

Supported Milvus versions

54-84: Enhance usage documentation with more examples.

Consider adding:

TypeScript type information for parameters and return values

Error handling examples

Complete example with actual document structure

Example of how to query with different similarity metrics
plugins/milvus/tests/index_test.ts (2)
31-34: Add documentation for client parameters.

The empty client parameters will cause most tests to be skipped. Consider adding:

A comment explaining how to configure these parameters

Instructions for setting up test credentials

Environment variable support for CI/CD
 const clientParams = {
-  address: '',
-  token: '',
+  address: process.env.MILVUS_ADDRESS || '',
+  token: process.env.MILVUS_TOKEN || '',
 };
+
+// Note: Tests requiring a Milvus connection will be skipped if MILVUS_ADDRESS is not set
+// To run all tests locally:
+// 1. Start a Milvus instance
+// 2. Set MILVUS_ADDRESS (e.g., 'localhost:19530')
+// 3. Set MILVUS_TOKEN if authentication is enabled
105-107: Strengthen assertions for search results.

The current assertion only checks the error code and array length. Consider adding more comprehensive checks:

Type validation for search results

Verification of result structure

Score threshold validation
-      assert(
-        result.status.error_code === 'Success' && result.results.length >= 0
-      );
+      assert(result.status.error_code === 'Success', 'Search should succeed');
+      assert(Array.isArray(result.results), 'Results should be an array');
+      assert(result.results.length > 0, 'Should return at least one result');
+      
+      // Verify result structure
+      const firstResult = result.results[0];
+      assert('score' in firstResult, 'Result should contain a score');
+      assert('id' in firstResult, 'Result should contain an id');
+      assert(firstResult.score >= 0 && firstResult.score <= 1, 'Score should be between 0 and 1');
plugins/milvus/src/index.ts (1)

297-305: Handle Sensitive Information in Error Logging

In the deleteMilvusCollection function, logging the error with console.error might expose sensitive information. Consider sanitizing the error message or logging less sensitive details to avoid potential information leakage.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 25df5a1 and de272c8.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (6)

plugins/milvus/README.md (1 hunks)
plugins/milvus/package.json (1 hunks)
plugins/milvus/src/index.ts (1 hunks)
plugins/milvus/tests/index_test.ts (1 hunks)
plugins/milvus/tsconfig.json (1 hunks)
plugins/milvus/tsup.config.ts (1 hunks)

✅ Files skipped from review due to trivial changes (3)

plugins/milvus/package.json
plugins/milvus/tsconfig.json
plugins/milvus/tsup.config.ts

🧰 Additional context used

🪛 LanguageTool

plugins/milvus/README.md

[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName: Specified database - clientParams: I...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database - clientParams`: If you're not running your Milvus serve...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions: Use this parameter to pass options to t...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (4)

plugins/milvus/README.md (1)

1-84: 🛠️ Refactor suggestion

Add essential documentation sections.

Consider adding the following sections to make the documentation more comprehensive:

Troubleshooting guide with common issues and solutions
Migration guide for users coming from similar vector database plugins
Performance considerations and best practices
Security considerations when connecting to remote Milvus instances

🧰 Tools

🪛 LanguageTool

[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName: Specified database - clientParams: I...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database - clientParams`: If you're not running your Milvus serve...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions: Use this parameter to pass options to t...

(UNLIKELY_OPENING_PUNCTUATION)

plugins/milvus/tests/index_test.ts (2)

1-28: LGTM! Proper license and imports.

The license header is complete, and the imports are appropriately structured using built-in Node.js test modules.

37-54: Verify vector dimensions match collection configuration.

The mock entries use 4-dimensional vectors, but there's no validation to ensure this matches the collection's configuration. Consider:

Adding a constant for the vector dimension
Validating mock data dimensions

+const VECTOR_DIMENSION = 4;
+
+// Validate vector dimensions
+const validateVectorDimension = (vector: number[]) => {
+  if (vector.length !== VECTOR_DIMENSION) {
+    throw new Error(`Vector dimension must be ${VECTOR_DIMENSION}`);
+  }
+};
+
 const mockEntries = [
   {
     vector: [0.1, 0.2, 0.3, 0.4],
     document: 'This is a test document',
     metadata: {

✅ Verification successful

Let me check the actual dimension configurations in these files to verify if there's consistency.

Let me check how this test file is using these mock entries and if there's any connection to the embedders.

Mock vector dimensions correctly match collection configuration

The mock entries with 4-dimensional vectors are correctly aligned with the collection configuration. The test file explicitly creates a Milvus collection with dimension: 4 and uses consistent 4-dimensional vectors in the mock data for testing insert and search operations. No changes are needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify vector dimension consistency across the codebase
rg -l "dimension.*4" --type ts

Length of output: 131

Script:

#!/bin/bash
# Check the context around dimension configurations
rg "dimension.*4" -B 2 -A 2 --type ts

Length of output: 1842

Script:

#!/bin/bash
# Get more context around the test file usage
cat plugins/milvus/tests/index_test.ts

Length of output: 3210

plugins/milvus/src/index.ts (1)

105-159: milvusRetriever Function Implementation Looks Good

The milvusRetriever function is well-implemented, correctly handles collection existence, and performs querying appropriately. Good job on integrating embedding and retrieval logic.

plugins/milvus/README.md

coderabbitai · 2024-11-07T07:04:29Z

plugins/milvus/tests/index_test.ts

+describe('Test Genkit Milvus Plugin', () => {
+  it('should create a retriever', async () => {
+    const retriever = milvusRetrieverRef({
+      collectionName: COLLECTION_NAME,
+    });
+    assert(retriever.name === `milvus/${COLLECTION_NAME}`);
+  });
+
+  it('should create an indexer', async () => {
+    const indexer = milvusIndexerRef({
+      collectionName: COLLECTION_NAME,
+    });
+    assert(indexer.name === `milvus/${COLLECTION_NAME}`);
+  });
+
+  if (clientParams.address) {
+    it('should create a collection', async () => {
+      const isCollectionExists = await hasMilvusCollection({
+        collectionName: COLLECTION_NAME,
+        clientParams,
+      });
+      if (isCollectionExists) {
+        console.log('Collection already exists');
+        assert(true);
+        return;
+      }
+      const result = await createMilvusCollection({
+        collectionName: COLLECTION_NAME,
+        dimension: 4,
+        clientParams,
+      });
+      assert(result.error_code === 'Success');
+    });
+
+    it('should insert data', async () => {
+      const result = await insertMilvusData({
+        collectionName: COLLECTION_NAME,
+        entries: mockEntries,
+        clientParams,
+      });
+      assert(result.insert_cnt === `${mockEntries.length}`);
+    });
+
+    it('should search data', async () => {
+      const result = await searchMilvusData({
+        collectionName: COLLECTION_NAME,
+        query: mockEntries[0].vector,
+        clientParams,
+      });
+      assert(
+        result.status.error_code === 'Success' && result.results.length >= 0
+      );
+    });
+
+    it('should delete a collection', async () => {
+      const result = await deleteMilvusCollection({
+        collectionName: COLLECTION_NAME,
+        clientParams,
+      });
+      assert(result.error_code === 'Success');
+    });
+  }
+});


🛠️ Refactor suggestion

Enhance test robustness and coverage.

The test suite could benefit from several improvements:

Add cleanup in an afterAll hook to ensure collection deletion

Add timeout handling for Milvus operations

Include negative test cases

Add more assertions for edge cases

Here's a suggested improvement:

describe('Test Genkit Milvus Plugin', () => { + // Set timeout for long-running operations + jest.setTimeout(30000); + + // Ensure cleanup after all tests + afterAll(async () => { + if (clientParams.address) { + try { + await deleteMilvusCollection({ + collectionName: COLLECTION_NAME, + clientParams, + }); + } catch (error) { + console.error('Cleanup failed:', error); + } + } + }); + // Existing tests... + + // Add negative test cases + if (clientParams.address) { + it('should handle invalid vector dimensions', async () => { + const invalidEntry = { + vector: [0.1, 0.2], // Wrong dimension + document: 'Invalid vector', + metadata: { text: 'Invalid vector', pos: 1 }, + }; + + await assert.rejects( + async () => { + await insertMilvusData({ + collectionName: COLLECTION_NAME, + entries: [invalidEntry], + clientParams, + }); + }, + /dimension/i + ); + }); + } });

Committable suggestion skipped: line range outside the PR's diff.

plugins/milvus/src/index.ts

coderabbitai · 2024-11-07T07:04:29Z

plugins/milvus/src/index.ts

+    address: configOrAddress,
+    token: '', // Default token is an empty string
+    username: '', // Default username is an empty string
+    password: '', // Default password is an empty string
+  };


⚠️ Potential issue

Avoid Hardcoding Empty Credentials

The getDefaultConfig function sets default credentials (token, username, password) to empty strings. This may pose security risks if default credentials are inadvertently used in production. Recommend requiring explicit credentials or throwing an error when they're not provided.

plugins/milvus/src/index.ts

coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (7)

plugins/milvus/README.md (3)

1-10: Add prerequisites and version compatibility information.

Consider enhancing the introduction section with:

Required Milvus version compatibility

Prerequisites (e.g., running Milvus instance, minimum Node.js version)

Link to Milvus installation guide for new users

54-84: Enhance usage examples.

Consider adding:

Error handling examples and common error scenarios

Search parameters documentation (e.g., top_k, search radius)

Batch processing examples for efficient indexing

Example of how to delete or update indexed documents

1-84: Add security and performance considerations.

Please add sections covering:

Security best practices:

Token management

Network security recommendations

Access control configuration

Performance optimization:

Connection pooling

Batch size recommendations

Index optimization guidelines

🧰 Tools

🪛 LanguageTool

[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName: Specified database - clientParams: I...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database - clientParams`: If you're not running your Milvus serve...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions: Use this parameter to pass options to t...

(UNLIKELY_OPENING_PUNCTUATION)
plugins/milvus/src/index.ts (4)
206-236: Add retry mechanism and timeout configuration.

Collection management operations could benefit from:

Retry mechanism for transient failures

Configurable timeout

Error handling with specific error types
 export async function createMilvusCollection(params: {
   collectionName: string;
   dimension: number;
   dbName?: string;
   clientParams?: MilvusClientParams;
+  timeout?: number;
+  maxRetries?: number;
 }) {
   const { clientParams, collectionName, dimension, dbName } = params;
   const milvus = getMilvusInstance(clientParams);

-  return milvus.createCollection({
-    collection_name: collectionName,
-    db_name: dbName,
-    dimension,
-    auto_id: true,
-    enable_dynamic_field: true,
-  });
+  const retries = params.maxRetries ?? 3;
+  let lastError;
+
+  for (let i = 0; i < retries; i++) {
+    try {
+      return await milvus.createCollection({
+        collection_name: collectionName,
+        db_name: dbName,
+        dimension,
+        auto_id: true,
+        enable_dynamic_field: true,
+        timeout: params.timeout,
+      });
+    } catch (error) {
+      lastError = error;
+      if (i < retries - 1) {
+        await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
+      }
+    }
+  }
+  throw new Error(`Failed to create collection after ${retries} attempts: ${lastError}`);
238-272: Add input validation and rate limiting.

Data operations could benefit from:

Input validation for entries and query parameters

Rate limiting mechanism

Metrics collection for monitoring

Consider implementing a rate limiter utility:
class RateLimiter {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;
  
  constructor(private rateLimit: number, private interval: number) {}
  
  async execute<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          resolve(await fn());
        } catch (error) {
          reject(error);
        }
      });
      this.process();
    });
  }
  
  private async process() {
    if (this.processing) return;
    this.processing = true;
    
    while (this.queue.length > 0) {
      const batch = this.queue.splice(0, this.rateLimit);
      await Promise.all(batch.map(fn => fn()));
      if (this.queue.length > 0) {
        await new Promise(resolve => setTimeout(resolve, this.interval));
      }
    }
    
    this.processing = false;
  }
}
274-291: Enhance error handling and cleanup.

The collection deletion could be improved with:

Specific error types

Cleanup of related resources

Logging of deletion attempts
 export async function deleteMilvusCollection(params: {
   collectionName: string;
   dbName?: string;
   clientParams?: MilvusClientParams;
 }) {
   const { clientParams, collectionName, dbName } = params;
   const milvus = getMilvusInstance(clientParams);

   try {
+    console.log(`Attempting to delete collection: ${collectionName}`);
+    
+    // Release collection resources first
+    await milvus.releaseCollection({
+      collection_name: collectionName,
+      db_name: dbName,
+    });
+
     return await milvus.dropCollection({
       collection_name: collectionName,
       db_name: dbName,
     });
   } catch (error) {
-    console.error('Failed to delete Milvus collection:', error);
+    if (error.message?.includes('collection not exists')) {
+      console.warn(`Collection ${collectionName} does not exist`);
+      return;
+    }
+    console.error(`Failed to delete collection ${collectionName}:`, error);
     throw error;
   }
 }
1-323: Consider implementing monitoring and observability.

To improve operational visibility, consider adding:

Prometheus metrics for operation latencies and error rates

OpenTelemetry tracing for debugging and performance analysis

Health check endpoints for the Milvus connection

Structured logging with correlation IDs

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between de272c8 and 80be313.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (6)

plugins/milvus/README.md (1 hunks)
plugins/milvus/package.json (1 hunks)
plugins/milvus/src/index.ts (1 hunks)
plugins/milvus/tests/index_test.ts (1 hunks)
plugins/milvus/tsconfig.json (1 hunks)
plugins/milvus/tsup.config.ts (1 hunks)

✅ Files skipped from review due to trivial changes (1)

plugins/milvus/package.json

🚧 Files skipped from review as they are similar to previous changes (3)

plugins/milvus/tests/index_test.ts
plugins/milvus/tsconfig.json
plugins/milvus/tsup.config.ts

🧰 Additional context used

🪛 LanguageTool

plugins/milvus/README.md

[uncategorized] ~35-~35: Loose punctuation mark.
Context: ...e three optional parameters: - dbName: Specified database - clientParams: I...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~37-~37: Loose punctuation mark.
Context: ...e: Specified database - clientParams`: If you're not running your Milvus serve...

(UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~48-~48: Loose punctuation mark.
Context: ...oken: "", } ``` - embedderOptions: Use this parameter to pass options to t...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (3)

plugins/milvus/src/index.ts (3)

1-40: LGTM! Well-structured imports and schema definitions.

The imports, schema definitions, and interface declarations are well-organized and properly typed.

42-84: LGTM! Clean plugin registration and reference implementations.

The plugin registration and reference implementations are well-structured and type-safe.

293-307: LGTM! Collection existence check implementation.

The implementation for checking collection existence is clean and well-structured.

coderabbitai · 2024-11-07T11:25:55Z

plugins/milvus/README.md

+      {
+        collectionName: 'collection_01',
+        embedder: textEmbeddingGecko,
+      },
+    ]),
+  ],
+  // ...
+});


🛠️ Refactor suggestion

Add technical configuration details.

The configuration section should include:

Required collection schema (dimension size, field types, etc.)

Collection creation behavior (auto-create vs. manual setup)

Performance-related settings (index type, metric type)

Example showing all possible configuration options

coderabbitai · 2024-11-07T11:25:56Z

plugins/milvus/src/index.ts

+/**
+ * Configures a Milvus vector store indexer.
+ */
+export function milvusIndexer<EmbedderCustomOptions extends z.ZodTypeAny>(
+  params: MilvusPluginOptions<EmbedderCustomOptions>
+) {
+  const { collectionName, embedder, embedderOptions, dbName } = params;
+  const milvusConfig = params.clientParams ?? getDefaultConfig();
+
+  return defineIndexer(
+    {
+      name: `milvus/${collectionName}`,
+      configSchema: MilvusIndexerOptionsSchema.optional(),
+    },
+    async (docs) => {
+      const embeddings = await Promise.all(
+        docs.map((doc) =>
+          embed({
+            embedder,
+            content: doc,
+            options: embedderOptions,
+          })
+        )
+      );
+
+      const entries = embeddings.map((value, i) => {
+        const metadata: RowData = {
+          ...docs[i].metadata,
+        };
+
+        return {
+          vector: value,
+          document: docs[i].text(),
+          metadata,
+        };
+      });
+
+      const isCollectionExists = await hasMilvusCollection({
+        collectionName,
+        dbName,
+        clientParams: milvusConfig,
+      });
+      if (!isCollectionExists) {
+        await createMilvusCollection({
+          dbName,
+          collectionName,
+          dimension: embeddings[0].length,
+          clientParams: milvusConfig,
+        });
+      }
+
+      await insertMilvusData({
+        collectionName: collectionName,
+        dbName,
+        entries,
+        clientParams: milvusConfig,
+      });
+    }
+  );
+}


🛠️ Refactor suggestion

Add batch processing and error handling.

The parallel processing of documents could benefit from:

Batch processing to prevent memory issues with large document sets

Validation for empty embeddings array

Error handling for failed embeddings

async (docs) => { + if (!docs.length) { + return; + } + + const BATCH_SIZE = 1000; + for (let i = 0; i < docs.length; i += BATCH_SIZE) { + const batch = docs.slice(i, i + BATCH_SIZE); const embeddings = await Promise.all( - docs.map((doc) => + batch.map((doc) => embed({ embedder, content: doc, options: embedderOptions, }) ) ); + if (!embeddings.length || !embeddings[0].length) { + throw new Error('Failed to generate embeddings'); + }

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai · 2024-11-07T11:25:56Z

plugins/milvus/src/index.ts

+/**
+ * Configures a Milvus vector store retriever.
+ */
+export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>(
+  params: MilvusPluginOptions<EmbedderCustomOptions>
+) {
+  const { collectionName, embedder, embedderOptions, dbName } = params;
+  const milvusConfig = params.clientParams ?? getDefaultConfig();
+
+  return defineRetriever(
+    {
+      name: `milvus/${collectionName}`,
+      configSchema: MilvusRetrieverOptionsSchema,
+    },
+    async (content, options) => {
+      const queryEmbeddings = await embed({
+        embedder,
+        content,
+        options: embedderOptions,
+      });
+
+      const isCollectionExists = await hasMilvusCollection({
+        collectionName,
+        dbName,
+        clientParams: milvusConfig,
+      });
+      if (!isCollectionExists) {
+        await createMilvusCollection({
+          dbName,
+          collectionName,
+          dimension: queryEmbeddings.length,
+          clientParams: milvusConfig,
+        });
+      }
+
+      const response = await searchMilvusData({
+        collectionName,
+        dbName,
+        query: queryEmbeddings,
+        limit: options.limit,
+        filter: JSON.stringify(options.filter),
+        topk: options.k,
+        clientParams: milvusConfig,
+      });
+
+      return {
+        documents: response.results
+          .map((result) => result['$meta'])
+          .filter(
+            (m): m is { document: string; metadata: Record<string, any> } => !!m
+          )
+          .map((m) => {
+            return Document.fromText(m.document, m.metadata).toJSON();
+          }),
+      };
+    }
+  );
+}


🛠️ Refactor suggestion

Add error handling and validation.

Consider adding:

Error handling for embedding generation

Validation for empty query embeddings

Validation that queryEmbeddings.length > 0 before creating collection

async (content, options) => { + if (!content) { + throw new Error('Content is required for embedding'); + } + const queryEmbeddings = await embed({ embedder, content, options: embedderOptions, }); + + if (!queryEmbeddings || queryEmbeddings.length === 0) { + throw new Error('Failed to generate embeddings'); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/**

* Configures a Milvus vector store retriever.

*/

export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>(

params: MilvusPluginOptions<EmbedderCustomOptions>

) {

const { collectionName, embedder, embedderOptions, dbName } = params;

const milvusConfig = params.clientParams ?? getDefaultConfig();

return defineRetriever(

{

name: `milvus/${collectionName}`,

configSchema: MilvusRetrieverOptionsSchema,

},

async (content, options) => {

const queryEmbeddings = await embed({

embedder,

content,

options: embedderOptions,

});

const isCollectionExists = await hasMilvusCollection({

collectionName,

dbName,

clientParams: milvusConfig,

});

if (!isCollectionExists) {

await createMilvusCollection({

dbName,

collectionName,

dimension: queryEmbeddings.length,

clientParams: milvusConfig,

});

}

const response = await searchMilvusData({

collectionName,

dbName,

query: queryEmbeddings,

limit: options.limit,

filter: JSON.stringify(options.filter),

topk: options.k,

clientParams: milvusConfig,

});

return {

documents: response.results

.map((result) => result['$meta'])

.filter(

(m): m is { document: string; metadata: Record<string, any> } => !!m

)

.map((m) => {

return Document.fromText(m.document, m.metadata).toJSON();

}),

};

}

);

}

/**

* Configures a Milvus vector store retriever.

*/

export function milvusRetriever<EmbedderCustomOptions extends z.ZodTypeAny>(

params: MilvusPluginOptions<EmbedderCustomOptions>

) {

const { collectionName, embedder, embedderOptions, dbName } = params;

const milvusConfig = params.clientParams ?? getDefaultConfig();

return defineRetriever(

{

name: `milvus/${collectionName}`,

configSchema: MilvusRetrieverOptionsSchema,

},

async (content, options) => {

if (!content) {

throw new Error('Content is required for embedding');

}

const queryEmbeddings = await embed({

embedder,

content,

options: embedderOptions,

});

if (!queryEmbeddings || queryEmbeddings.length === 0) {

throw new Error('Failed to generate embeddings');

}

const isCollectionExists = await hasMilvusCollection({

collectionName,

dbName,

clientParams: milvusConfig,

});

if (!isCollectionExists) {

await createMilvusCollection({

dbName,

collectionName,

dimension: queryEmbeddings.length,

clientParams: milvusConfig,

});

}

const response = await searchMilvusData({

collectionName,

dbName,

query: queryEmbeddings,

limit: options.limit,

filter: JSON.stringify(options.filter),

topk: options.k,

clientParams: milvusConfig,

});

return {

documents: response.results

.map((result) => result['$meta'])

.filter(

(m): m is { document: string; metadata: Record<string, any> } => !!m

)

.map((m) => {

return Document.fromText(m.document, m.metadata).toJSON();

}),

};

}

);

}

EPMatt

Hey @zhanshuyou, thanks for your contribution!

Looks LGTM to me. I'll take care of the release.

EPMatt · 2024-11-15T11:47:20Z

@all-contributors add @zhanshuyou as contributor for code, docs, tests

allcontributors · 2024-11-15T11:47:30Z

@EPMatt

I've put up a pull request to add @zhanshuyou! 🎉

davidoort

Thanks so much for these contributions!!

zhanshuyou requested review from davidoort and EPMatt as code owners November 7, 2024 06:54

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

zhanshuyou force-pushed the feat/milvus-plugin branch from de272c8 to 80be313 Compare November 7, 2024 11:18

coderabbitai bot reviewed Nov 7, 2024

View reviewed changes

EPMatt previously approved these changes Nov 15, 2024

View reviewed changes

EPMatt dismissed their stale review via 342c1f8 November 15, 2024 11:46

allcontributors bot mentioned this pull request Nov 15, 2024

docs: add zhanshuyou as a contributor for code, doc, and test #160

Merged

davidoort previously approved these changes Nov 15, 2024

View reviewed changes

EPMatt dismissed davidoort’s stale review via 9c0a78f November 15, 2024 13:34

zhanshuyou and others added 7 commits November 15, 2024 13:37

feat: milvus plugin

b473ddc

add plugin owner

4978a2a

add plugin link, standardize docs

a4053ac

add docs page for milvus plugin

6d7369f

fix lockfile

a48dda2

bump genkit peerdeps to 0.9.0

316975e

fix peerdeps, fix lockfile

Loading
Loading status checks…

a449367

EPMatt force-pushed the feat/milvus-plugin branch from 3bf84b7 to a449367 Compare November 15, 2024 13:45

EPMatt requested a review from davidoort November 15, 2024 13:49

davidoort approved these changes Nov 16, 2024

View reviewed changes

Merge branch 'main' into feat/milvus-plugin

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

Loading
Loading status checks…

d177064

EPMatt enabled auto-merge (squash) November 17, 2024 22:19

EPMatt merged commit 8403d97 into TheFireCo:main Nov 17, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: milvus plugin #156

feat: milvus plugin #156

zhanshuyou commented Nov 7, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 7, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Nov 7, 2024

coderabbitai bot Nov 7, 2024

coderabbitai bot left a comment

coderabbitai bot Nov 7, 2024

coderabbitai bot Nov 7, 2024

coderabbitai bot Nov 7, 2024

EPMatt left a comment

EPMatt commented Nov 15, 2024

allcontributors bot commented Nov 15, 2024

davidoort left a comment

feat: milvus plugin #156

feat: milvus plugin #156

Conversation

zhanshuyou commented Nov 7, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Nov 7, 2024 • edited Loading

Walkthrough

Changes

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

coderabbitai bot Nov 7, 2024

Choose a reason for hiding this comment

EPMatt left a comment

Choose a reason for hiding this comment

EPMatt commented Nov 15, 2024

allcontributors bot commented Nov 15, 2024

davidoort left a comment

Choose a reason for hiding this comment

zhanshuyou commented Nov 7, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 7, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)