Enhance metadata handling in GoogleVectorStore #11018

Merged

Conversation

alex-feel
Contributor

Description

This update enhances how the GoogleVectorStore class handles metadata during document retrieval. Users can now control whether custom metadata is included in query results and, if it is, which metadata keys are returned.

Type of Change

  • New feature (non-breaking change which adds functionality)

Changes:

  1. Metadata Inclusion Control: Added a boolean include_metadata parameter that specifies whether custom metadata is included in query results.

  2. Selective Metadata Filtering: Added a metadata_keys parameter that takes a list of metadata keys to include in the query results. Only the listed keys are returned, which is useful for reducing payload size or focusing on specific fields.

  3. Constructor and Factory Method Updates: The from_corpus factory method of the GoogleVectorStore class now accepts the include_metadata and metadata_keys parameters, so metadata handling can be configured when the instance is created.

  4. Query Method Enhancement: The query method now uses the include_metadata and metadata_keys attributes of the instance. It includes metadata in the query results only when the instance is configured to do so, and filters the included metadata by the specified keys.
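The filtering behavior described in items 1, 2, and 4 can be sketched as a small standalone function. This is only an illustration, not the code from this PR: the helper name `filter_metadata`, the default values, and the choice to return all metadata when `metadata_keys` is `None` are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional


def filter_metadata(
    metadata: Dict[str, Any],
    include_metadata: bool = False,
    metadata_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper illustrating the described behavior.

    Not part of GoogleVectorStore; defaults are assumptions.
    """
    # When metadata is not requested, return nothing.
    if not include_metadata:
        return {}
    # Assumption: no key list means "keep every key".
    if metadata_keys is None:
        return dict(metadata)
    # Keep only the requested keys; keys missing from the
    # metadata are silently skipped.
    return {k: v for k, v in metadata.items() if k in metadata_keys}


# Example call with made-up metadata values.
sample = {"file_name": "a.txt", "creation_date": "2024-02-20", "size": 10}
selected = filter_metadata(
    sample,
    include_metadata=True,
    metadata_keys=["file_name", "creation_date"],
)
```

In this sketch `include_metadata` acts as the outer switch and `metadata_keys` only narrows what is returned, which mirrors the order in which the description says the query method applies the two settings.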

Usage Example:

store = GoogleVectorStore.from_corpus(
    corpus_id="my-corpus-id",
    include_metadata=True,
    metadata_keys=['file_name', 'creation_date']
)

Together, these changes give users precise control over which metadata GoogleVectorStore returns, so the class can serve a wider range of use cases and data processing requirements.

How Has This Been Tested?

  • I stared at the code and made sure it makes sense
  • I tested it locally

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Feb 20, 2024
@alex-feel
Contributor Author

@logan-markewich, @nerdai, I'm not entirely certain if these changes align with your current plans and vision for the project structure. Could you please review and let me know if this should be integrated differently?

Collaborator

@logan-markewich logan-markewich left a comment

lgtm!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 20, 2024
@logan-markewich logan-markewich merged commit 79fce27 into run-llama:main Feb 20, 2024
8 checks passed
@alex-feel alex-feel deleted the add-metadata-retrieval-support branch February 20, 2024 22:00
Dominastorm pushed a commit to uptrain-ai/llama_index that referenced this pull request Feb 28, 2024
anoopshrma pushed a commit to anoopshrma/llama_index that referenced this pull request Mar 2, 2024
Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024
2 participants