Improved PGVector metadata filtering (no breaking changes) #12977

Closed


@bradfordben bradfordben commented Nov 7, 2023

Added more complex filtering of metadata to PGVector

Example filters:

{"column":"value"}
will result in:
WHERE langchain_pg_embedding.collection_id = 'xxxxx'::uuid::UUID AND (langchain_pg_embedding.cmetadata ->> 'column') = 'value'

{"column": {"in": ["value1", "value2"]}}
will result in:
WHERE langchain_pg_embedding.collection_id = 'xxxxx'::uuid::UUID AND (langchain_pg_embedding.cmetadata ->> 'column') IN ('value1', 'value2')

{"and":[
    {"or":[
        {"column1": "value1"},
        {"column2": "value2"}
    ]},
    {"or":[
        {"column3": "value3"},
        {"column3": {"like": "value4%"}}
    ]}
]}
will result in:
WHERE langchain_pg_embedding.collection_id = 'xxxxx'::uuid::UUID
    AND ((langchain_pg_embedding.cmetadata ->> 'column1') = 'value1'
        OR (langchain_pg_embedding.cmetadata ->> 'column2') = 'value2')
    AND ((langchain_pg_embedding.cmetadata ->> 'column3') = 'value3'
        OR langchain_pg_embedding.cmetadata ->> 'column3' LIKE 'value4%')
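
For readers who want to see how a nested filter of this shape can map onto a WHERE fragment, here is a minimal, self-contained sketch. It is not the code in this PR: `build_clause` and `_comparison` are hypothetical names, the `%s` placeholders assume a psycopg-style driver, the bare `cmetadata` column stands in for `langchain_pg_embedding.cmetadata`, and joining several top-level keys with AND is an assumption. Only the operators shown in the examples above (plain equality, "in", "like", "and", "or") are handled.

```python
from typing import Any


def _comparison(key: str, op: str, operand: Any, params: list[Any]) -> str:
    """Render one field comparison; the key and values go into params, not the SQL."""
    params.append(key)
    field = "(cmetadata ->> %s)"
    if op == "eq":
        params.append(operand)
        return f"{field} = %s"
    if op == "in":
        if not operand:
            raise ValueError("'in' needs at least one value")
        params.extend(operand)
        marks = ", ".join(["%s"] * len(operand))
        return f"{field} IN ({marks})"
    if op == "like":
        params.append(operand)
        return f"{field} LIKE %s"
    raise ValueError(f"unsupported operator: {op!r}")


def build_clause(flt: dict[str, Any], params: list[Any]) -> str:
    """Recursively turn a filter dict into a SQL fragment plus bind values."""
    parts: list[str] = []
    for key, value in flt.items():
        if key in ("and", "or"):
            # value is a list of nested filter dicts
            inner = [build_clause(sub, params) for sub in value]
            parts.append("(" + f" {key.upper()} ".join(inner) + ")")
        elif isinstance(value, dict):
            # {"column": {"in": [...]}} or {"column": {"like": "..."}}
            for op, operand in value.items():
                parts.append(_comparison(key, op, operand, params))
        else:
            # {"column": "value"} is shorthand for equality
            parts.append(_comparison(key, "eq", value, params))
    # Assumption: several keys in one dict are combined with AND.
    return " AND ".join(parts)


params: list[Any] = []
sql = build_clause({"column": {"in": ["value1", "value2"]}}, params)
print(sql)     # (cmetadata ->> %s) IN (%s, %s)
print(params)  # ['column', 'value1', 'value2']
```

Passing the key and values as bind parameters rather than interpolating them keeps user-supplied metadata out of the SQL text; the collection_id condition from the examples would be added separately by the caller.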


@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🤖:improvement Medium size change to existing code to handle new use-cases labels Nov 7, 2023
@raghav-knowbe4

Any update on this PR? Our team is unable to use langchain with PGVector due to its lack of support for "OR" filters.
Having advanced metadata filtering like that in Pinecone/Qdrant would really help.

Thanks

@bradfordben
Author

@raghav-knowbe4 The PR is ready for review by the langchain developers. I'm not sure what the next steps are to get it reviewed.

I can change the format to match Pinecone's, but the issue is that "in" was already added with the format "in", so I just followed that to keep things consistent within the module. It would be good to have a standard across all the different data modules in langchain, but that's complicated because it would be a breaking change for some modules.
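
To illustrate the naming difference: Pinecone writes its operators with a `$` prefix (`$in`, `$or`), while the examples in this PR use bare keys ("in", "or"). A minimal sketch of a key-renaming shim follows; `strip_dollar` is a hypothetical helper, not part of this PR or of langchain. It only renames keys and makes no claim that every Pinecone operator has an equivalent here.

```python
from typing import Any


def strip_dollar(node: Any) -> Any:
    """Recursively rename "$op" keys to "op"; lists and scalars pass through."""
    if isinstance(node, dict):
        return {
            (key[1:] if key.startswith("$") else key): strip_dollar(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_dollar(item) for item in node]
    return node


pinecone_style = {"$or": [{"column1": {"$in": ["a", "b"]}}, {"column2": "c"}]}
print(strip_dollar(pinecone_style))
# {'or': [{'column1': {'in': ['a', 'b']}}, {'column2': 'c'}]}
```

One limitation of this approach is that a metadata field whose own name begins with `$` would be renamed as well.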

@eyurtsev
Collaborator

Closing in favor of: #18992

@eyurtsev eyurtsev closed this Mar 13, 2024
@eyurtsev
Collaborator

@bradfordben Thank you for the contribution, and sorry it took so long to review the PR.

I made some larger changes to get the filter application working well for Postgres.
