
community[minor]: Revamp PGVector Filtering #18992

Merged
merged 22 commits on Mar 14, 2024


eyurtsev
Collaborator

@eyurtsev eyurtsev commented Mar 12, 2024

This PR makes the following updates to the PGVector vector store:

  1. Uses a JSONB column for metadata instead of JSON
  2. Updates the filter syntax to require a `$` prefix on operators (e.g. `$eq`, `$gt`); without the prefix, operator names can collide with metadata field names
  3. Keeps the change non-breaking: the old behavior remains the default, but it now emits a deprecation warning
  4. Fixes comparison bugs in the previous implementation: values were cast to text, so numeric fields were compared lexically (e.g. "10" sorts before "9")
  5. Adds a GIN index on the JSONB column for more efficient querying
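Items 2 and 4 can be made concrete with a small in-memory evaluator. This is a hypothetical sketch, not code from this PR: the operator names (`$eq`, `$ne`, `$lt`, `$lte`, `$gt`, `$gte`, `$in`, `$nin`, `$and`, `$or`) follow the `$`-prefixed syntax described above, but `matches` and `COMPARATORS` are illustrative names, and the real vector store translates filters into SQL over the JSONB column rather than evaluating them in Python. Treating a bare value as equality is also an assumption of the sketch.

```python
from typing import Any, Callable, Dict, List

# Hypothetical sketch: evaluates `$`-prefixed filters against an in-memory
# metadata dict. NOT the PGVector implementation; it only shows the semantics.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every clause in `flt`."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key.startswith("$"):
            raise ValueError(f"Unsupported logical operator: {key}")
        elif key not in metadata:
            # In this sketch a missing field never matches.
            return False
        elif isinstance(condition, dict):
            # Field clause such as {"page": {"$gt": 9}}. The `$` prefix marks
            # operators, so a field literally named "eq" is not mistaken for one.
            for op, operand in condition.items():
                if op not in COMPARATORS:
                    raise ValueError(f"Unsupported operator: {op}")
                if not COMPARATORS[op](metadata[key], operand):
                    return False
        elif metadata[key] != condition:
            # A bare value is treated here as shorthand for equality.
            return False
    return True


docs: List[Dict[str, Any]] = [{"id": 1, "page": 9}, {"id": 2, "page": 10}]
# Native numeric comparison: only page 10 is greater than 9.
print([d["id"] for d in docs if matches(d, {"page": {"$gt": 9}})])  # [2]
# Comparing the same values as text (the old behavior) gets it wrong:
print("10" > "9")  # False, because "1" sorts before "9"
```

Comparing the stored JSON numbers as numbers, rather than as their text form, is what removes the lexical-ordering bug described in item 4.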

vercel bot commented Mar 12, 2024

The latest updates on your projects.

1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC)
langchain | ⬜️ Ignored | Visit Preview | | Mar 14, 2024 7:55pm

@eyurtsev eyurtsev marked this pull request as ready for review March 13, 2024 19:56
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. Ɑ: vector store Related to vector store module 🔌: postgres Related to postgres integrations 🤖:improvement Medium size change to existing code to handle new use-cases labels Mar 13, 2024
@eyurtsev eyurtsev changed the title PGVector filtering community[minor]: Revamp PGVector Filtering Mar 13, 2024
@@ -142,6 +210,20 @@ def _results_to_docs(docs_and_scores: Any) -> List[Document]:
    return [doc for doc, _ in docs_and_scores]


def _sanitized_double_quoted_string(string: str) -> str:
Collaborator Author

@nfcampos could you give this a second look? This is dangerous code.
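The diff excerpt shows only the helper's signature, so its body is not reproduced here. To illustrate why hand-quoting strings is risky, the sketch below uses a hypothetical helper, `quote_jsonpath_string` (not the PR's code), that escapes a value for use inside a double-quoted JSONPath string literal, assuming the literal uses JSON-style backslash escapes. Order matters: backslashes must be escaped before double quotes, otherwise the backslash added in front of each quote would itself be escaped, leaving the quote unescaped. Where possible, passing values as bound variables (for example through the `vars` argument of PostgreSQL's `jsonb_path_match`) avoids building the path from user input at all.

```python
import json


def quote_jsonpath_string(value: str) -> str:
    """Hypothetical helper: wrap `value` in double quotes for a JSONPath literal.

    Escapes backslashes first, then double quotes. Control characters are
    rejected rather than escaped to keep the sketch small.
    """
    if any(ord(ch) < 0x20 for ch in value):
        raise ValueError("control characters are not supported in this sketch")
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# An attempted break-out stays inside the literal:
literal = quote_jsonpath_string('x" || true || "')
print(literal)  # "x\" || true || \""
# JSON uses the same two escapes, so a JSON decoder can check the round trip.
assert json.loads(literal) == 'x" || true || "'
```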

@eyurtsev
Collaborator Author


======================= 74 passed, 19 warnings in 3.02s ========================
Process finished with exit code 0

@eyurtsev
Collaborator Author

Resolves the following issues:

#12977
#11897
#17067

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 14, 2024
@eyurtsev eyurtsev merged commit 6cdca43 into master Mar 14, 2024
59 checks passed
@eyurtsev eyurtsev deleted the eugene/pgvector_filtering branch March 14, 2024 20:56
rahul-trip pushed a commit to daxa-ai/langchain that referenced this pull request Mar 27, 2024
bechbd pushed a commit to bechbd/langchain that referenced this pull request Mar 29, 2024
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
hinthornw pushed a commit that referenced this pull request Apr 26, 2024
@mlucool

mlucool commented May 6, 2024

Hi.

As noted in #19681, this is a breaking change. Can it be made non-breaking with respect to SQLAlchemy versions? LangChain currently declares SQLAlchemy = ">=1.4,<3".

Can this be made backward compatible and/or reverted?
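The concern is a gap between the declared dependency range and what the new code path actually needs. A minimal sketch of that check follows. It assumes, hypothetically, that the new filtering code requires SQLAlchemy 2.x (the comment does not name the missing feature), and the simplified matcher ignores pre-release and local version tags; in practice the `packaging` library's `SpecifierSet` is the right tool.

```python
import operator
import re
from typing import Tuple

# Comparison operators for the simplified matcher.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_CLAUSE = re.compile(r"^\s*(>=|<=|==|!=|>|<)\s*(\S+)\s*$")


def parse_release(text: str) -> Tuple[int, ...]:
    """Parse the numeric release part, e.g. "2.0.29" -> (2, 0, 29)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against a comma-separated spec such as ">=1.4,<3"."""
    current = parse_release(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.group(1), parse_release(match.group(2))
        # Pad with zeros so (3,) and (3, 0, 0) compare as equal.
        width = max(len(current), len(bound))
        lhs = current + (0,) * (width - len(current))
        rhs = bound + (0,) * (width - len(bound))
        if not _OPS[op](lhs, rhs):
            return False
    return True


DECLARED = ">=1.4,<3"  # range quoted in the comment above
ASSUMED_NEEDED = ">=2"  # hypothetical requirement of the new code path
for installed in ("1.4.52", "2.0.29"):
    print(installed, satisfies(installed, DECLARED), satisfies(installed, ASSUMED_NEEDED))
# 1.4.52 True False   <- allowed by the declared range, but would break
# 2.0.29 True True
```

Under that assumption, any installed version that passes the declared range but fails the requirement of the new code is exactly the breakage the comment describes.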
