[Question]: When searching a vector index, how can results that do not meet the similarity threshold be excluded? #13571

Open
1 task done
union-cmd opened this issue May 18, 2024 · 1 comment
Labels
question Further information is requested

Comments

@union-cmd

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

def rag(self, memory, index, prompt, k=1):
    chat_engine = index.as_chat_engine(
        chat_mode="condense_plus_context",
        memory=memory,
        temperature=0.9,
        similarity_top_ps=0.9,
        similarity_top_k=k,
    )
    result = self.transform_str(chat_engine.chat(prompt))
    return result

This is my code. When searching the index, I want retrieved results that do not meet the similarity threshold to be excluded.

@union-cmd union-cmd added the question Further information is requested label May 18, 2024

dosubot bot commented May 18, 2024

To exclude results that do not meet your criteria when searching the index, you can use metadata filters. The SimpleVectorStore in LlamaIndex supports filtering on node metadata at query time. Here's a concise guide on how to implement this:

  1. Initialize the SimpleVectorStore and add your data to it. This store will be used to perform queries with filters.

  2. Define your query embedding and the number of top results (similarity_top_k) you wish to retrieve.

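  3. Apply metadata filters to restrict which nodes can be returned. You can use MetadataFilters and ExactMatchFilter to specify these filters. An ExactMatchFilter keeps only nodes whose metadata field equals the given value, for example ExactMatchFilter(key="your_filter_key", value="your_filter_value"). These filters act on stored metadata, not on the similarity score computed at query time, so on their own they cannot express a cutoff such as "score of at least 0.5".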

  4. Perform the query with the specified filters using VectorStoreQuery. The store then returns only nodes that satisfy the filters, up to similarity_top_k of them.
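
To make the steps above concrete, here is a minimal plain-Python sketch of what an exact-match metadata filter followed by a top-k ranking does. It does not use LlamaIndex at all; the `NODES` data, the `lang` key, and the helper names are hypothetical and exist only for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy records: (node id, embedding, metadata)
NODES = [
    ("a", [1.0, 0.0], {"lang": "en"}),
    ("b", [0.9, 0.1], {"lang": "de"}),
    ("c", [0.0, 1.0], {"lang": "en"}),
]


def query_with_exact_match(nodes, query_embedding, key, value, top_k):
    # Step 1: keep only nodes whose metadata[key] equals value.
    candidates = [n for n in nodes if n[2].get(key) == value]
    # Step 2: score the survivors against the query embedding.
    scored = [(node_id, cosine(emb, query_embedding)) for node_id, emb, _ in candidates]
    # Step 3: return the top_k best-scoring survivors.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


hits = query_with_exact_match(NODES, [1.0, 0.0], key="lang", value="en", top_k=2)
print(hits)
```

Note that node "c" is returned even though its similarity to the query is zero: the metadata filter decides which nodes are eligible, and top-k only ranks them. Neither step drops a node for having a low score.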

Here's a simplified example to illustrate the process:

from llama_index.core.vector_stores import SimpleVectorStore
from llama_index.core.vector_stores.types import (
    ExactMatchFilter,
    MetadataFilters,
    VectorStoreQuery,
)

simple_vector_store = SimpleVectorStore()
# Add your data to the store here

query_embedding = [1.0, 1.0]  # Replace with your actual query embedding
similarity_top_k = 10  # Number of top results to retrieve

filters = MetadataFilters(filters=[
    ExactMatchFilter(key="your_filter_key", value="your_filter_value")
])

query = VectorStoreQuery(
    query_embedding=query_embedding,
    filters=filters,
    similarity_top_k=similarity_top_k
)
result = simple_vector_store.query(query)

print(result.ids)

This approach ensures that only nodes whose metadata matches your specified criteria are included in the search results, filtering out the rest.
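
Since the original question is about a score threshold rather than a metadata match, the following plain-Python sketch shows what a similarity cutoff applied after top-k retrieval looks like. It is an illustration of the idea under assumed toy data, not LlamaIndex code; `retrieve_with_cutoff`, `NODES`, and the cutoff values are hypothetical. In LlamaIndex itself, this step is typically done with a node postprocessor; `SimilarityPostprocessor` and its `similarity_cutoff` argument are the usual place to look, but that name does not come from this thread, so verify it against your installed version.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy records: (node id, embedding)
NODES = [
    ("a", [1.0, 0.0]),
    ("b", [0.6, 0.8]),
    ("c", [0.0, 1.0]),
]


def retrieve_with_cutoff(nodes, query_embedding, top_k, cutoff):
    # Rank every node by similarity to the query.
    scored = sorted(
        ((node_id, cosine(emb, query_embedding)) for node_id, emb in nodes),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Take the top_k candidates, then drop any whose score is below the cutoff.
    return [(node_id, score) for node_id, score in scored[:top_k] if score >= cutoff]


strict = retrieve_with_cutoff(NODES, [1.0, 0.0], top_k=3, cutoff=0.9)
loose = retrieve_with_cutoff(NODES, [1.0, 0.0], top_k=3, cutoff=0.5)
print([node_id for node_id, _ in strict])
print([node_id for node_id, _ in loose])
```

With a cutoff in place, the result can contain fewer than top_k items, or none at all, so the calling code should handle an empty context.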
