DOC-2544: Adding new doctest to support updated VSS article #2886

Merged
merged 114 commits into from Aug 31, 2023

Commits
114 commits
5e258a1
Add missing `Union` type in method `StreamCommands.xclaim()` (#2553)
ant1fact Jan 22, 2023
e39c7ba
Simplify the sync SocketBuffer, add type hints (#2543)
kristjanvalur Jan 22, 2023
42604b6
trivial typo fix (#2566)
rbowen Jan 29, 2023
9e6a9b5
Fix unlink in cluster pipeline (#2562)
gmbnomis Jan 29, 2023
428d609
Fix issue 2540: Synchronise concurrent command calls to single-client…
Vivanov98 Jan 29, 2023
31a1c0b
Fix: tuple function cannot be passed more than one argument (#2573)
kosuke-zhang Feb 6, 2023
ffbe879
Use hiredis::pack_command to serialized the commands. (#2570)
prokazov Feb 6, 2023
9e00b91
Fix issue 2567: NoneType check before raising exception (#2569)
SoulPancake Feb 6, 2023
e7306aa
Fix issue 2349: Let async HiredisParser finish parsing after a Connec…
kristjanvalur Feb 6, 2023
fcd8f98
Add TS.MGET example for OS Redis Cluster (#2507)
uglide Feb 7, 2023
f517287
Fix issue with `pack_commands` returning an empty byte sequence (#2416)
jmcbailey Feb 7, 2023
5cb5712
Version 4.5.0 (#2580)
dvora-h Feb 7, 2023
2b470cb
Fix #2581 UnixDomainSocketConnection' object has no attribute '_comma…
prokazov Feb 8, 2023
fd7a79d
Version 4.5.1 (#2586)
dvora-h Feb 8, 2023
e9ad2a3
Fix for `lpop` and `rpop` return typing (#2590)
Galtozzy Feb 15, 2023
6c708c2
Update README to make pip install copy-pastable on zsh (#2584)
uglide Feb 19, 2023
b546a9a
update json().arrindex() default values (#2611)
davemcphee Mar 15, 2023
5588ae0
Speeding up the protocol parsing (#2596)
chayim Mar 15, 2023
3edd49b
Fixed CredentialsProvider examples (#2587)
barshaul Mar 15, 2023
6d1061f
ConnectionPool SSL example (#2605)
CrimsonGlory Mar 15, 2023
a372ba4
[types] update return type of smismember to list[int] (#2617)
ryin1 Mar 15, 2023
8bfd492
Making search document subscriptable (#2615)
aksinha334 Mar 15, 2023
91ab12a
Remove redundant assignment. (#2620)
thebarbershop Mar 16, 2023
25e85e5
fix: replace async_timeout by asyncio.timeout (#2602)
sileht Mar 16, 2023
c61eeb2
Adding supported redis/library details (#2621)
chayim Mar 16, 2023
d63313b
add queue_class to REDIS_ALLOWED_KEYS (#2577)
zakaf Mar 16, 2023
c871723
pypy-3.9 CI (#2608)
chayim Mar 16, 2023
7d474f9
introduce AbstractConnection so that UnixDomainSocketConnection can c…
woutdenolf Mar 16, 2023
1b2f408
Fix behaviour of async PythonParser to match RedisParser as for issue…
kristjanvalur Mar 16, 2023
318b114
Version 4.5.2 (#2627)
dvora-h Mar 20, 2023
66a4d6b
AsyncIO Race Condition Fix (#2641)
chayim Mar 22, 2023
4802530
fix: do not use asyncio's timeout lib before 3.11.2 (#2659)
bellini666 Mar 27, 2023
4856813
UnixDomainSocketConnection missing constructor argument (#2630)
woutdenolf Mar 27, 2023
326bb1c
removing useless files (#2642)
chayim Mar 27, 2023
6d886d7
Fix issue 2660: PytestUnraisableExceptionWarning from asycio client (…
shacharPash Mar 28, 2023
5acbde3
Fixing cancelled async futures (#2666)
chayim Mar 29, 2023
ef3f086
Fix async (#2673)
dvora-h Mar 29, 2023
e1017fd
Version 4.5.4 (#2674)
dvora-h Mar 29, 2023
7ae8464
Really do not use asyncio's timeout lib before 3.11.2 (#2699)
mirekdlugosz Apr 13, 2023
6a4240b
asyncio: Fix memory leak caused by hiredis (#2693) (#2694)
oranav Apr 13, 2023
db9a85c
Update example of Redisearch creating index (#2703)
mzdehbashi-github Apr 13, 2023
7fc4c76
Improving Vector Similarity Search Example (#2661)
tylerhutcherson Apr 13, 2023
d6bb457
Fix incorrect usage of once flag in async Sentinel (#2718)
felipou Apr 27, 2023
fddd3d6
Fix topk list example. (#2724)
AYMENJD Apr 27, 2023
8e0b84d
Improve error output for master discovery (#2720)
scoopex Apr 27, 2023
8b58ebb
return response in case of KeyError (#2628)
shacharPash Apr 30, 2023
bf528fc
Add WITHSCORES to ZREVRANK Command (#2725)
shacharPash Apr 30, 2023
1ca223a
Fix `ClusterCommandProtocol` not itself being marked as a protocol (#…
Avasam May 1, 2023
ac15d52
Fix potential race condition during disconnection (#2719)
Anthchirp May 1, 2023
a7857e1
add "address_remap" feature to RedisCluster (#2726)
kristjanvalur May 2, 2023
e52fd67
nermina changes from NRedisStack (#2736)
shacharPash May 2, 2023
6d32503
Updated AWS Elasticache IAM Connection Example (#2702)
NickG123 May 3, 2023
ffb2b83
pinning urllib3 to fix CI (#2748)
chayim May 7, 2023
3748a8b
Add RedisCluster.remap_host_port, Update tests for CWE 404 (#2706)
kristjanvalur May 7, 2023
906e413
Update redismodules.rst (#2747)
cristianmatache May 8, 2023
cfdcfd8
Add support for cluster myshardid (#2704)
SoulPancake May 8, 2023
9370711
clean warnings (#2731)
dvora-h May 8, 2023
093232d
fix parse_slowlog_get (#2732)
dvora-h May 8, 2023
c0833f6
Optionally disable disconnects in read_response (#2695)
kristjanvalur May 8, 2023
8c06d67
Add client no-touch (#2745)
aciddust May 8, 2023
984b733
fix create single_connection_client from url (#2752)
dvora-h May 8, 2023
4a4566b
Fix `xadd` allow non negative maxlen (#2739)
aciddust May 8, 2023
f056118
Version 4.5.5 (#2753)
dvora-h May 8, 2023
35b7e09
Kristjan/issue #2754: Add missing argument to SentinelManagedConnecti…
kristjanvalur May 10, 2023
2d9b5ac
support JSON.MERGE Command (#2761)
shacharPash May 16, 2023
db7b9dd
Issue #2749: Remove unnecessary __del__ handlers (#2755)
kristjanvalur May 28, 2023
d95d8a2
Add WITHSCORE to ZRANK (#2758)
bodevone May 28, 2023
4d396f8
Fix JSON.MERGE Summary (#2786)
shacharPash Jun 17, 2023
3cdecc1
Fixed key error in parse_xinfo_stream (#2788)
Smit-Parmar Jun 19, 2023
29dfbb2
insert newline to prevent sphinx from assuming code block (#2796)
bmacphee Jun 20, 2023
2bb7f10
Introduce OutOfMemoryError exception for Redis write command rejectio…
bmacphee Jun 20, 2023
53bed27
Add unit tests for the `connect` method of all Redis connection class…
woutdenolf Jun 23, 2023
4f466d6
Fix dead weakref in sentinel connection causing ReferenceError (#2767…
shahar-lev Jun 23, 2023
abc04b5
chore(documentation): fix redirects and some small cleanups (#2801)
vmihailenco Jun 23, 2023
cecf78b
Add waitaof (#2760)
aciddust Jun 23, 2023
40a769e
Extract abstract async connection class (#2734)
woutdenolf Jun 23, 2023
d25a96b
Fix type hint for retry_on_error in async cluster (#2804)
TheKevJames Jun 23, 2023
04aadd7
Fix CI (#2809)
dvora-h Jun 25, 2023
ab617a1
Support JSON.MSET Command (#2766)
shacharPash Jun 25, 2023
9f50357
Version 4.6.0 (#2810)
dvora-h Jun 25, 2023
2732a85
Merge 5.0 to master (#2849)
dvora-h Jul 16, 2023
2c2860d
Change cluster docker to edge and enable debug command (#2853)
chayim Jul 26, 2023
8e5d5ce
Fix socket garbage collection (#2859)
kristjanvalur Jul 31, 2023
471f860
Fixing doc builds (#2869)
chayim Aug 2, 2023
a49e656
RESP3 connection examples (#2863)
chayim Aug 2, 2023
dc62e19
EOL for Python 3.7 (#2852)
chayim Aug 2, 2023
7d70c91
Fix a duplicate word in `CONTRIBUTING.md` (#2848)
kurtmckee Aug 2, 2023
66bad8e
Add sync modules (except search) tests to cluster CI (#2850)
dvora-h Aug 3, 2023
da27f4b
Fix timeout retrying on Redis pipeline execution (#2812)
pall-j Aug 3, 2023
3e50d28
Fix type hints in SearchCommands (#2817)
JoanFM Aug 6, 2023
8370c4a
Add a Dependabot config to auto-update GitHub action versions (#2847)
kurtmckee Aug 7, 2023
38c7de6
Dependabot label change (#2880)
chayim Aug 8, 2023
0ed8077
Bump pypa/gh-action-pip-audit from 1.0.0 to 1.0.8 (#2879)
dependabot[bot] Aug 8, 2023
673617d
Bump actions/upload-artifact from 2 to 3 (#2877)
dependabot[bot] Aug 8, 2023
a532f89
Add py.typed in accordance with PEP-561 (#2738)
zmievsa Aug 8, 2023
b0abd55
RESP 3 feature documentation (#2872)
chayim Aug 8, 2023
d5c2d1d
Adding support for triggered functions (TFUNCTION) (#2861)
shacharPash Aug 8, 2023
f121cf2
Add support for `CLIENT SETINFO` (#2857)
dvora-h Aug 9, 2023
2f67926
Version 5.0.0 (#2874)
chayim Aug 9, 2023
4e4ff48
DOC-2544: Adding new doctest to support updated VSS article
dwdougherty Aug 9, 2023
28cc65c
Updating all client licenses to clearly be MIT (#2884)
chayim Aug 10, 2023
e680924
DOC-2554: update import order
dwdougherty Aug 11, 2023
b3a92c4
DOC-2544: update formatting
dwdougherty Aug 11, 2023
b42d19a
DOC-2544: Update import (again)
dwdougherty Aug 14, 2023
724807a
Merge branch 'redis:master' into doc-2544
dwdougherty Aug 16, 2023
d23058a
DOC-2544: one more attempt using Chayims advice
dwdougherty Aug 16, 2023
b8372bd
lint fixes
chayim Aug 31, 2023
5f50fdc
and a reqs file
chayim Aug 31, 2023
4016a67
another missing requirement
chayim Aug 31, 2023
ce0f076
and sentence transformers
chayim Aug 31, 2023
d5b42af
and the optional, unlisted dependency tabulate. Thanks conda
chayim Aug 31, 2023
8dde72a
Updating README for doctests howto
chayim Aug 31, 2023
30c1179
align isort with black
chayim Aug 31, 2023
894a4b6
typo
chayim Aug 31, 2023
309 changes: 309 additions & 0 deletions doctests/search_vss.py
@@ -0,0 +1,309 @@
# EXAMPLE: search_vss
# STEP_START imports
import json
import time

import numpy as np
import pandas as pd
import requests
from sentence_transformers import SentenceTransformer

import redis
from redis.commands.search.field import (
    NumericField,
    TagField,
    TextField,
    VectorField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# STEP_END

# STEP_START get_data
url = "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json"
response = requests.get(url)
bikes = response.json()
# STEP_END
# REMOVE_START
assert bikes[0]["model"] == "Jigger"
# REMOVE_END

# STEP_START dump_data
json.dumps(bikes[0], indent=2)
# STEP_END

# STEP_START connect
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
# STEP_END

# STEP_START connection_test
res = client.ping()
# >>> True
# STEP_END
# REMOVE_START
assert res is True
# REMOVE_END

# STEP_START load_data
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    redis_key = f"bikes:{i:03}"
    pipeline.json().set(redis_key, "$", bike)
res = pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
# STEP_END
# REMOVE_START
assert res == [True, True, True, True, True, True, True, True, True, True, True]
# REMOVE_END

# STEP_START get
res = client.json().get("bikes:010", "$.model")
# >>> ['Summit']
# STEP_END
# REMOVE_START
assert res == ["Summit"]
# REMOVE_END

# STEP_START get_keys
keys = sorted(client.keys("bikes:*"))
# >>> ['bikes:001', 'bikes:002', ..., 'bikes:011']
# STEP_END
# REMOVE_START
assert keys[0] == "bikes:001"
# REMOVE_END

# STEP_START generate_embeddings
descriptions = client.json().mget(keys, "$.description")
descriptions = [item for sublist in descriptions for item in sublist]
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
# >>> 768
# STEP_END
# REMOVE_START
assert VECTOR_DIMENSION == 768
# REMOVE_END
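
# NOTE (illustrative sketch, not part of the original article steps):
# FT.SEARCH expects vector query parameters as raw little-endian float32
# bytes, which is why the query code later in this file packs each vector
# with np.array(..., dtype=np.float32).tobytes(). A quick round trip:
example_vector = np.array([0.1, 0.2, 0.3], dtype=np.float32)
packed = example_vector.tobytes()  # 3 float32 values -> 12 bytes
unpacked = np.frombuffer(packed, dtype=np.float32)
# len(packed) == 12 and np.array_equal(unpacked, example_vector)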

# STEP_START load_embeddings
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
# >>> [True, True, True, True, True, True, True, True, True, True, True]
# STEP_END

# STEP_START dump_example
res = client.json().get("bikes:010")
# >>>
# {
#     "model": "Summit",
#     "brand": "nHill",
#     "price": 1200,
#     "type": "Mountain Bike",
#     "specs": {
#         "material": "alloy",
#         "weight": "11.3"
#     },
#     "description": "This budget mountain bike from nHill performs well...",
#     "description_embeddings": [
#         -0.538114607334137,
#         -0.49465855956077576,
#         -0.025176964700222015,
#         ...
#     ]
# }
# STEP_END
# REMOVE_START
assert len(res["description_embeddings"]) == 768
# REMOVE_END

# STEP_START create_index
schema = (
    TextField("$.model", no_stem=True, as_name="model"),
    TextField("$.brand", no_stem=True, as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {
            "TYPE": "FLOAT32",
            "DIM": VECTOR_DIMENSION,
            "DISTANCE_METRIC": "COSINE",
        },
        as_name="vector",
    ),
)
definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
res = client.ft("idx:bikes_vss").create_index(
    fields=schema, definition=definition
)
# >>> 'OK'
# STEP_END
# REMOVE_START
assert res == "OK"
time.sleep(2)
# REMOVE_END

# STEP_START validate_index
info = client.ft("idx:bikes_vss").info()
num_docs = info["num_docs"]
indexing_failures = info["hash_indexing_failures"]
# print(f"{num_docs} documents indexed with {indexing_failures} failures")
# >>> 11 documents indexed with 0 failures
# STEP_END
# REMOVE_START
assert (num_docs == "11") and (indexing_failures == "0")
# REMOVE_END

# STEP_START simple_query_1
query = Query("@brand:Peaknetic")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:008', 'payload': None, 'brand': 'Peaknetic', 'model': 'Soothe Electric bike', 'price': '1950', 'description_embeddings': ...
# STEP_END
# REMOVE_START

assert all(
    item in [x.__dict__["id"] for x in res]
    for item in ["bikes:008", "bikes:009"]
)
# REMOVE_END

# STEP_START simple_query_2
query = Query("@brand:Peaknetic").return_fields("id", "brand", "model", "price")
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:008', 'payload': None, 'brand': 'Peaknetic', 'model': 'Soothe Electric bike', 'price': '1950'}, Document {'id': 'bikes:009', 'payload': None, 'brand': 'Peaknetic', 'model': 'Secto', 'price': '430'}]
# STEP_END
# REMOVE_START
assert all(
    item in [x.__dict__["id"] for x in res]
    for item in ["bikes:008", "bikes:009"]
)
# REMOVE_END

# STEP_START simple_query_3
query = Query("@brand:Peaknetic @price:[0 1000]").return_fields(
    "id", "brand", "model", "price"
)
res = client.ft("idx:bikes_vss").search(query).docs
# print(res)
# >>> [Document {'id': 'bikes:009', 'payload': None, 'brand': 'Peaknetic', 'model': 'Secto', 'price': '430'}]
# STEP_END
# REMOVE_START
assert all(item in [x.__dict__["id"] for x in res] for item in ["bikes:009"])
# REMOVE_END

# STEP_START def_bulk_queries
queries = [
    "Bike for small kids",
    "Best Mountain bikes for kids",
    "Cheap Mountain bike for kids",
    "Female specific mountain bike",
    "Road bike for beginners",
    "Commuter bike for people over 60",
    "Comfortable commuter bike",
    "Good bike for college students",
    "Mountain bike for beginners",
    "Vintage bike",
    "Comfortable city bike",
]
# STEP_END

# STEP_START enc_bulk_queries
encoded_queries = embedder.encode(queries)
len(encoded_queries)
# >>> 11
# STEP_END
# REMOVE_START
assert len(encoded_queries) == 11
# REMOVE_END


# STEP_START define_bulk_query
def create_query_table(query, queries, encoded_queries, extra_params={}):
    results_list = []
    for i, encoded_query in enumerate(encoded_queries):
        result_docs = (
            client.ft("idx:bikes_vss")
            .search(
                query,
                {
                    "query_vector": np.array(
                        encoded_query, dtype=np.float32
                    ).tobytes()
                }
                | extra_params,
            )
            .docs
        )
        for doc in result_docs:
            vector_score = round(1 - float(doc.vector_score), 2)
            results_list.append(
                {
                    "query": queries[i],
                    "score": vector_score,
                    "id": doc.id,
                    "brand": doc.brand,
                    "model": doc.model,
                    "description": doc.description,
                }
            )

    # Optional: convert the table to Markdown using Pandas
    queries_table = pd.DataFrame(results_list)
    queries_table.sort_values(
        by=["query", "score"], ascending=[True, False], inplace=True
    )
    queries_table["query"] = queries_table.groupby("query")["query"].transform(
        lambda x: [x.iloc[0]] + [""] * (len(x) - 1)
    )
    queries_table["description"] = queries_table["description"].apply(
        lambda x: (x[:497] + "...") if len(x) > 500 else x
    )
    queries_table.to_markdown(index=False)


# STEP_END

# STEP_START run_knn_query
query = (
    Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)

create_query_table(query, queries, encoded_queries)
# >>> | Best Mountain bikes for kids | 0.54 | bikes:003... (+ 32 more results)
# STEP_END

# STEP_START run_hybrid_query
hybrid_query = (
    Query("(@brand:Peaknetic)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)
create_query_table(hybrid_query, queries, encoded_queries)
# >>> | Best Mountain bikes for kids | 0.3 | bikes:008... (+22 more results)
# STEP_END

# STEP_START run_range_query
range_query = (
    Query(
        "@vector:[VECTOR_RANGE $range $query_vector]=>{$YIELD_DISTANCE_AS: vector_score}"
    )
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .paging(0, 4)
    .dialect(2)
)
create_query_table(
    range_query, queries[:1], encoded_queries[:1], {"range": 0.55}
)
# >>> | Bike for small kids | 0.52 | bikes:001 | Velorim |... (+1 more result)
# STEP_END
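
# NOTE (illustrative sketch, not part of the original article steps): with
# DISTANCE_METRIC "COSINE", the vector_score yielded by the index is a cosine
# distance, which is why create_query_table reports 1 - vector_score as the
# similarity. The same arithmetic with plain numpy:
a = np.array([1.0, 0.0], dtype=np.float32)
b = np.array([1.0, 1.0], dtype=np.float32)
cosine_distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
similarity = round(1 - float(cosine_distance), 2)
# >>> 0.71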