gRPC: parse_query_response: Skip parsing empty Usage (#301)

## Problem When parsing the result of a gRPC query() call, we unconditionally create a Usage model object, even if no usage information was returned (e.g. non-serverless index). ## Solution This adds a small but not insignificant cost to every query() call - mostly due to the fact we use OpenAPI auto-generated model code for the Usage and QueryResponse objects. Benchmarks using a simple PineconeGRPC-based program making query() calls against a p2.x4 pod show a 1.05x improvment in QPS by only constructing a Usage class (and associating it to QueryResponse) if a 'usage' field is present in the protobuf response: Before: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- grpc query_pinecone_no_filter 3223 0(0.00%) | 17 17 139 18 | 55.01 0.00 --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- After: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- grpc query_pinecone_no_filter 3408 0(0.00%) | 17 16 96 17 | 57.55 0.00 --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- ## Type of Change - [ ] Bug fix (non-breaking change which fixes an issue) - [x] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] This change requires a documentation update - [ ] Infrastructure change (CI configs, etc) - [ ] Non-code change (docs, etc) - [ ] None of the above: (explain here) ## Test Plan - Existing unit tests - Tested manually against pinecone-field/pinecone-stress-test
pinecone-io · Feb 5, 2024 · 921e0b4 · 921e0b4
1 parent 61c19ad
commit 921e0b4
Showing 1 changed file with 14 additions and 11 deletions.
diff --git a/pinecone/grpc/utils.py b/pinecone/grpc/utils.py
@@ -50,13 +50,13 @@ def parse_fetch_response(response: dict):
     return FetchResponse(
         vectors=vd, 
         namespace=namespace,
-        usage=parse_usage(response),
+        usage=parse_usage(response.get("usage", {})),
         _check_type=False
     )
 
-def parse_usage(response):
-    u = response.get("usage", {})
-    return Usage(read_units=int(u.get("readUnits", 0)))
+
+def parse_usage(usage: dict):
+    return Usage(read_units=int(usage.get("readUnits", 0)))
 
 
 def parse_query_response(response: dict, _check_type: bool = False):
@@ -72,13 +72,16 @@ def parse_query_response(response: dict, _check_type: bool = False):
         )
         matches.append(sc)
 
-    return QueryResponse(
-        namespace=response.get("namespace", ""), 
-        matches=matches,
-        usage = parse_usage(response),
-        _check_type=_check_type
-    )
-
+    # Due to OpenAPI model classes / actual parsing cost, we want to avoid
+    # creating empty `Usage` objects and then passing them into QueryResponse
+    # when they are not actually present in the response from the server.
+    args = {'namespace': response.get("namespace", ""),
+            'matches': matches,
+            '_check_type': _check_type}
+    usage = response.get("usage")
+    if usage:
+        args['usage'] = parse_usage(usage)
+    return QueryResponse(**args)
 
 def parse_stats_response(response: dict):
     fullness = response.get("indexFullness", 0.0)