Integration tests for collections #299

Merged (4 commits, Jan 24, 2024)
Changes from 3 commits
61 changes: 53 additions & 8 deletions .github/workflows/testing-integration.yaml
@@ -45,8 +45,55 @@ jobs:
# spec: '${{ matrix.spec }}'
# PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'

control-rest:
name: control plane
control-rest-pod:
name: control plane pod/collection tests
runs-on: ubuntu-latest
strategy:
matrix:
pineconeEnv:
- prod
testConfig:
- python-version: 3.8
pod: { environment: 'us-east1-gcp'}
- python-version: 3.11
pod: { environment: 'us-east4-gcp'}
fail-fast: false
steps:
- uses: actions/checkout@v4
- name: 'Set up Python ${{ matrix.testConfig.python-version }}'
uses: actions/setup-python@v4
with:
python-version: '${{ matrix.testConfig.python-version }}'
- name: Setup Poetry
uses: ./.github/actions/setup-poetry
- name: 'Run integration tests (REST, prod)'
if: matrix.pineconeEnv == 'prod'
run: poetry run pytest tests/integration/control/serverless -s -v
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_CONTROLLER_HOST: 'https://api.pinecone.io'
Contributor: nit: We could probably leave this out for the standard non-staging run, similar to the control-rest-serverless job below. Small complaint, more for consistency's sake.

PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
PINECONE_ENVIRONMENT: '${{ matrix.testConfig.pod.environment }}'
GITHUB_BUILD_NUMBER: '${{ github.run_number }}-s-${{ matrix.testConfig.python-version}}'
DIMENSION: '1536'
METRIC: 'cosine'
- name: 'Run integration tests (REST, staging)'
if: matrix.pineconeEnv == 'staging'
run: poetry run pytest tests/integration/control/serverless -s -v
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_CONTROLLER_HOST: 'https://api-staging.pinecone.io'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY_STAGING }}'
PINECONE_ENVIRONMENT: '${{ matrix.testConfig.pod.environment }}'
GITHUB_BUILD_NUMBER: '${{ github.run_number }}-p-${{ matrix.testConfig.python-version}}'
DIMENSION: '1536'
METRIC: 'cosine'




control-rest-serverless:
name: control plane serverless
runs-on: ubuntu-latest
strategy:
matrix:
@@ -59,7 +106,6 @@ jobs:
- python-version: 3.11
pod: { environment: 'us-east1-gcp'}
serverless: { cloud: 'aws', region: 'us-west-2'}
max-parallel: 1
fail-fast: false
steps:
- uses: actions/checkout@v4
@@ -71,21 +117,20 @@
uses: ./.github/actions/setup-poetry
- name: 'Run integration tests (REST, prod)'
if: matrix.pineconeEnv == 'prod'
run: poetry run pytest tests/integration/control -s -v
run: poetry run pytest tests/integration/control/serverless -s -vv
env:
PINECONE_CONTROLLER_HOST: 'https://api.pinecone.io'
PINECONE_DEBUG_CURL: 'true'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
GITHUB_BUILD_NUMBER: '${{ github.run_number }}-p-${{ matrix.testConfig.python-version}}'
POD_ENVIRONMENT: '${{ matrix.testConfig.pod.environment }}'
SERVERLESS_CLOUD: '${{ matrix.testConfig.serverless.cloud }}'
SERVERLESS_REGION: '${{ matrix.testConfig.serverless.region }}'
- name: 'Run integration tests (REST, staging)'
if: matrix.pineconeEnv == 'staging'
run: poetry run pytest tests/integration -s -v
run: poetry run pytest tests/integration/control/serverless -s -vv
env:
PINECONE_DEBUG_CURL: 'true'
PINECONE_CONTROLLER_HOST: 'https://api-staging.pinecone.io'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY_STAGING }}'
GITHUB_BUILD_NUMBER: '${{ github.run_number }}-s-${{ matrix.testConfig.python-version}}'
POD_ENVIRONMENT: '${{ matrix.testConfig.pod.environment }}'
SERVERLESS_CLOUD: '${{ matrix.testConfig.serverless.cloud }}'
SERVERLESS_REGION: '${{ matrix.testConfig.serverless.region }}'
5 changes: 5 additions & 0 deletions pinecone/models/pod_spec.py
@@ -60,6 +60,11 @@ class PodSpec(NamedTuple):
{'indexed': ['field1', 'field2']}
"""

source_collection: Optional[str] = None
"""
The name of the collection to use as the source for the pod index. This configuration is only used when creating a pod index from an existing collection.
"""

Comment on lines +63 to +67
@jhamon (Collaborator, Author), Jan 24, 2024: This missing property was the cause of create_index with source_collection failing.

def asdict(self):
"""
Returns the PodSpec as a dictionary.
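In connection with @jhamon's note on source_collection above, here is a minimal sketch of the call that this field enables, mirroring the pattern used in the new collection tests; the index name, environment, dimension, metric, and collection name are illustrative:

from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a pod index whose initial contents are restored from an existing collection.
pc.create_index(
    name="index-from-collection",           # illustrative index name
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",          # illustrative pod environment
        source_collection="my-collection",   # existing collection to restore from
    ),
)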
20 changes: 20 additions & 0 deletions scripts/delete-all-collections.py
@@ -0,0 +1,20 @@
import os
from pinecone import Pinecone

def read_env_var(name):
value = os.environ.get(name)
if value is None:
raise Exception('Environment variable {} is not set'.format(name))
return value

def main():
pc = Pinecone(api_key=read_env_var('PINECONE_API_KEY'))

collections = pc.list_collections().names()
for collection in collections:
if collection != "":
pc.delete_collection(collection)

if __name__ == '__main__':
main()

149 changes: 149 additions & 0 deletions tests/integration/control/pod/conftest.py
@@ -0,0 +1,149 @@
import pytest
import random
import string
import time
from pinecone import Pinecone, PodSpec
from ...helpers import generate_index_name, get_environment_var

@pytest.fixture()
def client():
api_key = get_environment_var('PINECONE_API_KEY')
return Pinecone(
api_key=api_key,
additional_headers={'sdk-test-suite': 'pinecone-python-client'}
)

@pytest.fixture()
def environment():
return get_environment_var('PINECONE_ENVIRONMENT')

@pytest.fixture()
def dimension():
return int(get_environment_var('DIMENSION'))

@pytest.fixture()
def create_index_params(index_name, environment, dimension, metric):
spec = {
Contributor: Interesting, I thought you had to use the PodSpec class for spec rather than passing a plain object.

Collaborator (Author): The python client is flexible enough to accept both.

'pod': {
'environment': environment,
'pod_type': 'p1.x1'
}
}
return dict(
name=index_name,
dimension=dimension,
metric=metric,
spec=spec,
timeout=-1
)

@pytest.fixture()
def metric():
return get_environment_var('METRIC')

@pytest.fixture()
def random_vector(dimension):
def _random_vector():
return [random.uniform(0, 1) for _ in range(dimension)]
return _random_vector

@pytest.fixture()
def index_name(request):
test_name = request.node.name
return generate_index_name(test_name)

@pytest.fixture()
def ready_index(client, index_name, create_index_params):
create_index_params['timeout'] = None
client.create_index(**create_index_params)
time.sleep(10) # Extra wait, since status is sometimes inaccurate
yield index_name
client.delete_index(index_name, -1)

@pytest.fixture()
def notready_index(client, index_name, create_index_params):
create_index_params.update({'timeout': -1 })
client.create_index(**create_index_params)
yield index_name

def index_exists(index_name, client):
return index_name in client.list_indexes().names()


def random_string():
return ''.join(random.choice(string.ascii_lowercase) for i in range(10))

@pytest.fixture(scope='session')
def reusable_collection():
pc = Pinecone(
api_key=get_environment_var('PINECONE_API_KEY'),
additional_headers={'sdk-test-suite': 'pinecone-python-client'}
)
index_name = 'temp-index-' + random_string()
dimension = int(get_environment_var('DIMENSION'))
print(f"Creating index {index_name} to prepare a collection...")
pc.create_index(
name=index_name,
dimension=dimension,
metric=get_environment_var('METRIC'),
spec=PodSpec(
environment=get_environment_var('PINECONE_ENVIRONMENT'),
)
)
print(f"Created index {index_name}. Waiting 10 seconds to make sure it's ready...")
time.sleep(10)

num_vectors = 10
vectors = [
(str(i), [random.uniform(0, 1) for _ in range(dimension)]) for i in range(num_vectors) ]

index = pc.Index(index_name)
index.upsert(vectors=vectors)

collection_name = 'reused-coll-' + random_string()
pc.create_collection(
name=collection_name,
source=index_name
)

time_waited = 0
desc = pc.describe_collection(collection_name)
collection_ready = desc['status']
while collection_ready.lower() != 'ready' and time_waited < 120:
print(f"Waiting for collection {collection_name} to be ready. Waited {time_waited} seconds...")
time.sleep(5)
time_waited += 5
desc = pc.describe_collection(collection_name)
collection_ready = desc['status']

if time_waited >= 120:
raise Exception(f"Collection {collection_name} is not ready after 120 seconds")

print(f"Collection {collection_name} is ready. Deleting index {index_name}...")
pc.delete_index(index_name)

yield collection_name

print(f"Deleting collection {collection_name}...")
pc.delete_collection(collection_name)

@pytest.fixture(autouse=True)
def cleanup(client, index_name):
yield

time_waited = 0
while index_exists(index_name, client) and time_waited < 120:
print(f"Waiting for index {index_name} to be ready to delete. Waited {time_waited} seconds..")
time_waited += 5
time.sleep(5)
try:
print(f"Attempting delete of index {index_name}")
client.delete_index(index_name, -1)
print(f"Deleted index {index_name}")
break
except Exception as e:
print(f"Unable to delete index {index_name}: {e}")
pass

if time_waited >= 120:
raise Exception(f"Index {index_name} is not ready to delete after 120 seconds")
Contributor: nit: Wording here could be: "Index {index_name} was not able to be deleted after 120 seconds"
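Regarding the review exchange earlier in this file about the spec argument, a minimal sketch of the two accepted forms: a plain dict as in the create_index_params fixture, and a PodSpec as in the reusable_collection fixture. Index names, environment, dimension, and metric below are illustrative:

from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Plain-dict spec, mirroring the create_index_params fixture above.
pc.create_index(
    name="dict-spec-example",    # illustrative name
    dimension=1536,
    metric="cosine",
    spec={"pod": {"environment": "us-east1-gcp", "pod_type": "p1.x1"}},
)

# The same index described with PodSpec, mirroring the reusable_collection fixture;
# fields not passed fall back to the client's defaults.
pc.create_index(
    name="podspec-example",      # illustrative name
    dimension=1536,
    metric="cosine",
    spec=PodSpec(environment="us-east1-gcp"),
)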

98 changes: 98 additions & 0 deletions tests/integration/control/pod/test_collections.py
@@ -0,0 +1,98 @@
import string
import random
import pytest
import time
from pinecone import PodSpec

def random_string():
return ''.join(random.choice(string.ascii_lowercase) for i in range(10))

class TestCollectionsHappyPath:
def test_index_to_collection_to_index_happy_path(self, client, environment, dimension, metric, ready_index, random_vector):
index = client.Index(ready_index)
num_vectors = 10
vectors = [ (str(i), random_vector()) for i in range(num_vectors) ]
index.upsert(vectors=vectors)

collection_name = 'coll1-' + random_string()
client.create_collection(name=collection_name, source=ready_index)
desc = client.describe_collection(collection_name)
assert desc['name'] == collection_name
assert desc['environment'] == environment
assert desc['status'] == 'Initializing'

time_waited = 0
collection_ready = desc['status']
while collection_ready.lower() != 'ready' and time_waited < 120:
print(f"Waiting for collection {collection_name} to be ready. Waited {time_waited} seconds...")
time.sleep(5)
time_waited += 5
desc = client.describe_collection(collection_name)
collection_ready = desc['status']

assert collection_name in client.list_collections().names()

if time_waited >= 120:
raise Exception(f"Collection {collection_name} is not ready after 120 seconds")

# After collection ready, these should all be defined
assert desc['name'] == collection_name
assert desc['status'] == 'Ready'
assert desc['environment'] == environment
assert desc['dimension'] == dimension
assert desc['vector_count'] == num_vectors
assert desc['size'] != None
assert desc['size'] > 0

# Create index from collection
index_name = 'index-from-collection-' + collection_name
print(f"Creating index {index_name} from collection {collection_name}...")
client.create_index(
name=index_name,
dimension=dimension,
metric=metric,
spec=PodSpec(
environment=environment,
source_collection=collection_name
)
)
print(f"Created index {index_name} from collection {collection_name}. Waiting a little more to make sure it's ready...")
time.sleep(30)
desc = client.describe_index(index_name)
assert desc['name'] == index_name
assert desc['status']['ready'] == True

new_index = client.Index(index_name)

# Verify stats reflect the vectors present in the collection
stats = new_index.describe_index_stats()
print(stats)
assert stats.total_vector_count == num_vectors

# Verify the vectors from the collection can be fetched
results = new_index.fetch(ids=[v[0] for v in vectors])
print(results)
for v in vectors:
assert results.vectors[v[0]].id == v[0]
assert results.vectors[v[0]].values == pytest.approx(v[1], rel=0.01)

# Cleanup
client.delete_collection(collection_name)
client.delete_index(index_name)

def test_create_index_with_different_metric_from_orig_index(self, client, dimension, metric, environment, reusable_collection):
metrics = ['cosine', 'euclidean', 'dotproduct']
target_metric = random.choice([x for x in metrics if x != metric])

index_name = 'from-coll-' + random_string()
client.create_index(
name=index_name,
dimension=dimension,
metric=target_metric,
spec=PodSpec(
environment=environment,
source_collection=reusable_collection
)
)
time.sleep(10)
client.delete_index(index_name, -1)