
Fix: Documents are not sliced during manifest generation (#5967) #5968

Merged: dsotirho-ucsc merged 12 commits into develop from issues/nadove-ucsc/5967-documents-not-sliced on Mar 5, 2024


nadove-ucsc (Contributor) commented Feb 17, 2024

Connected issues: #5967

Checklist

Author

  • PR is a draft
  • Target branch is develop
  • Name of PR branch matches issues/<GitHub handle of author>/<issue#>-<slug>
  • On ZenHub, PR is connected to all issues it (partially) resolves
  • PR description links to connected issues
  • PR title matches¹ that of a connected issue or comment in PR explains why they're different
  • PR title references all connected issues
  • For each connected issue, there is at least one commit whose title references that issue

Author (partiality)

  • Added p tag to titles of partial commits
  • Added partial label to PR or this PR completely resolves all connected issues
  • All connected issues are resolved partially or this PR does not have the partial label

¹ When the issue title describes a problem, the corresponding PR
title is Fix: followed by the issue title.
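The branch-name and PR-title conventions in the Author checklist can be checked mechanically. Below is a minimal, hypothetical sketch in Python; the helper names and the character rules assumed for handles and slugs are illustrative only and are not part of the Azul tooling.

```python
from __future__ import annotations

import re

# Hypothetical pattern for issues/<GitHub handle of author>/<issue#>-<slug>.
# Assumption: handles are alphanumeric with single inner hyphens, and slugs
# are lowercase alphanumeric words joined by single hyphens.
BRANCH_PATTERN = re.compile(
    r'issues/(?P<handle>[A-Za-z0-9](?:-?[A-Za-z0-9])*)'
    r'/(?P<issue>[0-9]+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)'
)


def parse_branch(name: str) -> tuple[str, int, str] | None:
    """Split a PR branch name into (handle, issue number, slug), or return None."""
    match = BRANCH_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match['handle'], int(match['issue']), match['slug']


def expected_pr_title(issue_title: str, describes_problem: bool) -> str:
    """Prefix 'Fix: ' when the issue title describes a problem (footnote 1)."""
    return f'Fix: {issue_title}' if describes_problem else issue_title
```

For this PR, `parse_branch('issues/nadove-ucsc/5967-documents-not-sliced')` yields `('nadove-ucsc', 5967, 'documents-not-sliced')`. Assuming issue #5967 is titled "Documents are not sliced during manifest generation", `expected_pr_title(..., True)` yields this PR's title without the `(#5967)` reference that the checklist also requires.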

Author (reindex, API changes)

  • Added r tag to commit title or this PR does not require reindexing
  • Added reindex label to PR or this PR does not require reindexing
  • PR and connected issue are labeled API or this PR does not modify a REST API
  • Added a (A) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
  • Updated REST API version number in app.py or this PR does not modify a REST API

Author (chains)

  • This PR is blocked by previous PR in the chain or this PR is not chained to another PR
  • Added base label to the blocking PR or this PR is not chained to another PR
  • Added chained label to this PR or this PR is not chained to another PR

Author (upgrading deployments)

  • Ran make image_manifests.json and committed any resulting changes or this PR does not modify azul_docker_images or any other variables referenced in the definition of that variable
  • Documented upgrading of deployments in UPGRADING.rst or this PR does not require upgrading deployments
  • Added u tag to commit title or this PR does not require upgrading deployments
  • Added upgrade label to PR or this PR does not require upgrading deployments

Author (operator tasks)

  • Added checklist items for additional operator tasks or this PR does not require additional tasks

Author (hotfixes)

  • Added F tag to main commit title or this PR does not include permanent fix for a temporary hotfix
  • Reverted the temporary hotfixes for any connected issues or the prod branch has no temporary hotfixes for any connected issues

Author (before every review)

  • Rebased PR branch on develop, squashed old fixups
  • Ran make requirements_update or this PR does not touch requirements*.txt, common.mk, Makefile and Dockerfile
  • Added R tag to commit title or this PR does not touch requirements*.txt
  • Added reqs label to PR or this PR does not touch requirements*.txt
  • make integration_test passes in personal deployment or this PR does not touch functionality that could break the IT

Peer reviewer (after requesting changes)

Uncheck the Author (before every review) checklists.

Peer reviewer (after approval)

  • PR is not a draft
  • Ticket is in Review requested column
  • Requested review from system administrator
  • PR is assigned to system administrator

System administrator (after requesting changes)

Uncheck the before every review checklists. Update the N reviews label.

System administrator (after approval)

  • Actually approved the PR
  • Labeled connected issues as demo or no demo
  • Commented on connected issues about demo expectations or all connected issues are labeled no demo
  • Decided if PR can be labeled no sandbox
  • PR title is appropriate as title of merge commit
  • N reviews label is accurate
  • Moved ticket to Approved column
  • PR is assigned to current operator

Operator (before pushing the merge commit)

  • Checked reindex label and r commit title tag
  • Checked that demo expectations are clear or all connected issues are labeled no demo
  • PR has checklist items for upgrading instructions or PR is not labeled upgrade
  • Squashed PR branch and rebased onto develop
  • Sanity-checked history
  • Pushed PR branch to GitHub
  • Added sandbox label or PR is labeled no sandbox
  • Pushed PR branch to GitLab dev or PR is labeled no sandbox
  • Pushed PR branch to GitLab anvildev or PR is labeled no sandbox
  • Pushed PR branch to GitLab anvilprod or PR is labeled no sandbox
  • Build passes in sandbox deployment or PR is labeled no sandbox
  • Build passes in anvilbox deployment or PR is labeled no sandbox
  • Build passes in hammerbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in sandbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in anvilbox deployment or PR is labeled no sandbox
  • Reviewed build logs for anomalies in hammerbox deployment or PR is labeled no sandbox
  • Deleted unreferenced indices in sandbox or this PR does not remove catalogs or otherwise cause unreferenced indices in dev
  • Deleted unreferenced indices in anvilbox or this PR does not remove catalogs or otherwise cause unreferenced indices in anvildev
  • Deleted unreferenced indices in hammerbox or this PR does not remove catalogs or otherwise cause unreferenced indices in anvilprod
  • Started reindex in sandbox or this PR does not require reindexing dev
  • Started reindex in anvilbox or this PR does not require reindexing anvildev
  • Started reindex in hammerbox or this PR does not require reindexing anvilprod
  • Checked for failures in sandbox or this PR does not require reindexing dev
  • Checked for failures in anvilbox or this PR does not require reindexing anvildev
  • Checked for failures in hammerbox or this PR does not require reindexing anvilprod
  • Title of merge commit starts with title from this PR
  • Added PR reference to merge commit title
  • Collected commit title tags in merge commit title, but only included p if the PR is labeled partial
  • Moved connected issues to Merged column in ZenHub
  • Pushed merge commit to GitHub

Operator (chain shortening)

  • Changed the target branch of the blocked PR to develop or this PR is not labeled base
  • Removed the chained label from the blocked PR or this PR is not labeled base
  • Removed the blocking relationship from the blocked PR or this PR is not labeled base
  • Removed the base label from this PR or this PR is not labeled base

Operator (after pushing the merge commit)

  • Pushed merge commit to GitLab dev or PR is labeled no sandbox
  • Pushed merge commit to GitLab anvildev or PR is labeled no sandbox
  • Pushed merge commit to GitLab anvilprod or PR is labeled no sandbox
  • Build passes on GitLab dev¹
  • Reviewed build logs for anomalies on GitLab dev¹
  • Build passes on GitLab anvildev¹
  • Reviewed build logs for anomalies on GitLab anvildev¹
  • Build passes on GitLab anvilprod¹
  • Reviewed build logs for anomalies on GitLab anvilprod¹
  • Deleted PR branch from GitHub
  • Deleted PR branch from GitLab dev
  • Deleted PR branch from GitLab anvildev
  • Deleted PR branch from GitLab anvilprod

¹ When pushing the merge commit is skipped because the PR is
labeled no sandbox, the next build triggered by a PR whose merge commit is
pushed determines this checklist item.

Operator (reindex)

  • Deleted unreferenced indices in dev or this PR does not remove catalogs or otherwise cause unreferenced indices in dev
  • Deleted unreferenced indices in anvildev or this PR does not remove catalogs or otherwise cause unreferenced indices in anvildev
  • Deleted unreferenced indices in anvilprod or this PR does not remove catalogs or otherwise cause unreferenced indices in anvilprod
  • Considered deindexing individual sources in dev or this PR does not merely remove sources from existing catalogs in dev
  • Considered deindexing individual sources in anvildev or this PR does not merely remove sources from existing catalogs in anvildev
  • Considered deindexing individual sources in anvilprod or this PR does not merely remove sources from existing catalogs in anvilprod
  • Considered indexing individual sources in dev or this PR does not merely add sources to existing catalogs in dev
  • Considered indexing individual sources in anvildev or this PR does not merely add sources to existing catalogs in anvildev
  • Considered indexing individual sources in anvilprod or this PR does not merely add sources to existing catalogs in anvilprod
  • Started reindex in dev or this PR does not require reindexing dev
  • Started reindex in anvildev or this PR does not require reindexing anvildev
  • Started reindex in anvilprod or this PR does not require reindexing anvilprod
  • Checked for and triaged indexing failures in dev or this PR does not require reindexing dev
  • Checked for and triaged indexing failures in anvildev or this PR does not require reindexing anvildev
  • Checked for and triaged indexing failures in anvilprod or this PR does not require reindexing anvilprod
  • Emptied fail queues in dev deployment or this PR does not require reindexing dev
  • Emptied fail queues in anvildev deployment or this PR does not require reindexing anvildev
  • Emptied fail queues in anvilprod deployment or this PR does not require reindexing anvilprod

Operator

  • PR is assigned to no one

Shorthand for review comments

  • L line is too long
  • W line wrapping is wrong
  • Q bad quotes
  • F other formatting problem

github-actions bot added the orange [process] Done by the Azul team label Feb 17, 2024
nadove-ucsc added the spike:1 [process] Spike estimate of one point label Feb 17, 2024
nadove-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from f2881ef to 24cdb00 February 17, 2024 02:01

codecov bot commented Feb 17, 2024

Codecov Report

Attention: Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.17%. Comparing base (a44b350) to head (3fe7c4f).

Files                       Patch %   Lines
test/integration_test.py    0.00%     1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5968   +/-   ##
========================================
  Coverage    85.17%   85.17%           
========================================
  Files          154      154           
  Lines        19893    19898    +5     
========================================
+ Hits         16943    16948    +5     
  Misses        2950     2950           


coveralls commented Feb 17, 2024

Coverage Status

coverage: 85.192% (+0.003%) from 85.189%
when pulling 3fe7c4f on issues/nadove-ucsc/5967-documents-not-sliced
into a44b350 on develop.

nadove-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch 7 times, most recently from 1efdfee to 3cbd080 February 22, 2024 08:06
nadove-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from 3cbd080 to bde95c3 February 28, 2024 09:57

nadove-ucsc (Contributor, Author)

The first commit resolves python-attrs/attrs#1081, which would otherwise become a problem in the second commit.

nadove-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from a7358b3 to 33fdb29 February 28, 2024 10:03
nadove-ucsc added the reqs [process] PR includes commit requiring ``make requirements`` label Feb 28, 2024

achave11-ucsc (Member) left a comment


Some convention nits, otherwise, LGTM!

Two review comments on src/azul/indexer/document.py (outdated, resolved)
achave11-ucsc removed their assignment Feb 28, 2024
nadove-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from 0f80558 to 1b49705 February 29, 2024 01:45
achave11-ucsc previously approved these changes Feb 29, 2024
achave11-ucsc marked this pull request as ready for review February 29, 2024 02:39
hannes-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from 1bd8865 to 4166683 March 1, 2024 02:38
dsotirho-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from 6238300 to 4e5ac02 March 1, 2024 16:47
dsotirho-ucsc added the sandbox [process] Resolution is being verified in sandbox deployment label Mar 1, 2024
dsotirho-ucsc (Contributor)

The IT failed twice each on anvilbox and hammerbox:

======================================================================
ERROR: test_indexing (integration_test.IndexingIntegrationTest.test_indexing) [manifest] (catalog='anvil-it', format=<ManifestFormat.compact: 'compact'>, fetch=True)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/ucsc/azul/test/integration_test.py", line 427, in subTest
…
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 894, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 884, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='service.anvilbox.anvil.gi.ucsc.edu', port=443): Max retries exceeded with url: /fetch/manifest/files/k8QgmPY4EJ6F-lYud3vWCH7ksTw2SyNORfWu_Zw23S9DHygAAQ== (Caused by ResponseError('too many 500 error responses'))
======================================================================
ERROR: test_indexing (integration_test.IndexingIntegrationTest.test_indexing) [managed_access_manifest] (catalog='anvil-it')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/ucsc/azul/test/integration_test.py", line 427, in subTest
…
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 894, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 884, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='service.anvilbox.anvil.gi.ucsc.edu', port=443): Max retries exceeded with url: /manifest/files/k8QgaakyRoQrFgut4wDc2mS4L_-raW4p0pjJc9GQFZAP_dIAAQ== (Caused by ResponseError('too many 500 error responses'))

hannes-ucsc force-pushed the issues/nadove-ucsc/5967-documents-not-sliced branch from 2973957 to b1dd698 March 2, 2024 20:10

hannes-ucsc (Member)

Tested against a personal deployment resembling sandbox and another one resembling anvilbox.

hannes-ucsc previously approved these changes Mar 3, 2024
dsotirho-ucsc merged commit 360efa6 into develop Mar 5, 2024 (12 checks passed)
dsotirho-ucsc deleted the issues/nadove-ucsc/5967-documents-not-sliced branch March 5, 2024 05:34
dsotirho-ucsc removed their assignment Mar 5, 2024
Labels

  • orange [process] Done by the Azul team
  • reqs [process] PR includes commit requiring ``make requirements``
  • sandbox [process] Resolution is being verified in sandbox deployment
  • spike:1 [process] Spike estimate of one point