Add initial pages + pagesQuery endpoint to /replay.json APIs #2380

Merged
merged 41 commits into from
Feb 14, 2025

ikreymer
Member

@ikreymer ikreymer commented Feb 11, 2025

Fixes #2360

  • Adds seedPages to /replay.json response for collections, returning up to 25 seed pages.
  • Adds pagesQueryUrl to /replay.json
  • Adds a public pages search endpoint to support public collections.
  • Adds preloadResources, including list of WACZ files that should always be loaded, to /replay.json
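
As a rough illustration of the fields listed above, the sketch below parses a hypothetical /replay.json payload. Only the key names seedPages, pagesQueryUrl, and preloadResources and the 25-page limit come from this PR; the nested fields (url, name), the placeholder URL, and the helper summarize_replay are assumptions made for the example and may not match the actual response schema.

```python
import json

# Hypothetical payload: only the three top-level key names come from this PR.
# The nested "url" and "name" fields and the URL value are placeholders.
SAMPLE = """
{
  "seedPages": [
    {"url": "https://example.com/"},
    {"url": "https://example.com/about"}
  ],
  "pagesQueryUrl": "https://example.org/pages-query",
  "preloadResources": [
    {"name": "crawl-a.wacz"}
  ]
}
"""

MAX_SEED_PAGES = 25  # limit stated in the PR description


def summarize_replay(payload: dict) -> dict:
    """Pull out the fields a replay client would need first.

    Missing keys are treated as empty so older responses still parse.
    """
    seed_pages = payload.get("seedPages", [])[:MAX_SEED_PAGES]
    return {
        "seed_urls": [page["url"] for page in seed_pages],
        "pages_query_url": payload.get("pagesQueryUrl"),
        "preload_names": [res["name"] for res in payload.get("preloadResources", [])],
    }


summary = summarize_replay(json.loads(SAMPLE))
print(summary)
```

A client could render seed_urls immediately and fall back to pages_query_url only when the user searches or scrolls past the initial list.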

Marked as draft pending work in wabac.js to ensure this is complete.

@ikreymer ikreymer requested a review from tw4l February 11, 2025 00:48
@tw4l
Member

tw4l commented Feb 11, 2025

Do we also want to return seeds and the additional pages query url in the replay.json for crawls?

ikreymer and others added 15 commits February 11, 2025 23:18

tests: add test 'pagesQuery' for both private and public collections
Needs some testing and possible refinement
Includes information for all WACZ files in collection that contain
seed pages or have no associated pages in the database.
- rename pages -> seedPages
- rename alwaysLoad -> preloadResources
- rename pagesQuery -> pagesQueryUrl
- optimize loading preloadResources as part of resource lookup
- remove seed page wacz files from preloadResources, can be computed from seedPages list
- tests: add additional tests for preloadResources, seedPages
update tests
@ikreymer
Member Author

> Do we also want to return seeds and the additional pages query url in the replay.json for crawls?

Added pagesQueryUrl and seedPages for crawls.

ikreymer and others added 18 commits February 12, 2025 12:23
@tw4l tw4l force-pushed the replay-json-pages branch from 61d0a15 to dd664cb Compare February 12, 2025 17:23
@ikreymer ikreymer added this to the Public Collections milestone Feb 13, 2025
@ikreymer ikreymer marked this pull request as ready for review February 13, 2025 04:21
@ikreymer
Member Author

One more small change: what if, instead of seedPages, we went with initialPages and had it include the first 25 pages?
The default sort order can be {"$sort": {"isSeed": -1, "ts": 1}}, so we still list all seeds first, then all non-seeds, with each group sorted by time. This would allow infinite scroll to load additional pages without a query.
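
To make the proposed ordering concrete, here is a minimal in-memory sketch of what a sort on isSeed descending, then ts ascending, produces. It is plain Python rather than the actual database query. The sample records, the ISO 8601 string timestamps, and the helper names sort_pages and initial_pages are assumptions for illustration. Pages without an isSeed field are treated as non-seeds here.

```python
def sort_pages(pages: list[dict]) -> list[dict]:
    """Emulate {"isSeed": -1, "ts": 1}: seeds first, then oldest first.

    "not isSeed" is False for seeds, and False sorts before True,
    so seeds come first. Timestamps are assumed to be ISO 8601 strings
    in one format, which sort chronologically as plain strings.
    """
    return sorted(pages, key=lambda p: (not p.get("isSeed", False), p["ts"]))


def initial_pages(pages: list[dict], limit: int = 25) -> list[dict]:
    """Return the first `limit` pages in the default order."""
    return sort_pages(pages)[:limit]


# Hypothetical sample data, not taken from a real crawl.
pages = [
    {"url": "a", "isSeed": False, "ts": "2025-02-11T10:00:00Z"},
    {"url": "b", "isSeed": True, "ts": "2025-02-11T12:00:00Z"},
    {"url": "c", "isSeed": True, "ts": "2025-02-11T09:00:00Z"},
    {"url": "d", "isSeed": False, "ts": "2025-02-11T08:00:00Z"},
]

ordered_urls = [p["url"] for p in initial_pages(pages)]
print(ordered_urls)  # ['c', 'b', 'd', 'a']
```

Because seeds always sort ahead of non-seeds, the first 25 entries are all seeds whenever a collection has at least 25 of them, so the renamed field would still cover the original seedPages use case.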

add totalPages
collection pages: default sort order seeds first, then by timestamp
…x-cloud into replay-json-pages
@ikreymer ikreymer merged commit 7b2932c into main Feb 14, 2025
23 checks passed
@ikreymer ikreymer deleted the replay-json-pages branch February 14, 2025 00:53
Development

Successfully merging this pull request may close these issues.

[Feature]: Add initial list of pages + search endpoint to /replay.json APIs
2 participants