Improve performance of the reverse dependencies endpoint #3655

pietroalbini · 2021-05-27T16:22:08Z

Justin and I spent some time a couple of days ago trying to figure out why the reverse dependencies endpoint is so slow for crates with a large number of reverse dependencies. This is the EXPLAIN ANALYZE of that query.

The goal of the query is to return all crates whose latest version depends on the current crate. Since the latest version is not stored anywhere, the query runs a subquery that sorts all the versions of each crate by semver, which is really slow.

We should denormalize the latest version ID (semver-wise) into the crates table and change the query to remove the subquery. Once the subquery is gone, the query should be small enough to be converted to Diesel. Ideally, denormalizing the latest version ID and updating the query should be done in separate PRs.
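
A rough sketch of what the denormalization could look like, not a reviewed migration. The column name `latest_version_id` is hypothetical, the `dependencies (version_id, crate_id)` and `crates (id, name)` shapes are assumptions about the schema, and the backfill assumes the existing `to_semver_no_prerelease` SQL function is used to rank versions:

```sql
-- Hypothetical column; the name is illustrative only.
ALTER TABLE crates
    ADD COLUMN latest_version_id INTEGER REFERENCES versions (id);

-- One-off backfill: pick the highest non-yanked version of each crate.
UPDATE crates
SET latest_version_id = latest.id
FROM (
    SELECT DISTINCT ON (crate_id) crate_id, id
    FROM versions
    WHERE NOT yanked
    ORDER BY crate_id, to_semver_no_prerelease(num) DESC NULLS LAST
) AS latest
WHERE crates.id = latest.crate_id;

-- Reverse dependencies of crate $1, with no per-crate sorting subquery.
-- DISTINCT because one version can depend on the same crate more than once.
SELECT DISTINCT crates.id, crates.name
FROM dependencies
JOIN crates ON crates.latest_version_id = dependencies.version_id
WHERE dependencies.crate_id = $1;
```

The column would also have to be kept up to date on publish/yank/unyank/delete, which this sketch does not cover.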

pietroalbini · 2021-05-27T17:49:19Z

Note that this is practically reversing what we did in #592, but this is now starting to actually impact our performance.

jyn514 · 2021-06-27T21:34:53Z

Is it possible to use an index instead of a column? That would avoid issues with the column getting out of sync.

pietroalbini · 2021-06-28T08:55:00Z

I'm pretty sure that query is too complex to go on an index. A materialized view might work depending on how long it takes for it to update.

pietroalbini · 2021-06-28T09:28:16Z

Yeah, I played around with it and indexes don't seem to help much. This query, though, takes around 3.2s (explain) on the primary db:

SELECT DISTINCT ON (crate_id) crate_id, id
FROM versions
WHERE NOT yanked
ORDER BY crate_id, to_semver_no_prerelease(num) DESC NULLS LAST;

We could turn it into a materialized view that we refresh every time a new crate is published/yanked/unyanked/deleted, but I'm worried it will keep getting slower as time goes by. Another option to investigate would be to store the parsed semver in the database, so that we don't have to call to_semver_no_prerelease in the database, which is just slow.
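
For reference, the materialized view option could look roughly like this. It is only a sketch: the view and index names are hypothetical, and it has not been timed against the real data.

```sql
-- Hypothetical view name; wraps the query above unchanged.
CREATE MATERIALIZED VIEW latest_versions AS
SELECT DISTINCT ON (crate_id) crate_id, id AS version_id
FROM versions
WHERE NOT yanked
ORDER BY crate_id, to_semver_no_prerelease(num) DESC NULLS LAST;

-- REFRESH ... CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX latest_versions_crate_id_idx
    ON latest_versions (crate_id);

-- Run after each publish/yank/unyank/delete; CONCURRENTLY avoids
-- blocking readers, but the whole query is still recomputed each time.
REFRESH MATERIALIZED VIEW CONCURRENTLY latest_versions;
```

Since every refresh recomputes the full result, the refresh cost would grow with the versions table, which is the concern mentioned above.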

pietroalbini added the A-backend ⚙️ label May 27, 2021

Turbo87 added the C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear label Jun 25, 2021
