Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve book library page query performance for author sort order #4080

Merged
merged 10 commits into from
Mar 17, 2025

Conversation

mikiher
Copy link
Contributor

@mikiher mikiher commented Mar 7, 2025

Brief summary

Similar to #3952, this significantly improves book library page query performance for the following sort orders:

  • Author (First Last)
  • Author (Last, First)

Which issue is fixed?

No Issue. This is a continuation of my work on #2073

In-depth Description

To replace per-record subqueries, I added two denormalized columns to libraryItems, authorNamesFirstLast, and authorNames LastFirst. These columns are kept indexed and are updated when the corresponding records in bookAuthors/authors are modified.

Using the new columns in library page queries (instead of per-record subqueries and unindexed sorting) resulted in an optimized query plan, and significantly improved query performance.

How have you tested this?

Correctness

  • Throughly unit-tested the db migration with in memory Sequelize
  • Ran the migration on an existing large database
  • Tested on a newly created database (no migration)
    • new columns, indexes, and triggers are properly created
  • Tested population of new columns in newly scanned library items
  • Made sure new columns were correctly updated on bookAuthors/authors changes
    • Added and removed authors from books (existing and new authors)
    • Modifed author name
    • Removed author

Performance

Tested on loading 72 consecutive page of 35 books each, on first-last author sort order, on an ABS docker container running on a Synology 920+ NAS.

Last-first author sort order results were similar.

All measurements are in ms.

Summary

Overall, we see a ~97% drop in mean and median query time.

Current (edge):

[2025-03-07 06:38:14.983] INFO: [findAndCountAll] histogram values: [
  3152,  826,  816,  990,  907,  997, 1235, 1079,
  1058, 1576, 1158, 1201, 1342, 1197, 1392, 1293,
  1546, 1523, 1593, 1636, 1465, 1967, 1439, 1832,
  1772, 1731, 1819, 1805, 1713, 1777, 1862, 1858,
  1614, 2066, 1703, 1677, 1705, 1624, 1748, 1768,
  1895, 2022, 2259, 1696, 1990, 1759, 1856, 2074,
  2077, 2001, 2617, 2459, 1987, 2047, 2172, 2431,
  2762, 2241, 1983, 2371, 2101, 2144, 2201, 2382,
  2367, 2138, 2256, 2164, 2422, 2428, 2222, 2297
]
[2025-03-07 06:38:14.984] INFO: [findAndCountAll] histogram: Histogram {
  min: 816,
  max: 3153,
  mean: 1823.5555555555557,
  exceeds: 0,
  stddev: 473.1667514643838,
  count: 72,
  percentiles: SafeMap(9) [Map] {
    0 => 816,
    50 => 1819,
    75 => 2144,
    87.5 => 2366,
    93.75 => 2430,
    96.875 => 2616,
    98.4375 => 2762,
    99.21875 => 3152,
    100 => 3152
  }
}

After denormalizing authorNames:

[2025-03-07 07:03:36.957] INFO: [findAndCountAll] histogram values: [
  86, 51, 81, 94, 78, 50, 55, 69, 65, 50, 43, 59,
  65, 43, 52, 53, 65, 45, 56, 46, 61, 53, 59, 52,
  53, 84, 43, 57, 52, 52, 50, 54, 56, 47, 64, 53,
  58, 72, 54, 48, 59, 60, 76, 45, 51, 71, 56, 52,
  64, 90, 47, 51, 56, 65, 61, 50, 50, 58, 64, 80,
  71, 58, 75, 52, 55, 66, 53, 62, 79, 67, 55, 66,
  66
]
[2025-03-07 07:03:36.958] INFO: [findAndCountAll] histogram: Histogram {
  min: 43,
  max: 94,
  mean: 59.71232876712329,
  exceeds: 0,
  stddev: 11.520653819981632,
  count: 73,
  percentiles: SafeMap(9) [Map] {
    0 => 43,
    50 => 56,
    75 => 65,
    87.5 => 75,
    93.75 => 81,
    96.875 => 86,
    98.4375 => 90,
    99.21875 => 94,
    100 => 94
  }
}

mikiher added 5 commits March 6, 2025 19:08

Verified

This commit was signed with the committer’s verified signature.
zkochan Zoltan Kochan
…ng update triggers and indices

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
…new databases)

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@mikiher mikiher marked this pull request as ready for review March 7, 2025 13:12
@nichwall
Copy link
Contributor

nichwall commented Mar 7, 2025

Nice! I also ran the new unit test over 300 times in a loop with no errors as a sanity check.

Unverified

No user is associated with the committer email.
@mikiher
Copy link
Contributor Author

mikiher commented Mar 9, 2025

I added the last commit to fix #4043 (unrelated to the other changes in the PR)

This isn't the best possible fix in terms of query performance (crearing a denormalized libraryId column in podcastEpisodes, and changing the recent episodes query to use this column would result in a better performing query), but it does improve performance significanly on the library containing 130,000 episodes on my Synology ABS (~80-90% decrease in query time).

@advplyr
Copy link
Owner

advplyr commented Mar 16, 2025

I found an issue with how multiple authors are being sorted. This issue happened for existing books with multiple authors and when testing adding multiple authors after the migration.

For example:

  1. Edit a book and add 2 authors and press save.
  2. Both authors are added to the bookAuthors table where the first author has a slightly earlier createdAt time.
  3. In the UI the authors are shown in the correct order they were entered. "Zed Z" then "Abe A"
  4. In the libraryItems author columns they were added in reverse.

UI showing correct order of authors but is sorted as if "Abe A" was first.
image

bookAuthors table with "Zed Z" having an earlier createdAt time.
image

libraryItems table showing reverse order.
image

I just realized after writing this that the GROUP_CONCAT is not respecting the ORDER BY

@advplyr
Copy link
Owner

advplyr commented Mar 16, 2025

I was able to fix it by using the ORDER BY in the GROUP_CONCAT introduced in an earlier sqlite version so I also updated sqlite3 to https://github.com/TryGhost/node-sqlite3/releases/tag/v5.1.7.

Can you do some testing to see if this will work? I'm out of time today so won't be making any more changes

@mikiher
Copy link
Contributor Author

mikiher commented Mar 17, 2025

Sure. The unit test also fails now - I'll check.

@mikiher
Copy link
Contributor Author

mikiher commented Mar 17, 2025

OK, I did the following changes:

  • Fixed the test
  • Made the test data createdAt order different from the insertion/id order, to make sure that the group_concat sorting is working.
  • Fixed the triggers in Database.js (for new databases)

I've re-tested with existing and new databases, with emphasis on adding and removing multiple authors from the same book.
Seems to work well now.

@advplyr
Copy link
Owner

advplyr commented Mar 17, 2025

Thanks! I did a bunch of testing as well.

@advplyr advplyr merged commit 40504da into advplyr:master Mar 17, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants