feat: use block matrix to reduce peak memory usage for matmul #3947
Conversation
CHANGELOG.md (outdated)

@@ -1,8 +1,9 @@
-## 0.16.24-dev4
+## 0.16.24-dev5
What do you think of cutting a new release at this point?
Do you have timing observations for this?
If it is a significant regression in time, we can consider processpool...
Nice work!
What do you think of creating an experiment PR to check metrics changes in core-product before merging this PR? I think it would be worth having an experiment PR in core-product first for any feature that could impact metrics scores.
Thinking on this more... there is nuance, because if the reduction in memory scales linearly with time, then nothing is gained by doing it in a processpool... unless each thread working on a block is faster with the smaller overall "job size". Then it would be a net win.
It is actually slightly faster, because of the counterintuitive way the CPU works: this reduces CPU cache trips, since the computation fits into cache more easily. One can even optimize for a specific CPU by tweaking the block size smaller.
This change doesn't alter any logic at all; it is purely a different procedure to compute the same thing.
Heck yeah - I was hoping you'd say that!
This PR targets the most memory-expensive operation in partitioning PDFs and images: deduplicating pdfminer elements. On large pages the number of elements can exceed 10k, which generates multiple 10k x 10k square double-float matrices during deduplication, pushing peak memory usage close to 13 GB.
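For a rough sense of scale (a back-of-the-envelope estimate, not a measurement from this PR), a single dense 10k x 10k float64 matrix is already about 0.8 GB, so holding a handful of such temporaries at once quickly reaches double-digit gigabytes:

```python
# Back-of-the-envelope arithmetic only; assumes ~10k elements and float64
# intermediates. The exact number of temporaries alive at once depends on
# how the pairwise computation is written.
n = 10_000
one_matrix_gb = n * n * 8 / 1e9  # size of one dense n x n float64 matrix
print(one_matrix_gb)             # -> 0.8 (GB); on the order of 16 such
                                 #    temporaries reaches the ~13 GB above.
```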

This PR breaks the computation down by computing partial IOU: more precisely, it computes IOU for each block of 2,000 elements against all elements at a time, reducing peak memory usage by about 10x, to around 1.6 GB. A sketch of the idea is shown below.
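As an illustration of the block-matrix idea (a minimal sketch, not the actual implementation in this PR; the box layout, dtype, and the helper name `blockwise_iou` are assumptions), the loop below materializes only `(block_size, n)` intermediates instead of several `(n, n)` ones:

```python
import numpy as np


def blockwise_iou(boxes: np.ndarray, block_size: int = 2000) -> np.ndarray:
    """Pairwise IOU computed block by block to bound peak memory.

    ``boxes`` is an (n, 4) array of (x1, y1, x2, y2) rows. Only the final
    (n, n) IOU matrix plus a few (block_size, n) temporaries are alive at
    any point, instead of several (n, n) temporaries at once.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    n = len(boxes)
    iou = np.empty((n, n), dtype=np.float64)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        # Intersection rectangle of the current block against all boxes.
        ix1 = np.maximum(x1[start:end, None], x1[None, :])
        iy1 = np.maximum(y1[start:end, None], y1[None, :])
        ix2 = np.minimum(x2[start:end, None], x2[None, :])
        iy2 = np.minimum(y2[start:end, None], y2[None, :])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        union = areas[start:end, None] + areas[None, :] - inter
        iou[start:end] = inter / np.maximum(union, 1e-12)
    return iou
```

The full `(n, n)` result is still allocated in this sketch, so the savings come from the temporaries; a deduplication pass could also consume each block immediately and avoid holding the full matrix at all.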

The block size is configurable based on the user's preferred peak memory usage; it is set via the environment variable `UNST_MATMUL_MEMORY_CAP_IN_GB`.
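One plausible way such a cap could map to a block size (purely hypothetical; the helper name and the assumed number of live temporaries are not from this PR) is:

```python
import os


def block_size_from_memory_cap(n_elements: int, n_temporaries: int = 4) -> int:
    """Translate a memory cap in GB into a block size.

    Hypothetical sketch: assumes float64 entries and that roughly
    ``n_temporaries`` arrays of shape (block, n_elements) are alive at once.
    """
    cap_gb = float(os.environ.get("UNST_MATMUL_MEMORY_CAP_IN_GB", "1"))
    bytes_per_block_row = n_elements * 8 * n_temporaries
    return max(1, int(cap_gb * 1e9 // bytes_per_block_row))
```

In this sketch, setting `UNST_MATMUL_MEMORY_CAP_IN_GB=2` simply doubles the allowed slab size relative to a 1 GB cap; the real mapping inside the library may differ.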