fix(gatsby): don't block event loop during inference #37780
These changes drop gatsbyjs.com schema building time from 16s to 7s.
For a Contentful site with 4.9M nodes it drops schema building time from 520s to 380s. It also allows that 4.9M node site to build with significantly less memory: previously it needed 64Gi of memory, and now it can build with 24Gi with these changes (+ some other changes I'll be PRing soon).
Vlad previously added code in https://github.com/gatsbyjs/gatsby/pull/19781/files#diff-d380fd3fbf5adf3933e07c737228eb75e520cdc7a5050d4d6b710acd5256d40cR48 that lets the event loop breathe between inferring each node type, which was good.
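A minimal sketch of that earlier pattern (illustrative names, not the actual Gatsby code): the event loop is only released between node *types*, so every node of a type is processed in one synchronous pass.

```ts
// Hypothetical helper: resolve on the next event loop turn
const deferToEventLoop = (): Promise<void> =>
  new Promise(resolve => setImmediate(resolve))

// Sketch of the pre-existing behaviour: yield only between types
async function inferAllTypes(
  typeNames: Array<string>,
  getNodesByType: (type: string) => Array<unknown>,
  inferNode: (node: unknown) => void
): Promise<void> {
  for (const typeName of typeNames) {
    // Synchronous loop over potentially millions of nodes of this type
    for (const node of getNodesByType(typeName)) {
      inferNode(node)
    }
    // The event loop only breathes here, after the whole type is done
    await deferToEventLoop()
  }
}
```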
The problem with that approach is that for each node type, every node of that type is loaded into memory before Node.js has an opportunity to garbage collect. You need enough memory to hold all nodes of whichever type has the greatest count.
For large sites this means huge amounts of memory are required. For small/medium sites it still means more memory is used than necessary, and inference is also slower than it could be because the event loop is blocked for large chunks of time.
With this change, Node.js can throw away the last batch of inferred nodes if it needs to before moving on to the next chunk of 1000.
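A minimal sketch of the chunked approach (again illustrative, not the PR's exact code): yield to the event loop after every 1000 nodes so V8 has a chance to collect the nodes that were already inferred.

```ts
const CHUNK_SIZE = 1000

// Sketch: infer one type in chunks, deferring to the event loop between chunks
async function inferType(
  nodes: Array<unknown>,
  inferNode: (node: unknown) => void
): Promise<void> {
  for (let i = 0; i < nodes.length; i += CHUNK_SIZE) {
    for (const node of nodes.slice(i, i + CHUNK_SIZE)) {
      inferNode(node)
    }
    // Let the event loop breathe between each chunk of 1000 nodes
    await new Promise<void>(resolve => setImmediate(resolve))
  }
}
```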
Apparently V8 can GC whenever it wants to, but from what I saw here that was not the case: memory grew linearly until each type finished inferring. With these changes memory usage stays relatively flat during inference. Possibly that's because not blocking the event loop allows other parallel code to complete, which in turn lets Node.js GC elsewhere.