
Add an explicit cache on Python entry points #614

Merged: 1 commit, merged Mar 11, 2024

Conversation

@cottsay (Member) commented Feb 7, 2024

Whenever we enumerate Python entry points to load colcon extension points, we're re-parsing metadata for every Python package found on the system. Worse yet, accessing attributes on importlib.metadata.Distribution typically results in re-reading the metadata each time, so we're hitting the disk pretty hard.

We don't generally expect the entry points available to change, so we should cache that information once and parse each package's metadata a single time.

Closes #600
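
A minimal sketch of the proposal, assuming a hypothetical `EntryPointCache` helper (this is not colcon's actual implementation). The expensive enumeration runs once, its results are indexed by group in memory, and later lookups never touch the disk until the cache is explicitly cleared. The loader is injected so the demo can use hand-built `importlib.metadata.EntryPoint` objects with illustrative values. In practice it would wrap something like `importlib.metadata.entry_points()`.

```python
from collections import defaultdict
from importlib.metadata import EntryPoint


class EntryPointCache:
    """Hypothetical explicit cache: enumerate once, then answer from memory."""

    def __init__(self, loader):
        # loader: zero-argument callable returning an iterable of EntryPoint
        self._loader = loader
        self._by_group = None

    def _index(self):
        if self._by_group is None:
            by_group = defaultdict(list)
            for ep in self._loader():  # the only expensive enumeration
                by_group[ep.group].append(ep)
            self._by_group = dict(by_group)
        return self._by_group

    def get(self, group):
        """Return the entry points registered under ``group`` (possibly empty)."""
        return tuple(self._index().get(group, ()))

    def clear(self):
        """Forget everything so the next lookup enumerates again."""
        self._by_group = None


# Usage with a stand-in loader that records how often it runs.
loads = []


def demo_loader():
    loads.append(1)
    return [
        EntryPoint(name='build', value='my_pkg.verbs:BuildVerb',
                   group='colcon_core.verb'),
        EntryPoint(name='hello', value='my_pkg.verbs:HelloVerb',
                   group='colcon_core.verb'),
    ]


cache = EntryPointCache(demo_loader)
verbs = [ep.name for ep in cache.get('colcon_core.verb')]
cache.get('colcon_core.verb')
cache.get('some.other.group')
# verbs == ['build', 'hello'], and demo_loader ran exactly once
```

Keeping the cache explicit, rather than hidden behind a decorator, makes it obvious when enumeration happens and gives tests a single place to reset it.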

@cottsay added the enhancement label Feb 7, 2024
@cottsay self-assigned this Feb 7, 2024
@cottsay force-pushed the cottsay/cache-extension-points branch from 568ae80 to 1b85894 on February 7, 2024 21:59

codecov bot commented Feb 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.54%. Comparing base (6cf24ea) to head (69e20f9).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #614      +/-   ##
==========================================
+ Coverage   83.34%   83.54%   +0.20%     
==========================================
  Files          66       66              
  Lines        3794     3816      +22     
  Branches      739      745       +6     
==========================================
+ Hits         3162     3188      +26     
+ Misses        557      554       -3     
+ Partials       75       74       -1     
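
As a quick consistency check on the table, both percentages can be reproduced from the line counts. This assumes Codecov's definition of coverage as hits divided by (hits + misses + partials), so partially covered lines count against the ratio.

```python
# Line counts copied from the Codecov table for base (master) and head (#614).
base = {'hits': 3162, 'misses': 557, 'partials': 75}
head = {'hits': 3188, 'misses': 554, 'partials': 74}


def coverage_percent(counts):
    """Coverage as hits / (hits + misses + partials), in percent."""
    lines = counts['hits'] + counts['misses'] + counts['partials']
    return 100 * counts['hits'] / lines


delta = coverage_percent(head) - coverage_percent(base)
print(f'{coverage_percent(base):.2f}% -> {coverage_percent(head):.2f}% ({delta:+.2f})')
# 83.34% -> 83.54% (+0.20)
```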

@cottsay (Member, Author) commented Feb 8, 2024

I think we can do better than this, actually.

I didn't know this, but the importlib.metadata API for distributions and entry_points does no caching at all, to the point where even accessing properties on distribution objects typically results in reading the metadata from disk every time. I did some fooling around and got my previous 0.4s down to under 0.3s by specifically caching the underlying metadata to avoid the disk reads. Some strategic structuring of that underlying data to avoid iterating over it might yield even more savings.
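
To illustrate "caching the underlying metadata", here is a sketch under assumptions. It is not the code from this PR, and `snapshot_distributions` is a hypothetical name. It reads each distribution's `METADATA` and `entry_points.txt` once, binds the parsed results to locals, and keeps only what later lookups need. The usage part builds a throwaway `.dist-info` directory so the example does not depend on what is installed.

```python
import tempfile
from importlib.metadata import distributions
from pathlib import Path


def snapshot_distributions(path=None):
    """Parse each distribution's metadata once and keep only what is needed.

    As noted in this thread, attributes such as dist.metadata, dist.version
    and dist.entry_points typically go back to disk on every access, so each
    result is read once, bound to a local name, and reused.
    """
    kwargs = {} if path is None else {'path': path}
    snapshot = {}
    for dist in distributions(**kwargs):
        meta = dist.metadata  # one read and parse of METADATA (or PKG-INFO)
        name = meta.get('Name')
        if name is None or name in snapshot:
            # Skip broken metadata; keep the first distribution seen per name.
            continue
        snapshot[name] = {
            'version': meta.get('Version'),  # reuses the parsed message
            'entry_points': tuple(dist.entry_points),  # reads entry_points.txt
        }
    return snapshot


# Usage: a throwaway .dist-info directory keeps the demo independent of
# whatever happens to be installed.
with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp, 'demo_pkg-1.0.dist-info')
    info.mkdir()
    (info / 'METADATA').write_text(
        'Metadata-Version: 2.1\nName: demo-pkg\nVersion: 1.0\n')
    (info / 'entry_points.txt').write_text(
        '[colcon_core.verb]\nhello = demo_pkg.verbs:HelloVerb\n')
    demo = snapshot_distributions(path=[tmp])
```

Reading `meta.get('Version')` from the already-parsed message, instead of calling `dist.version`, is the kind of saving described above: the metadata is parsed once per distribution rather than once per attribute access.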

I didn't realize that the startup performance had regressed so badly. SSDs and OS caching hide how much IO is happening here. I can imagine that cold invocations on spinning disks are brutal...

@cottsay changed the title from "Use functools.lru_cache to cache extension point discovery" to "Add an explicit cache on Python entry points" Feb 16, 2024

@cottsay (Member, Author) commented Feb 16, 2024

Alright, I dropped the functools.lru_cache approach in favor of an explicit cache. This change brought baseline loading from 0.8s to 0.3s on my machine, and the Pyflame profile looks a lot better now.
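
The trade-off isn't spelled out here, so this is only one plausible illustration. `functools.lru_cache` memoizes per argument tuple. If it wraps a per-group lookup (an assumption about the earlier revision), each distinct group still triggers a full enumeration. An explicit cache enumerates once and filters in memory. The registry and function names below are hypothetical and no disk access happens; only the scan counts matter.

```python
from functools import lru_cache

# Hypothetical stand-in for "every entry point on the system"; a real scan
# would walk installed packages and read their metadata from disk.
REGISTRY = [
    ('colcon_core.verb', 'build'),
    ('colcon_core.verb', 'test'),
    ('colcon_core.package_identification', 'python'),
]
scans = {'lru': 0, 'explicit': 0}


def full_scan(strategy):
    scans[strategy] += 1  # count each expensive enumeration
    return list(REGISTRY)


@lru_cache(maxsize=None)
def get_group_memoized(group):
    # Memoized per argument: every new group still pays for a full scan.
    return tuple(name for g, name in full_scan('lru') if g == group)


_snapshot = None


def get_group_explicit(group):
    # One full scan in total; later lookups only filter the snapshot.
    global _snapshot
    if _snapshot is None:
        _snapshot = full_scan('explicit')
    return tuple(name for g, name in _snapshot if g == group)


for group in ('colcon_core.verb', 'colcon_core.package_identification',
              'colcon_core.verb'):
    get_group_memoized(group)
    get_group_explicit(group)

print(scans)  # {'lru': 2, 'explicit': 1}
```

The gap between the two strategies grows with the number of distinct groups queried.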

@cottsay marked this pull request as ready for review February 16, 2024 20:29
@cottsay changed the base branch from master to cottsay/extension-point-tests February 22, 2024 17:12
@cottsay force-pushed the cottsay/cache-extension-points branch from 893cc79 to 5ed75ac on February 22, 2024 17:12
@cottsay marked this pull request as draft February 22, 2024 17:12
@cottsay force-pushed the cottsay/cache-extension-points branch 4 times, most recently from f834f9a to 86c11bb on February 22, 2024 21:58
@cottsay marked this pull request as ready for review February 22, 2024 22:04

@nuclearsandwich (Contributor) left a comment

Nice!

colcon_core/extension_point.py (review thread resolved)
@delete-merged-branch bot deleted the branch master March 11, 2024 23:00
@cottsay changed the base branch from cottsay/extension-point-tests to master March 11, 2024 23:01

This change jumps through a lot of hoops to specifically use the
`importlib.metadata.entry_points()` function wherever possible because
it has an optimization that allows us to avoid reading each package's
metadata while still properly handling package shadowing between paths.
This has a measurable impact on extension point loading performance.
@cottsay force-pushed the cottsay/cache-extension-points branch from 86c11bb to 69e20f9 on March 11, 2024 23:01
@cottsay merged commit 2208a3b into master Mar 11, 2024
42 checks passed
@delete-merged-branch bot deleted the cottsay/cache-extension-points branch March 11, 2024 23:18
@cottsay added this to the 0.15.3 milestone Mar 11, 2024
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Cache get_extension_points output
2 participants