Reduced GroundingDataset annotation unpacking speed #18382

Merged

@Lornatang (Contributor) commented Dec 25, 2024

Reduce I/O latency caused by repeated unpacking when processing a large number of annotations.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved dataset handling by reorganizing captions for cleaner and more efficient processing. 🛠️

📊 Key Changes

  • Refactored code to directly assign caption = img["caption"] before using it in label extraction logic.
  • Simplified and clarified how category names (cat_name) are pulled from captions during annotation processing.
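
The refactor described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the function name `extract_cat_names` is hypothetical, and the `tokens_positive` key and its `(start, end)` character-span layout are assumptions about the annotation format. Only `caption = img["caption"]` and `cat_name` come from the summary itself.

```python
def extract_cat_names(img, anns):
    """Return one category name per annotation, sliced out of the image caption.

    Hypothetical helper: the real logic lives inside the dataset's label
    extraction and may differ in detail.
    """
    # Look the caption up once, instead of indexing img["caption"] per annotation.
    caption = img["caption"]
    cat_names = []
    for ann in anns:
        # Assumed layout: each entry is a (start, end) character span into the caption.
        spans = ann["tokens_positive"]
        cat_name = " ".join(caption[start:end] for start, end in spans)
        cat_names.append(cat_name)
    return cat_names


# Toy data for illustration only.
img = {"caption": "a red car next to a tall tree"}
anns = [
    {"tokens_positive": [[2, 9]]},
    {"tokens_positive": [[20, 29]]},
]
print(extract_cat_names(img, anns))  # ['red car', 'tall tree']
```

Hoisting the lookup keeps the inner expression shorter and avoids a repeated dictionary access per span, which matters most when a single image carries many annotations.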

🎯 Purpose & Impact

  • Purpose: To enhance code readability and maintainability by removing redundancy in caption handling.
  • Impact: This change ensures smoother, more efficient dataset operations, which is great for both developers working on the code and users expecting swift and reliable annotation processing. 🚀

sentry-io bot commented Dec 25, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: ultralytics/data/dataset.py

| Function | Unhandled Issue | Event Count |
| --- | --- | --- |
| get_labels | ValueError: PosixPath('.') has an empty name path... | 5 |
| get_labels | PermissionError: [Errno 1] Operation not permitted: '/content/drive/.shortcut-targets-by-id/1Kevj-bsv2CmtRnaGIR2wK... ... | 3 |
| get_labels | ValueError: not enough values to unpack (expected 3, got 0) ultralytics.data.dataset in... | 2 |
| get_labels | EOFError: No data left in file numpy.lib.npyio in... | 2 |


codecov bot commented Dec 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 73.73%. Comparing base (2aac80d) to head (fdc1c87).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ultralytics/data/dataset.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18382      +/-   ##
==========================================
+ Coverage   73.70%   73.73%   +0.03%     
==========================================
  Files         129      129              
  Lines       17278    17279       +1     
==========================================
+ Hits        12735    12741       +6     
+ Misses       4543     4538       -5     
| Flag | Coverage Δ |
| --- | --- |
| Benchmarks | 34.88% <0.00%> (+0.03%) ⬆️ |
| GPU | 38.30% <0.00%> (-0.01%) ⬇️ |
| Tests | 67.52% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

Lornatang and others added 3 commits December 26, 2024 16:37
UltralyticsAssistant and others added 2 commits December 26, 2024 22:29
@glenn-jocher glenn-jocher self-requested a review December 28, 2024 04:25
@glenn-jocher glenn-jocher merged commit 1051255 into ultralytics:main Dec 28, 2024
15 checks passed
@UltralyticsAssistant (Member) commented

🎉 PR Merged! Huge thanks to @Lornatang for leading this improvement, with valuable insights from @Laughing-q and @glenn-jocher. Your collaborative effort to simplify and refine dataset handling ensures faster, cleaner, and smarter processes for everyone. 🚀✨

As Leonardo da Vinci once said, "Simplicity is the ultimate sophistication." This work embodies that spirit by making the code more elegant and impactful. Your contributions today will ripple out into countless projects, empowering developers and creators alike. Bravo! 🙌

4 participants