Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: split DDP & common + running xdist #2127

Merged
merged 17 commits into from
Jan 11, 2024
Merged

tests: split DDP & common + running xdist #2127

merged 17 commits into from
Jan 11, 2024

Conversation

Borda
Copy link
Member

@Borda Borda commented Oct 2, 2023

What does this PR do?

Identify DDP tests and split testing into two steps:

  • run DDP in a single thread as it was until now
  • the rest of the tests try to run in parallel

In addition move unitest steps back to the main workflow

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃


📚 Documentation preview 📚: https://torchmetrics--2127.org.readthedocs.build/en/2127/

Sorry, something went wrong.

@Borda Borda added the test / CI testing or CI label Oct 2, 2023
@Borda Borda added this to the v1.3.0 milestone Oct 2, 2023
@Borda Borda requested a review from ethanwharris as a code owner October 2, 2023 09:41
@Borda
Copy link
Member Author

Borda commented Oct 2, 2023

Seems like this does not even start properly, could be because of?

def pytest_sessionstart():
"""Global initialization of multiprocessing pool.
Runs before any test.
"""
pool = Pool(processes=NUM_PROCESSES)
pool.starmap(setup_ddp, [(rank, NUM_PROCESSES) for rank in range(NUM_PROCESSES)])
pytest.pool = pool
def pytest_sessionfinish():
"""Correctly closes the global multiprocessing pool.
Runs after all tests.
"""
pytest.pool.close()
pytest.pool.join()

cc: @SkafteNicki

@SkafteNicki
Copy link
Member

Seems like this does not even start properly, could be because of?

Yeah that would definitely be the problem, if I remember correctly you cannot start a child process inside a child process. So I am not sure if we will be able to use xdist for testing.

@SkafteNicki
Copy link
Member

I can get it to run locally, but it is complaining with error

AttributeError: module pytest has no attribute pool

meaning that we need to refactor the code where we initialize the pool used for multiprocessing testing if we want this PR do land.

@mergify mergify bot removed the has conflicts label Oct 17, 2023
@Borda Borda marked this pull request as draft January 9, 2024 12:46
@Borda Borda marked this pull request as ready for review January 11, 2024 12:41
@Borda
Copy link
Member Author

Borda commented Jan 11, 2024

Pivoting and splitting into two steps can eventually be different jobs.
so all DDPs are marked and will run as they are now; the rest will be run xdist

BTW, mowing unittest action to the main workflow as we do not use conda env anymore so, having it separately does not make sense anymore, and it will be easier to read/follow steps

Copy link

codecov bot commented Jan 11, 2024

Codecov Report

Merging #2127 (b3e39f5) into master (41e942c) will decrease coverage by 54%.
The diff coverage is n/a.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #2127     +/-   ##
========================================
- Coverage      87%     33%    -54%     
========================================
  Files         303     303             
  Lines       17059   17059             
========================================
- Hits        14821    5613   -9208     
- Misses       2238   11446   +9208     

@Borda
Copy link
Member Author

Borda commented Jan 11, 2024

this is stated from master:
image
so seems we get 15-40min boost on GPU
image

@Borda Borda enabled auto-merge (squash) January 11, 2024 17:31
@Borda Borda requested a review from lantiga January 11, 2024 17:31
@Borda Borda modified the milestones: v1.3.0, v1.4.0, v1.3.x Jan 11, 2024
Copy link
Member

@SkafteNicki SkafteNicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work @Borda

@mergify mergify bot added the ready label Jan 11, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@lantiga
Copy link
Contributor

lantiga commented Jan 11, 2024

Great job @Borda ⚡️

@stancld
Copy link
Contributor

stancld commented Jan 11, 2024

@Borda Wuaah, this is amazing! 🥇

@Borda Borda disabled auto-merge January 11, 2024 23:19
@Borda Borda changed the title tests: running xdist tests: split DDP & common + running xdist Jan 11, 2024
@Borda Borda merged commit fd2e332 into master Jan 11, 2024
@Borda Borda deleted the test/xdist branch January 11, 2024 23:21
Borda added a commit that referenced this pull request Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready test / CI testing or CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants