Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a max jobs per worker budget #4965

Merged
merged 9 commits into from Sep 10, 2023
Merged

Conversation

staabm
Copy link
Contributor

@staabm staabm commented Sep 10, 2023

@staabm staabm marked this pull request as ready for review September 10, 2023 07:24
@staabm
Copy link
Contributor Author

staabm commented Sep 10, 2023

will be AFK for a few hours

@@ -59,7 +59,7 @@ public function disableParallel(): void
SimpleParameterProvider::setParameter(Option::PARALLEL, false);
}

public function parallel(int $seconds = 120, int $maxNumberOfProcess = 16, int $jobSize = 20): void
public function parallel(int $seconds = 120, int $maxNumberOfProcess = 16, int $jobSize = 15): void
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduced the jobsize, which leads to less memory used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config/config.php need to be updated as well:

$rectorConfig->parallel(120, 16, 20);

could you also update the different between $jobSize vs MAX_CHUNKS_PER_WORKER ? Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped the args in this file, as these just reflect the defaults and therefore don't need to be kept in sync

could you also update the different between $jobSize vs MAX_CHUNKS_PER_WORKER ?

where?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here seems ok

#4965 (review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍

if ($jobs === []) {
$this->processPool->quitProcess($processIdentifier);
return;
}

$job = array_pop($jobs);
$jobsChunk = array_pop($jobs);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming because the previous name suggested it would be a single job

@staabm
Copy link
Contributor Author

staabm commented Sep 10, 2023

with the parameter adjustments at this point we can run rector across mautic with

    $rectorConfig->paths([
        __DIR__.'/app/',
        __DIR__.'/plugins',
    ]);

with a peak of 14.75 GB. it never grows higher then that:

time php ../rector-src/bin/rector.php --dry-run --no-progress-bar --no-diffs --clear-cache
                                               
 [OK] 10 files would have changed (dry-run) by Rector                                                                   
                                                                                                                       
php ../rector-src/bin/rector.php --dry-run --no-progress-bar --no-diffs   379.85s user 11.00s system 798% cpu 48.939 total

with rector 0.18.2 it peaks at 16.5 GB RAM and takes

➜  mautic git:(5.x) ✗ time php ../rector-src/bin/rector.php --dry-run --no-progress-bar --no-diffs --clear-cache
                                                                                                                        
 [OK] 30 files would have changed (dry-run) by Rector                                                                                                                                                                                      

php ../rector-src/bin/rector.php --dry-run --no-progress-bar --no-diffs   516.54s user 8.36s system 766% cpu 1:08.51 total

=> with this PR (and all the already merged optimizations) we are 20 seconds faster and take ~1.75 GB less memory

*
* @var int
*/
private const MAX_CHUNKS_PER_WORKER = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a bit ambiguous the different between $jobSize vs MAX_CHUNKS_PER_WORKER, could you write a comment here? Thank you.

Copy link
Member

@samsonasik samsonasik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomasVotruba let's quick merge and revert in case of it doesn't work :)

Thank you @staabm

@samsonasik samsonasik merged commit 0717306 into rectorphp:main Sep 10, 2023
38 checks passed
@staabm staabm deleted the para branch September 10, 2023 12:36
@samsonasik
Copy link
Member

@staabm I am testing in CodeIgniter 4 project, when we define:

$rectorConfig->parallel(240, 8, 1);

the procss got 2 times slower, but when we define default value:

$rectorConfig->parallel();

the performance seems equals, I mention you in PR test at:

so in case other CodeIgniter4 project maintainer has different use case, they can give you feedback faster :)

@staabm
Copy link
Contributor Author

staabm commented Sep 10, 2023

the procss got 2 times slower,

thats expected. the smaller the jobsize is and the smaller the chunks-per-worker is, the slower the rector job will get.
but it also uses less memory.

the higher the jobsize and the higher the chunks-per-worker are, rector gets faster but requires more memory.

its a balance between memory and speed. before this PR all default settings were set for maximum speed but without limits for memory. thats the actual problem the bigger the project gets we analyze.

with this PR we have a mean to cut the memory peaks.

performance and memory also depends whether rector just scans the codebase and does not need todo anything or whether it needs to refactor something

@staabm
Copy link
Contributor Author

staabm commented Sep 10, 2023

@Flyingmana @yguedidi you might be interessted in testing rector@dev-main which now uses process re-spawning to recover from memory leaks and memory spikes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants