Implement a max jobs per worker budget #4965

Merged: 9 commits merged on Sep 10, 2023
Changes from 8 commits
2 changes: 1 addition & 1 deletion config/config.php
@@ -13,7 +13,7 @@

$rectorConfig->autoloadPaths([]);
$rectorConfig->bootstrapFiles([]);
$rectorConfig->parallel(120, 16, 20);
$rectorConfig->parallel();

// to avoid autoimporting out of the box
$rectorConfig->importNames(false, false);
2 changes: 1 addition & 1 deletion packages/Config/RectorConfig.php
@@ -59,7 +59,7 @@ public function disableParallel(): void
SimpleParameterProvider::setParameter(Option::PARALLEL, false);
}

public function parallel(int $seconds = 120, int $maxNumberOfProcess = 16, int $jobSize = 20): void
public function parallel(int $seconds = 120, int $maxNumberOfProcess = 16, int $jobSize = 15): void
Contributor Author:

Reduced the job size, which leads to less memory being used.
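For context, a rough sketch of what the job size controls: the files to be processed are split into chunks of $jobSize paths, and one chunk is what a worker receives per request. This is a simplification with illustrative names, not Rector's actual scheduler code:

```php
<?php

// Illustrative only: how a smaller job size yields smaller per-request chunks.
$filePaths = ['src/A.php', 'src/B.php', /* ... many more ... */ 'src/Z.php'];

$jobSize = 15; // new default in this PR (was 20)

// each inner array is one job/chunk sent to a worker in a single request;
// fewer files per request means the worker holds fewer ASTs in memory at once
$jobs = array_chunk($filePaths, $jobSize);
```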

Member:

config/config.php needs to be updated as well:

$rectorConfig->parallel(120, 16, 20);

Could you also document the difference between $jobSize and MAX_CHUNKS_PER_WORKER? Thank you.

Contributor Author:

I dropped the args in this file, as they just reflect the defaults and therefore don't need to be kept in sync.

> Could you also document the difference between $jobSize and MAX_CHUNKS_PER_WORKER?

Where?

Member:

Here seems fine:

#4965 (review)

Member:

Looks good 👍

{
SimpleParameterProvider::setParameter(Option::PARALLEL, true);
SimpleParameterProvider::setParameter(Option::PARALLEL_JOB_TIMEOUT_IN_SECONDS, $seconds);
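For reference, a sketch of how a project could set these values explicitly in its own rector.php instead of relying on the defaults. The values 120/16/15 mirror the defaults after this PR; adjust as needed:

```php
<?php

// rector.php in a consuming project
use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // timeout per job in seconds, max number of worker processes, files per job chunk
    $rectorConfig->parallel(120, 16, 15);
};
```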
60 changes: 49 additions & 11 deletions packages/Parallel/Application/ParallelFileProcessor.php
@@ -42,6 +42,12 @@ final class ParallelFileProcessor
* @var int
*/
private const SYSTEM_ERROR_LIMIT = 50;
/**
 * The number of file chunks a worker may process before it is killed and a fresh
 * worker is spawned in its place, to keep per-worker memory usage bounded
 * ($jobSize, by contrast, is the number of files inside each chunk).
 *
 * @var int
 */
private const MAX_CHUNKS_PER_WORKER = 8;
Member:

The difference between $jobSize and MAX_CHUNKS_PER_WORKER is a bit ambiguous; could you write a comment here? Thank you.
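For readers, a hedged illustration of how the two settings relate, using this PR's default values; the multiplication is just for intuition, not code from the PR:

```php
<?php

$jobSize            = 15; // files packed into one chunk/job (3rd argument of RectorConfig::parallel())
$maxChunksPerWorker = 8;  // chunks a single worker process handles before it is recycled

// upper bound on files one worker process touches in its lifetime,
// which in turn bounds how much memory it can accumulate
echo $jobSize * $maxChunksPerWorker; // 120
```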


private ProcessPool|null $processPool = null;

@@ -98,10 +104,10 @@ public function process(
return;
}

$job = array_pop($jobs);
$jobsChunk = array_pop($jobs);
$parallelProcess->request([
ReactCommand::ACTION => Action::MAIN,
Content::FILES => $job,
Content::FILES => $jobsChunk,
]);
});
});
@@ -137,12 +143,24 @@ public function process(
};

$timeoutInSeconds = SimpleParameterProvider::provideIntParameter(Option::PARALLEL_JOB_TIMEOUT_IN_SECONDS);

for ($i = 0; $i < $numberOfProcesses; ++$i) {
// nothing else to process, stop now
if ($jobs === []) {
break;
}
$fileChunksBudgetPerProcess = [];

$processSpawner = function() use (
&$systemErrors,
&$fileDiffs,
&$jobs,
$postFileCallback,
&$systemErrorsCount,
&$reachedInternalErrorsCountLimit,
$mainScript,
$input,
$serverPort,
$streamSelectLoop,
$timeoutInSeconds,
$handleErrorCallable,
&$fileChunksBudgetPerProcess,
&$processSpawner
): void {

$processIdentifier = Random::generate();
$workerCommandLine = $this->workerCommandLineFactory->create(
@@ -153,6 +171,7 @@ public function process(
$processIdentifier,
$serverPort,
);
$fileChunksBudgetPerProcess[$processIdentifier] = self::MAX_CHUNKS_PER_WORKER;

$parallelProcess = new ParallelProcess($workerCommandLine, $streamSelectLoop, $timeoutInSeconds);

@@ -167,7 +186,9 @@ function (array $json) use (
&$systemErrorsCount,
&$collectedDatas,
&$reachedInternalErrorsCountLimit,
$processIdentifier
$processIdentifier,
&$fileChunksBudgetPerProcess,
&$processSpawner
): void {
// decode arrays to objects
foreach ($json[Bridge::SYSTEM_ERRORS] as $jsonError) {
@@ -195,16 +216,24 @@ function (array $json) use (
$this->processPool->quitAll();
}

if ($fileChunksBudgetPerProcess[$processIdentifier] <= 0) {
// kill the current worker, and spawn a fresh one to free memory
$this->processPool->quitProcess($processIdentifier);

($processSpawner)();
return;
}
if ($jobs === []) {
$this->processPool->quitProcess($processIdentifier);
return;
}

$job = array_pop($jobs);
$jobsChunk = array_pop($jobs);
Contributor Author:

Renamed because the previous name suggested it would be a single job.

$parallelProcess->request([
ReactCommand::ACTION => Action::MAIN,
Content::FILES => $job,
Content::FILES => $jobsChunk,
]);
--$fileChunksBudgetPerProcess[$processIdentifier];
},

// 2. callable on error
@@ -226,6 +255,15 @@ function ($exitCode, string $stdErr) use (&$systemErrors, $processIdentifier): void
);

$this->processPool->attachProcess($processIdentifier, $parallelProcess);
};

for ($i = 0; $i < $numberOfProcesses; ++$i) {
// nothing else to process, stop now
if ($jobs === []) {
break;
}

($processSpawner)();
}

$streamSelectLoop->run();
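To summarize the mechanism this PR adds, here is a heavily simplified, framework-free sketch of the per-worker chunk budget: each worker starts with MAX_CHUNKS_PER_WORKER chunks of budget; once the budget is spent the worker is quit and a fresh one is spawned, so no single process lives long enough to accumulate unbounded memory. All names below are illustrative, not Rector's API, and the real implementation is asynchronous on a react event loop:

```php
<?php

// Simplified, synchronous model of the budget-and-respawn strategy.
const MAX_CHUNKS_PER_WORKER = 8;

/** @param list<list<string>> $jobs stack of file chunks still to be processed */
function runWithWorkerBudget(array $jobs): void
{
    while ($jobs !== []) {
        // "spawn" a fresh worker with a full budget
        $budget = MAX_CHUNKS_PER_WORKER;
        $workerId = bin2hex(random_bytes(4));

        while ($jobs !== [] && $budget > 0) {
            $chunk = array_pop($jobs);
            processChunk($workerId, $chunk); // stand-in for the worker request/response cycle
            --$budget;
        }

        // budget exhausted (or no jobs left): the worker would be quit here,
        // and the outer loop spawns a replacement if jobs remain
    }
}

/** @param list<string> $chunk */
function processChunk(string $workerId, array $chunk): void
{
    printf("worker %s processed %d files\n", $workerId, count($chunk));
}
```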