Prevent out-of-memory errors while regex array shape inference #3213

Merged
merged 3 commits into from
Jul 6, 2024

Conversation

staabm
Contributor

@staabm staabm commented Jul 6, 2024

closes phpstan/phpstan#11292

Before this PR, the following snippet ...

<?php

require 'vendor/autoload.php';

// 1. Read the grammar.
$grammar  = new Hoa\File\Read('hoa://Library/Regex/Grammar.pp');

// 2. Load the compiler.
$compiler = Hoa\Compiler\Llk\Llk::load($grammar);

$nonUrl = "[^\x7f-\xff]";
$pattern = '{(' . $nonUrl . ')}';


// 3. Lex, parse and produce the AST.
$ast      = $compiler->parse($pattern);

// 4. Dump the result.
$dump     = new Hoa\Compiler\Visitor\Dump();
echo $dump->visit($ast);

... mistakenly treated a non-parseable regex string as a lexeme, because errors from preg_match were not reported back properly in https://github.com/staabm/HoaCompiler/blob/c620f44deff0b4c2d0c27560a3b0f5e7e376e001/Llk/Lexer.php#L275-L302

This in turn led to a wall of warnings like

Warning: Undefined array key 0 in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Lexer.php on line 288
PHP Warning:  Undefined array key 0 in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Lexer.php on line 299

Warning: Undefined array key 0 in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Lexer.php on line 299
PHP Warning:  Undefined array key 0 in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Lexer.php on line 300

and the lexer running into an infinite loop.
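
As a rough illustration of the failure mode described above (not the actual change made in this PR or in the Hoa lexer), the sketch below shows how a `false` return value from preg_match can be told apart from a plain "no match". The helper name `matchTokenAt` and its signature are hypothetical, and `preg_last_error_msg()` assumes PHP 8.0 or newer. When the subject contains invalid UTF-8 and the pattern uses the `u` modifier, preg_match returns `false` and leaves `$matches` empty, which is consistent with the "Undefined array key 0" warnings shown above.

```php
<?php

/**
 * Hypothetical helper (not part of Hoa or PHPStan): try to match one token
 * at the given byte offset and report preg_match failures instead of
 * silently continuing with an empty $matches array.
 */
function matchTokenAt(string $pattern, string $text, int $offset): ?string
{
    $result = preg_match($pattern, $text, $matches, 0, $offset);

    // false means the regex engine failed (e.g. malformed UTF-8 with the
    // "u" modifier). Treating this like a match would read $matches[0],
    // which does not exist, and the caller would never advance the offset.
    if ($result === false) {
        throw new RuntimeException('preg_match failed: ' . preg_last_error_msg());
    }

    // 0 means the pattern is valid but did not match at this offset.
    if ($result === 0) {
        return null;
    }

    return $matches[0];
}
```

A caller that loops over the input can then stop on the exception rather than spinning on a zero-length "token", which is one plausible way to avoid the infinite loop mentioned above.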

staabm added 2 commits July 6, 2024 11:14
@staabm staabm marked this pull request as ready for review July 6, 2024 09:18
@phpstan-bot
Collaborator

This pull request has been marked as ready for review.

$nonUrl = "[^-_#$+.!*%'(),;/?:@~=&a-zA-Z0-9\x7f-\xff]";
$pattern = '{\\b(https?://)?(?:([^]\\\\\\x00-\\x20\\"(),:-<>[\\x7f-\\xff]{1,64})(:[^]\\\\\\x00-\\x20\\"(),:-<>[\\x7f-\\xff]{1,64})?@)?((?:[-a-zA-Z0-9\\x7f-\\xff]{1,63}\\.)+[a-zA-Z\\x7f-\\xff][-a-zA-Z0-9\\x7f-\\xff]{1,62})((:[0-9]{1,5})?(/[!$-/0-9:;=@_~\':;!a-zA-Z\\x7f-\\xff]*?)?(\\?[!$-/0-9:;=@_\':;!a-zA-Z\\x7f-\\xff]+?)?(#[!$-/0-9?:;=@_\':;!a-zA-Z\\x7f-\\xff]+?)?)(?=[)\'?.!,;:]*(' . $nonUrl . '|$))}';
if (preg_match($pattern, $s, $matches, PREG_OFFSET_CAPTURE, 0)) {

Member

Please also test what kind of array shape is inferred here.

Member

Also can you please explain what happened and why this fix helps?

Contributor Author

added a PR description above

@ondrejmirtes ondrejmirtes merged commit 18cddd6 into phpstan:1.11.x Jul 6, 2024
450 of 453 checks passed
@ondrejmirtes
Member

Thank you.

@staabm staabm deleted the reg-mem branch July 6, 2024 09:58
Development

Successfully merging this pull request may close these issues.

preg_match exhausts all memory
3 participants