Detect lookbehind with a non-constant length or unbounded repetition #69

Open
Aloso opened this issue Jan 13, 2023 · 0 comments

Aloso commented Jan 13, 2023

Java, Python, and PCRE all restrict what may appear in a lookbehind assertion. The restrictions are similar but differ in the details:

  • In Python, a lookbehind must have a fixed length and may only contain repetitions like {n}. \X is forbidden in a lookbehind.

  • In PCRE, a lookbehind may contain alternatives with different lengths, which can even be nested, as well as repetitions with an upper bound. As in Python, \X and unbounded repetitions are forbidden.

  • Java has the most complicated rules. It allows arbitrary repetition in a lookbehind; however, every repeated expression must have a fixed length and may not contain alternations or ?, even when the alternatives have the same length. Java does not treat ? (equivalently {0,1}) as a repetition.

    Java treats repeated groups ((a)+) differently than other kinds of repetitions (a+, [a]+). Infinite repetitions of groups must satisfy the following constraints, where $n$ is the number of code points matched by the group:

    • when $n$ is odd:

      • it must be preceded by an odd number $r$ of infinite repetitions, or
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $c - r < n$
    • when $n$ is even:

      • it must be preceded by an odd number $r$ of infinite repetitions, and
      • the part before the repeated group must match $c$ code points (excluding infinite repetitions), so that $0 \le c - r < n$

    These rules might not be comprehensive, but I haven't found an exception so far.
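    The Java constraints for an infinitely repeated group can be written down as a small predicate. This is only a literal transcription of the rules listed above, not something derived from Java's source, so it inherits the caveat that the rules might not be comprehensive. The function name and its parameters are made up for this sketch.

```python
def java_allows_repeated_group(n: int, r: int, c: int) -> bool:
    """Transcribe the rules above for an infinitely repeated group.

    n: code points matched by one iteration of the group
    r: number of infinite repetitions preceding the group
    c: code points matched by the part before the group,
       not counting the infinite repetitions
    """
    if n % 2 == 1:
        # Odd n: either condition is enough.
        return r % 2 == 1 or c - r < n
    # Even n: both conditions are required.
    return r % 2 == 1 and 0 <= c - r < n
```

    For example, with these rules an even-length group that is not preceded by any infinite repetition (r = 0) is always rejected, whereas an odd-length group with r = 0 passes as long as c < n.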

Describe the bug

Pomsky allows anything in a lookbehind.
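
To illustrate why this matters, here is how Python's re module reacts to lookbehinds of different shapes. The patterns are written by hand for this example (they are not taken from Pomsky output), and `compiles` is a hypothetical helper.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?<=a{3})b"))    # True: fixed-length repetition
print(compiles(r"(?<=ab|cd)e"))   # True: alternatives of equal length
print(compiles(r"(?<=a+)b"))      # False: unbounded repetition
print(compiles(r"(?<=a{1,2})b"))  # False: bounded, but not a constant length
print(compiles(r"(?<=a|bc)d"))    # False: alternatives of different lengths
```

The rejected patterns only fail when the regex is compiled by the target engine, which is later than it would be if the problem were reported up front.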

Expected behavior

A compatibility error should be produced.
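
One way such a check could work is to compute the minimum and maximum number of code points a lookbehind can match and compare the result against the flavor's rules. The sketch below is in Python for brevity and uses a made-up miniature AST (`Literal`, `Seq`, `Alt`, `Repeat`) and a made-up `CompatError`; none of these names come from Pomsky. It covers only the Python and PCRE rules described above; \X and the Java rules are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Literal:
    text: str


@dataclass
class Seq:
    items: List[object]


@dataclass
class Alt:
    options: List[object]  # assumed to be non-empty


@dataclass
class Repeat:
    inner: object
    min: int
    max: Optional[int]  # None means unbounded


class CompatError(Exception):
    pass


def length_range(node) -> Tuple[int, Optional[int]]:
    """Return (min, max) code points matched; max is None if unbounded."""
    if isinstance(node, Literal):
        return len(node.text), len(node.text)
    if isinstance(node, Seq):
        lo, hi = 0, 0
        for item in node.items:
            ilo, ihi = length_range(item)
            lo += ilo
            hi = None if hi is None or ihi is None else hi + ihi
        return lo, hi
    if isinstance(node, Alt):
        ranges = [length_range(o) for o in node.options]
        lo = min(r[0] for r in ranges)
        unbounded = any(r[1] is None for r in ranges)
        return lo, None if unbounded else max(r[1] for r in ranges)
    if isinstance(node, Repeat):
        ilo, ihi = length_range(node.inner)
        if node.max == 0 or ihi == 0:
            return 0, 0  # nothing is ever consumed
        if node.max is None or ihi is None:
            return ilo * node.min, None
        return ilo * node.min, ihi * node.max
    raise TypeError(f"unknown node: {node!r}")


def check_lookbehind(node, flavor: str) -> None:
    """Raise CompatError if the lookbehind body is not allowed in the flavor."""
    if flavor not in ("python", "pcre"):
        raise ValueError(f"flavor not covered by this sketch: {flavor}")
    lo, hi = length_range(node)
    if hi is None:
        raise CompatError(f"unbounded repetition in lookbehind ({flavor})")
    if flavor == "python" and lo != hi:
        raise CompatError("lookbehind must have a constant length (python)")
```

With this sketch, `check_lookbehind(Alt([Literal("a"), Literal("bc")]), "pcre")` passes, while the same node with `"python"` raises `CompatError`, and `Repeat(Literal("a"), 1, None)` is rejected for both flavors.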

Aloso added the bug label on Jan 13, 2023
Aloso added the good first issue label on Mar 19, 2023
Aloso self-assigned this on May 18, 2024