This repository has been archived by the owner on Sep 20, 2021. It is now read-only.

Incomplete support for internal option setting #29

Open
ju1ius opened this issue Jan 17, 2018 · 2 comments

Comments


ju1ius commented Jan 17, 2018

Hi!

What works

  • Setting a single option: a(?i)b
  • Unsetting a single option: a(?-i)b

All the above work only for the i, m, s and x options.

What doesn't work:

  1. Setting / unsetting the U, X, and J options
  2. Setting several options: a(?im)b
  3. Unsetting several options: a(?-i-m)b
  4. Mixing the above two: a(?i-m)b
  5. Setting options for a non-capturing group: a(?i:b)c
  6. The grammar allows the (?+i) syntax, but according to the documentation and the PHP implementation this is invalid.

All the above fail with: Unexpected token "?" (zero_or_one) at line 1 and column 3

Possible fixes

Changing the grammar to:

// Internal options.
%token internal_option \(\?(-?[imsxJUX])+\)

solves items 1, 2, 3, 4 and 6.
Item 5 is a bit more complex... 😉
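
As a quick sanity check, the proposed token can be exercised outside the grammar. The sketch below is Python, not Hoa: it assumes both token regexes behave the same under Python's `re` module as under PCRE (which should hold for patterns this simple), and the names `OLD_TOKEN`, `NEW_TOKEN` and `accepts` are made up for illustration. `OLD_TOKEN` is the `internal_option` token currently in the grammar.

```python
import re

# Token currently in the grammar, and the replacement proposed above.
OLD_TOKEN = re.compile(r"\(\?[\-+]?[imsx]\)")
NEW_TOKEN = re.compile(r"\(\?(-?[imsxJUX])+\)")


def accepts(token, text):
    """Return True if the token regex matches the whole text."""
    return token.fullmatch(text) is not None


# Option settings taken from the lists above.
# "(?i:b)" is item 5: it needs a grammar rule, so neither token accepts it.
samples = ["(?i)", "(?-i)", "(?U)", "(?im)", "(?-i-m)", "(?i-m)", "(?+i)", "(?i:b)"]

for s in samples:
    print(f"{s:10} old={accepts(OLD_TOKEN, s)!s:5} new={accepts(NEW_TOKEN, s)}")
```

Under these assumptions the new token accepts every form from items 1 to 4, rejects `(?+i)` (item 6), and still leaves `(?i:b)` (item 5) unmatched.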




ju1ius commented Jan 17, 2018

Changing the grammar to:

// Tokens
%token internal_option_             \(\?(?=-?[imsxJUX])         -> opt
%token opt:internal_option          -?[imsxJUX]                 -> opt
%token opt:semicolon                :                           -> default
%token opt:_internal_option         \)                          -> default

// Rules
internal_options:
    ::internal_option_:: options() ::_internal_option::
    | ::internal_option_:: options() ::semicolon:: alternation() ::_capturing:: #noncapturing

options:
    <internal_option>+ #internal_options

yields the following parse trees:

Pattern: a(?i)b
>  #expression
>  >  #concatenation
>  >  >  token(literal, a)
>  >  >  #internal_options
>  >  >  >  token(opt:internal_option, i)
>  >  >  token(literal, b)
Pattern: a(?i:b)c
>  #expression
>  >  #concatenation
>  >  >  token(literal, a)
>  >  >  #noncapturing
>  >  >  >  #internal_options
>  >  >  >  >  token(opt:internal_option, i)
>  >  >  >  token(literal, b)
>  >  >  token(literal, c)

These seem syntactically correct since, to me at least, (?i:b) means «a non-capturing group for which the i option is set».
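
To make the namespace switching easier to follow, here is a rough Python simulation of how the four tokens above would split a pattern. It is only a sketch and does not use the Hoa compiler: the `TOKENS` table and the `lex` function are hypothetical, and the `literal` and `_capturing` entries are simplified stand-ins for tokens defined elsewhere in the grammar. Token names are copied from the grammar above, including `semicolon`, which matches a colon.

```python
import re

# (namespace, token name, regex, namespace to switch to or None to stay).
# The first four rows mirror the %token lines above; the last two are
# simplified stand-ins for tokens the real grammar defines elsewhere.
TOKENS = [
    ("default", "internal_option_", r"\(\?(?=-?[imsxJUX])", "opt"),
    ("opt", "internal_option", r"-?[imsxJUX]", None),
    ("opt", "semicolon", r":", "default"),
    ("opt", "_internal_option", r"\)", "default"),
    ("default", "_capturing", r"\)", None),
    ("default", "literal", r"[^()]", None),
]


def lex(pattern):
    """Split pattern into (name, value) pairs, following `-> namespace` jumps."""
    namespace, pos, out = "default", 0, []
    while pos < len(pattern):
        for ns, name, regex, target in TOKENS:
            if ns != namespace:
                continue  # only tokens of the active namespace are tried
            m = re.compile(regex).match(pattern, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                namespace = target or namespace
                break
        else:
            raise ValueError(f"Unrecognized token at offset {pos}")
    return out


for name, value in lex("a(?i:b)c"):
    print(name, repr(value))
```

In this simulation the colon sends the lexer back to the default namespace, so the closing parenthesis of `(?i:b)` is read as `_capturing`, while the one in `(?i)` is read as `_internal_option`. That is what lets the second alternative of the `internal_options` rule end with `::_capturing::`.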

What do you think?

Hywan commented Jan 22, 2018

Thanks for the report!

  • About problem 1, I'm not aware of the U, X, and J options. Where did you find them?
  7. While reading the documentation again, I found that the n option is missing.

  8. While reading the documentation again, I also found this:

    An empty options setting "(?)" is allowed. Needless to say, it has no effect.

    Right now, this form is not supported.

Let's consider the following diff:

- %token  internal_option          \(\?[\-+]?[imsx]\)
+ %token  internal_option          \(\?(-?[imnsx]+)*\)

It should solve problems 1, 2, 3, 4, 6, 7, and 8.
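
A small check of the revised token, written as a Python sketch rather than against the Hoa lexer. It assumes the regex from the `+` line of the diff behaves the same under Python's `re` module as under PCRE; `REVISED_TOKEN` and `is_option_setting` are illustrative names only.

```python
import re

# Token taken from the "+" line of the diff above.
REVISED_TOKEN = re.compile(r"\(\?(-?[imnsx]+)*\)")


def is_option_setting(text):
    """Return True if the whole text is matched by the revised token."""
    return REVISED_TOKEN.fullmatch(text) is not None


# Forms the revised token is meant to cover, including the empty setting and n.
accepted = ["(?)", "(?n)", "(?i)", "(?-i)", "(?im)", "(?-i-m)", "(?i-m)"]
# Forms that stay out: the invalid "+", a dangling "-", J, and the group form.
rejected = ["(?+i)", "(?-)", "(?J)", "(?i:b)"]

for s in accepted + rejected:
    print(f"{s:10} {is_option_setting(s)}")
```

As written, the character class does not include U, X or J, so forms such as `(?J)` are still rejected by this token, pending the answer to the question about those options.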

Problem 5 is more tricky, and it's not related to “internal option” directly. Can you open another issue to address it please?
