Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address feedback from plenary #77

Merged
merged 5 commits into from
May 13, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
20 changes: 13 additions & 7 deletions spec.emu
Original file line number Diff line number Diff line change
Expand Up @@ -31,9 +31,11 @@ contributors:
1. Let _escaped_ be the empty String.
1. Let _cpList_ be StringToCodePoints(_S_).
1. For each code point _c_ in _cpList_, do
1. If _escaped_ is the empty String and _c_ is matched by |DecimalDigit|, then
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence.
1. Set _escaped_ to the string-concatenation of _escaped_, the code unit 0x005C (REVERSE SOLIDUS), *"x3"*, and the code unit whose numeric value is the numeric value of _c_.
1. If _escaped_ is the empty String, and _c_ is matched by |DecimalDigit| or |AsciiLetter|, then
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this sentence is too long to not use a parenthetical.

Suggested change
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text, which may be used after a `\0` character escape or a |DecimalEscape| such as `\1`, and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those commas read as ungrammatical to me. In particular the second comma cuts a clause in the middle - the pattern text "maybe used after \0 [...] and still match S". Open to other rephrasing here but I don't like this particular suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, TIL \c0 is a valid escape. This also prevents new RegExp("\\" + escape('n')) from combining.

1. Let _hex_ be Number::toString(𝔽(_c_), 16).
1. Assert: The length of _hex_ is 2.
1. Set _escaped_ to the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and _hex_.
1. Else,
1. Set _escaped_ to the string-concatenation of _escaped_ and EncodeForRegExpEscape(_c_).
1. Return _escaped_.
Expand All @@ -52,13 +54,17 @@ contributors:
</h1>
<dl class="header">
<dt>description</dt>
<dd>It returns a string representing a |Pattern| for matching _c_. If _c_ is white space or an ASCII punctuator, the returned value is an escape sequence (corresponding with |HexEscapeSequence| if possible, or otherwise with |RegExpUnicodeEscapeSequence|). Otherwise, the returned value is a string representation of _c_ itself.</dd>
<dd>It returns a string representing a |Pattern| for matching _c_. If _c_ is white space or an ASCII punctuator, the returned value is an escape sequence. Otherwise, the returned value is a string representation of _c_ itself.</dd>
</dl>

<emu-alg>
1. Let _punctuators_ be the string-concatenation of *"(){}[]|,.?\*+-^$=<>/#&!%:;@~'`"*, the code unit 0x0022 (QUOTATION MARK), and the code unit 0x005C (REVERSE SOLIDUS).
1. Let _toEscape_ be StringToCodePoints(_punctuators_).
1. If _toEscape_ contains _c_ or _c_ is matched by |WhiteSpace|, then
1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).
Comment on lines +61 to +62
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as a point-in-time snapshot, and I'm not a fan because it will be weird if e.g. \@ becomes a valid escape in the future. My preference remains the universally applicable \x…/\u… approach. But that said, there is no technical issue here; merely an aesthetic one.

1. Else if _c_ is the code point listed in some cell of the “Code Point” column of <emu-xref href="#table-controlescape-code-point-values"></emu-xref>, then
1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the “ControlEscape” column of the row whose “Code Point” column contains _c_.
bakkot marked this conversation as resolved.
Show resolved Hide resolved
1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
ljharb marked this conversation as resolved.
Show resolved Hide resolved
1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
1. If _toEscape_ contains _c_, _c_ is matched by |WhiteSpace| or |LineTerminator|, or _c_ has the same numeric value as a leading surrogate or trailing surrogate, then
1. If _c_ ≤ 0xFF, then
1. Let _hex_ be Number::toString(𝔽(_c_), 16).
1. Return the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and StringPad(_hex_, 2, *"0"*, ~start~).
Expand Down