New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make no-misleading-character-class
report more granular errors
#18082
Conversation
✅ Deploy Preview for docs-eslint canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to produce more errors here and not mark it as breaking. It seems more like a bug fix to me.
default: | ||
return null; | ||
// Only literals and expression-less templates generate granular errors. | ||
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding this a bit difficult to understand, maybe this?
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) { | |
const isTemplateWithoutExpressions = node.type === "TemplateLiteral" && node.expressions.length === 0; | |
if (!isTemplateWithoutExpressions && node.type !== "Literal")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny, I was going to suggest if (node.type !== "TemplateLiteral" || (node.type === "Literal" && node.expressions.length)) {
. No preference between the two from me. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I realized that that check was always falsy, so I just removed it. It's because strings and templates without expressions always produce granular reports now, and so the default case isn't hit any more. Somehow I missed that while testing.
package.json
Outdated
@@ -72,6 +72,7 @@ | |||
"@nodelib/fs.walk": "^1.2.8", | |||
"ajv": "^6.12.4", | |||
"chalk": "^4.0.0", | |||
"char-source": "^0.0.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since you're the author of this package, is there any reason it's just not included in this PR as a utility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that that package contains logic that isn't needed in ESLint. The whole validation part for one, including 50% of the unit tests, is not necessary because the syntax of the arguments can be assumed to be valid since it was checked by the parser. There's also additional information about used features (for example, to tell if a string literal is valid in strict mode or not) which seems superfluous unless we expect to use it in ESLint somewhere later.
So if we are going to extract a utility from that package we should probably do some refactoring to remove the unnecessary logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's remove the unnecessary code, of course. As it seems like the package was created just for this purpose (?), I'm assuming that's not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 53e262a, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great start! 👏 on going through the effort of making the parser. Looks from the files on GitHub like it was a lot of tricky work.
Since you mentioned bringing the code inline, requesting changes on that before doing a deeper review. Also I think I might have missed something with the charInfos
caching?
}] | ||
}, | ||
|
||
/* eslint-disable lines-around-comment -- see https://github.com/eslint/eslint/issues/18081 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Question] Why not put the comment outside the code
? It's not necessary for the test, so I'd think that preferable even ignoring the lines-around-comment
shenanigan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's not for the comment inside the code
, it's this comment that is producing errors now. I though that the comments in the code would make it easier to locate a test in the source code given its title, because the code
as it's calculated here doesn't match what is printed in the console.
default: | ||
return null; | ||
// Only literals and expression-less templates generate granular errors. | ||
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny, I was going to suggest if (node.type !== "TemplateLiteral" || (node.type === "Literal" && node.expressions.length)) {
. No preference between the two from me. 🙂
lib/rules/utils/char-source.js
Outdated
* @returns {Generator<CodeUnit>} Zero, one or two `CodeUnit` elements. | ||
*/ | ||
function *mapEscapeSequenceOrLineContinuation(reader) { | ||
const start = reader.pos++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little unclear here...are you intentionally moving the reader position by one? Or did you intend for const start = reader.pos + 1
?
Either way, I think this line needs reworking to make its intention clear.
lib/rules/utils/char-source.js
Outdated
/** | ||
* An object used to keep track of the position in a source text where the next characters will be read. | ||
*/ | ||
class SourceReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might rename this as TextSource
. SourceReader
implies that there's a read()
method or something that this object is supposed to be doing, but it's just a data structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I've added a read()
method in 448d9a9 to read characters relative to the current position. So it's no longer necessary to store the position in a variable before accessing the source like it was done before. And I'm using +=
for relative position changes. I think this makes the code a little clearer but if that's not the case we can revert to how it was done previously and just address the suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the read()
!
SourceReader
is still a little ambiguous IMO. Maybe, CharsReader
? CharactersReader
? TextReader
? Not a blocker from my end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I don't have a preference for the name, so pick one that sounds better to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd vote for TextReader
then, to align with #18082 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed in 072f256.
@@ -32,6 +32,12 @@ class SourceReader { | |||
this.source = source; | |||
this.pos = 0; | |||
} | |||
|
|||
read(offset = 0, length = 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some JSDoc here?
lib/rules/utils/char-source.js
Outdated
|
||
reader.pos = posAfterBackslash + octalStr.length; | ||
reader.pos += octalStr.length - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you refactored to include a read()
method in SourceReader
, it might make things clearer if you also created a advance()
method that moves pos
rather than augmenting the value like this. So this code would become:
reader.pos += octalStr.length - 1; | |
reader.advance(octalStr.length - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done in b1cf05f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Would like @JoshuaKGoldberg to review his changes before merging.
codeUnits ??= parseStringLiteral(source); | ||
start = offset + codeUnits[firstIndex].start; | ||
end = offset + codeUnits[lastIndex].end; | ||
} else { // RegExp Literal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7bb7e55.
let start; | ||
let end; | ||
|
||
if (node.type === "TemplateLiteral") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check if the template literal has expressions. For example, this crashes:
/* eslint no-misleading-character-class: "error" */
new RegExp(`${"[👍]"}`);
TypeError: Cannot read properties of undefined (reading 'start')
Occurred while linting C:\projects\eslint\foo.js:3
Rule: "no-misleading-character-class"
at C:\projects\eslint\lib\rules\no-misleading-character-class.js:294:64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! Those cases should be covered by unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 7bb7e55.
This could be a bug in the current implementation: RegExp(/[👍]/u);
The rule complains about a missing console.log(RegExp(/[👍]/u).flags);
// "u" |
This looks also like an existing bug: RegExp("[👍]", foo);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I really like the char-source.js
contents. It's super clear and helps make the rule source much easier to understand. Especially the parts I was getting confused by writing it. 😄
I tried it in a playground locally and could only come to the same discrepencies that mdjermanovic mentioned: https://github.com/eslint/eslint/pull/18082/files#r1491652787 & https://github.com/eslint/eslint/pull/18082/files#r1491592967. [localhost playground link with the two cases]
lib/rules/utils/char-source.js
Outdated
/** | ||
* An object used to keep track of the position in a source text where the next characters will be read. | ||
*/ | ||
class SourceReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the read()
!
SourceReader
is still a little ambiguous IMO. Maybe, CharsReader
? CharactersReader
? TextReader
? Not a blocker from my end.
I've fixed this problem and this one in this commit. I've also changed the logic and updated the unit tests for the cases described in this comment where the flags are not calculated correctly and in this false positive where the flags are unknown. Please, have a look again if the logic is correct now. |
if (patternNode.type === "Literal" && patternNode.regex) { | ||
pattern = patternNode.regex.pattern; | ||
flags = flagsNode ? getStringIfConstant(flagsNode, scope) : patternNode.regex.flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we want to entirely fix the behavior with new RegExp(<regex>, flags?)
, perhaps we should switch from getStringIfConstant
to getStaticValue
for the patternNode
.
const regex = /[👍]/u;
new RegExp(regex); // false positive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also try to avoid duplicate reports when there is a
flags
argument alongside a regex literal?
Yes, I think it would be good to avoid duplicate reports. Perhaps in that case the rule should skip checking the regex literal and check only the constructor call?
For instance, the following might be considered a false positive because the regex created by the literal isn't actually used as such?
/*eslint no-misleading-character-class: error */
RegExp(/^[👍]$/, "u"); // error on the regex literal
That could get tricky because the reports don't need to be granular, so we'd have to compare the character matches.
I'm not sure I understand this, can you clarify? If the source of the pattern is a regex literal, than can't we just use regexpp's indexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it would be good to avoid duplicate reports. Perhaps in that case the rule should skip checking the regex literal and check only the constructor call?
Yes, we could do that if the argument is a regex literal. If it's a variable, it could be used in other places.
That could get tricky because the reports don't need to be granular, so we'd have to compare the character matches.
I'm not sure I understand this, can you clarify? If the source of the pattern is a regex literal, than can't we just use regexpp's indexes?
In afterthought, I'd say it would be better not to check regex literals that are supplied as arguments to the RegExp
constructor like you suggested. At least I think this is what should be done if the RegExp
call has a flags
argument. If a regex is passed as a variable, it should be fine to keep the duplicate reports.
So the whole story would look like this:
/*eslint no-misleading-character-class: error */
RegExp(/[👍]/); // error only on the regex literal
const regex = /[👍]/; // error
RegExp(regex); // no error
RegExp(/[👍]/, ""); // error only on the RegExp call
const regex1 = /[👍]/g; // error
doSomethingElse(regex1);
const regex2 = RegExp(regex1, "g"); // also error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
So, if the RegExp
constructor is called with one argument that is a regex value, we'll skip checking the constructor call. If regex literal is the first argument of RegExp
constructor call that also has the second argument, we'll skip checking the regex literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 6792594.
if (flagsNode) { | ||
if (isRegExp(evaluatedPattern.value)) { | ||
pattern = evaluatedPattern.value.source; | ||
checkedPatternNodes.add(patternNode); | ||
} else { | ||
pattern = String(evaluatedPattern.value); | ||
} | ||
flags = getStringIfConstant(flagsNode, scope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this test:
{
code: "new RegExp(/^[👍]$/v, '')",
languageOptions: {
ecmaVersion: 2024
},
errors: [{
column: 15,
endColumn: 17,
messageId: "surrogatePairWithoutUFlag",
suggestions: [{ messageId: "suggestUnicodeFlag", output: "new RegExp(/^[👍]$/v, 'u')" }]
}]
}
It's interesting to see what will happen on Node 18 because it doesn't support the v
flag (per node.green)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This will fail in Node.js 18 because the parser silently sets the value
of the regex Literal node to null
if it cannot evaluate it. It's also a known limitation of eslint-utils (code), but I don't think we want to have a different behavior depending on the version of Node.js.
The only way I found to fix this in ESLint directly is by patching the AST with the missing value
s (see commit ed8d1cd), but this could break third-party rules that don't expect this change, so not sure. Maybe a safer approach would be extending getStaticValue()
in eslint-utils under an option to keep the current API unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way I found to fix this in ESLint directly is by patching the AST with the missing
value
s (see commit ed8d1cd), but this could break third-party rules that don't expect this change, so not sure. Maybe a safer approach would be extendinggetStaticValue()
in eslint-utils under an option to keep the current API unchanged.
I think that's out of scope for this change, but we could evaluate it separately.
As for this particular case, maybe we could additionally check if the first argument is a regex literal? That would retain the current functionality where an error is reported regardless of the Node.js version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for this particular case, maybe we could additionally check if the first argument is a regex literal? That would retain the current functionality where an error is reported regardless of the Node.js version.
That would not work as expected when the first argument is a variable, e.g.:
var r = /[👶🏻]/v; // error
RegExp(r, 'v'); // error
In Node.js 18, the second line would not be flagged because getStaticValue()
reports a value of null
for the first argument.
Maybe we should just check if the node is a regex literal without using getStaticValue()
when there is a second parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just check if the node is a regex literal without using
getStaticValue()
when there is a second parameter?
What would the rule check for the RegExp constructors node with two arguments in case the first argument isn't a regex literal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would not work as expected when the first argument is a variable, e.g.:
var r = /[👶🏻]/v; // error RegExp(r, 'v'); // error
In this case, the regex literal would be reported in all Node.js versions, but the RegExp constructor would only in Node.js > 18? That's definitely something we should aim to avoid, but I think this was the case before this change as well so we're at least not introducing new cases where the behavior depends on the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would the rule check for the RegExp constructors node with two arguments in case the first argument isn't a regex literal?
If the first argument is not a regex literal we could call getStaticValue()
, but if that returns a regular expression object then we should ignore it to avoid inconsistencies across different versions of Node.js.
That would not work as expected when the first argument is a variable, e.g.:
var r = /[👶🏻]/v; // error RegExp(r, 'v'); // errorIn this case, the regex literal would be reported in all Node.js versions, but the RegExp constructor would only in Node.js > 18? That's definitely something we should aim to avoid, but I think this was the case before this change as well so we're at least not introducing new cases where the behavior depends on the runtime.
Yes, if it's okay to keep the existing bug that's also an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first argument is not a regex literal we could call
getStaticValue()
, but if that returns a regular expression object then we should ignore it to avoid inconsistencies across different versions of Node.js.
Good idea! Let's do that to fix inconsistencies.
At some future point, we could see if getStaticValue
could be improved to better handle regexes that can't be instantiated, and then update this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've done the changes in 7eaf8b4. I had to comment out some of the unit tests that were no longer passing. Maybe we could also update the tests to make them pass and add a note that the expected results are not really what was intended because of the limitations with Node.js 18.
When I have some time I would like to look at how getStaticValue
is used throughout ESLint and come up to eslint-utils with a proposal to extend that API, so we can remove those limitations.
lib/linter/linter.js
Outdated
@@ -946,6 +946,13 @@ function runRules(sourceCode, configuredRules, ruleMapper, parserName, languageO | |||
Traverser.traverse(sourceCode.ast, { | |||
enter(node, parent) { | |||
node.parent = parent; | |||
if (node.type === "Literal" && node.regex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely not be modifying the AST at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was a sort of PoC but way too hazardous. It's reverted now. At some point I would like to work with eslint-community to find a solution to improve the usability of eslint-utils in Node.js 18 without introducing any breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note that v
flag in Node.js 18 is just one example we have at the moment. It can be assumed that there will always be some new regex syntax that isn't supported in some (if not all) runtime environments we support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending @mdjermanovic's re-review.
* A limitation of this method is that it can only detect a regular expression if the specified node is itself a regular expression literal node. | ||
* @param {ASTNode} node The node to be inspected. | ||
* @param {Scope} [initialScope] Optional scope to start finding variables. If this scope was given, this tries to resolve identifier references which are in the given node as much as possible. | ||
* @returns {{ value: any } | { value: undefined, optional?: true } | { regex: { pattern: string, flags: string } } | null} The static value of the node, or `null`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases does this function return optional: true
property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could only happen if node
were an optional MemberExpression
(not the ancestor ChainExpression
) in something like nullishValue?.foo
. That's not the case for this rule. Shall we remove that typing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it looks like this property was added for internal calculations rather than as something that consumers might make use of, and it isn't documented. Either way, if it can't happen in our case, it might be better to remove that typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @returns {{ value: any } | { value: undefined, optional?: true } | { regex: { pattern: string, flags: string } } | null} The static value of the node, or `null`. | ||
*/ | ||
function getStaticValueOrRegex(node, initialScope) { | ||
if (node.type === "Literal" && node.regex) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would crash if there are no arguments:
/*eslint no-misleading-character-class: ["error"]*/
new RegExp();
TypeError: Cannot read properties of undefined (reading 'type')
Occurred while linting C:\projects\eslint\foo.js:3
Rule: "no-misleading-character-class"
at getStaticValueOrRegex (C:\projects\eslint\lib\rules\no-misleading-character-class.js:213:14)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad for not checking! Fixed in ebe11f8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thanks!
Prerequisites checklist
What is the purpose of this pull request? (put an "X" next to an item)
[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[X] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:
What rule do you want to change?
no-misleading-character-class
What change do you want to make (place an "X" next to just one item)?
[X] Generate more warnings
[ ] Generate fewer warnings
[ ] Implement autofix
[ ] Implement suggestions
How will the change be implemented (place an "X" next to just one item)?
[ ] A new option
[X] A new default behavior
[ ] Other
Please provide some example code that this change will affect:
What does the rule currently do for this code?
Reports one problem per
RegExp
expression (Playground link).What will the rule do after it's changed?
A granular problem will be reported for each problematic character class. For the first
RegExp
, two problems will be reported.What changes did you make? (Give an overview)
This is an attempt to extend granular error reporting for the
no-misleading-character-class
rule to string and template literal patterns whose source code doesn't match the string value. This includes literals with escape sequences, line continuations, and unescaped CRLF line breaks in templates.This change was suggested during the implementation of granular errors, but it was postponed to keep the work manageable: #17515 (comment).
Is there anything you'd like reviewers to focus on?
new RegExp(`[ \\ufe0f]${0}`)
, but I haven't explored that yet.