Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make no-misleading-character-class report more granular errors #18082

Merged
merged 19 commits into from
Mar 5, 2024
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
122 changes: 60 additions & 62 deletions lib/rules/no-misleading-character-class.js
fasttime marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ const { RegExpParser, visitRegExpAST } = require("@eslint-community/regexpp");
const { isCombiningCharacter, isEmojiModifier, isRegionalIndicatorSymbol, isSurrogatePair } = require("./utils/unicode");
const astUtils = require("./utils/ast-utils.js");
const { isValidWithUnicodeFlag } = require("./utils/regular-expressions");
const { parseStringLiteral, parseTemplateToken } = require("./utils/char-source");

//------------------------------------------------------------------------------
// Helpers
Expand Down Expand Up @@ -226,62 +227,6 @@ module.exports = {
const sourceCode = context.sourceCode;
const parser = new RegExpParser();

/**
* Generates a granular loc for context.report, if directly calculable.
* @param {Character[]} chars Individual characters being reported on.
* @param {Node} node Parent string node to report within.
* @returns {Object | null} Granular loc for context.report, if directly calculable.
* @see https://github.com/eslint/eslint/pull/17515
*/
function generateReportLocation(chars, node) {

// Limit to to literals and expression-less templates with raw values === their value.
switch (node.type) {
case "TemplateLiteral":
if (node.expressions.length || sourceCode.getText(node).slice(1, -1) !== node.quasis[0].value.cooked) {
return null;
}
break;

case "Literal":
if (typeof node.value === "string" && node.value !== node.raw.slice(1, -1)) {
return null;
}
break;

default:
return null;
}

return {
start: sourceCode.getLocFromIndex(node.range[0] + 1 + chars[0].start),
end: sourceCode.getLocFromIndex(node.range[0] + 1 + chars.at(-1).end)
};
}

/**
* Finds the report loc(s) for a range of matches.
* @param {Character[][]} matches Characters that should trigger a report.
* @param {Node} node The node to report.
* @returns {Object | null} Node loc(s) for context.report.
*/
function getNodeReportLocations(matches, node) {
const locs = [];

for (const chars of matches) {
const loc = generateReportLocation(chars, node);

// If a report can't match to a range, don't report any others
if (!loc) {
return [node.loc];
}

locs.push(loc);
}

return locs;
}

/**
* Verify a given regular expression.
* @param {Node} node The node to report.
Expand Down Expand Up @@ -320,12 +265,58 @@ module.exports = {
} else {
foundKindMatches.set(kind, [...findCharacterSequences[kind](chars)]);
}

}
}
}
});

let codeUnits = null;

/**
* Finds the report loc(s) for a range of matches.
* Only literals and expression-less templates generate granular errors.
* @param {Character[][]} matches Lists of individual characters being reported on.
* @returns {Location[]} locs for context.report.
* @see https://github.com/eslint/eslint/pull/17515
*/
function getNodeReportLocations(matches) {
if (!astUtils.isStaticTemplateLiteral(node) && node.type !== "Literal") {
return matches.length ? [node.loc] : [];
}
return matches.map(chars => {
const firstIndex = chars[0].start;
const lastIndex = chars.at(-1).end - 1;
let start;
let end;

if (node.type === "TemplateLiteral") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check if the template literal has expressions. For example, this crashes:

/* eslint no-misleading-character-class: "error" */

new RegExp(`${"[👍]"}`);
TypeError: Cannot read properties of undefined (reading 'start')
Occurred while linting C:\projects\eslint\foo.js:3
Rule: "no-misleading-character-class"
    at C:\projects\eslint\lib\rules\no-misleading-character-class.js:294:64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! Those cases should be covered by unit tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7bb7e55.

const source = sourceCode.getText(node);
const offset = node.range[0];

codeUnits ??= parseTemplateToken(source);
start = offset + codeUnits[firstIndex].start;
end = offset + codeUnits[lastIndex].end;
} else if (typeof node.value === "string") { // String Literal
const source = node.raw;
const offset = node.range[0];

codeUnits ??= parseStringLiteral(source);
start = offset + codeUnits[firstIndex].start;
end = offset + codeUnits[lastIndex].end;
} else { // RegExp Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else can be anything, for example:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7bb7e55.

const offset = node.range[0] + 1; // Add 1 to skip the leading slash.

start = offset + firstIndex;
end = offset + lastIndex + 1;
}

return {
start: sourceCode.getLocFromIndex(start),
end: sourceCode.getLocFromIndex(end)
};
});
}

for (const [kind, matches] of foundKindMatches) {
let suggest;

Expand All @@ -336,7 +327,7 @@ module.exports = {
}];
}

const locs = getNodeReportLocations(matches, node);
const locs = getNodeReportLocations(matches);

for (const loc of locs) {
context.report({
Expand Down Expand Up @@ -371,12 +362,19 @@ module.exports = {
for (const { node: refNode } of tracker.iterateGlobalReferences({
RegExp: { [CALL]: true, [CONSTRUCT]: true }
})) {
let pattern, flags;
const [patternNode, flagsNode] = refNode.arguments;
const pattern = getStringIfConstant(patternNode, scope);
const flags = getStringIfConstant(flagsNode, scope);

if (typeof pattern === "string") {
verify(patternNode, pattern, flags || "", fixer => {
if (patternNode.type === "Literal" && patternNode.regex) {
pattern = patternNode.regex.pattern;
flags = flagsNode ? getStringIfConstant(flagsNode, scope) : patternNode.regex.flags;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no flagsNode, perhaps we could just skip checking the regex here because it has already been verified as a literal?

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we want to entirely fix the behavior with new RegExp(<regex>, flags?), perhaps we should switch from getStringIfConstant to getStaticValue for the patternNode.

const regex = /[👍]/u;

new RegExp(regex); // false positive

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍🏻 Shall we also try to avoid duplicate reports when there is a flags argument alongside a regex literal? For example:

image

That could get tricky because the reports don't need to be granular, so we'd have to compare the character matches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also try to avoid duplicate reports when there is a flags argument alongside a regex literal?

Yes, I think it would be good to avoid duplicate reports. Perhaps in that case the rule should skip checking the regex literal and check only the constructor call?

For instance, the following might be considered a false positive because the regex created by the literal isn't actually used as such?

/*eslint no-misleading-character-class: error */

RegExp(/^[👍]$/, "u"); // error on the regex literal

That could get tricky because the reports don't need to be granular, so we'd have to compare the character matches.

I'm not sure I understand this, can you clarify? If the source of the pattern is a regex literal, than can't we just use regexpp's indexes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would be good to avoid duplicate reports. Perhaps in that case the rule should skip checking the regex literal and check only the constructor call?

Yes, we could do that if the argument is a regex literal. If it's a variable, it could be used in other places.

That could get tricky because the reports don't need to be granular, so we'd have to compare the character matches.

I'm not sure I understand this, can you clarify? If the source of the pattern is a regex literal, than can't we just use regexpp's indexes?

In afterthought, I'd say it would be better not to check regex literals that are supplied as arguments to the RegExp constructor like you suggested. At least I think this is what should be done if the RegExp call has a flags argument. If a regex is passed as a variable, it should be fine to keep the duplicate reports.

So the whole story would look like this:

/*eslint no-misleading-character-class: error */

RegExp(/[👍]/);                     // error only on the regex literal

const regex = /[👍]/;               // error
RegExp(regex);                      // no error

RegExp(/[👍]/, "");                 // error only on the RegExp call

const regex1 = /[👍]/g;             // error
doSomethingElse(regex1);
const regex2 = RegExp(regex1, "g"); // also error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

So, if the RegExp constructor is called with one argument that is a regex value, we'll skip checking the constructor call. If regex literal is the first argument of RegExp constructor call that also has the second argument, we'll skip checking the regex literal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6792594.

} else {
pattern = getStringIfConstant(patternNode, scope);
flags = flagsNode ? getStringIfConstant(flagsNode, scope) : "";
}

if (typeof pattern === "string" && typeof flags === "string") {
verify(patternNode, pattern, flags, fixer => {

if (!isValidWithUnicodeFlag(context.languageOptions.ecmaVersion, pattern)) {
return null;
Expand Down