Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: getFirstToken/getLastToken on comment-only node #16889

Merged
merged 4 commits into from Mar 22, 2023
Merged

fix: getFirstToken/getLastToken on comment-only node #16889

merged 4 commits into from Mar 22, 2023

Conversation

fasttime
Copy link
Member

@fasttime fasttime commented Feb 13, 2023

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[x] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:


This PR fixes an issue with some SourceCode methods that is causing unexpected behavior when the source code contains no tokens but one or more comments. The SourceCode type is part of the public API and is used by virtually all rules, although the issue described here should not affect any built-in rules.

The problem shows up when sourceCode.getFirstToken() or sourceCode.getLastToken() is called with the Program node (the AST object) of a source code than contains no tokens but one or more comments, i.e. a comment-only file.

The expected behavior is that sourceCode.getFirstToken(ast) or sourceCode.getLastToken(ast) should return null as per documentation. If a filter callback is passed, the callback should never be called.

The actual case is that sourceCode.getFirstToken(ast) returns undefined. If a filter callback is passed, the callback is called with an undefined argument.

My original, incomplete analysis of this problem can be found in issue #16742.

Closes #16742

What changes did you make? (Give an overview)

Is there anything you'd like reviewers to focus on?

I am open to any suggestions.

@eslint-github-bot eslint-github-bot bot added bug ESLint is working incorrectly triage An ESLint team member will look at this issue soon labels Feb 13, 2023
@netlify
Copy link

netlify bot commented Feb 13, 2023

Deploy Preview for docs-eslint canceled.

Name Link
🔨 Latest commit 449e894
🔍 Latest deploy log https://app.netlify.com/sites/docs-eslint/deploys/6418ac5863cd9b0008401885

@github-actions
Copy link

Hi everyone, it looks like we lost track of this pull request. Please review and see what the next steps are. This pull request will auto-close in 7 days without an update.

@github-actions github-actions bot added the Stale label Feb 23, 2023
@Rec0iL99
Copy link
Member

Don't close. WIP: Looks like @fasttime is still working on this.

@Rec0iL99 Rec0iL99 removed the Stale label Feb 26, 2023
@fasttime
Copy link
Member Author

Thanks @Rec0iL99! Actually I'm done with the fix, but I'm not particularly satisfied with my explanation of the problem and the solution. I realize it takes some patience to grasp how this code works and I would like to elaborate on my description and make it more accessible to reviewers.


/*
* For the map of "comment's location -> token's index", it points the next token of a comment.
* In that case, +1 is unnecessary.
*/
if (token && token.range[0] >= startLoc) {
if (!token || token.range[0] >= startLoc) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If token is undefined, then index is already out of bounds, i.e. index == tokens.length. In this case, the returned cursor index should point after the last token, at tokens.length.
Refs:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be reading this wrong, but it seems like this change doesn't have any effect. I think it's just refactoring the boolean expression without changing the result. Can you double-check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When token is null or undefined, the new expression evaluates to true, whereas the old expression would evaluate to a falsy (null or undefined) value. Other than this, the result is the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right you are. Boolean expressions are hard. :)

So you're saying that token is null if we've already moved past the end of the tokens array?

For clarity, I think I'd prefer to split out null token check into its own if statement and return tokens.length directly instead of adding it to this condition.


/*
* For the map of "comment's location -> token's index", it points the next token of a comment.
* In that case, -1 is necessary.
*/
if (token && token.range[1] > endLoc) {
if (!token || token.range[1] > endLoc) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If token is undefined, then tokens is an empty list and index is 0. In this case, the returned cursor index should point before the first token, at -1.
Refs:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fasttime fasttime marked this pull request as ready for review March 6, 2023 09:56
@fasttime fasttime requested a review from a team as a code owner March 6, 2023 09:56
@fasttime fasttime added the core Relates to ESLint's core APIs and features label Mar 6, 2023
@mdjermanovic mdjermanovic added accepted There is consensus among the team that this change meets the criteria for inclusion and removed triage An ESLint team member will look at this issue soon labels Mar 7, 2023
Comment on lines 679 to 687
it("should retrieve a parent node's token if a node has no tokens", () => {
const code = "foo``";
const ast = espree.parse(code, { loc: true, range: true, tokens: true, comment: true, ecmaVersion: 6 });
const tokenStore = new TokenStore(ast.tokens, ast.comments);
const emptyTemplateElementNode = ast.body[0].expression.quasi.quasis[0];
const token = tokenStore.getFirstToken(emptyTemplateElementNode);

assert.strictEqual(token.value, "``");
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a node has no tokens, I think getFirstToken should return null. Either way, this doesn't seem to be a case where a node has no tokens, because that token does belong to the TemplateElement node (ast.body[0].expression.quasi.quasis[0] node).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdjermanovic you are correct! That node is not empty. I was fooled by AST explorer that shows the same value for both the start and end indices, probably because it uses an outdated version of Espree.
I added this test to make sure that I wouldn't inadvertently change the existing behavior, but now I think we can safely remove it because it's flawed.

Comment on lines 898 to 906
it("should retrieve a parent node's token if a node has no tokens", () => {
const code = "foo``";
const ast = espree.parse(code, { loc: true, range: true, tokens: true, comment: true, ecmaVersion: 6 });
const tokenStore = new TokenStore(ast.tokens, ast.comments);
const emptyTemplateElementNode = ast.body[0].expression.quasi.quasis[0];
const token = tokenStore.getLastToken(emptyTemplateElementNode);

assert.strictEqual(token.value, "``");
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as #16889 (comment).

Comment on lines 679 to 687
it("should retrieve a parent node's token if a node has no tokens", () => {
const code = "foo``";
const ast = espree.parse(code, { loc: true, range: true, tokens: true, comment: true, ecmaVersion: 6 });
const tokenStore = new TokenStore(ast.tokens, ast.comments);
const emptyTemplateElementNode = ast.body[0].expression.quasi.quasis[0];
const token = tokenStore.getFirstToken(emptyTemplateElementNode);

assert.strictEqual(token.value, "``");
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.


/*
* For the map of "comment's location -> token's index", it points the next token of a comment.
* In that case, +1 is unnecessary.
*/
if (token && token.range[0] >= startLoc) {
if (!token || token.range[0] >= startLoc) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right you are. Boolean expressions are hard. :)

So you're saying that token is null if we've already moved past the end of the tokens array?

For clarity, I think I'd prefer to split out null token check into its own if statement and return tokens.length directly instead of adding it to this condition.

lib/source-code/token-store/utils.js Show resolved Hide resolved
const token = tokens[index];

// If the mapped index is out of bounds, the returned cursor index will point after the end of the tokens array.
if (!token) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could change this condition into index >= tokens.length and move the const token declaration after the if-block.

const token = tokens[index];

// If the mapped index is out of bounds, the returned cursor index will point before the start of the tokens array.
if (!token) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as https://github.com/eslint/eslint/pull/16889/files#r1131283705. index can exceed the array length, but it should never be negative.

@fasttime
Copy link
Member Author

fasttime commented Mar 9, 2023

Rebased to fix the CI build.

Copy link
Member

@nzakas nzakas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! Just want @mdjermanovic to verify his suggestions.

Comment on lines 87 to 90
// If the mapped index is out of bounds, the returned cursor index will point before the start of the tokens array.
if (!token) {
return -1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return tokens.length - 1 (index of the last token) here? Hardcoded -1 would always mean that there are no tokens before endLoc.

I couldn't construct an example where this would be observable with the default parser because it doesn't include trailing comments in the range of the Program node, but here's an example with @typescript-eslint/parser:

const { RuleTester } = require('eslint');

const rule = {
    meta: {
        docs: { description: 'Require at least one token in the source code' },
        messages: { fail: 'no tokens found' },
    },
    create(context) {
        const sourceCode = context.getSourceCode();
        return {
            Program: node => {
                if (!sourceCode.getFirstToken(node)) {
                    context.report({ node, messageId: 'fail' });
                }
            }
        };
    }
};

new RuleTester().run(
    'must-have-tokens',
    rule,
    {
        valid:
        [
            {
                code: 'foo // comment',
                parser: require.resolve("@typescript-eslint/parser")
            }
        ],
        invalid: [],
    },
);

The expected result is that sourceCode.getFirstToken finds the token foo, but it doesn't because it looks for tokens between indexes 0 and -1.

Copy link
Member Author

@fasttime fasttime Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are perfectly right @mdjermanovic, thanks for pointing me to a proper test case to disproof my incorrect assumption. I've applied the change you suggested and added a unit test to ensure the correct behavior when the root node includes a trailing comment in its range.

It strikes me as a peculiarity of Espree that comments at the beginning and at the end of the source code are not included in the root node, while standalone comments are.

Copy link
Member

@mdjermanovic mdjermanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@mdjermanovic mdjermanovic merged commit 1fbf118 into eslint:main Mar 22, 2023
17 checks passed
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Mar 29, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [eslint](https://eslint.org) ([source](https://github.com/eslint/eslint)) | devDependencies | minor | [`8.36.0` -> `8.37.0`](https://renovatebot.com/diffs/npm/eslint/8.36.0/8.37.0) |

---

### Release Notes

<details>
<summary>eslint/eslint</summary>

### [`v8.37.0`](https://github.com/eslint/eslint/releases/tag/v8.37.0)

[Compare Source](eslint/eslint@v8.36.0...v8.37.0)

#### Features

-   [`b6ab8b2`](eslint/eslint@b6ab8b2) feat: `require-unicode-regexp` add suggestions ([#&#8203;17007](eslint/eslint#17007)) (Josh Goldberg)
-   [`10022b1`](eslint/eslint@10022b1) feat: Copy getScope() to SourceCode ([#&#8203;17004](eslint/eslint#17004)) (Nicholas C. Zakas)
-   [`1665c02`](eslint/eslint@1665c02) feat: Use plugin metadata for flat config serialization ([#&#8203;16992](eslint/eslint#16992)) (Nicholas C. Zakas)
-   [`b3634f6`](eslint/eslint@b3634f6) feat: docs license ([#&#8203;17010](eslint/eslint#17010)) (Samuel Roldan)
-   [`892e6e5`](eslint/eslint@892e6e5) feat: languageOptions.parser must be an object. ([#&#8203;16985](eslint/eslint#16985)) (Nicholas C. Zakas)

#### Bug Fixes

-   [`619f3fd`](eslint/eslint@619f3fd) fix: correctly handle `null` default config in `RuleTester` ([#&#8203;17023](eslint/eslint#17023)) (Brad Zacher)
-   [`1fbf118`](eslint/eslint@1fbf118) fix: `getFirstToken`/`getLastToken` on comment-only node ([#&#8203;16889](eslint/eslint#16889)) (Francesco Trotta)
-   [`129e252`](eslint/eslint@129e252) fix: Fix typo in `logical-assignment-operators` rule description ([#&#8203;17000](eslint/eslint#17000)) (Francesco Trotta)

#### Documentation

-   [`75339df`](eslint/eslint@75339df) docs: fix typos and missing info in id-match docs ([#&#8203;17029](eslint/eslint#17029)) (Ed Lucas)
-   [`ec2d830`](eslint/eslint@ec2d830) docs: Fix typos in the `semi` rule docs ([#&#8203;17012](eslint/eslint#17012)) (Andrii Lundiak)
-   [`e39f28d`](eslint/eslint@e39f28d) docs: add back to top button ([#&#8203;16979](eslint/eslint#16979)) (Tanuj Kanti)
-   [`721c717`](eslint/eslint@721c717) docs: Custom Processors cleanup and expansion ([#&#8203;16838](eslint/eslint#16838)) (Ben Perlmutter)
-   [`d049f97`](eslint/eslint@d049f97) docs: 'How ESLint is Maintained' page ([#&#8203;16961](eslint/eslint#16961)) (Ben Perlmutter)
-   [`5251a92`](eslint/eslint@5251a92) docs: Describe guard options for guard-for-in ([#&#8203;16986](eslint/eslint#16986)) (alope107)
-   [`6157d81`](eslint/eslint@6157d81) docs: Add example to guard-for-in docs. ([#&#8203;16983](eslint/eslint#16983)) (alope107)
-   [`fd47998`](eslint/eslint@fd47998) docs: update `Array.prototype.toSorted` specification link ([#&#8203;16982](eslint/eslint#16982)) (Milos Djermanovic)
-   [`3e1cf6b`](eslint/eslint@3e1cf6b) docs: Copy edits on Maintain ESLint docs ([#&#8203;16939](eslint/eslint#16939)) (Ben Perlmutter)

#### Chores

-   [`c67f299`](eslint/eslint@c67f299) chore: upgrade [@&#8203;eslint/js](https://github.com/eslint/js)[@&#8203;8](https://github.com/8).37.0 ([#&#8203;17033](eslint/eslint#17033)) (Milos Djermanovic)
-   [`ee9ddbd`](eslint/eslint@ee9ddbd) chore: package.json update for [@&#8203;eslint/js](https://github.com/eslint/js) release (ESLint Jenkins)
-   [`dddb475`](eslint/eslint@dddb475) chore: upgrade [@&#8203;eslint/eslintrc](https://github.com/eslint/eslintrc)[@&#8203;2](https://github.com/2).0.2 ([#&#8203;17032](eslint/eslint#17032)) (Milos Djermanovic)
-   [`522431e`](eslint/eslint@522431e) chore: upgrade espree@9.5.1 ([#&#8203;17031](eslint/eslint#17031)) (Milos Djermanovic)
-   [`f5f9a88`](eslint/eslint@f5f9a88) chore: upgrade eslint-visitor-keys@3.4.0 ([#&#8203;17030](eslint/eslint#17030)) (Milos Djermanovic)
-   [`4dd8d52`](eslint/eslint@4dd8d52) ci: bump actions/stale from 7 to 8 ([#&#8203;17026](eslint/eslint#17026)) (dependabot\[bot])
-   [`ad9dd6a`](eslint/eslint@ad9dd6a) chore: remove duplicate scss, ([#&#8203;17005](eslint/eslint#17005)) (Strek)
-   [`ada6a3e`](eslint/eslint@ada6a3e) ci: unpin Node 19 ([#&#8203;16993](eslint/eslint#16993)) (Milos Djermanovic)
-   [`c3da975`](eslint/eslint@c3da975) chore: Remove triage label from template ([#&#8203;16990](eslint/eslint#16990)) (Nicholas C. Zakas)
-   [`69bc0e2`](eslint/eslint@69bc0e2) ci: pin Node 19 to 19.7.0 ([#&#8203;16987](eslint/eslint#16987)) (Milos Djermanovic)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4yNC41IiwidXBkYXRlZEluVmVyIjoiMzUuMjQuNSJ9-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1834
Reviewed-by: Epsilon_02 <epsilon_02@noreply.codeberg.org>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
@eslint-github-bot eslint-github-bot bot locked and limited conversation to collaborators Sep 19, 2023
@eslint-github-bot eslint-github-bot bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Sep 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion archived due to age This issue has been archived; please open a new issue for any further discussion bug ESLint is working incorrectly core Relates to ESLint's core APIs and features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: filter callback in getFirstToken/getLastToken is called with undefined
4 participants