Add new MixedCaseRange cop #11561

rwstauner · 2023-02-09T18:08:44Z

I saw several instances of regexp character classes like [A-z] which are misleading (there are several unintended symbols between Z and a), so I thought it was worth making a cop for it.
After building the cop it found a few occurrences in this codebase :)

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

I tested the new cop with Regexp::Parser versions 2.6.2 and 2.8.0 and they both passed.

What do you think about the MSG?
Do you want me to squash all these commits down to one?

Before submitting the PR make sure the following are checked:

The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
Wrote good commit messages.
Commit message starts with [Fix #issue-number] (if the related issue exists).
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

found by the new Lint/UnsafeRange cop.

te is defined as ts + length plus te angers the spellchecker

bbatsov · 2023-02-12T08:22:14Z

Is Lint the right department?

I guess that's fine. I also regret not having a Regexp department at this point, as there are quite a few cops specific to regexps.

Is there a better name? UnintendedCharactersInRange ?

Perhaps we should mention Regexp in the name as well, given that range is an overloaded term in Ruby.

I'm unsure about the MSG. Part of the value in a cop like this is not only catching a mistake but in educating what is wrong, so I want to provide more explanation in the message, but am struggling with a good way to present that. I considered determining what ranges are in use and suggesting an alternative in the message such as Instead of A-z use A-Za-z and explicitly list any other characters you want, but was then confronted with a recurring battle for how generic this linter should be. If instead the cop was really just for letters crossing bounds the message could just say that without needing to determine anything programatically. Also, if there was an instance of something like A-f we could suggest instead A-Fa-f with simple code like #{provided_range.upcase}#{provided_range.downcase}. I'm not sure if there is value in keeping it so generic... are there really any other ranges that might be used like this?

I've been wondering about this myself (the other ranges that might be commonly used) There's also the broader point that A-Za-z will limit you only to ascii characters, which may or may not be what you want to do. Perhaps it's best to start small here and expand the functionality only if needed.

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

We'll need to support parsing Ruby 2.7 code at least until RuboCop 2 is released. Then the parser will likely target only Ruby 3+.

rwstauner · 2023-02-12T21:12:43Z

Perhaps we should mention Regexp in the name as well, given that range is an overloaded term in Ruby.

The cop handles Ranges as well, since it's basically the same problem there, too:
https://github.com/rubocop/rubocop/pull/11561/files#diff-630a175c763f054df4e7bbfeeb63b26630f1aa80d26fce1131c39bd1944c38ecR6-R7

rwstauner · 2023-02-13T15:27:56Z

I've been wondering about this myself (the other ranges that might be commonly used) There's also the broader point that A-Za-z will limit you only to ascii characters, which may or may not be what you want to do. Perhaps it's best to start small here and expand the functionality only if needed.

Indeed, there may be an opportunity here to promote better regular expression usage here as well. I'd love to suggest that people use [[:alpha:]] or something when it is appropriate, but I'm concerned about how noisy that would be. I suspect that there is a significant amount of matching a-z that intends to limit to ascii, and I don't want code to have be littered with rubocop disable comments. I'm not sure if there's a better way to handle that (a suggestion that is only useful half of the time).

I was thinking more about this over the weekend and wondering if a better approach would be to move this functionality into some sort of helper and make specific cops... Then we could have one simple, explicit cop for mixing the A-Z and a-z ranges (with specific, helpful messages), and if anyone comes up with a use case or likelihood for mixing any other ranges we could add another cop then (with another specific message).

bbatsov · 2023-02-15T07:19:07Z

I was thinking more about this over the weekend and wondering if a better approach would be to move this functionality into some sort of helper and make specific cops... Then we could have one simple, explicit cop for mixing the A-Z and a-z ranges (with specific, helpful messages), and if anyone comes up with a use case or likelihood for mixing any other ranges we could add another cop then (with another specific message).

Yeah, probably this will work better in practice.

koic · 2023-02-15T07:39:54Z

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

We'll need to support parsing Ruby 2.7 code at least until RuboCop 2 is released. Then the parser will likely target only Ruby 3+.

JFYI, regexp_parser 2.7 is not Ruby 2.7 :-) Support for regexp_parser 1 and regexp_parser 2 was previously discussed on #9162 and related PRs.

bbatsov · 2023-02-26T08:19:14Z

@koic My bad! Once again I misread a message. 😅

bbatsov · 2023-04-11T07:11:06Z

So, how about we agree to:

Pick a more specific name (e.g. "Ambiguous/UnsafeCharacterRanger")
limit the cop to character ranges
expand its rationale so the nature of the problem is clearer
document its limitations
make the warning message more specific given that we've decided to reduce the scope of the cop

@rwstauner @koic How does this sound?

rwstauner · 2023-04-11T23:41:36Z

yeah, that's basically what i was thinking. i'm sorry i haven't gotten back to this. it was a hackdays project and i've been busier than usual at work since then. I should be able to devote a little time to finish this up in a few weeks.

Remove number range as that is considerably less likely to be used and makes writing helpful messages more difficult.

Consuming classes don't need to care that it might be a compound expression. Instead return an object with the same interface as you'd get from Regexp::Parser.

rwstauner · 2023-05-18T19:19:04Z

I reworked this a bit to abstract and simplify it and updated the PR description.

bbatsov · 2023-06-19T11:07:03Z

Looks good to me in its current simpler form. Sorry for the slow turnaround on this!

Follow up rubocop/rubocop#11561. ```console $ bundle exec rake (snip) Offenses: lib/rubocop/cop/rails/time_zone.rb:58:33: W: [Correctable] Lint/MixedCaseRange: Ranges from upper to lower case ASCII letters may include unintended characters. Instead of A-z (which also includes several symbols) specify each range individually: A-Za-z and individually specify any symbols. TIMEZONE_SPECIFIER = /([A-z]|[+-]\d{2}:?\d{2})\z/.freeze ^^^ 281 files inspected, 1 offense detected, 1 offense autocorrectable ```

rwstauner · 2023-06-19T17:44:24Z

No worries, i was busy and slow, too. Thanks for your help!

rwstauner added 6 commits February 9, 2023 09:57

Add new Lint/UnsafeRange cop

473b438

Support ruby character ranges in Lint/UnsafeRange

f3c4831

Add changelog entry for new Lint/UnsafeRange cop

b271549

Fix character range mistakes in department_name

40ca2bf

found by the new Lint/UnsafeRange cop.

Work with open-ended ranges

862be4e

Handle octal escape expressions from regexp_parser >= 2.7

9fa2fac

rwstauner force-pushed the rwstauner/unsafe-range branch from b538b5e to 9fa2fac Compare February 9, 2023 18:09

rwstauner added 2 commits February 9, 2023 11:27

Extract "skip escapes" to separate method

cf03bbf

Use string length rather than te - ts

c82d0e0

te is defined as ts + length plus te angers the spellchecker

rwstauner marked this pull request as draft May 6, 2023 17:52

rwstauner changed the title ~~Add new UnsafeRange cop~~ Add new MixedCaseRange cop May 18, 2023

rwstauner added 4 commits May 18, 2023 10:36

Move regepx range extractor to utils namespace

cbadec4

Rename UnsafeRange cop to MixedCaseRange and limit to A-Z / a-z

556f8ae

Remove number range as that is considerably less likely to be used and makes writing helpful messages more difficult.

Correctly handle open ranges in MixedCaseRange cop

481aa60

Simplify output of regexp range pairs

d86ef52

Consuming classes don't need to care that it might be a compound expression. Instead return an object with the same interface as you'd get from Regexp::Parser.

rwstauner force-pushed the rwstauner/unsafe-range branch from f8186bd to 7e698fa Compare May 18, 2023 17:37

rwstauner marked this pull request as ready for review May 18, 2023 19:18

Add unsafe autocorrection for MixedCaseRange cop

cef3f4f

rwstauner force-pushed the rwstauner/unsafe-range branch from 7e698fa to cef3f4f Compare May 19, 2023 16:50

bbatsov merged commit 8509ced into rubocop:master Jun 19, 2023
22 checks passed

larouxn mentioned this pull request Jun 26, 2023

Enable Lint/MixedCaseRange cop Shopify/ruby-style-guide#548

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new MixedCaseRange cop #11561

Add new MixedCaseRange cop #11561

rwstauner commented Feb 9, 2023 •

edited

bbatsov commented Feb 12, 2023

rwstauner commented Feb 12, 2023

rwstauner commented Feb 13, 2023

bbatsov commented Feb 15, 2023

koic commented Feb 15, 2023

bbatsov commented Feb 26, 2023

bbatsov commented Apr 11, 2023

rwstauner commented Apr 11, 2023

rwstauner commented May 18, 2023

bbatsov commented Jun 19, 2023

rwstauner commented Jun 19, 2023

Add new MixedCaseRange cop #11561

Add new MixedCaseRange cop #11561

Conversation

rwstauner commented Feb 9, 2023 • edited

bbatsov commented Feb 12, 2023

rwstauner commented Feb 12, 2023

rwstauner commented Feb 13, 2023

bbatsov commented Feb 15, 2023

koic commented Feb 15, 2023

bbatsov commented Feb 26, 2023

bbatsov commented Apr 11, 2023

rwstauner commented Apr 11, 2023

rwstauner commented May 18, 2023

bbatsov commented Jun 19, 2023

rwstauner commented Jun 19, 2023

rwstauner commented Feb 9, 2023 •

edited