Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new MixedCaseRange cop #11561

Merged
merged 13 commits into from
Jun 19, 2023
Merged

Conversation

rwstauner
Copy link
Contributor

@rwstauner rwstauner commented Feb 9, 2023

I saw several instances of regexp character classes like [A-z] which are misleading (there are several unintended symbols between Z and a), so I thought it was worth making a cop for it.
After building the cop it found a few occurrences in this codebase :)

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

I tested the new cop with Regexp::Parser versions 2.6.2 and 2.8.0 and they both passed.

What do you think about the MSG?
Do you want me to squash all these commits down to one?


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@bbatsov
Copy link
Collaborator

bbatsov commented Feb 12, 2023

Is Lint the right department?

I guess that's fine. I also regret not having a Regexp department at this point, as there are quite a few cops specific to regexps.

Is there a better name? UnintendedCharactersInRange ?

Perhaps we should mention Regexp in the name as well, given that range is an overloaded term in Ruby.

I'm unsure about the MSG. Part of the value in a cop like this is not only catching a mistake but in educating what is wrong, so I want to provide more explanation in the message, but am struggling with a good way to present that. I considered determining what ranges are in use and suggesting an alternative in the message such as Instead of A-z use A-Za-z and explicitly list any other characters you want, but was then confronted with a recurring battle for how generic this linter should be. If instead the cop was really just for letters crossing bounds the message could just say that without needing to determine anything programatically. Also, if there was an instance of something like A-f we could suggest instead A-Fa-f with simple code like #{provided_range.upcase}#{provided_range.downcase}. I'm not sure if there is value in keeping it so generic... are there really any other ranges that might be used like this?

I've been wondering about this myself (the other ranges that might be commonly used) There's also the broader point that A-Za-z will limit you only to ascii characters, which may or may not be what you want to do. Perhaps it's best to start small here and expand the functionality only if needed.

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

We'll need to support parsing Ruby 2.7 code at least until RuboCop 2 is released. Then the parser will likely target only Ruby 3+.

@rwstauner
Copy link
Contributor Author

Perhaps we should mention Regexp in the name as well, given that range is an overloaded term in Ruby.

The cop handles Ranges as well, since it's basically the same problem there, too:
https://github.com/rubocop/rubocop/pull/11561/files#diff-630a175c763f054df4e7bbfeeb63b26630f1aa80d26fce1131c39bd1944c38ecR6-R7

@rwstauner
Copy link
Contributor Author

I've been wondering about this myself (the other ranges that might be commonly used) There's also the broader point that A-Za-z will limit you only to ascii characters, which may or may not be what you want to do. Perhaps it's best to start small here and expand the functionality only if needed.

Indeed, there may be an opportunity here to promote better regular expression usage here as well. I'd love to suggest that people use [[:alpha:]] or something when it is appropriate, but I'm concerned about how noisy that would be. I suspect that there is a significant amount of matching a-z that intends to limit to ascii, and I don't want code to have be littered with rubocop disable comments. I'm not sure if there's a better way to handle that (a suggestion that is only useful half of the time).

I was thinking more about this over the weekend and wondering if a better approach would be to move this functionality into some sort of helper and make specific cops... Then we could have one simple, explicit cop for mixing the A-Z and a-z ranges (with specific, helpful messages), and if anyone comes up with a use case or likelihood for mixing any other ranges we could add another cop then (with another specific message).

@bbatsov
Copy link
Collaborator

bbatsov commented Feb 15, 2023

I was thinking more about this over the weekend and wondering if a better approach would be to move this functionality into some sort of helper and make specific cops... Then we could have one simple, explicit cop for mixing the A-Z and a-z ranges (with specific, helpful messages), and if anyone comes up with a use case or likelihood for mixing any other ranges we could add another cop then (with another specific message).

Yeah, probably this will work better in practice.

@koic
Copy link
Member

koic commented Feb 15, 2023

There is some extra complexity here to reconstruct octal escape sequences for regexp_parser < 2.7. If rubocop required 2.7+ that code could be simplified.

We'll need to support parsing Ruby 2.7 code at least until RuboCop 2 is released. Then the parser will likely target only Ruby 3+.

JFYI, regexp_parser 2.7 is not Ruby 2.7 :-) Support for regexp_parser 1 and regexp_parser 2 was previously discussed on #9162 and related PRs.

@bbatsov
Copy link
Collaborator

bbatsov commented Feb 26, 2023

@koic My bad! Once again I misread a message. 😅

@bbatsov
Copy link
Collaborator

bbatsov commented Apr 11, 2023

So, how about we agree to:

  • Pick a more specific name (e.g. "Ambiguous/UnsafeCharacterRanger")
  • limit the cop to character ranges
  • expand its rationale so the nature of the problem is clearer
  • document its limitations
  • make the warning message more specific given that we've decided to reduce the scope of the cop

@rwstauner @koic How does this sound?

@rwstauner
Copy link
Contributor Author

yeah, that's basically what i was thinking. i'm sorry i haven't gotten back to this. it was a hackdays project and i've been busier than usual at work since then. I should be able to devote a little time to finish this up in a few weeks.

@rwstauner rwstauner marked this pull request as draft May 6, 2023 17:52
@rwstauner rwstauner changed the title Add new UnsafeRange cop Add new MixedCaseRange cop May 18, 2023
Remove number range as that is considerably less likely to be used
and makes writing helpful messages more difficult.
Consuming classes don't need to care that it might be a compound expression.
Instead return an object with the same interface as you'd get from Regexp::Parser.
@rwstauner rwstauner marked this pull request as ready for review May 18, 2023 19:18
@rwstauner
Copy link
Contributor Author

I reworked this a bit to abstract and simplify it and updated the PR description.

@bbatsov bbatsov merged commit 8509ced into rubocop:master Jun 19, 2023
22 checks passed
@bbatsov
Copy link
Collaborator

bbatsov commented Jun 19, 2023

Looks good to me in its current simpler form. Sorry for the slow turnaround on this!

koic added a commit to rubocop/rubocop-rails that referenced this pull request Jun 19, 2023
Follow up rubocop/rubocop#11561.

```console
$ bundle exec rake
(snip)

Offenses:

lib/rubocop/cop/rails/time_zone.rb:58:33: W: [Correctable] Lint/MixedCaseRange: Ranges from upper to lower case
ASCII letters may include unintended characters. Instead of A-z (which also includes several symbols)
specify each range individually: A-Za-z and individually specify any symbols.
        TIMEZONE_SPECIFIER = /([A-z]|[+-]\d{2}:?\d{2})\z/.freeze
                                ^^^

281 files inspected, 1 offense detected, 1 offense autocorrectable
```
@rwstauner
Copy link
Contributor Author

No worries, i was busy and slow, too. Thanks for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants