Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Style/RedundantRegexpArgument cop #11595

Merged

Conversation

koic
Copy link
Member

@koic koic commented Feb 18, 2023

This PR adds new Style/RedundantRegexpArgument cop.

It identifies places where argument can be replaced from a deterministic regexp to a string.

# bad
'foo'.byteindex(/f/)
'foo'.byterindex(/f/)
'foo'.gsub(/f/, 'x')
'foo'.gsub!(/f/, 'x')
'foo'.index(/f/)
'foo'.partition(/f/)
'foo'.rindex(/f/)
'foo'.rpartition(/f/)
'foo'.scan(/f/)
'foo'.split(/f/)
'foo'.sub(/f/, 'x')
'foo'.sub!(/f/, 'x')

# good
'foo'.byteindex('f')
'foo'.byterindex('f')
'foo'.gsub('f', 'x')
'foo'.gsub!('f', 'x')
'foo'.index('f')
'foo'.partition('f')
'foo'.rindex('f')
'foo'.rpartition('f')
'foo'.scan('f')
'foo'.split('f')
'foo'.sub('f', 'x')
'foo'.sub!('f', 'x')

This is a superset of Performance/RedundantSplitRegexpArgument which only handles String#split: https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargument

Changing the argument from regexp to string have a positive effect on readability as well as performance. So this cop is promoting Style department, not Performance department.

In the future Performance/RedundantSplitRegexpArgument can be deprecated and replaced by this cop.


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

RESTRICT_ON_SEND = %i[
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
].freeze
DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is maybe not directly relevant to this PR as it already existed before, but the LITERAL_REGEX has both false positives and false negatives.

# current regex
LITERAL_REGEX = %r{[\w\s\-,"'!#%&<>=;:`~/]|\\[^AbBdDgGhHkpPRwWXsSzZ0-9]}

# some false positives
LITERAL_REGEX.match?(/foo # comment/x.source) # => true
LITERAL_REGEX.match?(/a\Kb/.source) # => true (\K is a lookbehind)

# some false negatives
LITERAL_REGEX.match?(/😃/.source) # => false
LITERAL_REGEX.match?(/]/.source) # => false
LITERAL_REGEX.match?(/@/.source) # => false
LITERAL_REGEX.match?(/ü/.source) # => false

it might make sense to improve LITERAL_REGEX a bit, or maybe regexp_parser could be used, approximately like so (i haven't tested this):

Regexp::Parser.parse(regexp).each_expression.all? do |(exp)|
  exp.one_of?(:literal, :escape) && !exp.quantified?
end

Copy link
Member Author

@koic koic Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/foo # comment/ does not seem to be a false positive. OTOH, regular expression /a\Kb/ may indeed be a false positive, or an incorrect autocorrection. LITERAL_REGEXP constant used in some existing some RuboCop Performances cops, so I would like to modify it separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/foo # comment/ does not seem to be a false positive

It is not a false positive in the sense that it is impossible to replace it with a string, but IIUC the cop would replace it with 'foo # comment', which would not behave the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the following are compatible. What code specifically causes the problem?

'foo # comment extra'.match?(/foo # comment/) # => true
'foo # comment extra'.match?('foo # comment') # => true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/foo # comment/x

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also / foo /x

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure! It changed to allow when options are present.

def on_send(node)
return unless (regexp_node = node.first_argument)
return unless regexp_node.regexp_type?
return if regexp_node.ignore_case? || regexp_node.content == ' '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure what regexp_node.content == ' ' implies, but maybe '' was intended instead of ' ', to allow something like split(//)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents false positives in regex matches for spaces.

'foo         bar'.split(/ /) # => ["foo", "", "", "", "", "", "", "", "", "bar"]
'foo         bar'.split(' ') # => ["foo", "bar"]

The test is written below for that.
https://github.com/rubocop/rubocop/pull/11595/files#diff-86f0d8dfae0d2d9eb2236fbfd1ae5c61165805156ed3c04b0847e6f82823252eR121-R125

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, i had no idea that split had a custom "awk-like" behavior for single spaces!

@bbatsov
Copy link
Collaborator

bbatsov commented Mar 6, 2023

@koic Please address the feedback provided by @jaynetics.

byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
].freeze
DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze
STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze
STR_SPECIAL_CHARS = %w[\a \c \C \e \f \M \n \" \' \\\\ \t \b \f \r \u \v \x \0 \1 \2 \3 \4 \5 \6 \7].freeze

a: alert
c/C/M: meta/control escapes
e: ESC
u: unicode escape
v: vertical tab
x: hex escape
0-7: octal escapes

i think that's all ...

@koic koic force-pushed the add_new_style_redundant_regexp_argument_cop branch 2 times, most recently from 876d0de to f5be4fa Compare June 16, 2023 08:50

MSG = 'Use string `%<prefer>s` as argument instead of regexp `%<current>s`.'
RESTRICT_ON_SEND = %i[
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
[] []= byteindex byterindex gsub gsub! index partition rindex rpartition scan slice slice! split start_with? sub sub!

Copy link
Member Author

@koic koic Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! However, [] and []= are not applicable because they are false positives when the receiver is a hash object. Others have applied :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we check the receiver? index, rindex, partition, slice, slice! are also Array/Enumerable/Hash methods and could lead to false positives otherwise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's true that slice and slice! are bad when the receiver is a Hash object, so l've removed them. Other methods can usually be kept since an Integer is used as an argument when the receiver is an Array and Enumerable, I think.

Copy link
Contributor

@jaynetics jaynetics Jun 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, thats true for partition, which only takes a regexp when called on strings, but not for index and rindex:

[/a/, /b/, /b/, /c/].index /b/  # => 1
[/a/, /b/, /b/, /c/].rindex /b/ # => 2

@koic koic force-pushed the add_new_style_redundant_regexp_argument_cop branch 2 times, most recently from 18dd202 to 7b436c1 Compare June 16, 2023 09:23
This PR adds new `Style/RedundantRegexpArgument` cop.

It identifies places where argument can be replaced from
a deterministic regexp to a string.

```ruby
# bad
'foo'.byteindex(/f/)
'foo'.byterindex(/f/)
'foo'.gsub(/f/, 'x')
'foo'.gsub!(/f/, 'x')
'foo'.index(/f/)
'foo'.partition(/f/)
'foo'.rindex(/f/)
'foo'.rpartition(/f/)
'foo'.scan(/f/)
'foo'.split(/f/)
'foo'.sub(/f/, 'x')
'foo'.sub!(/f/, 'x')

# good
'foo'.byteindex('f')
'foo'.byterindex('f')
'foo'.gsub('f', 'x')
'foo'.gsub!('f', 'x')
'foo'.index('f')
'foo'.partition('f')
'foo'.rindex('f')
'foo'.rpartition('f')
'foo'.scan('f')
'foo'.split('f')
'foo'.sub('f', 'x')
'foo'.sub!('f', 'x')
```

This is a superset of `Performance/RedundantSplitRegexpArgument` which only handles `String#split`:
https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargument

Changing the argument from regexp to string have a positive effect on readability as well as performance.
So this cop is promoting Style department, not Performance department.

In the future `Performance/RedundantSplitRegexpArgument` can be deprecated and replaced by this cop.
@koic koic force-pushed the add_new_style_redundant_regexp_argument_cop branch from 7b436c1 to 2ce1d3f Compare June 16, 2023 10:58
@bbatsov bbatsov merged commit f15a85a into rubocop:master Jun 20, 2023
29 checks passed
@jaynetics
Copy link
Contributor

please note this will still cause false positives for Array#{index,rindex} as mentioned here and still needs to preserve some more STR_SPECIAL_CHARS as mentioned here

koic added a commit that referenced this pull request Jun 20, 2023
Follow up #11595 (comment).

This commit prevents the following false positives for `Style/RedundantRegexpArgument`
when using `index` and `rindex`:

```console
[/foo/, /bar/, /baz/].index(/foo/)
[/foo/, /bar/, /baz/].rindex(/foo/)
```
@koic
Copy link
Member Author

koic commented Jun 20, 2023

Yeah, I've made corrections for Array#{index,rindex} in d9b431b. As for STR_SPECIAL_CHARS, I'll look into it in conjunction with Performance/RedundantSplitRegexpArgument, which it is based on. Thank you.

@koic koic deleted the add_new_style_redundant_regexp_argument_cop branch June 20, 2023 08:29
koic added a commit to koic/rubocop that referenced this pull request Jun 28, 2023
…RegexpArgument`

Fixes rubocop#11989.

This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument`
when using unicode chars.

This is a resolution applying the feedback below:
rubocop#11595 (comment)
bbatsov pushed a commit that referenced this pull request Jun 28, 2023
…rgument`

Fixes #11989.

This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument`
when using unicode chars.

This is a resolution applying the feedback below:
#11595 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants