Add new `Style/RedundantRegexpArgument` cop #11595

koic · 2023-02-18T18:59:21Z

This PR adds new Style/RedundantRegexpArgument cop.

It identifies places where argument can be replaced from a deterministic regexp to a string.

# bad
'foo'.byteindex(/f/)
'foo'.byterindex(/f/)
'foo'.gsub(/f/, 'x')
'foo'.gsub!(/f/, 'x')
'foo'.index(/f/)
'foo'.partition(/f/)
'foo'.rindex(/f/)
'foo'.rpartition(/f/)
'foo'.scan(/f/)
'foo'.split(/f/)
'foo'.sub(/f/, 'x')
'foo'.sub!(/f/, 'x')

# good
'foo'.byteindex('f')
'foo'.byterindex('f')
'foo'.gsub('f', 'x')
'foo'.gsub!('f', 'x')
'foo'.index('f')
'foo'.partition('f')
'foo'.rindex('f')
'foo'.rpartition('f')
'foo'.scan('f')
'foo'.split('f')
'foo'.sub('f', 'x')
'foo'.sub!('f', 'x')

This is a superset of Performance/RedundantSplitRegexpArgument which only handles String#split: https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargument

Changing the argument from regexp to string have a positive effect on readability as well as performance. So this cop is promoting Style department, not Performance department.

In the future Performance/RedundantSplitRegexpArgument can be deprecated and replaced by this cop.

Before submitting the PR make sure the following are checked:

The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
Wrote good commit messages.
Commit message starts with [Fix #issue-number] (if the related issue exists).
Feature branch is up-to-date with master (if not - rebase it).
Squashed related commits together.
Added tests.
Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

jaynetics · 2023-02-20T22:43:51Z

lib/rubocop/cop/style/redundant_regexp_argument.rb

+        RESTRICT_ON_SEND = %i[
+          byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
+        ].freeze
+        DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze


it is maybe not directly relevant to this PR as it already existed before, but the LITERAL_REGEX has both false positives and false negatives.

# current regex LITERAL_REGEX = %r{[\w\s\-,"'!#%&<>=;:`~/]|\\[^AbBdDgGhHkpPRwWXsSzZ0-9]} # some false positives LITERAL_REGEX.match?(/foo # comment/x.source) # => true LITERAL_REGEX.match?(/a\Kb/.source) # => true (\K is a lookbehind) # some false negatives LITERAL_REGEX.match?(/😃/.source) # => false LITERAL_REGEX.match?(/]/.source) # => false LITERAL_REGEX.match?(/@/.source) # => false LITERAL_REGEX.match?(/ü/.source) # => false

it might make sense to improve LITERAL_REGEX a bit, or maybe regexp_parser could be used, approximately like so (i haven't tested this):

Regexp::Parser.parse(regexp).each_expression.all? do |(exp)| exp.one_of?(:literal, :escape) && !exp.quantified? end

/foo # comment/ does not seem to be a false positive. OTOH, regular expression /a\Kb/ may indeed be a false positive, or an incorrect autocorrection. LITERAL_REGEXP constant used in some existing some RuboCop Performances cops, so I would like to modify it separately.

/foo # comment/ does not seem to be a false positive

It is not a false positive in the sense that it is impossible to replace it with a string, but IIUC the cop would replace it with 'foo # comment', which would not behave the same?

For example, the following are compatible. What code specifically causes the problem?

'foo # comment extra'.match?(/foo # comment/) # => true 'foo # comment extra'.match?('foo # comment') # => true

/foo # comment/x

also / foo /x

Ah, sure! It changed to allow when options are present.

jaynetics · 2023-02-20T22:47:34Z

lib/rubocop/cop/style/redundant_regexp_argument.rb

+        def on_send(node)
+          return unless (regexp_node = node.first_argument)
+          return unless regexp_node.regexp_type?
+          return if regexp_node.ignore_case? || regexp_node.content == ' '


i'm not sure what regexp_node.content == ' ' implies, but maybe '' was intended instead of ' ', to allow something like split(//)?

It prevents false positives in regex matches for spaces.

'foo bar'.split(/ /) # => ["foo", "", "", "", "", "", "", "", "", "bar"] 'foo bar'.split(' ') # => ["foo", "bar"]

The test is written below for that.
https://github.com/rubocop/rubocop/pull/11595/files#diff-86f0d8dfae0d2d9eb2236fbfd1ae5c61165805156ed3c04b0847e6f82823252eR121-R125

Ah, i had no idea that split had a custom "awk-like" behavior for single spaces!

bbatsov · 2023-03-06T09:48:58Z

@koic Please address the feedback provided by @jaynetics.

jaynetics · 2023-06-15T16:25:06Z

lib/rubocop/cop/style/redundant_regexp_argument.rb

+          byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
+        ].freeze
+        DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze
+        STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze


Suggested change

STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze

STR_SPECIAL_CHARS = %w[\a \c \C \e \f \M \n \" \' \\\\ \t \b \f \r \u \v \x \0 \1 \2 \3 \4 \5 \6 \7].freeze

a: alert
c/C/M: meta/control escapes
e: ESC
u: unicode escape
v: vertical tab
x: hex escape
0-7: octal escapes

i think that's all ...

jaynetics · 2023-06-16T09:01:38Z

lib/rubocop/cop/style/redundant_regexp_argument.rb

+
+        MSG = 'Use string `%<prefer>s` as argument instead of regexp `%<current>s`.'
+        RESTRICT_ON_SEND = %i[
+          byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!


Suggested change

byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!

[] []= byteindex byterindex gsub gsub! index partition rindex rpartition scan slice slice! split start_with? sub sub!

Good catch! However, [] and []= are not applicable because they are false positives when the receiver is a hash object. Others have applied :-)

could we check the receiver? index, rindex, partition, slice, slice! are also Array/Enumerable/Hash methods and could lead to false positives otherwise.

Ah, it's true that slice and slice! are bad when the receiver is a Hash object, so l've removed them. Other methods can usually be kept since an Integer is used as an argument when the receiver is an Array and Enumerable, I think.

hmm, thats true for partition, which only takes a regexp when called on strings, but not for index and rindex:

[/a/, /b/, /b/, /c/].index /b/ # => 1 [/a/, /b/, /b/, /c/].rindex /b/ # => 2

This PR adds new `Style/RedundantRegexpArgument` cop. It identifies places where argument can be replaced from a deterministic regexp to a string. ```ruby # bad 'foo'.byteindex(/f/) 'foo'.byterindex(/f/) 'foo'.gsub(/f/, 'x') 'foo'.gsub!(/f/, 'x') 'foo'.index(/f/) 'foo'.partition(/f/) 'foo'.rindex(/f/) 'foo'.rpartition(/f/) 'foo'.scan(/f/) 'foo'.split(/f/) 'foo'.sub(/f/, 'x') 'foo'.sub!(/f/, 'x') # good 'foo'.byteindex('f') 'foo'.byterindex('f') 'foo'.gsub('f', 'x') 'foo'.gsub!('f', 'x') 'foo'.index('f') 'foo'.partition('f') 'foo'.rindex('f') 'foo'.rpartition('f') 'foo'.scan('f') 'foo'.split('f') 'foo'.sub('f', 'x') 'foo'.sub!('f', 'x') ``` This is a superset of `Performance/RedundantSplitRegexpArgument` which only handles `String#split`: https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargument Changing the argument from regexp to string have a positive effect on readability as well as performance. So this cop is promoting Style department, not Performance department. In the future `Performance/RedundantSplitRegexpArgument` can be deprecated and replaced by this cop.

jaynetics · 2023-06-20T08:07:46Z

please note this will still cause false positives for Array#{index,rindex} as mentioned here and still needs to preserve some more STR_SPECIAL_CHARS as mentioned here

Follow up #11595 (comment). This commit prevents the following false positives for `Style/RedundantRegexpArgument` when using `index` and `rindex`: ```console [/foo/, /bar/, /baz/].index(/foo/) [/foo/, /bar/, /baz/].rindex(/foo/) ```

koic · 2023-06-20T08:29:20Z

Yeah, I've made corrections for Array#{index,rindex} in d9b431b. As for STR_SPECIAL_CHARS, I'll look into it in conjunction with Performance/RedundantSplitRegexpArgument, which it is based on. Thank you.

…RegexpArgument` Fixes rubocop#11989. This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument` when using unicode chars. This is a resolution applying the feedback below: rubocop#11595 (comment)

…rgument` Fixes #11989. This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument` when using unicode chars. This is a resolution applying the feedback below: #11595 (comment)

jaynetics reviewed Feb 20, 2023

View reviewed changes

jaynetics reviewed Jun 15, 2023

View reviewed changes

koic force-pushed the add_new_style_redundant_regexp_argument_cop branch 2 times, most recently from 876d0de to f5be4fa Compare June 16, 2023 08:50

jaynetics reviewed Jun 16, 2023

View reviewed changes

koic force-pushed the add_new_style_redundant_regexp_argument_cop branch 2 times, most recently from 18dd202 to 7b436c1 Compare June 16, 2023 09:23

koic force-pushed the add_new_style_redundant_regexp_argument_cop branch from 7b436c1 to 2ce1d3f Compare June 16, 2023 10:58

bbatsov merged commit f15a85a into rubocop:master Jun 20, 2023
29 checks passed

koic deleted the add_new_style_redundant_regexp_argument_cop branch June 20, 2023 08:29

koic mentioned this pull request Jun 28, 2023

[Fix #11989] Fix an incorrect autocorrect for Style/RedundantRegexpArgument #11990

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new `Style/RedundantRegexpArgument` cop #11595

Add new `Style/RedundantRegexpArgument` cop #11595

koic commented Feb 18, 2023

jaynetics Feb 20, 2023

koic Jun 15, 2023 •

edited

jaynetics Jun 15, 2023

koic Jun 16, 2023

jaynetics Jun 16, 2023

jaynetics Jun 16, 2023

koic Jun 16, 2023

jaynetics Feb 20, 2023

koic Jun 15, 2023

jaynetics Jun 15, 2023

bbatsov commented Mar 6, 2023

jaynetics Jun 15, 2023

jaynetics Jun 16, 2023

koic Jun 16, 2023 •

edited

jaynetics Jun 16, 2023

koic Jun 16, 2023

jaynetics Jun 16, 2023 •

edited

jaynetics commented Jun 20, 2023

koic commented Jun 20, 2023

	STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze
	STR_SPECIAL_CHARS = %w[\a \c \C \e \f \M \n \" \' \\\\ \t \b \f \r \u \v \x \0 \1 \2 \3 \4 \5 \6 \7].freeze

	byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub!
	[] []= byteindex byterindex gsub gsub! index partition rindex rpartition scan slice slice! split start_with? sub sub!

Add new Style/RedundantRegexpArgument cop #11595

Add new Style/RedundantRegexpArgument cop #11595

Conversation

koic commented Feb 18, 2023

Choose a reason for hiding this comment

koic Jun 15, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bbatsov commented Mar 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

koic Jun 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jaynetics Jun 16, 2023 • edited

Choose a reason for hiding this comment

jaynetics commented Jun 20, 2023

koic commented Jun 20, 2023

Add new `Style/RedundantRegexpArgument` cop #11595

Add new `Style/RedundantRegexpArgument` cop #11595

koic Jun 15, 2023 •

edited

koic Jun 16, 2023 •

edited

jaynetics Jun 16, 2023 •

edited