-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new Style/RedundantRegexpArgument
cop
#11595
Add new Style/RedundantRegexpArgument
cop
#11595
Conversation
RESTRICT_ON_SEND = %i[ | ||
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub! | ||
].freeze | ||
DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is maybe not directly relevant to this PR as it already existed before, but the LITERAL_REGEX
has both false positives and false negatives.
# current regex
LITERAL_REGEX = %r{[\w\s\-,"'!#%&<>=;:`~/]|\\[^AbBdDgGhHkpPRwWXsSzZ0-9]}
# some false positives
LITERAL_REGEX.match?(/foo # comment/x.source) # => true
LITERAL_REGEX.match?(/a\Kb/.source) # => true (\K is a lookbehind)
# some false negatives
LITERAL_REGEX.match?(/😃/.source) # => false
LITERAL_REGEX.match?(/]/.source) # => false
LITERAL_REGEX.match?(/@/.source) # => false
LITERAL_REGEX.match?(/ü/.source) # => false
it might make sense to improve LITERAL_REGEX
a bit, or maybe regexp_parser
could be used, approximately like so (i haven't tested this):
Regexp::Parser.parse(regexp).each_expression.all? do |(exp)|
exp.one_of?(:literal, :escape) && !exp.quantified?
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/foo # comment/
does not seem to be a false positive. OTOH, regular expression /a\Kb/
may indeed be a false positive, or an incorrect autocorrection. LITERAL_REGEXP
constant used in some existing some RuboCop Performances cops, so I would like to modify it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/foo # comment/ does not seem to be a false positive
It is not a false positive in the sense that it is impossible to replace it with a string, but IIUC the cop would replace it with 'foo # comment'
, which would not behave the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the following are compatible. What code specifically causes the problem?
'foo # comment extra'.match?(/foo # comment/) # => true
'foo # comment extra'.match?('foo # comment') # => true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/foo # comment/x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also / foo /x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sure! It changed to allow when options are present.
def on_send(node) | ||
return unless (regexp_node = node.first_argument) | ||
return unless regexp_node.regexp_type? | ||
return if regexp_node.ignore_case? || regexp_node.content == ' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure what regexp_node.content == ' '
implies, but maybe ''
was intended instead of ' '
, to allow something like split(//)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It prevents false positives in regex matches for spaces.
'foo bar'.split(/ /) # => ["foo", "", "", "", "", "", "", "", "", "bar"]
'foo bar'.split(' ') # => ["foo", "bar"]
The test is written below for that.
https://github.com/rubocop/rubocop/pull/11595/files#diff-86f0d8dfae0d2d9eb2236fbfd1ae5c61165805156ed3c04b0847e6f82823252eR121-R125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, i had no idea that split had a custom "awk-like" behavior for single spaces!
@koic Please address the feedback provided by @jaynetics. |
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub! | ||
].freeze | ||
DETERMINISTIC_REGEX = /\A(?:#{LITERAL_REGEX})+\Z/.freeze | ||
STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STR_SPECIAL_CHARS = %w[\n \" \' \\\\ \t \b \f \r].freeze | |
STR_SPECIAL_CHARS = %w[\a \c \C \e \f \M \n \" \' \\\\ \t \b \f \r \u \v \x \0 \1 \2 \3 \4 \5 \6 \7].freeze |
a: alert
c/C/M: meta/control escapes
e: ESC
u: unicode escape
v: vertical tab
x: hex escape
0-7: octal escapes
i think that's all ...
876d0de
to
f5be4fa
Compare
|
||
MSG = 'Use string `%<prefer>s` as argument instead of regexp `%<current>s`.' | ||
RESTRICT_ON_SEND = %i[ | ||
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
byteindex byterindex gsub gsub! index partition rindex rpartition scan split sub sub! | |
[] []= byteindex byterindex gsub gsub! index partition rindex rpartition scan slice slice! split start_with? sub sub! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! However, []
and []=
are not applicable because they are false positives when the receiver is a hash object. Others have applied :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we check the receiver? index
, rindex
, partition
, slice
, slice!
are also Array/Enumerable/Hash methods and could lead to false positives otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's true that slice
and slice!
are bad when the receiver is a Hash
object, so l've removed them. Other methods can usually be kept since an Integer is used as an argument when the receiver is an Array
and Enumerable
, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, thats true for partition
, which only takes a regexp when called on strings, but not for index
and rindex
:
[/a/, /b/, /b/, /c/].index /b/ # => 1
[/a/, /b/, /b/, /c/].rindex /b/ # => 2
18dd202
to
7b436c1
Compare
This PR adds new `Style/RedundantRegexpArgument` cop. It identifies places where argument can be replaced from a deterministic regexp to a string. ```ruby # bad 'foo'.byteindex(/f/) 'foo'.byterindex(/f/) 'foo'.gsub(/f/, 'x') 'foo'.gsub!(/f/, 'x') 'foo'.index(/f/) 'foo'.partition(/f/) 'foo'.rindex(/f/) 'foo'.rpartition(/f/) 'foo'.scan(/f/) 'foo'.split(/f/) 'foo'.sub(/f/, 'x') 'foo'.sub!(/f/, 'x') # good 'foo'.byteindex('f') 'foo'.byterindex('f') 'foo'.gsub('f', 'x') 'foo'.gsub!('f', 'x') 'foo'.index('f') 'foo'.partition('f') 'foo'.rindex('f') 'foo'.rpartition('f') 'foo'.scan('f') 'foo'.split('f') 'foo'.sub('f', 'x') 'foo'.sub!('f', 'x') ``` This is a superset of `Performance/RedundantSplitRegexpArgument` which only handles `String#split`: https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargument Changing the argument from regexp to string have a positive effect on readability as well as performance. So this cop is promoting Style department, not Performance department. In the future `Performance/RedundantSplitRegexpArgument` can be deprecated and replaced by this cop.
7b436c1
to
2ce1d3f
Compare
Follow up #11595 (comment). This commit prevents the following false positives for `Style/RedundantRegexpArgument` when using `index` and `rindex`: ```console [/foo/, /bar/, /baz/].index(/foo/) [/foo/, /bar/, /baz/].rindex(/foo/) ```
Yeah, I've made corrections for |
…RegexpArgument` Fixes rubocop#11989. This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument` when using unicode chars. This is a resolution applying the feedback below: rubocop#11595 (comment)
…rgument` Fixes #11989. This PR fixes an incorrect autocorrect for `Style/RedundantRegexpArgument` when using unicode chars. This is a resolution applying the feedback below: #11595 (comment)
This PR adds new
Style/RedundantRegexpArgument
cop.It identifies places where argument can be replaced from a deterministic regexp to a string.
This is a superset of
Performance/RedundantSplitRegexpArgument
which only handlesString#split
: https://docs.rubocop.org/rubocop-performance/1.16/cops_performance.html#performanceredundantsplitregexpargumentChanging the argument from regexp to string have a positive effect on readability as well as performance. So this cop is promoting Style department, not Performance department.
In the future
Performance/RedundantSplitRegexpArgument
can be deprecated and replaced by this cop.Before submitting the PR make sure the following are checked:
[Fix #issue-number]
(if the related issue exists).master
(if not - rebase it).bundle exec rake default
. It executes all tests and runs RuboCop on its own code.{change_type}_{change_description}.md
if the new code introduces user-observable changes. See changelog entry format for details.