Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Metrics/CollectionLiteralLength cop #11584

Merged
merged 5 commits into from Feb 26, 2023

Conversation

sambostock
Copy link
Contributor

@sambostock sambostock commented Feb 16, 2023

This cop checks for extremely long Array and Hash literals, and Set pseudo-literals (actually parameters to the Set.[] method).


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.

@sambostock sambostock force-pushed the lengthy-literal branch 6 times, most recently from 82bd9d5 to ab1c871 Compare February 16, 2023 23:26
Description: 'Checks for extremely large literals, such as `Array`s and `Hash`es with many entries.'
Enabled: pending
VersionAdded: '<<next>>'
LengthThreshold: 250
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked this arbitrarily. 🤷

We shouldn't pick a number so high that it makes the rule useless, but also shouldn't be too aggressive in our default. Application owners are free to adjust the number as they see fit.

@koic
Copy link
Member

koic commented Feb 17, 2023

I can understand that LengthThreshold is set to a sufficient value, but I'm not sure it's of general interest to catch cases like this.
For example, hash parameters may be used in Ruby DSL instead of YAML. In this case the number doesn't matter.
As a result I'm not sure whether it's useful to have this cop in the core by default.

@sambostock
Copy link
Contributor Author

sambostock commented Feb 17, 2023

This is actually related to #11588 which I just opened.

An application of ours had an array of 2000+ sub-arrays of reconciliation data to be inserted in a backfill rake task. A O(n²) loop in Style/WordArray was making it take hours to investigate that file, which sent me down this rabbit hole (and fixing it in #11588).

I created this cop because I have seen many instances of this over the years, where otherwise simple files are crowded by large data/config sections that would be better extracted into separate files (e.g. CSV), or better yet, stored externally from the git repo all together, and uploaded to the app for one time use (via an approach like a Maintenance Tasks' CSV task).

Another occurrence I saw recently was a project where they'd tried to load a CSV, but it was taking too long (simple mistake, see rubocop/rubocop-performance#328 which I opened to catch it), so they decided to hard code a Hash of hundreds of thousands of entries as a constant in an otherwise small file. This cop would have flagged this, and hopefully the author would have investigated.

It is my opinion that in most projects very large literals would be a smell (perhaps 500, or 1,000 would be better thresholds), and that projects that use a DSL where this makes sense would simply disable the cop.

This cop checks for extremely long `Array` and `Hash` literals, and
`Set` pseudo-literals (actually parameters to the `Set.[]` method).
@bbatsov
Copy link
Collaborator

bbatsov commented Feb 18, 2023

I like the overall idea, although I'm not sure whether this is something we can have enabled by default. A couple of remarks:

  • Given that this targets only collection literals I'd go for a name that's more specific
  • The default seems pretty big and I'm curious how you came up with it (I saw in one remark this was done arbitrary, so if that's really the case we can discuss options here)

@sambostock
Copy link
Contributor Author

I'm not sure whether this is something we can have enabled by default.

Fair. We'll probably enable it in rubocop-shopify, as others can too.

Given that this targets only collection literals I'd go for a name that's more specific

Yeah, good point. I'd originally chosen a generic name because I wanted to also include strings, but as I started to look into it it wasn't as simple. Perhaps LengthyCollectionLiteral? I originally had Huge and want sure if that sounded okay. Long could also work.

The default seems pretty big and I'm curious how you came up with it (I saw in one remark this was done arbitrary, so if that's really the case we can discuss options here)

Yeah, I just picked a number that felt

  • Big enough to not be contentious
    • Something too small, like 10, could appear when doing things like building API payloads, so would have too many false positives.
  • Small enough to actually be triggered by real usage
  • I think we could all agree a literal of length 999_999_999 doesn't belong in a Ruby file, but that's not a very useful threshold.

So not fully arbitrary, but not based on any metrics or anything like that.

@bbatsov
Copy link
Collaborator

bbatsov commented Feb 23, 2023

Perhaps LengthyCollectionLiteral?

I'd be fine by me. @rubocop/rubocop-core Proposals for a better name would be welcome.

Btw, it's also debatable if this is a Style cop or a Metrics cop, given it's similar in spirit to method/class length checks. If it's there it can be simple Metrics/CollectionLength or something along those lines.

@koic
Copy link
Member

koic commented Feb 24, 2023

Btw, it's also debatable if this is a Style cop or a Metrics cop, given it's similar in spirit to method/class length checks. If it's there it can be simple Metrics/CollectionLength or something along those lines.

Interesting. Metrics may be better.

@sambostock
Copy link
Contributor Author

@bbatsov, @koic what do you think of Metrics/LongCollectionLiteral?

I think it's important to have Literal in there, because the target isn't large collections in general, it's large collection literals that shouldn't be inline in a class/module, and instead extracted to config files or something (even if the config file is a Ruby file using a DSL which is excluded from this cop).

@bbatsov
Copy link
Collaborator

bbatsov commented Feb 24, 2023

@bbatsov, @koic what do you think of Metrics/LongCollectionLiteral?

I'm fine with that name. CollectionLiteralLength would be fine as well IMO. (and more in line with current naming)

@sambostock
Copy link
Contributor Author

CollectionLiteralLength would be fine as well IMO. (and more in line with current naming)

Ah yes, I agree that is better. I'll make the change.

This test appears to disable all departments except `Style` for the
purposes of the test. However, it missed `Metrics`, which means
`pending` `Metrics` tests result in unexpected output.
@bbatsov bbatsov merged commit e4f7799 into rubocop:master Feb 26, 2023
@bbatsov
Copy link
Collaborator

bbatsov commented Feb 26, 2023

Thanks!

@sambostock sambostock deleted the lengthy-literal branch February 26, 2023 14:44
@sambostock sambostock changed the title Add Style/LengthyLiteral cop Add Metrics/CollectionLiteralLength cop Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants