Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for selectively choosing deduplication keys. #102

Merged
merged 9 commits into from
Sep 28, 2023

Conversation

kyto64
Copy link
Contributor

@kyto64 kyto64 commented Sep 26, 2023

Description of changes:

Previously, when using ContentBasedDeduplication, only job_id was excluded from the deduplication key. However, depending on the requirements of Rails applications, it may be necessary to exclude keys other than job_id as well.

I have addressed this issue by allowing the deduplication key to be customized in the configuration or within each job class.
Please note that, by default, only job_id is excluded.

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

Copy link
Contributor

@mullermp mullermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening a pull request. I think this is a valid feature to consider.

Comment on lines 56 to 57
dedup_keys = job.deduplication_keys if job.respond_to?(:deduplication_keys)
dedup_keys ||= Aws::Rails::SqsActiveJob.config.deduplication_keys.map(&:to_s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to use symbol throughout the method (including for job_id), and then convert to string when doing body.slice. It would eliminate to_s here and in deduplicate_with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use to_s within each definition to uniquely add job_id to keys within the deduplicate_without method and excluded_deduplication_keys getter definitions.
Anyway, I agree with your ideas and would be happy to hear any other good ideas.

# client will be created if none is provided.
#
# @option options [Array] :deduplication_keys ([])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to convey the default here (all properties except job id) since it is not actually an empty array, or maybe the configuration should be changed to be an exclude list instead of an override (which job id can be hard coded appended to), whichever has better customer experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally lean towards an exclude list instead of include list. That way the default doesn't need to change if other fields are added in the future. And in general, I'd imagine most use cases would want to just exclude 1 or 2 additional fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised with feedback from alextwoods :)

module Aws
module Rails
module SqsActiveJob
extend ActiveSupport::Concern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of concern here. Are there any other SQS active job related examples in this gem we can clean up as such?

message_body: instance_of(String),
message_attributes: instance_of(Hash),
message_group_id: Aws::Rails::SqsActiveJob.config.message_group_id,
message_deduplication_id: instance_of(String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to assert the default and the override deduplication_id below to ensure that the configuration is working as expected?

Copy link
Contributor Author

@kyto64 kyto64 Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this and added some specs.

@mullermp
Copy link
Contributor

It looks like test failures are related to MiniTest somehow. The JRuby 9.4 failures appear to be either some deprecation or bug.

Copy link
Contributor

@alextwoods alextwoods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good and I think is a useful feature - thanks for submitting!


ignored_dedup_key = 'job_id'

if dedup_keys.include?(ignored_dedup_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to check for an invalid key upfront, when its set rather then every time its used (raise the error where its being made) - I think you could do that in the setter for deduplication_keys.

# client will be created if none is provided.
#
# @option options [Array] :deduplication_keys ([])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally lean towards an exclude list instead of include list. That way the default doesn't need to change if other fields are added in the future. And in general, I'd imagine most use cases would want to just exclude 1 or 2 additional fields?

…eys to adopt, one can choose which keys to exclude.

That way the default doesn't need to change if other fields are added in the future.
Copy link
Contributor

@mullermp mullermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just some minor comments and I'll take a look at fixing CI ..

# client will be created if none is provided.
#
# @option options [Array] :excluded_deduplication_keys ('job_id')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should be (['job_id']) since it's Array

message_body: instance_of(String),
message_attributes: instance_of(Hash),
message_group_id: Aws::Rails::SqsActiveJob.config.message_group_id,
message_deduplication_id: instance_of(String)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to strongly assert that message_deduplication_id does not contain job_class and queue_name here? Maybe an implementation detail but we could assert Digest::SHA256.hexdigest or Aws::Json.dump doesn't receive them.

@mullermp
Copy link
Contributor

The test failures in main should be fixed.

@mullermp
Copy link
Contributor

I pushed a change to main that configured and ran rubocop on everything, as well as bump the minimum rails version to 2.3. Please check to see if main merged in correctly and didn't stomp on any of your changes.

@mullermp
Copy link
Contributor

Please also fix your rubocop errors (new CI I've added)

Keito Kira added 2 commits September 28, 2023 08:38
Strongly assert that message_deduplication_id does not contain keys to exclude.
@kyto64
Copy link
Contributor Author

kyto64 commented Sep 28, 2023

To @mullermp @alextwoods

Confirmation about Rubocop:

> bundle e rubocop -E -S 
===>
test/active_job/queue_adapters/amazon_sqs_adapter_test.rb:7:3: C: Metrics/ModuleLength: Module has too many lines. [144/112]
  module QueueAdapters ...
  ^^^^^^^^^^^^^^^^^^^^
test/active_job/queue_adapters/amazon_sqs_adapter_test.rb:8:5: C: Metrics/BlockLength: Block has too many lines. [142/110]
    describe AmazonSqsAdapter do ...

Rubocop also limits the number of lines in the test files, but this may be an unnecessary calibration due to the functionality of RSpec. RSpec is written in a dedicated DSL, which inevitably increases the number of lines per block.

If you agree with this, I will add the following code to .rubocop.yml to avoid the warning:

# .rubocop.yml

Metrics/BlockLength:
  Exclude:
    - 'test/**/*.rb'

Metrics/ModuleLength:
  Exclude:
    - 'test/**/*.rb'

@kyto64
Copy link
Contributor Author

kyto64 commented Sep 28, 2023

Note that Rubocop CI still detects errors unless #102 (comment) is applied or fixed individually.

@mullermp
Copy link
Contributor

Yeah, it's fine to add to to-do or global ignore.

@kyto64
Copy link
Contributor Author

kyto64 commented Sep 28, 2023

#102 (comment)
Added it to global ignore :)

@mullermp mullermp merged commit 925fbb5 into aws:main Sep 28, 2023
35 checks passed
@mullermp
Copy link
Contributor

Thanks for your contribution! I will get this released tomorrow.

@mullermp
Copy link
Contributor

Actually, I was able to do this today, please check to see if it works for you!

@kyto64
Copy link
Contributor Author

kyto64 commented Sep 29, 2023

Thanks a ton!

@kyto64 kyto64 deleted the add-dedup-keys-customization branch September 29, 2023 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants