Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

Closed
ellispritchard opened this issue Feb 21, 2025 · 3 comments · Fixed by #1616
Closed

Comments

@ellispritchard
Copy link

I'm trying to add some custom metrics using the perform_job.good_job notification, but I'm finding that it's not obvious how to distinguish between successful completions and discards, and I wonder if this is an oversight?

The code in question is this:

ActiveSupport::Notifications.instrument("perform_job.good_job", { job: self, execution: execution, process_id: current_thread.process_id, thread_name: current_thread.thread_name }) do |instrument_payload|
value = ActiveJob::Base.execute(active_job_data)
if value.is_a?(Exception)
handled_error = value
value = nil
end
handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard
error_event = if handled_error == current_thread.error_on_discard
:discarded
elsif handled_error == current_thread.error_on_retry
:retried
elsif handled_error == current_thread.error_on_retry_stopped
:retry_stopped
elsif handled_error
:handled
end
instrument_payload.merge!(
value: value,
handled_error: handled_error,
retried: current_thread.retried_job.present?,
error_event: error_event
)
ExecutionResult.new(value: value, handled_error: handled_error, error_event: error_event, retried_job: current_thread.retried_job)

Specifically the issue is here (annotated ##):

            ##  If the job execution succeeds, `handled_error` is `nil`, since so is 
            ## `current_thread.error_on_retry ||  current_thread.error_on_discard`.
            handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard
   
            error_event = if handled_error == current_thread.error_on_discard  ## nil == nil
                            :discarded  ##  error_event = :discarded
                          elsif handled_error == current_thread.error_on_retry
                            :retried
                          elsif handled_error == current_thread.error_on_retry_stopped
                            :retry_stopped
                          elsif handled_error
                            :handled
                          end

due to the logic here, error_event ends up :discarded even if there was no error, and this is what ends up on payload[:error_event]:

(byebug) event.payload
{:job=>#<GoodJob::Job id: "32da31bc-10e5-422f-9458-8ca96830baf6", queue_name: "default", priority: 0, serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", performed_at: "2025-02-21 17:57:22.821524000 +0000", finished_at: nil, error: nil, created_at: "2025-02-21 17:57:22.786207000 +0000", updated_at: "2025-02-21 17:57:22.830189000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", concurrency_key: nil, cron_key: nil, cron_at: nil, batch_id: nil, batch_callback_id: nil, executions_count: 1, job_class: "TestJob", error_event: nil, labels: nil, locked_by_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", locked_at: "2025-02-21 17:57:22.821696000 +0000">, :execution=>#<GoodJob::Execution id: "a568dbe9-a7a0-426f-991e-7952076362c2", created_at: "2025-02-21 17:57:22.821524000 +0000", updated_at: "2025-02-21 17:57:22.822639000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", job_class: "TestJob", queue_name: "default", serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", finished_at: nil, error: nil, error_event: nil, error_backtrace: nil, process_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", duration: nil>, :process_id=>32948, :thread_name=>"GoodJob::Scheduler(queues=* max_threads=5)-thread-1", :value=>nil, :handled_error=>nil, :retried=>false, :error_event=>:discarded}

As a work-around you can instead check that handled_error (and unhandled_error?) are nil (which is what the following code in fact does) but this is not super intuitive.

Calling the helper methods on payload[:job] or payload[:execution] doesn't help either, since the fields haven't been updated yet.

Have you a recommendation?

@bensheldon
Copy link
Owner

Oops! Sorry, I can't believe I overlooked your issue.

Yes, I don't think that logic is correct when handled_error is nil 🤦🏻 I think there's 2 things that would make this better:

  • The payload should simply have an error key (though I prob can't remove the other values for breakage reasons)
  • There should be a nil check on handled_error

I will fix that real quick! Sorry again about the delay.

@bensheldon
Copy link
Owner

Released! https://github.com/bensheldon/good_job/releases/tag/v4.9.3

Thanks again for opening this issue!

@ellispritchard
Copy link
Author

Thanks @bensheldon - that's easier to work with! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging a pull request may close this issue.

2 participants