perform_job.good_job notification for successful jobs has error_event: :discarded #1609

ellispritchard · 2025-02-21T18:45:01Z

I'm trying to add some custom metrics using the perform_job.good_job notification, but I'm finding that it's not obvious how to distinguish between successful completions and discards, and I wonder if this is an oversight?

The code in question is this:

good_job/app/models/good_job/job.rb

Lines 628 to 653 in b557525

    
           ActiveSupport::Notifications.instrument("perform_job.good_job", { job: self, execution: execution, process_id: current_thread.process_id, thread_name: current_thread.thread_name }) do |instrument_payload| 
        
             value = ActiveJob::Base.execute(active_job_data) 
        
             if value.is_a?(Exception) 
        
               handled_error = value 
        
               value = nil 
        
             end 
        
             handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard 
        
             error_event = if handled_error == current_thread.error_on_discard 
        
                             :discarded 
        
                           elsif handled_error == current_thread.error_on_retry 
        
                             :retried 
        
                           elsif handled_error == current_thread.error_on_retry_stopped 
        
                             :retry_stopped 
        
                           elsif handled_error 
        
                             :handled 
        
                           end 
        
             instrument_payload.merge!( 
        
               value: value, 
        
               handled_error: handled_error, 
        
               retried: current_thread.retried_job.present?, 
        
               error_event: error_event 
        
             ) 
        
             ExecutionResult.new(value: value, handled_error: handled_error, error_event: error_event, retried_job: current_thread.retried_job)

Specifically the issue is here (annotated ##):

            ##  If the job execution succeeds, `handled_error` is `nil`, since so is 
            ## `current_thread.error_on_retry ||  current_thread.error_on_discard`.
            handled_error ||= current_thread.error_on_retry || current_thread.error_on_discard
   
            error_event = if handled_error == current_thread.error_on_discard  ## nil == nil
                            :discarded  ##  error_event = :discarded
                          elsif handled_error == current_thread.error_on_retry
                            :retried
                          elsif handled_error == current_thread.error_on_retry_stopped
                            :retry_stopped
                          elsif handled_error
                            :handled
                          end

due to the logic here, error_event ends up :discarded even if there was no error, and this is what ends up on payload[:error_event]:

(byebug) event.payload
{:job=>#<GoodJob::Job id: "32da31bc-10e5-422f-9458-8ca96830baf6", queue_name: "default", priority: 0, serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", performed_at: "2025-02-21 17:57:22.821524000 +0000", finished_at: nil, error: nil, created_at: "2025-02-21 17:57:22.786207000 +0000", updated_at: "2025-02-21 17:57:22.830189000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", concurrency_key: nil, cron_key: nil, cron_at: nil, batch_id: nil, batch_callback_id: nil, executions_count: 1, job_class: "TestJob", error_event: nil, labels: nil, locked_by_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", locked_at: "2025-02-21 17:57:22.821696000 +0000">, :execution=>#<GoodJob::Execution id: "a568dbe9-a7a0-426f-991e-7952076362c2", created_at: "2025-02-21 17:57:22.821524000 +0000", updated_at: "2025-02-21 17:57:22.822639000 +0000", active_job_id: "32da31bc-10e5-422f-9458-8ca96830baf6", job_class: "TestJob", queue_name: "default", serialized_params: {"job_id"=>"32da31bc-10e5-422f-9458-8ca96830baf6", "locale"=>"en", "priority"=>nil, "timezone"=>"UTC", "arguments"=>[5], "job_class"=>"TestJob", "executions"=>0, "queue_name"=>"default", "enqueued_at"=>"2025-02-21T17:57:22Z", "provider_job_id"=>nil, "exception_executions"=>{}}, scheduled_at: "2025-02-21 17:57:22.786207000 +0000", finished_at: nil, error: nil, error_event: nil, error_backtrace: nil, process_id: "b49f31a7-9a91-4642-a2c1-ef6db94f3305", duration: nil>, :process_id=>32948, :thread_name=>"GoodJob::Scheduler(queues=* max_threads=5)-thread-1", :value=>nil, :handled_error=>nil, :retried=>false, :error_event=>:discarded}

As a work-around you can instead check that handled_error (and unhandled_error?) are nil (which is what the following code in fact does) but this is not super intuitive.

Calling the helper methods on payload[:job] or payload[:execution] doesn't help either, since the fields haven't been updated yet.

Have you a recommendation?

The text was updated successfully, but these errors were encountered:

bensheldon · 2025-03-10T01:17:01Z

Oops! Sorry, I can't believe I overlooked your issue.

Yes, I don't think that logic is correct when handled_error is nil 🤦🏻 I think there's 2 things that would make this better:

The payload should simply have an error key (though I prob can't remove the other values for breakage reasons)
There should be a nil check on handled_error

I will fix that real quick! Sorry again about the delay.

bensheldon · 2025-03-10T03:19:51Z

Released! https://github.com/bensheldon/good_job/releases/tag/v4.9.3

Thanks again for opening this issue!

ellispritchard · 2025-03-10T15:54:22Z

Thanks @bensheldon - that's easier to work with! 🙇

github-project-automation bot added this to GoodJob Backlog v2 Feb 21, 2025

github-project-automation bot moved this to Inbox in GoodJob Backlog v2 Feb 21, 2025

bensheldon mentioned this issue Mar 10, 2025

Fix instrumentation of "perform_job.good_job" event #1616

Merged

bensheldon closed this as completed in #1616 Mar 10, 2025

github-project-automation bot moved this from Inbox to Done in GoodJob Backlog v2 Mar 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

ellispritchard commented Feb 21, 2025

bensheldon commented Mar 10, 2025

bensheldon commented Mar 10, 2025

ellispritchard commented Mar 10, 2025

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

perform_job.good_job notification for successful jobs has error_event: :discarded #1609

Comments

ellispritchard commented Feb 21, 2025

bensheldon commented Mar 10, 2025

bensheldon commented Mar 10, 2025

ellispritchard commented Mar 10, 2025