Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change fatal to panic. #12931

Merged
merged 9 commits into from
Apr 12, 2024
Merged

fix: change fatal to panic. #12931

merged 9 commits into from
Apr 12, 2024

Conversation

shuangkun
Copy link
Member

comments here: #12817 (comment)

Motivation

Modifications

Verification

Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun shuangkun requested a review from agilgur5 April 12, 2024 06:18
@shuangkun
Copy link
Member Author

shuangkun commented Apr 12, 2024

@agilgur5 could you help have a look for this, thanks!

Copy link
Member

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small changes -- we can use log.Panic actually

I also renamed this to a fix since Fatal / os.Exit is normally not recoverable; so these cases probably should use panic

workflow/progress/updater.go Outdated Show resolved Hide resolved
workflow/util/util.go Outdated Show resolved Hide resolved
workflow/util/util.go Outdated Show resolved Hide resolved
workflow/util/util.go Outdated Show resolved Hide resolved
workflow/util/util.go Outdated Show resolved Hide resolved
@agilgur5 agilgur5 changed the title chore: change fatal to warnf. fix: change fatal to panic. Apr 12, 2024
@agilgur5 agilgur5 self-assigned this Apr 12, 2024
@agilgur5 agilgur5 added the area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries label Apr 12, 2024
Signed-off-by: shuangkun <tsk2013uestc@163.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
Copy link
Member

@tczhao tczhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up for a discussion,

This util FormulateResubmitWorkflow function
is only run by the argo servers,
I'm thinking maybe we should remove and replace all panic/fatal in this function with error/warn
so the server keeps alive and returns the appropriate status code,err

Signed-off-by: shuangkun <tsk2013uestc@163.com>
workflow/controller/operator.go Outdated Show resolved Hide resolved
workflow/controller/operator.go Outdated Show resolved Hide resolved
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@agilgur5
Copy link
Member

agilgur5 commented Apr 12, 2024

is only run by the argo servers,
I'm thinking maybe we should remove and replace all panic/fatal in this function with error/warn
so the server keeps alive and returns the appropriate status code,err

Yes, that's why I suggested the panic vs. fatal. Panics are recoverable and we have a few recover statements and functions.

Regarding the status code, these would generally all be 500s; they're generally indicative of a bug. I would think a panic would end up returning a 500, but I'm not sure

Signed-off-by: shuangkun <tsk2013uestc@163.com>
@shuangkun
Copy link
Member Author

Up for a discussion,

This util FormulateResubmitWorkflow function is only run by the argo servers, I'm thinking maybe we should remove and replace all panic/fatal in this function with error/warn so the server keeps alive and returns the appropriate status code,err

Ok, changed fatal to error in FormulateResubmitWorkflow

workflow/util/util.go Outdated Show resolved Hide resolved
Signed-off-by: shuangkun <tsk2013uestc@163.com>
@tczhao
Copy link
Member

tczhao commented Apr 12, 2024

Yes, that's why I suggested the panic vs. fatal. Panics are recoverable and we have a few recover statements and functions.

right, found recover https://github.com/argoproj/argo-workflows/blob/main/util/grpc/interceptor.go#L21

Copy link
Member

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing these cases

@agilgur5 agilgur5 enabled auto-merge (squash) April 12, 2024 07:40
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Apr 12, 2024
@agilgur5 agilgur5 merged commit ec5b5d5 into argoproj:main Apr 12, 2024
27 checks passed
@agilgur5
Copy link
Member

agilgur5 commented Apr 12, 2024

@tczhao for reference all these node not found errors were added in #11451 as there were some actual bugs where nodes didn't exist but silently returned fine (because of how maps work in Go).

in the cases where we know for sure these errors will never happen, we should ignore the errors and not handle them at all, similar to what I did in #12917 .

the problem is that we don't know all the cases 😕
eventually these should all really be removed or only exist in a few places where, for instance, the status got corrupted and is not actually in the Controller's ability to fix, like if a user literally re-wrote the status with kubectl (or some other wacky race condition)

cc @isubasinghe for reference who did the most work on this (and I may have the second most context on it nowadays?)

@tczhao
Copy link
Member

tczhao commented Apr 12, 2024

@agilgur5 Got it, thank you for the context

@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 19, 2024
@agilgur5
Copy link
Member

agilgur5 commented Apr 19, 2024

Backported cleanly to release-3.5 as 9c2581a

agilgur5 pushed a commit that referenced this pull request Apr 19, 2024
Co-authored-by: agilgur5 <agilgur5@gmail.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
(cherry picked from commit ec5b5d5)
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 6, 2024
Co-authored-by: agilgur5 <agilgur5@gmail.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 7, 2024
Co-authored-by: agilgur5 <agilgur5@gmail.com>
Signed-off-by: shuangkun <tsk2013uestc@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants