
[JENKINS-50407] Better diagnosis for certain fatal loading errors #809

Merged (4 commits) on Oct 23, 2023


@jglick (Member) commented Oct 23, 2023

Continuing #215.

Without the main patch, you get the reported error in the system log:

WARNING	o.j.p.w.cps.CpsVmExecutorService#reportProblem: Unexpected exception in CPS VM thread: CpsFlowExecution[Owner[p/1:p #1]]
java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecution[Owner[p/1:p #1]]
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:35)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
	at …

as well as in the build log:

Resuming build at Mon Oct 23 16:48:23 EDT 2023 after Jenkins restart
[Pipeline] End of Pipeline
Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: ce42faee-d9c6-4e53-9448-32499c419cf5
java.lang.IllegalStateException: JENKINS-50407: no loaded shell in CpsFlowExecution[Owner[p/1:p #1]]
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:35)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
	at …
Finished: FAILURE

but that is it; the original exception is swallowed.

(I have no idea what could be causing the actual problems in the field. It is unlikely to be an error from a shell decorator as in this test; that is just a convenient way I found to simulate a load failure and trigger the condition where CpsFlowExecution.shell is null.)
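
The swallowed-exception problem can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `ProgramHolder`, `loadFailure`, `load` and `requireShell` are illustrative names, not the actual CpsFlowExecution fields or the code in this PR. The point is only that if the error thrown while building the shell is recorded, the later "no loaded shell" IllegalStateException can carry it as its cause instead of losing it.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an execution whose Groovy shell may fail to load. */
public class ProgramHolder {
    private Object shell;          // stays null if loading failed
    private Throwable loadFailure; // hypothetical field: remembers why loading failed

    /** Attempts to build the shell, recording any failure instead of dropping it. */
    void load(Supplier<Object> shellFactory) {
        try {
            shell = shellFactory.get();
        } catch (RuntimeException x) {
            loadFailure = x; // keep the root cause for later diagnosis
        }
    }

    /** Code that later needs the shell fails with the original error attached as its cause. */
    Object requireShell() {
        if (shell == null) {
            throw new IllegalStateException("JENKINS-50407: no loaded shell", loadFailure);
        }
        return shell;
    }

    public static void main(String[] args) {
        ProgramHolder h = new ProgramHolder();
        // Simulate a broken decorator, as in the test described above.
        h.load(() -> { throw new IllegalStateException("decorator problem here"); });
        try {
            h.requireShell();
        } catch (IllegalStateException x) {
            // The printed trace now ends with a "Caused by:" section naming the decorator error.
            x.printStackTrace();
        }
    }
}
```

The design choice is simply Java's standard exception chaining: passing the recorded throwable to the `(String, Throwable)` constructor costs nothing when loading succeeded (the cause is then `null`) and preserves the root error when it did not.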

Sprinkling in some Thread.dumpStack calls here and there shows that, while a root error might come from

java.lang.IllegalStateException: decorator problem here
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecutionTest$BrokenDecorator.configureShell(CpsFlowExecutionTest.java:629)
	at org.jenkinsci.plugins.workflow.cps.CpsGroovyShellFactory.build(CpsGroovyShellFactory.java:125)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:579)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.loadProgramAsync(CpsFlowExecution.java:801)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:771)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:720)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:578)
	at …

the improper use of the CPS VM thread comes a bit later:

java.lang.Exception: Stack trace
	at java.base/java.lang.Thread.dumpStack(Thread.java:1383)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:959)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5.onSuccess(CpsFlowExecution.java:954)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures$1.run(Futures.java:147)
	at org.jenkinsci.plugins.workflow.support.concurrent.DirectExecutor.execute(DirectExecutor.java:33)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
	at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:761)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:136)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:157)
	at org.jenkinsci.plugins.workflow.support.concurrent.Futures.addCallback(Futures.java:97)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.runInCpsVmThread(CpsFlowExecution.java:954)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentExecutions(CpsFlowExecution.java:1047)
	at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ResumeStepExecutionListener.onResumed(FlowExecutionList.java:284)
	at org.jenkinsci.plugins.workflow.flow.FlowExecutionListener.fireResumed(FlowExecutionListener.java:85)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:590)
	at hudson.model.RunMap.retrieve(RunMap.java:233)
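
In the second trace, `onSuccess` runs inside `runInCpsVmThread` itself, on the thread that is calling `getCurrentExecutions` from `WorkflowRun.onLoad`. That is what happens when a callback is added through a direct executor to a future that has already completed: the callback runs immediately on the calling thread. The sketch below reproduces that behavior with the JDK's `CompletableFuture` and a `Runnable::run` executor standing in for the Guava `AbstractFuture` and plugin `DirectExecutor` seen in the trace; it is an analogy under that assumption, not the plugin's code. (The `java.lang.Exception: Stack trace` header above is simply what `Thread.dumpStack()` prints.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Illustrates why a callback on an already-completed future runs on the caller's own thread. */
public class DirectCallbackDemo {
    /** A "direct executor": runs each task immediately on the submitting thread. */
    static final Executor DIRECT = Runnable::run;

    /** Registers a callback and returns what each step observed, in order. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        // Already completed, like a promise that was resolved before anyone registered a callback.
        CompletableFuture<String> promise = CompletableFuture.completedFuture("program");
        events.add("before callback on " + Thread.currentThread().getName());
        promise.thenAcceptAsync(
                value -> events.add("callback on " + Thread.currentThread().getName()),
                DIRECT);
        events.add("after callback on " + Thread.currentThread().getName());
        return events;
    }

    public static void main(String[] args) {
        // All three entries name the same thread, and the callback entry is in the middle:
        // it ran synchronously inside thenAcceptAsync, much as onSuccess ran inside addCallback.
        run().forEach(System.out::println);
    }
}
```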

I am not sure what purpose the complex logic in loadProgramFailed serves; if it ever worked as written, it no longer seems to (as shown by its failing attempt to run SandboxContinuable.run0). The much simpler implementation in the second commit also passes the test, with less noise.
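
For contrast, here is a hedged sketch of the general fail-fast idea: if a load failure is delivered through the failure path of the promise, any callback registered later receives the original exception, rather than a "success" followed by a secondary IllegalStateException. This is not the code in the second commit, which this page does not show; `FailFastLoadDemo`, `loadProgram` and `describe` are hypothetical names, and `CompletableFuture` again stands in for the Guava futures used by the plugin.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical sketch: propagate a load failure through the promise instead of swallowing it. */
public class FailFastLoadDemo {
    /** Simulates loading a program; a failure completes the promise exceptionally with the root cause. */
    static CompletableFuture<String> loadProgram(boolean broken) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        try {
            if (broken) {
                throw new IllegalStateException("decorator problem here");
            }
            promise.complete("program");
        } catch (RuntimeException x) {
            promise.completeExceptionally(x); // the root cause travels with the promise
        }
        return promise;
    }

    /** A later caller sees either the loaded program or the original error message. */
    static String describe(CompletableFuture<String> promise) {
        return promise
                .thenApply(program -> "loaded " + program)
                .exceptionally(t -> {
                    // Dependent stages wrap the original error in a CompletionException; unwrap it.
                    Throwable root = (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
                    return "failed: " + root.getMessage();
                })
                .join();
    }

    public static void main(String[] args) {
        System.out.println(describe(loadProgram(false))); // loaded program
        System.out.println(describe(loadProgram(true)));  // failed: decorator problem here
    }
}
```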

@jglick requested a review from a team as a code owner October 23, 2023 21:15
@jglick added the bug label Oct 23, 2023
    FlowHead head;

    synchronized(this) {
        if (heads == null || heads.isEmpty()) {
A reviewer (Member) commented:

I do wonder whether these cases were significant, and whether ignoring them could cause problems in onProgramEnd in croak. Given that they apparently have no specific test coverage, though, it does seem simplest to just ignore them for now.

@jglick (Member, Author) replied:

Maybe? I am not sure what purpose is served by making a fake head. AFAICT the head calculation here existed to attach the "Failed to load build state" error, but that does not appear to work anyway.

jglick and others added 2 commits October 23, 2023 18:02
Co-authored-by: Devin Nusbaum <dwnusbaum@users.noreply.github.com>
@jglick enabled auto-merge (squash) October 23, 2023 22:05
@jglick merged commit 769bb74 into jenkinsci:master Oct 23, 2023
14 checks passed
@jglick deleted the diag-JENKINS-50407 branch October 24, 2023 10:47
2 participants