Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be quieter handling CpsFlowExecution.owner == null in suspendAll #788

Merged
merged 1 commit into from
Oct 4, 2023

Conversation

jglick
Copy link
Member

@jglick jglick commented Sep 11, 2023

See #669 (comment). I did in fact observe

WARNING	o.j.p.w.cps.CpsFlowExecution#suspendAll: Error persisting Pipeline execution at shutdown: null
java.lang.NullPointerException
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:1674)
	at …

@jglick jglick requested a review from a team as a code owner September 11, 2023 23:07
@jglick jglick added the bug label Sep 11, 2023
@@ -1638,25 +1638,23 @@ public void pause(final boolean v) throws IOException {
@Restricted(DoNotUse.class)
@Terminator(attains = FlowExecutionList.EXECUTIONS_SUSPENDED)
public static void suspendAll() {
CpsFlowExecution exec = null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outermost catch clause seemed redundant. (Timeout.close does not throw exceptions, so it was not for that.)

Comment on lines -1645 to +1651
try {
if (execution instanceof CpsFlowExecution) {
CpsFlowExecution cpsExec = (CpsFlowExecution)execution;
if (execution instanceof CpsFlowExecution) {
CpsFlowExecution cpsExec = (CpsFlowExecution) execution;
try {
cpsExec.checkAndAbortNonresumableBuild();

LOGGER.log(Level.FINE, "waiting to suspend {0}", execution);
exec = (CpsFlowExecution) execution;
// Like waitForSuspension but with a timeout:
if (exec.programPromise != null) {
if (cpsExec.programPromise != null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplifying

@@ -1671,15 +1669,15 @@ public static void suspendAll() {
}
});
}
cpsExec.getOwner().getListener().getLogger().close();
if (cpsExec.owner != null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how this would be reachable? Corruption during resumption or something? Perhaps jenkinsci/workflow-api-plugin#304 has made it possible? If I understand right, in this case FlowExecutionList.runningTasks contains a FlowExecutionOwner which successfully returns a FlowExecution from .get(), but for which FlowExecution.owner on the returned object is null, which seems problematic and unexpected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid I do not know—just saw this in a log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose [Workflow]Run.reload deserializes execution, but then WorkflowRun.onLoad fails before calling getExecution, or getExecution fails before calling fetchedExecution.onLoad(new Owner(this)), or unmarshal fails partway through, etc. Presumably the build was badly corrupted somehow. The point of this PR is just to avoid unnecessary stack traces after that.

Copy link
Member

@dwnusbaum dwnusbaum Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth though I saw another case of a CpsFlowExecution with a null owner recently, which is why I am wondering if something has changed things:

java.lang.IllegalStateException: List of flow heads unset for CpsFlowExecution[null]
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentHeads(CpsFlowExecution.java:1018)
        ... insignificant, just a user loading some build page that accessed the heads for an execution ...

I looked at the XML for that execution on disk (based on the thread name) and head was non-null and pointed to a FlowEndNode that did exist in workflow/, so it wasn't obvious what might have gone wrong.

@@ -1671,15 +1669,15 @@ public static void suspendAll() {
}
});
}
cpsExec.getOwner().getListener().getLogger().close();
if (cpsExec.owner != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how this would be reachable? Corruption during resumption or something? Perhaps jenkinsci/workflow-api-plugin#304 has made it possible? If I understand right, in this case FlowExecutionList.runningTasks contains a FlowExecutionOwner which successfully returns a FlowExecution from .get(), but for which FlowExecution.owner on the returned object is null, which seems problematic and unexpected.

Copy link
Member

@jtnord jtnord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jglick (how) does this relate to #797 ?

@jglick
Copy link
Member Author

jglick commented Oct 4, 2023

@jtnord I believe it is unrelated.

@jglick jglick requested a review from dwnusbaum October 4, 2023 16:10
@jglick jglick merged commit c8575e2 into jenkinsci:master Oct 4, 2023
13 checks passed
@jglick jglick deleted the CpsFlowExecution.owner branch October 4, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants