Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure v2_playbook_on_no_hosts_remaining fires #81550

Open
wants to merge 5 commits into
base: devel
Choose a base branch
from

Conversation

markgoddard
Copy link
Contributor

@markgoddard markgoddard commented Aug 21, 2023

There are various cases where the v2_playbook_on_no_hosts_remaining callback does not fire in both the free and linear strategy plugins. This PR fixes those issues, as well as removing a duplicate call to the same callback when using max_fail_percentage.

Fixes: #81549

@markgoddard markgoddard changed the title issues/81549 Ensure v2_playbook_on_no_hosts_remaining fires Aug 21, 2023
@ansibot ansibot added needs_triage Needs a first human triage before being processed. has_issue needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Aug 21, 2023
@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Aug 21, 2023
@markgoddard
Copy link
Contributor Author

Various tests failing with this, hard to tell if it's relevant:

TASK [delete temporary directory] **********************************************
05:17 task path: /root/ansible/test/results/.tmp/integration/fetch-ehwhn8l8-ÅÑŚÌβŁÈ/test/integration/targets/fetch/cleanup.yml:12
05:17 fatal: [testhost]: FAILED! => {
05:17     "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"
05:17 }

@s-hertel s-hertel removed the needs_triage Needs a first human triage before being processed. label Aug 22, 2023
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Sep 5, 2023
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Oct 26, 2023
@ansibot ansibot removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html ci_verified Changes made in this PR are causing tests to fail. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Nov 20, 2023
@markgoddard
Copy link
Contributor Author

Reproduced the test failure locally, should be able to work it out from here.

@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Nov 28, 2023
@ansibot ansibot removed the ci_verified Changes made in this PR are causing tests to fail. label Dec 7, 2023
@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Dec 7, 2023
@markgoddard markgoddard marked this pull request as draft December 8, 2023 12:15
@markgoddard
Copy link
Contributor Author

Problems with rescue, looking into an alternative solution.

@ansibot
Copy link
Contributor

ansibot commented Dec 11, 2023

The test ansible-test sanity --test shellcheck [explain] failed with 3 errors:

test/integration/targets/ansible-playbook-callbacks/runme.sh:5:8: SC2155: Declare and assign separately to avoid masking return values.
test/integration/targets/ansible-playbook-callbacks/runme.sh:6:8: SC2155: Declare and assign separately to avoid masking return values.
test/integration/targets/ansible-playbook-callbacks/runme.sh:11:11: SC2086: Double quote to prevent globbing and word splitting.

click here for bot help

@ansibot ansibot removed the ci_verified Changes made in this PR are causing tests to fail. label Dec 11, 2023
@markgoddard markgoddard force-pushed the issues/81549 branch 2 times, most recently from 25c8339 to 8c11b0b Compare December 11, 2023 17:05
@ansibot ansibot removed the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Dec 11, 2023
@markgoddard markgoddard marked this pull request as ready for review January 19, 2024 11:47
@markgoddard
Copy link
Contributor Author

Dependent change #82407 has now merged, so I've marked this as ready for review.

I've included a fix for the issue mentioned here by @mkrizek.

@ansibot ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jan 19, 2024
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Jan 26, 2024
@markgoddard
Copy link
Contributor Author

Updated after conflicting change was made in the callbacks tests.

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html labels Jan 26, 2024
@webknjaz
Copy link
Member

@markgoddard the CI issues appear to be related to the patch in this PR. The same error output is present in different test runtimes: https://dev.azure.com/ansible/ansible/_build/results?buildId=102426&view=logs&j=b8b182ae-7382-5e2d-7321-805eb8f377b5&t=61efecb5-7da7-54db-c13c-3d39685f67a7&l=5606.

@webknjaz webknjaz added the ci_verified Changes made in this PR are causing tests to fail. label Jan 29, 2024
@webknjaz webknjaz requested a review from mkrizek January 29, 2024 14:23
Previously, v2_playbook_on_no_hosts_remaining would only fire with the
linear strategy plugin if max_fail_percentage was used.  This is because
the 'result' in the linear strategy plugin's run() method was only
changed from RUN_OK in the code path for max_fail_percentage.

Prior to ansible#78680 it also fired
when any_errors_fatal was used, but it no longer does.

This change removes the 'result' variable since it is not really
used any more. The main loop in the run() method exits at the correct
points, and the StrategyModule.run() method on the base class correctly
selects a return code based on the failed and unreachable hosts.

The v2_playbook_on_no_hosts_remaining is now triggered outside of the
main loop in StrategyModule.run(), based on the state of hosts in the
play iterator.  This seems to be a better source of truth than the Task
Queue Manager's _failed_hosts list, which in some cases does not contain
all failed hosts (I noticed this in the integration tests, which set
ANSIBLE_FORCE_HANDLERS=true).

There was also an issue in which v2_playbook_on_no_hosts_remaining would
be triggered twice when using max_fail_percentage, and this has now been
fixed.

Fixes ansible#81549
Previously the v2_playbook_on_no_hosts_remaining callback did not fire
when using the free strategy plugin. This is because the loop never
satisfied the following condition when hosts fail or become unreachable:

  if len(hosts_left) == 0:

This change moves the callback trigger outside of the main loop and uses
the same condition as the linear strategy plugin to detect when there
are no hosts remaining.

Fixes ansible#81549
This change adds a new testcase to the ansible-playbook-callbacks
integration test that checks the callbacks fired when all hosts fail.

This provides a simple regression test for issue 81549.
@ansibot ansibot removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. ci_verified Changes made in this PR are causing tests to fail. labels Feb 1, 2024
@markgoddard
Copy link
Contributor Author

@markgoddard the CI issues appear to be related to the patch in this PR. The same error output is present in different test runtimes: https://dev.azure.com/ansible/ansible/_build/results?buildId=102426&view=logs&j=b8b182ae-7382-5e2d-7321-805eb8f377b5&t=61efecb5-7da7-54db-c13c-3d39685f67a7&l=5606.

Thanks for spotting that, it seems the integration tests are a bit of a moving target. I've fixed up the expected output in the callback_default tests and CI looks happy again.

@markgoddard
Copy link
Contributor Author

Hi @webknjaz, how is it looking now?

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Feb 14, 2024
markgoddard added a commit to stackhpc/ansible-collection-kolla that referenced this pull request Feb 21, 2024
This plugin can be used to collect statistics about an Ansible run and
write them to a file in JSON format. The statistics currently include
the count and names of failed and unreachable hosts.

They also include whether execution stopped due to a lack of remaining
hosts. This last signal can be helpful if trying to determine whether
Ansible completed successfully for some but not all hosts, e.g. when
attempting to handle unreachable hosts. Unfortunately the
v2_playbook_on_no_hosts_remaining callback is not reliably fired, as
outlined in [1]. A fix for this has been proposed in [2].

[1] ansible/ansible#81549
[2] ansible/ansible#81550

Change-Id: I6eedc23726f47e88b50efbaa80db33965aa58e1a
markgoddard added a commit to stackhpc/ansible-collection-kolla that referenced this pull request Feb 21, 2024
This plugin can be used to collect statistics about an Ansible run and
write them to a file in JSON format. The statistics currently include
the count and names of failed and unreachable hosts.

They also include whether execution stopped due to a lack of remaining
hosts. This last signal can be helpful if trying to determine whether
Ansible completed successfully for some but not all hosts, e.g. when
attempting to handle unreachable hosts. Unfortunately the
v2_playbook_on_no_hosts_remaining callback is not reliably fired, as
outlined in [1]. A fix for this has been proposed in [2].

[1] ansible/ansible#81549
[2] ansible/ansible#81550

Change-Id: I6eedc23726f47e88b50efbaa80db33965aa58e1a
markgoddard added a commit to stackhpc/ansible-collection-kolla that referenced this pull request Feb 22, 2024
This plugin can be used to collect statistics about an Ansible run and
write them to a file in JSON format. The statistics currently include
the count and names of failed and unreachable hosts.

They also include whether execution stopped due to a lack of remaining
hosts. This last signal can be helpful if trying to determine whether
Ansible completed successfully for some but not all hosts, e.g. when
attempting to handle unreachable hosts. Unfortunately the
v2_playbook_on_no_hosts_remaining callback is not reliably fired, as
outlined in [1]. A fix for this has been proposed in [2].

[1] ansible/ansible#81549
[2] ansible/ansible#81550

Change-Id: I6eedc23726f47e88b50efbaa80db33965aa58e1a
markgoddard added a commit to stackhpc/ansible-collection-kolla that referenced this pull request Mar 7, 2024
This plugin can be used to collect statistics about an Ansible run and
write them to a file in JSON format. The statistics currently include
the count and names of failed and unreachable hosts.

They also include whether execution stopped due to a lack of remaining
hosts. This last signal can be helpful if trying to determine whether
Ansible completed successfully for some but not all hosts, e.g. when
attempting to handle unreachable hosts. Unfortunately the
v2_playbook_on_no_hosts_remaining callback is not reliably fired, as
outlined in [1]. A fix for this has been proposed in [2].

Flake8 error E402 (module level import not at top of file) is ignored
because it is incompatible with Ansible sanity checks for plugins.

[1] ansible/ansible#81549
[2] ansible/ansible#81550

Change-Id: I6eedc23726f47e88b50efbaa80db33965aa58e1a
@markgoddard
Copy link
Contributor Author

Hi @webknjaz, @bcoca. What can I do to help move this forward? I realise you must be busy (I maintain some OpenStack projects, so I've been on both sides of reviews). I'd like to work more on ansible-core, but if I can't get reviews then it's less likely to happen.

@mkrizek
Copy link
Contributor

mkrizek commented Apr 11, 2024

Hi @webknjaz, @bcoca. What can I do to help move this forward?

Thank you for your contribution. Apologies this hasn't had much attention. For this particular change, during the triage meeting the core team agreed that this requires more discussion and research as to what was the original intent behind the v2_playbook_on_no_hosts_remaining callback and what it should be going forward. From the issue you identified it is not clear when the callback should be executed and how it should behave with other features especially within the linear strategy, hence the need for the research. The pull request is still open and we intend to review it time and other priorities permitting.

I'd like to work more on ansible-core, but if I can't get reviews then it's less likely to happen.

There are still a lot of open issues (and PRs) to work on. It's just some (like this one) require more time to review and when they do prioritisation gets in the way.

Thank you for understanding.

@markgoddard
Copy link
Contributor Author

Hi @webknjaz, @bcoca. What can I do to help move this forward?

Thank you for your contribution. Apologies this hasn't had much attention. For this particular change, during the triage meeting the core team agreed that this requires more discussion and research as to what was the original intent behind the v2_playbook_on_no_hosts_remaining callback and what it should be going forward. From the issue you identified it is not clear when the callback should be executed and how it should behave with other features especially within the linear strategy, hence the need for the research. The pull request is still open and we intend to review it time and other priorities permitting.

I'd like to work more on ansible-core, but if I can't get reviews then it's less likely to happen.

There are still a lot of open issues (and PRs) to work on. It's just some (like this one) require more time to review and when they do prioritisation gets in the way.

Thank you for understanding.

Thank you for responding. I'm glad to hear it's on the core team's radar. For context, I'd like to use the callback to determine whether ansible execution reached the end or was terminated prematurely. This would allow me to know when a run had some "acceptable" number of failed or unreachable hosts, and will potentially be used in combination with max_fail_percentage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has_issue stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v2_playbook_on_no_hosts_remaining doesn't fire without any_errors_fatal or max_fail_percentage
5 participants