Fix issues with ansible-playbook-callbacks test #82407

markgoddard · 2023-12-12T12:57:55Z

SUMMARY

This PR fixes various issues with the ansible-playbook-callbacks integration test.

ISSUE TYPE

Test Pull Request

ADDITIONAL INFORMATION

The timing of the async tasks was a little unpredictable, meaning that sometimes we would get an unexpected number of v2_runner_on_async_poll callbacks, and fail the test. This change fixes the issue by increasing the poll interval to 2 seconds and the sleep duration to 3 seconds, such that on a reasonably responsive system we will poll twice per task, with the sleep ending in the middle of the two polls.

The include_me.yml file does not exist in this integration test. It has been added.

The remote_tmp_dir.path expression is invalid - the setup_remote_tmp_dir role uses set_fact to set remote_tmp_dir to remote_tmp_dir.path.

The integration tests run with ANSIBLE_HOST_PATTERN_MISMATCH=error, meaning that the final play was never reached. Set ANSIBLE_HOST_PATTERN_MISMATCH=warning to continue past the play and trigger the v2_playbook_on_no_hosts_matched callback.

mkrizek

Thanks for identifying these issues.

In addition to the inline comments the playbook also needs to run with force_handlers turned off as it is on by default for ansible-test so v2_playbook_on_no_hosts_remaining is called as intended. (Un)setting ANSIBLE_FORCE_HANDLERS temporarily for the playbook should be all that is needed.

test/integration/targets/ansible-playbook-callbacks/all-callbacks.yml

mkrizek · 2023-12-13T09:47:27Z

test/integration/targets/ansible-playbook-callbacks/all-callbacks.yml

@@ -96,26 +96,22 @@
      ignore_errors: true

    - name: async poll ok
-      command: sleep 2


I don't think this is needed as I have yet to see this being an issue in our CI. Generally we want the sleep to be as short as possible to reduce tests running time.

I definitely hit the issue locally, and I saw it in CI once too, but that may have been with an earlier iteration of the values. Either way, it should be possible to run the integration tests on a laptop reliably.

I understand the concern around test runtime, and set it up to ensure that only one poll callback is sent per task.

The problem is that the sleep duration is a multiple of the poll interval, so the number of polls can vary. I chose these numbers to make the sleep finish in the middle of the 1st and 2nd poll.

I'll try halving all of the numbers, which should reduce the task duration to 2s, as it was before.

I'm testing this locally like this:

for i in $(seq 1 10); do if ! ansible-test integration ansible-playbook-callbacks; then echo "Failed on $i"; break; fi; done

With the original numbers (sleep 2 poll 1) and it failed on the 2nd attempt. Tried again and it failed on the 5th attempt.

I tried it twice with the change proposed here (sleep 3 poll 2) and it did not fail.

I tried it with the numbers halved (sleep 1.5 poll 1) and it failed on the second attempt.

I tried a few other combinations, but couldn't find something that worked well (poll must be an integer).

So I'm inclined to keep the numbers proposed here - a slow test is better than an unreliable one.

mkrizek · 2023-12-13T09:47:34Z

test/integration/targets/ansible-playbook-callbacks/all-callbacks.yml


    - name: async poll failed
-      shell: sleep 2; false


Same as above.

markgoddard

Thanks for identifying these issues.

In addition to the inline comments the playbook also needs to run with force_handlers turned off as it is on by default for ansible-test so v2_playbook_on_no_hosts_remaining is called as intended. (Un)setting ANSIBLE_FORCE_HANDLERS temporarily for the playbook should be all that is needed.

Thank you for the review. This PR was a spin-off from #81550 in which I'm doing some work around v2_playbook_on_no_hosts_remaining and indeed noticed a change in behaviour with force handlers. Due to the bug being fixed there (#81549), that callback does not fire unless max_fail_percentage is set (which it is in the final play).

I'm inclined to keep force handlers in place, to ensure that the fix for #81549 works in that case. The current proposed solution does, but originally it did not, and this integration test caught that.

markgoddard · 2023-12-13T14:05:31Z

test/integration/targets/ansible-playbook-callbacks/all-callbacks.yml

@@ -96,26 +96,22 @@
      ignore_errors: true

    - name: async poll ok
-      command: sleep 2


I definitely hit the issue locally, and I saw it in CI once too, but that may have been with an earlier iteration of the values. Either way, it should be possible to run the integration tests on a laptop reliably.

I understand the concern around test runtime, and set it up to ensure that only one poll callback is sent per task.

The problem is that the sleep duration is a multiple of the poll interval, so the number of polls can vary. I chose these numbers to make the sleep finish in the middle of the 1st and 2nd poll.

I'll try halving all of the numbers, which should reduce the task duration to 2s, as it was before.

test/integration/targets/ansible-playbook-callbacks/all-callbacks.yml

The timing of the async tasks was a little unpredictable, meaning that sometimes we would get an unexpected number of v2_runner_on_async_poll callbacks, and fail the test. This change fixes the issue by increasing the poll interval to 2 seconds and the sleep duration to 3 seconds, such that on a reasonably responsive system we will poll twice per task, with the sleep ending in the middle of the two polls. The include_me.yml file does not exist in this integration test. It has been added. The remote_tmp_dir.path expression is invalid - the setup_remote_tmp_dir role uses set_fact to set remote_tmp_dir to remote_tmp_dir.path. The integration tests run with ANSIBLE_HOST_PATTERN_MISMATCH=error, meaning that the final play was never reached. Set ANSIBLE_HOST_PATTERN_MISMATCH=warning to continue past the play and trigger the v2_playbook_on_no_hosts_matched callback.

markgoddard · 2023-12-19T11:00:55Z

Hi @mkrizek, how is it looking now?

webknjaz · 2024-01-15T20:00:04Z

@mkrizek could you take a look? It seems like this may fix flaky failures in our own CI that we observe quite regularly.

mkrizek · 2024-01-16T13:10:15Z

test/integration/targets/ansible-playbook-callbacks/runme.sh

@@ -5,6 +5,7 @@ set -eux
 export ANSIBLE_CALLBACK_PLUGINS=../support-callback_plugins/callback_plugins
 export ANSIBLE_ROLES_PATH=../
 export ANSIBLE_STDOUT_CALLBACK=callback_debug
+export ANSIBLE_HOST_PATTERN_MISMATCH=warning


I still think this should be set for the all-callbacks.yml playbook only as opposed to the whole script in case more tests are added in the future but not a blocker.

mkrizek

@webknjaz I was not able to reproduce the failures described in https://github.com/ansible/ansible/pull/82407/files#r1425469321 locally so I cannot verify the changes being correct but feel free to merge to see if it fixes the flaky tests you mentioned.

@markgoddard Thank you for the PR.

The difference in callbacks being run regarding on force_handlers being un/set can be worked on in a separate PR.

markgoddard · 2024-01-16T13:57:36Z

Thank you for the review @mkrizek. Ansibot has added the needs_revision label. Is it because the branch is old? Does it need a merge/rebase?

webknjaz · 2024-01-16T15:12:41Z

I cannot verify the changes being correct but feel free to merge to see if it fixes the flaky tests you mentioned.

I was hoping somebody with a better understanding of the consequences could approve first. The tests are flaky, but the failures happen once in a few days. To check this experimentally, we'd need to watch the CI nightlies post-merge for weeks, before declaring the fix working. I didn't want to experiment like that.

mattclay · 2024-01-16T23:04:49Z

/azp run

azure-pipelines · 2024-01-16T23:05:00Z

Azure Pipelines successfully started running 1 pipeline(s).

markgoddard · 2024-01-19T11:56:30Z

Thank you for approving. This has unlocked #81550.

webknjaz · 2024-01-29T14:20:56Z

FTR I think I haven't seen those failures since the merge. So I think it'd be useful to have this on other branches too (as the issues weren't unique to devel).

@markgoddard could you send us backport PRs against the stable versions (2.16, 2.15, 2.14 and 2.12) per https://docs.ansible.com/ansible/latest/community/development_process.html#backporting-merged-prs-in-ansible-core?

The timing of the async tasks was a little unpredictable, meaning that sometimes we would get an unexpected number of v2_runner_on_async_poll callbacks, and fail the test. This change fixes the issue by increasing the poll interval to 2 seconds and the sleep duration to 3 seconds, such that on a reasonably responsive system we will poll twice per task, with the sleep ending in the middle of the two polls. The include_me.yml file does not exist in this integration test. It has been added. The remote_tmp_dir.path expression is invalid - the setup_remote_tmp_dir role uses set_fact to set remote_tmp_dir to remote_tmp_dir.path. The integration tests run with ANSIBLE_HOST_PATTERN_MISMATCH=error, meaning that the final play was never reached. Set ANSIBLE_HOST_PATTERN_MISMATCH=warning to continue past the play and trigger the v2_playbook_on_no_hosts_matched callback. (cherry picked from commit 4a2de76)

markgoddard · 2024-01-31T12:11:18Z

FTR I think I haven't seen those failures since the merge. So I think it'd be useful to have this on other branches too (as the issues weren't unique to devel).

@markgoddard could you send us backport PRs against the stable versions (2.16, 2.15, 2.14 and 2.12) per https://docs.ansible.com/ansible/latest/community/development_process.html#backporting-merged-prs-in-ansible-core?

I've proposed backports to 2.16 and 2.15, but the callbacks integration test doesn't exist in 2.14 and earlier branches.

#82630
#82631

ansibot added test This PR relates to tests. needs_triage Needs a first human triage before being processed. labels Dec 12, 2023

This comment was marked as resolved.

Sign in to view

webknjaz requested a review from mkrizek December 12, 2023 13:37

markgoddard mentioned this pull request Dec 12, 2023

Ensure v2_playbook_on_no_hosts_remaining fires #81550

Open

bcoca removed the needs_triage Needs a first human triage before being processed. label Dec 12, 2023

bcoca requested a review from mattclay December 12, 2023 15:38

mkrizek requested changes Dec 13, 2023

View reviewed changes

ansibot added the needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. label Dec 13, 2023

markgoddard commented Dec 13, 2023

View reviewed changes

markgoddard force-pushed the fix-ansible-playbook-callbacks branch from 8c5bd40 to 9642495 Compare December 13, 2023 14:54

ansibot added stale_review Updates were made after the last review and the last review is more than 7 days old. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels Dec 13, 2023

ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jan 2, 2024

markgoddard requested a review from mkrizek January 5, 2024 09:57

mkrizek reviewed Jan 16, 2024

View reviewed changes

mkrizek approved these changes Jan 16, 2024

View reviewed changes

ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed stale_review Updates were made after the last review and the last review is more than 7 days old. labels Jan 16, 2024

ansibot removed the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jan 16, 2024

mattclay approved these changes Jan 16, 2024

View reviewed changes

mkrizek merged commit 4a2de76 into ansible:devel Jan 17, 2024
62 checks passed

markgoddard deleted the fix-ansible-playbook-callbacks branch January 19, 2024 11:30

ansible locked and limited conversation to collaborators Feb 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix issues with ansible-playbook-callbacks test #82407

Fix issues with ansible-playbook-callbacks test #82407

markgoddard commented Dec 12, 2023 •

edited

This comment was marked as resolved.

This comment was marked as resolved.

mkrizek left a comment

mkrizek Dec 13, 2023

markgoddard Dec 13, 2023

markgoddard Dec 13, 2023

mkrizek Dec 13, 2023

markgoddard left a comment

markgoddard Dec 13, 2023

markgoddard commented Dec 19, 2023

webknjaz commented Jan 15, 2024

mkrizek Jan 16, 2024

mkrizek left a comment

markgoddard commented Jan 16, 2024

webknjaz commented Jan 16, 2024

mattclay commented Jan 16, 2024

azure-pipelines bot commented Jan 16, 2024

markgoddard commented Jan 19, 2024

webknjaz commented Jan 29, 2024

markgoddard commented Jan 31, 2024

Fix issues with ansible-playbook-callbacks test #82407

Fix issues with ansible-playbook-callbacks test #82407

Conversation

markgoddard commented Dec 12, 2023 • edited

SUMMARY

ISSUE TYPE

ADDITIONAL INFORMATION

This comment was marked as resolved.

This comment was marked as resolved.

mkrizek left a comment

Choose a reason for hiding this comment

mkrizek Dec 13, 2023

Choose a reason for hiding this comment

markgoddard Dec 13, 2023

Choose a reason for hiding this comment

markgoddard Dec 13, 2023

Choose a reason for hiding this comment

mkrizek Dec 13, 2023

Choose a reason for hiding this comment

markgoddard left a comment

Choose a reason for hiding this comment

markgoddard Dec 13, 2023

Choose a reason for hiding this comment

markgoddard commented Dec 19, 2023

webknjaz commented Jan 15, 2024

mkrizek Jan 16, 2024

Choose a reason for hiding this comment

mkrizek left a comment

Choose a reason for hiding this comment

markgoddard commented Jan 16, 2024

webknjaz commented Jan 16, 2024

mattclay commented Jan 16, 2024

azure-pipelines bot commented Jan 16, 2024

markgoddard commented Jan 19, 2024

webknjaz commented Jan 29, 2024

markgoddard commented Jan 31, 2024

markgoddard commented Dec 12, 2023 •

edited