Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite template re-render loop when using {{ timestamp }} in two different templates of the same task #20618

Open
faryon93 opened this issue May 16, 2024 · 1 comment
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. type/bug

Comments

@faryon93
Copy link

faryon93 commented May 16, 2024

Nomad version

Output from `nomad version`
Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

Operating system and Environment details

$: cat /etc/debian_version 
12.5

$: uname -a
Linux nomad2 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

$: podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.8
Built:        Thu Jan  1 01:00:00 1970
OS/Arch:      linux/amd64

Issue

When I use the {{ timestamp "unix"}} function in two differente template blocks of the same task and one of the templates uses other functions which can cause re-rendering (like {{ key }} or {{ service }}, etc. ) and a re-rendering is performed, the template gets re-rendered in an infinite loop.

The problem does not exist when the task has only one template.

Sidenote: When the task is stuck in the renderloop proper shutdown is not possible.

Reproduction steps

  1. create key foo/bar in consul with some random content
  2. run the jobfile from below
  3. wait until task is running
  4. change the key foo/bar to another value
  5. observe the Recent Events table of the task for re-occurring Template re-rendered messages

Expected Result

The template should be re-rendered exactly once.

Actual Result

The template gets re-rendered in an infinite loop every few seconds.

Screenshot_20240516_192701

Job file (if appropriate)

job "fileserver" {
  group "http" {
    count = 1

    network {
      mode = "bridge"
    }

    task "nginx" {
      driver = "podman"
      config {
        image = "docker.io/library/nginx:latest"
      }

      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
# K/V: {{ key "foo/bar"}}
        EOF
        destination = "/local/test.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }


      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
        EOF
        destination = "/local/test1.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
    }
  }
}

Nomad Server / Client logs

2024-05-16T19:24:46.088+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:24:51.192+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type="Task Setup" msg="Building Task Directory" failed=false
2024-05-16T19:24:51.298+0200 [INFO]  agent: (runner) creating new runner (dry: false, once: false)
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) creating watcher
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) starting
2024-05-16T19:24:51.347+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:24:51.419+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:24:51.754+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Completely started container: driver=podman @module=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 ip=172.26.64.116 taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a timestamp="2024-05-16T19:24:51.753+0200"
2024-05-16T19:24:51.757+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Started msg="Task started by client" failed=false
2024-05-16T19:25:25.847+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:25.895+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:26.863+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:29.775+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:30.951+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:35.858+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:35.941+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:38.577+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:40.997+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:43.301+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:45.995+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:49.458+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:51.051+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:52.674+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:56.052+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:56.056+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:26:01.109+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:26:04.697+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
2024-05-16T19:26:04.699+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Stopping task: driver=podman signal="" taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a @module=podman timestamp="2024-05-16T19:26:04.699+0200"
2024-05-16T19:26:04.715+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=04eea9f3-7518-cec6-4808-1dbcb9020c37 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:26:05.811+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Terminated msg="Exit Code: 0" failed=false
2024-05-16T19:26:05.836+0200 [WARN]  client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 error="cannot delete container, status code: 200" @module=podman timestamp="2024-05-16T19:26:05.835+0200"
2024-05-16T19:26:05.836+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killed msg="Task successfully killed" failed=false
2024-05-16T19:26:06.102+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing
    msg=
    | Template failed to send signals [hangup]: 1 error occurred:
    | \t* Task not running
    |
    failed=true
    2024-05-16T19:26:06.108+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:26:09.841+0200 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx @module=logmon timestamp="2024-05-16T19:26:09.841+0200"
2024-05-16T19:26:09.848+0200 [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx plugin=/usr/bin/nomad id=1227172
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) stopping
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) received finish
2024-05-16T19:26:09.849+0200 [INFO]  client.gc: marking allocation for GC: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 17, 2024
@pkazmierczak
Copy link
Contributor

Hi @faryon93, thanks for reporting this. I was able to reproduce, and not only with podman but also with docker driver. Indeed there's something fishy happening with the taskrunner here, and we'll look into it.

@pkazmierczak pkazmierczak added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label May 21, 2024
@pkazmierczak pkazmierczak moved this from Needs Triage to Triaging in Nomad - Community Issues Triage May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. type/bug
Development

No branches or pull requests

2 participants