
On demand Docker log collection #20558

Open

dani opened this issue May 10, 2024 · 5 comments

dani commented May 10, 2024

Proposal

I just took a look at the memory usage on my Nomad agents and realized that the overhead of Docker log collection is substantial. On my small-scale cluster (a personal install with 4 Nomad agents and 65 allocations running), it accounted for about 33% of the total used memory, mainly in the nomad logmon and nomad docker_logger processes (measured by comparing the memory reported by systemd with and without disable_log_collection = true).

Disabling log collection (and using, for example, fluentd with the docker task driver) is a solution to this excessive consumption, but we lose access to the container logs from the web interface and the nomad alloc logs CLI, which are convenient for quick debugging (faster than querying a central log aggregator).

Maybe one way to mitigate this would be a third, on-demand mode for log collection: as soon as the log streaming API is called, the corresponding logmon and docker_logger processes would be spawned (and killed again after some timeout).

Use-cases

On-demand log collection would eliminate most of the memory overhead of log collection for the Docker driver, while still allowing logs to be displayed in the web interface or the Nomad CLI for occasional debugging.

Attempted Solutions

Using an out-of-band log collector/aggregator and setting disable_log_collection globally in the Nomad agent configuration is a workaround, but losing access to the logs from the web interface is a serious drawback.
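For reference, a minimal sketch of what this workaround looks like on the client side, assuming the standard agent configuration layout (only the relevant plugin stanza is shown):

plugin "docker" {
  config {
    # With log collection disabled, Nomad no longer spawns logmon/docker_logger
    # for Docker tasks, but `nomad alloc logs` and the web UI have nothing to show.
    disable_log_collection = true
  }
}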

tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 13, 2024
tgross (Member) commented May 17, 2024

Maybe one way to mitigate this would be a third, on-demand mode for log collection: as soon as the log streaming API is called, the corresponding logmon and docker_logger processes would be spawned (and killed again after some timeout).

This is a clever idea, but the challenge is that logmon/docker_logger are just attaching to the stdout/stderr of the container. If nothing is reading from those file handles, the application will not be able to write its logs (potentially causing the entire application to block, and at the very least buffering up a ton of logs). Likewise, we need to attach logmon so that we can rotate logs safely without dropping any; otherwise a given task can use more than its allowed disk space.

The long-term approach we want to take here is logging plugins. A design doc from a hack branch I did for this can be found here. A couple of other thoughts along those lines:

  • One of the logging plugins not listed there would be a journald logger, where we'd write logs directly to the journal and let the journal's own rate limiting take over (a rough approximation using today's Docker options is sketched after this list).
  • If we could properly enforce disk quotas (e.g. if the alloc dir were a loopback device or something like that), then we could allow tasks to opt out of Nomad-managed log rotation and that would keep all the existing features.
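Not the logging-plugin design described above, but as a rough, assumed approximation that works today: pairing disable_log_collection with Docker's own journald log driver hands the logs to the journal and its rate limiting/retention, at the cost of losing nomad alloc logs for that task. Task, image, and tag below are hypothetical:

task "app" {
  driver = "docker"

  config {
    image   = "busybox:1.36"  # hypothetical image
    command = "sh"
    args    = ["-c", "while true; do echo hello; sleep 5; done"]

    # Docker's journald log driver takes over; journald's rate limiting and
    # retention apply instead of Nomad's logmon-based rotation.
    logging {
      type = "journald"
      config {
        tag = "${NOMAD_JOB_NAME}-${NOMAD_TASK_NAME}"
      }
    }
  }
}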

In any event, I'll label this as another logging-related idea and we'll look into this when we return to that logging plugin concept. Thanks!

@tgross tgross moved this from Needs Triage to Needs Roadmapping in Nomad - Community Issues Triage May 17, 2024
dani (Author) commented May 17, 2024

As a workaround, I'm now using the fluentd logging driver, sending logs to a local Vector instance, which writes them back where Nomad expects them (as described here). On a small-scale test cluster, this setup reduced global memory consumption by ~25GB (about 20% of the total), while still allowing access to the logs through the Nomad API.
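Roughly, the Vector side of such a pipeline could look like the sketch below: a fluent source receiving the Docker fluentd log driver's stream, and a file sink writing back under the allocation's log directory. The data_dir path and the NOMAD_ALLOC_ID / NOMAD_TASK_NAME field names are assumptions here; the write-up linked above has the exact configuration.

sources:
  docker_fluent:
    type: fluent              # speaks the Fluentd forward protocol
    address: 127.0.0.1:24224  # where the docker fluentd log driver ships to

sinks:
  nomad_alloc_logs:
    type: file
    inputs:
      - docker_fluent
    # Assumed layout: <data_dir>/alloc/<alloc_id>/alloc/logs/<task>.stdout.0,
    # which is where `nomad alloc logs` and the web UI read from
    path: /opt/nomad/data/alloc/{{ NOMAD_ALLOC_ID }}/alloc/logs/{{ NOMAD_TASK_NAME }}.stdout.0
    encoding:
      codec: text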

apollo13 (Contributor) commented

Ha, thank you @dani. I am also capturing docker logs via the splunk exporter into Vector (I have to see whether fluentd would be better), but I never thought of writing them back to the Nomad locations.

apollo13 (Contributor) commented

Fwiw, instead of using env variables you can also use the docker labels, so my docker plugin config looks like this:

        extra_labels = ["*"]
        logging {
            type = "splunk"
            config {
                splunk-token = "localhost-splunk-token"
                splunk-url = "http://127.0.0.1:8089"
                splunk-verify-connection = "false"
                labels-regex = "com\\.hashicorp\\..*"
            }
        }
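For comparison, a sketch of the per-task form of the same logging settings, as it would appear in a job file rather than the client configuration (task name and image are hypothetical; the values mirror the fragment above):

task "web" {
  driver = "docker"

  config {
    image = "nginx:1.27"  # hypothetical image

    # Same splunk log-driver settings as above, scoped to a single task
    logging {
      type = "splunk"
      config {
        splunk-token             = "localhost-splunk-token"
        splunk-url               = "http://127.0.0.1:8089"
        splunk-verify-connection = "false"
        labels-regex             = "com\\.hashicorp\\..*"
      }
    }
  }
}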

and the vector configuration looks like this:

sinks:
  loki:
    type: loki
    inputs:
      - splunk
    endpoint: http://localhost:3100
    encoding:
      codec: text
    healthcheck:
      enabled: false
    labels:
      nomad_namespace: '{{ attrs."com.hashicorp.nomad.namespace" }}'
      nomad_job: '{{ attrs."com.hashicorp.nomad.job_name" }}'
      nomad_group: '{{ attrs."com.hashicorp.nomad.task_group_name" }}'
      nomad_task: '{{ attrs."com.hashicorp.nomad.task_name" }}'
      nomad_node: '{{ attrs."com.hashicorp.nomad.node_name" }}'
      nomad_alloc: '{{ attrs."com.hashicorp.nomad.alloc_id" }}'
      host: "${HOSTNAME}"
      log: "nomad"

At least, that is the part that passes the data into Loki, but you can see how to access the labels again via attrs.
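The receiving side referenced by inputs: [splunk] is not shown above; a minimal sketch, assuming Vector's splunk_hec source and reusing the port and token from the Docker config (exact option names can vary between Vector versions):

sources:
  splunk:
    type: splunk_hec
    # Matches splunk-url / splunk-token in the docker logging config above
    address: 0.0.0.0:8089
    valid_tokens:
      - localhost-splunk-token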

dani (Author) commented May 31, 2024

Indeed, I could have done this. But in my case, I also have some tasks which send their logs directly to the same fluentd source, and those only have access to the env vars, not the labels. Using env everywhere lets the same Vector pipeline be used for both.
