
vert.x accepting thread causes memory leak #6709

Closed
rafhuys-klarrio opened this issue Dec 10, 2024 · 13 comments · Fixed by #6726
@rafhuys-klarrio

Since upgrading to version 7.0.0, my app gets OOMKilled quite rapidly. I attached VisualVM and see >10k vert.x-eventloop-thread-0 and vert.x-acceptor-thread-0 threads (all of them with the same name). While trying to relate this to my code, I see there's a method that creates a new K8s client (new KubernetesClientBuilder().build()) every minute or so. While this can obviously be reduced to a singleton, I wonder if this is where the issue stems from... The client is only used to parse some YAML and can then be GC'd.
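
A minimal sketch of the pattern described above (the scheduling code is illustrative, not the actual application code):

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeakyPattern {
    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            // A fresh client (and, in 7.0.0, a fresh Vert.x instance) on every tick;
            // it is never closed, so its event-loop/acceptor threads accumulate.
            KubernetesClient client = new KubernetesClientBuilder().build();
            client.getConfiguration(); // stand-in for the YAML parsing
        }, 0, 1, TimeUnit.MINUTES);
    }
}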

Does the thread name ring a bell? Where does it get created?

@manusa
Member

manusa commented Dec 11, 2024

The underlying Vert.x HttpClient.Factory implementation will create a new Vert.x instance for each build() invocation.

Vert.x uses the Reactor pattern and creates multiple event-loop threads to handle the workload reactively.
So for each instance you'd expect a pool of threads to handle the workloads and to accept jobs.

However, AFAIR Vert.x shares the underlying resources across those instances.
In addition, >10k threads don't really correlate with one build() invocation per minute.
So it's likely that there is something else going on in your code.
Are you sure you're not communicating with the cluster? Might any of those connections remain open for some reason?

In any case, in general, I'd advise using a singleton KubernetesClient instance for your application.
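
A minimal sketch of that advice (ClientHolder is a hypothetical application class, not fabric8 API):

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public final class ClientHolder {
    // One client for the whole application lifetime; close it exactly once on shutdown.
    private static final KubernetesClient CLIENT = new KubernetesClientBuilder().build();

    private ClientHolder() {
    }

    public static KubernetesClient client() {
        return CLIENT;
    }

    public static void shutdown() {
        CLIENT.close();
    }
}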

Maybe @shawkins has some more clues about what's going on here.

@rafhuys-klarrio
Author

I need to dig deeper into the issue, but on the surface it looks like we're not really misbehaving: there's a single client that does all sorts of things on the cluster: listing, creating, updating, and deleting K8s resources. After each of those calls the client is not explicitly closed or anything (I expect Vert.x to handle those things). Elsewhere we do create a client every x minutes (this can easily be improved) and only use that client to load YAML.

@shawkins
Contributor

Maybe @shawkins has some more clues about what's going on here.

Vert.x by default shares the same thread pool across Vertx instances, but each instance creates its own event loop and acceptor threads.

This points to a couple of possibilities (see the sketch after this list):

  • a lot of KubernetesClients / Vertx instances are getting created, but the KubernetesClients are not getting closed
  • the clients are getting closed, but these threads are not being stopped and are non-daemon so they are hanging around

The first is a usage error. The second would be indicative of a problem with fabric8 / vertx.
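
To rule out the first possibility, it's worth verifying that every short-lived client is closed deterministically; a minimal sketch:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ClientLifecycle {
    public static void main(String[] args) {
        // try-with-resources guarantees close() runs; if vert.x-* threads still
        // accumulate after this, the leak points at fabric8/vertx rather than usage.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            System.out.println(client.getConfiguration().getMasterUrl());
        }
    }
}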

@4nt1m0n

4nt1m0n commented Dec 13, 2024

I have been affected by thread leaks since this upgrade as well.

I use the following pattern a lot:

    KubernetesClientBuilder().withConfig(config).build().use { client ->
        // interact with the client
    }

The use block should close the client, or at least that is what I was expecting, and it seems to have worked fine up until now.

@shawkins
Contributor

The issue here is that the Vert.x HTTP client does not close the Vertx instance - there may have been an early assumption that code was reusing the same factory, but that is not the case for the default factory. Also, we don't want to close the Vertx instance when running on Quarkus, so we'll need to delegate that to the factory - Quarkus has its own implementation.
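
Until that is fixed, one mitigation this implies is to reuse a single factory (and therefore, per the above, a single Vertx instance) across builds; a hedged sketch, assuming the application can hold the factory for its whole lifetime:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.vertx.VertxHttpClientFactory;

public class SharedFactory {
    // One factory - and so one Vert.x instance - shared by every client built from it.
    private static final VertxHttpClientFactory FACTORY = new VertxHttpClientFactory();

    static KubernetesClient newClient() {
        return new KubernetesClientBuilder().withHttpClientFactory(FACTORY).build();
    }
}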

Alternatively we can make either the default factory or the Vertx instance referenced by the vertx factory static - but generally we've tried to avoid internal static state.

In any case, usage scenarios where Kubernetes clients are repeatedly created and closed should be avoided. They are supposed to be used in a singleton pattern. Some of the docs/examples are probably leading people in the wrong direction.

@jhliptak

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation. Even using one singleton does not work correctly.

Reading the Vert.X documentation: Vert.x Core

Threads maintained by Vert.x instances are not daemon threads so they will prevent the JVM from exiting.
If you are embedding Vert.x and you have finished with it, you can call close to close it down.
This will shut-down all internal thread pools and close other resources, and will allow the JVM to exit.

If I create a standard Spring Boot application and configure a CLI option:

spring.main.web-application-type=none

package org.example.fabric87;

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.dsl.NamespaceableResource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Fabric87CliApplication implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(Fabric87CliApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(Fabric87CliApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        log.info("Hi");

        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            String yaml = """
                    apiVersion: v1
                    kind: Pod
                    metadata:
                      name: nginx
                    spec:
                      containers:
                      - name: nginx
                        image: nginx:1.14.2
                        ports:
                        - containerPort: 80
                    """;
            NamespaceableResource<HasMetadata> myPod = client.resource(yaml);
        }
    }
}

The program will never exit. The close() call should completely clean up everything that has been allocated by the client - that's what the java.lang.AutoCloseable interface means.
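
For anyone reproducing this, a quick way to see which non-daemon threads keep the JVM alive after close() is a filtered thread dump; an illustrative snippet:

public class ThreadDump {
    public static void main(String[] args) {
        // Lists surviving vert.x event-loop/acceptor threads; any non-daemon
        // thread in this list will prevent the JVM from exiting.
        Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("vert.x-"))
                .forEach(t -> System.out.printf("%s daemon=%b%n", t.getName(), t.isDaemon()));
    }
}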

@manusa manusa added bug and removed question labels Dec 14, 2024
@panbingkun

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation. [...] The close() call should completely clean up everything that has been allocated by the client - that's what the java.lang.AutoCloseable interface means.

This problem has been bothering me for a week. In Spark, I solved it with the following code:

https://github.com/apache/spark/pull/49159/files#diff-16671cf9d0bff95338158f5fe51d661c482abbcf0fba8515d920849effad66ebR114

https://github.com/apache/spark/pull/49159/files#diff-d548b8df6c6e03f0b2c43538c1374a150f2fefb11e03bb36cfbc132a55605c7bR59

@shawkins
Contributor

shawkins commented Dec 14, 2024

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation.

Unfortunately that's because the Vert.x client was predominantly used by Quarkus and/or as a singleton in containerized applications that get forcibly terminated, so it wasn't noticed before how standalone usage behaved - especially compared to OkHttp.

Note that using the JDK client won't have this issue either.
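
A hedged sketch of that alternative, assuming the kubernetes-httpclient-jdk module is on the classpath in place of the Vert.x one:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.jdkhttp.JdkHttpClientFactory;

public class JdkClientExample {
    public static void main(String[] args) {
        // Explicitly select the JDK-based HTTP client, which doesn't spawn the
        // non-daemon vert.x event-loop/acceptor threads discussed in this issue.
        try (KubernetesClient client = new KubernetesClientBuilder()
                .withHttpClientFactory(new JdkHttpClientFactory())
                .build()) {
            System.out.println(client.getConfiguration().getMasterUrl());
        }
    }
}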

It definitely seems good to mark the threads as daemon. We should have the Vert.x folks comment as well on the handling of Vertx instances - @vietj, can you comment on how many Vertx instances we should be creating?

@manusa
Member

manusa commented Dec 16, 2024

I created #6726 as a proposal to fix this issue.

I'd like to merge a fix as soon as possible so I can cut a v7.0.1 release, since this seems to be quite a critical issue.

@manusa manusa moved this from In Progress to Review in Eclipse JKube Dec 16, 2024
@panbingkun

May I ask when v7.0.1 will be released?
Thanks!

@manusa
Member

manusa commented Dec 16, 2024

May I ask when v7.0.1 will be released? Thanks!

Thursday at the latest, in a couple of hours at the soonest :)

@panbingkun

May I ask when v7.0.1 will be released? Thanks!

Thursday at the latest, in a couple of hours at the soonest :)

yeah, thank you very much! ❤️

@github-project-automation github-project-automation bot moved this from Review to Done in Eclipse JKube Dec 16, 2024
@manusa
Member

manusa commented Dec 18, 2024

https://github.com/fabric8io/kubernetes-client/releases/tag/v7.0.1
