
vert.x accepting thread causes memory leak #6709

Closed
rafhuys-klarrio opened this issue Dec 10, 2024 · 13 comments · Fixed by #6726
@rafhuys-klarrio

Since upgrading to version 7.0.0, my app gets OOMKilled quite rapidly. I attached VisualVM and see >10k vert.x-eventloop-thread-0 and vert.x-acceptor-thread-0 threads (all of them with the same name). While trying to relate this to my code, I see there's a method that creates a new K8s client (new KubernetesClientBuilder().build()) every minute or so. While this can obviously be reduced to a singleton, I wonder if this is where the issue stems from... The client is only used to parse some YAML and can then be GC'd.
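
A minimal sketch of the pattern described above (the scheduling code is illustrative, not the actual application code):

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeakyPattern {
    public static void main(String[] args) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            // A fresh client (and, in 7.0.0, a fresh Vert.x instance) on every tick;
            // it is never closed, so its event-loop/acceptor threads accumulate.
            KubernetesClient client = new KubernetesClientBuilder().build();
            client.getConfiguration(); // stand-in for the YAML parsing
        }, 0, 1, TimeUnit.MINUTES);
    }
}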

Does the thread name ring a bell? Where does it get created?

@manusa
Member

manusa commented Dec 11, 2024

The underlying Vert.x HttpClient.Factory implementation will create a new Vert.x instance for each build() invocation.

Vert.x uses the Reactor pattern and creates multiple event-loop threads to handle the workload reactively.
So for each instance you'd expect a pool of threads to handle the workloads and to accept jobs.

However, AFAIR Vert.x shares the underlying resources across those instances.
In addition, >10k threads don't really correlate with one build() invocation per minute.
So it's likely that there is something else going on in your code.
Are you sure you're not communicating with the cluster? Might any of those connections remain open for some reason?

In any case, in general, I'd advise using a singleton KubernetesClient instance for your application.
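
A minimal sketch of that advice (ClientHolder is a hypothetical application class, not fabric8 API):

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public final class ClientHolder {
    // One client for the whole application lifetime; close it exactly once on shutdown.
    private static final KubernetesClient CLIENT = new KubernetesClientBuilder().build();

    private ClientHolder() {
    }

    public static KubernetesClient client() {
        return CLIENT;
    }

    public static void shutdown() {
        CLIENT.close();
    }
}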

Maybe @shawkins has some more clues about what's going on here.

@rafhuys-klarrio
Author

I need to dig deeper into the issue, but on the surface it looks like we're not really misbehaving: there's a single client that does all sorts of things on the cluster: listing, creating, updating, and deleting K8s resources. After each of those calls the client is not explicitly closed or anything (I expect Vert.x to handle those things). Elsewhere we do create a client every x minutes (this can easily be improved) and only use that client to load YAML.

@shawkins
Contributor

Maybe @shawkins has some more clues about what's going on here.

Vert.x by default shares the same thread pool across Vertx instances, but each instance creates its own event loop and acceptor threads.

This points to a couple of possibilities (see the sketch after this list):

  • a lot of KubernetesClients / Vertx instances are getting created, but the KubernetesClients are not getting closed
  • the clients are getting closed, but these threads are not being stopped and are non-daemon so they are hanging around

The first is a usage error. The second would be indicative of a problem with fabric8 / vertx.
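
To rule out the first possibility, it's worth verifying that every short-lived client is closed deterministically; a minimal sketch:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class ClientLifecycle {
    public static void main(String[] args) {
        // try-with-resources guarantees close() runs; if vert.x-* threads still
        // accumulate after this, the leak points at fabric8/vertx rather than usage.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            System.out.println(client.getConfiguration().getMasterUrl());
        }
    }
}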

@4nt1m0n

4nt1m0n commented Dec 13, 2024

I have been affected by thread leaks since this upgrade as well.

I use the following pattern a lot:

    KubernetesClientBuilder().withConfig(config).build().use { client ->
        // interact with the client
    }

The use block should close the client, or at least that is what I was expecting, and it seems to have worked fine up until now.

@shawkins
Contributor

The issue here is that the Vert.x HTTP client does not close the Vertx instance - there may have been an early assumption that code was reusing the same factory, but that is not the case for the default factory. Also, we don't want to close the Vertx instance when running on Quarkus, so we'll need to delegate that to the factory - Quarkus has its own implementation.
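
Until that is fixed, one mitigation this implies is to reuse a single factory (and therefore, per the above, a single Vertx instance) across builds; a hedged sketch, assuming the application can hold the factory for its whole lifetime:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.vertx.VertxHttpClientFactory;

public class SharedFactory {
    // One factory - and so one Vert.x instance - shared by every client built from it.
    private static final VertxHttpClientFactory FACTORY = new VertxHttpClientFactory();

    static KubernetesClient newClient() {
        return new KubernetesClientBuilder().withHttpClientFactory(FACTORY).build();
    }
}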

Alternatively we can make either the default factory or the Vertx instance referenced by the vertx factory static - but generally we've tried to avoid internal static state.

In any case, usage scenarios where Kubernetes clients are repeatedly created and closed should be avoided. They are supposed to be used in a singleton pattern. Some of the docs/examples are probably leading people in the wrong direction.

@jhliptak

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation. Even using one singleton does not work correctly.

Reading the Vert.X documentation: Vert.x Core

Threads maintained by Vert.x instances are not daemon threads so they will prevent the JVM from exiting.
If you are embedding Vert.x and you have finished with it, you can call close to close it down.
This will shut-down all internal thread pools and close other resources, and will allow the JVM to exit.

If I create a standard Spring Boot application and configure a CLI option:

spring.main.web-application-type=none

package org.example.fabric87;

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.dsl.NamespaceableResource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Fabric87CliApplication implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(Fabric87CliApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(Fabric87CliApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        log.info("Hi");

        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            String yaml = """
                    apiVersion: v1
                    kind: Pod
                    metadata:
                      name: nginx
                    spec:
                      containers:
                      - name: nginx
                        image: nginx:1.14.2
                        ports:
                        - containerPort: 80
                    """;
            NamespaceableResource<HasMetadata> myPod = client.resource(yaml);
        }
    }
}

The program will never exit. The close() call should completely clean up everything that has been allocated by the client - that's what the java.lang.AutoCloseable interface means.
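
For anyone reproducing this, a quick way to see which non-daemon threads keep the JVM alive after close() is a filtered thread dump; an illustrative snippet:

public class ThreadDump {
    public static void main(String[] args) {
        // Lists surviving vert.x event-loop/acceptor threads; any non-daemon
        // thread in this list will prevent the JVM from exiting.
        Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("vert.x-"))
                .forEach(t -> System.out.printf("%s daemon=%b%n", t.getName(), t.isDaemon()));
    }
}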

@manusa manusa added bug and removed question labels Dec 14, 2024
@panbingkun

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation. [...] The close() call should completely clean up everything that has been allocated by the client - that's what the java.lang.AutoCloseable interface means.

This problem has been bothering me for a week. In Spark, I solved it with the following code:

https://github.com/apache/spark/pull/49159/files#diff-16671cf9d0bff95338158f5fe51d661c482abbcf0fba8515d920849effad66ebR114

https://github.com/apache/spark/pull/49159/files#diff-d548b8df6c6e03f0b2c43538c1374a150f2fefb11e03bb36cfbc132a55605c7bR59

@shawkins
Contributor

shawkins commented Dec 14, 2024

It seems to me that the relationship you have with Vert.X is broken with regard to resource allocation.

Unfortunately that's because the Vert.x client was predominantly used by Quarkus and/or as a singleton in containerized applications that get forcibly terminated, so it wasn't noticed before how standalone usage behaved - especially compared to OkHttp.

Note that using the JDK client won't have this issue either.
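
A hedged sketch of that alternative, assuming the kubernetes-httpclient-jdk module is on the classpath in place of the Vert.x one:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.jdkhttp.JdkHttpClientFactory;

public class JdkClientExample {
    public static void main(String[] args) {
        // Explicitly select the JDK-based HTTP client, which doesn't spawn the
        // non-daemon vert.x event-loop/acceptor threads discussed in this issue.
        try (KubernetesClient client = new KubernetesClientBuilder()
                .withHttpClientFactory(new JdkHttpClientFactory())
                .build()) {
            System.out.println(client.getConfiguration().getMasterUrl());
        }
    }
}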

It definitely seems good to mark the threads as daemon. We should have the Vert.x folks comment as well on the handling of Vertx instances - @vietj, can you comment on how many Vertx instances we should be creating?

@manusa
Member

manusa commented Dec 16, 2024

I created #6726 as a proposal to fix this issue.

I'd like to merge a fix as soon as possible so I can cut a v7.0.1 release, since this seems to be quite a critical issue.

@manusa manusa moved this from In Progress to Review in Eclipse JKube Dec 16, 2024
@panbingkun

May I ask when v7.0.1 will be released?
Thanks!

@manusa
Member

manusa commented Dec 16, 2024

May I ask when v7.0.1 will be released? Thanks!

Thursday at the latest, in a couple of hours at the soonest :)

@panbingkun

May I ask when v7.0.1 will be released? Thanks!

Thursday at the latest, in a couple of hours at the soonest :)

yeah, thank you very much! ❤️

@github-project-automation github-project-automation bot moved this from Review to Done in Eclipse JKube Dec 16, 2024
@manusa
Member

manusa commented Dec 18, 2024

https://github.com/fabric8io/kubernetes-client/releases/tag/v7.0.1
