
High number of processes of /next/dist/compiled/jest-worker/processChild.js still alive after next build #45508

Closed
1 task done
zqjimlove opened this issue Feb 2, 2023 · 129 comments · Fixed by #54500
Labels
bug Issue was opened via the bug report template. linear: next Confirmed issue that is tracked by the Next.js team. please add a complete reproduction The issue lacks information for further investigation

Comments

@zqjimlove

zqjimlove commented Feb 2, 2023

Verify canary release

  • I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 22.3.0: Thu Jan  5 20:48:54 PST 2023; root:xnu-8792.81.2~2/RELEASE_ARM64_T6000
Binaries:
  Node: 18.13.0
  npm: 8.19.3
  Yarn: 1.22.19
  pnpm: 7.26.2
Relevant packages:
  next: 12.0.9
  react: 17.0.2
  react-dom: 17.0.2

Which area(s) of Next.js are affected? (leave empty if unsure)

CLI (create-next-app)

Link to the code that reproduces this issue

https://github.com/vercel/next.js/files/10565355/reproduce.zip

To Reproduce

reproduce.zip


This problem can be reproduced on next@12.0.9 and above, but 12.0.8 was fine.

Alternatively, removing getInitialProps from _app.tsx also makes it go away on next@12.0.9 and above:

// GlobalApp.getInitialProps = async function getInitialProps(appContext) {
//   const appProps = await App.getInitialProps(appContext);

//   return {
//     ...appProps,
//   };
// };

Describe the Bug

High number of processes of /next/dist/compiled/jest-worker/processChild.js still alive after next build

Expected Behavior

All child processes should be killed once the build finishes.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

No response

NEXT-1348

@zqjimlove zqjimlove added the bug Issue was opened via the bug report template. label Feb 2, 2023
@Francoois

This comment was marked as off-topic.

@yzubkov

yzubkov commented Apr 24, 2023

In my case I see several of these ".../node_modules/next/dist/compiled/jest-worker/processChild.js" processes taking up lots of memory. They appear after executing "npm run start", and they disappear when I terminate the app (Ctrl+C).
Not sure if or how this relates to the build process.
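
For anyone who wants to check for the same thing, this is roughly how I'm spotting them (a sketch; the grep pattern simply matches the worker script path, and the RSS column shows the memory usage):

# list the jest-worker child processes together with their memory usage
ps aux | grep "processChild.js" | grep -v grep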

@schorfES

We have also observed this issue in production, where it consumes memory that is likely not used or needed. This behavior was introduced in version 13.4.0. There is an open discussion about this topic, which you can find at: #49238.

@nicosh

nicosh commented May 24, 2023

We have the same problem: after a few deployments the server runs out of memory. As a temporary fix I added the following script to the deployment pipeline:

#!/bin/bash

# Find the process IDs of all processes containing the string "processChild.js" in the command path
pids=$(pgrep -f "processChild.js")

# Iterate over each process ID and kill the corresponding process
for pid in $pids; do
    echo "Killing process: $pid"
    kill "$pid"
done

But even with this script, it seems that the applications keep spawning zombie processes.
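
For what it's worth, the pgrep/kill loop above can be collapsed into a single pkill call. Note that if the leftover children are true zombies (defunct), kill has no effect on them; they only disappear once the parent reaps them or exits:

# equivalent one-liner; -f matches against the full command line
pkill -f "processChild.js"

# if workers survive a plain SIGTERM, escalate to SIGKILL after a grace period
sleep 5 && pkill -9 -f "processChild.js"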

@switz

switz commented May 29, 2023

Seeing this as well in prod

@MonstraG

Downgrading to <13.4.0 for now I guess

@leerob
Member

leerob commented May 30, 2023

Merged this discussion into here: #49238

This might be related: 83b774e#diff-90d1d5f446bdf243be25cc4ea2295a9c91508859d655e51d5ec4a3562d3a24d9L1930

@leerob
Member

leerob commented May 30, 2023

Small favor, could you include a reproduction as a CodeSandbox instead of a zip file?

@leerob leerob added the please add a complete reproduction The issue lacks information for further investigation label May 30, 2023
@github-actions
Contributor

We cannot recreate the issue with the provided information. Please add a reproduction in order for us to be able to investigate.

Why was this issue marked with the please add a complete reproduction label?

To be able to investigate, we need access to a reproduction to identify what triggered the issue. We prefer a link to a public GitHub repository (template for pages, template for App Router), but you can also use these templates: CodeSandbox: pages or CodeSandbox: App Router.

To make sure the issue is resolved as quickly as possible, please make sure that the reproduction is as minimal as possible. This means that you should remove unnecessary code, files, and dependencies that do not contribute to the issue.

Please test your reproduction against the latest version of Next.js (next@canary) to make sure your issue has not already been fixed.

I added a link, why was it still marked?

Ensure the link is pointing to a codebase that is accessible (e.g. not a private repository). "example.com", "n/a", "will add later", etc. are not acceptable links -- we need to see a public codebase. See the above section for accepted links.

What happens if I don't provide a sufficient minimal reproduction?

Issues with the please add a complete reproduction label that receive no meaningful activity (e.g. new comments with a reproduction link) are automatically closed and locked after 30 days.

If your issue has not been resolved in that time and it has been closed/locked, please open a new issue with the required reproduction.

I did not open this issue, but it is relevant to me, what can I do to help?

Anyone experiencing the same issue is welcome to provide a minimal reproduction following the above steps. Furthermore, you can upvote the issue using the 👍 reaction on the topmost comment (please do not comment "I have the same issue" without reproduction steps). Then, we can sort issues by votes to prioritize.

I think my reproduction is good enough, why aren't you looking into it quicker?

We look into every Next.js issue and constantly monitor open issues for new comments.

However, sometimes we might miss one or two due to the popularity/high traffic of the repository. We apologize, and kindly ask you to refrain from tagging core maintainers, as that will usually not result in increased priority.

Upvoting issues to show your interest will help us prioritize and address them as quickly as possible. That said, every issue is important to us, and if an issue gets closed by accident, we encourage you to open a new one linking to the old issue and we will look into it.


@bfife-bsci

bfife-bsci commented May 31, 2023

I am commenting as a +1 to #49238 which I think more accurately described our issue. We only have 2 processChild.js processes, but this is likely due to running on GKE nodes with 2 CPUs. We run a minimum of 3 pods behind a service/load balancer. We unfortunately do not have a reproduction.

We were running 13.4.1 on node v16.19.0 in our production environment, and discovered that after some volume of requests, or perhaps simply a period of time (as short as a day and a half, as long as 5 days), some Next.js servers became slow or even unresponsive. New requests would take at least 5 seconds to get a response. CPU usage in the pod was maxed out, split roughly 33% user and 66% system. We discovered that requests are proxied to a processChild.js child process, which listens on a different port (is this the new App Router?). We observed the following characteristics:

  • excessive CPU usage
  • increased memory usage overall
  • no extra logging was observed (we don't see any logs after server startup)
  • there were over 3100 TCP connections established between the parent and processChild.js process
  • the parent process appeared to be retrying/attempting to retransmit requests that were queued up inside of it

strace'ing showed the following signature over and over again with different sockets/URLs

...
write(1593, "GET /URL1"..., 2828) = -1 EAGAIN (Resource temporarily unavailable)
write(1600, "GET /URL2"..., 2833) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(13, [{EPOLLOUT, {u32=266, u64=266}}, {EPOLLOUT, {u32=276, u64=276}}, {EPOLLOUT, {u32=280, u64=280}}, {EPOLLOUT, {u32=267, u64=267}}, {EPOLLOUT, {u32=315, u64=315}}, {EPOLLOUT, {u32=20, u64=20}}, {EPOLLOUT, {u32=322, u64=322}}, {EPOLLOUT, {u32=275, u64=275}}, {EPOLLOUT, {u32=325, u64=325}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=332, u64=332}}, {EPOLLOUT, {u32=336, u64=336}}, {EPOLLOUT, {u32=314, u64=314}}, {EPOLLOUT, {u32=358, u64=358}}, {EPOLLOUT, {u32=324, u64=324}}, {EPOLLOUT, {u32=360, u64=360}}, {EPOLLOUT, {u32=281, u64=281}}, {EPOLLOUT, {u32=335, u64=335}}, {EPOLLOUT, {u32=296, u64=296}}, {EPOLLOUT, {u32=343, u64=343}}, {EPOLLOUT, {u32=377, u64=377}}, {EPOLLOUT, {u32=379, u64=379}}, {EPOLLOUT, {u32=359, u64=359}}, {EPOLLOUT, {u32=285, u64=285}}, {EPOLLOUT, {u32=268, u64=268}}, {EPOLLOUT, {u32=392, u64=392}}, {EPOLLOUT, {u32=366, u64=366}}, {EPOLLOUT, {u32=378, u64=378}}, {EPOLLOUT, {u32=406, u64=406}}, {EPOLLOUT, {u32=326, u64=326}}, {EPOLLOUT, {u32=323, u64=323}}, {EPOLLOUT, {u32=420, u64=420}}, ...], 1024, 0) = 275
write(266, "GET /URL3"..., 2839) = -1 EAGAIN (Resource temporarily unavailable)
write(276, "GET /URL4"..., 2830) = -1 EAGAIN (Resource temporarily unavailable)
write(280, "GET /URL5"..., 2825) = -1 EAGAIN (Resource temporarily unavailable)
...

It looks like the parent process continuously retries sending requests which are not being serviced/read into the child process. We're not sure what puts the server into this state (new requests will still be accepted and responded to slowly), but due to the unresponsiveness we downgraded back to 13.2.3.
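
For anyone who wants to check the same thing on their servers, counting the established connections can be done roughly like this (a sketch; it assumes ss is available in the container and that the parent server's PID can be found with a pgrep pattern matching your start command):

# PID of the parent Next.js server process (adjust the pattern to your setup)
parent_pid=$(pgrep -f "next start" | head -n 1)

# count established TCP connections held by that process
ss -tanp 2>/dev/null | grep "pid=$parent_pid," | grep -c ESTAB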

@billnbell

I get next/dist/compiled/jest-worker/processChild.js processes running with NODE_ENV=production when running next start??

Downgrading.

@csi-lk
Contributor

csi-lk commented Jun 2, 2023

Downgrading

Hmm, I don't know if downgrading helps @billnbell. I've seen this in our traces going back a few versions now; let me know if you have a specific version where this isn't an issue. I'm worried about memory utilisation, as we're seeing it max out on our containers :)

Edit: just read above about < 13.4.0; I'll give this a go and report back.

@cjcheshire

cjcheshire commented Jun 2, 2023

Here to say me too. We’ve recently jumped on the 13.4 bandwagon and over the last two weeks have started to see memory maxing out.

(Apologies, just read the bot asking me not to say this)

@BuddhiAbeyratne

I just had a massive outage thanks to this. It creeps up on you and doesn't die, there's no way to easily kill the workers, and it also stops build systems once it hits max RAM.

@billnbell

I can confirm downgrading worked for me. 13.2.3

@BuddhiAbeyratne

Maybe this will help too:

    git pull
    npm ci || exit
    BUILD_DIR=.nexttemp npm run build || exit
    if [ ! -d ".nexttemp" ]; then
        echo '\033[31m .nexttemp directory does not exist!\033[0m'
        exit 1
    fi
    rm -rf .next
    mv .nexttemp .next
    pm2 reload all --update-env
    echo "Deployment done."

@BuddhiAbeyratne

Seems like the jest worker is required, or else pm2 can't serve the site in prod mode.

@BuddhiAbeyratne

The solution I'm using now is to kill everything and restart the service, so it only creates 2 workers.
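
Concretely, the restart step looks something like this (a rough sketch; it assumes the app is managed by pm2 and that killing the leftover workers is acceptable in your environment):

# kill any leftover jest-worker children, then restart the pm2-managed app
pkill -f "processChild.js" || true
pm2 restart all --update-env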

@cjcheshire

I just had a massive outage thanks to this. It creeps up on you and doesn't die, there's no way to easily kill the workers, and it also stops build systems once it hits max RAM.

This is freaky. We just did too!

We have 800 pages, and some of them need more than two API requests to build. We had a 1GB limit on our pods; upping it to 2GB has helped us.

@BuddhiAbeyratne

I'm on 13.4.1 if that helps to debug

@billnbell

I'm on 13.4.1 if that helps to debug
That is why I switched to 13.2.3. I have not tried newer versions or canary yet.

@billnbell

Just to be clear - we run out of memory in PRODUCTION mode when serving the site. I know others are seeing it when using next build, but we are getting this over time when using next start. Downgrading worked for us.

I don't really know why jest is running and eating all the memory on the box. Can we add a parameter to turn off jest when running next start?

@cjcheshire

@billnbell it's not jest though, right? It's the jest-worker package.

We even prune dev dependencies in production!

@billnbell

What is jest-worker?

@cjcheshire

cjcheshire commented Jun 2, 2023

It’s a package, which we presume is how the background tasks for building work: https://www.npmjs.com/package/jest-worker?activeTab=readme

@S-YOU

S-YOU commented Jun 2, 2023

The name jest-worker is actually confusing (at least for me) because of the popular test framework jest. jest itself seems to be a huge repository with a lot of packages; this one should be called Facebook's web server/worker or something else.

@timneutkens
Member

I just checked with @ijjk and as it turns out he saw something similar and fixed it in a recent refactor:

const cleanup = () => {
  debug('router-server process cleanup')
  for (const curWorker of [
    ...((renderWorkers.app as any)?._workerPool?._workers || []),
    ...((renderWorkers.pages as any)?._workerPool?._workers || []),
  ] as {
    _child?: import('child_process').ChildProcess
  }[]) {
    curWorker._child?.kill('SIGKILL')
  }
}
process.on('exit', cleanup)
process.on('SIGINT', cleanup)
process.on('SIGTERM', cleanup)
process.on('uncaughtException', cleanup)
process.on('unhandledRejection', cleanup)

Could you try with next@canary?

@hanoii

hanoii commented Jul 28, 2023

@timneutkens I tried it locally, as I was able to reproduce it as well, and yes, next@canary at least doesn't leave the process behind after a straight-out start failure:

I am getting a different error:

[Error: ENOENT: no such file or directory, open '/var/www/html/next/.next/BUILD_ID'] {

but I guess that's ok.

Maybe this fixes it.

@sedlukha
Contributor

@timneutkens

Also please make it clear what you're running. I.e. @sedlukha is that development? I guess so?

No, this is prod. I run it for 17 apps.

And I've tried canary; memory usage is now even worse: 4.9G (13.4.13-canary.6) vs 2.4G (v13.2.4) vs 3.16G (v13.4.12).


@sedlukha
Contributor

sedlukha commented Jul 29, 2023

@timneutkens it seems that experimental.appDir: false might disable the next-render-worker-app process and solve the problem for those who use only pages routing.

I would be happy to test it, but I can't do it on my real apps because of Next.js issue #52875.

@timneutkens
Member

timneutkens commented Jul 29, 2023

@sedlukha Seems what you're reporting is exactly the same as #49929 in which I've already explained the memory usage, there is no leak, it's just using multiple processes and we're working on optimizing that: #49929 (comment)

Setting appDir: false is not supported and that option will go away in a future version; we just haven't gotten around to removing the feature flag.


@hanoii thanks for checking 👍

@Nirmal1992

Same here... my MacBook crashed when I used the latest Next.js with Turborepo. Multiple child processes were running in the background even after terminating the server.

@S-YOU

S-YOU commented Aug 8, 2023

FYI: experimental: {appDir: false} does not work anymore on 13.4.13 for me (pages rendered, but URL changes failed to load JSON and triggered SSR), and it now spawns 3 processes apart from the main process:

  • next-router-worker
  • next-render-worker-app
  • next-render-worker-pages

@space1worm

I have the same issue as well, on version 13.4.8.

@timneutkens
Member

@Nirmal1992 @S-YOU @space1worm I'm surprised you did not read my previous comment. I thought it was clear that these types of comments are not constructive? #45508 (comment)

@space1worm I'm even more surprised you're posting "same issue" without trying the latest version of Next.js...

@space1worm

space1worm commented Aug 9, 2023

@timneutkens Hey, yeah, sorry I missed it; I've made my test repo public.

You can check this commit: tracer

I had a memory usage problem on version 13.4.8; after navigating to any page, my pod's memory would skyrocket for some reason, and after that the whole app would break and become unresponsive.

Not sure if this problem is related to my codebase or not; I would love to hear what the problem is!

One more thing: I tried to increase resources, but the application was still unresponsive after breaking.

Here as a reference: (screenshots omitted)

@timneutkens
Member

Application is still not using the latest version of Next.js, same in the commit linked: https://gitlab.cern.ch/nzurashv/tracer/-/blob/master/package-lock.json#L4673

@space1worm

space1worm commented Aug 9, 2023

@timneutkens I have updated to the latest version and created a new branch, tracer/test.

The issue still persists.

Here you can check this link as well:

tracer-test.web.cern.ch


Additionally, I inquired with the support team regarding the cause of the failure, and they provided me with the following explanation.

(screenshot of the support team's explanation omitted)

@glaustino

FYI: experimental: {appDir: false} does not work anymore on 13.4.13 for me (pages rendered, but URL changes failed to load JSON and triggered SSR), and it now spawns 3 processes apart from the main process:

  • next-router-worker
  • next-render-worker-app
  • next-render-worker-pages

I have a question about these child processes: currently it seems they open random ports, which broke my application behind a WAF in Azure because we only open certain ports. Is there any way for me to force which ports these child processes use? I am on the latest Next.js release.

@jrscholey

jrscholey commented Aug 14, 2023

FYI: with 13.4.11 we were unable to start our app in Kubernetes. We received a spawn E2BIG error from jest-worker. This only happened when our rewrites (regex path matching) were above a certain length (although still below the max).

Downgrading back to 13.2.4 resolved the issue.

@S-YOU

S-YOU commented Aug 14, 2023

FYI: the main process started with node server.js is gone in Next.js 13.4.15, and next-router-worker's parent PID becomes 1 (init). This should use less memory, since it uses one less process.

1362416       1      00:00:02 next-router-worker
1362432 1362416      00:00:00 next-render-worker-app
1362433 1362416      00:00:05 next-render-worker-pages
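
That listing was produced with something like the following (an approximation; it assumes a Linux ps and that the worker processes set a process title starting with next-):

# pid, parent pid, cpu time and command line of the Next.js worker processes
ps -eo pid,ppid,time,args | grep "next-" | grep -v grep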

@S-YOU

S-YOU commented Aug 14, 2023

@timneutkens, sorry, I probably misread it. I don't mean to make any claims; I am just sharing what I've observed in the version I am using (which is supposed to be the latest release).

@timneutkens
Member

In 13.4.15 (but really upgrade to 13.4.16 instead) this PR has landed to remove one of the processes indeed: #53523

@timneutkens

This comment was marked as outdated.

@sedlukha
Contributor

@timneutkens I've tried v13.4.20-canary.2.

It was expected that #53523 and #54143 would reduce the number of processes, resulting in lower memory usage.

Yes, the number of processes has been reduced; after the update, I see only two processes. However, memory usage is still higher than it was with v.13.2.4.

node v.16.18.1 (if it matters)

v13.4.20-canary.2 (memory usage screenshot omitted)

13.2.4 (memory usage screenshot omitted)

@timneutkens
Member

It's entirely unclear what you're running / filtering by, e.g. you're filtering by next- but 13.2.4 doesn't set process.title to anything specific.

Sharing screenshots is really not useful; I keep having to repeat that in every single comment around these memory issues.

Please share code, I can't do anything to help you otherwise.

@billnbell

This comment was marked as off-topic.

@magalhas

magalhas commented Aug 24, 2023

I'm seeing this behaviour running next dev on 13.3 and newer versions (13.4 included). It isn't happening on 13.2. It looks like it happens whenever files are being added/removed from the FS while the dev script is running (I'm not sure, given my current use case).

Even after closing next dev, orphaned jest-worker processes are left over.
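
A quick way to spot those orphans (a sketch; orphaned children are typically re-parented to PID 1, so filtering on the parent PID works on most Linux setups):

# show processChild.js processes whose parent is PID 1, i.e. orphaned workers
ps -eo pid,ppid,args | awk '$2 == 1' | grep "processChild.js"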

@timneutkens
Member

I'm amazed by how often my comments are flat out ignored the past few weeks on various issues. We won't be able to investigate/help based on comments saying the equivalent of "It's happening". Please share code, I can't do anything to help you otherwise.

I'll have to close this issue when there is one more comment without a reproduction as I've checked multiple times now and the processes are cleaned up correctly in the latest version.

timneutkens added a commit to timneutkens/next.js that referenced this issue Aug 24, 2023
This implements the same cleanup logic used for start-server and render-workers for the workers used during build.

Fixes vercel#45508
@magalhas

magalhas commented Aug 24, 2023

By latest version, do you mean the latest RC, @timneutkens? Sorry, I can't help with steps to reproduce; this is happening inside a spawn call in a very specific use case, so the best I can do is confirm that it happens.

@kodiakhq kodiakhq bot closed this as completed in #54500 Aug 26, 2023
kodiakhq bot pushed a commit that referenced this issue Aug 26, 2023
This implements the same cleanup logic used for start-server and render-workers for the workers used during build.

It's more of a contingency as we do call `.end()` on the worker too.

Fixes #45508




Co-authored-by: Zack Tanner <1939140+ztanner@users.noreply.github.com>
@vercel vercel locked as resolved and limited conversation to collaborators Aug 28, 2023