After migration from v1 to v2 silently fails #174

Closed
rlconst opened this issue Feb 2, 2024 · 21 comments
Comments

@rlconst

rlconst commented Feb 2, 2024

That's it; nothing helpful for troubleshooting:

 ##[group]Run gradle/wrapper-validation-action@v2
 with:
   min-wrapper-count: 1
   allow-snapshots: false
 env:
   JAVA_HOME: /opt/hostedtoolcache/Java_Adopt_jdk/17.0.10-7/x64
   JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Adopt_jdk/17.0.10-7/x64
   AWS_DEFAULT_REGION: us-east-1
   AWS_REGION: us-east-1
   AWS_ACCESS_KEY_ID: ***
   AWS_SECRET_ACCESS_KEY: ***
 ##[endgroup]
@wjglerum

wjglerum commented Feb 2, 2024

We are having the same issue here; no errors or warnings in the logs.

@bigdaz
Member

bigdaz commented Feb 2, 2024

As far as I'm aware, the only significant change in v2 is that Node20 is now required to run the action.

Are you using a GitHub-provided runner, or a custom runner?

If there are no errors or warnings, can you describe what "silently fails" looks like? Screenshots or links to runs would be very helpful.

Is there anything more you can share that might help us determine the underlying cause?

@sergei-ivanov

We have been suffering from sporadic failures of this action ever since I upgraded it to @v2. There is no output from the action; it is simply marked as failed, and that is it. We are using "larger" GitHub-hosted runners, but it also fails on the standard ubuntu-latest runner.

@sergei-ivanov

This is how it looks in the Actions UI. There's no output at all. The action is marked as failed and the build stops.
The only thing that I suspect may be at play is the use of caches.
BTW, we also migrated to gradle/actions/setup-gradle@v3 at the same time, but I am not sure if one of them affects the other. Again, possibly a change in caching, but I cannot yet provide any evidence for that.

Screenshot from 2024-02-02 17-18-46

@bigdaz
Member

bigdaz commented Feb 2, 2024

Thanks for the further info. Can you please try to enable GitHub Actions debugging and see if that provides more clues?

Also, why are you setting GRADLE_BUILD_ACTION_SETUP_COMPLETED and GRADLE_BUILD_ACTION_CACHE_RESTORED? These are internal flags not designed to be set by users. Besides, they should have no impact on wrapper-validation-action.

@bigdaz
Member

bigdaz commented Feb 2, 2024

@TWiStErRob (or anyone else) : can you confirm that v2 is working for you? Just want to see if this issue is widespread.

@sergei-ivanov

Also, why are you setting GRADLE_BUILD_ACTION_SETUP_COMPLETED and GRADLE_BUILD_ACTION_CACHE_RESTORED? These are internal flags not designed to be set by users. Besides, they should have no impact on wrapper-validation-action.

I believe those are set by gradle/actions/setup-gradle@v3, which runs just before gradle/wrapper-validation-action@v2. We are not setting them manually anywhere.

@sergei-ivanov

Thanks for the further info. Can you please try to enable GitHub Actions debugging and see if that provides more clues?

I tried re-running the previously failed job a few times with debug logs and the action worked every time. But I may have been lucky because it only failed 1 time out of 10 before.

@jprosenbaum

v2 works intermittently; we have 2 separate Gradle steps in the same workflow, and the second one fails more frequently.
With debug enabled on the repo, I have not seen it fail.

@bigdaz
Member

bigdaz commented Feb 3, 2024

Also, why are you setting GRADLE_BUILD_ACTION_SETUP_COMPLETED and GRADLE_BUILD_ACTION_CACHE_RESTORED? These are internal flags not designed to be set by users. Besides, they should have no impact on wrapper-validation-action.

I believe those are set by gradle/actions/setup-gradle@v3, which runs just before gradle/wrapper-validation-action@v2. We are not setting them manually anywhere.

Ah, got it. The image you shared is GitHub describing the full environment, not the actual config you entered.

I'm guessing there's something different in Node 20 that causes the process to return with a non-zero exit value. I just don't have any idea what that might be.

@TWiStErRob
Contributor

TWiStErRob commented Feb 3, 2024

@TWiStErRob (or anyone else) : can you confirm that v2 is working for you?

I think it works for me

2 emails in the thread means a PR was opened and auto-merged. This is from last night.

image

Almost all my repos use this reusable workflow (and version):
https://github.com/TWiStErRob/github-workflows/blob/v1.4.2/.github/workflows/validate.yml

and it's a required check on branch protection rules.

@wjglerum

wjglerum commented Feb 5, 2024

Just tried with debugging enabled and saw it fail multiple times with exit code 1:

##[debug]Evaluating condition for step: 'run'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: run
##[debug]Loading inputs
##[debug]Loading env
Run gradle/wrapper-validation-action@v2
  with:
    min-wrapper-count: 1
    allow-snapshots: false
  env:
    GITHUB_ACTOR: github-merge-queue[bot]
    GITHUB_TOKEN: ***
    JAVA_HOME: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/17.0.10-7/x64
    JAVA_HOME_17_X64: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/17.0.10-7/x64
    GRADLE_BUILD_ACTION_SETUP_COMPLETED: true
    GRADLE_BUILD_ACTION_CACHE_RESTORED: true
##[debug]Node Action run completed with exit code 1
##[debug]Finished: run
##[debug]Evaluating condition for step: 'run'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> false
##[debug]Result: false
##[debug]Evaluating condition for step: 'run'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> false
##[debug]Result: false

Nothing about any failure or error in the logs.

@jprosenbaum

Adding the repository variable ACTIONS_RUNNER_DEBUG=true prevents the validation from failing

@deejay1
Contributor

deejay1 commented Feb 6, 2024

Locally, every run succeeds (with Node 20.x; I haven't tested specific patch releases). On GitHub runners and also on self-hosted runners it seems to fail, and debug doesn't show anything. Still trying to investigate...

@TWiStErRob
Contributor

TWiStErRob commented Feb 6, 2024

I had a quick look at the code and this stood out:

core.setFailed(error.message)

@bigdaz I have a feeling message can easily be "" or undefined, which would set the exit code to 1 but wouldn't explain what happened. This is the recommended code in the docs, but I think changing it to include the Error's toString or Error.stack would help in diagnosing this and future problems.

e.g. core.setFailed(error) (it internally detects Error and calls toString on it, but it doesn't show the stack), so probably best is: core.setFailed(`Unknown error was thrown: ${error.toString()}\n${error.stack}`)
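
A minimal sketch of the kind of handler being suggested, assuming the action's entry point wraps validation in a single try/catch (validateWrappers below is a made-up stand-in for the action's real validation step, not its actual API):

    import * as core from '@actions/core'

    // Hypothetical stand-in for the action's real validation step.
    async function validateWrappers(): Promise<void> {
      // ... download checksums, hash gradle-wrapper.jar files, compare ...
    }

    async function run(): Promise<void> {
      try {
        await validateWrappers()
      } catch (error) {
        if (error instanceof Error) {
          // Include toString() and the stack so an empty or undefined message no longer hides the cause.
          core.setFailed(`Unknown error was thrown: ${error.toString()}\n${error.stack ?? ''}`)
        } else {
          core.setFailed(`Unknown error was thrown: ${String(error)}`)
        }
      }
    }

    run()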

@deejay1
Contributor

deejay1 commented Feb 6, 2024

Error: Unknown error was thrown: AggregateError
AggregateError
    at internalConnectMultiple (node:net:1114:18)
    at internalConnectMultiple (node:net:1177:5)
    at Timeout.internalConnectMultipleTimeout (node:net:…)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7)

Refining further (the above is just to let you know that I'm working on it ;)

@deejay1
Contributor

deejay1 commented Feb 6, 2024

Ok, seems like this is just your ordinary timeout error, only hidden away...

Error: Unknown error was thrown: AggregateError
AggregateError
    at internalConnectMultiple (node:net:1114:18)
    at internalConnectMultiple (node:net:1177:5)
    at Timeout.internalConnectMultipleTimeout (node:net:…)
    at listOnTimeout (node:internal/timers:575:11)
    at process.processTimers (node:internal/timers:514:7)
Error: connect ETIMEDOUT 104.17.129.37:443,Error: connect ENETUNREACH 2606:4700::6811:8025:443 - Local (:::0),Error: connect ETIMEDOUT 104.17.128.37:443,Error: connect ENETUNREACH 2606:4700::6811…5:443

@bigdaz
Member

bigdaz commented Feb 6, 2024

@deejay1 @TWiStErRob Thanks for investigating. I'm not very familiar with the code in this action, but I agree with your assessments.

My understanding from your analysis is that this isn't a new failure, but that the old ETIMEDOUT failure now results in a silent failure with no reported error message. I guess something changed between Node 16 and Node 20 in the way this exception is thrown.

The fix might be as simple as replicating the error handling used in the setup-gradle action: https://github.com/gradle/actions/blob/main/sources/src/setup-gradle/main.ts#L26-L31.
We also might want to explicitly unpack AggregateError to reveal the nested causes.
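
A rough sketch of what that unpacking could look like, assuming the error surfaces in a single top-level catch; describeError is an illustrative name, not code from the action:

    function describeError(error: unknown): string {
      if (error instanceof AggregateError) {
        // AggregateError.message is often empty; list every nested cause instead.
        const nested = error.errors.map(e => (e instanceof Error ? e.message : String(e)))
        return `${error.message || 'AggregateError'}:\n  ${nested.join('\n  ')}`
      }
      if (error instanceof Error) {
        return `${error.message}\n${error.stack ?? ''}`
      }
      return String(error)
    }

    // Usage in the action's top-level handler:
    //   catch (error) { core.setFailed(describeError(error)) }

Something like this would turn the empty failure above into a message listing the individual ETIMEDOUT/ENETUNREACH causes.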

I'm on vacation this week, so any assistance finding a fix will be greatly appreciated.

@deejay1
Contributor

deejay1 commented Feb 6, 2024

As far as I managed to find out, Node.js 17 switched to using the addresses in the order the DNS resolver gives them out, so we can get IPv6 addresses before or between the IPv4 ones (see nodejs/node#39987). Of course, now that I've set up better logging, every run is good :/
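
For reference only, a sketch of how the pre-Node-17 ordering can be restored in Node itself (this is not something the action does; dns.setDefaultResultOrder is a standard Node API since v16.4/v17):

    import dns from 'node:dns'

    // Ask Node to return IPv4 addresses ahead of IPv6 ones again,
    // which avoids ENETUNREACH on runners without working IPv6 connectivity.
    dns.setDefaultResultOrder('ipv4first')

The same ordering can also be selected with the --dns-result-order=ipv4first command-line flag, if patching code isn't an option.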

deejay1 added a commit to deejay1/wrapper-validation-action that referenced this issue Feb 6, 2024
# What
Log AggregateError type, when multiple errors are returned
from HTTP client - fixes gradle#174

# Why
We would silently fail otherwise as error.message was empty
for the AggregateError exception.
bigdaz closed this as completed in 21bea8c Feb 6, 2024
@bigdaz
Member

bigdaz commented Feb 7, 2024

This should be fixed in v2.0.1. Please report if this issue (silent failure) continues.

@deejay1
Contributor

deejay1 commented Feb 7, 2024

OK, another possible cause for the issue may be described in nodejs/node#48145.

Edit: see also nodejs/node#51045 and nodejs/node#47822, which isn't backported to v20 yet :/
