upstream request timeout #856

Closed
jayvdb opened this issue Mar 13, 2024 · 9 comments · Fixed by #860

Comments

jayvdb commented Mar 13, 2024

We run osv-scanner as part of our CI, and have been seeing lots of the following error for the last few days:

scan failed server response error: {"code":504,"message":"upstream request timeout"}

Probably this is related to google/osv.dev#2039

We are scanning a Cargo.lock of size 202485 with 758 entries in it, so it is a bit large.

AFAICS there is no way to configure the timeout. I am wondering whether that might be a useful addition, as we'd prefer it take longer rather than cause a failure in a large, complicated CI workflow.

@another-rex
Collaborator

We're currently looking into the timeout situation; there seems to be a recent uptick in the number of timeouts we are getting.

Can you share how long it took for your CI run to time out?

jayvdb commented Mar 13, 2024

This command is part of a much larger step in our CI (standard GitHub Actions), so we can't give a duration for the timeout.
I'll run it locally a few times to see if I can reproduce locally.

jayvdb commented Mar 13, 2024

Scanned /home/jayvdb/.../Cargo.lock file and found 758 packages
scan failed server response error: {"message":"upstream request timeout","code":504}


real	1m0.958s
user	0m0.055s
sys	0m0.018s

jayvdb commented Mar 13, 2024

Out of 100 attempts, 12 failed with that error.

@oliverchang
Collaborator

Thanks @jayvdb! Is there any way you'd be able to share the Cargo.lock with us? If it's something that's OK to share with us privately, you can email ochang AT google.com.

jayvdb commented Mar 13, 2024

Sent via email

@another-rex
Collaborator

The timeout issue should be mitigated now! Please test it out!

> AFAICS there is no way to configure the timeout. I am wondering whether that might be a useful addition, as we'd prefer it take longer rather than cause a failure in a large, complicated CI workflow.

The timeout is on the server side, and ideally no queries should come close to the 60-second limit, no matter how many packages are being queried (the query is split into chunks of 1000). We're looking at more options to prevent this from happening in the future.
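
(For readers unfamiliar with the batching, here is a minimal sketch of that chunking against OSV's public /v1/querybatch endpoint, assuming a chunk size of 1000. It is an illustration only, not osv-scanner's actual code, and it skips response parsing.)

```go
// Minimal sketch: split the package list into groups of 1000 and send each
// group to OSV's public /v1/querybatch endpoint. Response parsing is omitted.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Package struct {
	Name      string `json:"name"`
	Ecosystem string `json:"ecosystem"`
}

type Query struct {
	Package Package `json:"package"`
	Version string  `json:"version"`
}

func queryBatch(queries []Query) error {
	const chunkSize = 1000
	for start := 0; start < len(queries); start += chunkSize {
		end := start + chunkSize
		if end > len(queries) {
			end = len(queries)
		}
		body, err := json.Marshal(map[string][]Query{"queries": queries[start:end]})
		if err != nil {
			return err
		}
		resp, err := http.Post("https://api.osv.dev/v1/querybatch", "application/json", bytes.NewReader(body))
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("chunk %d-%d: server returned %d", start, end, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	// A 758-entry Cargo.lock fits in a single chunk; shown here with one package.
	q := Query{Package: Package{Name: "serde", Ecosystem: "crates.io"}, Version: "1.0.197"}
	if err := queryBatch([]Query{q}); err != nil {
		fmt.Println("scan failed:", err)
	}
}
```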

jayvdb commented Mar 13, 2024

We have had enough CI jobs run without this occurring for me to be happy that it is resolved.

As this is essentially a repeat of google/osv.dev#1363, I do believe something additional is needed.

A few ideas come to mind:

  • have a --verbose mode which emits a note about each chunk of 1000 being processed, and how long the chunk took to complete.
  • when that timeout error occurs in osv-scanner, report how long the application was running
  • add a timeout duration arg, so the end user can figure out how long is needed to get a success (a sketch of this idea follows the list).
  • print out a warning any time a chunk of 1000 takes longer than a hard-coded "reasonable" time period to complete (i.e. for 88 of my 100 uses, I would see warnings)
  • configurable retries
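
(Sketch of the timeout duration arg idea above, using a hypothetical --api-timeout flag that is not an existing osv-scanner option; it only illustrates putting a client-side deadline on an OSV API request.)

```go
// Hypothetical --api-timeout flag: the user chooses how long to wait for
// each OSV API request instead of relying on the server's fixed limit.
package main

import (
	"context"
	"flag"
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	timeout := flag.Duration("api-timeout", 60*time.Second,
		"how long to wait for each OSV API request before giving up")
	flag.Parse()

	ctx, cancel := context.WithTimeout(context.Background(), *timeout)
	defer cancel()

	// An empty batch query, just to show the deadline being applied; a real
	// request would carry the packages parsed from Cargo.lock.
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"https://api.osv.dev/v1/querybatch", strings.NewReader(`{"queries":[]}`))
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("scan failed:", err) // includes context deadline exceeded
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```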

another-rex commented Mar 18, 2024

I opened #860 to update the retry logic, and we are also doing a few things in the backend to reduce the number of timeouts.

There are some plans for adding verbosity levels to tell the user more about what is happening (the first part was completed in #727), though nothing concrete there yet.

another-rex added a commit that referenced this issue Mar 20, 2024
Fixes #856

Fixes a couple of things with our OSV API retry logic:

- Now also retry non-200 responses, in addition to retrying on network errors. (Is there a need to only retry responses >= 500? The current logic retries all non-200 responses.)
- Fix the batchquery code so that retries do not send empty data.
- Increase maxRetryAttempts from 3 to 4.
- Implement exponential backoff (actually polynomial/quadratic backoff).
- Correctly close the body after reading the error message.
- Implement retries for the determineversion API call.
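
(A rough sketch of the retry-with-backoff behaviour the commit describes: retry network errors and non-200 responses, cap attempts at 4, back off quadratically, and read and close the error body. This is an illustration, not the actual PR code.)

```go
// Retry an OSV API POST on network errors and non-200 responses, with
// quadratic backoff between attempts and a cap on the number of attempts.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"time"
)

const maxRetryAttempts = 4

func postWithRetry(url string, payload []byte) (*http.Response, error) {
	var lastErr error
	for attempt := 1; attempt <= maxRetryAttempts; attempt++ {
		resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
		if err == nil && resp.StatusCode == http.StatusOK {
			return resp, nil // caller closes the body
		}
		if err != nil {
			lastErr = err // network error
		} else {
			// Read and close the body so the connection can be reused,
			// keeping the server's error message for reporting.
			msg, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			lastErr = fmt.Errorf("server response error: %s", msg)
		}
		if attempt < maxRetryAttempts {
			// Quadratic backoff between attempts: 1s, 4s, 9s.
			time.Sleep(time.Duration(attempt*attempt) * time.Second)
		}
	}
	return nil, fmt.Errorf("after %d attempts: %w", maxRetryAttempts, lastErr)
}

func main() {
	resp, err := postWithRetry("https://api.osv.dev/v1/querybatch", []byte(`{"queries":[]}`))
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```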