.github/workflows/ci.yml: improvements #4551

Open
tarasmadan opened this issue Mar 4, 2024 · 4 comments

Comments

@tarasmadan
Collaborator

tarasmadan commented Mar 4, 2024

Is your feature request related to a problem? Please describe.
The ARC v2-based CI runner configuration could be improved.

Describe the solution you'd like

  1. GCP autoscaling doesn't work. Let's monitor the progress of actions/runner-container-hooks#140 ("Docker Container Action Jobs failing to schedule on autoscaled cluster").
  2. The runners scale down to 0. Setting minRunners to 4, 6, or 8 would save 30 seconds on every job.
  3. A force-push creates a new testing request, but we don't cancel the requests already started for that PR. It makes sense to cancel them; the GitHub documentation describes how to do it.
  4. [Done, .github/workflows/ci.yml: require min 31 core per test job #4537] We don't limit the number of runners. Having more runners than instances may be the source of the timeout errors. It makes sense to set a fixed maxRunners.
  5. Caching doesn't work. The effective cache size is 0.
  6. [Done, it seems to be a side effect of using spot VMs] Some jobs finish with error 130.
  7. [Done, .github/workflows/ci.yml: workaround codecov limitation to detect environment #4558] Codecov requires additional configuration. It doesn't properly detect the "GitHub Actions" environment.
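
For item 3, a minimal sketch of how in-progress runs could be cancelled on a force-push, using the workflow-level `concurrency` key from the GitHub Actions workflow syntax. The group expression is an assumption chosen for illustration, not taken from the project's ci.yml.

```yaml
# Top-level key in .github/workflows/ci.yml (sketch, not the project's actual config).
concurrency:
  # One group per workflow and PR; falls back to the ref for non-PR events.
  # The exact group expression is an illustrative assumption.
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  # A new run in the same group cancels the run that is still in progress.
  cancel-in-progress: true
```

With a group like this, runs on branch pushes would also cancel each other per ref, so the expression may need adjusting if every push to the main branch should be tested.
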
@tarasmadan
Collaborator Author

Surprisingly, #4537 limited the number of runners per machine to 1.
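
For reference, the minRunners and maxRunners bounds mentioned in the issue description are set in the Helm values of the ARC runner scale set. A rough sketch, assuming the `gha-runner-scale-set` chart is in use; the numbers are illustrative and not the project's actual settings.

```yaml
# values.yaml for the gha-runner-scale-set Helm chart (sketch, illustrative numbers).
# Keep a few idle runners warm instead of scaling down to 0.
minRunners: 4
# Hypothetical upper bound; it should not exceed what the node pool can actually host.
maxRunners: 8
```

If each machine hosts only one runner, maxRunners would in effect match the maximum number of instances in the pool.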

@tarasmadan
Collaborator Author

Removed the spot machine pool and added regular machines.
As a result, I no longer see jobs failing with error 130.
Error 130 was most likely a preemption side effect.
To use spot VMs, ARC would have to restart the job in case of an error.
We could modify ci.yml to restart jobs, but I don't want to pollute the ci.yml file.

@tarasmadan
Collaborator Author

tarasmadan commented Mar 6, 2024

One more problem: codecov detects the "Local" environment instead of "GitHub Actions".
Normal log: "['info'] Detected GitHub Actions as the CI provider."
Current log: "['info'] Detected Local as the CI provider."

@tarasmadan
Collaborator Author

#4558 works around the codecov problem.
