Intermittent failures during Post Setup Python step for MacOS #857

Open
2 of 5 tasks
andrewkho opened this issue May 1, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@andrewkho

andrewkho commented May 1, 2024

I'm new to GitHub Actions and I'm having trouble understanding this failure. Apologies if this isn't the right way to flag the issue.

Description:
Post Setup Python fails intermittently with macos-latest. On successful runs, cleanup and shutdown are much slower than on Windows and Linux.

Action version:
Tested with Actions v3/v4 and setup-python v4/v5

Platform:

  • Ubuntu
  • macOS
  • Windows

Runner type:

  • Hosted
  • Self-hosted

Tools version:
3.8, 3.9, 3.10

Repro steps:

The original workflow yaml is here: https://github.com/pytorch/data/blob/main/.github/workflows/stateful_dataloader_ci.yml

In this failed run I tried updating actions from v3 -> v4 and setup-python from v4 -> v5, and it still exhibits the behaviour:
Example of failed run: https://github.com/pytorch/data/actions/runs/8903946672/job/24452473208?pr=1249
Failed retry with debug logs: https://github.com/pytorch/data/actions/runs/8903946672/job/24475084388

##[debug]Evaluating condition for step: 'Post Setup Python 3.9'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Post Setup Python 3.9
##[debug]Loading inputs
##[debug]Evaluating: matrix.python-version
##[debug]Evaluating Index:
##[debug]..Evaluating matrix:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'python-version'
##[debug]=> 3.9
##[debug]Result: 3.9
##[debug]Evaluating: (((github.server_url == 'https://github.com') && github.token) || '')
##[debug]Evaluating Or:
##[debug]..Evaluating And:
##[debug]....Evaluating Equal:
##[debug]......Evaluating Index:
##[debug]........Evaluating github:
##[debug]........=> Object
##[debug]........Evaluating String:
##[debug]........=> 'server_url'
##[debug]......=> 'https://github.com/'
##[debug]......Evaluating String:
##[debug]......=> 'https://github.com'
##[debug]....=> true
##[debug]....Evaluating Index:
##[debug]......Evaluating github:
##[debug]......=> Object
##[debug]......Evaluating String:
##[debug]......=> 'token'
##[debug]....=> '***'
##[debug]..=> '***'
##[debug]=> '***'
##[debug]Expanded: ((('https://github.com/' == 'https://github.com') && '***') || '')
##[debug]Result: '***'
##[debug]Loading env
Post job cleanup.
##[debug]Re-evaluate condition on job cancellation for step: 'Post Setup Python 3.9'.

Expected behavior:
Expect Post Setup-Python to finish quickly and succeed.

Actual behavior:
Post Setup-Python hangs and marks the run as failed.

@HarithaVattikuti
Contributor

Hello @andrewkho
Thank you for creating this issue. We will investigate it and get back to you as soon as we have some feedback.

@aparnajyothi-y

Hello @andrewkho, we investigated the issue and were not able to reproduce it with actions/setup-python@v3, v4, or v5. Please find the screenshots for reference.
We noticed in the run linked in this issue that the post checkout job is not terminating as expected. This might be due to an external service not responding, causing the job to hang.
Moreover, the workflow provided interacts with a few external services:
1. PyTorch Channels: The step "Get PyTorch Channel" determines the URL for either the test or nightly PyTorch builds hosted on "https://download.pytorch.org/". This URL is later used in the "Install dependencies" step to install PyTorch.
2. GitHub: The step "Check out source repository" uses the actions/checkout@v4 GitHub Action to fetch the source code of the repository.
3. PyPI (Python Package Index): Several steps in the workflow install Python packages using pip, which fetches packages from PyPI.
Any of these could potentially cause a hang if the service is down or there is an issue with the package or tool being fetched.
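
One rough way to rule these services in or out would be to time a request to each of them from the runner, for example in an extra workflow step. The sketch below is only an illustration: the exact URLs probed, the 10-second timeout, and the names `SERVICES`, `probe`, and `report` are assumptions and are not taken from the workflow in question. It checks basic reachability only and says nothing about what the post step itself is doing.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

# Hosts named in the list above. The specific URLs are assumptions
# chosen for illustration, not values read from the workflow file.
SERVICES: Dict[str, str] = {
    "PyTorch channels": "https://download.pytorch.org/",
    "GitHub": "https://github.com/",
    "PyPI": "https://pypi.org/simple/",
}


def probe(url: str, timeout: float = 10.0,
          opener: Callable = urllib.request.urlopen) -> Optional[float]:
    """Return the seconds taken to get any response, or None if the
    request failed at the network level or timed out."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)  # read one byte so the body transfer has started
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still reachable.
        pass
    except OSError:
        # Covers URLError, connection failures, and socket timeouts.
        return None
    return time.monotonic() - start


def report() -> None:
    """Print one line per service with the elapsed time or a failure note."""
    for name, url in SERVICES.items():
        elapsed = probe(url)
        status = "unreachable or timed out" if elapsed is None else f"{elapsed:.2f}s"
        print(f"{name}: {status}")
```

Calling `report()` in a step just before the job ends would show whether any of the three hosts is slow or unreachable from the macOS runner at that moment. The `opener` parameter exists only so the function can be exercised without network access.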

[Four screenshots attached]

Please let us know if any further clarification is needed.

@andrewkho
Author

Hi @aparnajyothi-y, thanks for trying to repro. I think the issue is that there is no clear error message or way to debug this, as far as I can tell. For example, I have no idea what the container is doing or whether the failure is due to a timeout. If it is a timeout, how long is it? Or is it an OOM? It's really difficult to debug without anything to go on.

@aparnajyothi-y

Hello @andrewkho, to help investigate the error message, could you please enable debug logs and run the workflow? You can follow the steps in this document to do so. Once done, kindly update the link to the repository with the debug logs included. This will assist in further inspection of the setup-python issue mentioned above, as we're currently unable to replicate the error.
