
Safari runs missing for week of May 15th, 2023 #40085

Closed
DanielRyanSmith opened this issue May 19, 2023 · 23 comments
@DanielRyanSmith
Contributor

No stable Safari results have been available since May 15th. wpt.fyi run status is not showing any recent invalid Safari runs. Not quite sure of the reason here. Creating this issue for visibility.

CC @gsnedders

@gsnedders
Member

Azure Pipelines seems to have been having problems; the test jobs have often been failing, with the test agents going away or becoming unresponsive. There's not much we can do here.

@past past added the infra label May 23, 2023
@DanielRyanSmith
Contributor Author

As a heads-up, this is still a problem, and no stable aligned runs (test results on the same commit hash for Chrome, Edge, Firefox, & Safari) have been produced since May 13th.
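As an aside, the "aligned" view referenced here can be queried programmatically via wpt.fyi's results API. A minimal sketch, assuming the `/api/runs` endpoint and its `product`/`label`/`aligned` parameters; the helper name is hypothetical:

```python
from urllib.parse import urlencode

def aligned_runs_url(label, products=("chrome", "edge", "firefox", "safari")):
    """Build a wpt.fyi API query URL for the latest aligned runs.

    "Aligned" means every listed product has results for the same WPT revision.
    """
    params = [("product", p) for p in products]
    params += [("label", "master"), ("label", label), ("aligned", "true")]
    return "https://wpt.fyi/api/runs?" + urlencode(params)

print(aligned_runs_url("stable"))
```

When a browser's runs stop uploading (as Safari's did here), the aligned view dries up even though the other three browsers keep producing results.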

@foolip
Member

foolip commented May 30, 2023

The most recent run is now from May 23.

@foolip
Member

foolip commented May 30, 2023

Edge is failing too; I've filed #40300. But Edge failing shouldn't affect Safari results being uploaded, and vice versa.

@jgraham
Contributor

jgraham commented Jun 2, 2023

Edge is fixed, but Safari is still broken to the point that it's effectively breaking Interop scoring (we're getting maybe one update a week). I've submitted #40362 to see if retries work/help, but in any case we need to address the underlying problem. Do we have contacts on the Azure side who could investigate?
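(For context on what a retry mitigation could look like: the sketch below is an illustrative Azure Pipelines YAML fragment using the documented `retryCountOnTaskFailure` step property. Job and step names are hypothetical, and whether #40362 takes exactly this approach isn't shown here.)

```yaml
# Sketch: re-run a flaky test step when the agent dies mid-run.
jobs:
- job: safari_results
  pool:
    vmImage: macOS-13
  steps:
  - script: ./wpt run safari
    displayName: Run WPT against Safari
    retryCountOnTaskFailure: 2   # on failure, retry this step up to 2 more times
```

Retries paper over agent loss but don't fix it, which is why the comment above also asks about escalating to Azure directly.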

@gsnedders
Member

We're getting a lot of:

##[error]We stopped hearing from agent Azure Pipelines 10. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink/?linkid=846610

…and there's nothing to suggest that we're actually stopping the agent from running, or ending up with Safari spinning. In theory something could have changed such that we're now starving the agent process of CPU, but it seems unlikely that that would have started so suddenly.

@foolip
Member

foolip commented Jun 2, 2023

@mustjab can you help with a contact on the Azure Pipelines team if @gsnedders needs it for debugging this issue?

@mustjab
Contributor

mustjab commented Jun 2, 2023

The best way to start would be to file an issue with the azure-pipelines-agent team: https://github.com/microsoft/azure-pipelines-agent/issues/new/choose. I've looked through their open issues and didn't find any recent Mac issues, but this might be a related one: microsoft/azure-pipelines-agent#3994

@foolip
Member

foolip commented Jun 2, 2023

Thanks @mustjab!

@jgraham
Contributor

jgraham commented Jun 6, 2023

I've filed a new bug on Azure. Please let me know if I got any of the details wrong, or missed something important.

@mustjab
Contributor

mustjab commented Jun 19, 2023

@jgraham Looks like they've asked us to open issues with the agent team instead; did you get a chance to file it?

microsoft/azure-pipelines-agent#4313

@jgraham
Contributor

jgraham commented Jun 20, 2023

I filed actions/runner-images#7754

@past
Member

past commented Jul 11, 2023

@mustjab do you have any more context on the ongoing investigation that you can share?

@mustjab
Contributor

mustjab commented Jul 12, 2023

I checked with the agent team and they haven't looked at your issue yet, but they mentioned a similar report that they have already resolved. Are you still seeing this happen in WPT runs?

@past
Member

past commented Jul 13, 2023

It is still happening; here is one case from today.

@gsnedders
Member

I filed actions/runner-images#7754

From there, the internal issue has been resolved, and it seems like things have been much better over the last few days (comparable to where we were on the macos-12 images).

@gsnedders
Member

It's gone back to being less reliable, but as mentioned in the other issue:

fix is to be delivered around mid-august (reason for being better right now is not very clear)

@past
Member

past commented Aug 24, 2023

The fix seems to be deployed and working reasonably well now, shall we close this?

@gsnedders
Member

Alas, I've still been kicking them manually quite a bit, and I'm not sure it's working all that well. But I was planning to follow up sometime soon.

@gsnedders
Member

See this filtered view of builds; there's still a fair bit of red (and white!) there, even since the new images went live.

@past
Member

past commented Oct 26, 2023

@mustjab any further updates on the effort to fix this? The link from Sam's comment above still shows frequent failures.

@gsnedders
Member

@past A fair percentage of the failures are Edge, or caused by macOS bugs. https://wpt.fyi/runs?label=master&label=experimental&aligned and https://wpt.fyi/runs?label=master&label=stable&aligned both show plenty of recent aligned runs, so I'm not too concerned at this point. I vote we close this?

@past
Member

past commented Oct 27, 2023

Sounds good to me, we can open a new one if needed.

@past past closed this as completed Oct 27, 2023