Windows Server 2022 builds are taking 4x longer than on Windows Server 2019 #5166

Open · 1 of 7 tasks
xt0rted opened this issue Mar 2, 2022 · 19 comments
Labels: bug report, investigate, OS: Windows

Comments

xt0rted commented Mar 2, 2022

Description

Over the last 2 weeks I've noticed runs on the windows-2022 image taking 4x longer (and sometimes more) than on the windows-2019 image. In all of these instances nothing has changed between runs aside from the version of Windows being used.

One of the examples I'm looking at from about 2 weeks ago took 7 minutes to complete, while a run from a day or two ago took 24 minutes. This increase in time is consistent across all new runs in our org.

In another repo I downgraded our build scripts & VM to Windows Server 2019 and the time went from 19 minutes down to about 5.5 minutes. I'm unable to permanently move back to 2019 though, because new builds depend on VS 2022.

[screenshot: workflow run times]

While testing something else I ran a simple checkout & build of an empty .net project and the build times for ubuntu-latest (25s) and windows-2019 (1m 12s) were about what I'd expect, while the windows-2022 image clocked in at 8m 14s.
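
Something along these lines is enough to see the gap (a sketch, not the exact workflow from the runs above; the project name is arbitrary):

```yaml
name: image-timing-test

on: [push, workflow_dispatch]

jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-2019, windows-2022]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      # Create and build an empty console project with the preinstalled SDK
      - run: dotnet new console -o TimingTest
      - run: dotnet build TimingTest
```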

All of this was originally reported to support in ticket 1521042 but I was told to open an issue here instead. That ticket has org/repo names and links to each run.

Virtual environments affected

  • Ubuntu 18.04
  • Ubuntu 20.04
  • macOS 10.15
  • macOS 11
  • Windows Server 2016
  • Windows Server 2019
  • Windows Server 2022 ✓

Image version and build link

None of the repos are public but these are the Run Ids for each.

Runs in the image:

| Run | VM | Version | Time |
| --- | --- | --- | --- |
| 1860104886 | windows-2019 | 20220207.1 | 6m 51s |
| 1912616149 | windows-2022 | 20220220.1 | 23m 59s |

From another similar repo:

| Run | VM | Version | Time |
| --- | --- | --- | --- |
| 1912952429 | windows-2019 | 20220223.1 | 5m 25s |
| 1912601548 | windows-2022 | 20220220.1 | 19m 8s |

Test repo:

| Run | VM | Version | Time |
| --- | --- | --- | --- |
| 1912879771 | ubuntu-latest | 20220220.1 | 25s |
| 1912879232 | windows-2019 | 20220223.1 | 1m 12s |
| 1912864061 | windows-2022 | 20220220.1 | 8m 14s |

Is it regression?

No response

Expected behavior

Run times on par with Windows 2019.

Actual behavior

Run times up to 4-5x longer than Windows 2019

Repro steps

Run a .NET Full Framework build on Windows 2019 and 2022; the 2022 runs should take significantly longer.
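
A minimal sketch of such a workflow, assuming a typical Full Framework solution (the solution name and action versions are placeholders, not taken from the affected repos):

```yaml
on: push

jobs:
  build:
    strategy:
      matrix:
        os: [windows-2019, windows-2022]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      # Put MSBuild on the PATH (it is preinstalled on both images)
      - uses: microsoft/setup-msbuild@v1
      - run: nuget restore MySolution.sln
      - run: msbuild MySolution.sln /p:Configuration=Release /m
```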

xt0rted (Author) commented Mar 3, 2022

My build times just dropped to 5-10 minutes. All that changed was that I downgraded the .NET 6 SDK to .NET 5. I doubt that's related, but I am seeing times on par with what I used to get.
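
A sketch of what such a downgrade looks like, assuming the SDK is installed via actions/setup-dotnet (it could equally be a global.json pin):

```yaml
on: push

jobs:
  build:
    runs-on: windows-2022
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-dotnet@v1
        with:
          # was: dotnet-version: 6.0.x
          dotnet-version: 5.0.x
      - run: dotnet build
```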

miketimofeev (Contributor) commented Mar 17, 2022

@xt0rted hi! Is it possible for you to provide minimal steps to reproduce the issue? We can run some experiments if we have reproducible examples.

xt0rted (Author) commented Mar 18, 2022

@miketimofeev I can't reliably reproduce this. I see the issue in a couple of my private repos, but it seems to come and go now. It's only happening with the Windows 2022 workflows though; downgrading to 2019 where possible fixes it, and the build times remain consistent. I'll keep seeing if there's a way to reproduce this.

About 2 weeks ago I had a build on windows-latest which was taking anywhere from 6-10 minutes; I switched to Ubuntu and it dropped to 58 seconds. I know Windows takes longer, but it shouldn't be that much longer. I can dig up some details on that if you'd like, but the only change made at that point was switching the OS.

mikhailkoliada (Member) commented:

@xt0rted Hello! Thanks, we will take a look!

mikhailkoliada added the OS: Windows and investigate labels and removed the needs triage label on Mar 21, 2022
miketimofeev (Contributor) commented:

@xt0rted I'm afraid we can't proceed with the investigation without a reproducible example, as we don't have access to the support tickets.

xt0rted (Author) commented Mar 25, 2022

Not sure if it's related, but here's an instance where the Windows 2022 build took 3x longer than the Windows 2019 build: https://github.com/xt0rted/dotnet-rimraf/actions/runs/2041805895.

I don't have a Windows 2019 build to compare to for this one, but you can see the Windows 2022 build is significantly longer than the macOS or Ubuntu builds: https://github.com/xt0rted/dotnet-run-script/actions/runs/2041939525.

rilysh commented Apr 25, 2022

I'm having the same issue. On Windows Server 2022, builds take quite a bit longer than on 2019.

lowlydba commented Jun 8, 2022

My Windows 2022 builds are so slow that they're repeatedly timing out and/or having issues provisioning resources. Works fine with 2019.

https://github.com/lowlydba/lowlydba.sqlserver/actions/runs/2458069276

al-cheb (Contributor) commented Jun 8, 2022

@lowlydba, is it possible to replace the bash shell with shell: "wsl-bash {0}" for Windows Server 2022 in your CI?
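
In workflow terms the suggestion amounts to something like the following (a sketch; it assumes WSL is provisioned by a setup step such as Vampire/setup-wsl, which registers the wsl-bash wrapper, and the script name is a placeholder):

```yaml
on: push

jobs:
  test:
    runs-on: windows-2022
    steps:
      - uses: actions/checkout@v3
      # Assumption: WSL and the wsl-bash shell wrapper come from a setup step like this
      - uses: Vampire/setup-wsl@v1
      - name: Run tests
        # was: shell: bash
        shell: wsl-bash {0}
        run: ./run-tests.sh   # placeholder script
```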

lowlydba commented Jun 8, 2022

D'oh! I'm curious how that worked in the first place. Unfortunately, after fixing it, it's still erroring out.

al-cheb (Contributor) commented Jun 9, 2022

> D'oh! I'm curious how that worked in the first place. Unfortunately, after fixing it, it's still erroring out.

I am able to reproduce this issue on my self-hosted agent. Currently we are getting a BSOD on Windows Server 2022 with WSLv1. We are planning to investigate whether we can migrate from WSLv1 to WSLv2. I will let you know as soon as I find something.

dmitry-shibanov (Contributor) commented:

Hello @xt0rted. Sorry for the late response. Does the issue reproduce with the new images?

xt0rted (Author) commented Sep 4, 2022

@dmitry-shibanov I haven't been working on the projects where I first encountered this (the ones in the original screenshots), but I do still see slower Windows 2022 times in one of my public projects. Windows is always the slowest, in some cases by 6x or more, for a pretty simple build & test workflow.

With the announcement of larger runners, are there any specs available for what type of hardware they're using? Specifically, how does I/O compare on the larger runners vs. the existing ones?

xt0rted (Author) commented Sep 13, 2022

@dmitry-shibanov I'm revisiting this and testing Actions vs. DevOps at work. Here's another very consistently reproducible example of the difference in Windows 2022 times vs. other platforms.

[screenshot: Windows 2022 run times vs. other platforms]

The Ubuntu runs are failing due to a unit test, but they're running to completion so it's still a fair comparison, and fixing the test doesn't yield different results.

While working on this it was pointed out to me that since it costs 2x per minute for the Windows runners, and they take 2-4x longer to run, you end up paying 4-8x for Windows over Ubuntu in this example. That's pretty ridiculous and makes this a really hard sell over sticking with a larger custom VM on DevOps.

Piedone commented Sep 21, 2022

For those interested, I did some investigation of various performance optimization options (NTFS settings, virtual drives with different file systems), including the necessary scripts and measurements, here: Lombiq/GitHub-Actions#32. Spoiler alert: I didn't find anything that would be usable and would actually help.
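
To give a flavor of the NTFS-level tweaks measured there (a sketch only; per the findings in that issue, none of this helped on the hosted images):

```yaml
on: push

jobs:
  build:
    runs-on: windows-2022
    steps:
      - name: NTFS tweaks (experimental)
        shell: pwsh
        run: |
          # Disable 8.3 short-name generation and last-access timestamp updates
          fsutil behavior set disable8dot3 1
          fsutil behavior set disableLastAccess 1
      - uses: actions/checkout@v3
      # ... normal build steps follow
```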

AHuusom (Contributor) commented Sep 30, 2022

Does it have anything to do with .NET Core SDK 6.0.401 only being installed on 2022 and not on 2019?
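
One way to rule that out is to pin the exact same SDK on both images instead of relying on what is preinstalled (a sketch; 6.0.401 is the version mentioned above):

```yaml
on: push

jobs:
  build:
    strategy:
      matrix:
        os: [windows-2019, windows-2022]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-dotnet@v2
        with:
          dotnet-version: 6.0.401
      - run: dotnet build
```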

Shane32 commented Apr 26, 2023

Windows runners are always painfully slow. I use both Windows and Linux runners a lot, and Windows is always much much slower. Take a look at this public repository's workflows -- click on any run:

https://github.com/graphql-dotnet/server/actions/workflows/test.yml

You'll see that runs on Windows always take nearly double the time. Here's another repository:

https://github.com/graphql-dotnet/graphql-dotnet/actions/workflows/test-code.yml

Sometimes the Windows runners do run about as fast as the Linux runners, but more often than not they take nearly twice as long. For example, see this run -- 6 min on Linux vs. 12 min on Windows -- and note that the Ubuntu workflow has more build steps!

https://github.com/graphql-dotnet/graphql-dotnet/actions/runs/4791811277/jobs/8522652049

> While working on this it was pointed out to me that since it costs 2x per minute for the Windows runners, and they take 2-4x longer to run, you end up paying 4-8x for Windows over Ubuntu in this example. That's pretty ridiculous and makes this a really hard sell over sticking with a larger custom VM on DevOps.

Totally agree. It would at least be better if it ran equally fast and they just charged 4x.

fortuna added a commit to Jigsaw-Code/outline-sdk that referenced this issue May 2, 2023
This is to make sure we stay cross-platform.

Note that the Windows build and test is [a lot slower than Linux and macOS](actions/runner-images#7320). Because of that, I had to change the wait in one of the tests.

Also, I'm using windows-2019, which is at least 2x faster than windows-latest: actions/runner-images#5166

On macOS, I ran into an issue where CloseRead on a connection that was already fully read causes an error, so I needed to update the test there too. It was a huge pain to figure out what was going on.
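
The workaround here boils down to pinning the runner label instead of the floating alias, e.g. (job name and test command are placeholders):

```yaml
on: push

jobs:
  test-windows:
    # was: runs-on: windows-latest (which resolves to windows-2022)
    runs-on: windows-2019
    steps:
      - uses: actions/checkout@v3
      - run: go test ./...   # placeholder test command
```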
ramosbugs added a commit to unflakable/unflakable-javascript that referenced this issue Jul 3, 2023
ramosbugs added a commit to unflakable/unflakable-javascript that referenced this issue Jul 4, 2023
Danielku15 commented:

I can also see significantly worse performance on Windows agents in my new project, where I build in a quite wide matrix of configurations:
https://github.com/CoderLine/alphaSkia/actions/workflows/build.yml

Just some numbers from my pipeline showing the differences:

  • Cloning my repository: windows-latest 38s, ubuntu-latest 15s, macos-latest 25s
  • Compiling: windows-latest 2m 30s, ubuntu-latest 50s, macos-latest 35s

Or also here:
[screenshot of job durations]

When looking into the steps, they all take significantly longer on Windows: cloning, building, uploading artifacts, CPU-bound tasks.

1kastner added a commit to 1kastner/conflowgen that referenced this issue Dec 14, 2023
Try out windows-2019 instead of windows-latest (which should resolve to windows-2022)

also see actions/runner-images#5166
1kastner added a commit to 1kastner/conflowgen that referenced this issue Dec 14, 2023
1kastner added a commit to 1kastner/conflowgen that referenced this issue Dec 14, 2023
* Explain how to deal with the prepared databases as a contributor

* Add helper script to download all sqlite databases

* Drop dangerous test that gamble with floating point accuracy

* Stop tracking SQLITE files as github is expensive

* Set version number to 2.1.1

* Move JupyterLab out of dev, simplify jupyterlab target

* Restrict several package versions and explicitly mention sphinx-tabs to improve pip version resolution

* Use windows-2019 instead of windows-latest in GitHub CI - see speed issues in actions/runner-images#5166
alexciarlillo commented:

This may just be anecdotal, but my hope is that it spurs an idea for someone, or that some other person searching for answers will see this. We run CI tests which detect hotkey presses from a native Node module. By design, when the native module detects an event it passes the event to a JS thread's callback to handle it using libuv. When we moved to windows-2022 these stopped functioning. I tracked it down to extreme delays in libuv's uv_async_send: the targeted threads were not waking up to receive events for 10+ seconds.

Since we require VS 2022 for our newer builds, we were forced to go with a self-hosted agent running Windows 11 for now, which does not exhibit this issue. The issue was also not present in Windows 10 builds we tried and has not appeared on any of our clients in the wild. I had to stop digging into the issue once we had a workaround, but it seems like the overall performance issues on these machines could be tied to the behavior I was seeing.
