cpu usage #111

Closed
coolit opened this issue Aug 11, 2020 · 22 comments
Labels
C-Performance A change motivated by improving speed, memory usage or compile times

Comments

@coolit

coolit commented Aug 11, 2020

For example, in the "button" example, CPU usage sometimes reaches about 10-20% in release mode, which is relatively high.
Is there any plan for future improvement? Thanks.
Edit: I'm on Windows 10 with a 2.40 GHz Intel CPU.

@joseluis

joseluis commented Aug 11, 2020

CPU usage is between 300% and 400% for me with the simplest examples (button, sprite...)

EDIT: Using Mint 20 on Linux 5.4 on Intel i7-8705G and dual video card Radeon RX Vega M GL + HD Graphics 630

@martin-fl

I confirm this.
On an i7-7700HQ CPU @ 2.80GHz × 8 with an iGPU, htop shows that, for example, the breakout example takes ~40% on all cores.

@skreborn

@coolit @joseluis @martin-fl What operating systems, hardware, and render backend do you all use?

@martin-fl

@skreborn
I'm using PopOS 20.04 with Linux 5.4 on an Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8 and Mesa Intel® HD Graphics 630 (KBL GT2). The render backend is Vulkan, I suppose.

@skreborn

I have anywhere between 50% and 70% usage on Windows 10 with an Intel® Core™ i7-9750H CPU @ 2.60GHz × 12 and an NVIDIA GeForce RTX 2080 Max-Q for button.

It's worth noting that release mode reduces that to slightly above 30%.

@MGlolenstine
Contributor

MGlolenstine commented Aug 11, 2020

It's not just low-end hardware
[two images not preserved]
EDIT:
Even in release mode, the CPU usage is at 60%.
Apparently the logic updates every "frame", and the frame rate isn't capped, so the faster the CPU, the more frames it runs.

We should lock logic frames to a sane number like 90 or 180.
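
One way to picture the frame cap suggested here is a loop that sleeps away whatever is left of each frame's time budget. The following is a minimal standard-library sketch, not Bevy code: `remaining_frame_budget`, `run_capped`, and the `target_fps` parameter are hypothetical names chosen for illustration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Time left in the current frame's budget, or zero if the frame already ran over.
/// `target_fps` is a hypothetical setting for this sketch, not a Bevy API.
fn remaining_frame_budget(target_fps: u32, elapsed: Duration) -> Duration {
    assert!(target_fps > 0, "target_fps must be positive");
    let budget = Duration::from_secs(1) / target_fps;
    budget.saturating_sub(elapsed)
}

/// Calls `update` `frames` times, sleeping after each call so that the loop
/// runs no faster than `target_fps` iterations per second.
fn run_capped<F: FnMut()>(target_fps: u32, frames: u32, mut update: F) {
    for _ in 0..frames {
        let start = Instant::now();
        update();
        // Sleeping yields the core to the OS instead of spinning until the next frame.
        thread::sleep(remaining_frame_budget(target_fps, start.elapsed()));
    }
}
```

With a target of 90, each iteration gets a budget of roughly 11.1 ms; an update that finishes in 1 ms would leave the thread asleep for the remaining 10 ms or so rather than immediately starting another frame.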

@martin-fl

martin-fl commented Aug 11, 2020

> It's worth noting that release mode reduces that to slightly above 30%.

Compiling in release mode does in fact reduce CPU usage to ~10% on the button and breakout example on my computer. Still a bit high for a single button though.

@Timidger

Release mode had the sprite example running at around 30% for me, based on htop. Bit disappointing such a simple example isn't performant, but the library looks like a great start so I feel silly even making the complaint. I figure this will be fixed quickly.

Here is a flamegraph; hopefully it's helpful.

@cart
Member

cart commented Aug 12, 2020

I expect there to be a ton of low-hanging fruit when it comes to optimization. So far the focus has been on API surface and building solid foundations. For example, right now Bevy is way more hash-ey per-frame than I would like it to be. We can fix most of the CPU getting eaten there by using the new "change detection" features in Bevy ECS.

On top of that, I think some persistent CPU usage is expected, as we (currently) use rayon under the hood and other projects have encountered similar behavior. I think the most important metric bevy can optimize is frame_time (which you can measure by adding the FrameTimeDiagnosticsPlugin and PrintDiagnosticsPlugin to your app).

This is the sort of issue that will never fully be resolved. There will always be optimization work to do.

I am inclined to close this issue (and reference it whenever a new one comes up). Feel free to open issues for specific cases where you have isolated slow parts of Bevy.

(Also, I won't close this for a short while. Feel free to respond here with your rationale if you think leaving this open is better.)
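
To make the frame_time metric mentioned above concrete, here is a minimal standard-library sketch of a rolling average over recent frame durations. It is not how Bevy's diagnostics plugins are implemented; `FrameTimeHistory` and its methods are hypothetical names used only for illustration.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical helper (not part of Bevy): keeps the most recent frame
/// durations and reports their mean.
struct FrameTimeHistory {
    samples: VecDeque<Duration>,
    capacity: usize,
}

impl FrameTimeHistory {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Stores one frame duration, dropping the oldest sample once full.
    fn record(&mut self, frame_time: Duration) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(frame_time);
    }

    /// Mean of the stored samples, or `None` if nothing has been recorded yet.
    fn average(&self) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let total: Duration = self.samples.iter().sum();
        Some(total / self.samples.len() as u32)
    }
}
```

An average frame time is a different signal from CPU percentage: a loop can show high CPU usage while still having short frame times, which is why the two are discussed separately in this thread.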

@karroffel added the C-Performance label on Aug 13, 2020

@aclysma
Contributor

aclysma commented Aug 16, 2020

This is a problem with rayon. The more cores you have, the more CPU you burn. My 3950x pins the CPU at 100% in the window_settings example. I've seen this behavior on several projects that use rayon.

rayon-rs/rayon#642

@aclysma
Contributor

aclysma commented Aug 17, 2020

As a workaround for anyone having an unpleasant time with this, you can use the environment variable RAYON_NUM_THREADS to limit the number of threads. I would recommend anyone with more than 8 cores set it to something <=8.

Just to be clear, this isn't "bevy is using a lot of CPU" it's "rayon doesn't properly idle threads that have no work to do". A 32 logical core CPU will pin at 100%, but when forced to use 8 logical cores, will run at about 13% utilization (despite that being 25% of the cores). That is still higher than it ought to be. I don't think bevy's examples generate enough workload to justify saturating multiple cores.
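
The figures in this thread mix two scales: whole-machine utilization (where 100% means every logical core is busy) and a per-core scale (where 100% means one full core, as the 300-400% reports appear to use). The sketch below converts between them, assuming the 13% figure above is measured across all 32 logical cores; the function names are hypothetical.

```rust
/// Converts whole-machine utilization (0-100%) into the equivalent number
/// of fully busy logical cores.
fn busy_core_equivalents(machine_percent: f64, logical_cores: u32) -> f64 {
    machine_percent / 100.0 * f64::from(logical_cores)
}

/// Converts whole-machine utilization into the per-core scale,
/// where 100% corresponds to one fully busy logical core.
fn per_core_percent(machine_percent: f64, logical_cores: u32) -> f64 {
    machine_percent * f64::from(logical_cores)
}
```

Under that assumption, 13% of a 32-core machine is about 4.16 cores' worth of work, or roughly 416% on the per-core scale. With 8 threads the ceiling is 25%, so the threads would be busy a little over half the time.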

@tbillington
Contributor

Would something like Unity's Application.targetFrameRate be related to this issue?

@multun
Contributor

multun commented Aug 17, 2020

That's probably fixed by rayon-rs/rayon#746.

@shmolyneaux

Rayon 1.4.0 has been released with the fix! rayon-rs/rayon#784

Release notes - "Implemented a new thread scheduler, RFC 5, which uses targeted wakeups for new work and for notifications of completed stolen work, reducing wasteful CPU usage in idle threads."
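
The general idea behind waking idle threads only when there is work, as opposed to having them poll, can be sketched with a condition variable. This is a simplified standard-library illustration, not rayon's scheduler; `Shared`, `State`, `spawn_worker`, `submit`, `shutdown`, and `demo` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Queue contents plus a flag telling the worker to exit once the queue is empty.
struct State {
    jobs: VecDeque<u64>,
    shutdown: bool,
}

/// Shared job queue and the condition variable used to wake the worker.
struct Shared {
    state: Mutex<State>,
    wake: Condvar,
}

/// Spawns one worker that sums the jobs it receives and returns the total on shutdown.
fn spawn_worker(shared: Arc<Shared>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut total = 0u64;
        let mut state = shared.state.lock().unwrap();
        loop {
            if let Some(job) = state.jobs.pop_front() {
                total += job;
            } else if state.shutdown {
                return total;
            } else {
                // Block until notified; the thread uses no CPU while idle.
                state = shared.wake.wait(state).unwrap();
            }
        }
    })
}

/// Queues one job and wakes the worker if it is sleeping.
fn submit(shared: &Shared, job: u64) {
    shared.state.lock().unwrap().jobs.push_back(job);
    shared.wake.notify_one();
}

/// Asks the worker to exit after draining any remaining jobs.
fn shutdown(shared: &Shared) {
    shared.state.lock().unwrap().shutdown = true;
    shared.wake.notify_all();
}

/// Submits the jobs 1 through 4 and returns the worker's total.
fn demo() -> u64 {
    let shared = Arc::new(Shared {
        state: Mutex::new(State { jobs: VecDeque::new(), shutdown: false }),
        wake: Condvar::new(),
    });
    let worker = spawn_worker(Arc::clone(&shared));
    for job in 1..=4 {
        submit(&shared, job);
    }
    shutdown(&shared);
    worker.join().unwrap()
}
```

Because the worker checks the queue while holding the lock before it waits, a notification sent between the check and the wait cannot be lost. A polling worker would instead re-check the queue in a loop, which is the kind of idle CPU burn described in this thread.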

@dseevr

dseevr commented Aug 28, 2020

I ran the breakout example on my old quad-core machine using master and also using master with rayon bumped to 1.4. Both were run in release mode:

Master CPU usage: 220%
Master + rayon 1.4 CPU usage: 55%

Huge improvement but still extremely high for only rendering a few rectangles. Hope to see even more improvements in the future! 🎉

@aclysma
Contributor

aclysma commented Aug 29, 2020

The extreme CPU usage requires more than 4 cores to reproduce.

I can roughly reproduce your results when forcing RAYON_NUM_THREADS=4:
Debug Rayon 1.3 limited to 4 cores: 150%
Release Rayon 1.3 limited to 4 cores: 50%

On a 3950x (32 logical cores) using Bevy 0.1.3:
Debug Rayon 1.3: 900%
Release Rayon 1.3: 450%

I saw no difference with rayon 1.4:
Debug Rayon 1.4: 900%
Release Rayon 1.4: 450%

Moving to the new task system I get:
Debug Task system: 110%
Release Task system: 40%

(For whatever reason I'm not seeing the all-cores-100% I was seeing the other night but I've seen it in other projects using rayon.)
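
To compare the before and after figures quoted in this thread on a common footing, the relative reduction can be computed as below. This is only a small arithmetic sketch; `percent_reduction` is a hypothetical helper name.

```rust
/// Relative drop from `before` to `after`, expressed as a percentage of `before`.
fn percent_reduction(before: f64, after: f64) -> f64 {
    assert!(before > 0.0, "before must be positive");
    (before - after) / before * 100.0
}
```

Applied to the numbers above, going from 220% to 55% is a 75% reduction, 900% to 110% is about 87.8%, and 450% to 40% is about 91.1%. The measurements come from different machines and setups, so these ratios are indicative rather than directly comparable.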

@CleanCut
Member

The new task system (to replace Rayon) was just merged in #384

@Moxinilian
Member

Moxinilian commented Sep 7, 2020

Can anybody who previously experienced high CPU usage confirm that the new task system fixes this issue?

@MGlolenstine
Contributor

I will test it out on my work computer (the one I ran it on before) in about 9 hours.

@CleanCut
Member

CleanCut commented Sep 8, 2020

Examples no longer saturate all cores on my computer. They use far less than a single core now. Seems good to me on mac. 👍

@MGlolenstine
Contributor

MGlolenstine commented Sep 8, 2020

The CPU usage while running the Button example has been fixed on my work computer as well.

[image not preserved]
(The CPU usage is mostly connected to other stuff I'm running.)
[image not preserved]
The usage is still high, but that could be due to an unoptimised example.

@cart
Member

cart commented Sep 8, 2020

Alrighty I think that's enough evidence to close this out. CPU usage will always be a moving target and there's still plenty of optimization potential, but we've made enough progress here that I think we can move from "generic CPU usage issue" to "specific issues for specific optimization cases".

@cart closed this as completed on Sep 8, 2020
hymm referenced this issue in hymm/bevy Jan 29, 2023
more trace cleanup: remove some references to stage