[BUG] Poor performance using Metal on MacOS #1699
Comments
Update: after some further testing, I can confirm it seems that this affects all amethyst projects on my computer, including 2D. Oddly enough, the first few times I ran examples, they worked fine. After a while though, the performance took a hit for all subsequent examples. I'm not sure if this is a local issue or something like a throttling issue, etc. If anyone can test and repro let me know. |
We did some further testing, and it seems like the slowdown is likely happening on buffer writes. A single update of the view matrix took 100ms on its own. But this turned out not to be stable, actually. Also, the issue sometimes randomly disappears and performance looks decent. |
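For anyone who wants to collect the same kind of measurement, here is a minimal sketch of timing a single write with `std::time::Instant`. It is not the tracing code used above; `timed`, `write_uniform`, and `is_slow` are hypothetical names, and `write_uniform` is only a CPU-side stand-in for a real upload into mapped GPU memory.

```rust
use std::time::{Duration, Instant};

/// Runs `op` once and returns its result together with the wall-clock time it took.
fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

/// Hypothetical stand-in for a uniform-buffer upload: copies `src` into the
/// front of `dst`. A real renderer would write into mapped GPU memory instead.
/// Panics if `dst` is shorter than `src`.
fn write_uniform(dst: &mut [u8], src: &[u8]) {
    dst[..src.len()].copy_from_slice(src);
}

/// Returns true when a measured write took longer than the given budget.
fn is_slow(elapsed: Duration, budget: Duration) -> bool {
    elapsed > budget
}
```

Wrapping the real write call in `timed(|| ...)` and logging every result where `is_slow` returns true (for example with a 16ms budget) would show whether the slow writes are constant or intermittent.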
That sounds like it must be an API issue within HAL, or how it is performing buffer writes. 100ms for a uniform buffer write is beyond insane - something has to be happening incorrectly under the hood for that specific set of hardware. |
Attaching thread_profile_debug.json.gz at the request of @Frizi from our Discord discussion. I also experienced the same issue on the release build; at no time was the build performant. Running on macOS 10.14.5 on tag v0.11.0 (e75dc2c) with a Radeon Pro 560X and Intel UHD Graphics 630. This was the "pong" example. |
@cgati those profiles look very similar to what I got from @piedoom before, i.e. most calls that write anything are super slow, but not every single one of them. It looks really weird and kind of unpredictable. It might be a driver bug, but who knows. I suspect that the root cause is in how HAL translates writes to Metal, but to confirm that we would have to reproduce it outside of amethyst. One way to do that would be to try running something on Vulkan through HAL portability, like DOTA or the Dolphin emulator, for example (those two were tested quite a lot by the gfx team, so they should otherwise be quite stable). If those are smooth, then we have to find what specifically we do that makes it slow and that other engines are not doing. Another approach is to profile it in greater detail and see exactly what code is executing so slowly, or what stalls the GPU, if anything. Xcode's "Instruments" can help here, but I don't know how to get meaningful data from it yet. I also don't have the hardware to reproduce this, so I can't really experiment on my own beyond learning the tools on a fully performant setup. |
Another thread_profile.json.zip for data collection. |
And a pong.log.zip with some potentially relevant gfx_backend_metal::window warns |
I'm hitting the same issue in my game: the first couple of seconds everything works great, with the frame limiter doing its job just fine. After 2.5-3 seconds, I'm using just a My environment:
Profiling: |
I am encountering this issue with a project without any systems activated and also on the pong example in 0.11.1. It runs at probably 1-2 FPS. Here is my thread profile. I tried disabling the frame limiter and still see the issue with or without it. |
All of the thread profiles show that GPU memory write operations are super slow. I suspect it is something about either rendy or HAL that makes it super slow on those drivers. I need a more granular profile to actually pinpoint the exact problem. Can somebody run this through the Xcode tools to gather function-level timings and see what takes so much time inside HAL? |
For anyone still seeing this, I'd really appreciate a profile. Here are the steps to gather a flamegraph on recent macOS.
keep it running for a short while and press ESC. There should be a "flamegraph.svg" generated. If it failed with an error, try again. Please zip and upload that generated file here. |
pinging for visibility @OliverHofkens @piedoom @cgati @ohnoimdead |
@Frizi Thanks for the clear steps! The Rendy example acts really weird. It shows a window with the moving spheres for a second and then crashes with the following error log, complaining about missing assets: I've tried multiple times, the same thing happens every time. You probably can't learn much from the flamegraph in this state: Other examples seem to work fine. |
Not sure if this is representative but this is the flamegraph of my own project (details in my previous comment: #1699 (comment)): |
@OliverHofkens Hard to tell from your project's flamegraph really. I can't see anything from your second graph. The errors in rendy example are because you have cloned the repository without git-lfs. Try installing it and doing Alternatively, You can try running |
The rendy example just crashes, so I ran the animation example (which was constantly complaining about "Failed to acquire animated component." but did at least move the sphere around when I pressed space) |
Hey again, sorry for the late reply. I fetched the assets with |
@Frizi Here is my flamegraph for the rendy example as you requested: In case you need device information: @asgeir @OliverHofkens I initially had the same problem, |
Looks like master is not building the rendy example at all:
|
I just upgraded my project to Amethyst I tried to build the Not sure this might help but this is the flamegraph of the first seconds before the crash: |
I'm experiencing a similar issue. Amethyst v0.12.0 Following the profiling recipe posted by @Frizi above. Rendy example on master has problems so you'd need to checkout the latest stable tag:
Here is the zipped flamegraph for rendy example My issue emerged while following the pong tutorial and when I came to the step where we move the paddles. It was very sluggish. I blamed axes emulation but then got sluggish results with renderable example. I've also tried other libraries that use metal, like gfx-rs (which amethyst relies on) also wgpu-rs, their examples run fine. Interesting note. If I connect external screens, I get better performance. Unplug the screens, sluggish again. Also while running CPU load didn't go beyond 2% |
Same issue. Rust: rustc 1.37.0 (eae3437df 2019-08-13) Both examples built in release mode with feature=metal sudo flamegraph ./animation 2> animation.err > animation.log sudo flamegraph ./pong_tutorial_05 2> pong_05.err > pong_05.log Fairly short runs for the sake of clean graphs. Performance degrades rapidly the longer it runs. Rough eye-timing for Pong goes something like 6->0.2->0.05 FPS in 1 minute intervals. Game continues to run, and eventual frame updates show massive jumps in scores. Only render pass seems to fall behind. I stopped it early, but usually "No frame is available" starts getting output to the stdout. |
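To put numbers on this kind of degradation instead of estimating it by eye, a small helper like the one below can average FPS over a window of recorded frame durations. This is only a sketch: `average_fps` is a hypothetical name, and how the per-frame durations are collected (for example from the engine's per-frame delta time) is left as an assumption.

```rust
use std::time::Duration;

/// Average frames per second over a slice of per-frame durations.
/// Returns `None` when no time has elapsed (empty input or all-zero durations),
/// since the rate is undefined in that case.
fn average_fps(frame_times: &[Duration]) -> Option<f64> {
    let total: Duration = frame_times.iter().sum();
    if total.is_zero() {
        return None;
    }
    Some(frame_times.len() as f64 / total.as_secs_f64())
}
```

Logging the result once per fixed window (say, every 10 seconds) would give a series that can be compared directly between runs and machines.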
@Deedasmi It would actually be beneficial to let it run a little bit longer, to see what really takes time once the slowdown happens. From your trace it seems like most of the CPU time is spent spinlocking inside rayon. The second biggest is the event loop. There is also a visible queue submit, but the time spent there is minimal. This pattern is actually visible in most of the graphs submitted here. I don't think the bug is really about the event loop; that contradicts our manual tracing logs, which show a lot of time being spent on buffer writes. I think we need a better way to profile this. Those flamegraphs are unfortunately way less useful than I anticipated. I don't know much about the profiling tools available on macOS though; all I can guess is that there is a suitable one inside the "Instruments" app. |
Ran it again this morning and got same 3-5 FPS, then it completely stopped rendering at 10/10. The log shows the score counting up between numerous "Frame not available" messages, but the screen never updated. Looking into the instruments app now. |
@Frizi I've got a trace I captured using "Time Profiler". Would this be of any help? Edit: Strangely, the demo runs just fine when using the game profiler, which is recording in windowed mode. |
This seems partially fixed after updating to Catalina. Some of the examples that ran slow previously now work well (gltf). However, now the rendy example completely freezes after the assets are loaded in. In the first 3 seconds (before anything other than the spheres are visible) the performance is actually better, but then as soon as the other assets appear on the screen the demo completely freezes. |
I'm not an Amethyst user, but I wanted to pop in and confirm that I've seen this exact behavior in my own Metal apps. The performance is totally inconsistent and unpredictable between runs of an app, almost like it's flipping a coin to decide whether to be fast or not. This issue came up in my search for a solution. All that to say, if this project is experiencing inconsistent slowdowns too, I think there's a good chance the issue lies with Metal drivers or thermal throttling. I've only encountered this issue on my MacBook Pro 2016 (Intel Iris Graphics 540) on macOS Catalina. I know some folks have seen this behavior on AMD cards, but has anyone managed to repro this on a desktop machine? |
Just FYI, a gfx dev suggested that this might get resolved with the new swapchain model. |
An observation: I'm running my app on an iMac (Retina 5K, 27-inch, 2017). It's a simple app. It runs ragged (but no error) if I don't draw anything on the UI layer. The moment I add an entity with UIImage(Sprite(sprite_render)), the app crashes consistently with:
|
Also getting very poor performance in OSX 10.14.6 (Mojave). In addition, I've noticed that the CPU usage for most examples goes above 200%, and some examples output "No frame is available" |
CPU usage is an unrelated rayon design choice.
rayon-rs/rayon#642
|
Regarding rayon CPU usage--they've nearly completed a new scheduler which doesn't use so much CPU, but the PR hasn't had any activity since the end of May. rayon-rs/rayon#746 |
This issue has been automatically marked as stale because it has not had recent activity. Maybe it's time to revisit this? |
gfx-hal is updated to 0.5 on master and hopefully remedies this issue. closing. |
Description
When running certain Amethyst examples (as well as in games that use Rendy), 3D performance is poor on MacOS. 2D seems to be unaffected.
Reproduction Steps
--release
(preferably one that contains moving elements or an fps counter)
What You Expected to Happen
Assets should load, display, and refresh at a reasonable rate
What Actually Happened
Assets loaded and displayed, but FPS is poor at ~1-5 fps.
Your Environment
macOS Mojave 10.14.3, Amethyst master
GPU: Radeon Pro 455 2048 MB (Additionally, Intel HD Graphics 530 1536MB, but I suspect it is using the former)
Additional Context
~~This happens only with select examples, and may be due to a pass(?). For example, Debug Lines example runs great, and so does the animation example. Arc Ball Camera and Rendy example all have poor performance. ~~ this happens with all examples