Skip to content
This repository has been archived by the owner on Apr 18, 2022. It is now read-only.

[BUG] Poor performance using Metal on MacOS #1699

Closed
piedoom opened this issue Jun 12, 2019 · 35 comments
Closed

[BUG] Poor performance using Metal on MacOS #1699

piedoom opened this issue Jun 12, 2019 · 35 comments
Labels
After 0.16 pri: important Something other teams are relying on, or a low-level, critical piece of functionality. stale Issue had no recent activity team: rendering type: bug Something is not working as it should.
Projects

Comments

@piedoom
Copy link
Contributor

piedoom commented Jun 12, 2019

Description

When running certain Amethyst examples (as well as in games that use Rendy), 3D performance is poor on MacOS. 2D seems to be unaffected.

Reproduction Steps

  1. Use a MacOS
  2. Run any example with --release (preferably one that contains moving elements or an fps counter)
  3. It may take a bit for the performance to slow. Try running several examples.

What You Expected to Happen

Assets should load, display, and refresh at a reasonable rate

What Actually Happened

Assets loaded and displayed, but FPS is poor at ~1-5 fps.

Your Environment

MacOS mojave 10.14.3, Amethyst Master
GPU: Radeon Pro 455 2048 MB (Additionally, Intel HD Graphics 530 1536MB, but I suspect it is using the former)

Additional Context

~~This happens only with select examples, and may be due to a pass(?). For example, Debug Lines example runs great, and so does the animation example. Arc Ball Camera and Rendy example all have poor performance. ~~ this happens with all examples

@piedoom piedoom added type: bug Something is not working as it should. status: needs triage labels Jun 12, 2019
@piedoom piedoom changed the title [BUG] Poor performance using Metal / Intel on MacOS [BUG] Poor 3D performance using Metal / Intel on MacOS Jun 12, 2019
@piedoom piedoom changed the title [BUG] Poor 3D performance using Metal / Intel on MacOS [BUG] Poor 3D performance using Metal on MacOS Jun 12, 2019
@piedoom piedoom changed the title [BUG] Poor 3D performance using Metal on MacOS [BUG] Poor performance using Metal on MacOS Jun 13, 2019
@piedoom
Copy link
Contributor Author

piedoom commented Jun 13, 2019

Update: after some further testing, I can confirm it seems that this affects all amethyst projects on my computer, including 2D.

Oddly enough, the first few times I ran examples, they worked fine. After a while though, the performance took a hit for all subsequent examples. I'm not sure if this is a local issue or something like a throttling issue, etc. If anyone can test and repro let me know.

@Frizi
Copy link
Member

Frizi commented Jun 14, 2019

We did some further testing, and it seems like the slowdown is likely happening on buffer writes. Single update of view matrix took 100ms on it's own. But this turned out to not be stable actually. Also the issue sometimes randomly disappear and performance looks decent..

@jaynus
Copy link
Contributor

jaynus commented Jun 14, 2019

That sounds like it must be an API issue within HAL, or how it is performing buffer writes. 100ms for a uniform buffer write is beyond insane - something has to be happening incorrectly under the hood for that specific set of hardware.

@cgati
Copy link

cgati commented Jun 16, 2019

Attaching thread_profile_debug.json.gz at the request of @Frizi from our Discord discussion.

Also experienced the same issue on the release build, at no time was the build performant.
thread_profile_release.json.gz

Running on macOS 10.14.5 on tag v0.11.0 (e75dc2c) with a Radeon Pro 560X and Intel UHD Graphics 630

This was the "pong" example.

@Frizi
Copy link
Member

Frizi commented Jun 17, 2019

@cgati those profiles looks very similarily to what i've got from @piedoom before, i.e. most calls that write anything are super slow, but also not every single one of them. It looks really weird and kinda unpredictible. It might be a driver bug, but who knows. I suspect that the root cause is in how HAL translates writes to metal, but to confirm that we would have to reproduce it outside of amethyst. One way to do that would be to try running something on vulkan through hal portability, like DOTA or dolphin emulator for example (those two were tested quite a lot by gfx team, so should otherwise be quite stable). If those are smooth, then we have to find what we specificially do that makes it slow, and other engines are not doing.

Also another approach is to profile it in greater details, see what exact code is executing so slowly/what stalls the GPU if anything. Xcode's "Instruments" can help here, but i don't know how to get meaningful data from it yet. I also don't have a hardware to reproduce that, so can't really experiment on my own beyond just learning the tools on fully performant setup.

@treshenry
Copy link
Contributor

treshenry commented Jun 18, 2019

Another thread_profile.json.zip for data collection.

@treshenry
Copy link
Contributor

And a pong.log.zip with some potentially relevant gfx_backend_metal::window warns

@distransient distransient added this to Needs triage in Bugs Jun 18, 2019
@distransient distransient added pri: important Something other teams are relying on, or a low-level, critical piece of functionality. status: discussing team: rendering and removed status: needs triage labels Jun 18, 2019
@distransient distransient moved this from Needs triage to Important priority in Bugs Jun 18, 2019
@OliverHofkens
Copy link

I'm hitting the same issue in my game: the first couple of seconds everything works great, with the frame limiter doing its job just fine. After 2.5-3 seconds, amethyst_rendy::pass::flat2d:write starts taking around 100ms. Every now and then, things speed up again for a couple of frames.

I'm using just a DrawFlat2DTransparentDesc pass, with some 8x8 sprites.

My environment:

OS X 10.14.5
Radeon Pro 555X (or Intel UHD Graphics 630)
rustc 1.36.0 (a53f9df32 2019-07-03)
Amethyst 0.11.0

Profiling:
thread_profile.json.gz

@resinten
Copy link

I am encountering this issue with a project without any systems activated and also on the pong example in 0.11.1. It runs at probably 1-2 FPS. Here is my thread profile. I tried disabling the frame limiter and still see the issue with or without it.
thread_profile.json.gz

@Frizi
Copy link
Member

Frizi commented Jul 16, 2019

All of the thread profiles shows that gpu memory write operations are super slow. I suspect it is either something about rendy or hal that makes it super slow on those drivers. I need more granular profile to actually pinpoint the exact problem. Can somebody run this through xcode tools to gather function-level timings and see what takes so much time inside hal?

@Frizi
Copy link
Member

Frizi commented Jul 17, 2019

For anyone seeing this still, i'd really appreciate a profile. Here are steps how to gather a flamegraph on recent macos.

  1. clone amethyst and add this to main cargo.toml
[profile.release]
debug = true
  1. install cargo flamegraph
cargo install flamegraph
  1. in project directory, build 'rendy' example
cargo build --release --example rendy --features="animation gltf metal"
  1. copy executable to amethyst directory and run with profiling (sudo required because for macos dtrace special reasons)
cp ./target/release/examples/rendy .
sudo flamegraph ./rendy

keep it running for a short while and press ESC.

There should be a "flamegraph.svg" generated. If it failed with an error, try again.

Please zip and upload that generated file here.

@jaynus
Copy link
Contributor

jaynus commented Jul 17, 2019

pinging for visibility @OliverHofkens @piedoom @cgati @ohnoimdead

@OliverHofkens
Copy link

@Frizi Thanks for the clear steps!

The Rendy example acts really weird. It shows a window with the moving spheres for a second and then crashes with the following error log, complaining about missing assets:
rendy_example.log.zip

I've tried multiple times, the same thing happens every time. You probably can't learn much from the flamegraph in this state:
flamegraph.svg.zip

Other examples seem to work fine.

@OliverHofkens
Copy link

Not sure if this is representative but this is the flamegraph of my own project (details in my previous comment: #1699 (comment)):

flamegraph.svg.zip

@Frizi
Copy link
Member

Frizi commented Jul 17, 2019

@OliverHofkens Hard to tell from your project's flamegraph really. I can't see anything from your second graph. The errors in rendy example are because you have cloned the repository without git-lfs. Try installing it and doing git lfs fetch.

Alternatively, You can try running animation example instead, but please verify if it's still stuttering by activating an animation (press space or other keys).

@asgeir
Copy link

asgeir commented Jul 22, 2019

The rendy example just crashes so I ran the animation example (which was constantly complaining about "Failed to acquire animated component." but did at-least move the sphere around when I pressed space)
flamegraph.svg.zip

@OliverHofkens
Copy link

Hey again, sorry for the late reply. I fetched the assets with git lfs fetch, but the rendy example still crashes with similar errors.
The animation example works fine, no stuttering at all. It's constantly complaining about acquiring an animated component though, just like @asgeir says...

@vultix
Copy link
Contributor

vultix commented Aug 11, 2019

@Frizi Here is my flamegraph for the rendy example as you requested:
flamegraph.svg.zip
The framerate was terrible (as it always is for me) during the run. I see the same low framerate for the evoli project as well, so if you want a flamegraph of that project let me know!

In case you need device information:
2018 15-inch Macbook Pro
Processor: 2.9 GHz Intel Core i9
Memory: 32 GB 2400 MHz DDR4
Graphics: Radeon Pro 560X 4 GB, Intel UHD Graphics 630 1536 MB

@asgeir @OliverHofkens I initially had the same problem, git lfs fetch executed but the rendy example would crash. I had to completely remove the project and run git clone again to fix the issue.

@blakehawkins
Copy link

Looks like master is not building the rendy example at all:

$ cargo build --release --example rendy --features="animation gltf metal"
   Compiling amethyst v0.12.0 (/Users/blakeh/Documents/blakehawkins/amethyst)
error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:601:77
    |
601 |             PrefabLoaderSystemDesc::<ScenePrefabData>::default().build(&mut world),
    |                                                                             ^^^^^ not found in this scope

error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:606:45
    |
606 |             GltfSceneLoaderSystem::new(&mut world),
    |                                             ^^^^^ not found in this scope

error[E0425]: cannot find value `assets_directory` in this scope
   --> examples/rendy/main.rs:657:22
    |
657 |         assets_dir & assets_directory,
    |                      ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:660:9
    |
660 |         world,
    |         ^^^^^ not found in this scope

warning: unused import: `ReadExpect`
  --> examples/rendy/main.rs:16:13
   |
16 |             ReadExpect, ReadStorage, System, SystemData, World, Write, WriteStorage,
   |             ^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error[E0061]: this function takes 3 parameters but 4 parameters were supplied
   --> examples/rendy/main.rs:656:20
    |
656 |       let mut game = Application::new(
    |  ____________________^
657 | |         assets_dir & assets_directory,
658 | |         Example::new(),
659 | |         game_data,
660 | |         world,
661 | |     )?;
    | |_____^ expected 3 parameters

error[E0369]: binary operation `&` cannot be applied to type `std::path::PathBuf`
   --> examples/rendy/main.rs:657:20
    |
657 |         assets_dir & assets_directory,
    |         ---------- ^ ---------------- _
    |         |
    |         std::path::PathBuf
    |
    = note: an implementation of `std::ops::BitAnd` might be missing for `std::path::PathBuf`

error: aborting due to 6 previous errors

Some errors have detailed explanations: E0061, E0369, E0425.
For more information about an error, try `rustc --explain E0061`.
error: Could not compile `amethyst`.

To learn more, run the command again with --verbose.

@OliverHofkens
Copy link

I just upgraded my project to Amethyst v0.12.0 and it feels like this problem has become worse (although I can't prove it). Rendy also complains more in the terminal.

I tried to build the rendy example on the v0.12.0 tag, which compiles, but still fails after drawing 2 or 3 frames with: Failed to load sub asset Asset("prefab/rendy_example_scene.ron"). The file definitely exists in the correct location, as do all files imported or mentioned within. I tried to completely remove the project and start over as @vultix suggested, but that didn't seem to help.

Not sure this might help but this is the flamegraph of the first seconds before the crash:
flamegraph.svg.zip

@ekarademir
Copy link

I'm experiencing a similar issue.

Amethyst v0.12.0
macOS High Sierra v10.13.6
MacBook Pro 15 inch
Intel Core i7 3.1 GHz
Radeon Pro 560 and Intel HD Graphics 630
16 GB Memory

Following the profiling recipe posted by @Frizi above. Rendy example on master has problems so you'd need to checkout the latest stable tag:

git checkout -b v0.12.0

Here is the zipped flamegraph for rendy example
flamegraph.svg.zip

My issue emerged while following the pong tutorial and when I came to the step where we move the paddles. It was very sluggish. I blamed axes emulation but then got sluggish results with renderable example.

I've also tried other libraries that use metal, like gfx-rs (which amethyst relies on) also wgpu-rs, their examples run fine.

Interesting note. If I connect external screens, I get better performance. Unplug the screens, sluggish again.

Also while running CPU load didn't go beyond 2%

@Deedasmi
Copy link
Contributor

Deedasmi commented Sep 6, 2019

Same issue.

Rust: rustc 1.37.0 (eae3437df 2019-08-13)
Amethyst: Master
Mac:
Screen Shot 2019-09-06 at 9 22 47 AM

Both examples built in release mode with feature=metal

sudo flamegraph ./animation 2> animation.err > animation.log
animation.tar.gz

sudo flamegraph ./pong_tutorial_05 2> pong_05.err > pong_05.log
pong.tar.gz

Fairly short runs for the sake of clean graphs. Performance degrades rapidly the longer it runs. Rough eye-timing for Pong goes something like 6->0.2->0.05 FPS in 1 minute intervals. Game continues to run, and eventual frame updates show massive jumps in scores. Only render pass seems to fall behind. I stopped it early, but usually "No frame is available" starts getting output to the stdout.

@Frizi
Copy link
Member

Frizi commented Sep 6, 2019

@Deedasmi It would actually be benefitial to let it run a little bit longer, to see what really takes time once the slowdown happens. From your trace it seems like the most cpu time is spent spinlocking inside rayon. Second biggest is eventloop. There is also visible queue submit, but the time spent there is minimal.

This pattern is actually visible in most of the graphs submitted here. I don't think the bug is really about event loop, that contradicts our manual tracing logs that show much time being spent on buffer writes.

I think we need better way to profile this. Those flamegraphs are unfortunately way less useful than I anticipated. I don't know much about profiling tools available on macos though, all i can guess is that there is a right one inside "instruments" app.

@Deedasmi
Copy link
Contributor

Deedasmi commented Sep 6, 2019

Ran it again this morning and got same 3-5 FPS, then it completely stopped rendering at 10/10. The log shows the score counting up between numerous "Frame not available" messages, but the screen never updated.
pong_05_2.tar.gz

Looking into the instruments app now.

@5nefarious
Copy link

5nefarious commented Oct 5, 2019

@Frizi I've got a trace I captured using "Time Profiler". Would this be of any help?
rendy.trace.zip

Edit: Strangely, the demo runs just fine when using the game profiler, which is recording in windowed mode.

@0xflick
Copy link
Contributor

0xflick commented Oct 18, 2019

This seems partially fixed after updating to Catalina. Some of the examples that ran slow previously now work well (gltf).

However, now the rendy example completely freezes after the assets are loaded in. In the first 3 seconds (before anything other than the spheres are visible) the performance is actually better, but then as soon as the other assets appear on the screen the demo completely freezes.

@TheSpydog
Copy link

Oddly enough, the first few times I ran examples, they worked fine. After a while though, the performance took a hit for all subsequent examples. I'm not sure if this is a local issue or something like a throttling issue, etc. If anyone can test and repro let me know.

But this turned out to not be stable actually. Also the issue sometimes randomly disappear and performance looks decent..

most calls that write anything are super slow, but also not every single one of them. It looks really weird and kinda unpredictible. It might be a driver bug, but who knows.

I'm not an Amethyst user, but I wanted to pop in and confirm that I've seen this exact behavior in my own Metal apps. The performance is totally inconsistent and unpredictable between runs of an app, almost like it's flipping a coin to decide whether to be fast or not. This issue came up in my search for a solution.

All that to say, if this project is experiencing inconsistent slowdowns too, I think there's a good chance the issue lies with Metal drivers or thermal throttling. I've only encountered this issue on my MacBook Pro 2016 (Intel Iris Graphics 540) on macOS Catalina. I know some folks have seen this behavior on AMD cards, but has anyone managed to repro this on a desktop machine?

@ishitatsuyuki
Copy link
Contributor

Just FYI, a gfx dev suggested that this might get resolved with the new swapchain model.

@trsoluti
Copy link

An observation: I'm running my app on an iMac (Retina 5K, 27-inch, 2017). It's a simple app. It runs ragged (but no error) if I don't draw anything on the UI layer. The moment I add an entity with UIImage(Sprite(sprite_render)), the app crashes consistently with:

[WARN][gfx_backend_metal::window] Swapchain drawables are changed, unable to wait for 1
[WARN][gfx_backend_metal::window] Failed to get the drawable of frame 1
[WARN][gfx_backend_metal::window] Swapchain drawables are changed, unable to wait for 2
[WARN][gfx_backend_metal::window] Failed to get the drawable of frame 2
[WARN][gfx_backend_metal::window] Swapchain drawables are changed, unable to wait for 0
[WARN][gfx_backend_metal::window] Failed to get the drawable of frame 0
[WARN][gfx_backend_metal::window] No frame is available

@robert-w-gries
Copy link

@trsoluti This issue is seen by me and @jchitel as well.

Like mentioned above, the new swapchain model is believed to fix these issues, but I don't think we can't verify it until the rendy 0.5 PR #2040 is merged.

@eranimo
Copy link

eranimo commented Feb 1, 2020

Also getting very poor performance in OSX 10.14.6 (Mojave). In addition, I've noticed that the CPU usage for most examples goes above 200%, and some examples output "No frame is available"

@jaynus
Copy link
Contributor

jaynus commented Feb 1, 2020 via email

@CleanCut
Copy link
Member

Regarding rayon CPU usage--they've nearly completed a new scheduler which doesn't use so much CPU, but the PR hasn't had any activity since the end of May. rayon-rs/rayon#746

@stale
Copy link

stale bot commented Jan 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. Maybe it's time to revisit this?

@stale stale bot added the stale Issue had no recent activity label Jan 9, 2021
@ezpuzz
Copy link
Contributor

ezpuzz commented Jan 10, 2021

gfx-hal is updated to 0.5 on master and hopefully remedies this issue. closing.

@ezpuzz ezpuzz closed this as completed Jan 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
After 0.16 pri: important Something other teams are relying on, or a low-level, critical piece of functionality. stale Issue had no recent activity team: rendering type: bug Something is not working as it should.
Projects
No open projects
Bugs
  
Important priority
Development

No branches or pull requests