Expensive-looking oddities in CUDA profile #146

Open
inducer opened this issue Jul 5, 2021 · 2 comments
Comments

@inducer
Owner

inducer commented Jul 5, 2021

Running this benchmark based on wave-op-mpi.py on 1c44c4b with the command

PYTHONHASHSEED=17 PYOPENCL_TEST=port:nvid setarch -R numactl -C 2 -m 0 nvprof -f -o yoink.nvvp python -O wave-op-mpi.py --dim=3 --order=4   

on dunkel gives me the following profile in Nvidia's visual profiler:
[Screenshot: the resulting Nvidia Visual Profiler timeline, with the problem areas circled]

There are at least two things wrong here (both circled):

  • There are a bunch of big gaps where nothing seems to be happening. Why?
  • Every now and then, a cuLaunchKernel seems to take a very long time. Why?
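To pin down where the two kinds of stalls occur, one could post-process the recorded timeline instead of eyeballing it. Below is a minimal sketch that assumes kernel execution intervals and `cuLaunchKernel` call intervals have already been extracted from the profile as `(start, end)` pairs. Exporting them from the `.nvvp` file is not shown. The function names and the `min_gap`/`factor` thresholds are hypothetical choices, not part of nvprof or any other tool:

```python
from statistics import median
from typing import List, Tuple

# (start, end) timestamps, all in the same unit (e.g. microseconds).
Interval = Tuple[float, float]


def find_idle_gaps(kernels: List[Interval], min_gap: float) -> List[Interval]:
    """Return stretches longer than min_gap during which no kernel was running."""
    gaps: List[Interval] = []
    if not kernels:
        return gaps
    ordered = sorted(kernels)
    busy_until = ordered[0][1]
    for start, end in ordered[1:]:
        if start - busy_until > min_gap:
            gaps.append((busy_until, start))
        # Kernels may overlap (e.g. on different streams), so track the latest end.
        busy_until = max(busy_until, end)
    return gaps


def find_slow_launches(launches: List[Interval], factor: float = 10.0) -> List[Interval]:
    """Return API calls whose duration exceeds `factor` times the median duration."""
    if not launches:
        return []
    typical = median(end - start for start, end in launches)
    return [(s, e) for s, e in launches if e - s > factor * typical]


kernels = [(0, 10), (12, 20), (500, 510), (515, 520)]
launches = [(0, 5), (20, 25), (40, 300), (310, 315)]
print(find_idle_gaps(kernels, min_gap=50))  # [(20, 500)]
print(find_slow_launches(launches))         # [(40, 300)]
```

Cross-referencing the two lists might help tell the symptoms apart by showing whether the gaps coincide with the slow launches.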

Curiously, there seem to be periods that don't suffer from this:
[Screenshot: a stretch of the same profile without these stalls]

If we could fix these two types of stalls, I suspect our performance story would look quite a bit different.
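To put a rough number on "quite a bit different", the idle gaps give a simple upper bound. Suppose a fraction f of the wall time is device idle time, and removing the gaps changed nothing else. Then the best achievable speedup is 1/(1 - f). The sketch below assumes the gaps are non-overlapping `(start, end)` pairs in the same time unit as `wall_time`. The function name is hypothetical:

```python
from typing import Iterable, Tuple


def stall_speedup_bound(gaps: Iterable[Tuple[float, float]], wall_time: float) -> float:
    """Best-case speedup if every idle gap on the device timeline disappeared.

    Assumes the gaps do not overlap each other and that removing them would not
    lengthen any other part of the run, so this is an upper bound, not a forecast.
    """
    idle = sum(end - start for start, end in gaps)
    if not 0.0 <= idle < wall_time:
        raise ValueError("total idle time must lie in [0, wall_time)")
    return wall_time / (wall_time - idle)


# Example: 480 us of idle time in a 1000 us window.
print(round(stall_speedup_bound([(20.0, 500.0)], 1000.0), 2))  # 1.92
```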

cc @matthiasdiener @lukeolson

Other versions in use, for reproducibility:

@inducer
Owner Author

inducer commented Aug 13, 2021

We could try using https://github.com/conda-forge/lttng-ust-feedstock for tracing. POCL already supports tracing with LTTng: pocl/pocl@ef737d3

@kaushikcfd
Collaborator

I faced a similar issue whenever I was using almost all of the device's global memory:
[Screenshot: profile showing two unexplained gaps]

(Notice the two unexplainable voids in the profile)

On moving to a slightly coarser mesh, the phenomenon did not appear.
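This observation suggests a cheap pre-run check: estimate how much device memory the volume data alone will need, and compare that with the card's global memory. The back-of-the-envelope sketch below rests on several assumptions:

* simplicial (tetrahedral in 3D) elements with a full nodal basis
* float64 data
* a wave state of 1 + dim fields
* a hypothetical `live_copies` count of simultaneously allocated state-sized arrays

The 80% `headroom` is likewise an arbitrary illustration, not a measured threshold:

```python
from math import comb


def nodes_per_simplex(dim: int, order: int) -> int:
    """Nodes of a full degree-`order` polynomial basis on a `dim`-simplex."""
    return comb(order + dim, dim)


def estimated_state_bytes(n_elements: int, dim: int, order: int,
                          live_copies: int, itemsize: int = 8) -> int:
    """Rough lower bound on device memory for state-sized volume arrays.

    Assumes the state is u plus the dim components of v (1 + dim fields),
    float64 entries, and `live_copies` state-sized arrays alive at once
    (a hypothetical count: RK stages, right-hand sides, temporaries).
    Face/flux arrays and allocator overhead are ignored.
    """
    n_fields = 1 + dim
    return n_elements * nodes_per_simplex(dim, order) * n_fields * itemsize * live_copies


def fits_on_device(n_elements: int, dim: int, order: int, live_copies: int,
                   device_bytes: int, headroom: float = 0.8) -> bool:
    """True if the estimate stays below `headroom` (a hypothetical margin) of device memory."""
    return estimated_state_bytes(n_elements, dim, order, live_copies) <= headroom * device_bytes


# Example: 10^6 tetrahedra at order 4 (35 nodes each) with 10 live state copies.
print(estimated_state_bytes(1_000_000, 3, 4, live_copies=10) / 1e9)             # 11.2
print(fits_on_device(1_000_000, 3, 4, live_copies=14, device_bytes=16 * 2**30))  # False
```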
