Limited GPU Utilization with NVIDIA RTX 4000 Ada Gen #844

Open
James-Shared-Studios opened this issue May 17, 2024 · 13 comments
@James-Shared-Studios

I am experiencing limited GPU utilization with an NVIDIA RTX 4000 Ada Generation card running on Windows 10 1809.

CPU: AMD EPYC 3251 8-Core Processor, 2.5 GHz
RAM: 32 GB
GPU: NVIDIA RTX 4000 Ada Generation, 20 GB
CUDA Toolkit Version: 12.3
GPU Driver Version: 546.12

Python code:

    import os
    import time

    from faster_whisper import WhisperModel

    device = 'cuda'
    compute_type = 'int8_float16'
    model_size = 'medium.en'

    print("Loading model...")

    start_time = time.time()
    model = WhisperModel(model_size, device=device,
                         compute_type=compute_type)
    end_time = time.time()
    execution_time = end_time - start_time
    print(f"Model loading time: {execution_time:.2f} seconds")

    folder_path = r"C:\Users\XYZ\Downloads\AI voice"
    max_new_tokens = 10
    beam_size = 10
    total_processing_time = 0.0

    for filename in os.listdir(folder_path):
        if filename.endswith((".mp3", ".m4a", ".mp4", ".wav")):
            file_path = os.path.join(folder_path, filename)
            print(f"Transcribing file: {file_path}")
            start_time = time.time()
            segments, _ = model.transcribe(file_path,
                                           beam_size=beam_size,
                                           max_new_tokens=max_new_tokens,
                                           word_timestamps=False,
                                           prepend_punctuations="",
                                           append_punctuations="",
                                           language="en",
                                           condition_on_previous_text=False)
            for segment in segments:
                print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
            end_time = time.time()
            execution_time = end_time - start_time
            print(f"Execution time: {execution_time:.2f} seconds")
            total_processing_time += execution_time

While running my code, I'm only observing around 10% GPU utilization.
[screenshot: Task Manager showing ~10% GPU utilization]

However, the same code achieves 100% utilization on an NVIDIA GeForce RTX 4070.
[screenshot: Task Manager showing 100% GPU utilization on the RTX 4070]

@Napuh
Contributor

Napuh commented May 17, 2024

Try repeating the test, but watch the CUDA graph, which shows CUDA utilization.

To do that, click the drop-down on one of the engine graphs in Task Manager's GPU view:
[screenshot: Task Manager GPU engine drop-down]
and select CUDA.

@James-Shared-Studios
Author

[screenshots: Task Manager CUDA graphs]

With the CUDA graph selected, utilization barely reaches 70%.

@Napuh
Contributor

Napuh commented May 20, 2024

How does it compare with a bigger model?

@phineas-pta

You should compare speed; utilization matters less.

@James-Shared-Studios
Author

> You should compare speed; utilization matters less.

The average processing time with the GeForce RTX 4070 is 0.16 seconds, compared to 0.51 seconds with the RTX 4000 Ada. I would have expected faster performance from the RTX 4000 Ada, which is why I was wondering whether it is limited in some way.

@James-Shared-Studios
Author

James-Shared-Studios commented May 20, 2024

> How does it compare with a bigger model?

The results are the same for large-v1, large-v2, and large-v3:

[screenshot: timing results for the large models]

@phineas-pta

> I would expect faster performance from RTX 4000 Ada

No, you should expect the opposite: the 4070 is faster.

@James-Shared-Studios
Author

> No, you should expect the opposite: the 4070 is faster.

Why is that? Could you provide more context, please? Thank you.

@phineas-pta

Since the model fits in GPU memory, VRAM capacity is not the limiting factor; it comes down to memory bandwidth (which has more impact when the CUDA core counts aren't very different).

You can take a look at their theoretical FP32 and FP16 performance:

@James-Shared-Studios
Author

The 4070's FP16 (half) performance is 29.15 TFLOPS vs. 26.73 TFLOPS for the RTX 4000 Ada (1:1), so the RTX 4000 Ada should not be three times slower than the 4070, correct?
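A quick sanity check on those numbers: the FP16 figures come from the specs quoted above, while the memory-bandwidth values are assumptions taken from public spec listings and should be verified (note the RTX 4000 SFF Ada variant lists lower bandwidth, around 280 GB/s, which would widen the gap).

```python
# Back-of-envelope ratio of the two cards' theoretical specs.
# FP16 TFLOPS come from the figures quoted above; the bandwidth
# numbers are assumptions taken from public spec listings.
fp16_tflops_4070 = 29.15       # GeForce RTX 4070
fp16_tflops_4000_ada = 26.73   # RTX 4000 Ada Generation

mem_bw_4070 = 504.2            # GB/s (assumed spec)
mem_bw_4000_ada = 360.0        # GB/s (assumed; SFF variant is ~280 GB/s)

compute_ratio = fp16_tflops_4070 / fp16_tflops_4000_ada
bandwidth_ratio = mem_bw_4070 / mem_bw_4000_ada

print(f"FP16 compute ratio (4070 / 4000 Ada): {compute_ratio:.2f}x")
print(f"Memory bandwidth ratio:               {bandwidth_ratio:.2f}x")
# Neither ratio approaches 3x, so raw GPU specs alone do not explain
# the 0.16 s vs 0.51 s gap measured on these short clips.
```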

@phineas-pta

The execution time is too short; there is additional I/O overhead.

For a better benchmark, use longer audio/video so the overhead is a smaller share of the total time.
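One way to run that comparison is to time a single long file and discard a warm-up pass, then report a real-time factor so the fixed per-file overhead matters less. A minimal sketch, assuming faster-whisper is installed; `long_audio.wav` is a hypothetical placeholder path:

```python
import time

def real_time_factor(audio_seconds, wall_seconds):
    """Seconds of audio transcribed per wall-clock second (higher = faster)."""
    return audio_seconds / wall_seconds

def benchmark(path, runs=3):
    """Time transcription of one long file, discarding a warm-up run."""
    from faster_whisper import WhisperModel  # pip install faster-whisper
    model = WhisperModel("medium.en", device="cuda",
                         compute_type="int8_float16")
    for i in range(runs + 1):  # run 0 warms up CUDA kernels and caches
        start = time.time()
        segments, info = model.transcribe(path, language="en")
        for _ in segments:     # segments is a lazy generator; consume it
            pass
        elapsed = time.time() - start
        if i > 0:
            rtf = real_time_factor(info.duration, elapsed)
            print(f"run {i}: {elapsed:.2f} s wall, {rtf:.1f}x real time")

# benchmark("long_audio.wav")  # hypothetical path; use a clip >10 minutes
```

Comparing real-time factors between the two cards on the same long clip should isolate GPU throughput from the per-file setup cost that dominates short clips.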

@James-Shared-Studios
Author

> The execution time is too short; there is additional I/O overhead.
>
> For a better benchmark, use longer audio/video so the overhead is a smaller share of the total time.

That makes sense. I will try longer audio and see if it improves the results. Thank you so much for your help.

@Napuh
Contributor

Napuh commented May 31, 2024

What's the conclusion? @James-Shared-Studios
