add hipBLAS for windows #135

Merged · 6 commits merged into RWKV:master on Oct 6, 2023
Conversation

@Cyberhan123 (Contributor) commented Sep 29, 2023

Support hipBLAS #133

@Cyberhan123 (Contributor, Author)

Although it compiled successfully, I saw that the model was not offloaded to the GPU.

[Resolved review threads on CMakeLists.txt and README.md (outdated)]
@saharNooby (Collaborator)

> Although it compiled successfully, I saw that the model was not offloaded to the GPU.

That's why I'd like to request benchmark results. At a minimum, please provide per-token latencies on your machine for CPU-only and GPU-only modes -- GPU latency should be significantly lower if the new backend works. You can use the existing script measure_pexplexity.py for measuring.
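
To make the comparison concrete, here is a minimal sketch of such a per-token latency measurement against the C API (the model path, thread count, token id, and layer count are placeholders, and the rwkv_* signatures are assumed to match the current rwkv.h):

```c
/* Per-token latency probe: build once CPU-only and once with
 * hipBLAS, then compare the printed averages.
 * Placeholders: model path, thread count, token id, layer count. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include "rwkv.h"

int main(void) {
    struct rwkv_context * ctx = rwkv_init_from_file("model.bin", 8);
    if (ctx == NULL) {
        return 1;
    }

    /* For the GPU run, uncomment to offload layers: */
    /* rwkv_gpu_offload_layers(ctx, 32); */

    float * state  = malloc(rwkv_get_state_len(ctx)  * sizeof(float));
    float * logits = malloc(rwkv_get_logits_len(ctx) * sizeof(float));

    const int n_tokens = 100;
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);

    for (int i = 0; i < n_tokens; i++) {
        /* NULL state_in on the first call starts from a fresh state. */
        rwkv_eval(ctx, 0, i == 0 ? NULL : state, state, logits);
    }

    timespec_get(&t1, TIME_UTC);
    double ms = ((t1.tv_sec - t0.tv_sec) * 1e3
               + (t1.tv_nsec - t0.tv_nsec) / 1e6) / (double) n_tokens;
    printf("%.1f ms per token\n", ms);

    free(state);
    free(logits);
    rwkv_free(ctx);
    return 0;
}
```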

@Cyberhan123 (Contributor, Author)

> Although it compiled successfully, I saw that the model was not offloaded to the GPU.
>
> That's why I'd like to request benchmark results. At a minimum, please provide per-token latencies on your machine for CPU-only and GPU-only modes -- GPU latency should be significantly lower if the new backend works. You can use the existing script measure_pexplexity.py for measuring.

```
ggml_init_cublas: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, compute capability 11.0
Loading text
Loading World v20230424 tokenizer
273 tokens in the text
Token #0/273, 0%, ETA 2 m 16 s
Token #27/273, 9%, ETA 1 m 48 s, averages so far: loss [3.631], perplexity 37.749
Token #54/273, 19%, ETA 1 m 36 s, averages so far: loss [3.381], perplexity 29.408
Token #81/273, 29%, ETA 1 m 24 s, averages so far: loss [3.031], perplexity 20.719
Token #108/273, 39%, ETA 1 m 12 s, averages so far: loss [2.510], perplexity 12.306
Token #135/273, 49%, ETA 1 m 0 s, averages so far: loss [2.351], perplexity 10.491
Token #162/273, 59%, ETA 0 m 48 s, averages so far: loss [2.150], perplexity 8.582
Token #189/273, 69%, ETA 0 m 36 s, averages so far: loss [2.031], perplexity 7.621
Token #216/273, 79%, ETA 0 m 24 s, averages so far: loss [1.881], perplexity 6.561
Token #243/273, 89%, ETA 0 m 12 s, averages so far: loss [1.878], perplexity 6.540
Token #270/273, 98%, ETA 0 m 0 s, averages so far: loss [1.856], perplexity 6.400

Model: RWKV-novel-4-World-7B-20230810-ctx128k-ggml-f16.bin, data: test.txt with 273 tokens, skipped 2 tokens, averages: loss [1.859], perplexity 6.419, latency 447 ms per token
```
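
As a sanity check on the log above: the reported perplexity is just the exponential of the running mean cross-entropy loss, and the final line is self-consistent up to rounding:

$$\text{perplexity} = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \text{loss}_i\right), \qquad \exp(1.859) \approx 6.42$$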

@Cyberhan123 (Contributor, Author) commented Oct 2, 2023

It was my mistake; I just found out that I need to manually offload the layers to the GPU.

@saharNooby (Collaborator)

> latency 447 ms per token

Is this result for CPU or GPU? In any case, a second number is needed for comparison.

@Cyberhan123 (Contributor, Author)

> latency 447 ms per token
>
> Is this result for CPU or GPU? In any case, a second number is needed for comparison.

This was a GPU test, but the model was not offloaded to the GPU correctly. Now, by calling rwkv_gpu_offload_layers, it is offloaded correctly. I will check and improve the benchmark test.
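
For anyone hitting the same issue, a minimal sketch of the required call order (the model path, thread count, and layer count are placeholders; rwkv_gpu_offload_layers is the function named above):

```c
#include "rwkv.h"

int main(void) {
    /* Load the model on the CPU first (placeholder path and threads). */
    struct rwkv_context * ctx = rwkv_init_from_file("RWKV-model.bin", 8);
    if (ctx == NULL) {
        return 1;
    }

    /* Without this explicit call the weights stay on the CPU,
     * even in a hipBLAS/cuBLAS build (placeholder layer count). */
    rwkv_gpu_offload_layers(ctx, 32);

    /* ... run rwkv_eval() as usual ... */

    rwkv_free(ctx);
    return 0;
}
```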

@Cyberhan123 (Contributor, Author) commented Oct 5, 2023

@saharNooby I think this PR is now complete.

@saharNooby merged commit 22a2778 into RWKV:master on Oct 6, 2023
12 checks passed