
Basic support for RWKV6 #171

Open · wants to merge 10 commits into master

Conversation


@YuChuXi YuChuXi commented Apr 4, 2024

Basic support for RWKV6.
Conversion, loading, and LoRA work, but the computation graph has a problem: I can't get the code around rwkv_graph.inc:348 to work no matter what I try. I've marked the spot with exclamation points.

@saharNooby
Collaborator

saharNooby commented Apr 6, 2024

Hi!

Although I've stepped down as the maintainer of rwkv.cpp, I think my input will still be valuable:

  1. rwkv_att_v6 looks like a naive implementation of the attention (as in "creates and operates on full-blown matrices"). If that's true, then v6 models will be too slow and probably unusable at large context lengths. See the comparisons for v5. I recommend pulling an optimized implementation from somewhere and adapting it to rwkv.cpp, as was done with v5.
  2. There are no tests for v6 models, no Tiny RWKV trained, etc. I would not merge v6 support without proper tests added, because that is asking for something to be unknowingly broken.

Edit: I may be mistaken about performance issues, because rwkv_att_v6 actually calls rwkv_wkv_v5. In any case, I recommend doing latency/memory usage measurements and comparing them with v5 models.

The second point still stands -- there must be quality assurance.
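The latency measurements suggested above need nothing more than a small timing harness. A minimal sketch (the `eval_fn` callable is a placeholder for whatever evaluates one token through the model; it is not an actual rwkv.cpp API):

```python
import time
import statistics

def measure_latency(eval_fn, n_runs=10):
    # Run the evaluation several times and report the median, which is
    # more robust to one-off stalls than the mean.
    samples = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        eval_fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# Usage: time a dummy workload standing in for model evaluation.
latency = measure_latency(lambda: sum(range(10_000)))
print(f"median latency: {latency:.6f}s")
```

Running the same harness against a v5 and a v6 model of comparable size, at several context lengths, would show whether the v6 graph has the feared performance cliff.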

@LaylBongers

Hey there! Thanks for the pull request for the long-awaited RWKV V6 support. I'll soon get to reviewing the code, it's on my schedule! Unfortunately I cannot read Chinese, so I've had to get some help translating.
As already mentioned, to make sure support is (and continues to be) functional, we need to include some tests.

@YuChuXi
Author

YuChuXi commented Apr 10, 2024

Sorry, it's not convenient for me to reply while at school.
This PR only supports the conversion and loading of RWKV6.
The file "rwkv_graph.inc" has an issue near lines 342-347 that I cannot solve.
No matter how I modify it, I encounter errors such as "GGML_ASSERT: /media/yuchuxi/YuZi/Project/Mozi/rwkv.cpp/ggml/src/ggml.c:6499: ggml_is_contiguous(a)".
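For background on that assertion (a sketch, not ggml's actual code): ggml_is_contiguous fails when a tensor view's strides no longer match a dense row-major layout, e.g. after ggml_permute or ggml_transpose, and materializing the view with ggml_cont before the failing op is the usual fix. A plain-Python illustration of the stride check:

```python
# Conceptual sketch only: a view is "contiguous" when its strides are the
# dense row-major strides for its shape; a transposed view is not.

def row_major_strides(shape):
    # Dense row-major strides, in elements: last dim has stride 1.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def is_contiguous(shape, strides):
    return list(strides) == row_major_strides(shape)

print(is_contiguous((3, 4), [4, 1]))  # → True  (dense 3x4)
print(is_contiguous((4, 3), [1, 4]))  # → False (transposed view of it)
```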

@YuChuXi
Author

YuChuXi commented Apr 10, 2024

I'll go find a small model to test with.

w = ggml_reshape_4d(ctx, w, 1, head_size, head_count, sequence_length);

// w = torch.exp(-torch.exp(w))
w = rwkv_exp(ctx, rwkv_1_minus_x(ctx, rwkv_exp(ctx, w)));
Does this call to rwkv_1_minus_x() belong here? The Python function isn't w = torch.exp(1 - torch.exp(w)), so shouldn't it be w = rwkv_exp(ctx, ggml_neg(ctx, rwkv_exp(ctx, w)));?

Collaborator

rwkv_1_minus_x is defined as 1 - x, ggml_neg is defined as -x. So the code looks correct to me -- indeed, we want exp(1 - exp(w)), not exp(-exp(w)).


This refers to the line here: https://github.com/BlinkDL/ChatRWKV/blob/8c7956743703afddd9bbb09ec5fbaf95e5b05227/RWKV_v6_demo.py#L187

  • w = torch.exp(-torch.exp(w.float()))

There's no subtraction operation, only negation.

Collaborator

Oh, okay. I misread your comment, sorry; looks like you are right :)
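A quick numerical check (not part of the PR) confirms the two expressions are not equivalent: exp(1 - exp(w)) = e · exp(-exp(w)), so the rwkv_1_minus_x version would uniformly scale the decay by a factor of e relative to the reference torch.exp(-torch.exp(w)):

```python
import math

w = 0.5  # arbitrary sample value of the decay parameter
reference = math.exp(-math.exp(w))      # torch.exp(-torch.exp(w))
mistaken = math.exp(1.0 - math.exp(w))  # what exp(1 - exp(w)) would give
print(round(mistaken / reference, 6))   # → 2.718282, i.e. exactly e
```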
