Releases: RWKV/rwkv.cpp

master-ffc085c

26 Jun 11:24
ffc085c
Update GGML (#103)

* Update GGML

* Fix linux build

Of course we forgot why we did this, and broke the build again, in
the exact same way, a second time.

* Fix cuBLAS

Properly set the backend and then call ggml_cuda_transform_tensor

* Rename xx to x_prev

Probably should slip this in now before we forget it's a thing.

* See how easy updates are now? (update GGML)

master-9cbb9d9

21 Jun 16:13
9cbb9d9
Various improvements (#104)

* Make rwkv_gpu_offload_layers return true only if layers were actually offloaded

* Validate device of tensors

* Offload all layers during test

* Consistently use FP16 and FP32 instead of float16/fp16/F16/etc.

* Use spaces for indentation

* Remove spaces between type name and []

* Add cuBLAS on Windows guide, refactor docs structure

* Insert replacement characters when decoding invalid UTF-8 sequences
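
The standard lossy-decoding policy being described is to substitute U+FFFD (the Unicode replacement character) for each undecodable byte sequence instead of failing or emitting garbage. A minimal Python sketch of that policy (illustrative only, not the actual rwkv.cpp code; `decode_lossy` is a hypothetical name):

```python
def decode_lossy(data: bytes) -> str:
    # Python's "replace" error handler implements exactly this policy:
    # every invalid or truncated UTF-8 sequence becomes U+FFFD.
    return data.decode("utf-8", errors="replace")

# A multi-byte sequence cut in half decodes to the replacement character
# instead of raising, which matters when streaming tokens byte-by-byte.
print(decode_lossy(b"\xc3"))        # U+FFFD
print(decode_lossy(b"ok\xff!"))     # "ok" + U+FFFD + "!"
```

This matters for token streaming, where a multi-byte character can be split across two tokens and the first half is temporarily invalid on its own.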

* Fix compatibility

* Fix formatting

* Fix copy-pasted tensor validation

master-6b26e0d

15 Jun 11:17
6b26e0d
Add Python support for sequence mode (#101)

master-5316068

14 Jun 15:57
5316068
fix static linking for tests and extras, remove unneeded -static flag (#98)

* fix static linking for tests and extras, remove unneeded -static flag

* Update extras/CMakeLists.txt for proper formatting

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* revert last format commit

* fix indentation once more

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

master-15b7c7b

14 Jun 15:59
15b7c7b
add standalone build option (#99)

* add standalone build option

* Update CMakeLists.txt for more clarity in comment

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* add end-of-line properly for correct formatting

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

master-c64009e

13 Jun 11:26
c64009e
Fix typo in rwkv.h docs for n_vocab (#96)

World models actually have 65536, not 65535, oops

master-bd65c97

13 Jun 14:20
bd65c97
Make sampling with bias numerically stable (#90)

* Update sampling.py

Remove a slow for-loop over the logit bias entries. Make the NumPy re-softmax operation numerically stable.
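
Both fixes can be sketched together: bias entries are applied with one vectorized fancy-indexed add, and the softmax subtracts the max logit before exponentiating so `np.exp` cannot overflow. This is a minimal sketch of the technique, not the actual `sampling.py` API (`sample_probs` is a hypothetical name):

```python
import numpy as np

def sample_probs(logits: np.ndarray, logit_bias: dict[int, float]) -> np.ndarray:
    logits = logits.astype(np.float64)
    if logit_bias:
        # Vectorized bias application instead of a Python for-loop
        # over every bias entry on every sampling call.
        idx = np.fromiter(logit_bias.keys(), dtype=np.int64)
        vals = np.fromiter(logit_bias.values(), dtype=np.float64)
        logits[idx] += vals
    # Numerically stable softmax: after subtracting the max, every
    # exponent is <= 0, so exp() cannot overflow even for huge logits.
    logits -= logits.max()
    e = np.exp(logits)
    return e / e.sum()
```

Without the max-subtraction, a single large logit (easily produced by a strong bias) overflows `exp` and turns the whole distribution into NaNs.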

* Update sampling.py

master-69639f2

13 Jun 14:52
69639f2
Cosmetic changes (code style, documentation, etc.) (#97)

* Fix README

* Update ggml

* Add missing items to code style document

* Reformat code, fix documentation and messages

* Fix some warnings on MSVC

* Refactor World tokenizer

* Fix duration formatting

* Refactor sampling

* Apply suggestions

master-c41ed98

12 Jun 11:38
c41ed98
Sequence mode (#89)

* Sequence mode prototype

This is a prototype of sequence mode.

Load model ... 1.318s
Serial mode to process 30 tokens ... 2.116s
Sequence mode to process 30 tokens ... 0.509s
Logits total diff = 0.00000
Logits identical = TRUE

This is only for testing. It runs into precision and capacity
limits at large lengths. The goal is to support sequences of up to
25k tokens.

It is also likely that the dedicated single-token functions should
be brought back. Again, this is only a prototype.
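
The source of the speedup can be illustrated with a toy NumPy example (this is an illustration of the batching idea, not the rwkv.cpp graph code): feeding T tokens through a projection as one matrix-matrix product gives the same result as T separate matrix-vector products, but lets the BLAS/GGML kernel do far more work per call.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_out = 30, 64, 64
W = rng.standard_normal((d_in, d_out)).astype(np.float32)
x = rng.standard_normal((T, d_in)).astype(np.float32)

# "Serial mode": one matrix-vector product per token.
serial = np.stack([x[t] @ W for t in range(T)])

# "Sequence mode": one matrix-matrix product for all T tokens.
sequence = x @ W

# Same logits up to float32 accumulation-order differences.
assert np.allclose(serial, sequence, atol=1e-3)
```

In the real model the WKV recurrence still has to run token by token; the win comes from batching the large projection matmuls, and the small accumulation-order differences are one reason precision drifts at large sequence lengths.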

* Move out rwkv_att_inner

* Move out more graph functions

* Print system info in sequence.c

* Small single-token optimizations

* Add function to estimate graph work size

* Avoid allocating new sequence graph every rwkv_eval_sequence

We still build one, but that seems necessary for ggml.

* Remove sequence capability from ops that do not need it

* Add GPU offload to sequence.c benchmark

* Only calculate 1 - x tensors once per layer

* use ggml_cpy in sequence mode xx output

* Rename "inputs" to "state" in rwkv_eval_sequence

* Basic sequence mode graph caching

This is a huge speedup when the same sequence length is used many
times in a row. I intend to clean up this code very soon.

* Revert "Only calculate 1 - x tensors once per layer"

It doesn't actually matter

* Clean up code around graph building and ggml contexts

* Remove unused parameter from rwkv_att_wkv_size

* Fix printf integer width in rwkv_eval

* Correct assert return types, whoops

* Free rwkv_context at the end of sequence.c

* Fix typo I didn't make

* Expand single-line return conditions

* Enable sanitizer in macOS workflows

The sanitizer is enabled to catch issues discovered while testing #89.
It should be disabled as soon as possible (that is, once master can be
built on the macOS GitHub runner again).

* Add doc comments and expand ser->serial, seq->sequence

* Adjust doc comment in rwkv.h

* Add thread safety note to rwkv_eval_sequence as well

* Remove entire rwkv.cpp source code from sequence.c

* Don't validate when sequence is NULL

lol

* Fix OOM on cuBLAS-enabled quantized models

* Remove sequence.c

master-7199f5b

12 Jun 16:11
7199f5b
Phase out very verbose element_count functions (#95)

* Phase out very verbose element_count functions

This could have been done better

* Add "get" to other getters

* Specify "float elements" in rwkv_get_logits_len docs

* Use traditional for-loop for rwkv_init_state writes

* Newline