Releases: RWKV/rwkv.cpp

master-ffc085c

26 Jun 11:24
ffc085c
Update GGML (#103)

* Update GGML

* Fix linux build

Of course we forgot why we did this, and broke the build again, in
the exact same way, a second time.

* Fix cuBLAS

Properly set the backend and then call ggml_cuda_transform_tensor

* Rename xx to x_prev

Probably should slip this in now before we forget it's a thing.

* See how easy updates are now? (update GGML)

master-9cbb9d9

21 Jun 16:13
9cbb9d9
Various improvements (#104)

* Make rwkv_gpu_offload_layers return true only if layers were actually offloaded

* Validate device of tensors

* Offload all layers during test

* Consistently use FP16 and FP32 instead of float16/fp16/F16/etc.

* Use spaces for indentation

* Remove spaces between type name and []

* Add cuBLAS on Windows guide, refactor docs structure

* Insert replacement characters when decoding invalid UTF-8 sequences
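
The standard lossy-decoding policy being described is to substitute U+FFFD (the Unicode replacement character) for each undecodable byte sequence instead of failing or emitting garbage. A minimal Python sketch of that policy (illustrative only, not the actual rwkv.cpp code; `decode_lossy` is a hypothetical name):

```python
def decode_lossy(data: bytes) -> str:
    # Python's "replace" error handler implements exactly this policy:
    # every invalid or truncated UTF-8 sequence becomes U+FFFD.
    return data.decode("utf-8", errors="replace")

# A multi-byte sequence cut in half decodes to the replacement character
# instead of raising, which matters when streaming tokens byte-by-byte.
print(decode_lossy(b"\xc3"))        # U+FFFD
print(decode_lossy(b"ok\xff!"))     # "ok" + U+FFFD + "!"
```

This matters for token streaming, where a multi-byte character can be split across two tokens and the first half is temporarily invalid on its own.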

* Fix compatibility

* Fix formatting

* Fix copy-pasted tensor validation

master-6b26e0d

15 Jun 11:17
6b26e0d
Add Python support for sequence mode (#101)

master-5316068

14 Jun 15:57
5316068
fix static linking for tests and extras, remove unneeded -static flag (#98)

* fix static linking for tests and extras, remove unneeded -static flag

* Update extras/CMakeLists.txt for proper formatting

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* revert last format commit

* fix indentation once more

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

master-15b7c7b

14 Jun 15:59
15b7c7b
add standalone build option (#99)

* add standalone build option

* Update CMakeLists.txt for more clarity in comment

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* add end-of-line properly for correct formatting

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

master-c64009e

13 Jun 11:26
c64009e
Fix typo in rwkv.h docs for n_vocab (#96)

World models actually have 65536, not 65535, oops

master-bd65c97

13 Jun 14:20
bd65c97
Make sampling with bias numerically stable (#90)

* Update sampling.py

Remove a slow for-loop over the logit bias entries. Make the NumPy re-softmax operation numerically stable.
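
Both fixes can be sketched together: bias entries are applied with one vectorized fancy-indexed add, and the softmax subtracts the max logit before exponentiating so `np.exp` cannot overflow. This is a minimal sketch of the technique, not the actual `sampling.py` API (`sample_probs` is a hypothetical name):

```python
import numpy as np

def sample_probs(logits: np.ndarray, logit_bias: dict[int, float]) -> np.ndarray:
    logits = logits.astype(np.float64)
    if logit_bias:
        # Vectorized bias application instead of a Python for-loop
        # over every bias entry on every sampling call.
        idx = np.fromiter(logit_bias.keys(), dtype=np.int64)
        vals = np.fromiter(logit_bias.values(), dtype=np.float64)
        logits[idx] += vals
    # Numerically stable softmax: after subtracting the max, every
    # exponent is <= 0, so exp() cannot overflow even for huge logits.
    logits -= logits.max()
    e = np.exp(logits)
    return e / e.sum()
```

Without the max-subtraction, a single large logit (easily produced by a strong bias) overflows `exp` and turns the whole distribution into NaNs.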

* Update sampling.py

master-69639f2

13 Jun 14:52
69639f2
Cosmetic changes (code style, documentation, etc.) (#97)

* Fix README

* Update ggml

* Add missing items to code style document

* Reformat code, fix documentation and messages

* Fix some warnings on MSVC

* Refactor World tokenizer

* Fix duration formatting

* Refactor sampling

* Apply suggestions

master-c41ed98

12 Jun 11:38
c41ed98
Sequence mode (#89)

* Sequence mode prototype

This is a prototype of sequence mode.

Load model ... 1.318s
Serial mode to process 30 tokens ... 2.116s
Sequence mode to process 30 tokens ... 0.509s
Logits total diff = 0.00000
Logits identical = TRUE

This is only for testing. It runs into precision and capacity
limits at large lengths. The goal is to support sequences of up to
25k tokens.

It is also likely that the dedicated single-token functions should
be brought back. Again, this is only a prototype.
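
The source of the speedup can be illustrated with a toy NumPy example (this is an illustration of the batching idea, not the rwkv.cpp graph code): feeding T tokens through a projection as one matrix-matrix product gives the same result as T separate matrix-vector products, but lets the BLAS/GGML kernel do far more work per call.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_out = 30, 64, 64
W = rng.standard_normal((d_in, d_out)).astype(np.float32)
x = rng.standard_normal((T, d_in)).astype(np.float32)

# "Serial mode": one matrix-vector product per token.
serial = np.stack([x[t] @ W for t in range(T)])

# "Sequence mode": one matrix-matrix product for all T tokens.
sequence = x @ W

# Same logits up to float32 accumulation-order differences.
assert np.allclose(serial, sequence, atol=1e-3)
```

In the real model the WKV recurrence still has to run token by token; the win comes from batching the large projection matmuls, and the small accumulation-order differences are one reason precision drifts at large sequence lengths.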

* Move out rwkv_att_inner

* Move out more graph functions

* Print system info in sequence.c

* Small single-token optimizations

* Add function to estimate graph work size

* Avoid allocating new sequence graph every rwkv_eval_sequence

We still build one, but that seems necessary for ggml.

* Remove sequence capability from ops that do not need it

* Add GPU offload to sequence.c benchmark

* Only calculate 1 - x tensors once per layer

* use ggml_cpy in sequence mode xx output

* Rename "inputs" to "state" in rwkv_eval_sequence

* Basic sequence mode graph caching

This is a huge speedup when the same sequence length is used many
times in a row. I intend to clean up this code very soon.

* Revert "Only calculate 1 - x tensors once per layer"

It doesn't actually matter

* Clean up code around graph building and ggml contexts

* Remove unused parameter from rwkv_att_wkv_size

* Fix printf integer width in rwkv_eval

* Correct assert return types, whoops

* Free rwkv_context at the end of sequence.c

* Fix typo I didn't make

* Expand single-line return conditions

* Enable sanitizer in macOS workflows

The sanitizer is enabled to catch issues discovered while testing #89.
It should be disabled as soon as possible (that is, once master can be
built on the macOS GitHub runner again).

* Add doc comments and expand ser->serial, seq->sequence

* Adjust doc comment in rwkv.h

* Add thread safety note to rwkv_eval_sequence as well

* Remove entire rwkv.cpp source code from sequence.c

* Don't validate when sequence is NULL

lol

* Fix OOM on cuBLAS-enabled quantized models

* Remove sequence.c

master-7199f5b

12 Jun 16:11
7199f5b
Phase out very verbose element_count functions (#95)

* Phase out very verbose element_count functions

This could have been done better

* Add "get" to other getters

* Specify "float elements" in rwkv_get_logits_len docs

* Use traditional for-loop for rwkv_init_state writes

* Newline