Releases: RWKV/rwkv.cpp
master-ffc085c
Update GGML (#103)
* Update GGML
* Fix linux build
  Of course we forgot why we did this, and broke the build again, in the exact same way, a second time.
* Fix cuBLAS
  Properly set the backend and then call ggml_cuda_transform_tensor
* Rename xx to x_prev
  probably should slip this in now before we forget it's a thing.
* See how easy updates are now? (update GGML)
master-9cbb9d9
Various improvements (#104)
* Make rwkv_gpu_offload_layers return true only if layers were actually offloaded
* Validate device of tensors
* Offload all layers during test
* Consistently use FP16 and FP32 instead of float16/fp16/F16/etc.
* Use spaces for indentation
* Remove spaces between type name and []
* Add cuBLAS on Windows guide, refactor docs structure
* Insert replacement characters when decoding invalid UTF-8 sequences
* Fix compatibility
* Fix formatting
* Fix copy-pasted tensor validation
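The bullet about inserting replacement characters when decoding invalid UTF-8 can be illustrated with Python's built-in decoder. This is only a sketch of the general technique, not the project's code: `decode_tokens` is a hypothetical helper name, and the release notes do not say where or how the project performs this decoding.

```python
def decode_tokens(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for bytes that cannot be decoded.

    Hypothetical helper for illustration; not a function from rwkv.cpp.
    """
    return raw.decode("utf-8", errors="replace")


# The byte 0xFF never appears in valid UTF-8, so it becomes U+FFFD.
text = decode_tokens(b"ab\xffcd")

# A multi-byte character cut off mid-sequence (first two bytes of a four-byte code point),
# as can happen when a token boundary falls inside a character.
partial = decode_tokens(b"Hi \xf0\x9f")

print(repr(text))
print(repr(partial))
```

With strict decoding, both inputs would raise `UnicodeDecodeError`; the `replace` error handler lets generation continue and marks the damaged spot visibly.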
master-6b26e0d
Add Python support for sequence mode (#101)
master-5316068
fix static linking for tests and extras, remove unneeded -static flag (#98)
* fix static linking for tests and extras, remove unneeded -static flag
* Update extras/CMakeLists.txt for proper formatting
  Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* revert last format commit
* fix indentation once more
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
master-15b7c7b
add standalone build option (#99)
* add standalone build option
* Update CMakeLists.txt for more clarity in comment
  Co-authored-by: Alex <saharNooby@users.noreply.github.com>
* add end-of-line properly for right formatting
---------
Co-authored-by: Alex <saharNooby@users.noreply.github.com>
master-c64009e
Fix typo in rwkv.h docs for n_vocab (#96)
World models actually have 65536, not 65535, oops
master-bd65c97
Make sampling with bias numerically stable (#90)
* Update sampling.py
  Remove a slow for loop on logit bias. Make the numpy re-softmax operation numerically stable.
* Update sampling.py
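The two ideas in this entry (applying the logit bias without a Python loop, and a softmax that does not overflow) can be sketched with NumPy as follows. This is a minimal illustration under assumptions, not the contents of sampling.py: the function names `apply_logit_bias` and `stable_softmax` and the `{token_id: bias}` dictionary shape are hypothetical.

```python
import numpy as np


def apply_logit_bias(logits, logit_bias):
    """Add per-token biases in one vectorized step instead of a Python loop.

    `logit_bias` is assumed to map token id -> additive bias (hypothetical shape).
    """
    biased = np.array(logits, dtype=np.float64)  # work on a copy
    if logit_bias:
        ids = np.fromiter(logit_bias.keys(), dtype=np.int64)
        values = np.fromiter(logit_bias.values(), dtype=np.float64)
        biased[ids] += values  # fancy indexing updates all biased tokens at once
    return biased


def stable_softmax(logits):
    """Softmax that subtracts the maximum first, so exp() never overflows."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


# Logits this large would overflow a naive exp(); the shifted version stays finite.
logits = np.array([1000.0, 999.0, -1000.0])
probs = stable_softmax(apply_logit_bias(logits, {1: 1.0}))
print(probs)
```

Subtracting the maximum does not change the result mathematically, because the common factor cancels between numerator and denominator; it only keeps the intermediate values in a representable range.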
master-69639f2
Cosmetic changes (code style, documentation, etc.) (#97)
* Fix README
* Update ggml
* Add missing items to code style document
* Reformat code, fix documentation and messages
* Fix some warnings on MSVC
* Refactor World tokenizer
* Fix duration formatting
* Refactor sampling
* Apply suggestions
master-c41ed98
Sequence mode (#89)
* Sequence mode prototype
  This is a prototype of sequence mode.
  Load model ... 1.318s
  Serial mode to process 30 tokens ... 2.116s
  Sequence mode to process 30 tokens ... 0.509s
  Logits total diff = 0.00000
  Logits identical = TRUE
  This is only for testing. It runs into precision and capacity limits at large lengths. The goal is to support sequences of up to 25k tokens. It is also likely that the dedicated single token functions should be brought back. Again, only prototype.
* Move out rwkv_att_inner
* Move out more graph functions
* Print system info in sequence.c
* Small single-token optimizations
* Add function to estimate graph work size
* Avoid allocating new sequence graph every rwkv_eval_sequence
  we still build one, but that seems necessary for ggml.
* Remove sequence capability from ops that do not need it
* Add GPU offload to sequence.c benchmark
* Only calculate 1 - x tensors once per layer
* use ggml_cpy in sequence mode xx output
* Rename "inputs" to "state" in rwkv_eval_sequence
* Basic sequence mode graph caching
  This is a huge speedup when the same sequence length is used many times in a row. I intend to clean up this code very soon
* Revert "Only calculate 1 - x tensors once per layer"
  It doesn't actually matter
* Clean up code around graph building and ggml contexts
* Remove unused parameter from rwkv_att_wkv_size
* Fix printf integer width in rwkv_eval
* Correct assert return types, whoops
* Free rwkv_context at the end of sequence.c
* Fix typo I didn't make
* Expand single-line return conditions
* Enable sanitizer in macOS workflows
  Sanitizer is enabled to fix issues discovered when testing #89. It needs to be disabled as soon as it is possible (that is, master is able to be built on MacOS GitHub runner again)
* Add doc comments and expand ser->serial, seq->sequence
* Adjust doc comment in rwkv.h
* Add thread safety note to rwkv_eval_sequence as well
* Remove entire rwkv.cpp source code from sequence.c
* Don't validate when sequence is NULL
  lol
* Fix OOM on cuBLAS-enabled quantized models
* Remove sequence.c
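The "Basic sequence mode graph caching" bullet describes a speedup when the same sequence length is used many times in a row. One way such a cache could work is sketched below in Python. Everything here is an assumption for illustration: `SequenceGraphCache` and `build_toy_graph` are hypothetical names, the toy "graph" is just a list, and the single-entry design is inferred from the phrase "many times in a row" rather than taken from the project's C++ code.

```python
class SequenceGraphCache:
    """Keeps the most recently built graph and reuses it while the length repeats.

    Hypothetical sketch; not the data structure used in rwkv.cpp.
    """

    def __init__(self, build_graph):
        self._build_graph = build_graph
        self._cached_length = None
        self._cached_graph = None
        self.builds = 0  # counts how often the expensive build actually ran

    def get(self, sequence_length):
        if sequence_length != self._cached_length:
            # Length changed: rebuild and remember the new graph.
            self._cached_graph = self._build_graph(sequence_length)
            self._cached_length = sequence_length
            self.builds += 1
        return self._cached_graph


def build_toy_graph(sequence_length):
    # Stand-in for an expensive graph construction: one node name per token.
    return [f"step_{i}" for i in range(sequence_length)]


cache = SequenceGraphCache(build_toy_graph)
for length in (30, 30, 30, 8, 30):
    graph = cache.get(length)

print(cache.builds)  # 3: built for 30, rebuilt for 8, rebuilt again for 30
```

A single-entry cache is cheap and covers the repeated-length case; a workload that alternates between a few lengths would rebuild every time and might prefer a small map keyed by length, at the cost of more memory.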
master-7199f5b
Phase out very verbose element_count functions (#95)
* Phase out very verbose element_count functions
  This could have been done better
* Add "get" to other getters
* Specify "float elements" in rwkv_get_logits_len docs
* Use traditional for-loop for rwkv_init_state writes
* Newline
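The note about specifying "float elements" matters because a length counted in elements is not the same as a size in bytes. The following Python sketch shows the distinction with `ctypes`; the element count is an illustrative value (the World-model vocabulary size mentioned under #96), and no rwkv.cpp function is called here.

```python
import ctypes

# Illustrative element count; in real code this would come from the library.
n_logits = 65536

# Allocate a buffer of n_logits C floats, as a caller would for a float output array.
logits_buffer = (ctypes.c_float * n_logits)()

# Element count and byte size differ by a factor of sizeof(float).
size_in_bytes = ctypes.sizeof(logits_buffer)
print(len(logits_buffer), size_in_bytes)
```

Mixing the two units up is a common source of buffer overruns or under-allocated buffers, which is presumably why the docs spell the unit out.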