
Releases: RWKV/rwkv.cpp

master-43c78f2

12 Jun 15:45 · 43c78f2
Fix build on non-MSVC compilers for Windows platforms (#94)

master-b88ae59

11 Jun 06:49 · b88ae59
Fix bug in world tokenizer (#93)

master-82c4ac7

08 Jun 11:40 · 82c4ac7
Add support for the world tokenizer (#86)

* Add support for the world tokenizer (its matching rule is sketched below this list)

* Move tokenizer logic to rwkv_tokenizer.py

* Added test for the tokenizer
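
For background: the world tokenizer encodes text by repeatedly taking the longest vocabulary entry that prefixes the remaining input. The real implementation in rwkv_tokenizer.py is trie-based and operates on arbitrary byte strings; the naive C scan below is only a sketch of the matching rule, not the shipped code.

```c
#include <stddef.h>
#include <string.h>

// vocab[i] holds the byte string for token id i. Returns the id of the
// longest vocabulary entry that is a prefix of `text`, or -1 if none is,
// writing the matched length to *match_len. (Illustration only: the real
// tokenizer uses a trie and handles arbitrary bytes, not just C strings.)
static int longest_match(const char * const * vocab, size_t vocab_size,
                         const char * text, size_t * match_len) {
    int best_id = -1;
    *match_len = 0;
    for (size_t i = 0; i < vocab_size; i++) {
        size_t len = strlen(vocab[i]);
        if (len > *match_len && strncmp(text, vocab[i], len) == 0) {
            best_id = (int) i;
            *match_len = len;
        }
    }
    return best_id;
}
```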

master-09ec314

07 Jun 11:47 · 09ec314
Fix visual bug in quantization (#92)

It didn't calculate the compression ratio properly because of a
copy/paste error :(
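
For reference, the intended ratio is presumably just the original size over the quantized size. A minimal illustration (the names here are hypothetical, not the actual rwkv.cpp variables):

```c
#include <stddef.h>

// Hypothetical helper showing the intended computation; per the commit
// message, a copy/paste error caused the wrong values to be printed.
static double compression_ratio(size_t orig_bytes, size_t quant_bytes) {
    return (double) orig_bytes / (double) quant_bytes;
}
```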

master-fb6708b

03 Jun 10:14 · fb6708b
Fix pytorch storage warnings, fixes #80 (#88)

We seriously don't care what type of storage we get; PyTorch sucks.

master-5b41cd7

03 Jun 10:48 · 5b41cd7
Add capability for extra binaries to be built with rwkv.cpp (#87)

* Add capability for examples

This also adds a quantizer that works without Python; its core call is sketched after this list. In the future, we might be able to convert from PyTorch without Python as well.

* Example implied code style

* Rename examples to tools

* Rename cpuinfo.c to cpu_info.c

* Include ggml header again

* Return EXIT_FAILURE on help

* Done with this

* Final name: extras

* Going to have a seizure

* Wait, literal double n
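
The core of a Python-free quantizer can be a thin wrapper over the library's own rwkv_quantize_model_file; a minimal sketch (the extras tool in this PR is more elaborate, and the exact signature may differ between versions):

```c
#include <stdio.h>
#include <stdlib.h>
#include "rwkv.h"

int main(int argc, char ** argv) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s INPUT_PATH OUTPUT_PATH FORMAT\n", argv[0]);
        // As in the PR: asking for help is still a non-zero exit.
        return EXIT_FAILURE;
    }

    // FORMAT names a quantized type, e.g. "Q5_1".
    if (!rwkv_quantize_model_file(argv[1], argv[2], argv[3])) {
        fprintf(stderr, "Quantization failed\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
```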

master-3f8bb2c

03 Jun 10:08 · 3f8bb2c
Allow creating multiple contexts per model (#83)

* Allow creating multiple contexts per model

This allows for parallel inference; I am preparing to support sequence mode using a similar method (see the sketch after this change list).

* Fix cuBLAS

* Update rwkv.h

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Update rwkv.cpp

Co-authored-by: Alex <saharNooby@users.noreply.github.com>

* Inherit print_errors from parent ctx when cloning

* Add context cloning test

* Free

* Free ggml context when last rwkv_context is freed

* Free before exit

* int main

* Add explanation of ffn_key_size

* Update rwkv_instance and rwkv_context comments

* Thread safety notes

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
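
A hedged sketch of what this enables, assuming the rwkv.h API of this era (rwkv_init_from_file, rwkv_clone_context, rwkv_eval, and the state/logits buffer size getters; exact names and signatures may differ between versions):

```c
#include <stdlib.h>
#include "rwkv.h"

int main(void) {
    // Load the model once...
    struct rwkv_context * a = rwkv_init_from_file("model.bin", 4);
    if (a == NULL) return EXIT_FAILURE;

    // ...and clone a second context that shares the same weights;
    // only per-context data is duplicated.
    struct rwkv_context * b = rwkv_clone_context(a, 4);
    if (b == NULL) { rwkv_free(a); return EXIT_FAILURE; }

    size_t n_state  = rwkv_get_state_buffer_element_count(a);
    size_t n_logits = rwkv_get_logits_buffer_element_count(a);
    float * state_a = calloc(n_state, sizeof(float));
    float * state_b = calloc(n_state, sizeof(float));
    float * logits  = calloc(n_logits, sizeof(float));

    // NULL state_in asks rwkv_eval to initialize a fresh state. Per the
    // thread safety notes, each context may be driven by its own thread;
    // here they are simply evaluated in sequence.
    rwkv_eval(a, 0, NULL, state_a, logits);
    rwkv_eval(b, 0, NULL, state_b, logits);

    free(state_a); free(state_b); free(logits);
    rwkv_free(b);
    rwkv_free(a); // The shared ggml context is freed with the last rwkv_context.
    return EXIT_SUCCESS;
}
```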

master-363dfb1

31 May 11:33 · 363dfb1
File parsing and memory usage optimization (#74)

* Rework the entire file parsing system

Prepares for future changes.

* Estimate memory usage perfectly

Removes the issues that previously existed with small models.

* Fix file stream ops on macOS

For me, this compiles on Windows 11, Ubuntu 20.04, and macOS 10.14.

* Fix rwkv.cpp for non-WIN32 MSVC invocations like bindgen-rs

* Implement Q8_1 quantization

...and disable the type, because GGML doesn't support the ops
required to run inference with it.

It's not worth any nasty hacks or workarounds right now; Q8_0 is very similar if one wants 8-bit quantization.

* Completely remove Q8_1 type

This type isn't meant to be user-facing in any way so I may as well
get rid of it now since it will probably never exist as a data
format.

* Switch from std::vector to unique array for model layers

These don't ever need to be resized

* Factor ffn.key.weight height into memory estimate

Some models set this oddly, in various different ways. Just give up, record its actual size, and use that.

* Make a few more operations inplace

ggml doesn't currently expose most of the in-place operations it supports, so force some of them. Not 100% sure about this; I don't think the memory savings are worth that much.

* Attempt a perfect upper bound size for the scratch space

This should be the largest work_size seen in any model, since it is always larger than any of the other parameters except vocab (which does not participate in the graph work size).

* Revert "Make a few more operations inplace"

This reverts commit f94d6eb216040ae0ad23d2b9c87fae8349882f89.

* Make fewer calls to fread

A micro-optimization.

* Fix memory size estimation for smaller models

ggml works with some larger formats internally

* Print location in all assert macros

* Remove trailing whitespace

* Add type_to_string entry for unknown

* Simplify quantization a bit

* Fix cuBLAS compatibility

Adding n_gpu_layers to rwkv_init_from_file won't work; add an extra function instead.

* Fix quantize

* Quantize: don't create the output file if opening the input fails

* Rename GPU offload layers

We might want to avoid branding it with cuBLAS, in case we add something like CLBlast support in the future.

* Remove old read_int32 and write_int32 functions

It's all uints now

* Remove static from things

* Only call gpu_offload_layers if gpu_layer_count > 0

* Add rwkv_ prefix to all structures

* Braces

* Function naming convention

* Remove blank line after comment

* Capitalize comments

* Re-add quantize explanatory comment

* Re-add histogram comment

* Convert all error messages to uppercase

* Make type conversions extern

For FFI bindings from other languages.

* Name the state parts

The code in rwkv_eval that initializes the state (when state_in is NULL) was getting very confusing, so I just put everything in a struct to name it (sketched below).

* Fnvalid
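
For illustration, the named state parts presumably follow the usual RWKV v4 layout; a sketch under that assumption (field names and the blank-state initialization follow RWKV convention, not necessarily the exact struct in rwkv.cpp):

```c
// Per-layer recurrent state of an RWKV v4 model. When state_in is NULL,
// rwkv_eval starts from a blank state: the mix inputs and WKV accumulators
// are zeroed, and the running maximum starts at a large negative value.
struct rwkv_layer_state {
    float * ffn_xx; // Previous input to the channel-mixing (FFN) block.
    float * att_xx; // Previous input to the time-mixing (attention) block.
    float * att_aa; // WKV numerator accumulator.
    float * att_bb; // WKV denominator accumulator.
    float * att_pp; // Running maximum exponent, for numerical stability (init ~ -1e30).
};
```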

master-241350f

29 May 12:14 · 241350f
Feature add cublas support (#65)

* chore: add ggml import in the head of rwkv.h

* chore: add ggml import in the head of rwkv.h

* feat: add cublas support

* feat: update rwkv.cpp

* feat: remove unused change

* chore: fix linux build issue

* chore: sync ggml and offload tensor to gpu

* chore: comment out tensors which cause errors on GPU

* chore: update comment and readme

* chore: update ggml to recent

* chore: add more performance test results

* chore: add more performance test results

* chore: fix reading files larger than 2 GB

* chore: merge master

* chore: remove unused comment

* chore: fix for comments

* Update README.md

* Update rwkv.cpp

---------

Co-authored-by: Alex <saharNooby@users.noreply.github.com>
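
Offloading ended up exposed as a separate rwkv_gpu_offload_layers call (see the cuBLAS compatibility note in PR #74 above: adding an n_gpu_layers parameter to rwkv_init_from_file won't work). A minimal sketch of how a cuBLAS build might be used, assuming that signature:

```c
#include <stdint.h>
#include "rwkv.h"

// Loads a model and, when built with cuBLAS, offloads the first
// n_gpu_layers layers to the GPU. Offloading zero layers is a no-op.
struct rwkv_context * init_with_gpu(const char * path,
                                    uint32_t n_threads,
                                    uint32_t n_gpu_layers) {
    struct rwkv_context * ctx = rwkv_init_from_file(path, n_threads);
    if (ctx != NULL && n_gpu_layers > 0) {
        rwkv_gpu_offload_layers(ctx, n_gpu_layers);
    }
    return ctx;
}
```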

master-dea929f

27 May 11:05 · dea929f
Various improvements & upgrade ggml (#75)

* Use types from typing for better compatibility with older Python versions

* Split the last double end-of-line token, as per BlinkDL's suggestion

* Fix MSVC warnings

* Drop Q4_2 support

* Update ggml

* Bump file format version for quantization changes

* Apply suggestions