
Various improvements #131

Merged: saharNooby merged 5 commits into master from improvements-2023-09-21 on Sep 23, 2023

Conversation

saharNooby
Collaborator

This time, there are actually new features and QoL improvements for end-users!

  • added rwkv_eval_sequence_in_chunks: an easy-to-use function for processing whole prompts, instead of splitting them into chunks manually (which is error-prone) or calling eval token by token (which is very slow); see the first sketch after this list
  • added model head offloading: cuts ~10 ms on my machine when using CUDA
  • removed the dependency on PyTorch for inference in Python; PyTorch is still required for model conversion and LoRA application
  • removed the dependency on tokenizers for World model inference in Python; tokenizers is still required to run Pile and Raven models
  • added a gpu_offload_layers function to RWKVModel, so you no longer need to guess how many layers to offload before creating the model: create the model, read n_layer, and call gpu_offload_layers after that; see the second sketch after this list
  • the tokenizer argument is now optional in the Python scripts; it is inferred from the n_vocab of the loaded model
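
For illustration, here is a minimal sketch of how the chunked evaluation might be called from the Python wrapper. Only rwkv_eval_sequence_in_chunks and RWKVModel are named in this PR; the module names, constructor arguments, and the exact eval_sequence_in_chunks signature below are assumptions about the wrapper API.

```python
# A usage sketch, not code from this PR. The module names and exact
# signatures are assumptions; only rwkv_eval_sequence_in_chunks and
# RWKVModel are named in the PR description.
import rwkv_cpp_shared_library  # assumed module from the repo's Python bindings
import rwkv_cpp_model           # assumed module from the repo's Python bindings

library = rwkv_cpp_shared_library.load_rwkv_shared_library()
model = rwkv_cpp_model.RWKVModel(library, 'model.bin', thread_count=4)

prompt_tokens = [510, 4342, 2003]  # token IDs of the whole prompt

# Feed the whole prompt in one call: the function splits the sequence
# into fixed-size chunks internally, so no manual batching is needed,
# and it is much faster than calling eval() once per token.
logits, state = model.eval_sequence_in_chunks(prompt_tokens, None)
```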
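
And a sketch of the new offloading flow, under the same assumed wrapper API; gpu_offload_layers and n_layer come from the PR description, the rest is hypothetical.

```python
# Sketch of the offloading flow described above: create the model
# first, inspect its layer count, then offload. How n_layer is exposed
# on RWKVModel is an assumption.
model = rwkv_cpp_model.RWKVModel(library, 'model.bin', thread_count=4)

# The layer count is known only once the model file has been loaded,
# so there is no longer any need to guess it up front.
model.gpu_offload_layers(model.n_layer)
```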

Closes #106 (Offload model head when using cuBLAS).

saharNooby merged commit 39ed572 into master on Sep 23, 2023
24 checks passed
saharNooby deleted the improvements-2023-09-21 branch on September 23, 2023 at 13:18