Cosmetic changes (code style, documentation, etc.) (#97)
* Fix README

* Update ggml

* Add missing items to code style document

* Reformat code, fix documentation and messages

* Fix some warnings on MSVC

* Refactor World tokenizer

* Fix duration formatting

* Refactor sampling

* Apply suggestions
saharNooby committed Jun 13, 2023
1 parent bd65c97 commit 69639f2
Showing 19 changed files with 304 additions and 953 deletions.
8 changes: 7 additions & 1 deletion CODE_STYLE.md
@@ -11,16 +11,22 @@ Overall, keep code in similar style as it was before.
- Keep lines at 180 characters or shorter.
- Separate logically grouped pieces of code with empty lines.
- Surround `if`, `for`, `while`, `do` and other similar statements with empty lines.
- Add trailing new line to the end of the file.

### Comments and messages

- Write documentation for public functions intended for outside use.
- Place single-line comments on the line before, not right after the code line.
- Start comments with a capital letter, use correct grammar and punctuation.
- Begin comments with a capital letter, use correct grammar and punctuation.
- Begin messages, including error messages, with a capital letter.
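
For illustration, a comment and an error message that follow these rules might look like the sketch below; `example_open_file` and the file handling around it are made up for the example, not taken from the codebase.

```c
#include <stdbool.h>
#include <stdio.h>

// Single-line comment placed on the line before the code, starting with a capital letter.
static bool example_open_file(const char * file_path) {
    FILE * file = fopen(file_path, "rb");

    if (file == NULL) {
        // Messages also begin with a capital letter.
        fprintf(stderr, "Failed to open file %s\n", file_path);

        return false;
    }

    fclose(file);

    return true;
}
```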

## C/C++

- Use 4 spaces for indentation.
- Use [The One True Brace Style](https://en.wikipedia.org/wiki/Indentation_style#Variant:_1TBS_(OTBS)):
    - Place braces on the same line as the statement.
    - Always add braces to `if`, `for`, `while`, `do` and other similar statements.
- Prefix top-level function and struct names with `rwkv_`.
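
Taken together, a function written to these rules could look like the following sketch; `rwkv_example_sum` is a made-up name used only to show the `rwkv_` prefix and brace placement.

```c
#include <stddef.h>

// 4-space indentation, braces on the same line as the statement (1TBS), rwkv_ prefix on the top-level name.
float rwkv_example_sum(const float * values, const size_t count) {
    float sum = 0.0F;

    // Braces are kept even for a single-statement loop body.
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }

    return sum;
}
```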

## Python

12 changes: 6 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

This is a port of [BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM) to [ggerganov/ggml](https://github.com/ggerganov/ggml).

Besides the usual **FP32**, it supports **FP16**, **quantized INT4, INT5 and INT8** inference. This project is **CPU only**.
Besides the usual **FP32**, it supports **FP16**, **quantized INT4, INT5 and INT8** inference. This project is **focused on CPU**, but cuBLAS is also supported.

This project provides [a C library rwkv.h](rwkv.h) and [a convenient Python wrapper](rwkv%2Frwkv_cpp_model.py) for it.
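
A minimal sketch of driving the C library from user code is shown below. It assumes the `rwkv.h` API of this period (`rwkv_init_from_file`, `rwkv_eval`, `rwkv_get_state_buffer_element_count`, `rwkv_get_logits_buffer_element_count`, `rwkv_free`) and a made-up model file name; exact signatures should be taken from `rwkv.h` itself.

```c
#include <stdio.h>
#include <stdlib.h>

#include "rwkv.h"

int main(void) {
    // Assumed entry point: load a converted model file with a given number of threads.
    struct rwkv_context * ctx = rwkv_init_from_file("RWKV-4-Raven-7B-v11-Q5_1.bin", 4);

    if (ctx == NULL) {
        fprintf(stderr, "Failed to load the model\n");

        return EXIT_FAILURE;
    }

    float * state = calloc(rwkv_get_state_buffer_element_count(ctx), sizeof(float));
    float * logits = calloc(rwkv_get_logits_buffer_element_count(ctx), sizeof(float));

    // Passing NULL as the input state starts from an empty state; 0 and 1 are placeholder token ids.
    rwkv_eval(ctx, 0, NULL, state, logits);
    rwkv_eval(ctx, 1, state, state, logits);

    free(state);
    free(logits);
    rwkv_free(ctx);

    return EXIT_SUCCESS;
}
```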

@@ -28,7 +28,7 @@ Below table is for reference only. Measurements were made on 4C/8T x86 CPU with

#### With cuBLAS

Measurements were made on 3060Ti 8G + i7 13700K. Latency per token shown.
Measurements were made on Intel i7 13700K & NVIDIA 3060 Ti 8G. Latency per token shown.

| Model | Layers on GPU | Format | 24 Threads | 8 Threads | 4 Threads | 2 Threads | 1 Threads |
|-----------------------|---------------|--------|-------------|------------|------------|------------|------------|
@@ -39,7 +39,7 @@ Measurements were made on 3060Ti 8G + i7 13700K. Latency per token shown.
| `RWKV-4-Raven-7B-v11` | 32 | `Q4_1` | 94.5 ms | 54.3 ms | 49.7 ms | 51.8 ms | 59.2 ms |
| `RWKV-4-Raven-7B-v11` | 32 | `Q5_1` | 101.6 ms | 72.3 ms | 67.2 ms | 69.3 ms | 77.0 ms |

Note: since there is only `ggml_mul_mat()` supported with cuBLAS, we still need to assign few CPU resources to execute remaining operations.
Note: since cuBLAS is supported only for `ggml_mul_mat()`, we still need to use some CPU resources to execute the remaining operations.

## How to use

@@ -79,7 +79,7 @@ If everything went OK, `bin\Release\rwkv.dll` file should appear.

##### Windows + cuBLAS

**Important**: Since there is no cuBLAS static libraries for Windows, after compiling with dynamic libraries following DLLs should be copied from `{CUDA}/bin` into `build/bin/Release`: `cudart64_12.dll`, `cublas64_12.dll`, `cublasLt64_12.dll`.
**Important**: Since there are no cuBLAS static libraries for Windows, after compiling with dynamic libraries, the following DLLs should be copied from `{CUDA}/bin` into `build/bin/Release`: `cudart64_12.dll`, `cublas64_12.dll`, `cublasLt64_12.dll`.

```commandline
mkdir build
@@ -116,7 +116,7 @@ If everything went OK, `librwkv.so` (Linux) or `librwkv.dylib` (MacOS) file should appear.

#### Option 3.1. Download pre-quantized Raven model

There are pre-quantized Raven models available on [Hugging Face](https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main). Check that you are downloading `.bin` file, NOT `.pth`.
There are pre-quantized Raven models available on [Hugging Face](https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main). Check that you are downloading `.bin` file, **not** `.pth`.

#### Option 3.2. Convert and quantize PyTorch model

@@ -222,4 +222,4 @@ See also [FILE_FORMAT.md](FILE_FORMAT.md) for version numbers of `rwkv.cpp` model files

## Contributing

There is no complete contributor guide yet; but we have [CODE_STYLE.md](CODE_STYLE.md).
Please follow the code style described in [CODE_STYLE.md](CODE_STYLE.md).
2 changes: 1 addition & 1 deletion extras/CMakeLists.txt
@@ -7,4 +7,4 @@ endfunction()
file(GLOB extras *.c)
foreach (extra ${extras})
rwkv_add_extra(${extra})
endforeach()
endforeach()
2 changes: 1 addition & 1 deletion extras/cpu_info.c
@@ -4,4 +4,4 @@

int main() {
printf("%s", rwkv_get_system_info_string());
}
}
24 changes: 12 additions & 12 deletions extras/quantize.c
@@ -5,15 +5,6 @@
#include <stdio.h>
#include <string.h>

enum ggml_type type_from_string(const char* string) {
if (strcmp(string, "Q4_0") == 0) return GGML_TYPE_Q4_0;
if (strcmp(string, "Q4_1") == 0) return GGML_TYPE_Q4_1;
if (strcmp(string, "Q5_0") == 0) return GGML_TYPE_Q5_0;
if (strcmp(string, "Q5_1") == 0) return GGML_TYPE_Q5_1;
if (strcmp(string, "Q8_0") == 0) return GGML_TYPE_Q8_0;
return GGML_TYPE_COUNT;
}

#ifdef _WIN32
bool QueryPerformanceFrequency(uint64_t* lpFrequency);
bool QueryPerformanceCounter(uint64_t* lpPerformanceCount);
@@ -31,7 +22,16 @@ bool QueryPerformanceCounter(uint64_t* lpPerformanceCount);
#define TIME_DIFF(freq, start, end) (double) ((end.tv_nsec - start.tv_nsec) / 1000000) / 1000
#endif

int main(int argc, char* argv[]) {
enum ggml_type type_from_string(const char* string) {
if (strcmp(string, "Q4_0") == 0) return GGML_TYPE_Q4_0;
if (strcmp(string, "Q4_1") == 0) return GGML_TYPE_Q4_1;
if (strcmp(string, "Q5_0") == 0) return GGML_TYPE_Q5_0;
if (strcmp(string, "Q5_1") == 0) return GGML_TYPE_Q5_1;
if (strcmp(string, "Q8_0") == 0) return GGML_TYPE_Q8_0;
return GGML_TYPE_COUNT;
}

int main(int argc, char * argv[]) {
if (argc != 4 || type_from_string(argv[3]) == GGML_TYPE_COUNT) {
fprintf(stderr, "Usage: %s INPUT OUTPUT FORMAT\n\nAvailable formats: Q4_0 Q4_1 Q5_0 Q5_1 Q8_0\n", argv[0]);
return EXIT_FAILURE;
@@ -40,7 +40,7 @@ int main(int argc, char* argv[]) {
time_t freq, start, end;
time_calibrate(freq);

fprintf(stderr, "Quantizing ...\n");
fprintf(stderr, "Quantizing...\n");

time_measure(start);
bool success = rwkv_quantize_model_file(argv[1], argv[2], argv[3]);
@@ -55,4 +55,4 @@ int main(int argc, char* argv[]) {
fprintf(stderr, "Error in %.3fs: 0x%.8X\n", diff, rwkv_get_last_error(NULL));
return EXIT_FAILURE;
}
}
}
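
For reference, the library call exercised by this tool can also be used directly from user code. Below is a minimal sketch based only on the calls visible in this file; the model file names are placeholders.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#include "rwkv.h"

int main(void) {
    // Same call quantize.c makes: input path, output path, format name (Q4_0, Q4_1, Q5_0, Q5_1 or Q8_0).
    bool success = rwkv_quantize_model_file("rwkv-model-FP16.bin", "rwkv-model-Q5_1.bin", "Q5_1");

    if (!success) {
        // As in quantize.c, NULL requests the library-global last error code.
        fprintf(stderr, "Quantization failed: 0x%.8X\n", rwkv_get_last_error(NULL));

        return EXIT_FAILURE;
    }

    fprintf(stderr, "Quantization succeeded\n");

    return EXIT_SUCCESS;
}
```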
