Add support for Q5_0, Q5_1 and Q8_0 formats; remove Q4_1_O format (#44)
* Remove Q4_3 support
* Add Q5_0, Q5_1, Q8_0 support
* Add a clearer message when loading a Q4_3 model
* Remove Q4_1_O format
* Fix indentation in .gitmodules
* Simplify sanitizer matrix
Commit 1198892 (1 parent: c736ef5)
14 changed files with 233 additions and 425 deletions.
@@ -1,3 +1,4 @@
[submodule "ggml"]
	path = ggml
	url = https://github.com/saharNooby/ggml
	branch = master-2023-04-29
@@ -0,0 +1,53 @@

# rwkv.cpp file format

This format is used by `rwkv.cpp` to store RWKV model checkpoints.

Preferred file extension: `.bin`

Specification in C-like pseudocode:
```
RWKVModelFile {
    // All ints and floats are in machine byte order.
    // Magic is 0x67676d66, the ASCII codes of "ggmf". Because it is stored in
    // machine byte order, it appears on disk as "fmgg" on little-endian machines.
    int32 magic = 0x67676d66;
    int32 version = 100;
    int32 n_vocab;
    int32 n_embed;
    int32 n_layer;
    // Data type of most of the parameters. See "Data types" below for possible values.
    int32 data_type;
    // Read until EOF.
    Parameter[] parameters;
}

Parameter {
    int32 dim_count;
    int32 key_length;
    // Data type of the parameter. See "Data types" below for possible values.
    int32 data_type;
    // Compared to PyTorch's parameter.shape, dimension order is reversed here!
    int32[dim_count] shape;
    // Keys are like "emb.weight", "block.0.ln1.weight".
    uint8[key_length] key_utf8;
    // element_count is the product of all values in shape.
    // Length of the data array depends on parameter data type:
    // - FP32: 4 * element_count
    // - FP16: 2 * element_count
    // - QX_Y (quantized): element_count / QKX_Y * sizeof(block_qx_y)
    // See ggml.c for values of QK and block sizes of specific formats.
    byte[] data;
}
```
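A minimal C sketch of a reader for the fixed-size parts of this layout, assuming the file was written on a machine with the same byte order as the reader (the spec stores everything in machine byte order, so no byte swapping is done here). The names `rwkv_header`, `rwkv_param_header`, `read_header` and `read_param_header`, the `MAX_DIMS` limit and the 256-byte key buffer are hypothetical choices for illustration and are not taken from rwkv.cpp's own loader. Working out how many `data` bytes follow each parameter is left out here because it depends on the data type.

```c
#include <stdint.h>
#include <stdio.h>

#define RWKV_MAGIC   0x67676d66 /* "ggmf" */
#define RWKV_VERSION 100
#define MAX_DIMS     8          /* hypothetical bound chosen for this sketch */

/* The six int32 fields at the start of RWKVModelFile. */
typedef struct {
    int32_t magic, version, n_vocab, n_embed, n_layer, data_type;
} rwkv_header;

/* Everything in a Parameter that comes before `data`. */
typedef struct {
    int32_t dim_count;
    int32_t key_length;
    int32_t data_type;
    int32_t shape[MAX_DIMS]; /* reversed relative to PyTorch's parameter.shape */
    char key[256];           /* NUL-terminated copy of key_utf8 */
} rwkv_param_header;

/* Reads one int32 in machine byte order. Returns 1 on success, 0 on short read. */
static int read_i32(FILE *f, int32_t *out) {
    return fread(out, sizeof(*out), 1, f) == 1;
}

/* Returns 0 on success, -1 on short read, -2 on bad magic, -3 on other version. */
int read_header(FILE *f, rwkv_header *h) {
    if (!read_i32(f, &h->magic) || !read_i32(f, &h->version) ||
        !read_i32(f, &h->n_vocab) || !read_i32(f, &h->n_embed) ||
        !read_i32(f, &h->n_layer) || !read_i32(f, &h->data_type)) {
        return -1;
    }
    if (h->magic != RWKV_MAGIC) return -2;
    if (h->version != RWKV_VERSION) return -3;
    return 0;
}

/* Returns 1 when a parameter header was read, 0 at a clean EOF (no more
   parameters), -1 on truncated input or values that exceed the buffers. */
int read_param_header(FILE *f, rwkv_param_header *p) {
    size_t n = fread(&p->dim_count, 1, sizeof(p->dim_count), f);
    if (n == 0) return 0;
    if (n != sizeof(p->dim_count)) return -1;
    if (!read_i32(f, &p->key_length) || !read_i32(f, &p->data_type)) return -1;
    if (p->dim_count < 0 || p->dim_count > MAX_DIMS) return -1;
    if (p->key_length < 0 || p->key_length >= (int32_t)sizeof(p->key)) return -1;
    if (fread(p->shape, sizeof(int32_t), (size_t)p->dim_count, f) != (size_t)p->dim_count) return -1;
    if (fread(p->key, 1, (size_t)p->key_length, f) != (size_t)p->key_length) return -1;
    p->key[p->key_length] = '\0';
    return 1;
}
```

A loader built this way would call `read_header` once, then loop on `read_param_header` until it returns 0, reading or skipping the `data` bytes after each parameter.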

## Data types

- 0: `FP32`
- 1: `FP16`
- 2: `Q4_0`
- 3: `Q4_1`
- 4: *unused*
- 5: `Q4_2`
- 6: *unused*
- 7: `Q5_0`
- 8: `Q5_1`
- 9: `Q8_0`
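The type codes above and the data-length rules from the pseudocode can be combined into a size calculation, for example to skip a parameter's `data` with `fseek`. This is a sketch under stated assumptions: the enum constant names and the `rwkv_block_info` struct are hypothetical, and because QK and block sizes are defined in ggml.c (and may differ between ggml revisions), the caller supplies them instead of this code hard-coding values. Treating an `element_count` that is not a whole number of blocks as an error is a choice of this sketch, not something the spec states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the codes listed under "Data types"; 4 and 6 are unused. */
enum rwkv_data_type {
    RWKV_FP32 = 0,
    RWKV_FP16 = 1,
    RWKV_Q4_0 = 2,
    RWKV_Q4_1 = 3,
    RWKV_Q4_2 = 5,
    RWKV_Q5_0 = 7,
    RWKV_Q5_1 = 8,
    RWKV_Q8_0 = 9
};

/* Returns the format name, or NULL for unused or unknown codes. */
const char *rwkv_type_name(int32_t type) {
    switch (type) {
    case RWKV_FP32: return "FP32";
    case RWKV_FP16: return "FP16";
    case RWKV_Q4_0: return "Q4_0";
    case RWKV_Q4_1: return "Q4_1";
    case RWKV_Q4_2: return "Q4_2";
    case RWKV_Q5_0: return "Q5_0";
    case RWKV_Q5_1: return "Q5_1";
    case RWKV_Q8_0: return "Q8_0";
    default:        return NULL;
    }
}

/* element_count is the product of all values in shape. */
uint64_t rwkv_element_count(const int32_t *shape, int32_t dim_count) {
    uint64_t n = 1;
    for (int32_t i = 0; i < dim_count; i++) {
        n *= (uint64_t)shape[i];
    }
    return n;
}

/* Caller-supplied quantization parameters; take the real values from ggml.c. */
typedef struct {
    uint64_t qk;         /* elements per block (QKX_Y) */
    uint64_t block_size; /* sizeof(block_qx_y) in bytes */
} rwkv_block_info;

/* Byte length of a parameter's data array, or 0 if the type is unknown or
   element_count is not a whole number of blocks. `q` is ignored for FP32/FP16. */
uint64_t rwkv_data_size(int32_t type, uint64_t element_count, rwkv_block_info q) {
    switch (type) {
    case RWKV_FP32: return 4 * element_count;
    case RWKV_FP16: return 2 * element_count;
    case RWKV_Q4_0: case RWKV_Q4_1: case RWKV_Q4_2:
    case RWKV_Q5_0: case RWKV_Q5_1: case RWKV_Q8_0:
        if (q.qk == 0 || element_count % q.qk != 0) return 0;
        return element_count / q.qk * q.block_size;
    default:
        return 0;
    }
}
```

For example, a parameter with `shape` {64, 2} has 128 elements, so stored as FP16 its `data` occupies 2 * 128 = 256 bytes.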
Submodule ggml updated from bfa8d5 to a0687a