
Commit

Merge remote-tracking branch 'upstream' into update-ggml
LoganDark committed Jun 21, 2023
2 parents 013ce1b + 9cbb9d9 commit e077496
Showing 15 changed files with 300 additions and 102 deletions.
14 changes: 11 additions & 3 deletions CMakeLists.txt
@@ -41,6 +41,9 @@ option(RWKV_ACCELERATE "rwkv: enable Accelerate framework"
option(RWKV_OPENBLAS "rwkv: use OpenBLAS" OFF)
option(RWKV_CUBLAS "rwkv: use cuBLAS" OFF)

# Build only shared library without building tests and extras
option(RWKV_STANDALONE "rwkv: build only RWKV library" OFF)

#
# Compile flags
#
@@ -284,6 +287,11 @@ if (GGML_CUDA_SOURCES)
set_property(TARGET rwkv PROPERTY CUDA_ARCHITECTURES OFF)
endif()

enable_testing()
add_subdirectory(tests)
add_subdirectory(extras)
if (NOT RWKV_STANDALONE)
    set_property(TARGET ggml PROPERTY GGML_STANDALONE OFF)
    enable_testing()
    add_subdirectory(tests)
    add_subdirectory(extras)
else()
    set_property(TARGET ggml PROPERTY GGML_STANDALONE ON)
endif()
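Assuming the new `RWKV_STANDALONE` option behaves as defined above, a library-only configure and build might look like this (a sketch; flags other than `-DRWKV_STANDALONE=ON` follow the project's usual build steps):

```commandline
cmake . -DRWKV_STANDALONE=ON
cmake --build . --config Release
```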
44 changes: 19 additions & 25 deletions README.md
@@ -28,16 +28,19 @@ Below table is for reference only. Measurements were made on 4C/8T x86 CPU with

#### With cuBLAS

Measurements were made on Intel i7 13700K & NVIDIA 3060 Ti 8G. Latency per token shown.
Measurements were made on Intel i7 13700K & NVIDIA 3060 Ti 8 GB. Latency per token, in milliseconds, is shown.

| Model | Layers on GPU | Format | 24 Threads | 8 Threads | 4 Threads | 2 Threads | 1 Threads |
|-----------------------|---------------|--------|-------------|------------|------------|------------|------------|
| `RWKV-4-Pile-169M` | 12 | `Q4_0` | 20.6 ms | 8.6 ms | 6.9 ms | 6.2 ms | 7.9 ms |
| `RWKV-4-Pile-169M` | 12 | `Q4_1` | 21.4 ms | 8.6 ms | 6.9 ms | 6.7 ms | 7.8 ms |
| `RWKV-4-Pile-169M` | 12 | `Q5_1` | 22.2 ms | 9.0 ms | 6.9 ms | 6.7 ms | 8.1 ms |
| `RWKV-4-Raven-7B-v11` | 32 | `Q4_0` | 94.9 ms | 54.3 ms | 50.2 ms | 51.6 ms | 59.2 ms |
| `RWKV-4-Raven-7B-v11` | 32 | `Q4_1` | 94.5 ms | 54.3 ms | 49.7 ms | 51.8 ms | 59.2 ms |
| `RWKV-4-Raven-7B-v11` | 32 | `Q5_1` | 101.6 ms | 72.3 ms | 67.2 ms | 69.3 ms | 77.0 ms |
| Model | Layers on GPU | Format | 1 thread | 2 threads | 4 threads | 8 threads | 24 threads |
|-----------------------|---------------|--------|----------|-----------|-----------|-----------|------------|
| `RWKV-4-Pile-169M` | 12 | `Q4_0` | 7.9 | 6.2 | 6.9 | 8.6 | 20 |
| `RWKV-4-Pile-169M` | 12 | `Q4_1` | 7.8 | 6.7 | 6.9 | 8.6 | 21 |
| `RWKV-4-Pile-169M` | 12 | `Q5_1` | 8.1 | 6.7 | 6.9 | 9.0 | 22 |

| Model | Layers on GPU | Format | 1 thread | 2 threads | 4 threads | 8 threads | 24 threads |
|-----------------------|---------------|--------|----------|-----------|-----------|-----------|------------|
| `RWKV-4-Raven-7B-v11` | 32 | `Q4_0` | 59 | 51 | 50 | 54 | 94 |
| `RWKV-4-Raven-7B-v11` | 32 | `Q4_1` | 59 | 51 | 49 | 54 | 94 |
| `RWKV-4-Raven-7B-v11` | 32 | `Q5_1` | 77 | 69 | 67 | 72 | 101 |

Note: since only `ggml_mul_mat()` is offloaded to cuBLAS, some CPU resources are still needed to execute the remaining operations.

@@ -68,7 +71,7 @@ This option is recommended for maximum performance, because the library would be

##### Windows

**Requirements**: [CMake](https://cmake.org/download/) or [CMake from anaconda](https://anaconda.org/conda-forge/cmake), MSVC compiler.
**Requirements**: [CMake](https://cmake.org/download/) or [CMake from anaconda](https://anaconda.org/conda-forge/cmake), [Build Tools for Visual Studio 2019](https://visualstudio.microsoft.com/vs/older-downloads/).

```commandline
cmake .
cmake --build . --config Release
```

@@ -79,14 +82,7 @@ If everything went OK, `bin\Release\rwkv.dll` file should appear.

##### Windows + cuBLAS

**Important**: Since there are no cuBLAS static libraries for Windows, after compiling with dynamic libraries following DLLs should be copied from `{CUDA}/bin` into `build/bin/Release`: `cudart64_12.dll`, `cublas64_12.dll`, `cublasLt64_12.dll`.

```commandline
mkdir build
cd build
cmake .. -DRWKV_CUBLAS=ON
cmake --build . --config Release
```
Refer to [docs/cuBLAS_on_Windows.md](docs/cuBLAS_on_Windows.md) for a comprehensive guide.

##### Linux / MacOS

@@ -104,9 +100,7 @@ If everything went OK, `librwkv.so` (Linux) or `librwkv.dylib` (MacOS) file should appear.
##### Linux / MacOS + cuBLAS

```commandline
mkdir build
cd build
cmake .. -DRWKV_CUBLAS=ON
cmake . -DRWKV_CUBLAS=ON
cmake --build . --config Release
```

@@ -130,10 +124,10 @@ This option would require a little more manual work, but you can use it with any

```commandline
# Windows
python rwkv\convert_pytorch_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin float16
python rwkv\convert_pytorch_to_ggml.py C:\RWKV-4-Pile-169M-20220807-8023.pth C:\rwkv.cpp-169M.bin FP16
# Linux / MacOS
python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4-Pile-169M-20220807-8023.pth ~/Downloads/rwkv.cpp-169M.bin float16
python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4-Pile-169M-20220807-8023.pth ~/Downloads/rwkv.cpp-169M.bin FP16
```

**Optionally**, quantize the model into one of quantized formats from the table above:
@@ -218,8 +212,8 @@ For reference only, here is a list of latest versions of `rwkv.cpp` that have su
- `Q4_3`, `Q4_1_O`
- [commit c736ef5](https://github.com/saharNooby/rwkv.cpp/commit/c736ef5411606b529d3a74c139ee111ef1a28bb9), [release with prebuilt binaries](https://github.com/saharNooby/rwkv.cpp/releases/tag/master-1c363e6)

See also [FILE_FORMAT.md](FILE_FORMAT.md) for version numbers of `rwkv.cpp` model files and their changelog.
See also [docs/FILE_FORMAT.md](docs/FILE_FORMAT.md) for version numbers of `rwkv.cpp` model files and their changelog.

## Contributing

Please follow the code style described in [CODE_STYLE.md](CODE_STYLE.md).
Please follow the code style described in [docs/CODE_STYLE.md](docs/CODE_STYLE.md).
File renamed without changes.
File renamed without changes.
68 changes: 68 additions & 0 deletions docs/cuBLAS_on_Windows.md
@@ -0,0 +1,68 @@
# Using cuBLAS on Windows

To get cuBLAS in `rwkv.cpp` working on Windows, go through this guide section by section.

## Build Tools for Visual Studio 2019

Skip this step if you already have Build Tools installed.

To install Build Tools, go to [Visual Studio Older Downloads](https://visualstudio.microsoft.com/vs/older-downloads/), download `Visual Studio 2019 and other Products` and run the installer.

## CMake

Skip this step if you already have CMake installed: running `cmake --version` should output `cmake version x.y.z`.

Download the latest `Windows x64 Installer` from [Download | CMake](https://cmake.org/download/) and run it.

## CUDA Toolkit

Skip this step if you already have CUDA Toolkit installed: running `nvcc --version` should output `nvcc: NVIDIA (R) Cuda compiler driver`.

CUDA Toolkit must be installed **after** CMake, otherwise CMake will not be able to find it and you will get the error [No CUDA toolset found](https://stackoverflow.com/questions/56636714/cuda-compile-problems-on-windows-cmake-error-no-cuda-toolset-found).

Download an installer from [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) and run it.

When installing:

- check `Visual Studio Integration`, or else CMake will not be able to see the toolkit
- optionally, uncheck driver installation: depending on the downloaded version of the toolkit, you may get an unwanted driver downgrade

## Building rwkv.cpp

The only thing different from the regular CPU build is the `-DRWKV_CUBLAS=ON` option:

```commandline
cmake . -DRWKV_CUBLAS=ON
cmake --build . --config Release
```

If everything went OK, `bin\Release\rwkv.dll` file should appear.

## Using the GPU

You need to choose how many layers will be offloaded onto the GPU. In general, the more layers are offloaded, the better the performance; but you may be constrained by the VRAM size of your GPU. Increase the offloaded layer count until you get "CUDA out of memory" errors.

If most of the computation is performed on the GPU, you will not need a high thread count. The optimal value may be as low as 1, since additional threads would just waste CPU cycles while waiting for GPU operations to complete.
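The trial-and-error search for a workable layer count can start from a rough estimate. The helper below is a hypothetical sketch (the function name, the 512 MB reserve default, and the example per-layer size are assumptions, not part of `rwkv.cpp`):

```python
def max_offloadable_layers(vram_bytes: int, bytes_per_layer: int,
                           reserve_bytes: int = 512 * 1024 * 1024) -> int:
    """Estimate how many layers fit on the GPU, keeping some VRAM in reserve.

    bytes_per_layer is a rough per-layer weight size for the chosen
    quantization format; reserve_bytes leaves room for activations and
    other allocations.
    """
    usable = vram_bytes - reserve_bytes
    if usable <= 0 or bytes_per_layer <= 0:
        return 0
    return usable // bytes_per_layer

# Example: 8 GB card, ~200 MB per quantized layer.
layers = max_offloadable_layers(8 * 1024**3, 200 * 1024**2)
```

Start near such an estimate, then adjust down if you still hit "CUDA out of memory" errors.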

To offload layers to GPU:

- if using Python model: pass non-zero number in `gpu_layer_count` to constructor of `rwkv.rwkv_cpp_model.RWKVModel`
- if using Python wrapper for C library: call `rwkv.rwkv_cpp_shared_library.RWKVSharedLibrary.rwkv_gpu_offload_layers`
- if using C library directly: call `bool rwkv_gpu_offload_layers(struct rwkv_context * ctx, const uint32_t n_layers)`

## Fixing issues

You may get the error `FileNotFoundError: Could not find module '...\rwkv.dll' (or one of its dependencies). Try using the full path with constructor syntax.`

This means that the application couldn't find CUDA libraries that `rwkv.dll` depends on.

To fix this:

- navigate to the folder where CUDA Toolkit is installed
- usually, it looks like `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin`
- find three DLLs in the `bin` folder:
- `cudart64_110.dll`
- `cublas64_11.dll`
- `cublasLt64_11.dll`
- copy these DLLs to the folder containing `rwkv.dll`
- usually, the folder is `rwkv.cpp/bin/Release`
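The copy step above can be scripted. This is a sketch under the assumption that the three DLL names match your CUDA version; the helper name `copy_cuda_dlls` is hypothetical, not part of `rwkv.cpp`:

```python
import shutil
from pathlib import Path

# DLL names for CUDA 11.x, as listed above; adjust for your toolkit version.
CUDA_DLLS = ("cudart64_110.dll", "cublas64_11.dll", "cublasLt64_11.dll")

def copy_cuda_dlls(cuda_bin: str, target_dir: str, names=CUDA_DLLS) -> list:
    """Copy the CUDA runtime DLLs into the folder containing rwkv.dll.

    Raises FileNotFoundError if any expected DLL is missing from cuda_bin.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(cuda_bin) / name
        if not src.is_file():
            raise FileNotFoundError(f"{name} not found in {cuda_bin}")
        shutil.copy2(src, target / name)  # copy2 preserves file metadata
        copied.append(name)
    return copied
```

For CUDA 12, the DLLs carry different version suffixes (e.g. `cudart64_12.dll`, `cublas64_12.dll`, `cublasLt64_12.dll`), so pass the matching names explicitly.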
13 changes: 9 additions & 4 deletions extras/CMakeLists.txt
@@ -1,10 +1,15 @@
function(rwkv_add_extra source)
get_filename_component(EXTRA_TARGET ${source} NAME_WE)
add_executable(rwkv_${EXTRA_TARGET} ${source})
target_link_libraries(rwkv_${EXTRA_TARGET} PRIVATE ggml rwkv)
get_filename_component(EXTRA_TARGET ${source} NAME_WE)
add_executable(rwkv_${EXTRA_TARGET} ${source})
target_link_libraries(rwkv_${EXTRA_TARGET} PRIVATE ggml rwkv)
if (RWKV_STATIC)
get_target_property(target_LINK_OPTIONS rwkv_${EXTRA_TARGET} LINK_OPTIONS)
list(REMOVE_ITEM target_LINK_OPTIONS "-static")
set_target_properties(rwkv_${EXTRA_TARGET} PROPERTIES LINK_OPTIONS "${target_LINK_OPTIONS}")
endif()
endfunction()

file(GLOB extras *.c)
foreach (extra ${extras})
rwkv_add_extra(${extra})
rwkv_add_extra(${extra})
endforeach()
