Releases: RWKV/RWKV-infctx-trainer

v2.3.0 - S3FS datapath support

25 Jan 00:09
2db0b8d

What's Changed

  • Fixed a bug in DS3 for A100/H100 nodes
  • Added support for S3 datapaths
  • Changed the LR scheduler to cosine (credit: @SmerkyG)
  • Changed the step calculation (may affect your training scripts/templates)
  • Several bug fixes (credit: @SmerkyG)

Full Changelog: v2.2.1...v2.3.0

Example of S3 datapath config

data:
  # Skip the datapath setup
  #
  # Ignored if using preload_datapath.py; useful for speeding up the trainer startup
  # provided you have all your datasets properly preinitialized
  # ---
  skip_datapath_setup: True

  # dataset_path for the prebuilt dataset, using HF `load_from_disk()`
  #
  # Use this if you have built your own dataset and saved it with `save_to_disk()`
  # with source left as null. Otherwise, configure this to a directory where the
  # dataset will be built and tokenized by the huggingface dataset process.
  #
  # If using relative path, this should be relative to the trainer script path
  data_path: s3://bucket-name/subpath/

  # Data path storage options, used to support cloud storage
  # via the huggingface dataset API. See:
  # https://huggingface.co/docs/datasets/v2.16.1/en/filesystems#amazon-s3
  #
  # Note: As of Jan 2023, these options have only been tested to work with AWS S3 and Backblaze. YMMV
  #       For S3 bucket support you will also need to install s3fs `python3 -m pip install s3fs`
  #
  # If you want to reduce the risk of accidental key/secret commits, you can use
  # `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead
  #
  # The data_path should use the `s3://bucket-name/subpath` format
  # ---
  data_path_storage_options:
    key: <example S3 key>
    secret: <example S3 secret>
    endpoint_url: <example S3 endpoint>
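
For reference, a minimal sketch of the environment-variable alternative mentioned in the comments above. It assumes s3fs falls back to the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables when key and secret are omitted, with endpoint_url kept only for non-AWS S3-compatible providers; the bucket path and endpoint below are placeholders:

data:
  data_path: s3://bucket-name/subpath/

  # key/secret intentionally omitted - s3fs reads them from the
  # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables instead.
  # endpoint_url is only needed for non-AWS S3-compatible providers (e.g. Backblaze)
  data_path_storage_options:
    endpoint_url: <example S3 endpoint>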

v2.2.1 - bug fix for minibatch and mixed dataset + some experimental flags

18 Jan 07:15
36e6737

What's Changed

  • Fixed a bug where microbatch > 1 combined with mixed dataset sizes caused errors
  • Additional experimental flags for training tweaks

Full Changelog: v2.2.0...v2.2.1

v2.2.0

17 Jan 05:07

What's Changed

  • Dataset packing by @PicoCreator in #56
    • When combined with microbatches, it can drastically increase the tokens/s for a well-tuned setup - this is similar to what was done for the axolotl trainer.
  • V5 batching support by @PicoCreator in #46
  • Updating to latest state / saaaa / sbbbb / scccc / sdddd code impleme… by @PicoCreator in #47
  • fix: loss forward segment count moved to correct device by @m8than in #48
  • Fix config example by @m8than in #49
  • Conversational data feature by @PicoCreator in #55

Full Changelog: v2.1.0...v2.2.0

v2.1.0 - Batching support, minor logging behaviour and default changes

16 Nov 08:23
75b3ba5

What's Changed

  • Added batching support with the new trainer config microbatch_size - use this to trade VRAM for a substantial tokens/sec increase (50%++); see the sketch after this list
    • This brings the infctx trainer speed much closer to the main trainer, I should have done this earlier lol
  • Changed real_ctx_len to data_ctx_len in wandb logging, to better reflect the above, as it will now be an average over the microbatch
  • Changed the default behaviour of multi_column, and fixed #35
  • Improved documentation for #34
  • Microbatch support #25
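
A minimal illustrative sketch of the new option is shown below; the placement under the trainer block and the example value are assumptions based on the phrase "trainer config microbatch_size" above, so check your own config template:

trainer:
  # Assumed key placement: number of samples batched together per training step,
  # trading extra VRAM for higher tokens/sec throughput (larger = more VRAM)
  microbatch_size: 8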

Full Changelog: v2.0.1...v2.1.0

v2.0.1 - Breaking change for v5.2 / v5-R4 specification support

15 Nov 01:54
1e287e9

WARNING: this version only supports the latest v5.2 (aka v5-R4) models, and breaks all previous models. Moving forward we will only be supporting v5-R4.

What's Changed

  • Fix segment_count not properly moved onto device by @TearGosling in #37
  • Full v5 r4 rewrite by @PicoCreator @harrisonvanderbyl in #44
  • Dropped the v5beta2 folder
  • bptt_truncate=true is enforced until state gradients are supported by the CUDA kernel
  • Technically the non-CUDA kernel works, but it's so slow (~100x slower) that you really should not train without GPUs

Full Changelog: v1.1.1...v2.0.1

v1.1.1 - v5r3 model bug fix

08 Sep 00:34

Fixed an issue with missing output normalisation in the v5 model

Full Changelog: v1.1.0...v1.1.1

v1.1.0 - breaking change for v5 (treat it as beta3)

06 Sep 20:56

WARNING: this version breaks existing v5 models; use v5-beta2 for previous models. Until an official v5 1.5B or larger model is released, we will not be treating v5 breaking changes as major version changes. The existing v5 code (r3) has not been thoroughly tested, and may be subject to future changes.

What's Changed

  • Upgraded v5 to be in sync with v5r3 from BlinkDL's official repo
  • Moved existing v5 code to the v5-beta2 folder (as I know some folks have already started experimenting with v5)
  • Various readme / documentation / example changes
  • Added data offset and limit params (to be documented)
  • WIP docker container
  • Fix for older python/lightning versions for multi-GPU sync

Additional changes that were merged in

  • limited dataloader num_worker max to 8 by @diannaojiang in #17
  • (optional) Added token 0 to the tokenizer. by @m8than in #18
  • Dataset Sorting + multi column suffix features by @m8than in #23

Full Changelog: v1.0.2...v1.1.0

v1.0.2 - Fixing world tokenizer vocab size

21 Aug 16:40
505622b

Full Changelog: v1.0.1...v1.0.2

v1.0.1 - Incremental bug fixes

21 Aug 16:22
fd6bbfb

Various incremental fixes

  • Export with BF16 support
  • Potential fix for 4 tokens with issues in the world tokenizer
  • Updated requirements.txt with missing jsonargparse[signatures]
  • Fixed an issue with Python 3.10 when doing GPU sync (you should still use 3.11 though)

The following non-stable features were added:

  • WIP: Docker Env Container
  • PROTOTYPE: loss_bias support

v1.0.0 - official release of infctx trainer

16 Aug 14:57

This is a major release with a huge list of new features over the original infctx trainer.

Thanks to all those who helped test the trainer for bugs and issues, even when it was in a very rough early stage. While there are still some features that need to be added, and performance and docs that need improving, for the vast majority of use cases you should be able to get started with this new trainer for your finetuning (non-LoRA) needs.

Special thanks to @Blealtan @BlinkDL @Yuzaboto @Bananaman