Releases: RWKV/RWKV-infctx-trainer

v2.3.0 - S3FS datapath support

25 Jan 00:09
2db0b8d

What's Changed

  • Fixed a bug in DS3 for A100/H100 nodes
  • Added support for S3 datapaths
  • Changed the LR scheduler to cosine (credit: @SmerkyG)
  • Changed the step calculation (may affect your training scripts/templates)
  • Several bug fixes (credit: @SmerkyG)

Full Changelog: v2.2.1...v2.3.0

Example of S3 datapath config

data:
  # Skip the datapath setup
  #
  # Ignored if using preload_datapath.py; useful for speeding up the trainer startup
  # provided you have all your datasets properly preinitialized
  # ---
  skip_datapath_setup: True

  # dataset_path for the prebuilt dataset, using HF `load_from_disk()`
  #
  # Use this if you have built your own dataset and saved it with `save_to_disk()`
  # with source left as null. Otherwise, configure this to a directory where the
  # dataset will be built and tokenized by the huggingface dataset process.
  #
  # If using relative path, this should be relative to the trainer script path
  data_path: s3://bucket-name/subpath/

  # Data path storage options, used to support cloud storage
  # via the huggingface dataset API. See:
  # https://huggingface.co/docs/datasets/v2.16.1/en/filesystems#amazon-s3
  #
  # Note: As of Jan 2023, these options have only been tested to work with AWS S3 and Backblaze. YMMV
  #       For S3 bucket support you will also need to install s3fs `python3 -m pip install s3fs`
  #
  # If you want to reduce the risk of accidental key/secret commits, you can use
  # `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead
  #
  # The data_path should use the `s3://bucket-name/subpath` format
  # ---
  data_path_storage_options:
    key: <example S3 key>
    secret: <example S3 secret>
    endpoint_url: <example S3 endpoint>
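
For reference, a minimal sketch of the environment-variable alternative mentioned in the comments above. It assumes s3fs falls back to the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables when key and secret are omitted, with endpoint_url kept only for non-AWS S3-compatible providers; the bucket path and endpoint below are placeholders:

data:
  data_path: s3://bucket-name/subpath/

  # key/secret intentionally omitted - s3fs reads them from the
  # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables instead.
  # endpoint_url is only needed for non-AWS S3-compatible providers (e.g. Backblaze)
  data_path_storage_options:
    endpoint_url: <example S3 endpoint>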

v2.2.1 - bug fix for minibatch and mixed dataset + some experimental flags

18 Jan 07:15
36e6737

What's Changed

  • Fixed a bug where microbatch > 1 combined with mixed dataset sizes caused errors
  • Additional experimental flags for training tweaks

Full Changelog: v2.2.0...v2.2.1

v2.2.0

17 Jan 05:07

What's Changed

  • Dataset packing by @PicoCreator in #56
    • When combined with microbatches, it can drastically increase the tokens/s for a well-tuned setup - this is similar to what was done for the axolotl trainer.
  • V5 batching support by @PicoCreator in #46
  • Updating to latest state / saaaa / sbbbb / scccc / sdddd code impleme… by @PicoCreator in #47
  • fix: loss forward segment count moved to correct device by @m8than in #48
  • Fix config example by @m8than in #49
  • Conversational data feature by @PicoCreator in #55

Full Changelog: v2.1.0...v2.2.0

v2.1.0 - Batching support, minor logging behaviour and default changes

16 Nov 08:23
75b3ba5

What's Changed

  • Added batching support with the new trainer config microbatch_size - use this to trade VRAM for a substantial tokens/sec increase (50%++); see the sketch after this list
    • This brings the infctx trainer speed much closer to the main trainer, I should have done this earlier lol
  • Changed real_ctx_len to data_ctx_len in wandb logging, to better reflect the above, as it will now be an average over the microbatch
  • Changed the default behaviour of multi_column, and fixed #35
  • Improved documentation for #34
  • Microbatch support #25
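
A minimal illustrative sketch of the new option is shown below; the placement under the trainer block and the example value are assumptions based on the phrase "trainer config microbatch_size" above, so check your own config template:

trainer:
  # Assumed key placement: number of samples batched together per training step,
  # trading extra VRAM for higher tokens/sec throughput (larger = more VRAM)
  microbatch_size: 8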

Full Changelog: v2.0.1...v2.1.0

v2.0.1 - Breaking change for v5.2 / v5-R4 specification support

15 Nov 01:54
1e287e9

WARNING: this version only supports the latest v5.2 (aka v5-R4) models, and breaks all previous models. Moving forward we will only be supporting v5-R4.

What's Changed

  • Fix segment_count not properly moved onto device by @TearGosling in #37
  • Full v5 r4 rewrite by @PicoCreator @harrisonvanderbyl in #44
  • Dropped the v5beta2 folder
  • bptt_truncate=true is enforced until state gradients are supported by the CUDA kernel
  • Technically the non-CUDA kernel works, but it's so slow (~100x slower) that you really should not train without GPUs

Full Changelog: v1.1.1...v2.0.1

v1.1.1 - v5r3 model bug fix

08 Sep 00:34

Fixed an issue with missing output normalisation in the v5 model

Full Changelog: v1.1.0...v1.1.1

v1.1.0 - breaking change for v5 (treat it as beta3)

06 Sep 20:56

WARNING: this version breaks existing v5 models; use v5-beta2 for previous models. Until an official v5 1.5B or larger model is released, we will not be treating v5 breaking changes as major version changes. The existing v5 code (r3) has not been thoroughly tested, and may be subject to future changes.

What's Changed

  • Upgraded v5 to be in sync with v5r3 from BlinkDL's official repo
  • Moved existing v5 code to the v5-beta2 folder (as I know some folks have already started experimenting with v5)
  • Various readme / documentation / example changes
  • Added data offset and limit params (to be documented)
  • WIP docker container
  • Fix for older python/lightning versions for multi-GPU sync

Additional changes that were merged in

  • limited dataloader num_worker max to 8 by @diannaojiang in #17
  • (optional) Added token 0 to the tokenizer. by @m8than in #18
  • Dataset Sorting + multi column suffix features by @m8than in #23

Full Changelog: v1.0.2...v1.1.0

v1.0.2 - Fixing world tokenizer vocab size

21 Aug 16:40
505622b

Full Changelog: v1.0.1...v1.0.2

v1.0.1 - Incremental bug fixes

21 Aug 16:22
fd6bbfb

Various incremental fixes

  • Export with BF16 support
  • Potential fix for 4 tokens with issues in the world tokenizer
  • Updated requirements.txt with missing jsonargparse[signatures]
  • Fixed an issue with Python 3.10 when doing GPU sync (you should still use 3.11 though)

The following non-stable features were added:

  • WIP: Docker Env Container
  • PROTOTYPE: loss_bias support

v1.0.0 - official release of infctx trainer

16 Aug 14:57

This is a major release with a huge list of new features over the original infctx trainer.

Thanks to all those who helped test the trainer for bugs and issues, even when it was in a very rough early stage. While there are still some features that need to be added, and performance and docs that need improving, for the vast majority of use cases you should be able to get started with this new trainer for your finetuning (non-LoRA) needs.

Special thanks to @Blealtan @BlinkDL @Yuzaboto @Bananaman