
add rope alibi to encoder #1687

Open
vince62s wants to merge 3 commits into master

Conversation

vince62s (Member)

No description provided.

LynxPDA commented May 22, 2024

@vince62s I updated CTranslate2 and the conversion went through without errors.
The only change I made was adding a "gated-gelu" entry to _SUPPORTED_ACTIVATIONS:

_SUPPORTED_ACTIVATIONS = {
    "gelu": common_spec.Activation.GELU,
    "fast_gelu": common_spec.Activation.GELUTanh,
    "relu": common_spec.Activation.RELU,
    "silu": common_spec.Activation.SWISH,
    "gated-gelu": common_spec.Activation.GELU,
}

Without this, it still gave the error:

- Option --pos_ffn_activation_fn gated-gelu is not supported (supported activations are: gelu, fast_gelu, relu, silu)
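
For reference, "gated-gelu" (GeGLU) keeps GELU as the elementwise activation but applies it to one of two parallel input projections, which then gates the other; mapping the option name to Activation.GELU covers the activation itself, while the gating comes from the layer structure. A minimal NumPy sketch of a gated-GELU feed-forward block, just to illustrate the shape of the computation (illustrative weight names, not CTranslate2 or OpenNMT-py code):

import math
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF (erf form).
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gated_gelu_ffn(x, w_gate, w_in, w_out):
    """GeGLU feed-forward: gelu(x @ w_gate) gates x @ w_in, then projects back."""
    # x: (batch, d_model), w_gate/w_in: (d_model, d_ff), w_out: (d_ff, d_model)
    return (gelu(x @ w_gate) * (x @ w_in)) @ w_out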

vince62s (Member, Author) commented May 22, 2024

Yes, you're correct. Were you able to run inference in CT2 without any issue?

I am not merging yet because AliBi requires additional changes in the C++ code.
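
For context, ALiBi adds a fixed, head-specific linear bias to the attention logits instead of learned position embeddings, which is why the attention kernels themselves need to know about it. A rough NumPy sketch of the usual slope and bias computation for a bidirectional encoder (the symmetric-distance variant; this is only an illustration, not the CTranslate2 implementation):

import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Standard ALiBi slopes: a geometric sequence per head, assuming a
    # power-of-two head count (other counts use an interpolation scheme).
    start = 2.0 ** (-8.0 / num_heads)
    return np.array([start ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads: int, length: int) -> np.ndarray:
    # Bias of shape (heads, length, length) added to the attention logits;
    # for an encoder the penalty grows with the absolute distance |i - j|.
    slopes = alibi_slopes(num_heads)
    positions = np.arange(length)
    distances = -np.abs(positions[None, :] - positions[:, None])
    return slopes[:, None, None] * distances[None, :, :]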

LynxPDA commented May 23, 2024

So far I have only tried inference with the gated-gelu activation on CTranslate2 v3.20.0, with no issues. I plan to try RoPE next month, after training the corresponding model.

lecoqnicolas left a comment


Hello,
I would like to insert a line between lines 8 and 9 to support the gated-gelu activation:
"gated-gelu": common_spec.Activation.GELU,
Also, gated-gelu does not appear in the "transformers.py" script. We might want to add it after line 30 and modify line 1308 to include gated-gelu.

lecoqnicolas commented

Forget the transformers.py script: GEMMA transformers implement GeGLU, but with a GELUTanh approximation (I just read an article about it), so there is no need to update it.
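
For reference, the two variants differ only in how the Gaussian CDF inside GELU is computed; the tanh form is the approximation that common_spec.Activation.GELUTanh ("fast_gelu" above) maps to. A quick comparison, purely illustrative:

import math
import numpy as np

def gelu_exact(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF (erf form).
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation of GELU (the "GELUTanh" variant).
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small but nonzero difference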

LynxPDA commented Jun 9, 2024

Were you able to run inference in CT2 without any issue?

Yes, I confirm. I successfully trained a model with gated-GELU and RoPE and ran inference with LibreTranslate (CTranslate2 v3.20.0).

lecoqnicolas commented Jun 11, 2024

I tried updating CT2 to 4.2.1 and running a training. It breaks at the validation step with the errors below. At first I thought it was the "None" values set in the fix dated March 12, so I downgraded OpenNMT-py to 3.5.0 and then further down to the original pinned version (3.4.1). But in the best-case scenario I get the same errors (in the worst case, training aborts at step 1 or doesn't start at all).

I tried pretty much every version of CT2 4.x and OpenNMT-py 3.5.x with compatible torch/CUDA. I also tried different data and a different population method to check this out. Do you have any idea?

From the error, I think this is related to a "filtertoolong" transform that is systematically inserted in the data pipeline, but I am not sure.

[2024-06-11 09:42:02,000 INFO] Start training loop and validate every 100 steps...
[2024-06-11 09:42:02,015 INFO] Scoring with: ['sentencepiece', 'filtertoolong', 'prefix']
[2024-06-11 09:48:24,332 INFO] Step 50/32000; acc: 0.4; ppl: 31126.4; xent: 10.3; lr: 0.00000; sents:  295284; bsz: 6628/7296/236; 21670/23854 tok/s;    382 sec;
[2024-06-11 09:53:54,287 INFO] Step 100/32000; acc: 4.7; ppl: 25540.5; xent: 10.1; lr: 0.00000; sents:  292443; bsz: 6604/7244/234; 25018/27441 tok/s;    712 sec;
[2024-06-11 09:54:36,382 INFO] valid stats calculation
                           took: 42.094335317611694 s.
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\onmt_train.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python39\lib\site-packages\onmt\bin\train.py", line 67, in main
    train(opt)
  File "C:\Program Files\Python39\lib\site-packages\onmt\bin\train.py", line 52, in train
    train_process(opt, device_id=0)
  File "C:\Program Files\Python39\lib\site-packages\onmt\train_single.py", line 238, in main
    trainer.train(
  File "C:\Program Files\Python39\lib\site-packages\onmt\trainer.py", line 332, in train
    valid_stats = self.validate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\trainer.py", line 420, in validate
    preds, texts_ref = self.scoring_preparator.translate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\utils\scoring_utils.py", line 111, in translate
    _, preds = translator._translate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\translate\translator.py", line 494, in _translate
    for batch, bucket_idx in infer_iter:
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 341, in __iter__
    for bucket, bucket_idx in self._bucketing():
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 286, in _bucketing
    yield (self._tuple_to_json_with_tokIDs(bucket), self.bucket_idx)
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 247, in _tuple_to_json_with_tokIDs
    tuple_bucket = process(self.task, tuple_bucket)
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\text_utils.py", line 95, in process
    transf_bucket = transform.batch_apply(
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\transform.py", line 232, in batch_apply
    batch = transform.batch_apply(
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\transform.py", line 70, in batch_apply
    example = self.apply(example, is_train=is_train, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\misc.py", line 56, in apply
    or len(example["tgt"]) > self.tgt_seq_length - 2
TypeError: object of type 'NoneType' has no len()
Total checkpoints: 0
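
The last frame points at the filtertoolong transform: during validation scoring, examples apparently reach it with example["tgt"] set to None (inference-style batches carry no target), and the length check then calls len() on None. A minimal sketch of the kind of None-safe guard that would avoid the crash, assuming the dict-style example and the apply() signature visible in the traceback (the "- 2" threshold is copied from the tgt check in the traceback and mirrored for src; the rest is illustrative, not the actual onmt code):

def apply(self, example, is_train=False, **kwargs):
    # Sketch only: None-safe length filtering for a filtertoolong-style transform.
    src = example.get("src")
    tgt = example.get("tgt")  # can be None during validation scoring
    src_too_long = src is not None and len(src) > self.src_seq_length - 2
    tgt_too_long = tgt is not None and len(tgt) > self.tgt_seq_length - 2
    if src_too_long or tgt_too_long:
        return None  # drop the example; the real onmt behavior may differ
    return example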

lecoqnicolas commented Jun 11, 2024

I tried pretty much every version of CT2 4.x and OpenNMT-py 3.5.x with compatible torch/CUDA. I also tried different data and a different population method to check this out. Do you have any idea?

Well, what I did not try was to install two seemingly incompatible versions of CT2 (4.2.1) and ONMT-py (3.4.3). @LynxPDA had me update this way, and now it does work (at least with absolute PE, and probably RPE).

Although... when I tried RoPE, upon converting the checkpoint to a CT2 model I got an "unexpected argument" error, explicit enough for me to comment out lines 343 and 344 of the transformer_spec.py script and make it through with a seemingly working model (which was a toy config, though; I still have to run a full training).
[screenshot of the "unexpected argument" conversion error]

The point now is that I do not know whether these specs really work, or even whether RoPE is sufficiently implemented in OpenNMT-py 3.4.3 to perform as intended.

So I guess we'll have to work on a fix to use OpenNMT-py 3.5.x. Does the above-mentioned bug come from the applied transforms? There are also empty "tgt_prefix" prefixes on top of the filtertoolong transform; could that be the issue? I am attaching the corresponding config file; since you have developed a lot of code for OpenNMT-py as well, you surely know better.
config.txt
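
Not a fix for the underlying version mismatch, but instead of commenting out lines in transformer_spec.py, one generic workaround for "unexpected argument" errors is to drop keyword arguments that the installed spec constructor does not accept. A hypothetical sketch; filter_supported_kwargs, SomeTransformerSpec, and the option names below are placeholders, not the real CTranslate2 or OpenNMT-py API:

import inspect

def filter_supported_kwargs(callable_obj, kwargs):
    """Keep only the keyword arguments that callable_obj actually accepts."""
    params = inspect.signature(callable_obj).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # the callable already takes **kwargs, nothing to filter
    return {k: v for k, v in kwargs.items() if k in params}

# Hypothetical usage: SomeTransformerSpec and the options dict stand in for
# whatever spec class and rotary arguments trigger the error on an older version.
# spec = SomeTransformerSpec(**filter_supported_kwargs(SomeTransformerSpec, options))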
