
add rope alibi to encoder #1687

Open
vince62s wants to merge 3 commits into master

Conversation

vince62s (Member)

No description provided.

LynxPDA commented May 22, 2024

@vince62s I updated CTranslate2 and the conversion went through without errors.
The only change I made was adding a "gated-gelu" entry to _SUPPORTED_ACTIVATIONS:

_SUPPORTED_ACTIVATIONS = {
    "gelu": common_spec.Activation.GELU,
    "fast_gelu": common_spec.Activation.GELUTanh,
    "relu": common_spec.Activation.RELU,
    "silu": common_spec.Activation.SWISH,
    "gated-gelu": common_spec.Activation.GELU,
}

Without this, it still gave the error:

- Option --pos_ffn_activation_fn gated-gelu is not supported (supported activations are: gelu, fast_gelu, relu, silu)
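
For reference, "gated-gelu" (GeGLU) keeps GELU as the elementwise activation but applies it to one of two parallel input projections, which then gates the other; mapping the option name to Activation.GELU covers the activation itself, while the gating comes from the layer structure. A minimal NumPy sketch of a gated-GELU feed-forward block, just to illustrate the shape of the computation (illustrative weight names, not CTranslate2 or OpenNMT-py code):

import math
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF (erf form).
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gated_gelu_ffn(x, w_gate, w_in, w_out):
    """GeGLU feed-forward: gelu(x @ w_gate) gates x @ w_in, then projects back."""
    # x: (batch, d_model), w_gate/w_in: (d_model, d_ff), w_out: (d_ff, d_model)
    return (gelu(x @ w_gate) * (x @ w_in)) @ w_out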

vince62s (Member, Author) commented May 22, 2024

Yes, you're correct. Were you able to run inference in CT2 without any issue?

I am not merging yet because AliBi requires additional changes in the C++ code.
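
For context, ALiBi adds a fixed, head-specific linear bias to the attention logits instead of learned position embeddings, which is why the attention kernels themselves need to know about it. A rough NumPy sketch of the usual slope and bias computation for a bidirectional encoder (the symmetric-distance variant; this is only an illustration, not the CTranslate2 implementation):

import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Standard ALiBi slopes: a geometric sequence per head, assuming a
    # power-of-two head count (other counts use an interpolation scheme).
    start = 2.0 ** (-8.0 / num_heads)
    return np.array([start ** (h + 1) for h in range(num_heads)])

def alibi_bias(num_heads: int, length: int) -> np.ndarray:
    # Bias of shape (heads, length, length) added to the attention logits;
    # for an encoder the penalty grows with the absolute distance |i - j|.
    slopes = alibi_slopes(num_heads)
    positions = np.arange(length)
    distances = -np.abs(positions[None, :] - positions[:, None])
    return slopes[:, None, None] * distances[None, :, :]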

LynxPDA commented May 23, 2024

So far I have only tried inference with the gated-gelu activation on CTranslate2 v3.20.0, with no issues. I plan to try RoPE next month, after training the corresponding model.

lecoqnicolas left a comment


Hello,
I would like to insert a line between lines 8 and 9 to support the gated-gelu activation:
"gated-gelu": common_spec.Activation.GELU,
Also, gated-gelu does not appear in the "transformers.py" script. We might want to add it after line 30 and modify line 1308 to include gated-gelu.

lecoqnicolas commented

Forget the transformers.py script: GEMMA transformers implement GeGLU, but with a GELUTanh approximation (I just read an article about it), so there is no need to update it.
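
For reference, the two variants differ only in how the Gaussian CDF inside GELU is computed; the tanh form is the approximation that common_spec.Activation.GELUTanh ("fast_gelu" above) maps to. A quick comparison, purely illustrative:

import math
import numpy as np

def gelu_exact(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF (erf form).
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation of GELU (the "GELUTanh" variant).
    return 0.5 * x * (1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small but nonzero difference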

LynxPDA commented Jun 9, 2024

Were you able to run inference in CT2 without any issue?

Yes, I confirm. I successfully trained a model with gated-GELU and RoPE and ran inference with LibreTranslate (CTranslate2 v3.20.0).

lecoqnicolas commented Jun 11, 2024

I tried updating CT2 to 4.2.1 and running a training. It breaks at the validation step with the errors below. At first I thought it was the "None" values set in the fix dated March 12, so I downgraded OpenNMT-py to 3.5.0 and then further down to the original pinned version (3.4.1). But in the best-case scenario I get the same errors (in the worst case, training aborts at step 1 or doesn't start at all).

I tried pretty much every version of CT2 4.x and OpenNMT-py 3.5.x with compatible torch/CUDA. I also tried different data and a different population method to check this out. Do you have any idea?

From the error, I think this is related to a "filtertoolong" transform that is systematically inserted in the data pipeline, but I am not sure.

[2024-06-11 09:42:02,000 INFO] Start training loop and validate every 100 steps...
[2024-06-11 09:42:02,015 INFO] Scoring with: ['sentencepiece', 'filtertoolong', 'prefix']
[2024-06-11 09:48:24,332 INFO] Step 50/32000; acc: 0.4; ppl: 31126.4; xent: 10.3; lr: 0.00000; sents:  295284; bsz: 6628/7296/236; 21670/23854 tok/s;    382 sec;
[2024-06-11 09:53:54,287 INFO] Step 100/32000; acc: 4.7; ppl: 25540.5; xent: 10.1; lr: 0.00000; sents:  292443; bsz: 6604/7244/234; 25018/27441 tok/s;    712 sec;
[2024-06-11 09:54:36,382 INFO] valid stats calculation
                           took: 42.094335317611694 s.
Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\onmt_train.exe\__main__.py", line 7, in <module>
  File "C:\Program Files\Python39\lib\site-packages\onmt\bin\train.py", line 67, in main
    train(opt)
  File "C:\Program Files\Python39\lib\site-packages\onmt\bin\train.py", line 52, in train
    train_process(opt, device_id=0)
  File "C:\Program Files\Python39\lib\site-packages\onmt\train_single.py", line 238, in main
    trainer.train(
  File "C:\Program Files\Python39\lib\site-packages\onmt\trainer.py", line 332, in train
    valid_stats = self.validate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\trainer.py", line 420, in validate
    preds, texts_ref = self.scoring_preparator.translate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\utils\scoring_utils.py", line 111, in translate
    _, preds = translator._translate(
  File "C:\Program Files\Python39\lib\site-packages\onmt\translate\translator.py", line 494, in _translate
    for batch, bucket_idx in infer_iter:
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 341, in __iter__
    for bucket, bucket_idx in self._bucketing():
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 286, in _bucketing
    yield (self._tuple_to_json_with_tokIDs(bucket), self.bucket_idx)
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\dynamic_iterator.py", line 247, in _tuple_to_json_with_tokIDs
    tuple_bucket = process(self.task, tuple_bucket)
  File "C:\Program Files\Python39\lib\site-packages\onmt\inputters\text_utils.py", line 95, in process
    transf_bucket = transform.batch_apply(
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\transform.py", line 232, in batch_apply
    batch = transform.batch_apply(
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\transform.py", line 70, in batch_apply
    example = self.apply(example, is_train=is_train, **kwargs)
  File "C:\Program Files\Python39\lib\site-packages\onmt\transforms\misc.py", line 56, in apply
    or len(example["tgt"]) > self.tgt_seq_length - 2
TypeError: object of type 'NoneType' has no len()
Total checkpoints: 0
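
The last frame points at the filtertoolong transform: during validation scoring, examples apparently reach it with example["tgt"] set to None (inference-style batches carry no target), and the length check then calls len() on None. A minimal sketch of the kind of None-safe guard that would avoid the crash, assuming the dict-style example and the apply() signature visible in the traceback (the "- 2" threshold is copied from the tgt check in the traceback and mirrored for src; the rest is illustrative, not the actual onmt code):

def apply(self, example, is_train=False, **kwargs):
    # Sketch only: None-safe length filtering for a filtertoolong-style transform.
    src = example.get("src")
    tgt = example.get("tgt")  # can be None during validation scoring
    src_too_long = src is not None and len(src) > self.src_seq_length - 2
    tgt_too_long = tgt is not None and len(tgt) > self.tgt_seq_length - 2
    if src_too_long or tgt_too_long:
        return None  # drop the example; the real onmt behavior may differ
    return example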

lecoqnicolas commented Jun 11, 2024

I tried pretty much every version of CT2 4.x and OpenNMT-py 3.5.x with compatible torch/CUDA. I also tried different data and a different population method to check this out. Do you have any idea?

Well, what I did not try was to install two seemingly incompatible versions of CT2 (4.2.1) and ONMT-py (3.4.3). @LynxPDA had me update this way, and now it does work (at least with absolute PE, and probably RPE).

Although... when I tried RoPE, upon converting the checkpoint to a CT2 model I got an "unexpected argument" error, explicit enough for me to comment out lines 343 and 344 of the transformer_spec.py script and make it through with a seemingly working model (which was a toy config, though; I still have to run a full training).
[screenshot of the "unexpected argument" conversion error]

The point now is that I do not know whether these specs really work, or even whether RoPE is sufficiently implemented in OpenNMT-py 3.4.3 to perform as intended.

So I guess we'll have to work on a fix to use OpenNMT-py 3.5.x. Does the above-mentioned bug come from the applied transforms? There are also empty "tgt_prefix" prefixes on top of the filtertoolong transform; could that be the issue? I am attaching the corresponding config file; since you have developed a lot of code for OpenNMT-py as well, you surely know better.
config.txt
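
Not a fix for the underlying version mismatch, but instead of commenting out lines in transformer_spec.py, one generic workaround for "unexpected argument" errors is to drop keyword arguments that the installed spec constructor does not accept. A hypothetical sketch; filter_supported_kwargs, SomeTransformerSpec, and the option names below are placeholders, not the real CTranslate2 or OpenNMT-py API:

import inspect

def filter_supported_kwargs(callable_obj, kwargs):
    """Keep only the keyword arguments that callable_obj actually accepts."""
    params = inspect.signature(callable_obj).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # the callable already takes **kwargs, nothing to filter
    return {k: v for k, v in kwargs.items() if k in params}

# Hypothetical usage: SomeTransformerSpec and the options dict stand in for
# whatever spec class and rotary arguments trigger the error on an older version.
# spec = SomeTransformerSpec(**filter_supported_kwargs(SomeTransformerSpec, options))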
