
Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot) #1702

randomicity opened this issue May 17, 2024 · 6 comments


@randomicity

Hello,

I'm trying to convert a model trained with OpenNMT-py 3.5.1 using CTranslate2 4.3, but I get this error:

Converting Saved_Data/Models/fr_en_step_195000.pt to ctranslate2 format...
Traceback (most recent call last):
File "/home/username/anaconda3/envs/neu/bin/ct2-opennmt-py-converter", line 8, in
sys.exit(main())
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 355, in main
OpenNMTPyConverter(args.model_path).convert_from_args(args)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
return self.convert(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 89, in convert
model_spec = self._load()
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 181, in _load
check_opt(checkpoint["opt"], num_source_embeddings=len(src_vocabs))
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 55, in check_opt
check.validate()
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/utils.py", line 106, in validate
raise_unsupported(self._unsupported_reasons)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/utils.py", line 93, in raise_unsupported
raise ValueError(message)
ValueError: The model you are trying to convert is not supported by CTranslate2. We identified the following reasons:

  • Option --self_attn_type scaled-dot-flash is not supported (supported values are: scaled-dot)
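
For completeness, the conversion is launched with the standard ct2-opennmt-py-converter entry point, along these lines (the output directory name here is just a placeholder):

ct2-opennmt-py-converter --model_path Saved_Data/Models/fr_en_step_195000.pt --output_dir ct2_fr_en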

I trained the model using Flash Attention in OpenNMT-py 3.5.1:

self_attn_type: scaled-dot-flash

If I modify the opennmt_py.py converter to accept scaled-dot-flash by replacing scaled-dot with it (roughly the edit sketched at the end of this comment), I then get this error:

Traceback (most recent call last):
File "/home/username/anaconda3/envs/neu/bin/ct2-opennmt-py-converter", line 8, in
sys.exit(main())
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 355, in main
OpenNMTPyConverter(args.model_path).convert_from_args(args)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
return self.convert(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 89, in convert
model_spec = self._load()
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 200, in _load
return _get_model_spec_seq2seq(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 90, in _get_model_spec_seq2seq
set_transformer_spec(model_spec, variables)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 210, in set_transformer_spec
set_transformer_encoder(spec.encoder, variables)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 215, in set_transformer_encoder
set_input_layers(spec, variables, "encoder")
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 241, in set_input_layers
set_position_encodings(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 341, in set_position_encodings
spec.encodings = _get_variable(variables, "%s.pe" % scope).squeeze()
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 345, in _get_variable
return variables[name]
KeyError: 'encoder.embeddings.make_embedding.pe.pe'

Probably because the converter can't handle RoPE; my settings are:

position_encoding: false
max_relative_positions: -1

The model trains and runs inference without problems in OpenNMT-py.
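
For reference, the modification I tried is essentially the following, a hypothetical sketch of the intent rather than the exact code inside check_opt() in ctranslate2/converters/opennmt_py.py:

def self_attn_type_is_supported(self_attn_type: str) -> bool:
    # sketch only: treat scaled-dot-flash as acceptable alongside scaled-dot,
    # since flash attention is a faster kernel for the same scaled dot-product computation
    return self_attn_type in ("scaled-dot", "scaled-dot-flash")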

@randomicity
Author

Okay, apparently these are already addressed in open pull requests; I will test those.

@randomicity
Author

Okay, I tried the modifications that have not yet been merged and got this error:

Traceback (most recent call last):
File "/home/username/anaconda3/envs/neu/bin/ct2-opennmt-py-converter", line 8, in
sys.exit(main())
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 375, in main
OpenNMTPyConverter(args.model_path).convert_from_args(args)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 50, in convert_from_args
return self.convert(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/converter.py", line 89, in convert
model_spec = self._load()
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 220, in _load
return _get_model_spec_seq2seq(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/converters/opennmt_py.py", line 88, in _get_model_spec_seq2seq
model_spec = transformer_spec.TransformerSpec.from_config(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/specs/transformer_spec.py", line 480, in from_config
encoder = TransformerEncoderSpec(
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/specs/model_spec.py", line 86, in call
instance = super().call(*args, **kwargs)
File "/home/username/anaconda3/envs/neu/lib/python3.10/site-packages/ctranslate2/specs/transformer_spec.py", line 79, in init
raise ValueError(
ValueError: Enabling multi_query_attention implies num_heads_kv=1

The model has num_kv: 8 and heads: 16, which I understand is the setup for GQA (grouped-query attention).
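
If I follow the dimensions correctly: with hidden_size 1024 and heads 16, each head is 1024 / 16 = 64 dims, so num_kv: 8 gives key/value projections of 8 x 64 = 512 features, which matches the out_features=512 of linear_keys/linear_values in the model structure I post below.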

@vince62s
Member

Post your yaml file (the onmt-py training config).

@randomicity
Author

[2024-05-17 15:57:29,656 INFO] Loading checkpoint from Saved_Data/Models/fr_en_step_175000.pt
[2024-05-17 15:57:31,659 INFO] Building model...
[2024-05-17 15:57:31,930 INFO] Switching model to float32 for amp/apex_amp
[2024-05-17 15:57:31,930 INFO] Non quantized layer compute is fp16
[2024-05-17 15:57:32,169 INFO] NMTModel(
  (encoder): TransformerEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(32000, 1024, padding_idx=1)
        )
      )
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): ModuleList(
      (0-5): 6 x TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=1024, out_features=512, bias=False)
          (linear_values): Linear(in_features=1024, out_features=512, bias=False)
          (linear_query): Linear(in_features=1024, out_features=1024, bias=False)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.1, inplace=False)
          (final_linear): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=1024, out_features=4096, bias=False)
          (w_2): Linear(in_features=4096, out_features=1024, bias=False)
          (dropout_1): Dropout(p=0.1, inplace=False)
          (dropout_2): Dropout(p=0.1, inplace=False)
        )
        (layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(32000, 1024, padding_idx=1)
        )
      )
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (layer_norm): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
    (transformer_layers): ModuleList(
      (0-5): 6 x TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=1024, out_features=512, bias=False)
          (linear_values): Linear(in_features=1024, out_features=512, bias=False)
          (linear_query): Linear(in_features=1024, out_features=1024, bias=False)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.1, inplace=False)
          (final_linear): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=1024, out_features=4096, bias=False)
          (w_2): Linear(in_features=4096, out_features=1024, bias=False)
          (dropout_1): Dropout(p=0.1, inplace=False)
          (dropout_2): Dropout(p=0.1, inplace=False)
        )
        (layer_norm_1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=1024, out_features=512, bias=False)
          (linear_values): Linear(in_features=1024, out_features=512, bias=False)
          (linear_query): Linear(in_features=1024, out_features=1024, bias=False)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.1, inplace=False)
          (final_linear): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (layer_norm_2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True)
      )
    )
  )
  (generator): Linear(in_features=1024, out_features=32000, bias=True)
)
[2024-05-17 15:57:32,171 INFO] encoder: 101988352
[2024-05-17 15:57:32,171 INFO] decoder: 153675008
[2024-05-17 15:57:32,171 INFO] * number of parameters: 255663360
[2024-05-17 15:57:32,171 INFO] Trainable parameters = {'torch.float32': 255663360, 'torch.float16': 0, 'torch.uint8': 0, 'torch.int8': 0}
[2024-05-17 15:57:32,171 INFO] Non trainable parameters = {'torch.float32': 0, 'torch.float16': 0, 'torch.uint8': 0, 'torch.int8': 0}
[2024-05-17 15:57:32,171 INFO]  * src vocab size = 32000
[2024-05-17 15:57:32,171 INFO]  * tgt vocab size = 32000
[2024-05-17 15:57:32,829 INFO] Starting training on GPU: [0]
[2024-05-17 15:57:32,830 INFO] Start training loop and validate every 10000 steps...
[2024-05-17 15:57:32,830 INFO] Scoring with: ['sentencepiece', 'filtertoolong']

# Optimization
model_dtype: "fp16"
optim: "adam"
learning_rate: 2.0
warmup_steps: 50000
decay_method: "noam"
adam_beta2: 0.998
max_grad_norm: 5
label_smoothing: 0.1
param_init: 0
param_init_glorot: true
normalization: "tokens"

# Model
encoder_type: transformer
decoder_type: transformer
self_attn_type: scaled-dot-flash
position_encoding: false
parallel_residual: true
shared_layer_norm: true
multiquery: true
num_kv: 8
max_relative_positions: -1
pos_ffn_activation_fn: "relu"
enc_layers: 6
dec_layers: 6
heads: 16
hidden_size: 1024
word_vec_size: 1024
transformer_ff: 4096
dropout_steps: [0]
dropout: [0.1]
attention_dropout: [0.1]

@vince62s
Member

Yeah, it's unclear between multiquery and num_kv. You need to force multiquery to false in your checkpoint:
load it manually, change it, and save your checkpoint.
multiquery=True should only be set for num_kv=1, but I know it's unclear in the docs/examples.
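
Something along these lines should work, a minimal sketch assuming the flag is stored on checkpoint["opt"] (as in the converter traceback above) under the same name as the yaml key; back up the checkpoint first:

import torch

path = "Saved_Data/Models/fr_en_step_195000.pt"
ckpt = torch.load(path, map_location="cpu", weights_only=False)  # checkpoint is a plain dict
ckpt["opt"].multiquery = False  # opt holds the saved training options object
torch.save(ckpt, path)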

@randomicity
Author

I know it's hard to document everything; thank you for your immense work, Vincent. On the OpenNMT-py side, setting num_kv to half the number of heads seems to be working okay; I tried some short runs.
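
For anyone finding this later, the attention-related lines I'm now testing look like this (assuming multiquery is left at false, as suggested above):

self_attn_type: scaled-dot-flash
multiquery: false
num_kv: 8
heads: 16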
