Transformer model not able to be saved

Problem description

I'm trying to follow this tutorial: https://colab.research.google.com/github/tensorflow/examples/blob/master/community/en/transformer_chatbot.ipynb. However, when I tried to save the model so that I could load it again without retraining, I got the error mentioned here: NotImplementedError: Layers with arguments in `__init__` must override `get_config`. I understood from the answer that I need to make the encoder and decoder classes and customise them (instead of leaving them as functions like the Colab tutorial does), so I went back to the TensorFlow documentation of this model here: https://www.tensorflow.org/tutorials/text/transformer#encoder_layer and tried to edit it. I made the encoder layer as:

class EncoderLayer(tf.keras.layers.Layer):
  def __init__(self, d_model, num_heads,  rate=0.1,**kwargs,):
    #super(EncoderLayer, self).__init__()
    super().__init__(**kwargs)
    self.mha = MultiHeadAttention(d_model, num_heads)
    self.ffn = point_wise_feed_forward_network(d_model, dff)

    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    self.dropout1 = tf.keras.layers.Dropout(rate)
    self.dropout2 = tf.keras.layers.Dropout(rate)
  def get_config(self):

        config = super().get_config().copy()
        config.update({
            #'vocab_size': self.vocab_size,
            #'num_layers': self.num_layers,
            #'units': self.units,
            'd_model': self.d_model,
            'num_heads': self.num_heads,
            'dropout': self.dropout,
        })
        return config

  def call(self, x, training, mask):

    attn_output, _ = self.mha(x, x, x, mask)  # (batch_size, input_seq_len, d_model)
    attn_output = self.dropout1(attn_output, training=training)
    out1 = self.layernorm1(x + attn_output)  # (batch_size, input_seq_len, d_model)

    ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, d_model)
    ffn_output = self.dropout2(ffn_output, training=training)
    out2 = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, d_model)

    return out2

I did the same for the decoder layer class. Then I used the same encoder as in the TensorFlow documentation:

class Encoder(tf.keras.layers.Layer):
  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size,
               maximum_position_encoding, rate=0.1):
    super(Encoder, self).__init__()

    self.d_model = d_model
    self.num_layers = num_layers

    self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
    self.pos_encoding = positional_encoding(maximum_position_encoding, 
                                            self.d_model)


    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) 
                       for _ in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(rate)

  def call(self, x, training, mask):

    seq_len = tf.shape(x)[1]

    # adding embedding and position encoding.
    x = self.embedding(x)  # (batch_size, input_seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)

The model function is:

def transformer(vocab_size,
                num_layers,
                units,
                d_model,
                num_heads,
                dropout,
                name="transformer"):
  inputs = tf.keras.Input(shape=(None,), name="inputs")
  dec_inputs = tf.keras.Input(shape=(None,), name="dec_inputs")

  enc_padding_mask = tf.keras.layers.Lambda(
      create_padding_mask, output_shape=(1, 1, None),
      name='enc_padding_mask')(inputs)
  # mask the future tokens for decoder inputs at the 1st attention block
  look_ahead_mask = tf.keras.layers.Lambda(
      create_look_ahead_mask,
      output_shape=(1, None, None),
      name='look_ahead_mask')(dec_inputs)
  # mask the encoder outputs for the 2nd attention block
  dec_padding_mask = tf.keras.layers.Lambda(
      create_padding_mask, output_shape=(1, 1, None),
      name='dec_padding_mask')(inputs)

  enc_outputs = Encoder(
      num_layers=num_layers, d_model=d_model, num_heads=num_heads, 
                         input_vocab_size=vocab_size,


  )(inputs=[inputs, enc_padding_mask])

  dec_outputs = Decoder(
      num_layers=num_layers, d_model=d_model, num_heads=num_heads, 
                          target_vocab_size=vocab_size,


  )(inputs=[dec_inputs, enc_outputs, look_ahead_mask, dec_padding_mask])

  outputs = tf.keras.layers.Dense(units=vocab_size, name="outputs")(dec_outputs)

  return tf.keras.Model(inputs=[inputs, dec_inputs], outputs=outputs, name=name)

And calling the model:

#the model itself with its paramters:
# Hyper-parameters
NUM_LAYERS = 3
D_MODEL = 256
#D_MODEL=tf.cast(D_MODEL, tf.float32)

NUM_HEADS = 8
UNITS = 512
DROPOUT = 0.1
model = transformer(
    vocab_size=VOCAB_SIZE,
    num_layers=NUM_LAYERS,
    units=UNITS,
    d_model=D_MODEL,
    num_heads=NUM_HEADS,
    dropout=DROPOUT)

However, I got this error: TypeError: __init__() missing 2 required positional arguments: 'dff' and 'maximum_position_encoding'. I am really confused, and I don't understand what dff and maximum position encoding mean in the documentation. When I removed them from the encoder and decoder classes, I got another error, because the positional_encoding function takes the maximum position as input and dff is also passed as an input inside the class. I am not sure what I should do, because I am not sure whether I am following the right steps.

Recommended answer

If you get this error while calling transformer, then your problem is with creating the model, not with saving it.
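For context, in the TensorFlow tutorial `dff` is the hidden width of the point-wise feed-forward network inside each encoder/decoder layer (the chatbot Colab calls the same quantity `units`), and `maximum_position_encoding` is the number of positions for which `positional_encoding` precomputes sinusoidal vectors. So rather than deleting them, one plausible fix is to pass something like `dff=units` and a `maximum_position_encoding` at least as large as the longest input sequence. The NumPy sketch here is a minimal re-implementation of the sinusoidal formula the tutorial uses (not its exact code), shown only to make clear what that second argument controls:

```python
import numpy as np


def positional_encoding(maximum_position_encoding, d_model):
    """Sinusoidal position encodings with shape (1, maximum_position_encoding, d_model)."""
    positions = np.arange(maximum_position_encoding)[:, np.newaxis]  # (P, 1)
    dims = np.arange(d_model)[np.newaxis, :]                         # (1, D)
    # Each pair of dimensions (2k, 2k+1) shares the frequency 1 / 10000^(2k / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float32(d_model))
    angles = positions * angle_rates                                 # (P, D), float64
    angles[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    # Leading batch axis of size 1 so it broadcasts over (batch, seq_len, d_model).
    return angles[np.newaxis, ...].astype(np.float32)
```

Because the Encoder adds `self.pos_encoding[:, :seq_len, :]` to the embeddings, every input sequence must be at most `maximum_position_encoding` tokens long; otherwise the slice is shorter than the sequence and the addition fails to broadcast.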

Other than that, I see several issues with your get_config:

  1. You store dropout in the config instead of rate, which is the name of the __init__ argument.
  2. The attributes you reference (self.d_model etc.) are never defined or assigned in __init__.
  3. Your Encoder class does not define get_config at all.
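Putting the three points together, a minimal sketch of an EncoderLayer whose config round-trips might look like this. It assumes TensorFlow 2.4 or newer and, to stay self-contained, swaps the tutorial's hand-written `MultiHeadAttention` and `point_wise_feed_forward_network` for the built-in `tf.keras.layers.MultiHeadAttention` and a two-layer `tf.keras.Sequential`. The essential part is that every `__init__` argument (including `dff`) is stored on `self` and returned from `get_config` under the same name.

```python
import tensorflow as tf


class EncoderLayer(tf.keras.layers.Layer):
    """One Transformer encoder block whose constructor arguments survive get_config()."""

    def __init__(self, d_model, num_heads, dff, rate=0.1, **kwargs):
        super().__init__(**kwargs)  # forwards name, trainable, dtype from from_config()
        # Store every constructor argument so get_config() can return it.
        self.d_model = d_model
        self.num_heads = num_heads
        self.dff = dff
        self.rate = rate

        # Assumption: built-in Keras attention replaces the tutorial's custom class.
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        # Point-wise feed-forward network: dff is its hidden width.
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)

    def get_config(self):
        config = super().get_config()
        # Keys must match the __init__ parameter names so that cls(**config) works.
        config.update({
            "d_model": self.d_model,
            "num_heads": self.num_heads,
            "dff": self.dff,
            "rate": self.rate,
        })
        return config

    def call(self, x, training=False, mask=None):
        # Keras MultiHeadAttention treats attention_mask 1/True as "may attend",
        # the opposite of the tutorial's padding mask (1 = masked out).
        attn_output = self.mha(x, x, attention_mask=mask, training=training)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(x + attn_output)  # (batch_size, seq_len, d_model)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)  # (batch_size, seq_len, d_model)
```

The same pattern would apply to Encoder and the decoder classes: assign num_layers, d_model, num_heads, dff, input_vocab_size, maximum_position_encoding and rate in __init__, accept `**kwargs` and forward them to `super().__init__`, and return the arguments from get_config. When loading, the custom classes still have to be supplied through the `custom_objects` argument of `tf.keras.models.load_model`.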
