Unable to understand the behavior of method `build` in tensorflow keras layers (tf.keras.layers.Layer)
Question
Layers in tensorflow keras have a method `build` that is used to defer the weight creation to a time when you have seen what the input is going to be (a layer's build method).
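This deferred creation can be seen directly; a small check of my own (not from the original post), assuming TF2 eager mode:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(5)
print(layer.weights)  # [] -> no weights yet, the input size is still unknown

layer(tf.zeros((2, 3)))  # build() runs here with input shape (2, 3)
print([w.shape.as_list() for w in layer.weights])  # [[3, 5], [5]] -> kernel and bias
```

Only after the first call does the layer know the input has 3 features, so only then can it allocate a (3, 5) kernel.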
I have a few questions I have not been able to find the answer to:
- Here it is said that:

  "If you assign a Layer instance as attribute of another Layer, the outer layer will start tracking the weights of the inner layer."

  What does it mean to track the weights of a layer?
- The same link also mentions:

  "We recommend creating such sublayers in the init method (since the sublayers will typically have a build method, they will be built when the outer layer gets built)."
  Does it mean that while running the `build` method of the child class (self), there will be an iteration through all the attributes of self, and whichever are found to be instances of `tf.keras.layers.Layer` will have their `build` methods run automatically?
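One way to probe this (a sketch of my own, assuming eager TF2) is to watch the sublayer's `built` flag around the first call:

```python
import tensorflow as tf

class Net(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def call(self, x):
        return self.l1(x)

net = Net()
print(net.l1.built)    # False -> nothing has been built yet
net(tf.zeros((1, 3)))  # first call builds the outer model, which builds the sublayer
print(net.l1.built)    # True
```

In practice the sublayer's `build` runs the first time it is invoked with real inputs inside `call`, rather than through an explicit scan of attributes.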
- I can run the following code:
```python
import tensorflow as tf

class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)
```
But not this:
```python
class Net(tf.keras.Model):
  """A simple linear model."""

  def __init__(self):
    super(Net, self).__init__()
    self.l1 = tf.keras.layers.Dense(5)

  def build(self, input_shape):
    super().build()

  def call(self, x):
    return self.l1(x)

net = Net()
print(net.variables)
```
Why?
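For reference, the base-class `build` takes an `input_shape` argument, so an override usually forwards it. A minimal sketch of that pattern (my own guess at a working variant, not from the original post):

```python
import tensorflow as tf

class Net(tf.keras.Model):
    """A simple linear model whose build override forwards input_shape."""

    def __init__(self):
        super().__init__()
        self.l1 = tf.keras.layers.Dense(5)

    def build(self, input_shape):
        # pass input_shape through instead of calling super().build() with no args
        super().build(input_shape)

    def call(self, x):
        return self.l1(x)

net = Net()
net(tf.zeros((1, 3)))  # first call triggers build and creates the Dense weights
print(len(net.variables))
```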
Answer
I would say the `build` mentioned there refers to what happens when you construct a self-defined `tf.keras.Model`, for example

```python
net = Net()
```

Then you get all the `tf.keras.layers.Layer` objects created in `__init__`, stored in `net`, which is a callable object. In this case, `net` becomes a complete object for TF to train later; that is what is meant by tracking. The next time you call `net(inputs)`, you get your outputs.
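The tracking can be sketched like this (a minimal example of my own, assuming TF2): assigning a layer as an attribute makes its variables show up on the outer object.

```python
import tensorflow as tf

class Outer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        # assigned as an attribute -> the outer layer tracks it
        self.inner = tf.keras.layers.Dense(4)

    def call(self, x):
        return self.inner(x)

outer = Outer()
outer(tf.zeros((1, 3)))  # first call builds the inner Dense
# the inner layer's kernel and bias are now reported by the outer layer
print(len(outer.trainable_variables))  # 2
```

This is what lets an optimizer pick up every weight of a nested model from the single `trainable_variables` list.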
Here is an example of a Tensorflow self-defined decoder with attention:
```python
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    # query hidden state shape == (batch_size, hidden size)
    # query_with_time_axis shape == (batch_size, 1, hidden size)
    # values shape == (batch_size, max_len, hidden size)
    # we are doing this to broadcast addition along the time axis to calculate the score
    query_with_time_axis = tf.expand_dims(query, 1)

    # score shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    # the shape of the tensor before applying self.V is (batch_size, max_length, units)
    score = self.V(tf.nn.tanh(
        self.W1(query_with_time_axis) + self.W2(values)))

    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights


class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights
```
I have tried to put a `tf.keras.layers.Layer` object in `call` and got a really poor outcome. I guess that was because if you put it in `call`, a new layer (with fresh, untracked weights) is created on every call, i.e. each time a forward-backward propagation happens.
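A minimal sketch of why that goes wrong (my own example, assuming eager TF2): a layer constructed inside `call` is a brand-new object with brand-new random weights on every invocation, and since it is never assigned as an attribute, it is never tracked, so there is nothing for the optimizer to update:

```python
import tensorflow as tf

class BadNet(tf.keras.Model):
    def call(self, x):
        # a brand-new Dense layer (new random weights) on every call
        layer = tf.keras.layers.Dense(5)
        return layer(x)

net = BadNet()
net(tf.zeros((1, 3)))
# the temporary layer was never assigned as an attribute, so nothing is tracked
print(len(net.trainable_variables))  # 0
```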