How does mask_zero in Keras Embedding layer work?


Question

I thought mask_zero=True would output 0's when the input value is 0, so that the following layers could skip computation or something.

How does mask_zero actually work?

Example:

import numpy as np
from keras.layers import Input, Embedding
from keras.models import Model

data_in = np.array([
  [1, 2, 0, 0]
])
print(data_in.shape)  # (1, 4)

# model
x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)

m = Model(inputs=x, outputs=e)
p = m.predict(data_in)
print(p.shape)
print(p)

The actual output is (the numbers are random):

(1, 4, 5)
[[[ 0.02499047  0.04617121  0.01586803  0.0338897   0.009652  ]
  [ 0.04782704 -0.04035913 -0.0341589   0.03020919 -0.01157228]
  [ 0.00451764 -0.01433611  0.02606953  0.00328832  0.02650392]
  [ 0.00451764 -0.01433611  0.02606953  0.00328832  0.02650392]]]

However, I thought the output would be:

[[[ 0.02499047  0.04617121  0.01586803  0.0338897   0.009652  ]
  [ 0.04782704 -0.04035913 -0.0341589   0.03020919 -0.01157228]
  [ 0 0 0 0 0]
  [ 0 0 0 0 0]]]

Answer

Actually, setting mask_zero=True for the Embedding layer does not result in a zero vector being returned. Rather, the behavior of the Embedding layer does not change at all: it still returns the embedding vector with index zero. You can confirm this by checking the Embedding layer's weights (in the example you mentioned, that is m.layers[1].get_weights(), since m.layers[0] is the Input layer). Instead, the mask affects the behavior of the following layers, such as RNN layers.
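
To make this concrete, here is a quick check building on the example above (a minimal sketch; looking the layer up by type is just a defensive way to grab the Embedding layer): the vectors predicted at the zero-valued input positions are exactly row 0 of the embedding matrix.

# Grab the embedding matrix; looking up the layer by type avoids
# depending on its position within m.layers.
emb = [l for l in m.layers if isinstance(l, Embedding)][0]
W = emb.get_weights()[0]           # shape (5, 5): one row per input index

p = m.predict(data_in)
print(np.allclose(p[0, 2], W[0]))  # True: input 0 still looks up row 0
print(np.allclose(p[0, 3], W[0]))  # True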

If you inspect the source code of the Embedding layer, you will see a method called compute_mask:

def compute_mask(self, inputs, mask=None):
    if not self.mask_zero:
        return None
    output_mask = K.not_equal(inputs, 0)
    return output_mask
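
For the input in the question, this mask is simply a boolean tensor marking the non-zero positions. A minimal sketch of evaluating it by hand (assuming the TensorFlow backend):

from keras import backend as K

# Evaluate the same expression compute_mask uses, on the question's input
mask = K.eval(K.not_equal(K.constant(data_in), 0))
print(mask)  # [[ True  True False False]]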

This output mask will be passed, as the mask argument, to the following layers that support masking. This is implemented in the __call__ method of the base Layer class:

# Handle mask propagation.
previous_mask = _collect_previous_mask(inputs)
user_kwargs = copy.copy(kwargs)
if not is_all_none(previous_mask):
    # The previous layer generated a mask.
    if has_arg(self.call, 'mask'):
        if 'mask' not in kwargs:
            # If mask is explicitly passed to __call__,
            # we should override the default mask.
            kwargs['mask'] = previous_mask
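
To see what this means for a downstream layer, here is a minimal custom-layer sketch (the class is an illustrative assumption, not part of Keras): because its call signature has a mask argument, Keras passes it the mask produced by the Embedding layer automatically.

from keras.layers import Layer

class PassThrough(Layer):
    """Toy layer that receives and forwards the propagated mask."""
    def __init__(self, **kwargs):
        super(PassThrough, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs, mask=None):
        # mask is the boolean (batch, timesteps) tensor from compute_mask;
        # a real layer would use it to skip the masked timesteps.
        return inputs

    def compute_mask(self, inputs, mask=None):
        return mask  # forward the mask unchanged to the next layer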

And this makes the following layers ignore those input timesteps (i.e. not consider them in their computations). Here is a minimal example:

import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model

data_in = np.array([
  [1, 0, 2, 0]
])

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
rnn = LSTM(3, return_sequences=True)(e)

m = Model(inputs=x, outputs=rnn)
m.predict(data_in)

array([[[-0.00084503, -0.00413611,  0.00049972],
        [-0.00084503, -0.00413611,  0.00049972],
        [-0.00144554, -0.00115775, -0.00293898],
        [-0.00144554, -0.00115775, -0.00293898]]], dtype=float32)

As you can see, the outputs of the LSTM layer for the second and fourth timesteps are the same as the outputs of the first and third timesteps, respectively. This means that those timesteps have been masked: the LSTM simply carries its state (and output) over the masked steps.
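
You can verify this equality directly with a small numpy check on the predictions above:

out = m.predict(data_in)
print(np.allclose(out[0, 1], out[0, 0]))  # True: step 2 (input 0) repeats step 1
print(np.allclose(out[0, 3], out[0, 2]))  # True: step 4 (input 0) repeats step 3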

Update: The mask will also be considered when computing the loss, since the loss functions are internally augmented to support masking using weighted_masked_objective:

def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.
    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.
    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.
    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """

which is applied when compiling the model:

weighted_losses = [weighted_masked_objective(fn) for fn in loss_functions]

You can verify this using the following example:

import numpy as np
from keras.layers import Input, Embedding, Dense
from keras.models import Model

data_in = np.array([[1, 2, 0, 0]])
data_out = np.arange(12).reshape(1, 4, 3)

x = Input(shape=(4,))
e = Embedding(5, 5, mask_zero=True)(x)
d = Dense(3)(e)

m = Model(inputs=x, outputs=d)
m.compile(loss='mse', optimizer='adam')
preds = m.predict(data_in)
loss = m.evaluate(data_in, data_out, verbose=0)
print(preds)
print('Computed Loss:', loss)

[[[ 0.009682    0.02505393 -0.00632722]
  [ 0.01756451  0.05928303  0.0153951 ]
  [-0.00146054 -0.02064196 -0.04356086]
  [-0.00146054 -0.02064196 -0.04356086]]]
Computed Loss: 9.041069030761719

# verify that only the first two outputs 
# have been considered in the computation of loss
print(np.square(preds[0,0:2] - data_out[0,0:2]).mean())

9.041070036475277
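
(The tiny difference between the two numbers is presumably just float32 vs. float64 precision.) For reference, here is a numpy sketch of the arithmetic weighted_masked_objective performs (variable names are mine): the per-timestep losses are zeroed where the mask is 0 and rescaled by the mean of the mask, which is equivalent to averaging over the unmasked timesteps only.

# Reproduce the masked loss by hand
per_step = np.square(preds[0] - data_out[0]).mean(axis=-1)  # per-timestep MSE, shape (4,)
mask = (data_in[0] != 0).astype(float)                      # [1., 1., 0., 0.]
weighted = per_step * mask / mask.mean()                    # zero masked steps, rescale by mean(mask)
print(weighted.mean())  # ~9.04107, matching the loss above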
