RNN in TensorFlow vs Keras, deprecation of tf.nn.dynamic_rnn()

Problem description

My question is: are tf.nn.dynamic_rnn and keras.layers.RNN(cell) truly identical, as stated in the docs?

I am planning on building an RNN; however, it seems that tf.nn.dynamic_rnn is deprecated in favour of Keras.

The documentation specifically states:

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Please use keras.layers.RNN(cell), which is equivalent to this API

But I don't see how the APIs are equivalent, in the case of variable sequence lengths!

In raw TF, we can pass a tensor of per-example sequence lengths with shape (batch_size,). This way, if our sequence is [0, 1, 2, 3, 4] and the longest sequence in the batch is of size 10, we can pad it with 0s to get [0, 1, 2, 3, 4, 0, 0, 0, 0, 0] and pass seq_length=5 so that only [0, 1, 2, 3, 4] is processed.
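
To make that concrete, here is a minimal sketch (not from the original post) of the raw-TF pattern, assuming TF 1.x where tf.nn.dynamic_rnn is still available; the cell size, feature dimension and lengths are arbitrary placeholders:

import numpy as np
import tensorflow as tf  # TF 1.x

# One example of true length 5, zero-padded to the batch maximum of 10 timesteps.
inputs_np = np.zeros((1, 10, 8), dtype=np.float32)  # (batch_size, max_time, features)
inputs_np[0, :5, :] = 1.0                           # real data occupies the first 5 steps
lengths_np = np.array([5], dtype=np.int32)          # per-example true lengths

inputs = tf.placeholder(tf.float32, shape=(None, 10, 8))
lengths = tf.placeholder(tf.int32, shape=(None,))
cell = tf.nn.rnn_cell.GRUCell(4)

# Steps beyond sequence_length are skipped: their outputs come back as zeros and
# the returned state is taken from the last real step.
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(outputs, feed_dict={inputs: inputs_np, lengths: lengths_np})
    print(out[0, 5:])  # all zeros beyond the true length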

However, in Keras, this is not how it works! What we can do is specify mask_zero=True in an earlier layer, e.g. the Embedding layer. But this will also mask the first zero, even when it is real data!
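
A small sketch of the behaviour being described (again not from the original post): with mask_zero=True the Embedding layer derives its mask from the raw ids, so a legitimate token id 0 is masked together with the padding. Run under TF 1.x graph mode, matching the rest of this answer:

import numpy as np
import tensorflow as tf

# 0 is both the padding value and a real token id in this (made-up) sequence.
ids = np.array([[0, 1, 2, 3, 4, 0, 0, 0, 0, 0]], dtype=np.int32)

emb = tf.keras.layers.Embedding(input_dim=5, output_dim=3, mask_zero=True)
mask = emb.compute_mask(tf.constant(ids))

sess = tf.keras.backend.get_session()
print(sess.run(mask))
# [[False  True  True  True  True False False False False False]]
# The leading zero (real data) is masked along with the padding.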

I can work around it by adding one to every index in the vector, but that is extra preprocessing I need to do after processing with tft.compute_vocabulary(), which maps vocabulary words to a 0-indexed vector.
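
The work-around mentioned above could look roughly like this (my own sketch, assuming the vocabulary ids come back 0-indexed): shift every id up by one so that 0 is reserved for padding, and give the Embedding layer one extra row:

import numpy as np
import tensorflow as tf

vocab_size = 5
raw_ids = np.array([[0, 1, 2, 3, 4]], dtype=np.int64)        # 0-indexed ids, e.g. from tf.transform
shifted = raw_ids + 1                                        # real tokens now start at 1
padded = np.pad(shifted, ((0, 0), (0, 5)), mode='constant')  # pad to length 10 with 0s
print(padded)  # [[1 2 3 4 5 0 0 0 0 0]]

# One extra embedding row, because index 0 is now reserved for padding.
emb = tf.keras.layers.Embedding(input_dim=vocab_size + 1, output_dim=3, mask_zero=True)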

Recommended answer

No, but they are (or can be made to be) not so different either.

tf.nn.dynamic_rnn replaces elements after the sequence end with 0s. As far as I know, this cannot be replicated exactly with tf.keras.layers.*, but you can get similar behaviour with the RNN(Masking(...)) approach: it simply stops the computation and carries the last outputs and states forward. You will get the same (non-padding) outputs as those obtained from tf.nn.dynamic_rnn.

Here is a minimal working example demonstrating the differences between tf.nn.dynamic_rnn and tf.keras.layers.GRU, with and without the use of a tf.keras.layers.Masking layer.

import numpy as np
import tensorflow as tf

test_input = np.array([
    [1, 2, 1, 0, 0],
    [0, 1, 2, 1, 0]
], dtype=int)
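# Per-example sequence lengths that will be fed to tf.nn.dynamic_rnn below.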
seq_length = tf.constant(np.array([3, 4], dtype=int))

emb_weights = (np.ones(shape=(3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
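# The (frozen) embedding maps token ids 0, 1, 2 to the rows [0.37, 0.37], [1, 1] and [2, 2].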
emb = tf.keras.layers.Embedding(
    *emb_weights.shape,
    weights=[emb_weights],
    trainable=False
)
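# mask_value=0.37 is exactly the embedding of token id 0, so those timesteps get masked.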
mask = tf.keras.layers.Masking(mask_value=0.37)
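# GRU with linear activations, all-ones kernel, zero recurrent weights and unit bias,
# so the step outputs are easy to trace by hand.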
rnn = tf.keras.layers.GRU(
    1,
    return_sequences=True,
    activation=None,
    recurrent_activation=None,
    kernel_initializer='ones',
    recurrent_initializer='zeros',
    use_bias=True,
    bias_initializer='ones'
)


def old_rnn(inputs):
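    # Legacy path: run the same GRU cell through tf.nn.dynamic_rnn with explicit sequence lengths.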
    rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
        rnn.cell,
        inputs,
        dtype=tf.float32,
        sequence_length=seq_length
    )
    return rnn_outputs


x = tf.keras.layers.Input(shape=test_input.shape[1:])
m0 = tf.keras.Model(inputs=x, outputs=emb(x))
m1 = tf.keras.Model(inputs=x, outputs=rnn(emb(x)))
m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(emb(x))))

print(m0.predict(test_input).squeeze())
print(m1.predict(test_input).squeeze())
print(m2.predict(test_input).squeeze())

sess = tf.keras.backend.get_session()
print(sess.run(old_rnn(mask(emb(x))), feed_dict={x: test_input}).squeeze())

The outputs from m0 are there to show the result of applying the embedding layer. Note that there are no zero entries at all:

[[[1.   1.  ]    [[0.37 0.37]
  [2.   2.  ]     [1.   1.  ]
  [1.   1.  ]     [2.   2.  ]
  [0.37 0.37]     [1.   1.  ]
  [0.37 0.37]]    [0.37 0.37]]]

Now here are the actual outputs from the m1, m2 and old_rnn architectures:

m1: [[  -6.  -50. -156. -272.7276 -475.83362]
     [  -1.2876 -9.862801 -69.314 -213.94202 -373.54672 ]]
m2: [[  -6.  -50. -156. -156. -156.]
     [   0.   -6.  -50. -156. -156.]]
old [[  -6.  -50. -156.    0.    0.]
     [   0.   -6.  -50. -156.    0.]]

Summary

  • The old tf.nn.dynamic_rnn used to mask padding elements with zeros.
  • The new RNN layers without masking run over the padding elements as if they were data.
  • The new rnn(mask(...)) approach simply stops the computation and carries the last outputs and states forward. Note that the (non-padding) outputs that I obtained for this approach are exactly the same as those from tf.nn.dynamic_rnn.

Anyway, I cannot cover all possible edge cases, but I hope that you can use this script to figure things out further.
