Tensorflow dynamic_rnn deprecation

Question

tf.nn.dynamic_rnn seems to have been deprecated:

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: Please use keras.layers.RNN(cell), which is equivalent to this API
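
For reference, the replacement named in the warning wraps an RNN cell in the generic keras.layers.RNN layer. A minimal sketch (the cell size here is arbitrary, chosen only for illustration):

import tensorflow as tf

# Wrap any cell (GRUCell, LSTMCell, ...) in the generic RNN layer.
cell = tf.keras.layers.GRUCell(16)  # 16 units chosen arbitrarily
rnn = tf.keras.layers.RNN(cell, return_sequences=True)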

I have checked out keras.layers.RNN(cell), and it says that it can use masking, which I assume can act as a replacement for dynamic_rnn's sequence_length parameter?

This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.

But there is no further information even in the Embedding docs for how I can use mask_zero=True to accommodate variable sequence lengths. Also, if I am using an embedding layer just to add a mask, how do I prevent the Embedding from changing my input and being trained?
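
For context, the pattern the Embedding docs describe looks roughly like this (a sketch only; vocab_size and the layer sizes are hypothetical):

import tensorflow as tf

vocab_size = 1000  # hypothetical; token id 0 is reserved for padding
inp = tf.keras.layers.Input(shape=(None,), dtype='int32')
# mask_zero=True makes the Embedding emit a mask that downstream layers consume,
# so time steps whose token id is 0 are skipped by the RNN.
emb = tf.keras.layers.Embedding(vocab_size, 8, mask_zero=True)(inp)
out = tf.keras.layers.GRU(16)(emb)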

This is similar to the question RNN in Tensorflow vs Keras, depreciation of tf.nn.dynamic_rnn(), but I want to know how to use the mask to replace sequence_length.

Answer

I needed an answer to this too, and figured out what I needed through the link at the bottom of your question.

In short, you do as the answer in the link says, but you 'simply' leave out the embedding layer if you're not interested in using one. I'd highly recommend reading and understanding the linked answer, as it goes into more detail, along with the docs on Masking, but here's a modified version which uses a masking layer over the sequence inputs to replace 'sequence_length':

import numpy as np
import tensorflow as tf

pad_value = 0.37
# This is our input to the RNN, in [batch_size, max_sequence_length, num_features] shape
test_input = np.array(
[[[1.,   1.  ],
  [2.,   2.  ],
  [1.,   1.  ],
  [pad_value, pad_value], # <- a row/time step which contains all pad_values will be masked through the masking layer
  [pad_value, pad_value]],

 [[pad_value, pad_value],
  [1.,   1.  ],
  [2.,   2.  ],
  [1.,   1.  ],
  [pad_value, pad_value]]])

# Define the mask layer, telling it to mask all time steps that contain all pad_value values
mask = tf.keras.layers.Masking(mask_value=pad_value)
rnn = tf.keras.layers.GRU(
    1,
    return_sequences=True,
    activation=None, # <- these values and below are just used to initialise the RNN in a repeatable way for this example
    recurrent_activation=None,
    kernel_initializer='ones',
    recurrent_initializer='zeros',
    use_bias=True,
    bias_initializer='ones'
)

x = tf.keras.layers.Input(shape=test_input.shape[1:])
m0 = tf.keras.Model(inputs=x, outputs=rnn(x))
m1 = tf.keras.Model(inputs=x, outputs=mask(x))
m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(x)))

print('raw inputs\n', test_input)
print('raw rnn output (no mask)\n', m0.predict(test_input).squeeze())
print('masked inputs\n', m1.predict(test_input).squeeze())
print('masked rnn output\n', m2.predict(test_input).squeeze())

Output:

raw inputs
 [[[1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]
  [0.37 0.37]]

 [[0.37 0.37]
  [1.   1.  ]
  [2.   2.  ]
  [1.   1.  ]
  [0.37 0.37]]]
raw rnn output (no mask)
 [[  -6.        -50.       -156.       -272.7276   -475.83362 ]
 [  -1.2876     -9.862801  -69.314    -213.94202  -373.54672 ]]
masked inputs
 [[[1. 1.]
  [2. 2.]
  [1. 1.]
  [0. 0.]
  [0. 0.]]

 [[0. 0.]
  [1. 1.]
  [2. 2.]
  [1. 1.]
  [0. 0.]]]
masked rnn output
 [[  -6.  -50. -156. -156. -156.]
 [   0.   -6.  -50. -156. -156.]]

Notice how with the mask applied, the calculations are not performed on a time step where the mask is active (i.e. where the sequence is padded out). Instead, state from the previous time step is carried forward.
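
Continuing the example above, you can inspect the boolean mask the Masking layer computes directly (assuming TF 2.x eager execution); True means the step is processed, False means it is skipped and the previous state is carried forward:

# A step is False only when *all* of its features equal mask_value
print(mask.compute_mask(tf.constant(test_input)).numpy())
# [[ True  True  True False False]
#  [False  True  True  True False]]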

A couple of other points to note:

  • In the linked (and this) example, the RNN is created with various activation and initializer parameters. I assume this is to initialize the RNN to a known state for repeatability of the example. In practice, you would initialize the RNN however you like.
  • The pad value can be anything you specify; typically, zeros are used for padding. In the linked (and this) example, a value of 0.37 is used. I can only assume it is an arbitrary value chosen to show the difference between the raw and masked RNN outputs: with this example's RNN initialisation, a zero input value gives little or no difference in the output, so 'some' value (i.e. 0.37) demonstrates the effect of the masking.
  • The Masking docs state that rows/time steps are masked only if all of the values for that time step contain the mask value. For example, in the above, a time step of [0.37, 2] would still be fed to the network with those values, however, a time step of [0.37, 0.37] would be skipped over.
  • An alternative approach to this problem, instead of masking, is to train several times, batching the different sequence lengths together. For example, if you have a mix of sequence lengths of 10, 20, and 30, instead of padding them all out to 30 and masking, train using all your length-10 sequences, then the 20s, then the 30s. Or if you have, say, lots of length-100 sequences and also lots of length 3, 4, and 5 sequences, you may want to pad the smaller ones out to length 5 and train twice: once with the 100s and once with the padded/masked 5s (see the sketch after this list). You will likely gain training speed, but at the trade-off of less accuracy, since you won't be able to shuffle between batches of different sequence lengths.
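
A minimal sketch of that bucketing idea (all names here are hypothetical: sequences is a list of variable-length arrays with a common feature dimension, and labels_for returns the matching targets):

import numpy as np

def train_by_buckets(model, sequences, labels_for, epochs=1):
    # Group sequences by length so each batch has a uniform shape,
    # avoiding padding and masking entirely.
    buckets = {}
    for i, seq in enumerate(sequences):
        buckets.setdefault(len(seq), []).append(i)
    for length, idxs in sorted(buckets.items()):
        x = np.stack([sequences[i] for i in idxs])  # [n, length, num_features]
        y = labels_for(idxs)                        # hypothetical helper
        model.fit(x, y, epochs=epochs)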
