Simple example of CuDnnGRU based RNN implementation in Tensorflow

Question
I am using the following code for standard GRU implementation:
def BiRNN_deep_dynamic_FAST_FULL_autolength(x, batch_size, dropout, hidden_dim):
    seq_len = length_rnn(x)
    with tf.variable_scope('forward'):
        lstm_cell_fwd = tf.contrib.rnn.GRUCell(
            hidden_dim,
            kernel_initializer=tf.contrib.layers.xavier_initializer(),
            bias_initializer=tf.contrib.layers.xavier_initializer())
        lstm_cell_fwd = tf.contrib.rnn.DropoutWrapper(
            lstm_cell_fwd, output_keep_prob=dropout)
    with tf.variable_scope('backward'):
        lstm_cell_back = tf.contrib.rnn.GRUCell(
            hidden_dim,
            kernel_initializer=tf.contrib.layers.xavier_initializer(),
            bias_initializer=tf.contrib.layers.xavier_initializer())
        lstm_cell_back = tf.contrib.rnn.DropoutWrapper(
            lstm_cell_back, output_keep_prob=dropout)
    outputs, _ = tf.nn.bidirectional_dynamic_rnn(
        cell_fw=lstm_cell_fwd, cell_bw=lstm_cell_back, inputs=x,
        sequence_length=seq_len, dtype=tf.float32, time_major=False)
    outputs_fwd, outputs_bck = outputs
    # fwd_matrix keeps the last valid [-1] output vector of each sequence
    fwd_matrix = tf.gather_nd(
        outputs_fwd, tf.stack([tf.range(batch_size), seq_len - 1], axis=1))
    outputs_fwd = tf.transpose(outputs_fwd, [1, 0, 2])
    outputs_bck = tf.transpose(outputs_bck, [1, 0, 2])
    return outputs_fwd, outputs_bck, fwd_matrix
Can anyone provide a simple example of how to use the tf.contrib.cudnn_rnn.CudnnGRU cell in a similar fashion? Just swapping out the cells doesn't work.

The first issue is that there is no dropout wrapper for the CuDnnGRU cell, which is fine. Second, it doesn't seem to work with tf.nn.bidirectional_dynamic_rnn. Any help is appreciated.
Answer
CudnnGRU is not an RNNCell instance. It's more akin to dynamic_rnn.
The tensor manipulations below are equivalent, where input_tensor is a time-major tensor, i.e. of shape [max_sequence_length, batch_size, embedding_size]. CudnnGRU expects the input tensor to be time-major (as opposed to the more standard batch-major format, i.e. of shape [batch_size, max_sequence_length, embedding_size]), and it's good practice to use time-major tensors with RNN ops anyway, since they're somewhat faster.
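To illustrate the layout difference, here is a minimal sketch of the batch-major/time-major swap using NumPy for the shape logic only (the names `batch_major`/`time_major` are placeholders; in TensorFlow the same swap is `tf.transpose(x, [1, 0, 2])`):

```python
import numpy as np

batch_size, max_sequence_length, embedding_size = 4, 7, 16

# Batch-major layout: [batch_size, max_sequence_length, embedding_size]
batch_major = np.zeros((batch_size, max_sequence_length, embedding_size))

# Time-major layout: [max_sequence_length, batch_size, embedding_size];
# swapping the first two axes converts one to the other.
time_major = np.transpose(batch_major, (1, 0, 2))

print(time_major.shape)  # (7, 4, 16)
```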
CudnnGRU:
rnn = tf.contrib.cudnn_rnn.CudnnGRU(
    num_rnn_layers, hidden_size, direction='bidirectional')
rnn_output = rnn(input_tensor)
CudnnCompatibleGRUCell:
rnn_output = input_tensor
# Infer per-sequence lengths from zero padding
# (assumes padded timesteps are all-zero).
sequence_length = tf.cast(tf.reduce_sum(
    tf.sign(tf.reduce_max(tf.abs(input_tensor), axis=2)),
    axis=0), tf.int32)  # Use axis=1 if `input_tensor` is batch-major.
for layer in range(num_rnn_layers):
    with tf.variable_scope('birnn_layer_%d' % layer):
        fw_cell = tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(hidden_size)
        bw_cell = tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(hidden_size)
        # Index [0] is the (forward, backward) output pair; concatenate the
        # two directions so the result can be fed to the next layer.
        rnn_output = tf.concat(
            tf.nn.bidirectional_dynamic_rnn(
                fw_cell, bw_cell, rnn_output,
                sequence_length=sequence_length, dtype=tf.float32,
                time_major=True)[0],  # Set `time_major` accordingly.
            axis=-1)
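The sign/reduce-sum trick used above to infer sequence lengths can be sketched with plain NumPy (assuming, as the TF code does, that padded timesteps are all-zero; the toy tensor here is hypothetical):

```python
import numpy as np

# Time-major toy batch: [max_sequence_length=5, batch_size=2, embedding_size=3].
# Sequence 0 has 3 real timesteps, sequence 1 has all 5.
x = np.ones((5, 2, 3))
x[3:, 0, :] = 0.0  # zero-pad the tail of sequence 0

# 1.0 where a timestep has any non-zero feature, 0.0 where it is pure padding.
step_is_real = np.sign(np.abs(x).max(axis=2))  # shape [5, 2]

# Summing over the time axis counts the real timesteps per sequence.
sequence_length = step_is_real.sum(axis=0).astype(np.int64)

print(sequence_length)  # [3 5]
```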
Note the following:
- If you were using LSTMs, you need not use CudnnCompatibleLSTMCell; you can use the standard LSTMCell. But with GRUs, the Cudnn implementation has inherently different math operations, and in particular, more weights (see the documentation).
- Unlike dynamic_rnn, CudnnGRU doesn't allow you to specify sequence lengths. Still, it is over an order of magnitude faster, but you will have to be careful about how you extract your outputs (e.g. if you're interested in the final hidden state of each sequence that is padded and of varying length, you will need each sequence's length). rnn_output is probably a tuple with lots of (distinct) stuff in both cases. Refer to the documentation, or just print it out, to inspect which parts of the output you need.
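On that last point, pulling the final valid output of each padded sequence given its length is just an indexed gather, mirroring the tf.gather_nd line in the question. A NumPy sketch with made-up shapes and lengths:

```python
import numpy as np

batch_size, max_sequence_length, hidden_size = 3, 6, 4

# Batch-major forward outputs, filled with a running counter so the
# gathered rows are easy to verify by eye.
outputs_fwd = np.arange(
    batch_size * max_sequence_length * hidden_size,
    dtype=np.float64).reshape(batch_size, max_sequence_length, hidden_size)

# True (unpadded) length of each sequence in the batch.
seq_len = np.array([2, 6, 4])

# For every sequence, pick the output at its last valid timestep (length - 1).
fwd_matrix = outputs_fwd[np.arange(batch_size), seq_len - 1]

print(fwd_matrix.shape)  # (3, 4)
```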