Simple example of CuDnnGRU based RNN implementation in Tensorflow


Problem description

I am using the following code for a standard GRU implementation:

def BiRNN_deep_dynamic_FAST_FULL_autolength(x, batch_size, dropout, hidden_dim):

    seq_len = length_rnn(x)

    with tf.variable_scope('forward'):
        lstm_cell_fwd = tf.contrib.rnn.GRUCell(hidden_dim, kernel_initializer=tf.contrib.layers.xavier_initializer(), bias_initializer=tf.contrib.layers.xavier_initializer())
        lstm_cell_fwd = tf.contrib.rnn.DropoutWrapper(lstm_cell_fwd, output_keep_prob=dropout)
    with tf.variable_scope('backward'):
        lstm_cell_back = tf.contrib.rnn.GRUCell(hidden_dim, kernel_initializer=tf.contrib.layers.xavier_initializer(), bias_initializer=tf.contrib.layers.xavier_initializer())
        lstm_cell_back = tf.contrib.rnn.DropoutWrapper(lstm_cell_back, output_keep_prob=dropout)

    outputs, _ = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_cell_fwd, cell_bw=lstm_cell_back, inputs=x, sequence_length=seq_len, dtype=tf.float32, time_major=False)
    outputs_fwd, outputs_bck = outputs

    ### fwd matrix is the matrix that keeps all the last [-1] vectors
    fwd_matrix = tf.gather_nd(outputs_fwd, tf.stack([tf.range(batch_size), seq_len - 1], axis=1))  ### 99,64

    outputs_fwd = tf.transpose(outputs_fwd, [1, 0, 2])
    outputs_bck = tf.transpose(outputs_bck, [1, 0, 2])

    return outputs_fwd, outputs_bck, fwd_matrix

Can anyone provide a simple example of how to use the tf.contrib.cudnn_rnn.CudnnGRU cell in a similar fashion? Just swapping out the cells doesn't work.

The first issue is that there is no dropout wrapper for the CuDnnGRU cell, which is fine. Second, it doesn't seem to work with tf.nn.bidirectional_dynamic_rnn. Any help is appreciated.

Answer

CudnnGRU is not an RNNCell instance. It's more akin to dynamic_rnn.

The tensor manipulations below are equivalent, where input_tensor is a time-major tensor, i.e. of shape [max_sequence_length, batch_size, embedding_size]. CudnnGRU expects the input tensor to be time-major (as opposed to the more standard batch-major format, i.e. of shape [batch_size, max_sequence_length, embedding_size]), and it's good practice to use time-major tensors with RNN ops anyway, since they're somewhat faster.

CudnnGRU:

rnn = tf.contrib.cudnn_rnn.CudnnGRU(
  num_rnn_layers, hidden_size, direction='bidirectional')

rnn_output = rnn(input_tensor)
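As a slightly fuller illustration, here is a minimal sketch of feeding the question's batch-major input x into CudnnGRU; the transpose, the single-layer count and the name hidden_size are assumptions added here, not part of the original answer:

# Sketch only: convert the question's batch-major `x` to time-major first.
x_time_major = tf.transpose(x, [1, 0, 2])  # [batch, time, emb] -> [time, batch, emb]

rnn = tf.contrib.cudnn_rnn.CudnnGRU(
  num_layers=1, num_units=hidden_size, direction='bidirectional')

# The call returns an (outputs, states) pair; for a bidirectional layer
# `outputs` is time-major with shape [time, batch, 2 * hidden_size].
outputs, states = rnn(x_time_major)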

CudnnCompatibleGRUCell:

rnn_output = input_tensor
# `inputs` is assumed to be the zero-padded integer token-id matrix; summing the
# signs over the time axis gives each sequence's true length.
# Use reduction_indices=1 instead if `inputs` is batch-major.
sequence_length = tf.reduce_sum(
  tf.sign(inputs),
  reduction_indices=0)

for layer in range(num_rnn_layers):
  with tf.variable_scope('birnn_%d' % layer):  # separate scopes so stacked layers don't collide
    fw_cell = tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(hidden_size)
    bw_cell = tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(hidden_size)
    outputs, _ = tf.nn.bidirectional_dynamic_rnn(
      fw_cell, bw_cell, rnn_output, sequence_length=sequence_length,
      dtype=tf.float32, time_major=True)  # set `time_major` to match your inputs
    # Concatenate the forward and backward outputs so they can feed the next layer.
    rnn_output = tf.concat(outputs, axis=-1)
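Since CudnnCompatibleGRUCell is an ordinary RNNCell, the question's dropout pattern still applies to it; a minimal sketch, reusing the keep probability `dropout` from the question:

fw_cell = tf.contrib.rnn.DropoutWrapper(
  tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(hidden_size),
  output_keep_prob=dropout)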

A few things to note:

  1. If you were using LSTMs, you need not use CudnnCompatibleLSTMCell; you can use the standard LSTMCell. But with GRUs, the Cudnn implementation has inherently different math operations, and in particular more weights (see the documentation).
  2. Unlike dynamic_rnn, CudnnGRU doesn't allow you to specify sequence lengths. Still, it is over an order of magnitude faster, but you will have to be careful about how you extract your outputs (e.g. if you're interested in the final hidden state of each sequence that is padded and of varying length, you will need each sequence's length; see the sketch after this list).
  3. rnn_output is probably a tuple with lots of (distinct) stuff in both cases. Refer to the documentation, or just print it out, to inspect what parts of the output you need.
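To illustrate point 2, here is a hedged sketch of pulling the last valid forward output out of the time-major CudnnGRU outputs. It mirrors the question's gather_nd trick, and assumes `outputs` has shape [time, batch, 2 * hidden_size] and `seq_len` holds each sequence's true length (e.g. from the question's length_rnn(x)):

# Take only the forward half of the bidirectional output; the backward half
# is affected by padding because CudnnGRU has no notion of sequence lengths.
fwd_outputs = outputs[:, :, :hidden_size]
batch_range = tf.range(tf.shape(fwd_outputs)[1])   # batch indices
last_step = seq_len - 1                            # last valid time step per sequence
# gather_nd indices are (time, batch) pairs because the tensor is time-major.
final_fwd = tf.gather_nd(fwd_outputs, tf.stack([last_step, batch_range], axis=1))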

