How to create and execute a basic LSTM network in TensorFlow?

Problem description

I want to create a basic LSTM network that accepts sequences of 5-dimensional vectors (for example, as an N x 5 array) and returns the corresponding sequences of 4-dimensional hidden and cell vectors (N x 4 arrays), where N is the number of time steps.

How can I do this in TensorFlow?

ADDED

So far, I got the following code working:

import numpy as np
import tensorflow as tf

num_units = 4    # size of the hidden and cell state vectors
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

timesteps = 18
num_input = 5    # dimensionality of each input vector
X = tf.placeholder(tf.float32, [None, timesteps, num_input])
# static_rnn expects a list of `timesteps` tensors of shape [batch_size, num_input]
x = tf.unstack(X, timesteps, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm, x, dtype=tf.float32)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

x_val = np.random.normal(size=(12, 18, 5))    # batch of 12 sequences
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()

However, there are many open questions:

  1. Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?
  2. Why do we split data by time-steps (using unstack)?
  3. How to interpret the "outputs" and "states"?

Solution

Why is the number of time steps preset? Shouldn't an LSTM be able to accept sequences of arbitrary length?

If you want to accept sequences of arbitrary length, I recommend using dynamic_rnn. You can refer here to understand the difference between them.

For example:

import numpy as np
import tensorflow as tf

num_units = 4
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

num_input = 5
# The time dimension is None, so different batches may have different lengths.
X = tf.placeholder(tf.float32, [None, None, num_input])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# A batch of 12 sequences with 18 time steps each ...
x_val = np.random.normal(size=(12, 18, 5))
res = sess.run(outputs, feed_dict={X: x_val})

# ... and another batch with 16 time steps, fed to the very same graph.
x_val = np.random.normal(size=(12, 16, 5))
res = sess.run(outputs, feed_dict={X: x_val})
sess.close()

dynamic_rnn still requires all sequences within one batch to have the same length, but when you need arbitrary lengths within a single batch, you can pad the batch data and then tell dynamic_rnn the true length of each sequence via the sequence_length parameter.
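
For example, here is a minimal sketch of that padding approach (the batch contents and the per-sequence lengths of 3 and 5 below are made-up values for illustration):

import numpy as np
import tensorflow as tf

num_units = 4
num_input = 5
lstm = tf.nn.rnn_cell.LSTMCell(num_units=num_units)

X = tf.placeholder(tf.float32, [None, None, num_input])
seq_len = tf.placeholder(tf.int32, [None])   # true length of each sequence
outputs, states = tf.nn.dynamic_rnn(lstm, X, sequence_length=seq_len, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Two sequences with true lengths 3 and 5, zero-padded to length 5.
    batch = np.zeros((2, 5, num_input), dtype=np.float32)
    batch[0, :3] = np.random.normal(size=(3, num_input))
    batch[1] = np.random.normal(size=(5, num_input))
    res = sess.run(outputs, feed_dict={X: batch, seq_len: [3, 5]})
    print(res[0, 3:])   # outputs past a sequence's true length are all zeros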

Why do we split data by time-steps (using unstack)?

Only static_rnn needs the data split with unstack; this comes from the two functions' different input requirements. The input of static_rnn is a list of timesteps 2D tensors of shape [batch_size, features], i.e. the shape [timesteps, batch_size, features] spread over a Python list. The input of dynamic_rnn, by contrast, is a single 3D tensor of shape [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
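
To make the unstack step concrete, here is a small sketch using the shapes from the question's code:

import tensorflow as tf

timesteps = 18
num_input = 5
X = tf.placeholder(tf.float32, [None, timesteps, num_input])

# Unstacking along axis 1 (the time axis) turns one 3D tensor into a
# Python list of `timesteps` 2D tensors, one per time step.
x = tf.unstack(X, timesteps, 1)
print(len(x))       # 18
print(x[0].shape)   # (?, 5), i.e. [batch_size, num_input]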

How to interpret the "outputs" and "states"?

The shape of states is [2, batch_size, num_units] for an LSTMCell: one [batch_size, num_units] tensor is the cell state C and the other is the hidden state h.

In the same way, you will get states of shape [batch_size, num_units] with a GRUCell, since a GRU keeps no separate cell state.
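
For comparison, a quick sketch with GRUCell (a hypothetical minimal setup mirroring the one above):

import tensorflow as tf

gru = tf.nn.rnn_cell.GRUCell(num_units=4)
X = tf.placeholder(tf.float32, [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(gru, X, dtype=tf.float32)
print(states.shape)   # (?, 4) -- a single [batch_size, num_units] tensor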

outputs holds the output of every time step, so by default (time_major=False) its shape is [batch_size, timesteps, num_units]. From that you can conclude that states[1, :, :] (the h part) equals outputs[:, -1, :].
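
Here is a minimal sketch to check that relation numerically. Note that dynamic_rnn with an LSTMCell actually returns states as an LSTMStateTuple, so its h part can be accessed as states.h (equivalent to indexing with [1]):

import numpy as np
import tensorflow as tf

lstm = tf.nn.rnn_cell.LSTMCell(num_units=4)
X = tf.placeholder(tf.float32, [None, None, 5])
outputs, states = tf.nn.dynamic_rnn(lstm, X, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_val = np.random.normal(size=(12, 18, 5))
    out, st = sess.run([outputs, states], feed_dict={X: x_val})
    # st.c and st.h each have shape [batch_size, num_units] = (12, 4).
    print(np.allclose(st.h, out[:, -1, :]))   # True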
