The best way to pass the LSTM state between batches
Question
I am trying to find the best way to pass the LSTM state between batches. I have searched everywhere, but I could not find a solution for the current implementation. Imagine I have something like:
import tensorflow as tf
from tensorflow.contrib import rnn

cells = [rnn.LSTMCell(size) for size in [256, 256]]
cells = rnn.MultiRNNCell(cells, state_is_tuple=True)
init_state = cells.zero_state(tf.shape(x_hot)[0], dtype=tf.float32)
net, new_state = tf.nn.dynamic_rnn(cells, x_hot, initial_state=init_state, dtype=tf.float32)
Now I would like to pass new_state between batches efficiently, that is, without storing it back to memory and then re-feeding it to TensorFlow with feed_dict. To be more precise, all the solutions I found use sess.run to evaluate new_state and feed_dict to pass it into init_state. Is there any way to avoid the bottleneck of using feed_dict?
I think I should use tf.assign in some way, but the docs are incomplete and I could not find any workaround.
I want to thank in advance everybody who will answer.
Cheers,
Francesco Saverio
All the other answers that I found on Stack Overflow work for older versions or use the 'feed-dict' method to pass the new state. For instance:
1) TensorFlow: Remember LSTM state for next batch (stateful LSTM): this works by using 'feed-dict' to feed the state placeholder, which I want to avoid
2) Tensorflow - LSTM state reuse within batch: same here
3) Saving LSTM RNN state between runs in Tensorflow: same here
Answer
LSTMStateTuple is nothing more than a tuple of the output and the hidden state. tf.assign creates an operation that, when run, assigns the value stored in a tensor to a variable (if you have specific questions, please ask so that the docs can be improved). You can use the solution with tf.assign by retrieving the hidden state tensor from the tuple via its c attribute (assuming you want the hidden state): new_state.c
Here is a complete self-contained example on a toy problem: https://gist.github.com/iganichev/632b425fed0263d0274ec5b922aa3b2f
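The core idea, keeping the state in persistent variables that are overwritten in place after every batch instead of round-tripping it through feed_dict, can also be sketched without TensorFlow. Below is a minimal framework-free NumPy illustration (the class and its names are hypothetical, not taken from the answer's gist): the cell stores h and c as attributes and mutates them on each step, which plays the role of tf.Variables updated with tf.assign.

```python
import numpy as np

# Hypothetical minimal LSTM cell that carries its own state between calls.
# self.h / self.c persist across batches, the analogue of non-trainable
# tf.Variables that a tf.assign op overwrites instead of using feed_dict.
class StatefulLSTMCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One fused weight matrix for the input, forget, cell, output gates.
        self.W = rng.standard_normal((input_size + hidden_size, 4 * hidden_size)) * 0.1
        self.b = np.zeros(4 * hidden_size)
        self.hidden_size = hidden_size
        self.h = None  # persistent state, like a tf.Variable
        self.c = None

    def reset(self, batch_size):
        # Analogue of cells.zero_state(...): start from all-zero state.
        self.h = np.zeros((batch_size, self.hidden_size))
        self.c = np.zeros((batch_size, self.hidden_size))

    def step(self, x):
        """One time step; overwrites self.h / self.c in place (the tf.assign analogue)."""
        z = np.concatenate([x, self.h], axis=1) @ self.W + self.b
        i, f, g, o = np.split(z, 4, axis=1)
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return self.h

cell = StatefulLSTMCell(input_size=3, hidden_size=4)
cell.reset(batch_size=2)
batch = np.ones((2, 3))
cell.step(batch)          # first batch starts from zero state
carried = cell.c.copy()
cell.step(batch)          # second batch continues from the carried state
assert not np.allclose(cell.c, carried)  # state advanced in place, no feed_dict
```

In TF 1.x terms, the equivalent is to create variables for each state tensor, feed them as initial_state, and group tf.assign ops that copy new_state back into them after each run, which is the pattern the linked gist demonstrates.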