Decouple dequeue operation from gradient/loss computation
Problem description
I'm currently trying to move away from feeds and start using queues in order to support larger datasets. Queues work fine with the optimizers in TensorFlow, since they only evaluate the gradient once per dequeue operation. However, I have implemented interfaces to other optimizers that perform line searches, and there I need to evaluate not just the gradient but also the loss at multiple points for the same batch. Unfortunately, with the normal queueing system each loss evaluation triggers a dequeue instead of reusing the same batch several times.
Is there a way to decouple the dequeue operation from the gradient/loss computation, so that I can execute one dequeue and then run the gradient/loss computation several times on the current batch?
Please note that the size of my input tensor varies between batches. We work with molecular data, and each molecule has a different number of atoms. This is quite different from image data, where everything is typically scaled to identical dimensions.
Recommended answer
Decouple it by creating a variable that stores the dequeued value, and then make your computation depend on this variable instead of the dequeue op. Advancing the queue happens during the assign.
Solution #1: fixed-size data, use Variables
# image, batch_size, image_size and color_channels are assumed to be
# defined earlier in your input pipeline
(image_batch_live,) = tf.train.batch([image], batch_size=batch_size,
                                     num_threads=1, capacity=614)
image_batch = tf.Variable(
    tf.zeros((batch_size, image_size, image_size, color_channels)),
    trainable=False,
    name="input_values_cached")
advance_batch = tf.assign(image_batch, image_batch_live)
Now image_batch gives the latest value from the queue without advancing it, and advance_batch advances the queue.
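The same dequeue-once, evaluate-many pattern can be shown in plain Python, independent of TensorFlow (a sketch with illustrative names; the deque plays the role of the FIFO queue, cache plays the role of the image_batch Variable, and loss stands in for any loss evaluation a line search would repeat):

```python
from collections import deque

# a queue of batches and a cached "variable" holding the current batch
queue = deque([[1, 2], [3, 4], [5, 6]])
cache = None

def advance_batch():
    # analogue of sess.run(advance_batch): dequeue once, store the value
    global cache
    cache = queue.popleft()

def loss(scale):
    # analogue of a loss evaluation that reads only the cached batch
    return scale * sum(cache)

advance_batch()                        # dequeue exactly once
values = [loss(s) for s in (1, 2, 3)]  # evaluate repeatedly on the same batch
print(values)      # [3, 6, 9]
print(len(queue))  # 2 -- the queue was not advanced by the evaluations
```

The key design point mirrors the TensorFlow version: only advance_batch touches the queue, so a line search can call loss as many times as it needs on one batch.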
Solution #2: variable-size data, use persistent Tensors
Here we decouple the workflow by introducing dequeue_op and dequeue_op2. All computation depends on dequeue_op2, which is fed the saved value of dequeue_op. Using get_session_tensor/get_session_handle ensures that the actual data stays inside the TensorFlow runtime, and the value passed through feed_dict is only a short string identifier. The API is a little awkward because of dummy_handle; I've brought up this issue here.
import tensorflow as tf

def create_session():
    sess = tf.InteractiveSession(config=tf.ConfigProto(operation_timeout_in_ms=3000))
    return sess

tf.reset_default_graph()
sess = create_session()
dt = tf.int32
dummy_handle = sess.run(tf.get_session_handle(tf.constant(1)))
q = tf.FIFOQueue(capacity=20, dtypes=[dt])
enqueue_placeholder = tf.placeholder(dt, shape=[None])
enqueue_op = q.enqueue(enqueue_placeholder)
dequeue_op = q.dequeue()
size_op = q.size()
dequeue_handle_op = tf.get_session_handle(dequeue_op)
dequeue_placeholder, dequeue_op2 = tf.get_session_tensor(dummy_handle, dt)
compute_op1 = tf.reduce_sum(dequeue_op2)
compute_op2 = tf.reduce_sum(dequeue_op2) + 1

# fill queue with variable-size data
for i in range(10):
    sess.run(enqueue_op, feed_dict={enqueue_placeholder: [1]*(i+1)})
sess.run(q.close())

try:
    while True:
        dequeue_handle = sess.run(dequeue_handle_op)  # advance the queue
        val1 = sess.run(compute_op1, feed_dict={dequeue_placeholder: dequeue_handle.handle})
        val2 = sess.run(compute_op2, feed_dict={dequeue_placeholder: dequeue_handle.handle})
        size = sess.run(size_op)
        print("val1 %d, val2 %d, queue size %d" % (val1, val2, size))
except tf.errors.OutOfRangeError:
    print("Done")
You should see something like the following when you run it:
val1 1, val2 2, queue size 9
val1 2, val2 3, queue size 8
val1 3, val2 4, queue size 7
val1 4, val2 5, queue size 6
val1 5, val2 6, queue size 5
val1 6, val2 7, queue size 4
val1 7, val2 8, queue size 3
val1 8, val2 9, queue size 2
val1 9, val2 10, queue size 1
val1 10, val2 11, queue size 0
Done
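The session-handle mechanism can also be mimicked in plain Python to make the idea concrete (a sketch with illustrative names; store plays the role of the TensorFlow runtime holding the persistent tensor, get_handle mimics tf.get_session_handle, and from_handle mimics tf.get_session_tensor):

```python
# the data stays in an in-process store; only a short string key is handed out
store = {}
next_id = 0

def get_handle(value):
    # analogue of tf.get_session_handle: persist the value, return an identifier
    global next_id
    handle = "tensor_%d" % next_id
    next_id += 1
    store[handle] = value
    return handle

def from_handle(handle):
    # analogue of tf.get_session_tensor: recover the persisted value by its id
    return store[handle]

h = get_handle([1, 1, 1])            # "dequeue" once, keep only the handle
print(sum(from_handle(h)))           # 3 -- mirrors compute_op1
print(sum(from_handle(h)) + 1)       # 4 -- mirrors compute_op2, same data
```

As in the TensorFlow code, both computations read the same stored value through the same handle, so the underlying data never has to be fetched out of the runtime or re-dequeued.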