TensorFlow: slow performance when getting gradients at inputs

Problem Description

I'm building a simple multilayer perceptron with TensorFlow, and I also need to obtain the gradients (or error signal) of the loss at the neural network's inputs.

Here's my code, which works:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.network, self.y))
optimizer = tf.train.AdagradOptimizer(learning_rate=nn_learning_rate).minimize(cost)
# ...
for i in range(epochs):
    # ...
    for batch in batches:
        # ...
        sess.run(optimizer, feed_dict=feed_dict)
        grads_wrt_input = sess.run(tf.gradients(cost, self.x), feed_dict=feed_dict)[0]

(edited to include the training loop)

Without the last line (grads_wrt_input...), this runs really fast on a CUDA machine. However, the tf.gradients() call slows it down by a factor of ten or more.

I recall that the error signals at the nodes are computed as intermediate values in the backpropagation algorithm, and I have successfully done this using the Java library DeepLearning4j. I was also under the impression that this would be a slight modification to the computation graph already built by optimizer.

How can this be made faster, or is there another way to compute the gradients of the loss w.r.t. the inputs?

Answer

The tf.gradients() function builds a new backpropagation graph each time it is called, so the slowdown comes from the fact that TensorFlow has to parse a new graph on each iteration of the loop. (This can be surprisingly expensive: the current version of TensorFlow is optimized for executing the same graph a large number of times.)
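
To make the graph growth visible, here is a minimal, hypothetical sketch (not from the original post; it assumes TF 1.x-style graph mode and uses made-up placeholder shapes) that counts the ops added by repeated tf.gradients() calls:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 10])
w = tf.Variable(tf.ones([10, 1]))
cost = tf.reduce_mean(tf.matmul(x, w))

for i in range(3):
    tf.gradients(cost, x)  # each call appends a fresh backprop subgraph
    # The op count grows on every iteration, so every subsequent
    # sess.run() has a larger graph to process.
    print(len(tf.get_default_graph().get_operations()))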

Fortunately the solution is easy: just construct the gradients once, outside the loop. You can restructure your code as follows:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.network, self.y))
optimizer = tf.train.AdagradOptimizer(learning_rate=nn_learning_rate).minimize(cost)
grads_wrt_input_tensor = tf.gradients(cost, self.x)[0]
# ...
for i in range(epochs):
    # ...
    for batch in batches:
        # ...
        _, grads_wrt_input = sess.run([optimizer, grads_wrt_input_tensor],
                                      feed_dict=feed_dict)

Note that, for performance, I also combined the two sess.run() calls. This ensures that the forward propagation, and much of the backpropagation, will be reused.

顺便说一句,找到这样的性能错误的一个技巧是调用 tf.get_default_graph().finalize() 在开始你的训练循环之前.如果您不小心向图中添加了任何节点,这将引发异常,从而更容易追踪这些错误的原因.

As an aside, one tip to find performance bugs like this is to call tf.get_default_graph().finalize() before starting your training loop. This will raise an exception if you inadvertantly add any nodes to the graph, which makes it easier to trace the cause of these bugs.
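
A hedged sketch of this pattern, reusing the optimizer and grads_wrt_input_tensor names from the restructured code above:

tf.get_default_graph().finalize()  # the graph is now read-only

for i in range(epochs):
    for batch in batches:
        # Fine: runs existing ops without modifying the graph.
        _, grads_wrt_input = sess.run([optimizer, grads_wrt_input_tensor],
                                      feed_dict=feed_dict)
        # A stray graph-building call here, e.g. tf.gradients(cost, self.x),
        # would now raise a RuntimeError instead of silently slowing training.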
