Unaggregated gradients / gradients per example in tensorflow
Question
Given a simple mini-batch gradient descent problem on MNIST in tensorflow (like in this tutorial), how can I retrieve the gradients for each example in the batch individually? tf.gradients() seems to return gradients averaged over all examples in the batch. Is there a way to retrieve the gradients before aggregation?
A first step towards answering this is figuring out at which point tensorflow averages the gradients over the examples in the batch. I thought this happened in _AggregatedGrads, but that doesn't appear to be the case. Any ideas?
Answer
tf.gradients returns the gradient of the loss with respect to your variables. This means that if the loss is a sum of per-example losses, then its gradient is also the sum of the per-example loss gradients.
The summing up is implicit. For instance, if you want to minimize the sum of squared norms of Wx - y errors, the gradient with respect to W is 2(WX - Y)X', where X is the batch of observations and Y is the batch of labels. You never explicitly form "per-example" gradients that are later summed up, so it's not a simple matter of removing some stage in the gradient pipeline.
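The claim that the summing is implicit can be checked numerically. The short numpy sketch below (dimensions and random data are made up for illustration) verifies that the closed-form batch gradient 2(WX - Y)X' equals the sum of the per-example gradients 2(Wx_i - y_i)x_i':

```python
import numpy as np

# Verify that for loss = sum over the batch of ||W x_i - y_i||^2,
# the gradient w.r.t. W is 2 (W X - Y) X', already summed over examples.
rng = np.random.default_rng(0)
d_out, d_in, batch = 3, 4, 5
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, batch))   # columns are examples
Y = rng.standard_normal((d_out, batch))  # columns are labels

# Closed-form batch gradient from the text.
grad_batch = 2 * (W @ X - Y) @ X.T

# Sum of per-example gradients: example i contributes 2 (W x_i - y_i) x_i'.
grad_sum = sum(2 * np.outer(W @ X[:, i] - Y[:, i], X[:, i])
               for i in range(batch))

print(np.allclose(grad_batch, grad_sum))  # True
```

The matrix product (WX - Y)X' is exactly the sum of the outer products over the batch columns, which is why no per-example gradient ever materializes.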
A simple way to get k per-example loss gradients is to use batches of size 1 and do k passes. Ian Goodfellow wrote up how to get all k gradients in a single pass; for this you would need to specify the gradients explicitly and not rely on the tf.gradients method.
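The batch-of-size-1 workaround can be sketched in plain numpy, with finite differences standing in for tf.gradients so the idea works for any per-example loss (the linear-model squared error below is just an illustrative, made-up example):

```python
import numpy as np

# Treat each example as its own batch and take one gradient "pass" per
# example, collecting the k gradients unaggregated.
rng = np.random.default_rng(1)
d_in, k = 3, 4
w = rng.standard_normal(d_in)
X = rng.standard_normal((k, d_in))  # rows are examples
y = rng.standard_normal(k)

def loss_i(w, i):
    # Per-example loss (illustrative): squared error of a linear model.
    return (X[i] @ w - y[i]) ** 2

def grad_fd(f, w, eps=1e-6):
    # Central finite-difference gradient of f at w, one coordinate at a time.
    g = np.zeros_like(w)
    for j in range(w.size):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

# k passes, one example per "batch": the unaggregated gradients.
per_example = [grad_fd(lambda v: loss_i(v, i), w) for i in range(k)]

# Analytic per-example gradient for this loss: 2 (x_i . w - y_i) x_i.
analytic = [2 * (X[i] @ w - y[i]) * X[i] for i in range(k)]
print(all(np.allclose(a, b) for a, b in zip(per_example, analytic)))  # True
```

In TensorFlow the same loop would feed one example at a time and call the gradient op once per example, which is k times the work of one batched pass; Goodfellow's write-up shows how to avoid that overhead by writing the per-example gradients out explicitly.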