Loss clipping in TensorFlow (on DeepMind's DQN)

Question

I am trying my own implementation of the DQN paper by DeepMind in TensorFlow and am running into difficulty with clipping of the loss function.

Here is an excerpt from the Nature paper describing the loss clipping:

We also found it helpful to clip the error term from the update to be between −1 and 1. Because the absolute value loss function |x| has a derivative of −1 for all negative values of x and a derivative of 1 for all positive values of x, clipping the squared error to be between −1 and 1 corresponds to using an absolute value loss function for errors outside of the (−1,1) interval. This form of error clipping further improved the stability of the algorithm.

(Link to the full paper: http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)

What I have tried so far is using

clipped_loss_vec = tf.clip_by_value(loss, -1, 1)

to clip the loss I calculate between -1 and +1. The agent is not learning the proper policy in this case. I printed out the gradients of the network and realized that if the loss falls below -1, the gradients all suddenly turn to 0!

My reasoning for this happening is that the clipped loss is a constant function in (-inf,-1) U (1,inf), which means it has zero gradient in those regions. This in turn ensures that the gradients throughout the network are zero (think of it as, whatever input image I provide the network, the loss stays at -1 in the local neighborhood because it has been clipped).
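
A minimal sketch that reproduces this effect (a hypothetical one-weight "network", TF 1.x graph API assumed; the names and values are purely illustrative):

import tensorflow as tf

# Toy setup: one trainable weight and a squared loss far outside [-1, 1].
w = tf.Variable(3.0)
loss = tf.square(w)                                  # 9.0
clipped_loss = tf.clip_by_value(loss, -1.0, 1.0)     # saturates at 1.0

grad_raw = tf.gradients(loss, [w])[0]                # 2*w = 6.0
grad_clipped = tf.gradients(clipped_loss, [w])[0]    # 0.0 -- the clip is flat here

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([grad_raw, grad_clipped]))        # [6.0, 0.0]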

So, my question is in two parts:

  1. What exactly did DeepMind mean in the excerpt? Did they mean that the loss below -1 is clipped to -1 and above +1 is clipped to +1? If so, how did they deal with the gradients (i.e. what is all that part about absolute value functions)?

  2. How should I implement loss clipping in TensorFlow such that the gradients do not go to zero outside the clipped range (but maybe stay at +1 and -1)? Thanks!

Answer

I suspect they mean that you should clip the gradient to [-1,1], not clip the loss function. Thus, you compute the gradient as usual, but then clip each component of the gradient to be in the range [-1,1] (so if it is larger than +1, you replace it with +1; if it is smaller than -1, you replace it with -1); and then you use the result in the gradient descent update step instead of using the unmodified gradient.
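
A sketch of this in TF 1.x style (the names loss and params are assumptions standing in for your unclipped loss tensor and the network's trainable variables, and the optimizer and learning rate are placeholders, not DeepMind's exact setup):

import tensorflow as tf

optimizer = tf.train.RMSPropOptimizer(learning_rate=0.00025)  # placeholder hyperparameter

# Compute the raw gradients, then clip each component to [-1, 1] before applying.
grads_and_vars = optimizer.compute_gradients(loss, var_list=params)
clipped_gvs = [(tf.clip_by_value(g, -1.0, 1.0), v)
               for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped_gvs)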

Equivalently: define the function f as follows:

f(x) = x^2          if x in [-0.5,0.5]
f(x) = |x| - 0.25   if x < -0.5 or x > 0.5

Instead of using something of the form s^2 as the loss function (where s is some complicated expression), they suggest to use f(s) as the loss function. This is some kind of hybrid between squared-loss and absolute-value-loss: it will behave like s^2 when s is small, but when s gets larger, it will behave like the absolute value (|s|).
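
One way to express f in TensorFlow is to branch on |s| with tf.where; a minimal sketch, where td_error is a hypothetical tensor holding your per-sample error s:

import tensorflow as tf

def clipped_error(s):
    # f(s): quadratic for |s| <= 0.5, linear (|s| - 0.25) outside
    return tf.where(tf.abs(s) <= 0.5,
                    tf.square(s),
                    tf.abs(s) - 0.25)

loss = tf.reduce_mean(clipped_error(td_error))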

Notice that the derivative of f has the nice property that it will always be in the range [-1,1]:

f'(x) = 2x    if x in [-0.5,0.5]
f'(x) = +1    if x > 0.5
f'(x) = -1    if x < -0.5

Thus, when you take the gradient of this f-based loss function, the result will be the same as computing the gradient of a squared-loss and then clipping it.
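
You can check this numerically in the scalar case, where the error itself is the variable being differentiated; a small TF 1.x sketch:

import tensorflow as tf

x = tf.Variable([-2.0, -0.3, 0.3, 2.0])
f = tf.where(tf.abs(x) <= 0.5, tf.square(x), tf.abs(x) - 0.25)

grad_f = tf.gradients(tf.reduce_sum(f), [x])[0]
grad_sq_clipped = tf.clip_by_value(
    tf.gradients(tf.reduce_sum(tf.square(x)), [x])[0], -1.0, 1.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([grad_f, grad_sq_clipped]))   # both: [-1., -0.6, 0.6, 1.]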

Thus, what they're doing is effectively replacing a squared-loss with a Huber loss. The function f is just two times the Huber loss for delta = 0.5.

Now the point is that the following two alternatives are equivalent:

  • Use a squared loss function. Compute the gradient of this loss function, but clip the gradient to [-1,1] before doing the update step of the gradient descent.

  • Use a Huber loss function instead of a squared loss function. Compute the gradient of this loss function directly (unchanged) in the gradient descent.

The former is easy to implement. The latter has nice properties (improves stability; it's better than absolute-value-loss because it avoids oscillating around the minimum). Because the two are equivalent, this means we get an easy-to-implement scheme that has the simplicity of squared-loss with the stability and robustness of the Huber loss.
