Tensorflow: _variable_with_weight_decay(...) explanation


Question


At the moment I'm looking at the cifar10 example, and I noticed the function _variable_with_weight_decay(...) in the file cifar10.py. The code is as follows:

def _variable_with_weight_decay(name, shape, stddev, wd):
  """Helper to create an initialized Variable with weight decay.
  Note that the Variable is initialized with a truncated normal distribution.
  A weight decay is added only if one is specified.
  Args:
    name: name of the variable
    shape: list of ints
    stddev: standard deviation of a truncated Gaussian
    wd: add L2Loss weight decay multiplied by this float. If None, weight
        decay is not added for this Variable.
  Returns:
    Variable Tensor
  """
  dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
  var = _variable_on_cpu(
      name,
      shape,
      tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
  if wd is not None:
    weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var


I'm wondering if this function does what it says. It is clear that when a weight decay factor is given (wd is not None), the decay value (weight_decay) is computed. But is it ever applied? At the end the unmodified variable (var) is returned, or am I missing something?


My second question is how to fix this. As I understand it, the value of the scalar weight_decay must be subtracted from each element of the weight matrix, but I'm unable to find a TensorFlow op that can do that (adding/subtracting a single value to/from every element of a tensor). Is there such an op? As a workaround I thought it might be possible to create a new tensor initialized with the value of weight_decay and use tf.subtract(...) to achieve the same result. Or is that the right way to go anyway?

Thanks in advance.

Answer


The code does what it says. You are supposed to sum everything in the 'losses' collection (which the weight decay term is added to in the second to last line) for the loss that you pass to the optimizer. In the loss() function in that example:

tf.add_to_collection('losses', cross_entropy_mean)
[...]
return tf.add_n(tf.get_collection('losses'), name='total_loss')


so what the loss() function returns is the classification loss plus everything that was in the 'losses' collection before.
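To make that mechanism concrete, here is a minimal sketch of how the 'losses' collection ends up influencing training. The variable names and the dummy data are mine, it uses the TF 1.x graph-style API that the example is written against, and tf.multiply is simply the newer name for tf.mul:

import tensorflow as tf

wd = 0.004  # weight decay factor, same default as the cifar10 example uses for fully connected layers

# a variable whose decay term goes into the 'losses' collection,
# just like _variable_with_weight_decay does
weights = tf.get_variable(
    'weights', [10, 2],
    initializer=tf.truncated_normal_initializer(stddev=0.04))
weight_decay = tf.multiply(tf.nn.l2_loss(weights), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)

# a stand-in data loss over dummy inputs/labels (purely illustrative)
logits = tf.matmul(tf.ones([4, 10]), weights)
labels = tf.constant([0, 1, 0, 1])
cross_entropy_mean = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
tf.add_to_collection('losses', cross_entropy_mean)

# total_loss = data loss + every weight decay term; minimizing this is what
# actually applies the decay, even though the variable itself is returned unmodified
total_loss = tf.add_n(tf.get_collection('losses'), name='total_loss')
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(total_loss)

So the decay reaches the weights through the gradient of total_loss, not through any direct modification of var.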


As a side note, weight decay does not mean you subtract the value of wd from every value in the tensor as part of the update step; rather, it multiplies each value by (1 - learning_rate * wd) (in plain SGD). To see why this is so, recall that l2_loss computes

output = sum(t_i ** 2) / 2


with t_i being the elements of the tensor. This means that the derivative of l2_loss with respect to each tensor element is the value of that tensor element itself, and since you scaled l2_loss by wd, the derivative is scaled as well.
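You can check this directly with tf.gradients. A small sketch (TF 1.x style, with values I made up) showing that the gradient of wd * l2_loss(w) with respect to w is exactly wd * w:

import tensorflow as tf

wd = 0.1
w = tf.Variable([1.0, -2.0, 3.0])

decay_term = tf.multiply(tf.nn.l2_loss(w), wd)  # wd * sum(w_i ** 2) / 2
grad = tf.gradients(decay_term, [w])[0]         # expected to equal wd * w

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # [ 0.1 -0.2  0.3], i.e. each element scaled by wd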


Since the update step (again, in plain SGD) is (forgive me for omitting the time step indexes)

w := w - learning_rate * dL/dw


you get, if you only had the weight decay term

w := w - learning_rate * wd * w

which is the same as

w := w * (1 - learning_rate * wd)
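so the decay shrinks the weights multiplicatively rather than subtracting a constant from them. As a quick numerical check (again a sketch with made-up values, TF 1.x style), one SGD step on the decay term alone reproduces w * (1 - learning_rate * wd):

import tensorflow as tf

learning_rate, wd = 0.5, 0.1
w = tf.Variable([1.0, -2.0, 3.0])

decay_only_loss = tf.multiply(tf.nn.l2_loss(w), wd)
step = tf.train.GradientDescentOptimizer(learning_rate).minimize(decay_only_loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(step)
    print(sess.run(w))  # [ 0.95 -1.9   2.85], i.e. w * (1 - learning_rate * wd) = w * 0.95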
