How can I implement max norm constraints in an MLP in tensorflow?

Problem description

How can I implement max norm constraints on the weights in an MLP in tensorflow? The kind that Hinton and Dean describe in their work on dark knowledge. That is, does tf.nn.dropout implement the weight constraints by default, or do we need to do it explicitly, as in

https://arxiv.org/pdf/1207.0580.pdf

"如果这些网络为存在的隐藏单位共享相同的权重. 我们使用标准的随机梯度下降程序来训练辍学神经 培训案例的小批量生产网络,但我们修改了通常是惩罚性的条款 用于防止砝码过大.而不是惩罚平方长度 整个权重向量的(L2范数),我们为传入的L2范数设置一个上限 每个隐藏单元的权重向量.如果权重更新违反了此约束,我们 通过除法对隐藏单元的权重进行归一化."

"If these networks share the same weights for the hidden units that are present. We use the standard, stochastic gradient descent procedure for training the dropout neural networks on mini-batches of training cases, but we modify the penalty term that is normally used to prevent the weights from growing too large. Instead of penalizing the squared length (L2 norm) of the whole weight vector, we set an upper bound on the L2 norm of the incoming weight vector for each individual hidden unit. If a weight-update violates this constraint, we renormalize the weights of the hidden unit by division."

Keras appears to have it

http://keras.io/constraints/
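
For reference, a minimal sketch of what that Keras constraint looks like in use (assuming the Keras 2 constraints API; older Keras versions spell it maxnorm and pass it as W_constraint, and the layer size and cap below are illustrative):

from keras.constraints import max_norm
from keras.layers import Dense

# Cap the L2 norm of each unit's incoming weight vector at 3.0
layer = Dense(128, activation='relu', kernel_constraint=max_norm(3.0))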

Recommended answer

tf.nn.dropout does not impose any norm constraint. I believe what you're looking for is to "process the gradients before applying them" using tf.clip_by_norm.

For example, instead of simply:

# Create an optimizer + implicitly call compute_gradients() and apply_gradients()
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

You could do:

# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, [weights1, weights2, ...])
# grads_and_vars is a list of tuples (gradient, variable).
# Do whatever you need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(tf.clip_by_norm(gv[0], clip_norm=123.0, axes=0), gv[1])
                         for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients; this returns the training op.
train_op = optimizer.apply_gradients(capped_grads_and_vars)
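
As a side note (not part of the original answer), if you want the constraint on the weights themselves rather than on the gradients, which is closer to the per-unit max norm described in the paper and in the Keras constraint above, a minimal sketch in the same TF1-style graph mode could be:

# Renormalize each hidden unit's incoming weight vector (each column, hence
# axes=[0]) whenever its L2 norm exceeds the cap. weights1/weights2 are the
# matrix variables from the snippet above; 3.0 is an illustrative cap.
max_norm_value = 3.0
clip_weights = tf.group(
    tf.assign(weights1, tf.clip_by_norm(weights1, max_norm_value, axes=[0])),
    tf.assign(weights2, tf.clip_by_norm(weights2, max_norm_value, axes=[0])))
# In the training loop, run the gradient step first and then the constraint:
#   sess.run(train_op)
#   sess.run(clip_weights)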

I hope this helps. Final notes about tf.clip_by_norm's axes parameter:

  1. If you're calculating tf.nn.xw_plus_b(x, weights, biases), or equivalently matmul(x, weights) + biases, when the dimensions of x and weights are (batch, in_units) and (in_units, out_units) respectively, then you probably want to set axes == [0] (because in this usage each column holds all the incoming weights to a specific unit; see the shape sketch after this list).
  2. Pay attention to the shape/dimensions of your variables above and whether/how exactly you want to clip_by_norm each of them! E.g. if some of [weights1, weights2, ...] are matrices and some aren't, and you call clip_by_norm() on the grads_and_vars with the same axes value like in the List Comprehension above, this doesn't mean the same thing for all the variables! In fact, if you're lucky, this will result in a weird error like ValueError: Invalid reduction dimension 1 for input with 1 dimensions, but otherwise it's a very sneaky bug.
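
To make the axes point concrete, here is a small shape sketch (variable names and sizes are illustrative, not from the original answer):

# weights has shape (in_units, out_units); column j holds the incoming
# weights of hidden unit j, so axes=[0] computes one L2 norm per column and
# caps each hidden unit's incoming weight vector separately.
weights = tf.Variable(tf.random_normal([784, 256]))  # (in_units, out_units)
per_unit_clipped = tf.clip_by_norm(weights, clip_norm=3.0, axes=[0])
# A 1-D bias (or its gradient) of shape (out_units,) clipped with the same
# axes value would instead cap the norm of the whole vector, which is the
# kind of silent mismatch the note above warns about.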
