How can I implement max norm constraints in an MLP in tensorflow?
Question
How can I implement max norm constraints on the weights in an MLP in tensorflow? The kind that Hinton and Dean describe in their work on dark knowledge. That is, does tf.nn.dropout implement the weight constraints by default, or do we need to do it explicitly, as in
https://arxiv.org/pdf/1207.0580.pdf
"All of these networks share the same weights for the hidden units that are present. We use the standard, stochastic gradient descent procedure for training the dropout neural networks on mini-batches of training cases, but we modify the penalty term that is normally used to prevent the weights from growing too large. Instead of penalizing the squared length (L2 norm) of the whole weight vector, we set an upper bound on the L2 norm of the incoming weight vector for each individual hidden unit. If a weight-update violates this constraint, we renormalize the weights of the hidden unit by division."
Keras appears to have it (as a maxnorm weight constraint).
Accepted answer
tf.nn.dropout does not impose any norm constraint. I believe what you're looking for is to "process the gradients before applying them" using tf.clip_by_norm.
For example, instead of simply:
# Create an optimizer + implicitly call compute_gradients() and apply_gradients()
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
you could do:
# Create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, [weights1, weights2, ...])
# grads_and_vars is a list of tuples (gradient, variable).
# Do whatever you need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(tf.clip_by_norm(gv[0], clip_norm=123.0, axes=0), gv[1])
for gv in grads_and_vars]
# Ask the optimizer to apply the capped gradients.
train_op = optimizer.apply_gradients(capped_grads_and_vars)
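For intuition, tf.clip_by_norm(t, clip_norm, axes=[0]) rescales each column of a 2-D tensor whose L2 norm exceeds clip_norm, and leaves the other columns untouched. A plain-Python sketch of that behaviour (the helper name clip_columns_by_norm is illustrative, not a TensorFlow API):

```python
import math

def clip_columns_by_norm(t, clip_norm):
    """Rescale each column of a 2-D list so its L2 norm is at most clip_norm.

    Mirrors what tf.clip_by_norm(t, clip_norm, axes=[0]) computes
    for a 2-D tensor; columns within the bound pass through unchanged.
    """
    rows, cols = len(t), len(t[0])
    out = [row[:] for row in t]
    for j in range(cols):
        norm = math.sqrt(sum(t[i][j] ** 2 for i in range(rows)))
        if norm > clip_norm:
            scale = clip_norm / norm
            for i in range(rows):
                out[i][j] = t[i][j] * scale
    return out

grads = [[3.0, 0.1],
         [4.0, 0.2]]  # column 0 has L2 norm 5.0; column 1 has norm ~0.22
clipped = clip_columns_by_norm(grads, clip_norm=1.0)
# column 0 is rescaled down to norm 1.0 (i.e. [0.6, 0.8]); column 1 is unchanged
```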
I hope this helps. Final notes about tf.clip_by_norm's axes parameter:
- If you're calculating tf.nn.xw_plus_b(x, weights, biases), or equivalently matmul(x, weights) + biases, where the dimensions of x and weights are (batch, in_units) and (in_units, out_units) respectively, then you probably want to set axes == [0] (because in this usage each column holds all the incoming weights to a specific unit).
- Pay attention to the shape/dimensions of your variables above, and to whether/how exactly you want to clip_by_norm each of them! E.g. if some of [weights1, weights2, ...] are matrices and some aren't, and you call clip_by_norm() on the grads_and_vars with the same axes value as in the list comprehension above, it doesn't mean the same thing for all the variables! In fact, if you're lucky, this will result in a weird error like ValueError: Invalid reduction dimension 1 for input with 1 dimensions, but otherwise it's a very sneaky bug.
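One more caveat: clipping gradients caps the size of each update, whereas the procedure quoted from the paper renormalizes the weights themselves whenever an update pushes a hidden unit's incoming weight vector past the bound. A plain-Python sketch of one such SGD step, assuming weights are stored with one column per hidden unit (the function name is illustrative, not a TensorFlow API):

```python
import math

def sgd_step_with_max_norm(weights, grads, lr, max_norm):
    """One SGD update followed by the max-norm renormalization from the paper.

    weights and grads are 2-D lists shaped (in_units, out_units), so each
    column holds the incoming weights of one hidden unit.
    """
    rows, cols = len(weights), len(weights[0])
    # Plain gradient-descent update.
    w = [[weights[i][j] - lr * grads[i][j] for j in range(cols)]
         for i in range(rows)]
    # Renormalize (by division) any column whose L2 norm exceeds max_norm.
    for j in range(cols):
        norm = math.sqrt(sum(w[i][j] ** 2 for i in range(rows)))
        if norm > max_norm:
            for i in range(rows):
                w[i][j] *= max_norm / norm
    return w

w = sgd_step_with_max_norm([[1.0, 0.1], [1.0, 0.1]],
                           [[-1.0, 0.0], [-1.0, 0.0]],
                           lr=1.0, max_norm=2.0)
# after the update, column 0 would have norm sqrt(8) ~ 2.83,
# so it is scaled back down to norm 2.0; column 1 is left alone
```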