How to set layer-wise learning rate in Tensorflow?

Question

I am wondering if there is a way to use different learning rates for different layers, like in Caffe. I am trying to modify a pre-trained model and use it for other tasks. I want to speed up training for the newly added layers and keep the pre-trained layers at a low learning rate so that they are not distorted. For example, I have a 5-conv-layer pre-trained model. Now I add a new conv layer and fine-tune it. The first 5 layers would have a learning rate of 0.00001 and the last one would have 0.001. Any idea how to achieve this?

Answer

It can be achieved quite easily with 2 optimizers:

import tensorflow as tf

var_list1 = [...]  # variables from the first 5 (pre-trained) layers
var_list2 = [...]  # the rest of the variables (the newly added layer)
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)
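
A minimal usage sketch, assuming TF1 graph mode and that loss and a step count num_steps are already defined (num_steps is a placeholder, not from the original answer) — the grouped train_op runs like any single training op:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):  # num_steps is a placeholder
        _, loss_val = sess.run([train_op, loss])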

One disadvantage of this implementation is that tf.gradients(.) is computed twice inside the optimizers, so it may not be optimal in terms of execution speed. This can be mitigated by explicitly calling tf.gradients(.), splitting the resulting list in two, and passing the corresponding gradients to each optimizer.

Related question: Holding variables constant during optimizer

A more efficient but longer implementation:

var_list1 = [...]  # variables from the first 5 (pre-trained) layers
var_list2 = [...]  # the rest of the variables (the newly added layer)
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
# Compute all gradients in a single pass, then split them between the optimizers.
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)

You can use tf.trainable_variables() to get all trainable variables and decide which ones to select from them. The difference is that in the first implementation tf.gradients(.) is called twice inside the optimizers. This may cause some redundant operations to be executed (e.g. gradients on the first layer can reuse some of the computations for the gradients of the following layers).
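
For instance, if the pre-trained layers happened to be built under variable scopes such as conv1 … conv5 and the new layer under new_conv (hypothetical names — adjust the prefixes to your own graph), the two lists could be selected by name:

all_vars = tf.trainable_variables()
var_list1 = [v for v in all_vars
             if v.name.startswith(("conv1", "conv2", "conv3", "conv4", "conv5"))]
var_list2 = [v for v in all_vars if v.name.startswith("new_conv")]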
