How to set layer-wise learning rate in Tensorflow?

Question

I am wondering if there is a way to use different learning rates for different layers, like in Caffe. I am trying to modify a pre-trained model and use it for other tasks. What I want is to speed up the training of the newly added layers and keep the trained layers at a low learning rate in order to prevent them from being distorted. For example, I have a 5-conv-layer pre-trained model. Now I add a new conv layer and fine-tune it. The first 5 layers would have a learning rate of 0.00001 and the last one 0.001. Any idea how to achieve this?

Answer

It can be achieved quite easily with 2 optimizers:

import tensorflow as tf

var_list1 = [...]  # variables from the first 5 (pre-trained) layers
var_list2 = [...]  # the rest of the variables (the newly added layer)
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)
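
For completeness, a minimal usage sketch of running the grouped op, continuing from the snippet above. The names images, labels, batch_x, batch_y, num_steps and checkpoint_path are placeholders assumed for this illustration, not part of the original answer:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # saver.restore(sess, checkpoint_path)  # assumed: restore the pre-trained weights here
    for step in range(num_steps):
        # one run of the grouped op applies both optimizers' updates in a single training step
        _, loss_value = sess.run([train_op, loss],
                                 feed_dict={images: batch_x, labels: batch_y})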

One disadvantage of this implementation is that it computes tf.gradients(.) twice inside the optimizers, so it might not be optimal in terms of execution speed. This can be mitigated by explicitly calling tf.gradients(.), splitting the resulting list in two and passing the corresponding gradients to the two optimizers.

Related question: Holding variables constant during optimizer

Edit: Added a more efficient but longer implementation:

var_list1 = [...]  # variables from the first 5 (pre-trained) layers
var_list2 = [...]  # the rest of the variables (the newly added layer)
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
grads = tf.gradients(loss, var_list1 + var_list2)  # compute all gradients in a single pass
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)

You can use tf.trainable_variables() to get all trainable variables and select from them. The difference is that in the first implementation tf.gradients(.) is called twice inside the optimizers, which may cause some redundant operations to be executed (e.g. the gradients of the first layers can reuse some of the computations from the gradients of the following layers when they are computed in a single pass).
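
As an illustration, a minimal sketch of building the two variable lists from tf.trainable_variables(). The scope prefixes "conv1" ... "conv6" are assumptions for this example and must be adapted to the names actually used in your graph:

all_vars = tf.trainable_variables()
# assumed layout: the five pre-trained layers live under scopes "conv1" .. "conv5",
# the newly added layer under "conv6"; adjust the prefixes to your own graph
var_list1 = [v for v in all_vars if not v.name.startswith("conv6/")]
var_list2 = [v for v in all_vars if v.name.startswith("conv6/")]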
