Why doesn't the Adadelta optimizer decay the learning rate?

Problem description

I have initialised an Adadelta optimizer in Keras (using the TensorFlow backend) and assigned it to a model:

my_adadelta = keras.optimizers.Adadelta(learning_rate=0.01, rho=0.95)
my_model.compile(optimizer=my_adadelta, loss="binary_crossentropy")
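
For context, the snippet above presupposes an existing model named my_model, which the question does not show. A minimal hypothetical stand-in for a binary-classification setup (an assumption for illustration only, not part of the original question) could look like this:

import keras

# Hypothetical stand-in for my_model, just so the snippets above are runnable;
# the original question does not include the model definition.
my_model = keras.models.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),  # binary_crossentropy expects a single sigmoid output
])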

During training, I am using a callback to print the learning rate after every epoch:

from keras import backend as K
from keras.callbacks import Callback

class LRPrintCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        print(K.eval(lr))

However, this prints the same (initial) learning rate after every epoch. The same thing happens if I initialize the optimizer like this:

my_adadelta = keras.optimizers.Adadelta(learning_rate=0.01, decay=0.95)

Am I doing something wrong in the initialization? Is the learning rate maybe changing but I am not printing the right thing?

Answer

As discussed in a relevant Github thread, the decay does not affect the variable lr itself; lr is used only to store the initial value of the learning rate. In order to print the decayed value, you need to compute it explicitly yourself and store it in a separate variable, e.g. lr_with_decay; you can do so by using the following callback:

from keras import backend as K
from keras.callbacks import Callback

class MyCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr                  # stored initial learning rate
        decay = self.model.optimizer.decay            # time-based decay factor
        iterations = self.model.optimizer.iterations  # number of update steps taken so far
        lr_with_decay = lr / (1. + decay * K.cast(iterations, K.dtype(decay)))
        print(K.eval(lr_with_decay))

as explained here and here. In fact, the specific code snippet suggested there, i.e.

lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))

comes directly from the underlying Keras source code for Adadelta.
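
As a quick sanity check (an illustration, not part of the original answer), the same time-based formula can be evaluated with plain Python to see how fast the effective learning rate shrinks for the values used in the question:

# Standalone sketch of Keras' time-based decay: lr_t = lr / (1 + decay * iterations).
# The numbers mirror the question's settings (learning_rate=0.01, decay=0.95).
initial_lr = 0.01
decay = 0.95

for iterations in [0, 1, 10, 100]:
    lr_with_decay = initial_lr / (1.0 + decay * iterations)
    print(iterations, lr_with_decay)
# 0 -> 0.01, 1 -> ~0.00513, 10 -> ~0.00095, 100 -> ~0.0001

If MyCallback above is attached during training (e.g. callbacks=[MyCallback()] in model.fit), the printed values should follow exactly this curve, with iterations counting individual batch updates rather than epochs.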

As is clear from inspection of the linked source code, the parameter of interest here for decaying the learning rate is decay, and not rho; although the term 'decay' is also used to describe rho in the documentation, it is a different kind of decay that has nothing to do with the learning rate:

rho: float >= 0. Adadelta decay factor, corresponding to fraction of gradient to keep at each time step.
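
To make the distinction concrete, here is a minimal NumPy sketch of the standard Adadelta update rule (an illustration, not the actual Keras implementation) showing that rho only governs the exponential moving averages of squared gradients and squared updates, and never rescales the learning rate over time:

import numpy as np

# Toy Adadelta steps on f(x) = x^2; rho decays the running averages below,
# while lr stays fixed unless a separate learning-rate schedule is applied.
lr, rho, epsilon = 0.01, 0.95, 1e-7

param = 1.0
accum_grad = 0.0    # E[g^2], running average of squared gradients
accum_update = 0.0  # E[dx^2], running average of squared updates

for step in range(5):
    grad = 2.0 * param  # gradient of the toy loss
    accum_grad = rho * accum_grad + (1.0 - rho) * grad ** 2
    update = grad * np.sqrt(accum_update + epsilon) / np.sqrt(accum_grad + epsilon)
    accum_update = rho * accum_update + (1.0 - rho) * update ** 2
    param -= lr * update
    print(step, param)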
