Decay parameter of Adam optimizer in Keras


Question

I think the Adam optimizer is designed so that it automatically adjusts the learning rate. But there is an option to explicitly specify the decay in the Adam parameter options in Keras. I want to clarify the effect of decay on the Adam optimizer in Keras. If we compile the model with a decay of, say, 0.01 on lr = 0.001, and then fit the model for 50 epochs, does the learning rate get reduced by a factor of 0.01 after each epoch?

Is there any way to specify that the learning rate should decay only after running for a certain number of epochs?

In PyTorch there is a different implementation called AdamW, which is not present in the standard Keras library. Is this the same as varying the decay after every epoch, as mentioned above?

Thanks in advance for your reply.

Answer

From the source code, decay adjusts lr per iteration according to:

lr = lr * (1. / (1. + decay * iterations))  # simplified

This is epoch-independent: iterations is incremented by 1 on each batch fit, e.g. each time train_on_batch is called, or for however many batches are in x for model.fit(x), which is usually len(x) // batch_size batches.
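
To see what this means for the question (lr = 0.001, decay = 0.01), here is a minimal sketch that just evaluates the formula above; steps_per_epoch = 100 is a made-up stand-in for len(x) // batch_size:

lr0, decay = 0.001, 0.01
steps_per_epoch = 100  # hypothetical value standing in for len(x) // batch_size

for epoch in (1, 10, 50):
    iterations = epoch * steps_per_epoch          # batches seen so far
    print(epoch, lr0 * (1. / (1. + decay * iterations)))
# epoch 1  -> 0.0005 (not 0.001 * 0.99)
# epoch 10 -> ~9.1e-05
# epoch 50 -> ~2.0e-05

So the learning rate shrinks smoothly with every batch via inverse-time decay, rather than being cut by a fixed 0.01 factor at epoch boundaries.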

To implement what you've described, you can use a callback like the one below:

from keras.callbacks import LearningRateScheduler
def decay_schedule(epoch, lr):
    # decay by 0.1 every 5 epochs; use `% 1` to decay after each epoch
    if (epoch % 5 == 0) and (epoch != 0):
        lr = lr * 0.1
    return lr

lr_scheduler = LearningRateScheduler(decay_schedule)
model.fit(x, y, epochs=50, callbacks=[lr_scheduler])

The LearningRateScheduler takes a function as an argument, and at the beginning of each epoch .fit feeds that function the epoch index and the current lr. It then updates lr according to the function's return value, so on the next epoch the function is fed the updated lr.
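
If you want to verify that the schedule is actually applied, a small logging callback can print the optimizer's current learning rate at the start of each epoch. This LRLogger is only an illustrative sketch, not part of the original answer:

from keras.callbacks import Callback
import keras.backend as K

class LRLogger(Callback):
    # print the optimizer's current lr at the start of every epoch
    def on_epoch_begin(self, epoch, logs=None):
        print("epoch %d: lr = %.6g" % (epoch, K.get_value(self.model.optimizer.lr)))

model.fit(x, y, epochs=50, callbacks=[lr_scheduler, LRLogger()])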

Also, there is a Keras implementation of AdamW, NadamW, and SGDW, by me: Keras AdamW.

Clarification: the very first call to .fit() invokes on_epoch_begin with epoch = 0. If we don't wish lr to be decayed immediately, we should add an epoch != 0 check in decay_schedule. Then epoch denotes how many epochs have already passed, so when epoch = 5 the decay is applied.
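
As a quick sanity check of that behaviour, you can call decay_schedule by hand, feeding the returned lr back in just as .fit does (the starting value of 0.001 is only an example):

lr = 0.001
for epoch in range(11):
    lr = decay_schedule(epoch, lr)
    print(epoch, lr)
# epochs 0-4 keep lr = 0.001; epoch 5 drops it to 0.0001; epoch 10 to 1e-05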
