Decay parameter of Adam optimizer in Keras
Question
I think that the Adam optimizer is designed such that it automatically adjusts the learning rate. But there is an option to explicitly specify the decay in the Adam parameter options in Keras. I want to clarify the effect of decay on the Adam optimizer in Keras. If we compile the model using a decay of, say, 0.01 with lr = 0.001, and then fit the model for 50 epochs, does the learning rate get reduced by a factor of 0.01 after each epoch?
Is there any way to specify that the learning rate should decay only after running for a certain number of epochs?
In PyTorch there is a different implementation called AdamW, which is not present in the standard Keras library. Is this the same as varying the decay after every epoch, as mentioned above?
Thanks in advance for your replies.
Answer
From the source code, decay adjusts lr per iterations according to:

lr = lr * (1. / (1. + decay * iterations))  # simplified

This is epoch-independent. iterations is incremented by 1 on each batch fit (e.g. each time train_on_batch is called, or by however many batches there are in x for model.fit(x), usually len(x) // batch_size batches).
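As a quick sketch of what that formula implies (plain Python, using the question's lr = 0.001 and decay = 0.01; decayed_lr is just an illustrative helper, not a Keras function):

```python
def decayed_lr(lr0, decay, iterations):
    # simplified time-based decay applied by Keras' legacy optimizers
    return lr0 * (1.0 / (1.0 + decay * iterations))

lr0, decay = 0.001, 0.01
# after 100 batch updates the lr has halved, regardless of epoch boundaries
print(decayed_lr(lr0, decay, 100))  # 0.0005
```

So with these settings the decay is applied per batch, not per epoch, and it is a smooth 1/(1 + decay * t) curve rather than a fixed multiplicative drop.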
To implement what you've described, you can use a callback as below:
from keras.callbacks import LearningRateScheduler

def decay_schedule(epoch, lr):
    # decay by 0.1 every 5 epochs; use `% 1` to decay after each epoch
    if (epoch % 5 == 0) and (epoch != 0):
        lr = lr * 0.1
    return lr

lr_scheduler = LearningRateScheduler(decay_schedule)
model.fit(x, y, epochs=50, callbacks=[lr_scheduler])
LearningRateScheduler takes a function as an argument; at the beginning of each epoch, .fit feeds that function the epoch index and the current lr. The function's return value then replaces lr, so on the next epoch the function is fed the updated lr.
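To see the schedule's effect without running training, one can iterate the function by hand (a sketch; the epoch indices mirror what .fit passes in):

```python
def decay_schedule(epoch, lr):
    # same rule as the callback above: multiply lr by 0.1 every 5 epochs
    if (epoch % 5 == 0) and (epoch != 0):
        lr = lr * 0.1
    return lr

lr = 0.001
for epoch in range(11):
    lr = decay_schedule(epoch, lr)
# lr stays at 0.001 for epochs 0-4, drops to 0.0001 at epoch 5,
# and to 1e-05 at epoch 10
print(lr)
```

This is a step schedule: flat for five epochs, then a discrete drop, unlike the per-batch decay formula above.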
Also, there is a Keras implementation of AdamW, NadamW, and SGDW, by me - Keras AdamW.
Clarification: the very first call to .fit() invokes on_epoch_begin with epoch = 0 - if we don't wish lr to be decayed immediately, we should add an epoch != 0 check in decay_schedule. Then epoch denotes how many epochs have already passed, so when epoch = 5, the decay is applied.