Is it possible to update the learning rate, each batch, based on batch label (y_true) distribution?
Problem description
See the end of this question for the solution.
TL;DR: I need to find a way to calculate the label distribution per batch and update the learning rate accordingly. Is there a way to access the optimizer of the current model to update the learning_rate per batch?
Below is how to calculate the label distribution. It can be done in the loss function, since by default the loss is calculated batch-wise. Where can this code be executed so that it also has access to the model's optimizer?
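import tensorflow as tf
from tensorflow.python.ops import math_ops

def loss(y_true, y_pred):
    # `lf` (defined elsewhere) holds the per-class sample frequencies as floats
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating learning rate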
Further Details
In order to implement a learning rate schedule, as described in this paper, I believe I need a way to update the learning rate during training, each batch, by a value calculated from the label distribution of the true labels in the batch (y_true, as it's typically denoted in keras/tensorflow).
where, in the paper's formula:
x: the output from the model
y: the corresponding ground truth label
B: the minibatch of m samples (e.g. 64)
n_y: the entire training sample size for ground truth label y
n_y^-1: the inverse label frequency
The portion of the formula I'm focused on is the part between α and Δθ.
I can compute this with ease from within a custom loss function, but I do not know how to update the learning rate from the loss function, if that is even possible.
def loss(y_true, y_pred):
    y = math_ops.argmax(y_true, axis=1)
    freqs = tf.gather(lf, y)  # equal to lf[y] if `lf` and `y` were numpy arrays
    inv_freqs = math_ops.pow(freqs, -1)
    E = 1 / math_ops.reduce_sum(inv_freqs)  # value to use when updating learning rate
where:
lf: the sample frequencies for each class, e.g. with 2 classes, c0 = 10 examples and c1 = 100 examples --> lf == [10, 100]
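As a quick sanity check of the arithmetic, the value E from the loss snippet can be worked through by hand in plain Python. This is only a sketch: the batch below and the helper name batch_scale are made up for illustration, and lf reuses the two-class example above.

```python
# Hypothetical values for illustration only.
lf = [10, 100]                               # training-set sample count per class
y_true = [[1, 0], [0, 1], [0, 1], [1, 0]]    # one-hot labels for a batch of 4


def batch_scale(y_true, lf):
    """Return E = 1 / (sum over the batch of 1 / lf[label])."""
    labels = [row.index(max(row)) for row in y_true]  # argmax of each one-hot row
    inv_freqs = [1.0 / lf[c] for c in labels]         # inverse label frequencies
    return 1.0 / sum(inv_freqs)


E = batch_scale(y_true, lf)
print(E)  # roughly 4.545, i.e. 1 / (0.1 + 0.01 + 0.01 + 0.1)
```

A batch dominated by the rare class makes the sum of inverse frequencies larger and E smaller, which is the quantity the question wants to feed into the learning rate.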
Is there some fancy way I can update the optimizer's learning rate, like what can be done from a Callback?
def on_batch_begin(self, batch, logs=None):
    # note: batch is just an incremented value to indicate batch index
    self.model.optimizer.lr  # learning rate, can be modified from callback
Thanks in advance for any help!
Big thanks to @mrk for pushing me in the right direction to solve this!
In order to compute the per-batch label distribution, then use that value to update the optimizer's learning rate, one must:
- Create a custom Metric which computes the label distribution per batch and returns the frequency array (by default Keras optimizes batch-wise, hence metrics are calculated each batch).
- Create a typical learning rate scheduler by subclassing the keras.callbacks.History class.
- Override the on_batch_end function of the scheduler; the logs dict will contain all computed metrics for the batch, including our custom label distribution metric!
Create the custom metric
import tensorflow as tf

class LabelDistribution(tf.keras.metrics.Metric):
    """
    Computes the per-batch label distribution (y_true) and stores the array as
    a metric which can be accessed via Keras callbacks.

    :param n_class: int - number of distinct output classes
    """

    def __init__(self, n_class, name='batch_label_distribution', **kwargs):
        super(LabelDistribution, self).__init__(name=name, **kwargs)
        self.n_class = n_class
        self.label_distribution = self.add_weight(name='ld', initializer='zeros',
                                                  aggregation=tf.VariableAggregation.NONE,
                                                  shape=(self.n_class, ))

    def update_state(self, y_true, y_pred, sample_weight=None):
        # y_true is expected to be one-hot encoded
        y_true = tf.cast(y_true, 'int32')
        y = tf.argmax(y_true, axis=1)
        # minlength keeps the result at n_class entries even when the
        # highest-numbered classes are absent from the batch
        label_distrib = tf.math.bincount(tf.cast(y, 'int32'), minlength=self.n_class)
        self.label_distribution.assign(tf.cast(label_distrib, 'float32'))

    def result(self):
        return self.label_distribution

    def reset_states(self):
        self.label_distribution.assign([0] * self.n_class)
Create the DRW learning rate scheduler
class DRWLearningRateSchedule(keras.callbacks.History):
    """
    Used to implement the Deferred Re-weighting (DRW) strategy from
    [Kaidi Cao, et al. "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss." (2019)]
    (https://arxiv.org/abs/1906.07413)

    To be passed as a callback to model.fit, with the LabelDistribution metric
    included in model.compile:
    `model.compile(..., metrics=[LabelDistribution(n_class)])`
    `model.fit(..., callbacks=[DRWLearningRateSchedule(.01)])`
    """

    def __init__(self, base_lr, ld_metric='batch_label_distribution'):
        super(DRWLearningRateSchedule, self).__init__()
        self.base_lr = base_lr
        self.ld_metric = ld_metric  # name of the LabelDistribution metric

    def on_batch_end(self, batch, logs=None):
        ld = logs.get(self.ld_metric)  # the per-batch label distribution
        current_lr = self.model.optimizer.lr
        # example below of updating the optimizer's learning rate
        K.set_value(self.model.optimizer.lr, current_lr * (1 / math_ops.reduce_sum(ld)))
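Note that the example update in on_batch_end multiplies the current learning rate on every batch, so the rate shrinks a little more each step. If the intent is instead to rescale a fixed base rate by the batch's inverse-frequency term (an assumption about the goal, not something the code above does), the per-batch value could be derived from base_lr and the label counts in logs. A minimal pure-Python sketch, where drw_learning_rate and its arguments are hypothetical names:

```python
def drw_learning_rate(base_lr, label_counts, class_freqs):
    """Scale base_lr by 1 / (sum over the batch of 1 / n_y).

    label_counts: how many samples of each class are in the batch
                  (the array the label distribution metric reports).
    class_freqs:  training-set sample count per class (`lf` in the question).
    """
    # Each class contributes (count in batch) * (1 / class frequency).
    inv_freq_sum = sum(n / f for n, f in zip(label_counts, class_freqs))
    if inv_freq_sum == 0:  # empty batch: leave the base rate untouched
        return base_lr
    return base_lr / inv_freq_sum


# Batch with 2 samples of each class, lf == [10, 100]
lr = drw_learning_rate(0.01, [2, 2], [10, 100])
print(lr)  # roughly 0.04545
```

Inside the callback, the returned value would be what gets passed to K.set_value, so each batch starts from base_lr rather than from the previous batch's rate.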
Recommended answer
Keras loss-based learning rate adaptation
After some research I found this; instead of triggering a decay, you could just as well apply another function or value to your learning rate.
from __future__ import absolute_import
from __future__ import print_function

import keras
from keras import backend as K
import numpy as np


class LossLearningRateScheduler(keras.callbacks.History):
    """
    A learning rate scheduler that relies on changes in loss function
    value to dictate whether learning rate is decayed or not.
    LossLearningRateScheduler has the following properties:
    base_lr: the starting learning rate
    lookback_epochs: the number of epochs in the past to compare with the loss function at the current epoch to determine if progress is being made.
    decay_threshold / decay_multiple: if loss function has not improved by a factor of decay_threshold * lookback_epochs, then decay_multiple will be applied to the learning rate.
    spike_epochs: list of the epoch numbers where you want to spike the learning rate.
    spike_multiple: the multiple applied to the current learning rate for a spike.
    """

    def __init__(self, base_lr, lookback_epochs, spike_epochs = None, spike_multiple = 10, decay_threshold = 0.002, decay_multiple = 0.5, loss_type = 'val_loss'):
        super(LossLearningRateScheduler, self).__init__()
        self.base_lr = base_lr
        self.lookback_epochs = lookback_epochs
        self.spike_epochs = spike_epochs
        self.spike_multiple = spike_multiple
        self.decay_threshold = decay_threshold
        self.decay_multiple = decay_multiple
        self.loss_type = loss_type

    def on_epoch_begin(self, epoch, logs=None):
        if len(self.epoch) > self.lookback_epochs:
            current_lr = K.get_value(self.model.optimizer.lr)
            target_loss = self.history[self.loss_type]
            loss_diff = target_loss[-int(self.lookback_epochs)] - target_loss[-1]
            if loss_diff <= np.abs(target_loss[-1]) * (self.decay_threshold * self.lookback_epochs):
                print(' '.join(('Changing learning rate from', str(current_lr), 'to', str(current_lr * self.decay_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.decay_multiple)
                current_lr = current_lr * self.decay_multiple
            else:
                print(' '.join(('Learning rate:', str(current_lr))))
            if self.spike_epochs is not None and len(self.epoch) in self.spike_epochs:
                print(' '.join(('Spiking learning rate from', str(current_lr), 'to', str(current_lr * self.spike_multiple))))
                K.set_value(self.model.optimizer.lr, current_lr * self.spike_multiple)
        else:
            print(' '.join(('Setting learning rate to', str(self.base_lr))))
            K.set_value(self.model.optimizer.lr, self.base_lr)
        return K.get_value(self.model.optimizer.lr)


def main():
    return


if __name__ == '__main__':
    main()
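The decay test inside on_epoch_begin can be looked at in isolation. The sketch below mirrors the scheduler's comparison in plain Python so it can be tried on a list of losses without a model; should_decay is a hypothetical helper name and the loss values are made up.

```python
def should_decay(losses, lookback_epochs, decay_threshold=0.002):
    """Mirror the scheduler's test: decay when the improvement over the
    lookback window is at most |latest loss| * decay_threshold * lookback_epochs."""
    past = losses[-int(lookback_epochs)]   # same indexing as the scheduler
    latest = losses[-1]
    loss_diff = past - latest              # positive when the loss improved
    return loss_diff <= abs(latest) * decay_threshold * lookback_epochs


plateau = should_decay([1.0, 0.999, 0.998], lookback_epochs=2)    # barely improving
improving = should_decay([1.0, 0.5, 0.25], lookback_epochs=2)     # halving each epoch
print(plateau, improving)
```

When the check passes, the scheduler multiplies the learning rate by decay_multiple; for a per-batch variant, the same structure could move to on_batch_end with a different trigger.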